Episode 2: SUNCAT library information

Questions about use cases were not fully answered during the sprint, but we decided to gather the information SUNCAT holds about contributing institutions into a linked data format. Further work on use cases for SUNCAT linked and/or open data is being done in the SUNCAT UK Discovery project.

SUNCAT uses the MARC organisation code for libraries when it is available. I was introduced to the work of Adrian Pohl and Felix Ostrowski from hbz in Germany, who have created an international directory of libraries and related organisations that covers the US codes from the Library of Congress and the German organisation codes. The information for UK libraries is currently in a PDF (http://www.bl.uk/bibliographic/pdfs/marc_codes.pdf), but it might be possible to extract the data from this format. Felix and Adrian presented their idea of adding RDFa to web pages containing information about libraries at ELAG 2011, in “Your Website is your API – How to integrate your Library into the Web of data using RDFa”. A representative from OCLC who attended the presentation immediately started implementing this in the WorldCat registry.

The Talis Platform hosting and consultancy blog posts “Linking and Cleaning Data” were a very useful illustration of the use of org:Organization, org:hasSite, and v:VCard for specifying the links between an organisation and its sites and the site addresses.

An organisation ontology was used to describe SUNCAT contributing libraries. There was discussion about whether a “library” should be modelled as a single library in one building or as an umbrella term for all of an institution’s libraries.
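The org:Organization, org:hasSite and v:VCard pattern described above might look roughly like the following when generated from a record. This is only a sketch in Python, not the code used for SUNCAT: the org namespace URI, the function names, and the example institution and URIs are all assumptions made for illustration, and the address properties are taken from the vCard namespace listed under "Vocabularies used".

```python
# Namespaces: "v" and "rdfs" appear in the vocabulary list; the "org"
# namespace URI is an assumption, as the post does not state it.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "org": "http://www.w3.org/ns/org#",
    "v": "http://www.w3.org/2006/vcard/ns#",
}


def literal(text):
    """Return text as a double-quoted Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def library_to_turtle(org_uri, name, site_uri, street, locality, postcode):
    """Describe one organisation, one of its sites, and the site address."""
    header = [f"@prefix {p}: <{u}> ." for p, u in PREFIXES.items()]
    body = [
        "",
        f"<{org_uri}> a org:Organization ;",
        f"    rdfs:label {literal(name)} ;",
        f"    org:hasSite <{site_uri}> .",
        "",
        f"<{site_uri}> a org:Site ;",
        "    org:siteAddress [",
        "        a v:VCard ;",
        "        v:adr [",
        "            a v:Address ;",
        f"            v:street-address {literal(street)} ;",
        f"            v:locality {literal(locality)} ;",
        f"            v:postal-code {literal(postcode)}",
        "        ]",
        "    ] .",
    ]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    # Hypothetical institution and URIs, for illustration only.
    print(library_to_turtle(
        "http://example.org/id/org/example-university",
        "Example University Library",
        "http://example.org/id/site/example-university-main",
        "1 Example Street",
        "Exampleton",
        "EX1 2MP",
    ))
```

Modelling the institution as the organisation and each building as a site is one way to accommodate both readings of "library" in the discussion above, though it is not necessarily the choice SUNCAT made.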

I found the examples on “Howto – Describing libraries, their collections and services in RDF” on the hbz Semantic web wiki very helpful.

Vocabularies used:

The RDF Vocabulary (RDF):
http://www.w3.org/1999/02/22-rdf-syntax-ns#

The RDF Schema vocabulary (RDFS):
http://www.w3.org/2000/01/rdf-schema#

Friend of a Friend (FOAF):
http://xmlns.com/foaf/0.1/

DCMI Metadata Terms (DCT):
http://purl.org/dc/terms/

An Ontology for vCards (V) for representing address and contact information:
http://www.w3.org/2006/vcard/ns#

WGS84 Geo Positioning (GEO):
http://www.w3.org/2003/01/geo/wgs84_pos#

XML Schema (XSD):
http://www.w3.org/2001/XMLSchema#

Web Ontology Language (OWL):
http://www.w3.org/2002/07/owl#

Simple Knowledge Organization System (SKOS):
http://www.w3.org/2004/02/skos/core#

Ordnance Survey Postcode Ontology:
http://data.ordnancesurvey.co.uk/ontology/postcode/
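
To show how these namespaces are typically used, the sketch below collects them into a Python prefix map, expands a prefixed name such as v:VCard into a full URI, and builds a Turtle @prefix header. The lowercase prefix labels follow the abbreviations given in the list; the label "postcode" for the Ordnance Survey ontology is an assumption, since no abbreviation is given for it, and the function names are made up for this example.

```python
# Prefix labels follow the abbreviations in the list above, lowercased.
# "postcode" is an assumed label; the post gives no abbreviation for it.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dct": "http://purl.org/dc/terms/",
    "v": "http://www.w3.org/2006/vcard/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "postcode": "http://data.ordnancesurvey.co.uk/ontology/postcode/",
}


def expand_curie(curie):
    """Expand a prefixed name like 'v:VCard' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_prefix_header():
    """Return one Turtle @prefix declaration per known namespace."""
    return "\n".join(f"@prefix {p}: <{u}> ." for p, u in PREFIXES.items())


if __name__ == "__main__":
    print(turtle_prefix_header())
    print(expand_curie("v:VCard"))
```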

The rdf:about RDF/Turtle validator and Converter was useful for checking Turtle files.

There is a JISC MU list of organisations, which I used to enrich the SUNCAT institution data with JISC MU organisation identifiers. I did this by querying the SPARQL endpoint for JISC MU institutions, using the Perl CPAN module RDF::Query::Client.
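
The lookup was done in Perl with RDF::Query::Client; the sketch below shows the same general idea in Python instead, purely for illustration. It builds a SPARQL SELECT query that looks up an organisation by name, forms the GET request URL defined by the SPARQL protocol, and pulls URIs out of a response in the SPARQL JSON results format. The endpoint URL is a placeholder, matching on rdfs:label is an assumption about how the JISC MU data is modelled, and the function names are hypothetical. No network request is made here.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint; the real JISC MU endpoint URL is not given in the post.
ENDPOINT = "http://example.org/sparql"


def build_query(name):
    """Build a SELECT query for organisations whose label equals name.

    Matching on rdfs:label is an assumption about the target data.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?org WHERE {\n"
        "  ?org rdfs:label ?label .\n"
        f'  FILTER (str(?label) = "{escaped}")\n'
        "}"
    )


def build_request_url(endpoint, query):
    """Return the SPARQL protocol GET URL carrying the query."""
    return endpoint + "?" + urlencode({"query": query})


def extract_uris(results_json, var="org"):
    """Extract bound values for var from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding[var]["value"]
        for binding in data["results"]["bindings"]
        if var in binding
    ]


if __name__ == "__main__":
    query = build_query("Example University")
    print(build_request_url(ENDPOINT, query))
    # A hand-written sample response, not real JISC MU data.
    sample = json.dumps({
        "head": {"vars": ["org"]},
        "results": {"bindings": [
            {"org": {"type": "uri", "value": "http://example.org/org/1"}}
        ]},
    })
    print(extract_uris(sample))
```

The identifiers returned this way could then be attached to the matching SUNCAT institution records, for example as owl:sameAs or a similar link, depending on how closely the two descriptions correspond.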

Transforming the SUNCAT institution data into linked data has helped SUNCAT clean its data. The linked data can be used as an internal source of data for various SUNCAT configuration files, web pages, and contact information.
