Updated data in SPARQL and new SRU target for SUNCAT open data

If you’re interested in SUNCAT data then you’ll know that there’s been a lot of activity with the new SUNCAT service interface (http://suncat.ac.uk/).

There’s also been a lot of activity with the underlying database and some improvements to the data sitting behind that too.

With new update processes in place, the Discover EDINA open data has benefited too: a new, up-to-date set of open data is now available on the SRU target, with a shiny new permanent base URL!

The DiscoverEDINA SUNCAT Open Data SRU target is now based at

http://m2m.edina.ac.uk/sru/de_suncat

All the same SRU and CQL queries should run as before, with the updated data, and the addition of multiple records for the same SUNCAT ID (these represent the same journal but from multiple libraries).
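
For anyone wanting to check the new base URL quickly, here is a minimal Python sketch (standard library only) that fetches one record by SUNCAT ID; the ID used is the example from the SRU post below, so whether it returns a hit depends on what is in the current data set.

import urllib.parse
import urllib.request

# Minimal sketch: fetch one record by SUNCAT ID from the new SRU base URL.
base = "http://m2m.edina.ac.uk/sru/de_suncat"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "startRecord": "1",
    "maximumRecords": "1",
    "query": "sc.id=SC00374927310",   # example ID from the SRU post below
}
url = base + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))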

Hope you find it useful!

Also see:
SPARQL endpoint for SUNCAT
SUNCAT Open Data – SRU

UK Research Reserve and SUNCAT – our story so far

UK Research Reserve (UKRR), a collaborative programme between the HE sector and the British Library (BL), aims to preserve research material for the community and build up a national research collection in a time of rapid change. By de-duplicating low-use research material collectively, UKRR members (29 HEIs, led by Imperial College London; details can be found on UKRR’s website: http://www.ukrr.ac.uk/what/default.aspx) are able to dispose of print journals and repurpose the newly available space to better meet local demands, without losing access to the content. So far, UKRR members have released more than 70km of shelf space, or 10,700 square metres, and it is estimated that more than £21m in capital savings and £8m in estate management costs have been achieved.

The HE-BL partnership is supported by the Higher Education Funding Council for England (HEFCE). It started with an 18-month pilot phase which proved the concept and set the foundation for the programme. Its success resulted in HEFCE’s further investment of £10m to expand the scheme nationally and to encourage more HEIs to take part. UKRR has been further awarded an extension of 12 months to January 2015; it enables us to de-duplicate more material, improve our data, and explore potential opportunities.

The collaborative and coordinated approach we have adopted aims to identify one ‘access copy’ (normally held by the British Library) and two preservation copies distributed amongst the membership. To achieve this, one of the processes we put in place is called scarcity checking (i.e. a process designed to determine whether an item offered by one member is available elsewhere within the membership, and if so, at which institutions), and the collaboration with EDINA helps streamline the workflow and save the time and staff resources required in the process. Initially, scarcity checking was conducted by each member institution as they offered holdings to UKRR for de-duplication. It was time consuming and prone to errors and inaccuracy.

A successful pilot was run between the University of Edinburgh (one of the UKRR member libraries), UKRR and EDINA. EDINA developed a script which triggered an automatic process to accept a file of records submitted by a UKRR member for searching against the whole SUNCAT database and the local catalogues of the two UKRR libraries which are not SUNCAT Contributing Libraries. This collaboration has contributed to the quality of the data we process, helped reduce associated risks and significantly reduced the time required for checking in member libraries. As a result we have more reliable data on which to base our final de-duplication decisions.

UKRR’s work with its partners is key to the programme’s success, and such collaborations further demonstrate the synergies that can be created when the community works together. We value the work that EDINA has done for us, and we look forward to continuing to work with EDINA and to bringing benefits and value to the community.

Daryl Yang, UKRR Manager


SPARQL endpoint for SUNCAT

As we explored how to extend access to the metadata contributed by a set of libraries using the SUNCAT service, in order to promote discovery and reuse of the data, it soon became clear that Linked Data was one of the preferred formats for enabling this.

The previous phase of this project developed a transformation to express the information on holdings in an RDF model. The XSLT produced converts MARC-XML into RDF/XML. This XSLT transformation was used to process over 1,000,000 holdings records made available by the British Library, the National Library of Scotland, the University of Bristol Library, the University of Nottingham Library, the University of Glasgow Library and the library of the Society of Antiquaries of London, in order to make them available through a Linked Data SPARQL endpoint interface.

Setting up the Triplestore

We built on previous EDINA experience of providing SPARQL endpoints to set up the interface for the SUNCAT Linked Data.

We chose the 4Store application, which is fully open source, efficient and scalable, and provides a stable RDF database. Our experience is that it is also simpler to install than other products. We installed 4Store on an independent host in order to keep this application separate from other services, for security and ease of maintenance.

Loading the data

The data contributed by each library was processed separately. First, the data was extracted from SUNCAT, respecting any restrictions placed by the specific library. It was then transformed into RDF/XML and finally loaded into the triplestore. Each of these steps can be fairly time consuming depending on the size of the data file. Once the data from each library has been added to the triplestore, queries can be made across the whole RDF database.
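
As a rough illustration of the load step, here is a sketch of pushing one library’s RDF/XML file into 4Store over HTTP, assuming a local 4Store SPARQL server with its /data/ endpoint enabled; the host, port, graph URI and filename are placeholders rather than the real SUNCAT setup.

import urllib.parse
import urllib.request

# Sketch only: load one RDF/XML file into a named graph via 4Store's HTTP data endpoint.
graph_uri = "http://example.org/graph/library-x"        # placeholder graph name
endpoint = "http://localhost:8080/data/" + urllib.parse.quote(graph_uri, safe="")

with open("library-x.rdf", "rb") as f:                  # placeholder filename
    req = urllib.request.Request(
        endpoint,
        data=f.read(),
        method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )
    urllib.request.urlopen(req)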

APIs

An HTTP server is required to provide external access and allow querying of the triplestore. 4Store includes a simple SPARQL HTTP protocol server which answers SPARQL 1.1 queries. Once the server is running, you can query the triplestore using:

  1. A machine-to-machine API at http://sparql1.edina.ac.uk:8181/sparql/ (a short sketch of calling this programmatically follows below).
  2. A basic GUI at http://sparql1.edina.ac.uk:8181/test/.
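
As a minimal sketch of using the machine-to-machine API, the following Python snippet (standard library only) sends a generic placeholder query and asks for JSON results via the standard SPARQL protocol Accept header, assuming the endpoint follows the usual protocol for POSTed queries; adapt the query to the SUNCAT data as needed.

import json
import urllib.parse
import urllib.request

# Sketch only: POST a SPARQL query to the m2m endpoint and print the bindings.
endpoint = "http://sparql1.edina.ac.uk:8181/sparql/"
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"   # placeholder query

req = urllib.request.Request(
    endpoint,
    data=urllib.parse.urlencode({"query": query}).encode("utf-8"),
    headers={"Accept": "application/sparql-results+json"},
)
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)

for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])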

GUI

The functionality of the basic test GUI is rather limited and only enables SELECT, CONSTRUCT, ASK and DESCRIBE operations. In order to customise the interface and provide additional information, such as example queries, we used an open source SPARQL front end designed by Dave Challis called SPARQLfront, available on GitHub. SPARQLfront is a PHP- and JavaScript-based front end and can be installed on top of a default Apache2/PHP server. It supports SPARQL 1.0.

An improved GUI is available at: http://sparql1.edina.ac.uk:8181/endpoint/.

The DiscoverEDINA SUNCAT SPARQL endpoint GUI provides four sample queries to help the user with the format and syntax required to compose correct SPARQL queries. For example, one of the queries is:

Is the following title (i.e. archaeological reports) held anywhere in the UK? 

SELECT ?title ?holder
WHERE {
        ?j foaf:primaryTopic ?pt.
        ?pt dc:title ?title;
            lh:held ?h.
        ?h lh:holder ?holder.

        FILTER regex(str(?title), "archaeological reports", "i")
      }

The user is provided with a box in which to enter queries. Syntax highlighting is provided to help with composition. The user can also select whether or not to display the namespaces in the box. There is a range of output formats that can be selected:

  • SPARQL XML (the default)
  • JSON
  • Plain text
  • Serialized PHP
  • Turtle
  • RDF/XML
  • Query structure
  • HTML table
  • Tab Separated Values (TSV)
  • Comma Separated Values (CSV)
  • SQLite database

The SPARQL endpoint GUI is ideal for running interactive queries, and for developing or troubleshooting queries to be run via the m2m SPARQL API or used in conjunction with the SRU target.

Making records from the SUNCAT database openly available: the experience with licensing

The background is explained in an earlier post (July 10 2012). SUNCAT (Serials UNion CATalogue) aggregates the metadata (bibliographical and holdings information) for serials, regardless of physical format, held in (currently) 89 libraries, and it was planned (with the agreement of the Contributing Libraries) to make as much of this data as openly available as possible.

It was decided to adopt an opt-in policy. This approach was taken since it was felt that CLs needed to be fully aware of the commitment they were making and to have the opportunity to place any particular restrictions, such as limiting the data which could be made open or restricting the number of formats in which the data would be made available. In the event, most of the participants availed themselves of the opportunity to specify, unambiguously, which data they were agreeing to being made open.

Legal advice was taken from the University solicitors and the licence format adopted was the Open Data Commons Public Domain Dedication and Licence, with reference to the ODC Attribution Share Alike Community Norms. Staff in quite a number of institutions expressed interest but, in the event, only staff in 6 institutions proceeded as far as signing a licence with EDINA. A copy of the standard agreement may be viewed here.

Since many libraries have acquired some of the metadata records they use in OPACs from one or more third party commercial suppliers, there were very understandable concerns about giving permission for EDINA to make records from these sources openly available.  Accordingly, it was necessary to add an Appendix to the individual Agreements, specifying what particular restrictions should be applied.

The situation applying to each of the libraries is as follows:

British Library: Permission was given to publish all serials records, but they are not to be made available in MARCXML or MARC21 formats.

National Library of Scotland: Permission was given to publish as open data any NLS record that has ‘StEdNL’ in MARC field 040 $a, and to publish as open data the title, ISSN and holdings data for any serials record in their catalogue.

University of Bristol Library: Permission was given to publish as open data any Bristol record that has “UkBrU-I” in MARC field 040 $a, e.g.

040   L $$aUkBrU-I

University of Nottingham Library: Permission was given to publish as open data any Nottingham record that has “UkNtU” in MARC field 040 $a and $c, e.g.

040   L $$aUkNtU$$cUkNtU

However, if there is an 035 tag identifying a different library, then the record should not be used, e.g.

035   L $$a(OCoLC)1754614
035   L $$a(SFX)954925250111
035   L $$a(CONSER)sc-84001881-
040   L $$aUkNtU$$cUkNtU

University of Glasgow Library: Permission was given to publish as open data any Glasgow record that is not derived from Serials Solutions, as indicated in MARC field 035 $$a (WaSeSS).

The library of the Society of Antiquaries of London: Permission was given to publish as open data all serials records.

As mentioned above, staff in quite a few other libraries expressed interest in becoming involved, but the short timescale of the project meant that effort had to be concentrated on those libraries able to sign the licence agreement quickly.

Subject to the availability of further funding it is planned to continue discussions with those libraries which have expressed interest but were not able to proceed to signing an agreement.

Negotiating the specific requirements for each of the libraries was a time-consuming, although necessary, process, and there are concerns about the resources which would be required to carry out the negotiations for a rather larger number of libraries than participated in this phase.

Taken together, the records which can be made openly available total in excess of 1,000,000: a considerable quantity of serials metadata. Once the data has been released it will be most interesting to monitor the uses made of it.

Details about making the data openly available and the ways in which developers and others can access it are outlined in a separate blog entry.

That library staff have concerns about making available metadata which has been obtained from one or other third party has been well recognised for some time, but to date there has been very little progress on resolving these issues at either a national or international level. In the earlier blog post it was stated that:

“A number of librarians said that it would be a good idea if JISC/EDINA could come to an agreement with organisations such as OCLC and RLUK rather than individual libraries needing to approach them; this is an idea certainly worth pursuing”.

JISC did commission work to be carried out in this area and there is a website available which provides guidance. Whilst this is clearly very helpful, the onus is placed upon staff in individual libraries to look carefully at their licence agreements with third party suppliers; even where this is done, what is often found is that the licence agreements are not necessarily clear and unambiguous about what is possible and what is not.

RLUK recently commissioned work to scope the parameters of making RLUK data openly available and the results of that work should make helpful reading even if the focus is just on material in the RLUK database.

It certainly would be of considerable benefit to the HE community as a whole if national bodies including JISC, SCONUL and RLUK could accept responsibility for initiating discussions with third party suppliers of records, with a view to negotiating the removal of all restrictions on making metadata openly available. Such an approach would remove the need for individual libraries to investigate their specific local circumstances and would be of enormous potential benefit to the user community.

SUNCAT Open Data – SRU

As part of the open data strand of the SUNCAT bit of Discover EDINA, we have made available the individual library records that we have agreement to release. At the time of writing these are:

National Library of Scotland, Glasgow University, British Library, Bristol University, Society of Antiquaries of London, Nottingham University.

In order to make these records available, we’ve opted for an SRU target, which is RESTful. In the first instance we’re expecting users to use the SPARQL interface to run searches (see the other post) and work with the linked part of the data in the RDF incarnation of the records, then use the SUNCAT ID to link through to the SRU target to extract the full MARC record (in most cases) should that be needed.

Since the target is a full-blown SRU server there are actually a plethora of indexes built over the MARC-XML records, but the one we anticipate being used most is the SUNCAT ID index. However, users are welcome to use the other indexes, which are detailed below.

In the first instance, the DiscoverEDINA SUNCAT SRU target can be found at

http://suncatdev.edina.ac.uk:31001/de_suncat

[EDIT 2014-05-13. The above URL should work but it is now preferred to use

http://m2m.edina.ac.uk/sru/de_suncat ]

so in order to get the MARC-XML format of a record with SUNCAT ID of “SC00374927310” you should send a CQL query of sc.id=SC00374927310, which goes into an SRU searchRetrieve request as:

http://suncatdev.edina.ac.uk:31001/de_suncat?operation=searchRetrieve&version=1.1&startRecord=1&maximumRecords=1&query=sc.id%3DSC00374927310

Remember that the number of records released under the Open Data umbrella is limited, so you won’t find every SUNCAT ID here, but you will find every one that’s in the SPARQL endpoint.

The response will be a bunch of XML forming an SRU response, and it may contain records (about the same item) from multiple libraries. These records can be found at the XPath zs:searchRetrieveResponse/zs:records/zs:record/zs:recordData. The number of records found is always sent in the zs:searchRetrieveResponse/zs:numberOfRecords element, and you can specify which and how many records to retrieve by varying the startRecord and maximumRecords parameters in the HTTP query string.
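
As a small sketch of working with that response in Python (standard library only), the following pulls out numberOfRecords and the record payloads; it assumes the usual SRU/SRW namespace URI and uses the preferred m2m base URL with the example SUNCAT ID from this post.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Sketch only: run the example searchRetrieve request and walk the response.
ZS = "{http://www.loc.gov/zing/srw/}"               # usual SRU/SRW response namespace

base = "http://m2m.edina.ac.uk/sru/de_suncat"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "startRecord": "1",
    "maximumRecords": "10",
    "query": "sc.id=SC00374927310",
}
with urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)) as resp:
    tree = ET.parse(resp)

total = tree.find(ZS + "numberOfRecords")
print("records found:", total.text if total is not None else "unknown")

for record_data in tree.iter(ZS + "recordData"):
    # recordData wraps the MARC-XML (or RDF/XML) payload for one library's record
    for payload in record_data:
        print(ET.tostring(payload, encoding="unicode")[:200], "...")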

By default, records will be returned in MARC-XML, with the exception of British Library records, which (due to licensing issues) will always be returned in the RDF transformed version of the record.

Okay, so that’s the basics of grabbing a full MARC-XML record with a SUNCAT ID.  Now for the fun stuff (I’m using ‘fun’ in quite a broad sense of the word).

You can grab a (non-BL) record in five (yes, five) different XML schemata!  To do so, just append the parameter recordSchema=X where X is one of marc (the default), rdf, mods, mads or dc. This transforms the MARC-XML into one of the other formats using an XSLT transform. The rdf one was created in our previous project, and the mods, mads and dc ones come from Index Data’s Zebra software (freely available from http://www.indexdata.com/zebra). These are relatively simple but might be useful.
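
For completeness, here is a self-contained sketch of the same request asking for the RDF serialisation; swap “rdf” for “mods”, “mads” or “dc” to try the other transforms.

import urllib.parse
import urllib.request

# Sketch only: request the example record in a non-default record schema.
base = "http://m2m.edina.ac.uk/sru/de_suncat"
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "maximumRecords": "1",
    "query": "sc.id=SC00374927310",
    "recordSchema": "rdf",                 # or "mods", "mads", "dc", "marc"
}
print(urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)).read()[:500])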

Even more fun: obviously we’re making the records search-and-retrievable on the SUNCAT ID, since the perceived workflow is to use SPARQL to query the SPARQL endpoint, obtain the links in the RDF records (including a SUNCAT ID), then use that SUNCAT ID to obtain the full records of anything you’re interested in from the SRU server. However, since this is a full-blown SRU server, we’ve actually got a full set of indexes, and you can use any valid CQL query combining the lot of them!

These indexes are designed to be as close as possible to the existing SUNCAT service Z39.50 target indexes. In the SRU server some are prefixed with the “bib1” namespace and the rest with the “sc” namespace. Here is a table of the bib1 indexes and their equivalent Z39.50 BIB-1 index:

bib1.date/time-last-modified = Date/time-last-modified
bib1.lc-card-number = LC-card-number
bib1.isbn = ISBN
bib1.number-music-publisher = Number-music-publisher
bib1.name = Name
bib1.author = Author
bib1.author-name-personal = Author-name-personal
bib1.dewey-classification = Dewey-classification
bib1.issn = ISSN
bib1.lc-call-number = LC-call-number
bib1.nlm-call-number = NLM-call-number
bib1.place-publication = Place-publication
bib1.publisher = Publisher
bib1.title-series = Title-series
bib1.identifier-standard = Identifier-standard
bib1.subject-heading = Subject-heading
bib1.number-govt-pub = Number-govt-pub
bib1.title = Title
bib1.any = Any
bib1.server-choice = Server-choice
bib1.date = Date
bib1.date-of-publication = Date-of-publication
bib1.title-uniform = Title-uniform
bib1.code-institution = Code-institution
bib1.note = Note
bib1.code-language = Code-language
bib1.code-geographic = Code-geographic

The sc indexes map to their equivalent SUNCAT service indexes; they are not well documented here, and some duplicate the bib1 indexes, but you’re free to play!  Almost certainly the two most useful are the SUNCAT ID index, SC_ID, and the contributing library code index, SC_WIS (see the sketch after the list of codes below for an example of combining indexes).  The values for SC_WIS can be:

StEdNL (National Library of Scotland)
StGlU (Glasgow University)
Uk (British Library)
UkBrU-I (Bristol University)
UkLSAL (Society of Antiquaries of London)
UkNtU (Nottingham University)
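
As a rough sketch of combining indexes in a single CQL query, the following restricts a title search to National Library of Scotland records; the title term is just an illustrative placeholder borrowed from the SPARQL sample query.

import urllib.parse
import urllib.request

# Sketch only: combine the contributing library index with a title index in one CQL query.
base = "http://m2m.edina.ac.uk/sru/de_suncat"
cql = 'sc.wis=StEdNL and bib1.title="archaeological reports"'
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "maximumRecords": "5",
    "query": cql,
}
print(urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)).read()[:500])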

Here are all the other sc indexes:

sc.id = SC_ID
sc.005 = SC_005
sc.010 = SC_010
sc.020 = SC_020
sc.022 = SC_022
sc.028 = SC_028
sc.035 = SC_035
sc.049 = SC_049
sc.aut = SC_AUT
sc.awt = SC_AWT
sc.ddc = SC_DDC
sc.gvd = SC_GVD
sc.ismn = SC_ISMN
sc.issn = SC_ISSN
sc.lcc = SC_LCC
sc.nlm = SC_NLM
sc.pla = SC_PLA
sc.pub = SC_PUB
sc.sbd = SC_SBD
sc.sgn = SC_SGN
sc.sici = SC_SICI
sc.sid = SC_SID
sc.srs = SC_SRS
sc.ssn = SC_SSN
sc.stidn = SC_STIDN
sc.stmd = SC_STMD
sc.sub = SC_SUB
sc.sud = SC_SUD
sc.sul = SC_SUL
sc.sum = SC_SUM
sc.tit = SC_TIT
sc.ttl = SC_TTL
sc.wrd = SC_WRD
sc.wyr = SC_WYR
sc.wti = SC_WTI
sc.wau = SC_WAU
sc.wut = SC_WUT
sc.wur = SC_WUR
sc.wnc = SC_WNC
sc.wfm = SC_WFM
sc.wtp = SC_WTP
sc.wgo = SC_WGO
sc.wct = SC_WCT
sc.wid = SC_WID
sc.wsd = SC_WSD
sc.ntl = SC_NTL
sc.wis = SC_WIS
sc.wst = SC_WST
sc.wuc = SC_WUC
sc.wucx = SC_WUCX
sc.wuco = SC_WUCO
sc.wno = SC_WNO
sc.wln = SC_WLN
sc.wpu = SC_WPU
sc.wpl = SC_WPL
sc.wsrs1 = SC_WSRS1
sc.wsrs2 = SC_WSRS2
sc.wga = SC_WGA
sc.wsu = SC_WSU
sc.wsm = SC_WSM

SUNCAT open data

First problem: getting permission from contributing libraries to allow their data to be re-distributed.  Fortunately for me that’s not my problem, and some sterling work from other members of the team has allowed some data to be released without strings.

Libraries who allow some of their data out into the wild usually have a stipulation that it can be any record they’ve contributed that doesn’t originate from such-and-such source, or has been created by them, or similar.

In practice, this means using records from particular libraries that have a particular library code in 040$a or don’t have a particular code in 035$a. These types of rules could be applied automatically at a live filtering stage, but in order to be utterly sure nothing untoward is being released we have chosen to extract those records and build a separate database from them alone.
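
To make the filtering rule concrete, here is a small sketch using the pymarc library (an assumption on our part, not necessarily the tooling actually used) that keeps a record only if 040 $a carries the wanted code and no 035 $a carries an excluded prefix; the codes, prefixes and filename are placeholders based on the Nottingham example earlier on this page.

from pymarc import parse_xml_to_array

# Sketch only: keep records whose 040 $a has the wanted code and whose 035 $a
# does not identify a different source (cf. the Nottingham rule above).
WANTED_040A = "UkNtU"                                  # placeholder library code
EXCLUDED_035A = ("(OCoLC)", "(SFX)", "(CONSER)")       # placeholder exclusions

def releasable(record):
    codes = [sf for f in record.get_fields("040") for sf in f.get_subfields("a")]
    if WANTED_040A not in codes:
        return False
    for f in record.get_fields("035"):
        for sf in f.get_subfields("a"):
            if sf.startswith(EXCLUDED_035A):
                return False
    return True

records = parse_xml_to_array("contributed-records.xml")   # placeholder filename
open_records = [r for r in records if releasable(r)]
print(len(open_records), "of", len(records), "records can be released")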

So, once you get past the problem of libraries allowing their data to be distributed freely (which we haven’t ;) ) you then need to allow clients to usefully connect and retrieve the data.  Two approaches are being taken for this.

The first is to produce an SRU target onto the database of (permitted) records.  We have a lot of experience with Index Data’s open source Zebra product, which is a database and Z39.50/SRU front end all in one.  It can be quite fiddly to configure (which is where the experience comes in handy!) but its performance (speed and reliability) is excellent.  It also allows multiple output formats for the records using XSLT.

One of the most useful outcomes from the Linked Data Focus project was an XSLT produced by Will Waites that converts MARC-XML into RDF/XML.  We can use this as one of the outputs from the SRU target, alongside MARC-XML (although some libraries require that their records not be released in MARC-XML, in which case the XSLT just blanks those records when they are requested in MARC-XML) and a rudimentary MODS transformation; a JSON transformation might be a possibility too.
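
For anyone curious what applying such a transform looks like outside the SRU server, here is a minimal sketch using lxml; the stylesheet and record filenames are placeholders, not the actual SUNCAT files.

from lxml import etree

# Sketch only: apply a MARC-XML to RDF/XML stylesheet to a single record.
transform = etree.XSLT(etree.parse("marcxml-to-rdfxml.xsl"))   # placeholder stylesheet
marc_doc = etree.parse("record.marcxml")                       # placeholder record file

rdf_doc = transform(marc_doc)
print(str(rdf_doc))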

Perhaps more usefully for the RDF/XML data, the second approach is to feed these into a SPARQL endpoint.  This should allow anyone interested in the linked data RDF to query in a language more familiar to the linked data world.

We’ll be providing more information on how to connect to the SRU target and the SPARQL endpoint once we’ve polished them up a bit for you.

 

Licensing SUNCAT serials’ records

The reasons for making bibliographic metadata openly available have been well put by JISC in the Open Bibliographic Data Guide and by the Open Knowledge Foundation, but whilst many librarians are keen to support making their institutional library metadata available, there are issues to be resolved. There can be copyright and contractual issues over records in library OPACs which inhibit the release of records. The records in many OPACs will have been obtained from one or more third party organisations (e.g. OCLC, British Library, Ex Libris, Serials Solutions) and, even though the records received from these third parties will often have been modified, perhaps quite extensively, there are understandable concerns expressed about the possible repercussions of making them available under an open licence.

SUNCAT is an aggregation of serials metadata from (currently) 86 libraries (referred to as Contributing Libraries (CLs)). Whilst much of the metadata will have been created by local library staff and will, therefore, be ‘owned’ by the library, some of it will have been purchased from a third party supplier. The metadata is essentially supplied to EDINA on the basis of goodwill and a common understanding about how the data is used and made available. EDINA reached agreement with third party record suppliers that records in MARC21 format could be made available for downloading, but only to staff in CLs.

In the initial project SUNCAT: exploring open metadata (funded under the JISC Capital funded RDTF participation) the decision was taken to adopt an ‘opt in’ approach and, accordingly, an invitation was sent to all the CLs inviting them to participate in making their SUNCAT contributed data openly available under an Open Data Commons Public Domain Dedication and Licence with reference to the ODC Attribution Share Alike Community Norms. Considerable interest was expressed by CLs in becoming involved but concerns, particularly to do with making third party records available, were raised. A number of librarians said that it would be a good idea if JISC/EDINA could come to an agreement with organisations such as OCLC and RLUK rather than individual libraries needing to approach them; this is an idea certainly worth pursuing.

Licences have now been signed by three organisations. They are the British Library (BL), the National Library of Scotland (NLS) and the Society of Antiquaries; discussions are well advanced with a number of additional organisations. After discussion with BL staff, it was agreed that it would be preferable to add an Appendix to an existing contract between EDINA and the BL, and this has been done. All the data supplied to EDINA by the BL can be made openly available, provided records are not made available in either MARC21 or MARCXML formats. In the case of the National Library of Scotland permission has been given to make all the fields available of all records which have been created by NLS (identified by the presence of ‘StEdNL’ in the 040$a field) or to make title, ISSN and holdings information available for the whole of the contribution to SUNCAT. The Society of Antiquaries has placed no restrictions on the use of their contributed records.

Glasgow University has asked for records from a third party supplier to be excluded from the records made available for open usage and this will be done.

Work is now being carried out to make the records from the initial three organisations freely available on the basis described in the licences and as other licences are signed by additional organisations more data will be published for open usage.

Supporting Discovery open metadata principles

What we have achieved so far:

SUNCAT received funding from the JISC Discovery Programme Phase 1, from February 1st to July 31st 2011,  to explore what might be done to extend access to the metadata from the contributing libraries already aggregated by EDINA for the SUNCAT service. This included:

  • establishing use cases
  • exploring metadata licensing issues
  • determining what metadata to make available
  • mechanisms for providing access to the metadata

Much was achieved during the 6-month project to extend access to the catalogue, including holdings information. The diagrams below describe the data in the scope of the project.

There was agreement from three libraries to use some of their data for open access during the short lifetime of the project. These were: the British Library, the National Library of Scotland and the Society of Antiquaries.

In the process of working on representing data aggregated by SUNCAT from various libraries across the UK in RDF, we found that existing vocabularies for describing bibliographic data are generally missing constructs for dealing with holdings statements. As SUNCAT primarily contains information relating to holdings of journals in the contributing libraries, the primary value of this information is clearly in the holdings statements. Since we have chosen a relatively flat model of catalogue records, as is natural for the MARC21 source data and appropriate with the Bibliographic Ontology, there is no obvious way to express this information, which might normally go at the Item level were we to use a more elaborate model like FRBR-RDF.

More on how we defined a Library’s holdings can be found here.

What we are going to do:

We are taking some of these strands forward in this project to further enhance the SUNCAT service:

  • Continue to increase the number of Contributing Libraries involved in the SUNCAT open metadata initiative
  • Implement a filtering mechanism to cater for different data being included in a particular format
  • Where the Contributing Libraries give agreement for release, implement an ‘on the fly’ filtering mechanism for their data
  • Explore provision of other record formats to support use within other applications (e.g. MODS, a simple DC) where use cases were identified
  • Explore further the status of RDF triples (regarding copyright and database rights) that have been derived from data that were part of a database provided by a third party.

Episode 2: SUNCAT library information

Questions about use cases were not really answered during the sprint, but we decided to gather together the information SUNCAT holds about contributing institutions in a linked data format; some more work is being done on use cases for SUNCAT linked and/or open data in the SUNCAT UK Discovery project.

SUNCAT uses the MARC organisational code for libraries when it is available. I was introduced to the work of Adrian Pohl and Felix Ostrowski from hbz in Germany, who have created an international directory of libraries and related organisations covering the US codes from the Library of Congress and the German organisation codes. The information for the UK libraries is in a PDF (http://www.bl.uk/bibliographic/pdfs/marc_codes.pdf) at the moment, but it might be possible to collect the data from this format. Felix and Adrian presented their idea of adding RDFa to webpages containing information about libraries at ELAG2011 (“Your Website is your API – How to integrate your Library into the Web of data using RDFa”), and a representative from OCLC who attended the presentation immediately started implementing this in the WorldCat registry.
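
As a very rough sketch of how the codes might be scraped from that PDF, the following uses pdfminer.six to extract the text and print the non-empty lines; how cleanly the code/library-name pairs survive text extraction is untested, so treat this purely as a starting point.

from pdfminer.high_level import extract_text

# Sketch only: dump the text of the BL MARC codes PDF so the code/name pairs
# could be picked out; the file is assumed to have been downloaded locally.
text = extract_text("marc_codes.pdf")
for line in text.splitlines():
    if line.strip():
        print(line)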

The Talis Platform hosting and consultancy blog posts “Linking and Cleaning Data” were a very useful illustration of the use of org:Organization, org:hasSite, and v:VCard for specifying the links between an organisation and its sites and the site addresses.

An organisation ontology was used to describe SUNCAT contributing libraries. There was discussion about whether a “library” should be modelled to represent a single library in one building or be an umbrella term for all an institution’s libraries.

I found the examples on “Howto – Describing libraries, their collections and services in RDF” on the hbz Semantic web wiki very helpful.

Vocabularies used:

The RDF Vocabulary (RDF):
http://www.w3.org/1999/02/22-rdf-syntax-ns#

The RDF Schema vocabulary (RDFS):
http://www.w3.org/2000/01/rdf-schema#

Friend of a Friend (FOAF):
http://xmlns.com/foaf/0.1/

DCMI Metadata Terms (DCT):
http://purl.org/dc/terms/

An Ontology for vCards (V) for representing address and contact information:
http://www.w3.org/2006/vcard/ns#

WGS84 Geo Positioning (GEO):
http://www.w3.org/2003/01/geo/wgs84_pos#

XML Schema (XSD):
http://www.w3.org/2001/XMLSchema#

OWL
http://www.w3.org/2002/07/owl#

SKOS
http://www.w3.org/2004/02/skos/core#

Ordnance Survey Postcode Ontology
http://data.ordnancesurvey.co.uk/ontology/postcode/
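
To show how a few of these vocabularies fit together, here is an illustrative rdflib sketch describing a made-up contributing library with its site and location; the organisation ontology namespace is assumed to be the W3C one (http://www.w3.org/ns/org#), and all URIs and details are placeholders rather than real SUNCAT data.

from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import FOAF

# Sketch only: an invented library described with org, vCard, GEO and FOAF terms.
ORG = Namespace("http://www.w3.org/ns/org#")            # assumed organisation ontology
VCARD = Namespace("http://www.w3.org/2006/vcard/ns#")
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

g = Graph()
for prefix, ns in (("org", ORG), ("v", VCARD), ("geo", GEO), ("foaf", FOAF)):
    g.bind(prefix, ns)

lib = URIRef("http://example.org/library/example-library")        # placeholder URIs
site = URIRef("http://example.org/library/example-library/site")

g.add((lib, RDF.type, ORG.Organization))
g.add((lib, FOAF.name, Literal("Example University Library")))
g.add((lib, ORG.hasSite, site))
g.add((site, RDF.type, ORG.Site))
g.add((site, VCARD["street-address"], Literal("1 Example Street")))
g.add((site, VCARD.locality, Literal("Edinburgh")))
g.add((site, GEO.lat, Literal("55.95")))
g.add((site, GEO.long, Literal("-3.19")))

print(g.serialize(format="turtle"))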

The rdf:about RDF/Turtle validator and Converter was useful for checking Turtle files.

There is a JISC MU list of organisations which I used to enrich the SUNCAT institution data with JISC MU organisation identifiers, by querying the SPARQL endpoint for JISC MU institutions using the Perl CPAN module RDF::Query::Client.

Transforming the SUNCAT institution data into linked data has helped SUNCAT clean our data. The linked data can be used as an internal source of data for various SUNCAT configuration files, web pages, and contact information.

Syntax highlighting in vim for .ttl files

Fed up with plain black on white text when editing in vim?  I was, but not now!

Did you know Turtle syntax is really a subset of a broader syntax called Notation3?  I didn’t, but I do now! (proof at http://www.w3.org/DesignIssues/diagrams/n3/venn ).

There’s an N3 syntax highlighter for vim at http://www.vim.org/scripts/script.php?script_id=944 .

Just download the v1.1 file and put it in ~/.vim/syntax (you might need to make that directory) and then add the following to ~/.vim/filetype.vim

" RDF Notation 3 Syntax
augroup filetypedetect
au BufNewFile,BufRead *.n3  setfiletype n3
au BufNewFile,BufRead *.ttl  setfiletype n3
augroup END

 

(yes, including that spurious-looking quotation mark at the beginning: it marks a comment line in a vim filetype file).

TA-DA!