About Ian Stuart

Code Gorilla, spanner monkey, fettler of grubby things.

From Project to Product

The OA-RJ aspect of the Linked Data Focus group has now finished… and it finished with a real, honest-to-goodness product.

First, the OA-RJ project has finished; however, its two aspects have continued as distinct & separate services: Organisation & Repository Identification (ORI) and Repository Junction Broker (RJB).

ORI is the discovery service, and it is the service that the OA-RJ Linked Data work related to. The LDFocus work developed the idea of Linked Data as a viable product for the discovery service.

In terms of Linked Data, ORI provides:

We are still discovering new ways to improve what we provide to consumers of the data, and would be delighted to hear any suggestions (contact us via the UK RepositoryNet+ Helpdesk contact form or send email direct to: support@repositorynet.ac.uk).

Repository Fringe welcomes its big brother

After four years of successful Repository Fringe events, the same great team have grabbed The Big One: The International Open Repositories Conference!

Obviously it would be madness to try to put on an International Conference during the Edinburgh Festival and Edinburgh Festival Fringe, so Open Repositories will be in early July, and Repository Fringe will be there too!

From Monday 9th until Friday 13th, we will be running as a strand within the main conference, exploring the theme of Open Services for Open Content: Local In for Global Out.

 


Possible future work based on OA-RJ Linked data

There is an agreed need for an international collection of identifiable organisations, whether collated from national or local levels, or globally assigned.

There are several such lists available: the UK has a list of educational establishments at data.gov.uk, it is believed that the US has a similar one at data.gov. OA-RJ has an international list of institutions derived from repositories and the organisations that run them.

There are issues that need to be addressed, however:

  • Anything that is UK-focused is, frankly, a waste of time. Even on a global scale, having 196[1] countries each post their own list, with no consolidated reference between them, gives rise to two main problems:
    1. Finding the lists becomes a problem, and people need to know where each list is
    2. There are a significant number of organisations that are not geographically restricted to a single country (or, indeed, geographically located at all!)
  • The history of organisations needs to be tracked: They are born, merge, split, rename, and even die.
  • They have complex parent/child relationships (there are research centres, funded by NERC, completely housed within larger organisations).
  • They have (multiple?) geo-spatial locations.
  • There need to be some real-world examples of use for the data.

There have been a number of previous JISC (and other, overseas) projects in this area, which should be pulled together & combined into a greater whole.

[1] 196 countries is the 195 independent [sovereign] states recognised by the US State Department, plus Taiwan. This ignores the very real situation where England produces its own list, with the expectation that Scotland, Ireland & Wales will produce similar lists (so that’s 199). Add in the expectation that the US will produce lists at (possibly) State level, and you’re up to 250 lists. Add in larger countries like Russia or China devolving lists down, and the problems of isolation and “un-discoverability” get even worse! “Divide and Conquer” is definitely the way forward… but not by secession and independent action – that would [in my view] be ignored by the larger community.

Resolvable IRIs

One of the tenets of linked data is that IRIs should be resolvable (it’s the 4th or 5th star, depending on which notation you are looking at).

There are two approaches to doing this:

  1. Create a server specifically to handle the linked data
    eg: http://opendata.opendepot.org/organisation/EDINA
  2. Create a resolver underneath an existing server
    eg: http://opendepot.org/opendata/organisation/EDINA

The main consideration is probably how many data sets you are resolving, and what association you want to promote. For example, the University of Southampton are exposing all their data at the University level – so having a central resolver (http://opendata.southampton.ac.uk) makes sense for them.

For OARJ, I can use the OpenDepot.org association, so it was easier for me to create a resolver within the opendepot.org server; OARJ IRIs become something like http://opendepot.org/opendata/organisation/EDINA

The resolver script is http://opendepot.org/opendata/ and the standard Apache environment variable ‘PATH_INFO’ contains the rest of the IRI (e.g. /organisation/EDINA).

The code for the resolver is remarkably simple:

  use XML::LibXML;

  ## define $host
  ## get the full RDF document from the server: $dom
  ## get an XML Document that contains the RDF root element (complete with namespaces): $rdf

  # Get the <rdf:RDF> root element from $rdf
  my $child = $rdf->documentElement;

  # for all XPath stuff, we need to define the namespace
  my $xpc = XML::LibXML::XPathContext->new;
  $xpc->registerNs('rdf',
                   'http://www.w3.org/1999/02/22-rdf-syntax-ns#');

  # We need the general "about" node with output
  my $iri = "$host/reference/linked/1.0/oarj_ontology.rdf";

  # XPath queries go through the XML::LibXML::XPathContext object,
  # evaluated against the full document in $dom
  my @nodes = $xpc->findnodes("/rdf:RDF/rdf:Description[\@rdf:about=\"$iri\"]", $dom);
  $child->appendChild($nodes[0]) if @nodes;

  # and now find the specific rdf:Description we want
  my $class = get_first_pathinfo_item();  # will be "organisation" or "network" or....
  my $t     = get_second_pathinfo_item(); # will be the name of the record **IRI encoded!**

  $iri = "$host/opendata/$class/$t";

  @nodes = $xpc->findnodes("/rdf:RDF/rdf:Description[\@rdf:about=\"$iri\"]", $dom);
  $child->appendChild($nodes[0]) if @nodes;

  # output the trimmed-down document we have built, not the full $dom
  print $rdf->toString;

Obviously there are wrappers around that, but it’s a good basis.
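
The resolver leaves get_first_pathinfo_item and get_second_pathinfo_item to those wrappers. As a minimal sketch, assuming PATH_INFO always looks like /class/name (as in /organisation/EDINA), a single hypothetical split_path_info sub could do both jobs. It keeps the name IRI-encoded, on the assumption that the rdf:about values are stored that way, and it refuses anything containing a double quote, because both parts end up inside the XPath string literal "$iri".

```perl
use strict;
use warnings;

# Hypothetical helper (not the author's get_*_pathinfo_item subs):
# split a PATH_INFO value such as "/organisation/EDINA" into
# (class, record name). Returns an empty list if it does not fit.
sub split_path_info {
    my ($path_info) = @_;
    return () unless defined $path_info;

    # The leading "/" yields an empty first field; the limit of 3
    # keeps any further "/" inside the record name.
    my (undef, $class, $name) = split m{/}, $path_info, 3;
    return () unless defined $name && length $class && length $name;

    # Both parts are interpolated into an XPath "..." literal later,
    # so reject anything that could close that literal early.
    return () if "$class$name" =~ /"/;

    # The name is left IRI (percent) encoded, so "$host/opendata/$class/$name"
    # rebuilds the IRI in the same form as rdf:about (an assumption).
    return ($class, $name);
}

# Under Apache, PATH_INFO holds everything after the script's own path.
my ($class, $name) = split_path_info($ENV{PATH_INFO});
print "class=$class, name=$name\n" if defined $name;
```

Returning an empty list for anything unexpected gives the wrapper one place to decide what a "no such record" response looks like.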

Creating Turtle and RDF the easy way

One of the great things about Turtle format is that it is dead easy to write (see the blog post below on how easy it was.)
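
To illustrate the point, here is a minimal sketch of building Turtle text in Perl for one organisation. The record values are illustrative, the org_to_turtle and turtle_string helpers are hypothetical, and the FOAF terms (foaf:Organization, foaf:name, foaf:homepage) are an assumed vocabulary, not necessarily what ORI publishes. The subject IRI follows the http://opendepot.org/opendata/organisation/EDINA pattern used by the resolver.

```perl
use strict;
use warnings;

# Illustrative record only; real values would come from the ORI database.
# Assumes 'id' is already IRI-safe (see the resolver's "IRI encoded" note).
my %org = (
    id   => 'EDINA',
    name => 'EDINA',
    home => 'http://edina.ac.uk/',
);

# Escape the characters that cannot appear raw inside a Turtle "..." literal.
sub turtle_string {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;   # backslash first, so later escapes are not doubled
    $s =~ s/"/\\"/g;
    $s =~ s/\n/\\n/g;
    $s =~ s/\r/\\r/g;
    return qq{"$s"};
}

# Hypothetical helper: one organisation as a small Turtle document,
# using FOAF terms (an assumption, not necessarily what ORI publishes).
sub org_to_turtle {
    my ($host, $o) = @_;
    my $iri = "$host/opendata/organisation/$o->{id}";
    return join '',
        "\@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n\n",
        "<$iri> a foaf:Organization ;\n",
        "    foaf:name ", turtle_string($o->{name}), " ;\n",
        "    foaf:homepage <$o->{home}> .\n";
}

my $t = org_to_turtle('http://opendepot.org', \%org);
print $t;
```

It is little more than string concatenation; the only real care needed is escaping literal values.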

One of the great things about RDF format is that it is a well-known format, rooted in XML, and very easily parsed… but it is not fun to create.

What is needed is an easy way to create RDF from Turtle… and there is – Any23.org

(I’m a Perl-man, so my example code is in Perl – YMMV)

  use File::Slurp qw(write_file);
  use LWP::UserAgent;

  ## create turtle text as before, in $t
  ## ($datadir, $host, $path, $turtle_filename and $rdf_filename are set elsewhere)

  # Write the Turtle file into web-server space, so any23.org can fetch it
  write_file("$datadir/$turtle_filename", $t);
  print "turtle written\n";

  # ping the whole thing off to any23.org to transform into RDF/XML
  my $ua    = LWP::UserAgent->new();
  my $query = "http://any23.org/?format=rdfxml&uri=http://$host/$path/$turtle_filename";
  my $res   = $ua->get($query);
  if ($res->is_success) {
    write_file("$datadir/$rdf_filename", $res->content);
    print "rdf written\n";
  }
  else {
    print $res->status_line, "\n";
  }

Et voila!
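
One refinement worth considering: the script pastes the Turtle file's URL straight into the any23.org query string, so a space, '&', '?' or '#' in the path would corrupt the request. A minimal sketch of escaping it first, keeping the same format and uri parameters the script uses; the query_escape and any23_request_url helpers and the example .ttl path are hypothetical.

```perl
use strict;
use warnings;

# Percent-encode every byte outside the RFC 3986 "unreserved" set
# (A-Z a-z 0-9 - . _ ~). Assumes a byte string: encode text as UTF-8 first.
sub query_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: build the any23.org request with the Turtle file's
# URL escaped as a single query-string value.
sub any23_request_url {
    my ($service, $format, $target) = @_;
    return $service . '?format=' . query_escape($format)
         . '&uri=' . query_escape($target);
}

my $url = any23_request_url('http://any23.org/', 'rdfxml',
                            'http://opendepot.org/opendata/edina.ttl');
print "$url\n";
```

The resulting $url can be passed to $ua->get exactly as $query is in the script above.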