Structuring a Linked Data namespace for places

Thoughts on structuring a namespace for historic English places, for our prototype Linked Data version of the English Place Name Survey; how do others do it? Our options seem to be:

  1. give each placename a numeric identifier that can be part of the link
  2. create a more human-readable identifier based on the name, to use as part of the link.

Numeric identifiers for places look like common practise. Geonames.org uses numbers to create links for places – so http://sws.geonames.org/2656197/ “is”, or refers to, Baschurch in Shropshire. Though the coordinates of the point may change, the number is associated with the name, and it remains the same.

Ordnance Survey Linked Data also uses a numeric ID to create its link that stands for (the same) Baschurch – http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354.

The Linked Data Patterns online book has a set of patterns for identifying URIs. The patterns are focused on use with systems that are already database-based, with some design thought having gone into how IDs look, how they can be looked up, and how their persistence is guaranteed.

The point here is that the numeric identifiers still need careful curation – an organisational guarantee that the identifiers will stay the same for the predicatable future.

We’re using a relational database (PostGIS) rather than a triplestore, to hold the Chalice data (because the data model won’t really change or expand). We can’t just use IDs that are created automatically by the database when items are inserted into it, because those might change if the names are inserted in a different order.

During Chalice we’re not building a be-all-end-all system, but rather prototyping an approach to text mining and georeferencing places can be used to turn an amazing hand-created resource into a 21st century Linked Data gazetteer; leaving behind open source tools to make sure the process can be repeated again with more digitised text.

But we’re not building something to throw away; we want to make sure the links we create can be preserved – that they won’t be broken and won’t change their meanings. So it may be better for us to structure our namespace using the EPNS names themselves, and the order in which they occur in the printed volumes of EPNS.

The EPNS volumes are arranged county-by-county – each county has its own editor, and so may have different layout, style guidelines, level of detail for things like field-names, and the presence or absence of OS Grid coordinates, more or less according to the whims of the county editor. (We’ve focused on Cheshire, but LTG have been developing test parsers for samples of several different counties.)

So it makes sense to include the county name in our namespace. This also helps with disambiguation – which Walton is this Walton? But there will still be cases where several places, in quite different locations, but still within the same county, share a name. In this case, we’d also give the places a numeric identifier (Walton-1, Walton-2) in the order in which they appear in the EPNS text.

Some volumes of EPNS give us OS National Grid coordinates for the “major names”, others don’t. Where the “major name” exists in one or more gazetteers (geonames, OS Open Data), the LTG’s georesolver tool can create some of the missing links using the Unlock Places gazetteer cross-search.

More potentially useful context in the work of the UK Location Programme on Linked Data namespaces for places – a recent Guide to Linked Data and the UK Location Strategy, and last year’s guidance on Designing URI sets for Location.

One more potential complication, which is a fairly subtle issue of semantics – does a link identify a place, or a description of a place? Ordnance Survey Research try to make the difference clear by using a different namespace for ‘IDs for places’ and ‘IDs for documents describing places’.
So http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 “is” Baschurch; and http://data.ordnancesurvey.co.uk/doc/50kGazetteer/16354 “is” the description of Baschurch. To make sure we’re properly confused, when a human looks up the /id/ link using a web browser, the browser is redirected to the human-readable /doc/. To actually get hold of the Linked Data description of Baschurch (including the coordinates for it in the 50K gazetteer), one has to specifically request the machine-readable, rather than human-readable, version of the link, like this:

curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 -H "Accept: application/rdf+xml" :) - but now you know that!

This took me a little while, and some back-and-forth with John Goodwin from OS Research on “Twitter”, to figure out, which is why I thought it worth writing down here.

Comments are closed.