We’ve had some fitful conversation about modelling historic place-names extracted from the English Place Name Survey as Linked Data, on the Chalice mailing list.
It would be great to get more feedback from others where we have common ground. Here’s a quick summary of the main issues we face and our key points of reference, to start discussion, and we can go into more detail on specific points as we work more with the EPNS data.
Re-use, reduce, recycle?
We should be making direct re-use of others’ vocabularies where we can. In some areas this is easy. For example, to represent the containment relations between places (a township contains a parish, a parish contains a sub-parish) we can re-use the some of the Ordnance Survey Research work on linked data ontologies – specifically their vocabulary to describe “Mereological Relations” – where “mereological” is a fancy word for “containment relationships”.
Adapting other schemas into a Linked Data model
One project which provides a great example of a more link-oriented, less geometry-oriented approach to describing ancient places is the Pleaides collection of geographic information about the Classical ancient world. Over the years, Pleaides has developed with scholars an interesting set of vocabularies, which don’t take a Linked Data approach but could be easily adapted to do so. They encounter issues to do with vagueness and uncertainty that geographical information systems concerning the contemporary world, can overlook. For example, the Pleiades attestation/confidence vocabulary expresses the certainty of scholars about the conclusions they are drawing from evidence.
So an approach we can take is to build on work done in research partnerships by others, and try to build mind-share about Linked Data representations of existing work. Pleiades also use URIs for places…
Use URIs as names for things
One interesting feature of the English Place Name Survey is the index of sources for each set of volumes. Each different source which documents names (old archives, previous scholarship, historic maps) has an abbreviation, and every time a historic place-name is mentioned, it’s linked to one of the sources.
As well as creating a namespace for historic place-names, we’ll create one for the sources (centred on the five volumes covering Cheshire, which is where the bulk of work on text correction and data extraction has been done. Generally, if anything has a name, we should be looking to give it a URI.
Is there a rough consensus (based on volume of data published, or number of different data sources using the same namespace) on what namespace to use to describe dates and date ranges as Linked Data? At one point there were several different versions of iCal, hCal, xCal vocabularies all describing more or less the same thing.
We’ve also considered other ways to describe date ranges – talking to Pleiades about mereological relations between dates – and investigating the work of Common Eras on user-contributed tags representing date ranges. It would be hugely valuable to learn about, and converge on, others’ approaches here.
How same is the same?
We propose to mint a namespace for historic place-names documented by the English Place Name Survey. Each distinct place-name gets its own URI.
For some of the “major names”, we’ve been able to use the Language Technology Group’s georesolution tool to make a link between the place-name and the corresponding entry in geonames.org.
Some names can’t be found in geonames, but can be found, via Unlock Places gazetteer search, in some of the Ordnance Survey open data sources. Next week we’ll be looking at using Unlock to make explicit links to the Ordnance Survey Linked Data vocabularies. One interesting side-effect of this is that, via Chalice, we’ll create links between geonames and the OS Linked Data, that weren’t there before.
Kate Byrne raised an interesting question on the Chalice mailing list – is the ‘sameAs’ link redundant? For example, if we are confident that Bosley in geonames.org is the same as Bosley in the Cheshire volumes of English Place Name Survey, should we re-use the geonames URI rather than making a ‘sameAs’ link between the two?
How same, in this case, is the same? We may have two, or more, different sets coordinates which approximately represent the location of Bosley. Is it “correct”, in Linked Data terms, to state that all three are “the same” when the locations are subtly different?
This is before we even get into the conceptual issues around whether a set of coordinates really has meaning as “the location” of a place. Geonames, in this sense, is a place to start working out towards more expressive descriptions of where a place is, rather than a conclusion.
Finally, we want to make sure that any URIs we mint are going to be preserved on a really long time horizon. I discussed this briefly on the Unlock blog last year. University libraries, or cultural heritage memory institutions, may be able to delegate a sub-domain that we can agree to long-term persistence of – but the details of the agreement, and periodic renewal of it due to infrastructural, organisational and technological change, is a much bigger issue than i think we recognise.