Unlock in use

It would be great to hear from people about how they are using the Unlock place search services, so please contact us and tell us how you’re making use of Unlock and what you want out of the service.
[Screenshots from Molly and Georeferencer]
Here are some of the projects and services we’ve heard about that are making interesting use of Unlock in research applications.

The Molly project, based at the University of Oxford, provides an open source mobile location portal service designed for campuses. Molly uses some CloudMade services and employs Unlock for postcode searching.

Georeferencer.org uses Unlock Places to search old maps. The service is used by the National Library of Scotland Map Library and other national libraries in Europe.
More on the use of Unlock Places by georeferencer.org.

CASOS at CMU has been experimenting with the Unlock Text service to geolocate social network information.

The Open Fieldwork project has been georeferencing educational resources: “In exploring how we could dynamically position links to fieldwork OER on a map, based on the location where the fieldwork takes place, one approach might be to resolve a position from the resource description or text in the resource. The OF project tried out the EDINA Unlock service – it looks like it could be very useful.”

We had several interesting entries in 2010’s dev8d developer challenge using Unlock:

Embedded GIS-lite Reporting Widget:
Duncan Davidson, Informatics Ventures, University of Edinburgh
“Adding data tables to content management systems and spreadsheet software packages is a fairly simple process, but statistics are easier to understand when the data is visual. Our widget takes geographic data – in this instance data on Scottish councils – passes it through EDINA’s API and then produces coordinates which are mapped onto Google. The end result is an annotated map which makes the data easier to access.”

Geoprints, which also works with the Yahoo Placemaker API, by
Marcus Ramsden at Southampton University.
“Geoprints is a plugin for EPrints. You can upload a PDF, Word document or PowerPoint file, and it will extract the plain text and send it to the EDINA API. GeoPrints uses the API to pull out the locations from that data and send them to the database. Those locations will then be plotted onto a map, which is a better interface for exploring documents.”

Point data in mashups: moving away from pushpins in maps:
Aidan Slingsby, City University London
“Displaying point data as density estimation services, chi surfaces and ‘tagmaps’. Using British placenames classified by generic form and linguistic origin, accessed through the Unlock Places API.”

The dev8d programme for 2011 is being finalised at the moment and should be published soon; this year’s event runs over two days, and should definitely be worth attending for developers working in, or near, education and research.

Exploring the OS Locator OpenData set

Fiona Hemsley-Flint has had a good look at the OS Locator dataset, which is available from the Ordnance Survey OpenData portal. I thought a summary of her findings might be of use to others thinking about how to use this dataset.

Overview

OS Locator contains a list of all the road names in the UK, “derived from a number of Ordnance Survey datasets [Meridian2, Road database, Locality dataset, Boundary-Line]. These include the roads database which contains information on road names and road numbers and is the latest generation of Ordnance Survey’s sophisticated and highly detailed geographic data”. OS recommend viewing it on top of mid-scale datasets such as the 1:10k and 1:25k Raster products and OS Street View (which is freely available via OS OpenData).

Geometries

Each feature is geo-referenced by a centre point and a bounding box (although some of the bounding boxes collapse to line features where the road segment is horizontal or vertical).
Figure 1. Multiple occurrences of Ferry Road, differentiated by their locality (OS Locator names shown on an OS map).

Attribution

The roads have a name and/or a classification, where the classification represents a road number (e.g. ‘A1’ or ‘B1243’). They also have an associated settlement (town), locality, county/region and local authority; the latter two are derived from Boundary-Line, though it is unclear what is used to form the ‘Locality dataset’. Locality and settlement are likely to be the most useful of these attributes when displaying result sets. For roads which cross locality boundaries, a point is assigned for each separate locality, so one road may have more than one point associated with it, distinguished by its locality.

Storage

851505 rows of data were added to a development server.
Multiple geometry columns have been added to take into account the different geometries available.
A ‘tsvector’ column has also been added to implement Postgres text search functionality. An example query might be:
select name, classification, locality, settlement from os.locator_nov_10 where search @@ to_tsquery('high & street & edinburgh');

Which returns the following result set:

Name	Classification	Locality	Settlement
CORSTORPHINE HIGH STREET		Se Corstorphine	EDINBURGH
HIGH STREET		Musselburgh Central	EDINBURGH
HIGH STREET		Musselburgh North	EDINBURGH
HIGH STREET		Holyrood	EDINBURGH
HIGH STREET	A199	Musselburgh North	EDINBURGH
HIGH STREET	A199	Musselburgh Central	EDINBURGH
NORTH HIGH STREET		Musselburgh North	EDINBURGH
NORTH HIGH STREET	A199	Musselburgh West	EDINBURGH
PORTOBELLO HIGH STREET	B6415	Milton	EDINBURGH
PORTOBELLO HIGH STREET	B6415	Portobello	EDINBURGH
NORTH HIGH STREET	A199	Musselburgh North	EDINBURGH
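
For reference, the text-search column and index might have been set up along these lines. This is a sketch in Python with psycopg2; apart from the os.locator_nov_10 table and the columns shown in the result set, the names and connection details are assumptions:

import psycopg2

conn = psycopg2.connect("dbname=gazetteer")  # placeholder connection details
cur = conn.cursor()

# Add a tsvector column and populate it from the name, locality and
# settlement columns (column names taken from the result set above).
cur.execute("ALTER TABLE os.locator_nov_10 ADD COLUMN search tsvector")
cur.execute("""
    UPDATE os.locator_nov_10
    SET search = to_tsvector('english',
        coalesce(name, '') || ' ' ||
        coalesce(locality, '') || ' ' ||
        coalesce(settlement, ''))
""")

# A GIN index keeps @@ queries quick across ~850,000 rows.
cur.execute("CREATE INDEX locator_search_idx ON os.locator_nov_10 USING gin(search)")
conn.commit()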

Overall, the dataset contains a comprehensive list of the road names within the UK. Decisions will need to be made about how to treat multiple features that actually refer to the same real-world road.
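
A first pass at spotting those duplicates might simply group rows by name and settlement – again a sketch against the same assumed table:

import psycopg2

conn = psycopg2.connect("dbname=gazetteer")  # placeholder connection details
cur = conn.cursor()

# Roads appearing more than once within a settlement are candidates
# for merging into a single real-world road.
cur.execute("""
    SELECT name, settlement, count(*) AS points
    FROM os.locator_nov_10
    GROUP BY name, settlement
    HAVING count(*) > 1
    ORDER BY points DESC
""")
for name, settlement, points in cur.fetchall():
    print(name, settlement, points)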

The main limitation of this dataset is that it can only be used to show the user the general location of a road – it can’t be used as a precise address gazetteer since it only provides street names with no knowledge of building numbers.

Linked Data for places – any advice?

We’d really benefit from advice about what Linked Data namespaces to use to describe places and the relationships between them. We want to re-use as much of others’ work as possible, and use vocabularies which are likely to be well and widely understood.

Here’s a sample of a “vanilla” rendering of a record for a place-name in Cheshire as extracted from the English Place Name Survey – see this as a rough sketch.

<rdf:RDF>
<chalice:Place rdf:about="/place/cheshire/prestbury/bosley/bosley">
<rdfs:isDefinedBy>/doc/cheshire/prestbury/bosley/bosley
</rdfs:isDefinedBy>
<rdfs:label>Bosley</rdfs:label>
<chalice:parish rdf:resource="/place/cheshire/prestbury/bosley"/>
<chalice:parent rdf:resource="/place/cheshire/prestbury/bosley"/>
<chalice:parishname>Bosley</chalice:parishname>
<chalice:level>primary-sub-township</chalice:level>
<georss:point>53.1862392425537 -2.12721741199493</georss:point>
<owl:sameAs rdf:resource="http://data.ordnancesurvey.co.uk/doc/50kGazetteer/28360"/>
</chalice:Place>
</rdf:RDF>

GeoNames

We could re-use as much as we can of the GeoNames ontology. It defines gn:Feature to indicate that a thing is a place, and gn:parentFeature to indicate that one place contains another.
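
As a rough sketch of what that re-use might look like (Python with rdflib; the place URIs here are placeholders for whatever identifiers we mint):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

GN = Namespace("http://www.geonames.org/ontology#")

g = Graph()
g.bind("gn", GN)

bosley = URIRef("http://example.org/place/cheshire/prestbury/bosley/bosley")  # placeholder
prestbury = URIRef("http://example.org/place/cheshire/prestbury")  # placeholder

g.add((bosley, RDF.type, GN.Feature))         # Bosley is a place
g.add((bosley, GN.name, Literal("Bosley")))
g.add((bosley, GN.parentFeature, prestbury))  # Prestbury contains Bosley

print(g.serialize(format="turtle"))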

Ordnance Survey

Ordnance Survey publish some geographic ontologies: there are some within data.ordnancesurvey.co.uk, and there’s some older work, including a vocabulary for mereological (i.e. containment) relations that includes isPartOf and hasPart. But the status of this vocabulary is unclear – is its use still advised?

The Administrative Geography ontology defines a ‘parish’ relation – this is the inverse of how we’re currently using ‘parish’ (i.e. Prestbury contains Bosley). (And our concepts of historic parish and sub-parish are terrifically vague…)

For place-names found in the 1:50K gazetteer the OS use the NamedPlace class – but it feels odd to be re-using a vocabulary explicitly designed for the 50K gazetteer.

Or…

Are there other widespread Linked Data vocabularies for places and their names which we could be re-using? Are there other ways in which we could improve the modelling? Comments and pointers to others’ work would be greatly appreciated.

Reflections on the second Chalice scrum

We had a second two-week Scrum session on code for the Chalice project. This was a follow-up to the first Chalice scrum, during which we made solid progress.

During the second Scrum the team ran into some blocks and progress slowed. The following is quite a soul-searching post, in accordance with the project documentation instructions: “don’t forget to post the FAIL(s) as well: telling people where things went wrong so they don’t repeat mistakes is priceless for a thriving community.”

Our core problem was the relative inflexibility of the relational database backend. We’d chosen to use an RDBMS rather than an RDF triplestore mainly for the benefits of code-reuse and familiarity, as this enabled us to repurpose code from a couple of similar EDINA projects, Unlock and Addressing History.

However, when the time came to revise the model based on updated data extracted from EPNS volumes, this created a chain of dependencies – updates to the data model, then the API, then the prototype visualisation. Progress slowed, and not much changed in the course of the second sprint.

A second problem was the lack of really clearly defined use cases, especially for a visual interface to the Chalice data. Here we have a bit of a chicken-and-egg situation: the work exploring how different archive projects can re-use the Chalice data to enhance their collections is still going on. This is something we’ll put more emphasis on during the latter part of the project.

So on the one hand there’s a need for a working prototype to be able to integrate Chalice data with other resources; and on the other, a need to know how those resources will re-use the Chalice data to inform the prototype.

So what would we do differently if we did it again?

  • More of a design phase before the Scrum proper starts – with time to experiment with different data storage backends
  • More work developing detailed use cases before software development starts
  • More active collaboration between people talking to end users and people developing the backend (made more difficult because the project partners are distributed in space)

Below are some detailed comments from two of the Scrum team members, Ross and Murray.

Ross: I found Scrum useful and efficient – great for noticing both what others are doing and when you’re heading down the wrong path, and for identifying when you need further meetings, as was the case a few times early in the process. The whiteboard idea developed later on was also very useful. I don’t think the bottlenecks were anything to do with the use of Scrum, just with the amount of information and the quality of data we had available to us; maybe this is due partially to the absence of requirements gathering in Scrum.

The data we received had to be reverse engineered to some extent. As well as figuring out what everything in the given format was for (such as regnal dates, alternative names, contained places and their location relative to a parent) and what parts were important to us (such as which of the many date formats we were going to store, i.e. start, end and/or approximations), we also had no direct control over it.

In order for the database, interface and API to work we had to decide on a structure quickly and get data into the database. Learning how to install and operate a triple store (the recommended method), or spending time figuring out how to get Hibernate (a more adaptable database access technology) to work with the decided structure, would have delayed everything, so a trade-off was made: we manually wrote code to parse the data from XML and enter it into a familiar relational database, which caused us more problems later on. One of these was that the data continued to change with every generation; elements being added, removed or completely changed meant changing the parsing, then the domain objects, then the database and lastly the database insertion code.

Lack of use cases: from the start we were developing an app without knowing what it should look like or how it should function. We were unsure what data we should or would need to store, and how much control users of the service would have over the data in the database. We were unsure how to query the database and display API responses so as to best fit the needs of the intended users in an efficient, useful way. We are slightly clearer on this now, but more information on how the product will be used would be greatly helpful.

And as for future development: if we are sticking with the relational database model, I definitely think it’s wise to get rid of all the database reading/writing code in favour of a Hibernate solution. This would be tricky with our database structure, but more adaptable and symmetrical, so that changes to the input method are also made to the output and only one change needs to be made. Some sort of XML-to-POJO relational tool may also be useful to further improve adaptability, although it would make importing new datasets more complex (perhaps requiring XSLT). As well as that, some more specific use cases mentioning inputs and required outputs would be very useful.

Murray: My comment would be that we possibly should have worked on a Hibernate ORM first, before creating the database. As soon as we had natural keys, triggers and stored procedures in the database, it became too cumbersome to reverse engineer them.

If we had created an ORM mapping first we could have automatically generated the db schema from that, rather than the other way round. I presume we could write the searches, even the spatial ones, in Hibernate rather than in stored procedures. Then it would be easier to cope with all the shifts in the XML structure: propagating changes through the tiers would be a case of regenerating the db and domain objects from the mappings, rather than doing it by hand.

The generated domain objects could be reused across the data loading, API and search. The default lazy loading in Hibernate would have been good enough to deal with the hierarchical nature of the data to an indiscriminate depth.

Using source identifiers to link data

In the Chalice project we’ve used Unlock Places to make links across the Linked Data web, using the source identifier which appears in the results of each place search. As this might be useful to others, it’s worth walking through an example.

This search for “Bosley” shows us results in the UK from geonames and from the Ordnance Survey 50K gazetteer: http://unlock.edina.ac.uk/ws/nameSearch?name=Bosley&country=uk

Here’s an extract of one of the results, the listing for Bosley in the Ordnance Survey 1:50K gazetteer:

<identifier>11083412</identifier>
<sourceIdentifier>28360</sourceIdentifier>
<name>Bosley</name>
<country>United Kingdom</country>
<custodian>Ordnance Survey</custodian>
<gazetteer>OS Open 1:50 000 Scale Gazetteer</gazetteer>

The sourceIdentifier shown here is the identifier published by each of the original data sources that Unlock Places is using to cross-search.

Ordnance Survey Research re-uses these identifiers to create its Linked Data namespace. For any place in the 50K gazetteer, we can reconstruct the link that refers to that place by appending the source identifier to this URL, which is the namespace for the 50K gazetteer: http://data.ordnancesurvey.co.uk/id/50kGazetteer/

So our reference to Bosley can be made by adding the source identifier to the namespace:

http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360

The same goes for source identifiers for places found in the geonames.org place-name gazetteer.

<sourceIdentifier>2655141</sourceIdentifier>
<name>Bosley</name>
<gazetteer>GeoNames</gazetteer>

Geonames uses http://sws.geonames.org/ as a namespace for its Linked Data links for places. So we can reconstruct the link for Bosley using the source identifier like this:

http://sws.geonames.org/2655141/

Note that the link needs the forward slash on the end to work correctly. If one looks at either of these links with a web browser, one is redirected to a human-readable page describing that place. To see the machine-readable RDF version of the link’s contents, look at it with a command-line program such as curl, asking to “Accept” the RDF version:

curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/28360 -H "Accept: application/rdf+xml"
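
Putting the pieces together, here is a minimal sketch of the reconstruction in Python. It assumes the element names shown in the extracts above and simply walks the whole XML response looking for them:

import urllib.request
import xml.etree.ElementTree as ET

# Linked Data namespaces for the two sources discussed above.
NAMESPACES = {
    "OS Open 1:50 000 Scale Gazetteer": "http://data.ordnancesurvey.co.uk/id/50kGazetteer/",
    "GeoNames": "http://sws.geonames.org/",
}

url = "http://unlock.edina.ac.uk/ws/nameSearch?name=Bosley&country=uk"
tree = ET.parse(urllib.request.urlopen(url))

for element in tree.getroot().iter():
    gazetteer = element.findtext("gazetteer")
    source_id = element.findtext("sourceIdentifier")
    if gazetteer in NAMESPACES and source_id:
        link = NAMESPACES[gazetteer] + source_id
        if gazetteer == "GeoNames":
            link += "/"  # geonames links need the trailing slash
        print(element.findtext("name"), link)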

I hope this is useful to others. We could add the links directly into the default search results, but many users may not be that interested in seeing RDF links in place-name search results. Thoughts on how we could offer this as a more useful function would be much appreciated.

Linking historic places: looking at Victoria County History

Stuart Dunn mentioned the Victoria County History in his writeup of discussions with the Clergy of the Church of England Database project. Both resources are rich in place-name mentions and historic depth; as part of the Chalice project we’re investigating ways to make such resources more searchable by extracting historic place-names and linking them to our gazetteer.

Here’s a summary of some email conversation between Stuart, Claire Grover, Ross Drew at EDINA and myself while looking at some sample data from VCH.

The idea is to explore how Chalice data could enhance or complement semi-structured information like VCH (or more structured database-like sources such as CCED).

It would be very valuable, I think, to do an analysis of how much effort and preparation of the (target) data is needed to link Chalice to VCH, and to a more structured dataset like CCED. By providing georeferences and toponym links, we’re bringing all that EPNS documentary evidence to VCH, thus enriching it.

It would be very interesting if we were able to show how text-mining techniques could be used to add to the work of EPNS – extracting place references that aren’t listed, and suggesting them to editors along with suggested attestations (source and date).

In the more immediate future, this is about adding links from Chalice place-references to other resources, which would allow us to cross-reference them and search them in interesting ways.

Text mining isn’t absolutely necessary to map the EPNS place names to the VCH text. On the other hand, LTG have all the processing infrastructure to convert formats, tokenise the text and so on, so we could put something in place very quickly. It wouldn’t be perfect but it would demonstrate the point. I’ve not seen the CCED data, so don’t know how complex that would be.

Here’s a sample reference to a volume of VCH that may have some overlap with the Shropshire content we have in “born-digital” form from EPNS. There’s the intriguing prospect of adding historic place-name text mining/search at the digitisation phase, so resources can be linked to other references as soon as they’re published.

More on the use of Unlock Places by georeferencer.org

Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G, where I presented our work on the Chalice project and the Unlock services.
Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


It was great to meet you at FOSS4G in Barcelona and discuss with you
the progress related to Unlock and possible cooperation with
OldMapsOnline.org and usage in Georeferencer.org services.

As you have mentioned, the most important thing for us would be to have the bounding boxes (or bounding polygons) for places in the Unlock API/database, as a direct part of the JSON response. We need that mostly for villages, towns and cities, and for areas such as districts or countries – all over the world. We need something like the “bounds” provided by the Google geocoding API.

The second most important feature is the chance to install the service on our own servers – especially in case you can’t provide guarantees for it in the future.

It would also be great to have the chance to improve the service for non-English languages, but right now gazetteers and text processing are not the primary target of our research.

At the moment the Unlock API is in use:

As a standard gazetteer search service, to zoom the base maps to a place people type in the search box in our Georeferencer.org service – a collaborative online georeferencing service for scanned historical maps. It is in use by the National Library of Scotland and a couple of other libraries.

Here’s an example map (you need to register first).

The uniqueness of Unlock is in the openness of the license (primarily GeoNames.org CC-BY, and also OS OpenData) and also, so far, the very good availability of the online service (EDINA hardware and network?). We are missing the bounding box, which we need to zoom our base maps to the correct area (to determine the appropriate zoom level). The Unlock API replaced the Google Geocoder, which we can’t use because we also display non-Google maps (such as Ordnance Survey OpenData) and we potentially derive data from the gazetteer database (the control points on the old maps), which is against Google’s TOS.

In the future we are keen to extend the gazetteer with alternative historical toponyms (which people can identify on georeferenced old maps too), or to participate in such work.

The other usage of the Unlock API is:

As a metadata text analyzer, in a service such as our http://geoparser.appspot.com/, where we automatically parse existing library textual metadata to identify place names and locate the described maps, including automatic approximation of their spatial coverage (by identifying map scale and physical size in the text and doing simple math on top of it). This service is in a prototype phase only; we are using Yahoo Placemaker, and I was testing the Unlock Text API with it too.

Here the huge advantage of Unlock would be primarily the possibility to add custom gazetteers (with GeoNames as the default one), language detection (for example via the Google Language API or otherwise), and also the possibility to add other tools into the workflow, such as a lemmatizer for a particular language – the simplest available via hunspell/aspell/ispell database integration, or via existing morphological rule-based software.

The problem is that without returning the lemmatization of the text, the geoparser is almost unusable for non-English languages – especially Slavic ones.

We are very glad for the availability of your results and of the reliable online services you provide. We can concentrate on the problems we need to solve primarily (georeferencing, clipping, stitching and presentation of old maps for later analysis) and use your research results as a component that solves a problem we touch on and have to solve somehow in practice.


I’m very glad that Petr wrote at such length about his comprehensive use of Unlock, pushing the edges of what we are doing with the service.

We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth Data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but should also have quite a lot of re-use value for map search.

Structuring a Linked Data namespace for places

Thoughts on structuring a namespace for historic English places, for our prototype Linked Data version of the English Place Name Survey; how do others do it? Our options seem to be:

  1. give each placename a numeric identifier that can be part of the link
  2. create a more human-readable identifier based on the name, to use as part of the link.

Numeric identifiers for places look like common practice. Geonames.org uses numbers to create links for places – so http://sws.geonames.org/2656197/ “is”, or refers to, Baschurch in Shropshire. Though the coordinates of the point may change, the number is associated with the name, and it remains the same.

Ordnance Survey Linked Data also uses a numeric ID to create its link that stands for (the same) Baschurch – http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354.

The Linked Data Patterns online book has a set of patterns for identifier URIs. The patterns are focused on systems that are already database-backed, with some design thought having gone into how IDs look, how they can be looked up, and how their persistence is guaranteed.

The point here is that numeric identifiers still need careful curation – an organisational guarantee that the identifiers will stay the same for the predictable future.

We’re using a relational database (PostGIS) rather than a triplestore to hold the Chalice data (because the data model won’t really change or expand). We can’t just use IDs created automatically by the database when items are inserted into it, because those might change if the names are inserted in a different order.

During Chalice we’re not building a be-all-end-all system, but rather prototyping an approach in which text mining and georeferencing of places can be used to turn an amazing hand-created resource into a 21st-century Linked Data gazetteer, leaving behind open source tools to make sure the process can be repeated with more digitised text.

But we’re not building something to throw away; we want to make sure the links we create can be preserved – that they won’t be broken and won’t change their meanings. So it may be better for us to structure our namespace using the EPNS names themselves, and the order in which they occur in the printed volumes of EPNS.

The EPNS volumes are arranged county-by-county – each county has its own editor, and so may have different layout, style guidelines, level of detail for things like field-names, and the presence or absence of OS Grid coordinates, more or less according to the whims of the county editor. (We’ve focused on Cheshire, but LTG have been developing test parsers for samples of several different counties.)

So it makes sense to include the county name in our namespace. This also helps with disambiguation – which Walton is this Walton? But there will still be cases where several places, in quite different locations, but still within the same county, share a name. In this case, we’d also give the places a numeric identifier (Walton-1, Walton-2) in the order in which they appear in the EPNS text.
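
A hypothetical sketch of that scheme in Python – the slug rules and function name are illustrative only, and the parish nesting shown in the earlier RDF sample is flattened here for brevity:

import re
from collections import Counter

def make_identifiers(county, names):
    """Build identifiers from EPNS place-names, in the order they appear
    in the printed volumes; a name shared by several places in the same
    county gets -1, -2, ... suffixes in order of appearance."""
    counts = Counter(name.lower() for name in names)
    seen = Counter()
    identifiers = []
    for name in names:
        slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
        if counts[name.lower()] > 1:
            seen[name.lower()] += 1
            slug = f"{slug}-{seen[name.lower()]}"
        identifiers.append(f"/place/{county}/{slug}")
    return identifiers

print(make_identifiers("cheshire", ["Walton", "Bosley", "Walton"]))
# ['/place/cheshire/walton-1', '/place/cheshire/bosley', '/place/cheshire/walton-2']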

Some volumes of EPNS give us OS National Grid coordinates for the “major names”, others don’t. Where the “major name” exists in one or more gazetteers (geonames, OS Open Data), the LTG’s georesolver tool can create some of the missing links using the Unlock Places gazetteer cross-search.

There is more potentially useful context in the work of the UK Location Programme on Linked Data namespaces for places – a recent Guide to Linked Data and the UK Location Strategy, and last year’s guidance on Designing URI sets for Location.

One more potential complication is a fairly subtle issue of semantics – does a link identify a place, or a description of a place? Ordnance Survey Research try to make the difference clear by using different namespaces for ‘IDs for places’ and ‘IDs for documents describing places’.
So http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 “is” Baschurch, and http://data.ordnancesurvey.co.uk/doc/50kGazetteer/16354 “is” the description of Baschurch. To make sure we’re properly confused, when a human looks up the /id/ link with a web browser, the browser is redirected to the human-readable /doc/ page. To actually get hold of the Linked Data description of Baschurch (including the coordinates for it in the 50K gazetteer), one has to specifically request the machine-readable, rather than human-readable, version of the link, like this:

curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 -H "Accept: application/rdf+xml"
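
For reference, the same request in Python using the requests library, which sends the Accept header and follows the redirect for us:

import requests

response = requests.get(
    "http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354",
    headers={"Accept": "application/rdf+xml"},  # ask for the machine-readable version
)
print(response.url)   # the /doc/ URL we were redirected to
print(response.text)  # RDF/XML describing Baschurch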

This took me a little while, and some back-and-forth with John Goodwin from OS Research on Twitter, to figure out – which is why I thought it worth writing down here.

Linked Data choices for historic places

We’ve had some fitful conversation on the Chalice mailing list about modelling historic place-names extracted from the English Place Name Survey as Linked Data.
It would be great to get more feedback from others where we have common ground. Here’s a quick summary of the main issues we face and our key points of reference, to start discussion; we can go into more detail on specific points as we work more with the EPNS data.

Re-use, reduce, recycle?

We should be making direct re-use of others’ vocabularies where we can. In some areas this is easy. For example, to represent the containment relations between places (a township contains a parish, a parish contains a sub-parish) we can re-use some of the Ordnance Survey Research work on Linked Data ontologies – specifically their vocabulary for describing “mereological relations”, where “mereological” is a fancy word for “containment relationships”.
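
For instance (Python with rdflib again; the vocabulary namespace below is a stand-in – the real one is whatever OS Research publishes – and the place URIs are placeholders):

from rdflib import Graph, Namespace, URIRef

# Stand-in for the OS mereological relations vocabulary namespace.
MEREOLOGY = Namespace("http://www.ordnancesurvey.co.uk/ontology/mereology/")

g = Graph()
g.bind("mereology", MEREOLOGY)

prestbury = URIRef("http://example.org/place/cheshire/prestbury")  # placeholder
bosley = URIRef("http://example.org/place/cheshire/prestbury/bosley")  # placeholder

g.add((bosley, MEREOLOGY.isPartOf, prestbury))  # the parish contains the sub-parish
g.add((prestbury, MEREOLOGY.hasPart, bosley))

print(g.serialize(format="turtle"))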

Adapting other schemas into a Linked Data model

One project which provides a great example of a more link-oriented, less geometry-oriented approach to describing ancient places is the Pleiades collection of geographic information about the Classical ancient world. Over the years, Pleiades has developed an interesting set of vocabularies with scholars, which don’t take a Linked Data approach but could easily be adapted to do so. They encounter issues of vagueness and uncertainty that geographical information systems concerning the contemporary world can overlook. For example, the Pleiades attestation/confidence vocabulary expresses the certainty of scholars about the conclusions they are drawing from evidence.

So an approach we can take is to build on work done in research partnerships by others, and try to build mind-share about Linked Data representations of existing work. Pleiades also use URIs for places…

Use URIs as names for things

One interesting feature of the English Place Name Survey is the index of sources for each set of volumes. Each different source which documents names (old archives, previous scholarship, historic maps) has an abbreviation, and every time a historic place-name is mentioned, it’s linked to one of the sources.

As well as creating a namespace for historic place-names, we’ll create one for the sources (centred on the five volumes covering Cheshire, which is where the bulk of the work on text correction and data extraction has been done). Generally, if anything has a name, we should be looking to give it a URI.

Date ranges

Is there a rough consensus (based on volume of data published, or the number of different data sources using the same namespace) on which namespace to use to describe dates and date ranges as Linked Data? At one point there were several different versions of the iCal, hCal and xCal vocabularies, all describing more or less the same thing.

We’ve also considered other ways to describe date ranges – talking to Pleiades about mereological relations between dates – and investigating the work of Common Eras on user-contributed tags representing date ranges. It would be hugely valuable to learn about, and converge on, others’ approaches here.

How same is the same?

We propose to mint a namespace for historic place-names documented by the English Place Name Survey. Each distinct place-name gets its own URI.

For some of the “major names”, we’ve been able to use the Language Technology Group’s georesolution tool to make a link between the place-name and the corresponding entry in geonames.org.

Some names can’t be found in geonames, but can be found, via Unlock Places gazetteer search, in some of the Ordnance Survey open data sources. Next week we’ll be looking at using Unlock to make explicit links to the Ordnance Survey Linked Data vocabularies. One interesting side-effect of this is that, via Chalice, we’ll create links between geonames and the OS Linked Data that weren’t there before.

Kate Byrne raised an interesting question on the Chalice mailing list – is the ‘sameAs’ link redundant? For example, if we are confident that Bosley in geonames.org is the same as Bosley in the Cheshire volumes of English Place Name Survey, should we re-use the geonames URI rather than making a ‘sameAs’ link between the two?

How same, in this case, is the same? We may have two, or more, different sets of coordinates which approximately represent the location of Bosley. Is it “correct”, in Linked Data terms, to state that all of these are “the same” when the locations are subtly different?
This is before we even get into the conceptual issues around whether a set of coordinates really has meaning as “the location” of a place. Geonames, in this sense, is a place to start working out towards more expressive descriptions of where a place is, rather than a conclusion.
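
The two options look like this in Python with rdflib (our own URI is a placeholder; the geonames URI is the one from earlier):

from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
epns_bosley = URIRef("http://example.org/place/cheshire/prestbury/bosley/bosley")  # placeholder
geonames_bosley = URIRef("http://sws.geonames.org/2655141/")

# Option 1: mint our own URI for Bosley and assert equivalence.
g.add((epns_bosley, OWL.sameAs, geonames_bosley))

# Option 2: skip the sameAs link and attach our statements about Bosley
# directly to the geonames URI instead of minting our own.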

Long-term preservation

Finally, we want to make sure that any URIs we mint are going to be preserved on a really long time horizon. I discussed this briefly on the Unlock blog last year. University libraries, or cultural heritage memory institutions, may be able to delegate a sub-domain whose long-term persistence we can agree on – but the details of the agreement, and its periodic renewal in the face of infrastructural, organisational and technological change, are a much bigger issue than I think we recognise.