Claire showed us some early results from the work of the Language Technology Group, text mining volumes of the English Place Name Survey to extract geographic names and relations between them.
What you see here (or in the full-size visualisations – start with the *display.html files) is the set of names extracted from an entry in EPNS (one town name, and associated names of related or contained places). Note that this is just a display; the data structures are not published here at the moment, and we’ll talk about that next week.
The names are then looked up in the geonames place-name gazetteer, to get a set of likely locations; then the best-match locations are guessed at based on the relations of places in the document.
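As a rough illustration of the "best match" step, here is a minimal sketch. It is not the Language Technology Group's actual method: it assumes the entry's town name already has a known location, and simply picks, for a related name, the gazetteer candidate nearest to that town. The function names and the coordinates are invented for the example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_nearest(anchor, candidates):
    """Return the candidate (lat, lon) closest to the anchor, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: haversine_km(anchor, c))


# Made-up coordinates, for illustration only (not real gazetteer results).
anchor = (52.9, -2.9)  # assumed location of the entry's town name
candidates = [(52.95, -2.85), (51.0, 0.5), (54.2, -1.3)]  # hits for one related name
best = pick_nearest(anchor, candidates)
```

A real disambiguator would weigh more than distance (feature type, the kind of relation stated in the text), but the nearest-to-anchor rule shows the basic idea of using document context to choose between same-named places.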
Looking at one sample, for Ellesmere: five names are found in geonames, five are not. Of the five that are found, only two are located with certainty, i.e. we can tell that the place in EPNS and the place in geonames are the same, and establish a link.
What will help increase the number of matches we can establish is limiting searches by county, using either detailed boundaries or bounding boxes that will definitely contain the county. Contemporary data is now available for free re-use through Unlock Places, which is a place to start.
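The bounding-box version of this filter could look something like the sketch below. It does not call Unlock Places or geonames; the `BBox` type, the `filter_by_bbox` helper, the box and the candidate records are all hypothetical, and stand in for whatever the gazetteer lookup actually returns.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """A latitude/longitude box, in degrees."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def filter_by_bbox(candidates, bbox):
    """Keep only the candidate records whose coordinates fall inside bbox."""
    return [c for c in candidates if bbox.contains(c["lat"], c["lon"])]


# Hypothetical, deliberately generous box around a county (made-up numbers).
county_box = BBox(52.0, -3.5, 53.5, -2.0)
candidates = [
    {"name": "Example A", "lat": 52.9, "lon": -2.9},  # inside the box
    {"name": "Example A", "lat": 51.0, "lon": 0.5},   # same name, elsewhere
]
kept = filter_by_bbox(candidates, county_box)
```

A generous box will let through some places just over the county border, which is the price of being sure the county itself is fully covered; detailed boundaries would need a proper point-in-polygon test instead.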
Note – the later volumes of EPNS do provide OS National Grid coordinates for town names; the earlier ones do not; we’re still not sure when this starts, and will have to check in with EPNS when we all meet there on September 3rd.
How does this fit expectations? We know from past investigations with mixed sets of user-contributed historic place-name data that geonames does well, but typically locates no more than 50% of names. Combining geonames with OS Open Data sources should help a bit.
The main thing I’m looking to find out now is what proportion of the set of all names will be left floating without a georeference, and how many hops or links we’ll have to traverse to connect floating place-names with something that does have a georeference. Also, how important it will be to convey uncertainty about measurements, and what the cost/benefit will be of making interfaces that allow one to annotate and correct the locations of place-names against different historic map data sources.
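One way to count those hops, assuming the extracted relations can be treated as an undirected graph of names, is a breadth-first search from the floating name outwards. The graph, the names in it and the `hops_to_georeference` function are invented for this sketch; the real data structures are not published yet.

```python
from collections import deque


def hops_to_georeference(start, links, georeferenced):
    """Fewest links from start to any georeferenced name; None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, hops = queue.popleft()
        if name in georeferenced:
            return hops
        for neighbour in links.get(name, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Hypothetical relations between names in one entry (adjacency lists).
links = {
    "field": ["farm"],
    "farm": ["field", "town"],
    "town": ["farm"],
    "lost": [],  # a name with no relations to anything else
}
georeferenced = {"town"}  # names that already have coordinates

field_hops = hops_to_georeference("field", links, georeferenced)
lost_hops = hops_to_georeference("lost", links, georeferenced)
```

Running this over every name would give both numbers asked about above: the share of names that never reach a georeference, and the distribution of hop counts for those that do.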
Clearly the further back we go the squashier the data will be; some of the most interesting use cases that CeRch have been talking to people about involve Anglo-Saxon place references. No maps (not a bad thing) but potentially many hops to a “certain” reference. It’s worth thinking about how we can re-use, or turn into RDF namespaces, some of the Pleiades Ancient World GIS work on attestation/confidence of place-names and locations.