More on the use of Unlock Places by georeferencer.org

Some months back, Klokan Petr Pridal, who maintains OldMapsOnline.org and works with libraries and cartographic institutes across Europe, wrote with some questions about the Unlock Places service. We met at FOSS4G where I presented our work on the Chalice project and the Unlock services.
Petr writes about how Unlock is used in his applications, and what future requirements from the service may be:


“It was great to meet you at FOSS4G in Barcelona and discuss with you
the progress related to Unlock and possible cooperation with
OldMapsOnline.org and usage in Georeferencer.org services.

As you have mentioned, the most important thing for us would be to
have in Unlock API/database the bounding boxes (or bounding polygons) for places as direct part of the JSON response.
We need that mostly for villages, towns and cities and for areas such
as districts or countries – all over the world. We need something like
“bounds” as provided by the Google geocoding API.

The second most important feature is to have the chance to install the
service in our servers – especially in case you can’t provide
guarantees for it in a future.

It would be also great to have chance to improve the service for non-English languages, but right now the gazetteers and text processing is not primary target of our research.

In this moment the Unlock API is in use:

As a standard gazetteer search service to zoom the base maps to a place people type in the search box in our Georeferencer.org service – a
collaborative georeferencing online service for scanned historical
maps. It is in use by National Library of Scotland and a couple of other libraries.

Here’s an example map (you need to register first).

The uniqueness of Unlock is in openness of the license (primarily GeoNames.org CC-BY and also OS OpenData) and also so far very good availability of the online service (EDINA hardware and network?). We are missing the bounding box to be able to zoom our base maps to the correct area (determine the appropriate zoom level). Unlock API replaced Google Geocoder, which we can’t use, because we are displaying also non-Google maps (such as Ordnance Survey OpenData) and we are potentially deriving data from the gazetteer database (the control points on the old maps), which is against Google TOS.

In the future we are keen to extend the gazetteer with alternative
historical toponyms (which people can identify on georeferenced old
maps too), or participate on such work.

The other usage of Unlock API is:

As a metadata text analyzer, in a service such as our
http://geoparser.appspot.com/, where we automatically parse existing
library textual metadata to identify place names and locate the
described maps including automatic approximation of their spatial
coverage (by identifying map scale and physical size in the text and
doing a simple math on top of it). This service is in a prototype
phase only, we are using Yahoo Placemaker and I was testing Unlock Text API
with it too.

Here the huge advantage of Unlock would be primarily the possibility
to add custom gazetteers (with Geonames as the default one), language detection (for example via Google Language API or otherwise) and also possibility to add into the workflow other tools, such as lemmatizator for particular language – the simplest available via hun/a/ispellu database integration or via existing morphological rule-based software such as:

The problem is that without returning the lemmatization of the text the geoparser is almost unusable in non-English languages – especially Slavic ones.

We are very glad for availability of your results and of the reliable
online services you provide. We can concentrate on the problems we
need to solve primarily (georeferencing, clipping, stitching and
presentation of old maps for later analysis) and use your results of
research as a component solving a problem we are touching and we have to practically solve somehow.”


Very glad that Petr wrote at such length about his comprehensive use of Unlock, pushing the edges of what we are doing with the service.
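
The "simple math" Petr mentions for approximating a map's spatial coverage from its scale and physical size might look something like the sketch below. This is not the geoparser's actual code; the function names, the spherical-Earth radius and the idea of centring the sheet on a located place-name are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: mean radius of a spherical Earth


def ground_extent_m(scale_denominator, sheet_width_cm, sheet_height_cm):
    """Ground size (width_m, height_m) covered by a sheet at 1:scale_denominator."""
    width_m = sheet_width_cm * scale_denominator / 100.0
    height_m = sheet_height_cm * scale_denominator / 100.0
    return width_m, height_m


def approximate_bbox(center_lat, center_lon, width_m, height_m):
    """Rough (south, west, north, east) in degrees around a centre point.

    Only reasonable for small areas away from the poles; it ignores the
    map projection and assumes the sheet is oriented north-up.
    """
    half_dlat = math.degrees((height_m / 2.0) / EARTH_RADIUS_M)
    half_dlon = math.degrees(
        (width_m / 2.0) / (EARTH_RADIUS_M * math.cos(math.radians(center_lat)))
    )
    return (
        center_lat - half_dlat,
        center_lon - half_dlon,
        center_lat + half_dlat,
        center_lon + half_dlon,
    )


# A 60 cm x 40 cm sheet at 1:50,000 covers about 30 km x 20 km on the ground.
width_m, height_m = ground_extent_m(50000, 60, 40)
south, west, north, east = approximate_bbox(55.95, -3.19, width_m, height_m)
```

The centre point here would come from a gazetteer lookup of a place-name found in the metadata, so the result is only as good as that lookup.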

We have some work in the pipeline adding bounding boxes for places worldwide by making Natural Earth Data searchable through Unlock Places. Natural Earth is a generalised dataset intended for use in cartography, but should also have quite a lot of re-use value for map search.
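
To show why the bounding box matters for map search, here is a minimal sketch of choosing a zoom level from a place's bounds on a Web Mercator base map. It assumes the bounds arrive as south, west, north, east in degrees and that the map uses 256-pixel tiles; neither the function names nor that response shape come from the Unlock API.

```python
import math

TILE_SIZE = 256  # assumption: standard 256-pixel web map tiles
MAX_LAT = 85.0511  # Web Mercator is cut off near this latitude


def mercator_y(lat_deg):
    """Latitude as a fraction (0 = north edge, 1 = south edge) of the Mercator world."""
    lat = math.radians(max(min(lat_deg, MAX_LAT), -MAX_LAT))
    return (1.0 - math.log(math.tan(lat) + 1.0 / math.cos(lat)) / math.pi) / 2.0


def zoom_for_bounds(south, west, north, east, map_width_px, map_height_px, max_zoom=18):
    """Largest integer zoom at which the whole bounding box fits in the map view."""
    lon_fraction = (east - west) / 360.0
    if lon_fraction < 0:
        lon_fraction += 1.0  # box crosses the antimeridian
    lat_fraction = abs(mercator_y(south) - mercator_y(north))

    def fit(map_px, fraction):
        if fraction == 0:
            return max_zoom  # a point has no extent, so zoom right in
        return math.floor(math.log2(map_px / TILE_SIZE / fraction))

    zoom = min(fit(map_width_px, lon_fraction), fit(map_height_px, lat_fraction), max_zoom)
    return max(0, zoom)


# A city-sized box should give a much closer zoom than a country-sized one.
city_zoom = zoom_for_bounds(55.9, -3.3, 56.0, -3.1, 1024, 768)
country_zoom = zoom_for_bounds(49.0, -8.0, 61.0, 2.0, 1024, 768)
```

With only a point for each place, the client has to guess the zoom from the feature type; with bounds, a village and a country each get a sensible view from the same calculation.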

Connecting archives with linked geodata – Part II

This is part two of a blog starting with a presentation about the Chalice project and our aim to create a 1000-year place-name gazetteer, available as linked data, text-mined from volumes of the English Place Name Survey.

Something else I’ve been organising is a web service called Unlock; it offers a gazetteer search service that searches with, and returns, shapes rather than just points for place-names. It has its origins in a 2001 project called GeoCrossWalk, which extracted shapes from MasterMap and other Ordnance Survey data sources and made them available under a research-only license in the UK to subscribers to EDINA’s Digimap service.

Now that so much open geodata is out there, Unlock also contains an open data place search service, indexing and interconnecting the different sources of shapes that match up to names. It has GeoNames and the OS OpenData sources in it; we are adding search of Natural Earth data in short order, and looking at ways to enhance what others (Nominatim, LinkedGeoData) are already doing with search and re-use of OpenStreetMap data.

The gazetteer search service sits alongside a placename text mining service. However, the text mining service is tuned to contemporary text (American news sources), largely because of what data is available and which models and sets of training data are shared. The more interesting use cases are in archive mining, of semi-unusual, semi-structured sets of documents and records (parliamentary proceedings, or historical population reports, parish and council records). Anything that is recorded will yield data, *is* data, back to the earliest written records we have.


Place-names can provide a kind of universal key to interpreting the written record. Social organisation may change completely, but the land remembers, and place-names remain the same. Through the prism of place-names one can glimpse pre-history; not just what remains of those people wealthy enough to create *stuff* that lasted, but of everybody who otherwise vanished without trace.

The other reason I’m here at FOSS4G is to ask for help. We (the authors of the text mining tools at the Language Technology Group, colleagues at EDINA, smart funders at JISC) want to put together a proper open source distribution of the core components of our work, for others to customise, extend, and work with us on.

We could use advice – the Software Sustainability Institute is one place we are turning for advice on managing an open source release and, hopefully, community. OSS Watch supported us in structuring an open source business case.

Transition to a world that is open by default turns out to be more difficult than one would think. It’s hard to get many minds to look in the same direction at the same time. Maybe legacy problems, kludges either technical, or social, or even emotional, arise to mess things up when we try to act in the clear.

We could use practical advice on managing an open source release of our work to make it as self-sustaining as possible. In the short term: how best to structure a repository for collaboration, for branching and merging; where we should most usefully focus efforts at documentation; how to automate the process of testing to free up effort where it can be more creative; how to find the benefits in moving the process of working from a closed to an open world.

The Chalice project has a SourceForge repository where we’ve been putting the code the EDINA team has been working on; this includes an evolution of Unlock’s web service API, and user interface / annotation code from Addressing History. We’re now working on the best way to synchronise work-in-progress with currently published, GPL-licensed components from LTG, more pieces of the pipeline making up the “Edinburgh geoparser” and other things…