finalProjectPost

We put together this long list of outcomes of the Chalice project for our last bi-weekly project meeting, on 28th April 2011. The final product post offers an introduction to Chalice.

Tangibles

These are pieces of work completed to form the project:

Corrected OCR for 5 EPNS volumes (*not* open licensed)
Quality assessment of the OCR
Extracted data in XML
Report on the text-mining and georeferencing process
RDF representation of extracted data, Open Database License
Searchable JSON API for the extracted data
Two prototype visualisations
Source code for the preceding 4 items
Two use case assessments
Supporting material for the use case assessments
Simple web service for alt-names for ADS
Sample Integration with GBHGIS data

Intangibles

These are less concrete but equally valuable side-effects of the project work:

A set of sameAs assertions for Cheshire names between geonames and Ordnance Survey 50K gazetteer to go to sameas.org
Historic place-name data that could potentially be used to enhance geonames.org
Improvements to the Edinburgh Geoparser and the Unlock Text service
Pushed forward open source release of the Geoparser
Refactoring of the Unlock Places service
Discussions and potential alignment with other projects (SPQR, Pleiades, GBHGIS)
Discussions with other place-name surveys (SPNS – Wales?)

Talks / Dissemination

JISC Place-names workshop Sept 2010
FOSS4G Sept 2010
All Hands Meeting (poster) Sept 2010
WhereCamp UK, Nottingham, Nov 2010
Pelagios #jiscgeo workshop March 2011
Classical Association Conference April 2011
Linked Data workshop at GISRUK April 2011

This is our “final product post” as required by the #jiscexpo project guidelines. Image links somehow got broken; they are fixed now, so please take another look.

Chalice – Past Places

Chalice is for anyone working with historic material – be that archives of records, objects, or ideas. Everything happens somewhere. We aimed to provide a historic place-name gazetteer covering a thousand years of history, linked to attestations in old texts and maps.

Place-name scholarship is fascinating; looking at names, a scholar can describe the lay of the land and see political developments. We would like to pursue further funding to work with the English Place-Name Survey on an expert-crowdsourced service consuming the other 80+ volumes and extracting the detailed information – etymology, field-names.

Linked to other archival sources, the place-name record has the potential to reveal connections between them, and in turn feed into deeper coverage in the place-name survey.

There is a Past Places browser that helps illustrate the data and provides a Linked Data view of it.

Stuart Dunn did a series of interviews and case studies with different archival sources, making suggestions for integration. The report on our use case for the Clergy of the Church of England Database may be found here; and that on our study of the Victoria County History is here. We also had valuable discussions with the Archaeology Data Service, which were reported in a previous post.

Rather than a classical ‘user needs’ approach, targeting groups such as historians, linguists and indeed place-name scholars, it was decided to look in detail at other digital resources containing reference material. This allowed us to start considering various ways in which a digitized, linkable EPNS could be automatically related to such resources. The problems are not only the ones we anticipated, of usability and semantic crossover between the placename variants listed in EPNS and elsewhere, but also ones of data structure, domain terminology and the relationship of secondary references across such corpora. We hope these considerations will help inform future development of placename digitization.

Project blog

This covers the work of the four partners in the project.

CeRch at KCL developed use cases through interviews with maintainers of different historic sources. There are blog descriptions of conversations with:

Victoria County History

Archaeology Data Service

Clergy of the Church of England Database

LTG did some visualisations for these use cases and, more seriously, text-mined the semi-structured text of different sample volumes of the English Place Name Survey.

First results from the “georesolution” process against extracted structured data

Later analysis of the coverage of geo-referenced places against different contemporary gazetteer sources

The extraction of corrected text from previously digitised pages was done by CDDA in Belfast. There is a blog report on the final quality of the work; however, the full resulting text is neither open licensed nor distributed through Chalice.

EDINA took care of project management and software development. We used the opportunity to try out a Scrum-style “sprint” way of working with a larger team.

Reflections on the first sprint and the second sprint.
Linked data: a general overview of the concerns, how we structure URIs, an initial sketch put out for feedback, and later a more final version (still with missing namespace elements)

TOC to project blog – here is an Atom feed of all the project blog posts, which should be categorised and describe the project partners

Project tag: chaliced

Full project name: Connecting Historical Authorities with Links, Contexts and Entities

Short description: Creating and re-using a linked data historic gazetteer through text mining.

Longer description: Text mining volumes of the English Place Name Survey to produce a Linked Data historic gazetteer for areas of England, which can then be used to improve the quality of georeferencing other archives. The gazetteer is linked to other placename sources on the Linked Data web via geonames.org and Ordnance Survey Open Data. Intensive user engagement with archive projects that can benefit from the open data gazetteer and open source text mining tools.

Key deliverables: Open source tools for text mining archives; Linked open data gazetteer, searchable through JISC’s Unlock service; studies of further integration potential.

Lead Institution: University of Edinburgh

Person responsible for documentation: Jo Walsh

Project Team: EDINA: Jo Walsh (Project Manager), Joe Vernon (Software Developer), Jackie Clark (UI design), David Richmond (Infrastructure); CDDA: Paul Ell (WP1 Coordinator), Elaine Yates (Administration), David Hardy (Technician), Karleigh Kelso (Clerical); LTG: Claire Grover (Senior Researcher), Kate Byrne (Researcher), Richard Tobin (Researcher); CeRch: Stuart Dunn (WP3 Coordinator).

Project partners and roles: Centre for Data Digitisation and Analysis, Belfast – preparing digitised text; Centre for e-Research, King's College London – user engagement and dissemination; Language Technology Group, School of Informatics, Edinburgh – text mining research and tools.

This is the Chalice project blog and you can follow an Atom feed of blog posts (there are more to come).

The code produced during the Chalice project is free software, available under the GNU Affero GPL v3 license. You can get the code from our project SourceForge repository. The text mining code is available from LTG – please contact Claire Grover for a distribution…

The Linked Data created by text mining volumes of the English Place Name Survey – mostly covering Cheshire – is available under the Open Database License – a share-alike license for data by Open Data Commons.

The contents of this blog itself are available under a Creative Commons Attribution-ShareAlike 3.0 Unported license.

Link to technical instructional documentation

Project started: July 15th 2010
Project ended: April 30th 2011
Project budget: £68,054

Chalice was supported by JISC as a project in its #jiscexpo programme. See its PIMS project management record for information about where responsibility fits in at JISC.
