Today we are liveblogging from our one day event looking at the use of geospatial data and tools in the cultural heritage domain, taking place at the Maughan Library, part of King’s College London. Find out more on our Eventbrite page: http://geocult.eventbrite.com/
If you are following the event online please add your comment to this post or use the #geocult hashtag.
This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.
Good morning! We are just checking in and having coffee here at the Weston Room of the Maughan Library but we’ll be updating this liveblog throughout the day – titles for the presentations are below and we’ll be filling in the blanks as we go.
Stuart Dunn, from King’s College London, is just introducing us to the day and welcoming us to our venue – the beautiful Weston Room at the Maughan Library.
James Reid, from GECO, is going through the housekeeping and also introducing the rationale for today’s events. In 2011 JISC ran a geospatial programme and as part of that they funded the GECO project to engage the community and reach out to those who may not normally be focused on geo. Those projects, 11 of them in total, cover a huge range of topics and you can read more about them on the GECO blog (that’s here). We will be liveblogging, tweeting, sharing the slides, videoing etc. and these materials will be available on the website. Many of you I know will have directly or indirectly received funding from JISC for projects in the past so hopefully you will all be familiar with JISC and what they do.
And now on with the presentations…
Michael Charno, ADS, Grey Literature at the ADS
I’m an application developer for the Archaeology Data Service and I’m going to talk a bit about what we do.
We are a digital archive based at the University of York. We were part of the Arts and Humanities Data Service but that has been de-funded so now we sit alone, specialising in archaeology data. And we do this in various ways. This includes disseminating data through our website.
There’s a long tradition in archaeology of using maps to describe the locations of events, places, communities etc. We use GIS quite a bit for research in the discipline. We have a big catalogue of finds and events – we have facets of What, Where and When. Where is specifically of interest to us. We mainly use maps to plot points that locate times, events, finds etc. We also have context maps, which just show people the location in which an item was found. We also have a “clicky map” but actually this is just a linked image of a map to allow drill down into the data.
One step up from that we use a lot of web maps, some people call them web GIS. You can view different layers, you can drill down, you can explore the features etc. But this is basic functionality – controlling layer view, panning, zooming etc. With all of these we provide the data to download and use in desktop GIS – and most people use the data this way, primarily I think this is because of usability.
And more recently we’ve been looking to do more with web maps. But we haven’t seen high use of these; people still tend to download data for desktop GIS if they are using it for their research. We have done a full-blown web GIS for the Framework Stansted project – there was a desktop standalone ESRI version, but they wanted a web version and we therefore had to replicate lots of that functionality, which was quite a challenge. But again we haven’t seen huge usage; people mainly use the data in their desktop applications. I think this is mainly because of the speed of using this much data over the web. But the functionality is web.
We have found that simplicity is key. But we think that Web GIS isn’t realistic. We aren’t even sure Web Mapping is that realistic. If people are really going to use this data they are going to want to do this on their own machines. We thought these tools would be great for those without an ESRI licence, but there are now lots of good open source and free to use GIS – Quantum in particular – so we increasingly encourage people NOT to give us money to create web GIS. Instead we’re looking at an approach of GeoServer with a spatial database in Oracle to disseminate this data.
Issues facing the ADS now are: the long term preservation of data and mapping (ARCIMS is no longer supported by ESRI for instance); usability – we can upgrade these interfaces but making changes also changes the usability, which can be frustrating for users; proprietary technology – the concern is around potential lock in of data so we are moving to make sure our data is not locked in; licensing – this is a can of worms, talk to Stuart Jeffrey at the ADS if you want to know more about our concerns here; data – actually we get a lot of poor quality or inconsistent data and that
ARENA project – a portal to search multiple datasets. This was built around What, Where and When key terms. The What was fine – we used a standard method here. The When was challenging but OK. But Where was a bit of an issue: we used a box to select areas. We tried the same interface for TAG – the Transatlantic Archaeology Gateway service – but this interface really didn’t work for North America. So we want to be able to search via multiple boxes in the future.
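As a rough illustration of the "multiple boxes" idea, here is a minimal Python sketch of a Where search that accepts several bounding boxes at once. The record structure, the `BBox` class and the coordinates are all assumptions made for the example, not the ARENA or TAG implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in decimal degrees (hypothetical helper)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def search(records, boxes):
    """Return records whose point falls inside at least one of the boxes."""
    return [r for r in records if any(b.contains(r["lon"], r["lat"]) for b in boxes)]


# Made-up records with approximate coordinates, for illustration only.
records = [
    {"id": "york", "lon": -1.08, "lat": 53.96},
    {"id": "boston", "lon": -71.06, "lat": 42.36},
    {"id": "honolulu", "lon": -157.86, "lat": 21.31},
]

# Two disjoint search areas: a rough British Isles box and a rough Hawaii box.
boxes = [BBox(-11.0, 49.0, 2.0, 61.0), BBox(-161.0, 18.0, -154.0, 23.0)]
print([r["id"] for r in search(records, boxes)])
```

A single box stretched to cover both areas would also sweep in everything in between, which is roughly why one box is a poor fit for widely scattered regions.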
Archaeotools – we wanted to analyse texts including grey literature. There was spatial information we could easily pull out and plot. Modern texts were OK but older texts – such as those of the Society of Antiquaries of Scotland – were more challenging. The locations here include red herrings – references to similar areas etc. We partnered with the Computer Science Department at the University of Sheffield for the text mining. Using KT/AT extension and CDP matching we had about 85% matches on the grey literature. We also tried EDINA’s GeoCrossWalk, which had even better accuracy – only 30 unresolved place names. I think we didn’t use the latter in the end because of disambiguation issues – a challenge in any work of this type. For instance when we look at our own data it’s hard to disambiguate Tower Hamlets from any Towers in any Hamlets…
Going back into our catalogue ArchSearch – you can drill through area sizes – we were able to put this grey literature into the system at the appropriate level. We also have new grey literature being added all the time, already marked up. So this lets us run a spatial search of grey literature in any area.
When we rolled out the ability to search grey literature by location we saw a spike in downloads of grey literature reports. Google was certainly trawling us and that will throw the figures, but it was definitely useful for our users too and there was a spike in their use as well.
Again looking at ArchSearch. One of the issues we have is the quality of the records. We have over 1 million records. We ingest new records from many suppliers – AH, county councils etc. – and add those to our database. We actually ran a massive query over all of these records to build our own facet tree to explore records in more depth. We want to capture the information as added but also connect it to the correct county/parish/district layout as appropriate. We also have historical counties – you can search for them but it can be confusing, for instance Avon doesn’t exist as a county anymore but you will find data for it.
The other issue we find is that specific coordinates can end up with points being plotted in the wrong county because the point is on the border. Another example was that we had a record with a coordinate for Devon but it had an extra “0” and ended up plotted off the coast of Scotland!
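A cheap sanity check can catch the "extra 0" kind of error before a record is plotted. The sketch below, in Python, compares a record's coordinates against a rough bounding box for the county it claims to be in. The box values are illustrative approximations in British National Grid metres and `check_record` is a hypothetical helper, not part of any ADS system.

```python
# Rough, illustrative bounding boxes in British National Grid metres:
# (min_easting, min_northing, max_easting, max_northing). Not authoritative.
COUNTY_BOXES = {
    "Devon": (210_000, 30_000, 340_000, 155_000),
}


def check_record(county, easting, northing, boxes=COUNTY_BOXES):
    """Return 'ok', 'outside' or 'unknown county' for a record's coordinates."""
    box = boxes.get(county)
    if box is None:
        return "unknown county"
    min_e, min_n, max_e, max_n = box
    inside = min_e <= easting <= max_e and min_n <= northing <= max_n
    return "ok" if inside else "outside"


print(check_record("Devon", 290_000, 92_000))   # ok
print(check_record("Devon", 290_000, 920_000))  # outside: northing has an extra 0
```

A bounding box only flags gross errors. Points that sit right on a county border would still need a test against the real boundary polygons.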
I know that Stuart will be talking about DEEP later which is great, we would love to have a service to resolve placenames for our future NLP so that we can handle historical placenames, spatial queries and historic boundaries. It would be nice to know we remain up to date/appropriate to date as boundaries change regularly.
The future direction that we are going in is WMS publishing and consumption. For instance we are doing this for the Heritage Gateway. Here I have an image of Milton Keynes – not sure if those dots around are errors or valid. We are putting WMS out there but not sure anyone’s ready to consume that yet. We also want to consume/ingest data via WMS to enrich our dataset, and to reshare that of course.
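For anyone wondering what "consuming WMS" looks like in practice, a client simply requests a rendered image for a layer and a bounding box. Below is a minimal Python sketch that builds a WMS 1.3.0 GetMap URL. The endpoint, the layer name and the coordinates are placeholders invented for the example, not a real ADS or Heritage Gateway service.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=512, height=512, crs="EPSG:27700"):
    """Build a WMS 1.3.0 GetMap request URL for one layer.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer; the box is a rough area around Milton Keynes.
url = getmap_url(
    "https://example.org/wms",
    "grey_literature",
    (480000, 230000, 495000, 245000),
)
print(url)
```

Fetching that URL from a compliant server would return a PNG, which a desktop GIS or a web map can then overlay on other layers.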
And finally we are embarking on a Linked Data project. We currently have data on excavations as Linked Data but we hope to do more with spatial entities and Linked Data and GeoSPARQL type queries. Not quite sure what we want to do with that because this is all new to us right now.
Q1: It seems like your user community is quite heterogeneous – have you done any persona work on those users? And are there some users who are more naive?
A1: We’ve just started to do this more seriously. Registration and analytics let us find out more. Some are commercial entities but the largest group are academics. I think both groups are equally naive actually.
Q2: Why Oracle?
A2: Well the University has a license for it. We would probably use Postgres if we were selecting from scratch.
Claire Grover, University of Edinburgh, Trading Consequences
This is a new project funded under the Digging Into Data programme. Partners in this are the University of Edinburgh Informatics Department, EDINA, York University in Canada and University of St Andrews.
The basic idea is to look at the 19th century trading period and commodity trading at that time, specifically for economic and environmental historical research. The historians are interested in investigating the increase in trade at this time and the hope is to help researchers in this work, to discover novel patterns and explore new hypotheses.
So let’s look at a typical map a historian would be interested in drawing. Take Cinchona: it is the plant from which quinine derives and it grows in South America, but they began to grow it in India to meet demand at the time. Similarly we can look at another historian’s map of the global supply routes of West Ham factories. So we want to enable this sort of exploration across a much larger set of data than the researchers could look at themselves.
We are using a variety of data sources, with a focus on Canadian natural resource flows to test reliability and efficacy of our approach and using digitised documents around trading within the British Empire. We will be text mining these and we will populate a georeferenced database hosted by EDINA, and with St Andrews building the interface.
Text mining wise we will be using the Edinburgh GeoParser which we have developed with EDINA and which is also used in the Unlock Text service. It conducts named entity recognition – place names and other entities, and we will be adding commodities for Trading Consequences – and then there is a gazetteer look up using Unlock, GeoNames, and Pleiades+ which has been developed as part of the PELAGIOS project. The final stage is georesolution which selects the most likely interpretation of place names in context.
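The three stages described here (entity recognition, gazetteer look up, georesolution) can be sketched in a few lines of Python. This is a toy under stated assumptions and not the Edinburgh GeoParser: the gazetteer is a hand-made dictionary, "recognition" is plain word matching, and resolution picks the candidate closest to the unambiguous places in the same text.

```python
import math

# Toy gazetteer: name -> list of (label, lat, lon) candidates. Illustrative only.
GAZETTEER = {
    "Perth": [("Perth, Scotland", 56.40, -3.43), ("Perth, Australia", -31.95, 115.86)],
    "Dundee": [("Dundee, Scotland", 56.46, -2.97)],
    "Stirling": [("Stirling, Scotland", 56.12, -3.94)],
}


def recognise(text):
    """Stand-in for named entity recognition: match known gazetteer names."""
    words = [w.strip(".,;") for w in text.split()]
    return [w for w in words if w in GAZETTEER]


def distance(a, b):
    """Crude planar distance in degrees; enough to rank toy candidates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def resolve(names):
    """For each name, pick the candidate nearest the unambiguous places."""
    anchors = [GAZETTEER[n][0][1:] for n in names if len(GAZETTEER[n]) == 1]
    resolved = {}
    for name in names:
        candidates = GAZETTEER[name]
        if not anchors or len(candidates) == 1:
            resolved[name] = candidates[0][0]
            continue
        resolved[name] = min(
            candidates,
            key=lambda c: sum(distance(c[1:], a) for a in anchors),
        )[0]
    return resolved


text = "Goods moved from Dundee and Stirling to Perth."
print(resolve(recognise(text)))
```

Here "Perth" resolves to the Scottish candidate because the other places in the sentence pull it that way, which is the sense in which each place name contributes context for the others.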
So to give you some visuals here is some text from Wikipedia on the Battle of Barrosa (a random example) as run through the Edinburgh GeoParser. You can see the named entity recognition output colour coded here. And we can also look at the geo output – both the point it has determined to be most accurate and the other possible candidates.
So what exactly are we digging for in Trading Consequences? Well we want to find instances in the text of trade-related relationships between commodity entities, location entities, and date entities – what was imported/exported from where and when. Ideally we also want things like organisations, quantities and sums of money as part of this. And ultimately the historians are keen to find information on the environmental impact of that trade as well.
Our sources are OCR textual data from digitised datasets. We are taking pretty much anything relevant but our primary data sets are the House of Commons Parliamentary Papers, Canadiana.org and the Foreign and Commonwealth Office records at JSTOR. Our research partners are also identifying key sources for inclusion.
So next I am going to show you some very, very early work from this project. We’ve done some initial explorations of two kinds of data using our existing text mining toolset – primarily for commodity terms to assist in the creation of ontological resources – we want to build a commodity ontology. And we’ve also looked at sample texts from our three main datasets. We have started with WordNet as a basic commodity ontology to use as a starting point. So in this image we have locations marked up in purple, commodities in green. We’ve run this on some Canadiana data and also on HCPP as well.
So from our limited starting sample we can see the most frequent location-commodity pairs. The locations look plausible on the whole. The commodities look OK but “Queen” appears there – she’s obviously not a commodity. Similarly “possum” and “air” but that gives you a sense of what we are doing and the issues we are hoping to solve.
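Counting location-commodity pairs is simple once the text has been annotated. The Python sketch below assumes each sentence has already been reduced to a list of locations and a list of commodities (the data is made up), and adds an optional stoplist for false commodities such as “Queen”. It is an illustration of the idea, not the project's code.

```python
from collections import Counter
from itertools import product

# Hypothetical annotated sentences: the locations and commodities found in each.
sentences = [
    {"locations": ["Canada"], "commodities": ["timber", "fur"]},
    {"locations": ["Canada", "Liverpool"], "commodities": ["timber"]},
    {"locations": ["India"], "commodities": ["cinchona"]},
    {"locations": ["London"], "commodities": ["Queen"]},
]


def pair_counts(annotated, stoplist=frozenset()):
    """Count co-occurring (location, commodity) pairs, skipping stoplisted terms."""
    counts = Counter()
    for s in annotated:
        commodities = [c for c in s["commodities"] if c not in stoplist]
        counts.update(product(s["locations"], commodities))
    return counts


for pair, n in pair_counts(sentences, stoplist={"Queen"}).most_common(3):
    print(pair, n)
```

A stoplist is the bluntest possible fix. A better commodity ontology should make most of these false positives disappear at source.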
The issues and challenges here: we want to transform historians’ understanding but our choice of sources may be biased just by what we include and what is available. The text mining won’t be completely accurate – will there be enough redundancy in the data to balance this? And we have specific text mining issues: low level text quality issues, isolating referencing issues, French language issues etc. And we have some georeferencing issues.
So looking at a sample of data from Canadiana we can see the OCR quality challenges – we can deal with consistent issues, “f” standing in for “ss” for instance, but can’t fix gobbledegook. And tables can be a real nightmare in OCR so there are issues there.
Georeferencing wise we will be using GeoNames as a gazetteer as it’s global but some place names or their spellings have changed – is there an alternative? We also have to segment texts into appropriate units – some data is provided as one enormous OCR text, some is page by page. Georesolution assumes each text is a coherent whole and each place name contributes to the disambiguation context for all of the others. And the other issue we have is the heuristics of geoparsing. For modern texts population information can be useful for disambiguation. But that could work quite badly/misleadingly if applied to 19th century texts – we need to think about that. And we also need to think about coastal/port records perhaps being weighted more highly than inland ones – but how do you know that a place is/was a port? We’ve gone some way towards that as James has located a list of historical ports with georeferences but we need to load that in to see how that works as part of the heuristics.
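To make the heuristics question concrete, here is a small Python sketch of a candidate scoring function in which the population weighting can be switched off and a bonus given to places on a historical ports list. The weights, field names and figures are all invented for the example; it only shows how changing the heuristic changes the winner.

```python
import math


def score(candidate, use_population=True, port_bonus=0.0):
    """Score one gazetteer candidate; higher is more likely."""
    s = 0.0
    if use_population and candidate.get("population"):
        s += math.log10(candidate["population"])
    if candidate.get("historic_port"):
        s += port_bonus
    return s


def best(candidates, **kwargs):
    """Return the label of the highest scoring candidate."""
    return max(candidates, key=lambda c: score(c, **kwargs))["label"]


# Hypothetical candidates for one ambiguous place name; figures are made up.
candidates = [
    {"label": "inland city", "population": 500_000, "historic_port": False},
    {"label": "small port", "population": 5_000, "historic_port": True},
]

print(best(candidates))                                        # inland city
print(best(candidates, use_population=False, port_bonus=1.0))  # small port
```

With modern population figures the big inland city always wins. Dropping that signal and rewarding known ports flips the result, which may suit 19th century trade texts better.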
Humphrey Southall, University of Portsmouth, OldMapsOnline.org
I wanted to do something a bit controversial. So firstly how many of us have a background in the academic discipline of geography? [it’s about five of those in the room]. A lot of what’s going on is actually about place, about human geography. I think GIS training warps the mind so I wanted to suggest this issue of Space vs. Place.
There is a growing movement towards using maps both for resource discovery and visualisation. But it does lead to inappropriate use of off-the-shelf GIS solutions. There are 3 big problems: map based interfaces are almost entirely impenetrable to search engines, but search engines are how most people use information and discover things – the interface is a barrier, but that doesn’t mean scrapping them; mapping can force us into unjustifiable certainty about historical locations; and this isn’t actually how most people think about the world – people are confused by maps, they can handle textual meaning of place.
So, looking at locational uncertainty in the past. Cultural heritage information does not include coordinates; it has geographical names. Even old maps are highly problematic as a source of coordinates. And converting toponyms to coordinates gets more problematic as we move back in time. 19th and 20th century parishes have well-defined boundaries that are well-mapped – but still expensive to computerise, my Old Maps project has just spent £1 million doing this. Early modern parishes had clear boundaries but few maps so we may know only the location of the admin centre; earlier than that, things become much more fuzzy.
If we look a county records th
Geographical imprecision in the 1801 census – it’s a muddle, it’s full of footnotes.
Geo-spatial versus geo-semantic approaches. GIS/geo-spatial approaches privilege coordinates: it is all about treating everything as attributes of coordinate data. By comparison geo-semantic approaches, descriptive of place, seem
Examples of sites with inappropriate use of geo-spatial technology: Scotland’s Places has a search box for coordinates – who on earth does that! But you can enter placenames. Immediately we get problems: 6 matches, of which the first 2 are for small Glasgow – city only – and then there are 4 for Glasgow as a wider area. This is confusing for the user, which do we pick? Once we pick the city we get a list of parishes which is confusing too, and we encounter an enormous results set, and most of what we get isn’t information about Glasgow but about specific features near Glasgow. This is because at heart this system has no sense of place – it just finds items geolocated near features of Glasgow. I could show the same for plenty of other websites.
For an example of an appropriate sense of space – HistoryPin, who are speaking later, as images have an inherent sense of location. Another example is Old Maps Online.
Geo-semantics – geography as represented as a formal set of words. This is about expressing geographic traits formally – IsNear, IsWithin, IsAdministrativelyPartOf, Adjoins. Clearly GIS can express some of these relationships more fully – but only sometimes and assuming we have the information we need there.
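Relations like these can be stored and queried without any coordinates at all. The Python sketch below is a minimal, hypothetical structure (not the Vision of Britain data model) that records subject-relation-object statements and follows IsAdministrativelyPartOf up the hierarchy. The place names are placeholders.

```python
from collections import defaultdict


class PlaceGraph:
    """Tiny store of (subject, relation, object) statements about places."""

    def __init__(self):
        self._edges = defaultdict(set)  # (subject, relation) -> {objects}

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        return set(self._edges.get((subject, relation), set()))

    def ancestors(self, place, relation="IsAdministrativelyPartOf"):
        """Follow a relation transitively, e.g. village -> parish -> county."""
        seen, stack = set(), [place]
        while stack:
            for parent in self.objects(stack.pop(), relation):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


g = PlaceGraph()
g.add("Village A", "IsAdministrativelyPartOf", "Parish B")
g.add("Parish B", "IsAdministrativelyPartOf", "County C")
g.add("Parish B", "Adjoins", "Parish D")

print(sorted(g.ancestors("Village A")))
print(g.objects("Parish B", "Adjoins"))
```

Nothing here needs a boundary polygon, so the same structure works for periods where only the names and the administrative relationships survive.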
One problem we had on the Vision of Britain project was how to digitise this material. We really had to deliver to the National Archives. Frederick Youngs’ Guide to the Local Administrative Units of England – no maps, no coordinates, two volumes – is a fantastic source of geographical information. This is used in Old Maps Online. There is a complex relationship. Using visualisation software on the structure we built from Youngs you can find out huge amounts about a place. One point to note is that this is not simply one academic project. I’ve shown you some of the data structure of the project but it’s not about just one website. But we do have huge amounts of traffic – up at 140k ish unique users a month. So let’s do a search for a village in Britain – suggestion from the crowd is “Flushing” apparently… Google brings back Vision of Britain near the top of the list for any search of “History of… ” for any village in Britain. I’m aware of very few cultural heritage sector websites that do this. We did this partly by having very clear, very semantically structured information behind the site, there and available for crawling. We will be relaunching the site with some geospatial aspects added, but we also want to make our geosemantic information more available for searchers. We use a simple GeoParser service, mainly for OldMapsOnline and the British Library. We will be making that public. And we rank results based on frequency of place name, a very different approach to that outlined earlier.
Q1) I suspect that the reason Flushing didn’t get you to the top of the list is because the word has another meaning. What happens with somewhere like Oxford where there are many places with the same name?
A1) Well it’s why I usually include a county in the search – also likely to help with Oxford but of course for bigger places we have much more competition in Google. I think the trick here is words – Vision of Britain includes 10 million words of text.
Q2) Is this data available as an API? Or are all maps rasterised?
A2) Most of our boundaries are from UK Borders free facility for UK HE/FE. We have historic information. In terms of API we are looking at this. JISC have been funding us reasonably well but I’m not entirely happy with the types of projects that they choose to fund. We have put that simple GeoCoder live as we needed it. Some sort of reverse geocoder wasn’t too hard.
James: we support an internal WFS of all of the UK Borders data and data from Humphrey
Comment: We’ve used OS data from EDINA for our data. I was hoping there was something like that we could use over the web
James: I think it’s very much about licensing in terms of the OS data, for Humphrey’s data it’s up to him.
Humphrey: We haven’t been funded as a service but as a series of digitisation projects and similar, we make our money through advertising and it’s unclear to me how you make money through advertising for a Web service.
Stuart Nicol, University of Edinburgh, Visualising Urban Geographies
I’m going to be talking about the Visualising Urban Geographies project which was a collaborative project between the University of Edinburgh and the National Library of Scotland funded by the AHRC.
The purpose of the project was to create a set of geo-referenced historical maps of Edinburgh for student learning purposes, to reach a broader public through the NLS website, to develop tools for working with and visualising research on maps and to trial a number of tools and technologies that could be used in the future.
The outputs were 25 georeferenced maps of Edinburgh from 1765-1950 (as WMS, TMS and downloadable JPG, JGW) as well as a suite of digitised boundary polygons (ShapeFiles and KML). We have used various individual maps as exemplars to see what might be possible – 3D boundaries etc. We also documented our workflows. And finally we created a series of web tools around this data.
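For readers who have not met the JPG plus JGW pairing: the JGW "world file" is a six-line text file holding an affine transform from pixel positions to map coordinates. A minimal Python sketch of reading one and locating a pixel follows. The sample values are invented (a 2 metre per pixel image somewhere near Edinburgh in British National Grid terms) and are not taken from the project's downloads.

```python
def parse_world_file(text):
    """Read the six world file values in file order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_world(params, col, row):
    """Map a pixel (column, row) to map coordinates of that pixel's centre."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Invented example: 2 m pixels, no rotation, upper-left pixel centre at
# easting 325000, northing 674000. E is negative because rows run downwards.
JGW = """2.0
0.0
0.0
-2.0
325000.0
674000.0
"""

params = parse_world_file(JGW)
print(pixel_to_world(params, 0, 0))
print(pixel_to_world(params, 100, 50))
```

The world file carries no projection information, so the coordinate system has to be known from elsewhere, for example from the download page.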
The web tools are about quick wins for non GIS specialist – ways to find patterns and ideas to build on, not mission critical systems. To do this quickly and easily we inevitably have a heavy reliance on Google. A note on Address based history – researchers typically gather a lot of geographic data as addresses, as text. And it can be hard to visualise that data geographically so anything that helps here is useful.
So looking at our website – this is built on XMaps with the Google Maps API and a tile map service for historic maps. You can view/turn on/off various layers, you can access a variety of tools and basemaps. This includes the usual Google Map layers, also the Microsoft Virtual Earth resources as well as OpenStreetMap. So you can view any of these maps over any of these layers. You can also add user generated data for this – you just need an xml or kml or rss link to use in the tool. The Google Street View data can be very useful as many buildings in Edinburgh are still there. We have a toolbox that lets you access a variety of tools to use various aspects of the map, again just using the Google Address API. We use the Elevation API to get a sense of altitude. We’ve also been looking at the AddressingHistory API – geocoding historical addresses. So here I’m looking in the 1865 directory for bakers. And I can plot those on the map.
One of the main tools we wanted to provide was a geocode tool for their research. Our researchers have this long list of addresses from different sources. So they simply copy from their spreadsheet into the input field in our tool, the API will look for locations, and you get a list and also get a rough plot for those addresses. And we’ve built in the ability to customise that interface. This uses Google Spreadsheets and your own account. So you can create your own sets of maps. To edit the map we have the same kind of interface on the web. You can also save information back to your own Google account. And we also have an Add NLS data facility – using already digitised and georeferenced maps from the NLS collections.
You can publish this data via the spreadsheets interface and that gives you a URL that you can share which takes you to the tool.
So we went to a very lightweight mashup idea. We use Google Maps, Geocoding, Elevation, Visualisation, Docs & Spreadsheets, Yahoo geocoding, NLS Historic Mapping, AddressingHistory as our APIs – a real range combined here.
But there are some issues around sustainability and licensing here. We use Google Maps API V2 and that’s being deprecated. What are the issues related to batch geocoding from Google? Google did stop BatchGeo.com from sharing batch coded data as it broke third party terms so that’s a concern. There is a real lack of control over changes to APIs – the customise option broke a while ago because the Google Spreadsheet API changed. It was easy to fix but it took a while to be reported, you don’t get notified. Should we use HTTP or API? Some of the maps we use are sitting on a plain HTTP server – that means anyone can access it, and speed can be variable if heavily used. The NLS have an API which forces correct attribution but that would take a lot of work to put in place. And also TMS or WMS? We have used TMS but we know that WMS is more flexible, more compliant.
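On the TMS side of that choice: a tile service hands out pre-cut images addressed by zoom, column and row, whereas WMS renders an image for whatever bounding box is requested. The Python sketch below shows the standard Web Mercator tile arithmetic, including the row flip between the Google/OpenStreetMap style numbering (row 0 at the top) and TMS numbering (row 0 at the bottom). Function names are our own for the example.

```python
import math


def xyz_tile(lat, lon, zoom):
    """Column and row of the Web Mercator tile containing a point (row 0 at top)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still land in a valid tile.
    return min(x, n - 1), min(max(y, 0), n - 1)


def tms_row(y_xyz, zoom):
    """Convert a top-origin row number to a TMS bottom-origin row number."""
    return (2 ** zoom) - 1 - y_xyz


col, row = xyz_tile(55.95, -3.19, 12)  # roughly central Edinburgh
print(col, row, tms_row(row, 12))
```

Mixing the two row conventions is a common reason for historic map tiles appearing upside down or in the wrong hemisphere when overlaid on a base map.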
And we face issues around resources and skills. We can forget that we have benefitted from our partnership with NLS with access to their collection, skills, infrastructure and all those maps. One of our more ambitious aims was that our own workflow might help other researchers do the same thing in other locations. But this isn’t as easy as hoped. We have a colleague in Liverpool, and a colleague in Leicester, both using the tools but both constrained by access to historical maps in usable formats. And they don’t have the skills to deal with that themselves. Who should be taking the lead here? National libraries? Researchers?
In terms of what we have learned in the project we have found it useful to engage with the Google tools and APIs as it allowed us to build functional tools very quickly, but we are aware that there are big drawbacks and limitations here. But we have successfully engaged researchers and the wider community – local history groups, secondary schools etc.
Jamie McLaughlin, University of Sheffield, Locating London’s Past
Locating London’s Past was a six month JISC project taking a 1746 map and georeferencing it and visualising data from textual sources and data from the period on this map. We also ran a gazetteer derived from the 1746 map, and it was also vectorised for us as well so you can view all the street networks etc. Our data sources contained textual descriptions of places and we regularised these for spellings and compound names. And then these were georeferenced to show on our map.
What’s interesting when exploring the data is to search, say, for Murders – Drury Lane has a lot which is perhaps not surprising. But murders
We used Google Maps as it was so well known, it seemed like the default choice. We didn’t think too deeply about that. It does do polygons and custom markers. And it does let you do basic GIS – you can measure distance, polygons etc. And the Google conventions are well known. Like the previous presentation this was a “light weight mash ups” approach. What can’t be underestimated is the usefulness of the user community – a huge group to ask if you have a question. The major downside of course is the usage limit – 25k map loads a day for free, after that you have to pay. These new terms came in just at the end of the project. It’s a reasonable thing and you have to have 90 days at that level so spikes are OK. But it’s expensive if you go over your limit: $4 for an additional 1000 loads. At really high levels it’s $8 for an additional 1000 loads. There is a very vague/sketchy educational programme which we’d hope we’d qualify for.
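To get a feel for what those figures could mean for a busy site, here is a small Python calculation using the numbers noted above (25k free loads a day, $4 per additional 1000). Billing in whole blocks of 1000 rounded up is our assumption, and the 90 day rule and the higher $8 tier are not modelled, so treat the output as a rough order of magnitude only.

```python
import math


def daily_overage_cost(loads, free_limit=25_000, rate_per_1000=4.0):
    """Cost in dollars of one day's map loads above the free limit.

    Assumes charges apply per started block of 1,000 excess loads.
    """
    excess = max(0, loads - free_limit)
    return math.ceil(excess / 1000) * rate_per_1000


print(daily_overage_cost(20_000))  # 0.0  (under the free limit)
print(daily_overage_cost(40_000))  # 60.0 (15 blocks of 1,000 at $4)
```

At a sustained 40k loads a day that would be around $1,800 over a 30 day month, which is a significant sum for a six month project budget.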
The three big lessons from us: keep it simple – we tried to do too much, too many data sets for the time, and when the design was kept simple it was successful; garbage in = garbage out – geocoding isn’t magical! Geocoders are much, much stupider than a human no matter how good they are; use open platforms – the API terms are worrying, we should have used open solutions.
James: Perhaps the Google bubble has burst – even FourSquare has moved to other mapping. APIs can be changed whenever the provider likes. And I should add that EDINA runs an open web service, OpenStream, that will let you access contemporary mapping information.
Ashley Dhanani and David Jeevendrampillai, UCL, “Classifying historical business directory data: issues of translation between geographical and ethnographic contributions to a community PPGIS project”
We are trying to focus on the place of suburbs and the link between suburbs and socio economic change. Why are suburbs important? Well around 84% of British people live in suburbs, we’ve seen the London Mayoral election focusing on suburbs and the Queen’s spending some of her jubilee in the suburbs.
We see small relationships, small changes in functionality etc. in suburbs that can easily be missed. We will talk about material cultural heritage – shapes of houses, directions of roads, paths and routes taken etc. We will relate the very material heritage to socio economic use of buildings/places over time. And we look at meaning – what does that mean socially – to use the post offices at different times in the last 200 years perhaps.
We wanted to do various analyses here: a network analysis to consider the accessibility of particular spaces, and the changes in how people live in these spaces. So we look at Kingston in a rather manual mapping process looking at network structure. Here we can see in 1875 what is the core area – what was it like to be in these spaces? Again we can see change over time. And we can see the relationality to the rest of the city. This is just part of the picture of these places through time. So from a material perspective we can see how the buildings change – from large semi detached houses to small terraced rows for instance. So we want to bring this information together and analyse it. So here we need to turn these historic structures into something more than a picture, to be able to look at our
We are using software – cheap for academic use – that allows you to batch proof TIFF files and do 80-90% of the work on a good underlying map. You can then really start doing statistics and exploring the questions etc. You can basically make MasterMap for historic periods!
Back to David. We also wanted to relate these networks, roads and buildings to the actual use, what was going on in these buildings at the time. So we just took the Business Directory information and georeferenced it to provide points on the map. We need to categorise the types of use in the business information. So we get these rather Damien Hirst style pictures – coloured dots on the road. We had a bit of a debate, me being an anthropologist, problematising those categorisations… what is a Post Office? Is it a Financial Service? Is it a Depot? Is it Retail? Is it a Community Service? And the answer obviously is what do you want to get from this data, why are you looking at it in the first place.
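One practical way to live with that ambiguity is to keep the raw directory entry untouched and apply a different classification scheme depending on the question being asked. The Python sketch below is a made-up illustration of that idea; the scheme names, categories and directory entries are invented and are not the project's coding frame.

```python
# Hypothetical schemes: the same trade classified for different research questions.
SCHEMES = {
    "economic": {"post office": "Financial Service", "baker": "Retail"},
    "social": {"post office": "Community Service", "baker": "Retail"},
}


def categorise(entries, scheme):
    """Label (name, trade) directory entries using the chosen scheme."""
    mapping = SCHEMES[scheme]
    return [(name, mapping.get(trade.lower(), "Uncategorised")) for name, trade in entries]


# Invented directory entries.
entries = [("High Street PO", "Post Office"), ("J. Smith", "Baker"), ("A. Jones", "Farrier")]

print(categorise(entries, "economic"))
print(categorise(entries, "social"))
```

Because the original trade description is never overwritten, a later researcher with a different question can simply add a new scheme rather than recode the data.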
So we wanted to know what these elements of the built environment meant. What does a relocated post office mean socially? We wanted to add another layer of information. Archives, memories, photos etc. We are taking the archive and making it digital. But I want to talk a bit about limitations here. Trying to understand a place through point information, looking at a top down map, doesn’t include that ephemeral information – the smell of a building perhaps. What we’re doing in this project is bringing in lots of academics from different disciplines and you get very different use of the same data sources. The gaps that we’ve found between understandings of the data have been very productive in terms of understanding our data, place, and what place means for policy based outcomes. And rather than come to a coherent sense of place, actually the gaps, the debates are very productive in themselves. We are one year in – we have 5 years funding in total – but those gaps have been the most interesting stuff so far.
And this kicking up of dust in the archives has only happened since we’ve been able to turn materials into digital form – they can be digitised, layered up, used together. Whilst this is very productive we will have gaps and slippages of categorisation and highlight our ways of understanding what goes on in place.
Q1) What software did you use for this project?
A1) RX Spotlight [not sure I’ve got that down right – comment below to correct!]
Q2) Interesting to hear about the issues with Google Maps – are any of the Open Source, truly free services, better with mobile?
A2) There is an expectation on mobile phones – there’s a project we’re working on with LSE on the Charles Booth property maps – which is hampered by the available zoom levels. There are workarounds, other data providers are part of this option. You have CloudMade based on OpenStreetMap data. We have OpenStream for HE projects.
Humphrey: We planned to use Google geocoder for Old Maps Online but they changed the terms and we expected high usage. We went for OpenStreetMap as truly free, but it’s problematic. And so we have implemented our own API from VisionOfBritain. We do use Google Basic and again we are concerned about going over our limits. Using a geocoder does let you mark up data for use with other maps. But if you are using linked data and identifiers and it was Google or similar providing that it would be very concerning.
James: Especially with mobile phones there is a presumption of very large scale. We were involved in the Walking Through Time project and the community wanted Google – the zoom levels killed it. There are issues around technical implementations. Think large scale for mobile. I do know that Google have been thinking of georeferencing as context for other information. Place is something else but implies some geography.
Comment: Leaflet works well on mobile.
James: We will come back to this later – discussing what we are using, what we need, etc.
And now for lunch… we’ll be back soon!
And we’re back…
Chris Fleet, National Library of Scotland, Developments at the NLS
I’m going to be talking about our historic mapping API which we launched about 2 years ago. This project was very much the brain child of Petr Pridal who now has this company Klokan Technologies. The API is very much a web mapping service.
So to start with let me tell you a bit more about the National Library of Scotland. We aim to make our collections available and with maps most of our collection is Scottish but we also have international maps in the collection. There are 46k maps as ungeoreferenced images with a zoomable viewer. The geo website offers access via georeference search methods. We’ve been a fairly low budget organisation so we’ve been involved in lots of joint projects to fund digitisation. And there is even less funding for georeferencing so we have joined up with specific projects to enable this. For instance we have digitised and georeferenced the Roy Military survey map of the late 18th century, town plans of Ordnance Survey, aerial photographs of the 1940s, and Bartholomew mapping – we are fortunate to have a very large collection of these. And we’ve been involved in various mashup projects including providing maps for the Gazetteer project for Scotland.
So in early 2000 Petr had this idea about providing a web mapping service. There were several maps already georeferenced – 1:1 million of the UK from 1933 and we had several other maps at greater detail from similar areas. Although we use open source GIS and Cube GIS we have found that ArcGIS is much easier for georeferencing, adding lots of control points, and dynamic visualising of georeferenced maps. We used Petr’s MapTiler (this has now been completely rewritten in C++ and is available commercially and runs much faster) and TileServer. These tools allow you to provide coordinates that allow you to spherisize your map for use with tools like Google Maps or Bing.
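As a rough illustration of what fitting a map to Google Maps or Bing involves, here is a minimal sketch of the standard spherical Mercator tile arithmetic those viewers use. It is not the NLS or MapTiler code; the function names are made up for this example. Tile generators have sometimes written tiles in the TMS scheme, where rows are counted from the bottom, so a row flip is included too.

```python
import math

# Spherical Mercator cannot represent the poles; this is the usual latitude cut-off.
MAX_LAT = 85.05112878


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) tile indices in the XYZ scheme used by Google Maps and OpenStreetMap."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(min(lat_deg, MAX_LAT), -MAX_LAT)
    lat_rad = math.radians(lat)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def xyz_to_tms_y(y, zoom):
    """Flip a tile row between XYZ (origin top-left) and TMS (origin bottom-left)."""
    return (2 ** zoom) - 1 - y


if __name__ == "__main__":
    # Roughly central Edinburgh, used only as a sample coordinate.
    print(latlon_to_tile(55.9533, -3.1883, 12))
```

The same arithmetic explains the zoom level complaints heard earlier in the day: each extra zoom level doubles the number of tiles per axis, so very large scale historic maps need many levels to be usable.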
We launched in May 2010 with examples for how to use the maps in other places and contexts. We put the maps out under Creative Commons Attribution license – more liberal than the NLS normally licences content.
Usage to date took a while to take off, most of our users are from a UK domain – unlike most of our maps collection – and most of our use has been in the last year or so. I’ve divided usage into several categories – recreation, local history, rail history, education etc.
Bill Chadwick runs the Where’s the Path website and they use a lot of data – they display our historic maps and other users used the link through the site for other big websites and that’s where lots of the hits have come from. A lot of our phone use has been for leisure – with the maps as a layer in another tool for instance.
Looking at how our maps have been used the variety has been enormous – leisure walkers, cyclers, off-road driving, geocaching as well! We also have lots of photographers using our maps. And metal detecting – I had underestimated just how big a user group they would be, including the Portable Antiquities Scheme website. And there are many family history users of these maps – for instance the Borders Family History society links to resources for each county in Scotland. There is also the area of specialist history: SecretWikiScotland – security and military sites; airfield information exchange; Windmill World; steamtrain history sites etc. And another specialist area: SABRE – the group for road history, if you’ve ever wondered about the history of the B347 say, they are the group for you. They have a nice web map service to ingest multiple maps including our maps API. And finally Stravaiging is to meander and you’ll find our maps there too.
Education was quite a small user of our maps. EDINA and others already cater to this group. But there was a site called Juicy Geography aimed at secondary school children that uses them. And the Carmichael Watson project, based at Edinburgh University, shows georeferenced transcripts against our historic maps.
We know OpenStreetMap has been using our maps though they don’t show up in our usage data. Through them we’ve connected to a developer in Ireland. This is one of those examples where sharing resources and expertise has been useful for our own benefit and that of the OpenStreetMap Ireland coverage.
The NLS is also now using Geo MappinG Service and GeoReferencing, etc. And we now have a mosaic viewer for these maps. Through the API and other work we’ve been able to develop a lot of maps, including 10 inch to a mile series for the UK. And we are working on the 1:25k maps. We hope to add these to our API in due course.
In terms of sustainability the NLS has and continues to support the API. We are looking at usage logging for large/commercial users – some users are huge consumers so perhaps we can licence these types of use. Ads perhaps?
Top Tips? Well firstly don’t underestimate how large and diverse the “geo” community is. Second don’t overestimate the technical competence of the community – it is very variable. And finally don’t underestimate the time required to administer and sustain the application properly – we could have worked much harder to get attention through blogs, tweets, etc. but it requires more serious time than we’ve had.
Q1) One of your biggest users are outside recreation – why using historic mapping?
A1) I think generally they are using both, using historic maps as an option. But there could be something cleverer going on to avoid API limitations. If you are interested in walking or cycling you can get more from the historic maps from 60 years ago than from modern maps.
Rebekkah Abraham, We Are What We Do, HistoryPin
I am the content manager for HistoryPin. HistoryPin, as I’m sure you will be aware, lets people add materials to the map. It was developed by We Are What We Do and we specialise in projects that have real positive social impact. The driver was the growing gap between different generations. Photographs can be magical for understanding and communicating between generations. A photograph is also a piece of recorded history, rich in stories and photographs – they belong to a particular place at a particular time. If you then add time you create really interesting layers and perspectives of the past. And you can add the present as an additional layer – allowing compelling comparisons of the past and the present.
So historypin.com is the hub for a set of tools for sharing historical content in interesting ways and engage people with it. It’s based on Google Maps, you can search by place and explore by time. You can add stories, material, appropriate copyright information etc. and the site is global. We have around 80k pieces of content and are working with various archives such as UK National Archives, National Heritage etc. And we are also starting to archive the present as well.
Photographs can be combined with audio and video – you can pin in events, audio recordings, oral history, etc. We’re also thinking about documents, text, etc. and how this can be added to records. You can also curate, you can create talks through materials and tour others through. And here you can see the mapping and timeline tools can be very nice here. Again you can include audio as well as images and video.
We also have a smartphone app for iPhone, Android and Windows and that lets you go into the streetview to engage with history, you can add images and memories to a place you currently are. And you can fade between present camera view and historic photographs, and you can choose to capture a modern version of that area – great if an area lacks street view but you are also archiving the present as well.
At the end of March we will launch a project called HistoryPin Channels – this will let you customise your profile much more, to create collections and tools, another way to explore the materials. And to see stories on your content. This will also work with the smartphone app and be embeddable on your own website.
And we want to open HistoryPin to the crowd, to add tags, correct location, etc. so that people can enhance HistoryPin. You could have challenges and mysteries – to identify people in an image, find a building etc. Ideas to start conversations. A few big questions for us: how do you deal with objects from multiple places and multiple times; and how do you deal with precision?
Pinning Reading’s History – we partnered with Reading Museum to create a hub and an exhibition to engage the local community. Over 4000 items were pinned, we had champions out engaging people with HistoryPin. The value is really about people coming together in small meaningful ways.
Q1) We’ve been discussing today that a lot of us work with Google APIs but don’t communicate with them. I understand that HistoryPin have a more direct relationship
A1) Google gave us some initial seed funding and technical support, everything else is owned and developed by We Are What We Do.
Q2) Who does uploaded content belong to?
A2) That’s up to the contributors – they select the licence at upload so ownership remains theirs.
Q3) Will HistoryPin Channels be free?
A3) Yes. Everything around HistoryPin will be free to use. We are committed to being not for profit.
Q4) Have you done any evaluation on how this works as a community tool/social impact?
A4) Yes, there will be a full evaluation of the Reading work on the website in the next few weeks but initial information suggests there have been lasting relationships out of the HistoryPin hub work.
Stuart Macdonald, University of Edinburgh, AddressingHistory
This project came out of a community content strand of a UK Digitisation programme funded by JISC. The project was done in partnership with the National Library of Scotland and with advice from the University of Edinburgh Social History Department and Edinburgh City Council’s Capital Collections. This was initially a 6 month project.
The idea was to create an online crowdsourcing tool which will combine data from historical Scottish Post Office Directories (PODs) with contemporaneous maps. These PODs are the precursors to phone directories/Yellow Pages. They offer fine-grained spatial and temporal view on social, economic and demographic circumstances. They provide residential names, occupations and addresses. They have several sub directories – we deal with the General Directory in our project. There are also some great adverts – some fabulous social history resources.
Phase 1 of this work focused on 3 volumes for Edinburgh (1784-5, 1865, 1905-6) and historic Scottish maps geo referenced by the NLS. W
The tool was built with OpenLayers as the web-based mapping client and it allows you to move a map pin on the historical map to correct/add a georeference for entries. Data is held in a PostgreSQL database and the tool uses the Google georeferencer to find the location of points on the map.
The tool had to be usable for users of various types – though we mainly aim at local historians etc. We wanted a mechanism to check user generated content such as georeferences, name or address edits and annotations. And it was deemed that it would be useful to have the original scanned directory page. Amplification of both tool and API via Social Media channels – blog, Twitter, Flickr etc.
So seeing a screenshot here of the tool you can see the results, the historic map overlay options, the editing options, the link to view the original scanned page and three download options – text, KML,
Phase 2 sought to develop functionality and to build sustainability by broadening geographic and temporal coverage. This phase took place from Feb-Sept 2011. We have been adding new content for Aberdeen, Glasgow, Edinburgh all for 1881 and 1891 – those are census years and that’s no coincidence. But much of phase 2 was concerned with improving the parser and improving performance. Our new parser has a far improved success rate. Additional features added in phase 2: spatial searching via a bounding box; associate map pin with search results; search across multiple addresses; and we are aiding searching by applying Standard Industrial Classifications (SIC) to professions.
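To make the bounding box search concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not the AddressingHistory implementation (which sits on a database); the `Entry` fields, the sample people and the coordinates are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One hypothetical directory entry with a georeference."""
    name: str
    profession: str
    lon: float
    lat: float


def search_bbox(entries, bbox, profession=None):
    """Return entries inside bbox = (min_lon, min_lat, max_lon, max_lat).

    Optionally narrow the results to a single profession.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    hits = []
    for e in entries:
        inside = min_lon <= e.lon <= max_lon and min_lat <= e.lat <= max_lat
        if inside and (profession is None or e.profession == profession):
            hits.append(e)
    return hits


# Invented sample data, for illustration only.
SAMPLE = [
    Entry("A. Example", "baker", -3.19, 55.95),
    Entry("B. Example", "grocer", -3.20, 55.96),
    Entry("C. Example", "baker", -4.25, 55.86),
]

if __name__ == "__main__":
    for hit in search_bbox(SAMPLE, (-3.3, 55.9, -3.1, 56.0), profession="baker"):
        print(hit.name)
```

For a large number of entries a spatial index in the database would do this work rather than a loop, but the test applied to each point is the same.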
We have also recently launched Augmented Reality access via the Layar phone app. This allows you to compare your current location with AddressingHistory records – people, professions etc – from the past. This is initially launched for Edinburgh but we hope to also launch for Aberdeen and Glasgow as well as other cities as appropriate. You can view the points on a live camera feed, or view a map. Right now you can’t edit the locations but we’re looking at how that could be done. You can also search/refine for particular trade categories.
Lessons learned. I mentioned earlier that this sort of project is like GalaxyZoo have 60k galaxies, we only have 500k people in Edinburgh. That means we’ve really begun thinking carefully about what content has interest to our potential “crowd” and the importance of covering multiple geographic locations/cities. In this phase we have been separating the parsing from interface and back end storage – this allows changes to be implemented without affecting the live tool. We’ve been externalising the configuration files – editable XML-based files to accommodate repeated OCR and content inconsistencies, run with the POD parser to refine parsed content. The parsing and refining process is almost unending – a realistic balance needed to be struck between what should be done by machine in advance. And we need to continue to consult with others interested in this era, and using the PODs already.
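As a sketch of what externalised, editable XML configuration for OCR fixes might look like, here is a small Python example. The XML layout, the rule contents and the function names are assumptions made up for illustration; the real POD parser configuration may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each rule rewrites one recurring OCR misreading.
CONFIG_XML = """
<corrections>
  <replace from="Edinbnrgh" to="Edinburgh"/>
  <replace from="Higli" to="High"/>
</corrections>
"""


def load_rules(xml_text):
    """Read (from, to) replacement pairs from the XML configuration."""
    root = ET.fromstring(xml_text)
    return [(r.get("from"), r.get("to")) for r in root.iter("replace")]


def apply_rules(line, rules):
    """Apply every replacement rule, in order, to one line of OCR text."""
    for old, new in rules:
        line = line.replace(old, new)
    return line


if __name__ == "__main__":
    rules = load_rules(CONFIG_XML)
    print(apply_rules("Example John, baker, 12 Higli St, Edinbnrgh", rules))
```

Keeping the rules outside the code means a new recurring misreading can be handled by editing the file and re-running the parser, without touching the live tool.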
In terms of sustainability the tool is openly available. There are some business models we’ve been considering: revenue generation via online donations, subscription models, freemium possibilities, academic advertising. We welcome your suggestions.
Phase 2 goes live very soon.
Success of these projects is about getting traction with the community – continued and extended use by that community. Hopefully adding new content will really help us gain that traction.
James: It’s worth saying that before the project we looked at the usage of the physical PODs – they are amongst the most used resources in the city libraries, this stuff is being used for research purposes which was one of our driving motivations.
Q1) Presumably genealogists are using this – what feedback have you had?
A1) I think population and having multiple years – to track people through time. We had really good feedback but usage has been modest so far.
Nicola) Genealogists want a particular area at a particular time and that’s when you capture their interest. It’s quite tricky because that’s the one thing they are interested in and all that material is available potentially but you need their engagement to be worth the labour intensive process of adding new directories, but they want their patch before they engage, so there is a balance to be struck there.
And with that we are onto the next session – we are going to grab a coffee etc. and then join a wee breakout session. I’ll report back from their key issues but won’t be live blogging the full discussions.
1. GAP Analysis
- Use Google geo products if you must but beware
- Think twice about geo referencing
- There are other geocoding tools
- There are text parsing tools
2. Mobile futures
- Do I want to go native or not? There’s a JISC report from EDINA on mobile apps and another set of guidance coming out soon.
Kate Jones, University of Portsmouth, Stepping Into Time
I am a lecturer in Human Geography at Portsmouth but I did my PhD at UCL working on health and GIS. But I’m going to talk today about data on bomb damage in London and how that can be explored and clustered with other data to make a really rich experience.
And I want to talk to you first about users, and the importance of making user friendly mapping experiences as that’s another part of my research.
I’m only two months into this project but it’s already been an interesting winding path. When you start a geography degree you learn “Almost everything that happens, happens somewhere and knowing where something happens is critically important” (Longley et al 2010). So this project is about turning data into something useful, creating information that can be linked to other information and can become knowledge.
For user centred design you start by designing a user story. So we have Megan, a student of history, and Mark, a geography undergraduate, or Matthew, an urban design post-graduate. For each user we can identify the tools they will be familiar with – they will know their own software etc. But they all use Google, Bing, Web 2.0 type technology. Many of them have smartphones. Many have social networking accounts. I had expected this generation to be really IT literate, so I was surprised – they are fine with Facebook but really quite intimidated by desktop GIS. It is important to have appropriate expectations of what knowledge they have and what they want to do. This group also learn best with practical problems to solve, and they love visual materials. And they can find traditional lectures quite boring.
There are challenges faced by the user:
(1) determining available data – how do we make sure we only do one thing once, rather than replicating effort
(2) understanding the technology, concepts and methods required to process and integrate data
(3) implementing the technical solutions – some solutions are very intimidating if you are not a developer. I used an urban design student on a previous usability project – he downloaded the data from Digimap but couldn’t deal with even opening the data in a GIS, eventually did it in Photoshop which he knew how to use, and hand colouring maps etc.
So we want to link different types of data related to London during the Blitz. It’s aimed at students, researchers and citizen researchers – any non commercial use. We want to develop web and mobile tools so that you can explore and discover where bombs fell and the damage caused – and the sorts of documents and images linked to those locations. For the first time this data will be available in spatially referenced form, allowing new interpretations of the data.
We will be creating digital maps of the bomb census – the National Archive is scanning these and we will make these spatially referenced. We will add spatial data for different boundaries – street/administrative boundaries etc. And then exploring linkage to spatially referenced images. Creating a web mapping application for a more enriched and real sense of the era.
So what data to use? Well I’m a geographer not a historian but my colleague on this project at the National Archive pulled out all of the appropriate mapping materials, photographs etc. It’s quite overwhelming. We will address this data through two types of maps:
1) Aggregate maps of Nightly Bomb Drops during Blitz
2) Weekly records – there are over 500 maps for region 5 (central London), so we are going to look at the first week of the Blitz and look at 9 maps of region 5.
So here is a map of the bomb locations – each black mark on the map is a bomb – when there are a lot it can be hard to see exactly where the bomb landed. We will be colour coding the maps to show the day of the week the bomb fell and will show whether it’s a parachute or an oil bomb, drawn from other areas of the archive.
The project has six workpackages and the one that continues across the full project is understanding and engaging users – if you want to be part of this usability work do let me know.
We have been doing wireframes of the interface using a free tool called Pencil. We will use an HTML prototype with users to see what will work best.
So our expected project outcome is that we will have created georeferenced bomb maps – a digital record of national importance. This data will be shared with the National Archives – reducing the use of the original fragile maps and aid their preservation. We are also opening up the maps so that we remove the specialist skills to prepare and process data – only need to do one thing once. We’ll be sharing the maps through ShareGeo. And there will then be some research out of these maps – opportunities to look at patterns and compare data to social information etc.
Learning points to date will hopefully be useful for other
Before the National Archives I had a different project partner who pulled out of the project as they were not happy with the licensing arrangements etc – I’ve blogged suggestions on how to avoid that in the future: http://blitzbomcensusmaps.wordpress.com/2012/02/09/.
Scanning and Digitising Delays – because lots of JISC projects were requesting jobs from the same archive! But I negotiated 2 scans to use as sample data for all other work, final data can then be slotted in when scanned in June. Something to bear in mind in digitisation projects, especially where there is more than one project in the same stream with the same archive/partner.
Summary: Linking historic data using the power of location. If you are interested in being part of our user group – please contact me via the blog or as @spatialK8
Natalie Pollecutt and Deborah Leem, Wellcome Library, Putting Medical Officer of Health reports on the map: MOH Reports for London 1848-1972
Nathalie: This is a new project. I heard about a tool called mapalist but I ended up using Google Fusion Tables – it was free, easy to use, and had lots of support information. I started off by doing a few experiments with Google Fusion Tables. So this first map is showing registered users, then I tried it out with photography requests to the library – tracking orders and invoice payments. So I showed this off around the office and someone suggested our Medical Officer of Health Reports as something that we should try mapping.
These Reports are discrete – 3000 in total – but they are a great historical record. Clicking on a point brings back the place, the subjects, and a link to view the catalogue record – you can order the original from there.
Deborah: The reports are the key source on public health from mid 19th to mid 20th century. They were produced by Medical Officers of Health in each local authority who produced annual reports. Covering outbreaks of disease, sanitation, etc. Lots of ice cream issues at one point in the 19th century – much concern of health of friends due to poor quality ice cream. They vary in length but the longest are around 350 pages.
Nathalie: On our shelves these are very much inaccessible bundles of papers. I wanted to talk more about the tools I was considering. I tried out mapalist.com (addresses); maptal.es (search for a location); mapbox.com (not free); mashupforge.com; targetmap.com; Unlock (EDINA); Recollector (NYPL); Google Maps API; Google Fusion Table API. In the future I will be trying Google Maps API, Google Fusion Tables and also batch geocoding which you can do from the tables.
Deborah: This is our catalogue records. Our steering committees want to search materials geographically so we are trying to enhance our catalogue records for each report that we are digitising – about 7000 for the London collection in scope. We needed to add various fields to allow search by geographic area and coverage date. And what we are trying to think about is the change in administrative boundaries in London. Significant changes in the 19th century and also changes in 1965 to boroughs. Current areas will be applied but we are still working on the best way to handle historic changes so we hope to learn from today on that.
Nathalie: One of the things we’ve begun to realise, especially today, is that catalogue record isn’t the best place for geographic information. Adding fields for geographic information, and to draw this out of other fields, like the 245 title field, is helpful but we need to find a way to do this better, how do we associate multiple place names?
This was very much an experiment for us. But we need to rethink how to geocode the data from library catalogue records – Google will give you just one marker for London even if there are ten records and that’s not what we’d want as cataloguers. We have learnt about mapping our data – and about how to think about catalogue records as something that can be mapped in some way. Upgrade of catalogue records for Medical Officers of Health Reports – very useful for us to do anyway.
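One way to think about the "one marker for London" problem is to group catalogue records by place before mapping, so that a single marker carries a count and the list of records behind it. The sketch below is only an illustration: the record fields, the sample titles and the small coordinate lookup (standing in for a real geocoder) are all assumptions.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a geocoder response.
COORDS = {"london": (51.5074, -0.1278)}


def markers_for(records, coords):
    """Build one marker per place, listing every record title at that place."""
    grouped = defaultdict(list)
    for rec in records:
        key = rec["place"].strip().casefold()  # normalise spacing and case
        grouped[key].append(rec["title"])

    markers = []
    for place, titles in grouped.items():
        if place not in coords:
            continue  # leave ungeocoded places for manual checking
        lat, lon = coords[place]
        markers.append(
            {"place": place, "lat": lat, "lon": lon, "count": len(titles), "titles": titles}
        )
    return markers


# Invented sample records, for illustration only.
RECORDS = [
    {"title": "Report A", "place": "London"},
    {"title": "Report B", "place": " london"},
    {"title": "Report C", "place": "Unknown parish"},
]

if __name__ == "__main__":
    for marker in markers_for(RECORDS, COORDS):
        print(marker["place"], marker["count"], marker["titles"])
```

This does not solve the harder question of changing administrative boundaries, but it keeps every record visible behind its marker rather than letting one record hide the others.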
Top tips from us:
- Test a lot, and in small batches, before doing a full output/mapping – makes it easier to make changes. 3000 is too many to test things really, need to trial on smaller batch.
- Know where you’ll put your map – it was an experiment. I blogged about it but it’s not on the website, it’s a bit hidden. You need to know what to do with it
- Really get to know your data source before you do anything else! Unless you do that it’s hard to know what to expect.
Deborah: Our future plan is to digitise and make freely available the 7000 MOH reports via the Wellcome Digital Library by early 2013. And we hope to enhance the MOH catalogue records as well.
Nathalie: Initial feedback has been really positive, even though this was a quick, dirty, experiment.
James: There are people here you can tap – looking at Humphrey re: nineteenth century – and we have some tools that might be useful. We can chat offline. This is what we wanted out of today – exchange and new connections.
Stuart Dunn, KCL, Digital Exposure of English Place-Names (DEEP)
I’m going to talk a bit about the DEEP project, funded under the recent JISC Mass Digitisation call. It’s follow on work from a project with our colleagues on this project at EDINA. This is a highly collaborative project between Kings College London, the University of Edinburgh Language Technology Group, EDINA, and the National Place Names Group at Nottingham.
DEEP is about placenames, specifically historic placenames and changes over time. Placenames are dynamic. And the way places are attested also changes to reflect those changes. The etymological and social meaning of placenames really change over time. Placenames are contested, there is real disagreement over what places should be called. They are documented in different ways. There are archival records of all sorts, from Domesday onwards (and before). And they have been researched already. The English Place Names Society has already done this for us – they produced the English Place Name Survey, there are 86 (paper) volumes in total and these are organised by county. There is currently no hard and fast editorial guidelines in how this was produced so the data is very diverse.
There are around 80 years of scholarship, covering 32 English counties, 86 volumes, 6157 elements, 30517 pages, and about 4 million individual place-name forms but no one yet knows how many bibliographic references.
Contested interpretations and etymologies – and some obscene names, like “Grope lane”, help show how contested these are. So we are very much building a gazetteer that will connect and relate appropriate placenames.
The work on DEEP is follow up to the CHALICE project which was led by Jo Walsh and was a project between EDINA and the Language Technology Group at Edinburgh. This extracted important places from OCR text and marked them up in xml. We are adopting a similar approach in DEEP. The University of Belfast is to digitise the Place Names Survey, then the OCR text will be parsed, and eventually this data will go into the JISC UNLOCK service.
We have been trying to start this work by refining the xml processing of the OCR. Belfast’s tagging system feeds the parser that helps identify historic variants, etc. The data model does change from volume to volume which is very challenging for processing. In most cases we have Parish level grid references but the survey goes to township, settlement, minor name and field name levels. And we have challenges of varying countries, and we have administrative terminology variance. So we are putting data into a Metadata Authority Description Schema (MADS) so that we don’t impose a model but retain all the relevant information.
Our main output for JISC will be point data for Unlock. Conceptually it will be a little bit like GeoNames – we are creating Linked Data so it would be great to have a definitive URI for one place no matter what the variants in name.
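To show what "one definitive URI no matter what the variants" could mean in practice, here is a minimal Python sketch of a variant index. The URIs use a placeholder example.org domain and the place-name forms are invented; none of this reflects the actual DEEP or Unlock data model.

```python
# Hypothetical gazetteer: one URI per place, each with its attested name forms.
PLACES = {
    "http://example.org/place/1": ["Exampleton", "Exampletun", "Exempleton"],
    "http://example.org/place/2": ["Sampleby", "Sampelbi"],
}


def build_variant_index(places):
    """Map every name form (case-insensitively) to the URIs it may refer to."""
    index = {}
    for uri, variants in places.items():
        for form in variants:
            # A set is used because one historic form can belong to several places.
            index.setdefault(form.casefold(), set()).add(uri)
    return index


def resolve(name, index):
    """Return the sorted list of candidate URIs for a name form."""
    return sorted(index.get(name.casefold(), set()))


if __name__ == "__main__":
    idx = build_variant_index(PLACES)
    print(resolve("exampletun", idx))
```

Returning a list rather than a single URI is a deliberate choice in this sketch: where a form is shared or contested, the ambiguity stays visible instead of being silently resolved.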
Not only is Google problematic but so are the geographic primitives of points, lines and polygons. Pre-OS there is very little data on geographic associations of place-names; points are arbitrary and dependent on scale; administrative geographies change over time; even natural features can mislead – rivers move over time for instance.
We are talking to people like Vision of Britain both to see if we can feed into that site and if we can use that data to check ours. One of the projects I am very interested in is the Pleiades project which has digitised the authoritative map for ancient Roman and Greek history. This is available openly as Linked Data. That’s what I’d like to see happening with our project, which would include varying names, connections, bibliographic references, and a section of that data model from MADS classification.
Another important aspect here is crowdsourcing. So we, and the Nottingham partners in particular, will be working with the enthusiastic place names community to look at correcting errors and omissions in the digitisation and the NLP; to validate our output with local knowledge; to add geographic data where it is lacking – such as field names; and to identify crossovers with other data sources, etc. We will be discussing this at our steering group meeting tomorrow.
And finally a plug for a new AHRC project! This is a scoping study under the Connected Communities on crowdsourcing work in this area,
Comment: I would be interested to see how you get on with your crowdsourcing – we work on Shetland Place Names with the community and it would be really interesting to know how you cope with the data and what you use.
James: Are you aware of SWOP in Glasgow? You might be interested in the tools they use, which might be applicable or useful.
Q1) I would be interested in seeing how we can crowdsource place names from historic maps as well – linking to Georeferencer project maps or Old Maps Online – that could be used to encourage the community to look at the records and some sort of crowdsourcing tool around that.
A1) As you know the British Library’s GeoReferencer saw all 700+ maps georeferenced in four days, there is clearly lots of interest there.
Humphrey: We are proposing a longer term project to the EU more in this area. We haven’t been funded for an API, we’ve done much of what has been discussed today but they are not accessible because of what/how we’ve been funded in the past.
And with that we are done for the day! Thank you to all of our wonderful speakers and very engaged attendees.