Following on from our teleconference, we here at Edina thought it was a good opportunity to write a blog post on our findings so far and give a brief outline of the information we have been able to extract from the data provided to us by LSE.
The first task was to locate the map images used by the old web application, and we came across the Booth map and a more modern Bartholomew map of London. Both maps were in the MrSID format (interestingly, developed in the USA for storing fingerprints for the FBI). More importantly for us, the format is proprietary, so we needed to convert the images to a different one. In the GIS world a commonly used set of tools for translating and transforming geospatial data is offered by GDAL, the ‘Geospatial Data Abstraction Library’ (http://tinyurl.com/yw34w4). It provides a variety of command-line utilities, and using ‘gdal_translate’ we converted the MrSID maps to TIFF files, a more commonly used format in GIS.
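As a rough sketch of the conversion step, the snippet below builds the kind of gdal_translate call described above and, optionally, runs it. The file names are placeholders rather than the real LSE file names, the helper functions are our own illustration, and reading MrSID assumes a GDAL build that includes the MrSID driver.

```python
import shutil
import subprocess


def build_translate_command(src, dst, output_format="GTiff"):
    """Return the gdal_translate argument list for converting src to dst."""
    # -of selects the output driver; GTiff is GDAL's GeoTIFF driver.
    return ["gdal_translate", "-of", output_format, src, dst]


def convert(src, dst):
    """Run the conversion if gdal_translate is on the PATH; return True if it ran."""
    cmd = build_translate_command(src, dst)
    if shutil.which(cmd[0]) is None:
        # GDAL is not installed here, so just report what would have been run.
        print("gdal_translate not found; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


# Placeholder file names, for illustration only.
print(build_translate_command("booth_map.sid", "booth_map.tif"))
```

Keeping the command construction separate from its execution makes it easy to check what will be run before touching any large image files.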
Now that we had a usable TIFF image of the Booth map, the first major issue we came across was that the map did not appear to be georeferenced: there were no accompanying files, or indeed any embedded information, that would establish its location in terms of map projections or coordinate systems. This is a major problem, because without this information overlaying the image on modern maps would be impossible. (For more information on map projections take a look here: http://tinyurl.com/25gltb). From a little digging around in the old application it seems that the georeferencing had been applied by Perl scripts, and it would be difficult for us to extract the required information, so we would have to georeference the map manually ourselves. The process of georeferencing an image involves selecting a set of control points and relating them to a map with a known coordinate system and projection. Thankfully, most modern GIS software handles this process easily and takes away the pain of some of the complicated mathematics involved. (A good overview of the georeferencing process in ESRI ArcMap can be found here: http://tinyurl.com/2whflza). Two things are worth noting about how we went about this:
- The projection that you select when georeferencing has a strong bearing on the quality of the result. If you choose a map projection similar to that used by the initial map (the Booth map), you have a much better chance of matching the control map. As it appears that the Booth maps were derived from a base of the old ‘county series’ of Ordnance Survey (OS) maps, it is highly likely that they were created using the ‘Cassini Projection’, as this was the projection most commonly used at the time. (The Ordnance Survey give a brief history here: http://tinyurl.com/84pjb4d). This led us to select the OSGB 1936 projection, otherwise known as the ‘British National Grid’. In many ways this was the successor to the Cassini projection and it is still used today. Choosing this projection will also make things easier later on when we try to integrate any new service with items such as gazetteers and additional mapping.
- At Edina we have access to all modern Ordnance Survey base maps and these were used as our control maps. We didn’t select the Bartholomew map simply because it is slightly dated and would probably not be used in any application we develop.
Ultimately the software stretches the Booth map to fit it against the backdrop of the more modern mapping. The results so far have been encouraging but will still need some tweaking. It is also important to note that the Booth maps were produced over 100 years ago using old surveying techniques, without the benefit of modern technology such as GPS, so in some cases it may be impossible to get a ‘perfect match’. The hope is to get the best fit we possibly can.
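To give a feel for the mathematics the GIS software hides, here is a minimal sketch of the simplest case: fitting an affine (first-order) transformation from control points by least squares. This is an illustration only, not the method our software actually applies, which may use higher-order or rubber-sheet transformations. The function names and the control point coordinates are invented for the example.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a.x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(control_points):
    """Least-squares affine fit from ((col, row), (easting, northing)) pairs.

    Returns (a, b, c, d, e, f) such that
        easting  = a*col + b*row + c
        northing = d*col + e*row + f
    """
    if len(control_points) < 3:
        raise ValueError("need at least three control points")
    # Normal equations (A^T A) p = A^T y, where each row of A is [col, row, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    at_east = [0.0] * 3
    at_north = [0.0] * 3
    for (px, py), (east, north) in control_points:
        row = (float(px), float(py), 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            at_east[i] += row[i] * east
            at_north[i] += row[i] * north
    a, b, c = solve3(ata, at_east)
    d, e, f = solve3(ata, at_north)
    return a, b, c, d, e, f


def apply_affine(params, pixel):
    """Map a (col, row) pixel position to (easting, northing)."""
    a, b, c, d, e, f = params
    col, row = pixel
    return a * col + b * row + c, d * col + e * row + f


def rms_error(params, control_points):
    """Root-mean-square distance between fitted and surveyed positions."""
    total = 0.0
    for pixel, (east, north) in control_points:
        fit_e, fit_n = apply_affine(params, pixel)
        total += (fit_e - east) ** 2 + (fit_n - north) ** 2
    return math.sqrt(total / len(control_points))


# Invented control points: ((pixel column, pixel row), (easting, northing)).
control = [
    ((0, 0), (530000.0, 182000.0)),
    ((1000, 0), (532000.0, 182000.0)),
    ((0, 1000), (530000.0, 180000.0)),
    ((1000, 1000), (532001.0, 179999.0)),  # deliberately a little off
]
params = fit_affine(control)
print("RMS residual in metres:", round(rms_error(params, control), 3))
```

With more than three control points the fit cannot pass through every point exactly, and the RMS residual gives a single number for how far the stretched map still sits from the control map.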
An additional finding while digging through the data was a Postgres database schema containing geographically referenced point information, with one table alone containing 47,000 points! The tables include a variety of information such as postcodes, streets, wards, landmarks, parishes and walks. The first thing we noticed is that the data is stored in plain tables and is not spatially enabled. PostGIS is an open source extension that adds support for geographic objects to the Postgres database, and the data will need to be processed to fit its data model. Once the data fits the PostGIS model, GIS web applications can interact with it more easily, and we gain additional facilities such as coordinate transformation support, advanced indexing and the ability to store polygon and line data (as well as points). It also offers the killer ability to perform queries against the data via spatial SQL. Utilising spatial SQL will allow us to answer questions such as ‘show me all the notebook entries within one mile of my location’ or ‘show me all the notebook entries in a certain postcode’. There are many more advantages to spatially enabling your database, and more information can be found here: http://tinyurl.com/27aqmor.
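As a hedged sketch of what this could look like, the snippet below holds example PostGIS statements as Python strings, alongside a plain-Python function that mirrors what the ‘within one mile’ filter computes. The table and column names (notebook_points, easting, northing, page_ref) are hypothetical, and we are assuming the stored coordinates are already British National Grid eastings and northings in metres (SRID 27700), which we have not yet confirmed for the LSE data.

```python
import math

ONE_MILE_M = 1609.344  # one statute mile in metres

# Hypothetical table and column names; the real schema will differ.
# The geometry(Point, 27700) column syntax needs PostGIS 2.0 or later.
ADD_GEOMETRY_SQL = """
ALTER TABLE notebook_points ADD COLUMN geom geometry(Point, 27700);
UPDATE notebook_points
   SET geom = ST_SetSRID(ST_MakePoint(easting, northing), 27700);
CREATE INDEX notebook_points_geom_idx ON notebook_points USING GIST (geom);
"""

# Named placeholders in the DB-API 'pyformat' style, to be filled in by the
# database driver rather than by string formatting.
WITHIN_ONE_MILE_SQL = """
SELECT id, page_ref
  FROM notebook_points
 WHERE ST_DWithin(
           geom,
           ST_SetSRID(ST_MakePoint(%(easting)s, %(northing)s), 27700),
           %(radius_m)s);
"""


def within_radius(points, centre, radius_m=ONE_MILE_M):
    """Plain-Python equivalent of the distance filter for planar coordinates."""
    cx, cy = centre
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius_m]


# Invented points, for illustration only.
sample = [(530000.0, 180000.0), (531000.0, 180000.0), (533000.0, 180000.0)]
print(within_radius(sample, (530000.0, 180000.0)))
```

Because the geometries are in a metre-based grid, the distance passed to ST_DWithin is in metres, and the spatial index lets the database avoid checking every one of the 47,000 points.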
From an initial look at the data held in the tables, its quality and accuracy could be improved, and we will be investigating what our options are here. Importantly though, there is a clear link between geographical locations and notebook pages.
Other discoveries included the many scanned images of the Booth notebooks and, additionally, Booth’s family magazine, ‘The Colony’. For many of these there appear to be multiple copies in different formats: a ‘gif’ thumbnail preview, and a larger version in ‘jpeg’ and ‘DjVu’ formats.
So now that we have:
- A georeferenced Booth map (may need some tweaking)
- A Postgres database schema with geographically referenced point information (needs translating into PostGIS)
- Scanned images of the Booth notebooks (need linking to the database tables)
we have the building blocks to start developing a GIS web application for initial demonstration purposes, one that offers the tools familiar to users of products such as Google Maps. This will allow us to examine the relative success of our georeferencing process and also give us the ability to test out some of the core and optional functionality defined in a previous blog post.