The County Surveys Search Engine

One of our key aims in building the interface for our collection was to allow people to explore and “play with” the data. It’s hard to get a sense of the extent of the series and the relationships between the surveys without some kind of overview: once you can see the surveys all together and look at them in different ways, it’s much easier to grasp their logic. So we wanted a tool that would aggregate all of the information we have gathered and then allow people to look at that information in flexible ways, to filter and explore it according to their interests.

Flexibility was also a priority in technical terms: we’re making this data available for the first time in this format, so we are aware that we don’t really know what people will want to do with it. We don’t see what we have done with the demonstrator as being the last word but rather the first. Based on this, we can start to understand the data better and start to understand how people might want to access it. We expect to have to adapt the data and the ways of accessing it as we go along and learn what we can most usefully provide to the community.

The Data

The process of gathering data has been described in another post, but from the demonstrator’s point of view what was important was to try to keep things as general and adaptable as possible. Nevertheless, this kind of historical data presents certain peculiarities and challenges. One of the most obvious is how to present the survey data. The surveys are arranged by county but the counties that were used are not the counties as they are today. Indeed, the counties used in the first and second phases of the county surveys are not the same. So we needed a mechanism which would allow people to make sense of the data without being restrictive. We’ve achieved this by providing a canonical list of counties taken from Ordnance Survey data from the early 19th century. We then map this to the actual counties as surveyed. There’s not a perfect match here but we take a “permissive” view of the data: we’d rather show you slightly too much than too little. So the user is presented with the canonical list in the search facility and we then map that to the county data to decide what to show. The same holds for the author data. We hold a canonical list of authors and map these to the real authors. This allows us to adjust the data in future as we discover more about it.

The Data Model

This mapping then gives rise to the data model. We have surveys which have a county associated with them. Then we have a list of counties which we present to the user, each of which may map to more than one of the underlying counties. That can get a bit confusing but if we look at an example, it becomes clear. If we want to look at the surveys for Shetland then in the filter list we have “Zetland or Shetland”, which is how it is listed in the Ordnance Survey data. In the first phase of the surveys, Shetland was included under “Northern Counties and Islands” but in the second phase it has a survey of its own. The implication of this for the data model is that we have to have a one-to-many mapping from entries in the search list to the entries in the surveys. In fact, the same county survey might appear under more than one search term, e.g. the first phase “Northern Counties and Islands” needs to appear under Shetland, Orkney, Caithness and Sutherland. So we have to have a many-to-many mapping between the search counties in the interface and the counties as specified in the surveys themselves. To do this we adopt the standard database approach of having a mapping table, i.e. ccounty_county.

So ccounty is the list of counties as it appears in the search list, county is the list as it appears in the surveys, and the mapping table allows us to relate these two to each other in any way we want. Each survey can have many publications and each publication can be held in multiple places. This explains why we have separated out surveys from publications from holdings in the data model.
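
As a rough illustration of the mapping-table idea, here is a minimal sketch in Python with an in-memory SQLite database. It is not the demonstrator’s code (which is Perl on Postgres), and only the names ccounty, county and ccounty_county come from the description above; the survey table, all column names, the helper functions and the sample rows are assumptions made for the example. Publications and holdings are left out to keep it short.

```python
import sqlite3

# Illustrative schema only: apart from the table names ccounty, county and
# ccounty_county, every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE ccounty (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE county  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE ccounty_county (
    ccounty_id INTEGER NOT NULL REFERENCES ccounty(id),
    county_id  INTEGER NOT NULL REFERENCES county(id),
    PRIMARY KEY (ccounty_id, county_id)
);
CREATE TABLE survey (
    id        INTEGER PRIMARY KEY,
    county_id INTEGER NOT NULL REFERENCES county(id),
    phase     INTEGER NOT NULL
);
"""

# Sample data: search-list label -> counties as named in the surveys.
# The Shetland case follows the text; the other labels are assumed.
MAPPING = {
    "Zetland or Shetland": ["Northern Counties and Islands", "Shetland"],
    "Orkney": ["Northern Counties and Islands"],
    "Caithness": ["Northern Counties and Islands"],
    "Sutherland": ["Northern Counties and Islands"],
}
SURVEYS = [("Northern Counties and Islands", 1), ("Shetland", 2)]


def build_demo_db():
    """Create the tables in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for county, phase in SURVEYS:
        conn.execute("INSERT INTO county (name) VALUES (?)", (county,))
        conn.execute(
            "INSERT INTO survey (county_id, phase) "
            "SELECT id, ? FROM county WHERE name = ?",
            (phase, county),
        )
    for label, counties in MAPPING.items():
        conn.execute("INSERT INTO ccounty (name) VALUES (?)", (label,))
        for county in counties:
            conn.execute(
                "INSERT INTO ccounty_county (ccounty_id, county_id) "
                "SELECT cc.id, c.id FROM ccounty cc, county c "
                "WHERE cc.name = ? AND c.name = ?",
                (label, county),
            )
    return conn


def surveys_for(conn, search_label):
    """Return (surveyed county, phase) pairs for one search-list entry."""
    rows = conn.execute(
        "SELECT c.name, s.phase FROM ccounty cc "
        "JOIN ccounty_county m ON m.ccounty_id = cc.id "
        "JOIN county c ON c.id = m.county_id "
        "JOIN survey s ON s.county_id = c.id "
        "WHERE cc.name = ? ORDER BY s.phase, c.name",
        (search_label,),
    )
    return rows.fetchall()
```

With this sample data, looking up “Zetland or Shetland” returns both the first-phase “Northern Counties and Islands” survey and the second-phase Shetland survey, while “Orkney” returns only the former. Changing how a search term behaves means editing rows in the mapping table rather than touching the survey records.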

Figure: Database schema

This model might seem a little complex but it gives us a great deal of flexibility in how we handle counties and authors and makes it fairly easy to add new information about publications and holdings as it becomes available to us.

The Technology Chosen

In line with the ethos of flexibility, we decided to work with standard technology components. At the back end is a relational database. Sitting on top of that is a web application built using a standard MVC framework. This approach has advantages in terms of flexibility but also in terms of getting up and running quickly. The MVC approach (Model-View-Controller) separates out the storage of the data (the Model) from the logic of the application (the Controller) and how the data is displayed (the View). This means that changing one part has less impact because it is isolated from the other components. A good example of this flexibility is the change we made to the interface, which was covered in a previous post.

The MVC approach is one of the standard development techniques for web applications these days, and when it comes to implementing it you have a wide choice of languages and MVC systems. In our case, it’s all written in Perl, using Postgres for the database with a Catalyst application on top. The application takes the standard Catalyst approach of using DBIx::Class to implement the Model and interface to the database, and Template Toolkit for the front end. The choice of specific MVC implementation doesn’t matter so much: there are plenty to choose from! It’s really the flexibility this approach gives which is the main thing. Using standard technologies lets us make the data available quickly and adapt easily to whatever changes come out of that down the line.

Evolution by Use

So this demonstrator gives people access to the data. We’re hoping people will find it helpful in “playing with” the data. But it’s very much the first draft. We expect it to evolve over time as we and anyone else interested in the surveys get to know the data better and start to understand more about how to make this data available to people.

Working with the Royal Botanic Garden Edinburgh

Over the last few weeks we have been working in partnership with the Royal Botanic Garden Edinburgh, who hold an excellent collection of County Surveys as part of their impressive collections. The RBGE is currently in the process of having their rare books comprehensively catalogued by the Rare Book Cataloguer from the Centre for Research Collections (CRC) at the University of Edinburgh, and we are pleased to be able to contribute to this process by assisting in the cataloguing of the County Survey holdings. Once they are complete, we hope that these new electronic records will form the basis of another data set for our online demonstrator.

The RBGE also has state-of-the-art equipment and digitisation specialists in-house: although they are currently involved in an extensive project to digitise specimens from the internationally renowned herbarium, staff have generously shared their knowledge and allowed us to use their equipment to digitise a few of the surveys. We are pleased to report this work is going very well and we should be able to make the digitised copies available soon, so watch this space.

County Surveys Search Tool Goes Live

We are delighted to announce that our bibliographic search tool is now live and accessible from the ‘Search’ tab in the menu above.

Our demonstrator includes bibliographic data from some of the best collections of the surveys and, where possible, provides links to library catalogue entries and digital editions. Researchers can search by modern county name, by series, by county and by author. Results are presented in a new tab after each search, so that you can compare multiple search results by toggling between pages. There are also detailed analyses of collections, revealing the extent of holdings and coverage, and indicating which surveys would be needed to complete each collection.

We hope that the demonstrator will be a useful finding aid and discovery tool for those interested in the County Surveys, the history of statistical reporting and British history more broadly. We would welcome any feedback on the tool, and would be very keen to hear about how it is used or whether it could usefully offer other features and information. If you have ideas, please get in touch with us at edina@ed.ac.uk.

Titles

In a previous post, I mentioned that we are currently reviewing what information users will be able to see in the results page produced by searching our bibliographic database. The current fields displayed are country, county, author, phase and publication date. The most obvious omission here is, of course, the title. However, for a number of reasons, it’s been impractical to include the titles during development and may not be practical in the online tool. This post explains why, and outlines some of the challenges presented by the survey titles.

A generic title page for the County Surveys

Firstly, there is the issue of length. Some of the survey titles extend to half a page, and if they were shown in full, they would significantly limit the number of results that could be shown on one screen. In addition, there is the issue of repetition, as most follow a generic format. Here, for instance, is one typical title:

General View of the Agriculture of the Hebrides, or Western Isles of Scotland: with observations on the means of their improvement, together with a separate account of the principal islands; comprehending their resources, fisheries, manufactures, manners, and agriculture. Drawn up under the direction of the Board of Agriculture. With several maps.

Like all the other titles, it begins with the generic form ‘General View of the Agriculture of… with observations on the means of their improvement….’ Listing many titles in such a form is potentially confusing visually and means that a reader has to work harder than usual to scan and identify the different content. It also means that presenting a shortened title is difficult without reducing the title to the county name. In which case, why not simply list the county name? This is what we have done throughout the development process. But, as this example also shows, there is quite a lot of useful additional information which varies from title to title and which may attract slightly different groups of readers. Here, for instance, the promise of an account of the manners of the islands makes the socio-historical interest of this volume explicit. The question is how we can format the title in such a way as to reveal that information without creating redundancy and repetition.
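
One way this could be approached, offered purely as a sketch and not as what the demonstrator does, is to strip the generic phrases and keep whatever is left. The Python below assumes the generic form is exactly the wording quoted above; the function name split_title and the three patterns are hypothetical. Titles that do not start with the generic opening are returned untouched rather than guessed at.

```python
import re

# Hypothetical patterns built from the generic wording quoted in the post.
PREFIX = re.compile(r"^\s*General View of the Agriculture of\s+", re.IGNORECASE)
OBSERVATIONS = re.compile(
    r"[:;,]?\s*with observations on the means of (?:its|their) improvement[:;,.]?\s*",
    re.IGNORECASE,
)
BOARD = re.compile(
    r"\s*Drawn up under the direction of the Board of Agriculture\.?",
    re.IGNORECASE,
)


def split_title(title):
    """Split a survey title into (place, distinctive remainder).

    Returns (None, title) when the title does not begin with the generic
    opening, so unusual titles are shown in full instead of being mangled.
    """
    prefix = PREFIX.match(title)
    if prefix is None:
        return None, title.strip()
    rest = title[prefix.end():]
    observations = OBSERVATIONS.search(rest)
    if observations is None:
        # Generic opening but nothing after the place name that we recognise.
        return rest.strip(" .:;,"), ""
    place = rest[:observations.start()].strip(" .:;,")
    remainder = BOARD.sub("", rest[observations.end():]).strip()
    return place, remainder


HEBRIDES = (
    "General View of the Agriculture of the Hebrides, or Western Isles of "
    "Scotland: with observations on the means of their improvement, together "
    "with a separate account of the principal islands; comprehending their "
    "resources, fisheries, manufactures, manners, and agriculture. Drawn up "
    "under the direction of the Board of Agriculture. With several maps."
)

if __name__ == "__main__":
    place, remainder = split_title(HEBRIDES)
    print(place)
    print(remainder)
```

For the Hebrides title this separates the place (“the Hebrides, or Western Isles of Scotland”) from the part that mentions the principal islands, fisheries and manners, which could then be shown alongside the county name. A rule-based split like this is brittle, so any title that does not fit the expected wording would need to be displayed in full.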

Unexpected title variations have also created challenges in gathering bibliographic data. The Irish surveys (which are mentioned but not detailed in our master bibliographies) have a significantly different title format. Rather than ‘general views’ they are titled ‘Statistical Survey of the County of… with observations on the means of their improvement’. We discovered this late in the process, which meant that we had to go back to the sources we had harvested information from and repeat the process. To complicate matters, even these variations are not consistent: anomalies such as the ‘General view of the agriculture and mineralogy, present state and circumstances of the County Wicklow’ exist, making it very difficult to be sure we have identified all the relevant publications and holdings.

We will be experimenting with the format of the results page over the next few weeks, and hope to find a way to present the titles to include some of these interesting variations.