10 things we learned at the Trading Consequences project meeting…

On Thursday 17th and Friday 18th May we held a Trading Consequences project meeting in Edinburgh where the whole team finally got to meet each other after months of virtual meetings. Here are the 10 awesome things we found out…

  1. Visualisation isn’t about pretty pictures; it’s about insight. Take, for example, the London Underground map and a New York Subway map… you will see some seriously different stylings (you can see both in Aaron’s presentation here). The London Underground map is all about key points on the routes: the map isn’t a literal representation of distance but a conceptual take on London’s origins as a network of villages. In New York, where residents are used to walking above ground and are particularly used to the grid system for roads, the map reflects this in order to make it easier to conceptualise the combination of Subway and walking routes. And that’s the key thing… visualisations are about representing different world views, different conceptions of information, specific mental maps of the data. A good visualisation reflects a particular world view rather than trying to loyally mirror reality.
  2. Image of a banana

    Moved banana by Flickr user ungard | dave ungar

    Yes, we have no bananas! Well, actually, we might have some bananas today but in London in 1905 did you know that you were allowed to steal bananas if they were brown or blackened? There is an oral history description of being allowed to steal these bananas as they couldn’t be sold. So, can we find evidence to back this up? If we are going to, then we need to leave as much information as possible in the ontology we are building to ensure we can find and access that sort of detail. Of course we know what we want to look for here – banana-bread ready fruit is a bit of a known unknown – but what about the things we don’t know about yet? The unknown unknowns we may want to find in the future? Not being able to find something in the data we have gathered doesn’t necessarily mean it’s not there, it just means we can’t confirm that it’s there.

  3. The 19th Century take on “animal, vegetable, or mineral?” was “from the sea”, “from the farm”, or “from the forest”? This is all about ontologies again… So what is an ontology? Well it’s a way to understand the world, a conceptual model that allows you to structure, sort, classify, connect and understand each item within its immediate and wider context. In an era of trading raw materials and early manufactured items “from the sea” made sense, “from the farm” added useful context… similarly we might be used to understanding trees by their genus but historically qualities such as whether it can be sawn or hewn were important classifications. We’ve been thinking about this since the meeting and you can read about some of the issues around ontologies on Ewan’s blog.
  4. Image of artificial eyes

    Eyes (NOT FOR SALE) by Flickr User fumikaharukaze | Fumika Harukaze

    The eyes have it… and that can be a real problem as us humans are quite a lot better built for reading visual information than machines. When we are looking at sources for Trading Consequences we are seeing digitised materials that have been scanned then OCRed (put through Optical Character Recognition). Printing presses used to be pretty quirky – the letter “a” might look squiffy in every print, or a mark might appear on every page, ink may have smudged, etc. Scanning and OCR technology might look much more high tech but they too have quirks – digital cameras and scanners get better all the time and OCR engines improve each year… that means materials we are working with that were digitised years back look noticeably different from those that have been recently scanned and OCRed. That can be pretty challenging… and then we get to the many tables of traded goods. The human may see a very attractive pattern of columns and rows but the computer just doesn’t see it that easily, and we have to try to guide it to read the data in so that it makes sense to the machine and to us humans, and so that it reflects what was in the original document.

  5. Image of turkey red cotton

    "Turkey red floral patterns." by the National Museum of Scotland's Feastbowl Blog (click through to read a full post on Turkey Red)

    Wild turkey and rubber demands… Turkey Red is a type of dyed cotton – named after the place not the bird – which was exported in huge amounts, much of it from Aberdeen. But Turkey Red was a complicated and expensive dye to make and the process was incompatible with the new textile printing processes that were emerging. There was a shift from natural dyes to synthetic materials and demand for Turkey Red plummeted. The project team has been in discussion with Edinburgh University’s Stana Nenadic and her Colouring the Nation project, which specifically looks at the history of Turkey Red. However, this is just one great example of changes in society being echoed by the consequences of trade and we hope this project will help us explore more of these. Big changes generally take place at key pivotal dates due to shifts in economic, political and environmental factors, and historians will look for these peaks and sharp changes. Changes such as a huge increase in demand for rubber because of the bicycle craze!

  6. Lost in translation? With academic historians, informatics researchers, visualisation experts, specialists in geospatially enabled databases and a social media specialist gathered together in one small room with a lot of coffee, we knew we’d have to do a lot of talking to explain our very different positions. For a start our informatics researchers are used to beginning with a hypothesis whilst our historical researchers are much more likely to take a grounded research approach. This is a really different way to plan and conduct work and we need to understand where we’re all coming from. The tools this project creates need to enable historians in their processes and we must be careful to build something that meets specific needs and appropriate expectations. At the same time, as a project team, we also need to be working together to ensure our publication schedules make sense, so we needed to spend some time getting up to speed on which conferences matter in each discipline, where we can work collaboratively on papers and publications, and what types of research outputs are most important for the project partners.
  7. Image of tape storage.

    The History of Tape Storage by Flickr user Pargon

    Storage solutions: a database is not just “a database”. Just like furniture from a certain Swedish home furnishing chain, you need to know the measurements, the aesthetic needs and the future extensibility before you buy. And just like a house you need the right foundations to build something stable, fit for purpose and ready to use. The questions we will be asking of our data are the essential starting point here (see also Aaron’s blog, “The question is key in Trading Consequences”) – knowing these and some sort of suitable ontology early on helps us ensure we can design the right structure for our database.

  8. History in a changeable climate – part of the Trading Consequences project is to consider the impact, the consequences, of historical trades. That means looking at different resources and seeing what the most likely environmental impacts of timber trade, cattle trade and so on might be. That means users may want to query our data based on those impacts – looking up the kind of trades that might contribute to flooding, that may be reflected in famine, that might be affected by drought, etc. That requires a whole separate ontology for environmental impact that can somehow account for these very interconnected factors – and that is a lot harder than it looks!
  9. Image of a lab

    Harvey W. Wiley conducting experiments in his laboratory by DC Public Library Commons | DCPL Commons on Flickr Commons (click for more information)

    Shipping drugs – no, not a sinister diversification for the project but a reflection of the complexity of trading data. We can look for records of trading particular types of medicines and drugs but sometimes that’s not the right data to look at. Botanical trades also reflect the trading of drugs, as some plant material was shipped for later use or processing into pharmaceuticals (for an idea of the type of plants involved take a look at the Alnwick Poison Garden). The same issue applies to leather goods, for instance – you might trade the hides, specific goods like leather gloves, perhaps even the whole cow. All of those trades may reflect leather trade but understanding, combining and querying that data poses some challenges.

  10. Pithy headings! They matter! Part of our project meeting was considering how we communicate the project. As well as learning to use pithy headings, images, bullet points and other web-friendly formatting, we also found out that blog posts should usually be no more than 200-300 words. We also discussed how people access this site on other devices, particularly mobiles. Although we are working on historical data a lot of us are using smart phones and they have smaller screens and differing requirements. We agreed to apply a new mobile theme – so do try reading this blog on your phone and let us know if you like it!

We hope that gave you a flavour of our kick off meeting. It took place over two days so we’ve obviously trimmed it down a lot but if you have any questions, comments or suggestions do add them here and we’ll get back to you.

Linked Data for places – any advice?

We’d really benefit from advice about what Linked Data namespaces to use to describe places and the relationships between them. We want to re-use as much of others’ work as possible, and use vocabularies which are likely to be well and widely understood.

Here’s a sample of a “vanilla” rendering of a record for a place-name in Cheshire as extracted from the English Place Name Survey – see this as a rough sketch.

<rdf:RDF>
<chalice:Place rdf:about="/place/cheshire/prestbury/bosley/bosley">
<rdfs:isDefinedBy>/doc/cheshire/prestbury/bosley/bosley</rdfs:isDefinedBy>
<rdfs:label>Bosley</rdfs:label>
<chalice:parish rdf:resource="/place/cheshire/prestbury/bosley"/>
<chalice:parent rdf:resource="/place/cheshire/prestbury/bosley"/>
<chalice:parishname>Bosley</chalice:parishname>
<chalice:level>primary-sub-township</chalice:level>
<georss:point>53.1862392425537 -2.12721741199493</georss:point>
<owl:sameAs rdf:resource="http://data.ordnancesurvey.co.uk/doc/50kGazetteer/28360"/>
</chalice:Place>
</rdf:RDF>

GeoNames

We could re-use as much as we can of the geonames ontology. It defines a gn:Feature to indicate that a thing is a place, and gn:parentFeature to indicate that one place contains another.
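To make that concrete, here is a rough sketch of how the Bosley record might look if it leaned on the geonames terms instead of our own. It uses only the Python standard library and writes N-Triples by hand. The base URI (example.org) and the helper names are assumptions made up for illustration, not anything Chalice has settled on; only gn:Feature, gn:parentFeature, rdf:type and rdfs:label are real vocabulary terms.

```python
GN = "http://www.geonames.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org"  # hypothetical base URI, not a real Chalice namespace


class Literal(str):
    """Marks a triple object as a plain string literal rather than a URI."""


def place_triples(path, label, parent_path):
    """Describe one place with geonames terms: it is a Feature inside a parent Feature."""
    subject = BASE + path
    return [
        (subject, RDF_TYPE, GN + "Feature"),
        (subject, RDFS_LABEL, Literal(label)),
        (subject, GN + "parentFeature", BASE + parent_path),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


bosley = place_triples(
    "/place/cheshire/prestbury/bosley/bosley",
    "Bosley",
    "/place/cheshire/prestbury/bosley",
)
print(to_ntriples(bosley))
```

Note that gn:parentFeature here replaces both our chalice:parish and chalice:parent properties, so the distinction between "the parish" and "the immediate parent" would need to be carried some other way.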

Ordnance Survey

Ordnance Survey publish some geographic ontologies: there are some within data.ordnancesurvey.co.uk, and there’s some older work including a vocabulary for mereological (i.e. containment) relations, which includes isPartOf and hasPart. But the status of this vocabulary is unclear – is its use still advised?

The Administrative Geography ontology defines a ‘parish‘ relation – this is the inverse of how we’re currently using ‘parish’. (i.e. Prestbury contains Bosley) (And our concepts of historic parish and sub-parish are terrifically vague…)

For place-names found in the 1:50K gazetteer the OS use the NamedPlace class – but it feels odd to be re-using a vocabulary explicitly designed for the 50K gazetteer.

Or…

Are there other wide-spread Linked Data vocabularies for places and their names which we could be re-using? Are there other ways in which we could improve the modelling? Comments and pointers to others’ work would be greatly appreciated.

Structuring a Linked Data namespace for places

Thoughts on structuring a namespace for historic English places, for our prototype Linked Data version of the English Place Name Survey; how do others do it? Our options seem to be:

  1. give each placename a numeric identifier that can be part of the link
  2. create a more human-readable identifier based on the name, to use as part of the link.

Numeric identifiers for places look like common practice. Geonames.org uses numbers to create links for places – so http://sws.geonames.org/2656197/ “is”, or refers to, Baschurch in Shropshire. Though the coordinates of the point may change, the number is associated with the name, and it remains the same.

Ordnance Survey Linked Data also uses a numeric ID to create its link that stands for (the same) Baschurch – http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354.

The Linked Data Patterns online book has a set of patterns for identifying URIs. The patterns are focused on use with systems that are already database-based, with some design thought having gone into how IDs look, how they can be looked up, and how their persistence is guaranteed.

The point here is that the numeric identifiers still need careful curation – an organisational guarantee that the identifiers will stay the same for the predictable future.

We’re using a relational database (PostGIS) rather than a triplestore, to hold the Chalice data (because the data model won’t really change or expand). We can’t just use IDs that are created automatically by the database when items are inserted into it, because those might change if the names are inserted in a different order.

During Chalice we’re not building a be-all-end-all system, but rather prototyping how an approach to text mining and georeferencing places can be used to turn an amazing hand-created resource into a 21st century Linked Data gazetteer, leaving behind open source tools to make sure the process can be repeated with more digitised text.

But we’re not building something to throw away; we want to make sure the links we create can be preserved – that they won’t be broken and won’t change their meanings. So it may be better for us to structure our namespace using the EPNS names themselves, and the order in which they occur in the printed volumes of EPNS.

The EPNS volumes are arranged county-by-county – each county has its own editor, and so may have different layout, style guidelines, level of detail for things like field-names, and the presence or absence of OS Grid coordinates, more or less according to the whims of the county editor. (We’ve focused on Cheshire, but LTG have been developing test parsers for samples of several different counties.)

So it makes sense to include the county name in our namespace. This also helps with disambiguation – which Walton is this Walton? But there will still be cases where several places, in quite different locations, but still within the same county, share a name. In this case, we’d also give the places a numeric identifier (Walton-1, Walton-2) in the order in which they appear in the EPNS text.
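As a minimal sketch of that scheme, the function below takes the names for one county in the order they appear in print and mints a path for each, adding a numeric suffix only where a name is shared. The /place/ prefix follows our sample record above; the lower-casing, the slug rules and the function names are assumptions for illustration, and the real paths may well also carry the parish hierarchy.

```python
import re
from collections import Counter


def slugify(name):
    """Lower-case a name and collapse anything that is not a letter or digit to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def mint_paths(county, names_in_print_order):
    """Mint one path per name; shared names get -1, -2, ... in order of appearance."""
    county_slug = slugify(county)
    totals = Counter(slugify(name) for name in names_in_print_order)
    seen = Counter()
    paths = []
    for name in names_in_print_order:
        slug = slugify(name)
        seen[slug] += 1
        if totals[slug] > 1:
            # Only ambiguous names are numbered, so unique names keep clean links.
            slug = f"{slug}-{seen[slug]}"
        paths.append(f"/place/{county_slug}/{slug}")
    return paths


print(mint_paths("Cheshire", ["Walton", "Bosley", "Walton"]))
```

Because the suffix depends only on the printed order, re-running the extraction over the same volume should give the same links, which is the property that auto-generated database IDs can’t promise.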

Some volumes of EPNS give us OS National Grid coordinates for the “major names”, others don’t. Where the “major name” exists in one or more gazetteers (geonames, OS Open Data), the LTG’s georesolver tool can create some of the missing links using the Unlock Places gazetteer cross-search.

There is more potentially useful context in the work of the UK Location Programme on Linked Data namespaces for places – a recent Guide to Linked Data and the UK Location Strategy, and last year’s guidance on Designing URI sets for Location.

One more potential complication, which is a fairly subtle issue of semantics – does a link identify a place, or a description of a place? Ordnance Survey Research try to make the difference clear by using a different namespace for ‘IDs for places’ and ‘IDs for documents describing places’.
So http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 “is” Baschurch; and http://data.ordnancesurvey.co.uk/doc/50kGazetteer/16354 “is” the description of Baschurch. To make sure we’re properly confused, when a human looks up the /id/ link using a web browser, the browser is redirected to the human-readable /doc/. To actually get hold of the Linked Data description of Baschurch (including the coordinates for it in the 50K gazetteer), one has to specifically request the machine-readable, rather than human-readable, version of the link, like this:

curl -L http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354 -H "Accept: application/rdf+xml"

:) But now you know that!

This took me a little while, and some back-and-forth with John Goodwin from OS Research on “Twitter”, to figure out, which is why I thought it worth writing down here.
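For anyone who would rather do the same thing from a script, here is a rough Python equivalent of that curl command using only the standard library. The function names are made up for illustration, and we are assuming the OS service still answers at this address and in this way, which may no longer hold.

```python
import urllib.request

BASCHURCH_ID = "http://data.ordnancesurvey.co.uk/id/50kGazetteer/16354"


def rdf_request(uri):
    """Build a request that asks for the machine-readable RDF/XML version of a link."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri, timeout=10):
    """Fetch the RDF/XML description; urlopen follows redirects, much like curl -L."""
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = rdf_request(BASCHURCH_ID)
print(request.full_url, request.get_header("Accept"))
# To actually go to the network: print(fetch_rdf(BASCHURCH_ID))
```

Without the Accept header a client is treated like a web browser and ends up at the human-readable /doc/ page.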

Linked Data choices for historic places

We’ve had some fitful conversation about modelling historic place-names extracted from the English Place Name Survey as Linked Data, on the Chalice mailing list.
It would be great to get more feedback from others where we have common ground. Here’s a quick summary of the main issues we face and our key points of reference, to start discussion, and we can go into more detail on specific points as we work more with the EPNS data.

Re-use, reduce, recycle?

We should be making direct re-use of others’ vocabularies where we can. In some areas this is easy. For example, to represent the containment relations between places (a township contains a parish, a parish contains a sub-parish) we can re-use some of the Ordnance Survey Research work on linked data ontologies – specifically their vocabulary to describe “Mereological Relations” – where “mereological” is a fancy word for “containment relationships”.

Adapting other schemas into a Linked Data model

One project which provides a great example of a more link-oriented, less geometry-oriented approach to describing ancient places is the Pleiades collection of geographic information about the Classical ancient world. Over the years, Pleiades has developed with scholars an interesting set of vocabularies, which don’t take a Linked Data approach but could be easily adapted to do so. They encounter issues to do with vagueness and uncertainty that geographical information systems concerning the contemporary world can overlook. For example, the Pleiades attestation/confidence vocabulary expresses the certainty of scholars about the conclusions they are drawing from evidence.

So an approach we can take is to build on work done in research partnerships by others, and try to build mind-share about Linked Data representations of existing work. Pleiades also use URIs for places…

Use URIs as names for things

One interesting feature of the English Place Name Survey is the index of sources for each set of volumes. Each different source which documents names (old archives, previous scholarship, historic maps) has an abbreviation, and every time a historic place-name is mentioned, it’s linked to one of the sources.

As well as creating a namespace for historic place-names, we’ll create one for the sources (centred on the five volumes covering Cheshire, which is where the bulk of work on text correction and data extraction has been done). Generally, if anything has a name, we should be looking to give it a URI.

Date ranges

Is there a rough consensus (based on volume of data published, or number of different data sources using the same namespace) on what namespace to use to describe dates and date ranges as Linked Data? At one point there were several different versions of iCal, hCal, xCal vocabularies all describing more or less the same thing.

We’ve also considered other ways to describe date ranges – talking to Pleiades about mereological relations between dates – and investigating the work of Common Eras on user-contributed tags representing date ranges. It would be hugely valuable to learn about, and converge on, others’ approaches here.

How same is the same?

We propose to mint a namespace for historic place-names documented by the English Place Name Survey. Each distinct place-name gets its own URI.

For some of the “major names”, we’ve been able to use the Language Technology Group’s georesolution tool to make a link between the place-name and the corresponding entry in geonames.org.

Some names can’t be found in geonames, but can be found, via Unlock Places gazetteer search, in some of the Ordnance Survey open data sources. Next week we’ll be looking at using Unlock to make explicit links to the Ordnance Survey Linked Data vocabularies. One interesting side-effect of this is that, via Chalice, we’ll create links between geonames and the OS Linked Data, that weren’t there before.

Kate Byrne raised an interesting question on the Chalice mailing list – is the ‘sameAs’ link redundant? For example, if we are confident that Bosley in geonames.org is the same as Bosley in the Cheshire volumes of English Place Name Survey, should we re-use the geonames URI rather than making a ‘sameAs’ link between the two?

How same, in this case, is the same? We may have two, or more, different sets of coordinates which approximately represent the location of Bosley. Is it “correct”, in Linked Data terms, to state that all of them are “the same” when the locations are subtly different?
This is before we even get into the conceptual issues around whether a set of coordinates really has meaning as “the location” of a place. Geonames, in this sense, is a place to start working out towards more expressive descriptions of where a place is, rather than a conclusion.
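One crude way to put a number on "subtly different" is to measure the great-circle distance between two candidate points and compare it with a tolerance. The sketch below uses the haversine formula; the first point is the one from our sample Bosley record, while the second point and the 1 km tolerance are invented purely for illustration. It says nothing about whether a point is the right way to describe a place at all.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def roughly_same(point_a, point_b, tolerance_km=1.0):
    """True if two (lat, lon) points fall within an arbitrary tolerance of each other."""
    return haversine_km(*point_a, *point_b) <= tolerance_km


epns_bosley = (53.1862392425537, -2.12721741199493)  # from the sample record
other_bosley = (53.19, -2.12)  # hypothetical point from another gazetteer

print(round(haversine_km(*epns_bosley, *other_bosley), 3), "km apart")
print("within tolerance:", roughly_same(epns_bosley, other_bosley))
```

A distance check like this could help decide when a ‘sameAs’ link deserves a second look, but the choice of tolerance is itself a judgement about how same is the same.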

Long-term preservation

Finally, we want to make sure that any URIs we mint are going to be preserved on a really long time horizon. I discussed this briefly on the Unlock blog last year. University libraries, or cultural heritage memory institutions, may be able to delegate a sub-domain whose long-term persistence we can agree on – but the details of the agreement, and periodic renewal of it due to infrastructural, organisational and technological change, are a much bigger issue than I think we recognise.