Official Launch of Trading Consequences!

Today we are delighted to officially announce the launch of Trading Consequences!

Over the course of the last two years the project team have been hard at work using text mining, traditional and innovative historical research methods, and visualization techniques to turn digitized nineteenth-century papers and trading records (and their OCR’d text) into a unique database of commodities, together with engaging visualization and search interfaces for exploring that data.

Today we launch the database, searches and visualization tools alongside the Trading Consequences White Paper, which charts our work on the project, including our technical approaches, some of the challenges we faced, and what we have achieved and how. The White Paper also discusses, in detail, how we built the tools we are launching today. It is therefore an essential point of reference for those wanting to better understand how data is presented in our interfaces, how these interfaces came to be, and how you might best use and interpret the data shared in these resources in your own historical research.

Find the Trading Consequences searches, visualizations and code via the panel on the top right hand side of the project website (outlined in orange).


There are four ways to explore the Trading Consequences database:

  1. Commodity Search. This searches the database table of unique commodities for commodities beginning with the search term entered. The returned list of commodities is sorted by two criteria: (1) whether the commodity is a “commodity concept” (where any one of several unique names known to be used for the same commodity returns aggregated data for that commodity); and (2) alphabetical order. Read more here.
  2. Location Search. This searches the database table of unique locations for locations beginning with the search term entered. The returned list of locations is sorted by the frequency with which the search term is mentioned within the historical documents. Selecting a location displays: information about the location, such as which country it is within, population etc.; a map highlighting the location with a map marker; and a list of historical documents with an indication of how many times the selected location is mentioned within each document. Read more here.
  3. Location Cloud Visualization. This shows the relation between a selected commodity and its related locations. The visualization is based on over 170,000 documents from digital historical archives (see list of archives below). The purpose of the visualization is to provide a general overview of how the importance of location mentions in relation to a particular commodity changed between 1800 and 1920. Read more here.
  4. Interlinked Visualization. This provides a general overview of how commodities were discussed between 1750 and 1950 along geographic and temporal dimensions. It gives an overview of commodity and location mentions extracted from 179,000 historic documents (drawn from the digital archives listed below). Read more here.
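The two search behaviours described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the `Commodity` class, the function names and the sample data are hypothetical, and the sketch assumes that "commodity concepts" are listed first and that ties are then broken alphabetically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commodity:
    name: str
    is_concept: bool  # hypothetical flag: entry aggregates several names for one commodity


def commodity_search(commodities, term):
    """Prefix search; assumed ordering is concepts first, then alphabetical."""
    prefix = term.casefold()
    hits = [c for c in commodities if c.name.casefold().startswith(prefix)]
    # False sorts before True, so "not is_concept" puts concepts at the top.
    return sorted(hits, key=lambda c: (not c.is_concept, c.name.casefold()))


def location_search(mention_counts, term):
    """Prefix search over {location: mention count}, most mentioned first."""
    prefix = term.casefold()
    hits = [(name, n) for name, n in mention_counts.items()
            if name.casefold().startswith(prefix)]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, for illustration only.
sample = [
    Commodity("cinchona bark", False),
    Commodity("cinchona", True),
    Commodity("cinnamon", False),
    Commodity("coal", True),
]
print([c.name for c in commodity_search(sample, "cin")])
print(location_search({"Orange": 120, "Oran": 300, "Ottawa": 50}, "or"))
```

In the live system these searches run against database tables rather than in-memory lists; the sketch only shows the matching and ordering rules.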

Please do try out these tools (please note that the two visualizations will only work with newer versions of the Chrome Browser) and let us know what you think – we would love to know what other information or support might be useful, what feedback you have for the project team, and how you think you might be able to use these tools in your own research.


Start page of the Interlinked Visualization.

We are also very pleased to announce that we are sharing some of the code and resources behind Trading Consequences via GitHub. This includes a range of Lexical Resources that we think historians, and those undertaking historical text mining in related areas, may find particularly useful: the base lexicon of commodities created by hand for this project; the Trading Consequences SKOS ontology; and an aggregated gazetteer of ports and cities with ports.

Bea Alex shares text mining progress with the team at an early Trading Consequences meeting.


Acknowledgements

The Trading Consequences team would like to acknowledge and thank the project partners, funders and data providers that have made this work possible. We would particularly like to thank the Digging Into Data Challenge, and the international partners and funders of DiD, for making this fun, challenging and highly collaborative transatlantic project possible. We have hugely enjoyed working together and we have learned a great deal from the interdisciplinary and international exchanges that have been so central to this project.

We would also like to extend our thanks to all of those who have supported the project over the last few years with help, advice, opportunities to present and share our work, publicity for events and blog posts. Most of all we would like to thank all of those members of the historical research community who generously gave their time and perspectives to our historians, to our text mining experts, and particularly to our visualization experts to help us ensure that what we have created in this project meets genuine research needs and may have application in a range of historical research contexts.

Image of the Trading Consequences Project Team at our original kick off meeting.


What next?
Trading Consequences does not come to an end with this launch. Now that the search and visualization tools are live – and open for anyone to use freely on the web – our historians Professor Colin Coates (York University, Canada) and Dr Jim Clifford (University of Saskatchewan) will be continuing their research. We will continue to share their findings on historical trading patterns, and environmental history, via the Trading Consequences blog.

Over the coming months we will be continuing to update our publications page with the latest research and dissemination associated with the project, and we will also be sharing additional resources associated with the project via GitHub, so please do continue to keep an eye on this website for key updates and links to resources.

We value and welcome your feedback on the visualizations, search interfaces, the database, or any other aspect of the project, website or White Paper at any point. Indeed, if you do find Trading Consequences useful in your own research we would particularly encourage you to get in touch with us (via the comments here, or via Twitter) and consider writing a guest post for the blog. We also welcome mentions of the project or website in your own publications and we are happy to help you to publicize these.


Testing and feedback at CHESS’13.

Explore Trading Consequences

Comparing Apples with Oranges

This Friday (21st March) we will officially launch Trading Consequences, with publication of our White Paper and the launch of our visualization and search tools. Ahead of the launch we wanted to give you some idea of what you will be able to access, what you might want to view and what you might want to compare with these new historical research tools. Professor Colin Coates has been exploring the possibilities…

The “Trading Consequences” website literally allows us to compare apples and oranges.  Both fruits became the objects of substantial international trade in the nineteenth century, as in the right conditions they can remain edible despite being shipped great distances.

Screen shot of a visualisation of Apple Trades

They are complementary fruits in many ways, as apples are grown in temperate climates whilst oranges prefer warmer conditions.  They may overlap geographically, but typically we associate different parts of the world with each fruit.  In the context of the British world, apples grew in the United Kingdom, of course, but they also came from Canada, New Zealand and the United States, among other locations.  Oranges from places like Spain, Florida or Latin America entered the United Kingdom in the nineteenth century.  The two maps which result from entering “apple” and “orange” into the database show, at a glance, how oranges appeared more often in reference to warmer zones than apples.

Screen shot of a visualisation of Orange Trades

The chronological distribution of commodity mentions was roughly similar in both cases.  Increased attention from 1880 to 1900 reflects in part the expansion of the documentation in that period, but it likely also reflected growth in trade and consumption.  Historian James Murton has pointed out that regular trade in apples developed from Canada to Great Britain in the 1880s, focused primarily in Nova Scotia.  On average, one million bushels of apples reached British markets (Murton, 2012).

In contrast, both apples and oranges show sudden spikes in the 1830s, for entirely different reasons.  The spike for apples points the researcher to a useful “Report from the Selection Committee on the Fresh Fruit Trade” in 1839.  But the mid-1830s spike in oranges points instead to the activities of Orange Lodges in Ireland.  The other visualisation shows this anomaly even more clearly, as IRELAND takes on a prominence in related geographical terms in the 1830s that it did not occupy afterwards.

Screenshot of Visualisation looking at trades in the 1830s

This project entailed teaching computers to read as an historian might, and there are distinct advantages to being able to deal with such a wide range of documentation.  However, all historians must be critical of the sources we use.  The visualisations in “Trading Consequences” point towards useful sources for further study, and suggest that historians may wish to consider some regions in their analysis.  The importance of the United States in the discussions about apples is noteworthy, for instance.  Australia has a large number of mentions of oranges, though it is important to note that a small city boasts the same name and could account for part of the number.  (Interestingly enough, Orange, New South Wales, did not grow many oranges according to the Australian Atlas 2006! But it does have apples.)

"Fruit" by Flickr user Garry Knight / garryknight


The increase in mentions of both apples and oranges from the 1880s on may reflect improving living standards in Britain in that period.  Britain’s decision to adopt free trade had led to an increase in a wide variety of imported foodstuffs (Darwin, 2009).  As the heightened attention to both apples and oranges probably shows, these fruits were part of that movement.

The “Trading Consequences” visualisations show some instructive comparisons, some that may point to different ways to conceive of trade in these resources, and others which illustrate the care with which researchers should approach results.

References

  • John Darwin, The Empire Project: The Rise and Fall of the British World-System, 1830-1970 (Cambridge: Cambridge University Press, 2009)
  • James Murton, “John Bull and Sons: The Empire Marketing Board and the Creation of a British Imperial Food System,” in Franca Iacovetta et al., eds., Edible Histories, Cultural Politics: Towards a Canadian Food History (Toronto: University of Toronto Press, 2012), 234-35.
  • New South Wales Government, Agriculture – Fruit and Vegetables in the Atlas of New South Wales, Available from: http://www.atlas.nsw.gov.au/public/nsw/home/topic/article/agriculture-fruit-and-vegetables.html

Invited talk on Digital History and Big Data

Last week I was invited to give a talk about Trading Consequences at the Digital Scholarship: day of ideas event 2 organised by Dr. Siân Bayne.  If you are interested in my slides, you can look at them here on Slideshare.

Rather than give a summary talk about all the different things going on in the Edinburgh Language Technology Group at the School of Informatics, we decided that it would be more informative to focus on one specific project and provide a bit more detail without getting too technical.  My aim was to raise our profile with attendees from the humanities and social sciences in Edinburgh and further afield who are interested in digital humanities research.  They made up the majority of the audience, so this talk was a great opportunity.

My presentation on Trading Consequences at the Digital Scholarship workshop (photo taken by Ewan Klein).

Most of my previous presentations were directed at people in my own field, that is, experts in text mining and information extraction.  This talk therefore had to be completely different from how I would normally present my work, which is to provide detailed information on methods and algorithms, their scientific evaluation and so on.  The attendees were unlikely to be interested in such things, but I wanted them to know what sort of things our technology is capable of and, at the same time, to understand some of the challenges we face.

I decided to focus the talk on the user-centric approach to our collaboration in Trading Consequences, explaining that our current users and collaborators (Prof. Colin Coates and Dr. Jim Clifford, environmental historians at York University, Toronto) and their research questions are key in all that we design and develop.  Their comments and error analysis feed directly back into the technology, allowing us to improve the text mining and visualisation with every iteration.  The other point I wanted to get across is that transparency about the quality of the text mining is crucial to our users, who want to know to what extent they can trust the technology.  Moreover, the output of our text mining tool in its raw XML format is not something that most historians would be able to understand and query easily.  However, when text mining is combined with interesting types of visualisations, the data mined from all the historical document collections comes alive.

We are currently processing digitised versions of over 10 million scanned document images from 5 different collections, amounting to several hundred gigabytes worth of information.  This is not big data in the computer science sense, where people talk about terabytes or petabytes.  However, it is big data to historians, who in the best case have access to some of these collections online using keyword search but often have to visit libraries and archives and go through them manually.  Even if a collection is available digitally and indexed, it does not mean that all the information relevant to a search term is easily accessible to users.  In a large proportion of our data, the optically character-recognised (OCRed) text contains a lot of errors and, unless corrected, those errors then find their way into the index.  This means that searches for correctly spelled terms will not return matches in sources where those terms appear with one or more OCR errors.
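To make the indexing problem concrete, here is a minimal Python sketch contrasting an exact keyword lookup with an approximate one. It is an illustration only and not the correction method used in the project: the index terms are invented, and approximate matching via the standard library's `difflib.get_close_matches` is simply one assumed way of tolerating OCR noise.

```python
import difflib


def exact_search(index_terms, query):
    """Return only index terms identical to the query."""
    return [t for t in index_terms if t == query]


def fuzzy_search(index_terms, query, cutoff=0.8):
    """Return index terms whose similarity to the query is at least `cutoff`."""
    # n must be positive, so guard against an empty index.
    n = max(1, len(index_terms))
    return difflib.get_close_matches(query, index_terms, n=n, cutoff=cutoff)


# Hypothetical index built from OCRed text: two entries carry OCR errors.
index_terms = ["cinchona", "cinch0na", "einchona", "cinnamon"]

print(exact_search(index_terms, "cinchona"))  # misses the two noisy spellings
print(fuzzy_search(index_terms, "cinchona"))  # also finds the noisy spellings
```

The cutoff is a trade-off: lowering it recovers more damaged spellings but also starts pulling in genuinely different words, which is one reason correcting the text before indexing is preferable where possible.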

The low text quality in large parts of our text collections is also one of our main challenges when it comes to mining this data.  So, I summarised the types of text correction and normalisation steps we carry out in order to improve the input for our text mining component.  However, there are cases when even we give up, that is, when the text quality is so low that it is impossible even for a human being to read a document.  I showed a real example of one of the documents in the collections, the textual equivalent of an upside-down image which was OCRed the wrong way round.

At the end, I got the sense that my talk was well received.  I got several interesting questions, including one asking whether we see that our users’ research questions are now shaped by the technology when the initial idea was for the technology to be driven by their research.  I also made some connections with people in literature, so there could be some exciting new collaborations on the horizon.  Overall, the workshop was extremely interesting and very well organised and I’m glad that I had the opportunity to present our work.

 

 

Guest Post on Kew Gardens’ Blog

The Trading Consequences team have created a guest post, “Bringing Kew’s Archive Alive”, for Kew Gardens’ Library, Art and Archives’ blog.

The post looks at how digital data produced by Kew’s Directors’ Correspondence team can be used as a source for visualising the British Empire’s 19th Century trade networks.

You can read the post in full here: http://www.kew.org/news/kew-blogs/library-art-archives/bringing-kews-archive-alive.htm

Progress to date on Trading Consequences Visualizations

Up here in St Andrews we are in the process of exploring several routes to visualize the vast amount of commodity data that have been extracted from the historical archives by our colleagues from the University of Edinburgh.

Research in environmental history can be an open-ended process where research questions are formed and refined as part of working with the available data (i.e. historic documents). Our goal is therefore the development of visualization concepts that will reveal a range of temporal, geographic and content-related perspectives on the commodity data, and that will highlight different conceptual angles and relations within the data. Such “interlinked” visualization perspectives can provide an overview of the entire dataset and, at the same time, act as probes to explore certain aspects of the commodity data in more detail. Using this approach we aim to support more open-ended explorations of the commodity data as well as providing easy access to specific documents of interest.

Our design process so far has been driven by discussions with Jim and Colin, paper sketches to iterate on certain visualization ideas and some literature research on information visualization and digital humanities.

Discussions with Jim and Colin revealed that the temporal and geographic aspects of the data are central to their research but always in close combination with commodity types and their relations to each other. This resulted in several paper sketches, as you can see below, to explore how these particular aspects could be visually expressed and augmented with interactive features.

We also created (static) computational sketches (shown below) based on samples from the actual database. At the same time, our collaborators from EDINA created an interface to the database that allowed interrogating the data through textual queries and list views.

Both these approaches allowed us to explore the character of the data and the potential visualization challenges that it introduces.

The implementation of a web-based visualization prototype that combines the ideas from our early design explorations is currently in full swing. This prototype is based on the popular visualization library d3.js. We are closely collaborating with the teams from Toronto and Edinburgh on iterating on its design and implementation.

Moving from the questions and interests of researchers in environmental history to interactive visualizations which support digging into data with fluid and commodity-oriented inquiries is a process of continual refinement and of exploring small and large interaction research questions.

Putting it all together: first attempt

Within Trading Consequences, our intention is to develop a series of prototypes for the overall system. Initially, these will have limited functionality, but will then become increasingly powerful. We have just reached the point of building our first such prototype. Here’s a picture of the overall architecture:

Text Mining

Our project team has delivered the first prototype of the Trading Consequences system. The system takes in documents from a number of different collections. The Text Mining component consists of an initial preprocessing stage which converts each document to a consistent XML format. Depending on the corpus that we’re processing, a language identification step may be performed to ensure that the current document is in English. (We plan to also look at French documents later in the project.) The OCR-ed text is then automatically improved by correcting and normalising a number of issues.

The main processing of the TM component involves various types of shallow linguistic analysis of the text, lexicon and gazetteer lookup, named entity recognition and grounding, and relation extraction. We determine which commodities were traded when and in relation to which locations. We also determine whether locations are mentioned as points of origin, transit or destination and whether vocabulary relating to diseases and disasters appears in the text. All additional information which we mine from the text is added back into the XML document as different types of annotation.
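As a rough illustration of the lexicon and gazetteer lookup step, and of adding the results back into the XML as annotations, consider the Python sketch below. It is not the project's pipeline: the element and attribute names (`document`, `text`, `annotations`, `ent`), the tiny term lists and the `annotate` function are all hypothetical, and real named entity recognition, grounding and relation extraction go well beyond simple string matching.

```python
import re
import xml.etree.ElementTree as ET


def annotate(doc_xml, commodities, locations):
    """Add <ent> annotations for lexicon and gazetteer terms found in the text."""
    root = ET.fromstring(doc_xml)
    text = root.findtext("text", default="")
    annotations = ET.SubElement(root, "annotations")
    for kind, terms in (("commodity", commodities), ("location", locations)):
        for term in sorted(terms):
            # Whole-word, case-insensitive lookup of each lexicon entry.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                ent = ET.SubElement(annotations, "ent", type=kind,
                                    start=str(m.start()), end=str(m.end()))
                ent.text = m.group(0)
    return root


# Hypothetical input document and term lists, for illustration only.
doc = ('<document year="1885"><text>Apples were shipped from Nova Scotia '
       'to Liverpool.</text></document>')
root = annotate(doc, {"apples"}, {"nova scotia", "liverpool"})
print(ET.tostring(root, encoding="unicode"))
```

Keeping the annotations in the same XML document as the source text means later stages can always trace an extracted entity back to the exact character span it came from.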

Populating the Commodities Database

The entire annotated XML corpus is parsed to create a relational database (RDB). This stores not just metadata about the individual document, but also detailed information that results from the text mining, such as named entities, relations, and how these are expressed in the relevant document.
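The step from annotated XML to relational tables might look something like the following Python sketch, which uses an in-memory SQLite database. The schema, the XML element names and the `load` function are assumptions made for illustration; the actual Trading Consequences schema is richer and also stores relations and how they are expressed in each document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table schema: one row per document, one row per entity mention.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    type TEXT NOT NULL,
    surface TEXT NOT NULL
);
"""


def load(conn, annotated_docs):
    """Parse each annotated XML document and insert its metadata and entities."""
    conn.executescript(SCHEMA)
    for xml_text in annotated_docs:
        root = ET.fromstring(xml_text)
        cur = conn.execute("INSERT INTO document (year) VALUES (?)",
                           (int(root.get("year")),))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO entity (document_id, type, surface) VALUES (?, ?, ?)",
            [(doc_id, e.get("type"), e.text) for e in root.iter("ent")],
        )
    conn.commit()


# Hypothetical annotated documents, for illustration only.
docs = [
    '<document year="1885"><ent type="commodity">apples</ent>'
    '<ent type="location">Nova Scotia</ent></document>',
    '<document year="1839"><ent type="commodity">apples</ent></document>',
]
conn = sqlite3.connect(":memory:")
load(conn, docs)

# Count mentions of one commodity per year, as a search interface might.
rows = conn.execute(
    "SELECT d.year, COUNT(*) FROM entity e "
    "JOIN document d ON d.id = e.document_id "
    "WHERE e.type = 'commodity' AND e.surface = 'apples' "
    "GROUP BY d.year ORDER BY d.year"
).fetchall()
print(rows)
```

Once the mined information is in relational form, both textual queries and visualizations can be served from the same tables with ordinary SQL aggregation.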

Visualisation

Both the visualisation and the query interface access the database so that users can either search the collections directly through textual queries or browse the data in a more exploratory manner through the visualisations. For the prototype, we have created a static web-based visualization that represents a subset of the data taken from the database. This visualization sketch is based on a map that shows the location of commodity mentions by country. We are currently working on setting up the query interface and are busy working on dynamic visualisation of the mined information.


The question is key in Trading Consequences

“Dreams are today’s answers to tomorrow’s questions.”
– Edgar Cayce

Looking back to the global trading of commodities during the 19th century we see increasing access to digitised historical records, in a myriad of forms. Today, the rate at which we can collect and store data about trading is ever expanding, from high level statistics, to low level sensor data on containers in transit. In each case, the scale of the data is rapidly outstripping the provision of tools for the effective analysis and exploration of such data. The volume of data results in historians focussing on popular commodities or analysts asking for coarse-grained, aggregate measures.

Image of Savannah ©iStockPhoto 2012

Instead, to understand the consequences of our trading history, historians need to ask difficult, subtle, multifaceted and challenging questions. Questions which aren’t polluted by knowledge of the limitations of the methods and technologies we have today. These insightful questions won’t come from a focus on what the tools of today can support, what the analysis or visualisation methods can do or what data is available. Simply put, if you only know about hammers, all your problems will start to look like nails. And worse than this, everyone will start to think like the carpenter, reducing the power that the breadth of inter-disciplinary expertise gives you.


Figure 1: Overview of Information Visualisation pipeline

In this project we are bringing together an inter-disciplinary research team of historians, text analysis and information visualisation experts. Instead of starting with the key “historical questions” which historians are seeking answers to, it’s very tempting to focus on one or more of the earlier technology stages as shown in Figure 1. This figure is our adapted view of the “information visualisation pipeline” [1,2]. Data comes in a variety of abstract forms without a clear physical manifestation and needs to be dynamically collected, processed, cleaned and hence mined before interactive display.

However, if we first focus on a technology stage it will impact on the questions the historians might be able to ask or the approaches to be taken. Consider, for example, the rendering step in Figure 1. Modern graphics APIs (e.g. OpenGL), desktop computers and even commodity displays offer increasingly easy access to 3D software and hardware. If this is our starting point, we can quickly see how 3D stereoscopic tools will emerge and will shape what (if any) questions historians might pose with our tools.

Focussing first on the data, mining, software, algorithms, layouts, methods etc. is the wrong approach in a project such as Trading Consequences. Instead, the historical questions are key. Our challenge as a team is to ensure that at the earliest stage we do not pollute the aspirations of historians. We need to encourage the historians to ask interesting questions about the data, without being hampered by expectations of what is feasible given current technology.

Of course, over time as questions emerge, prototypes will be developed and the creation of a shared view across a team is natural. We aim to continually bring in fresh perspectives to ensure that we are answering the questions which need to be asked, rather than the questions which can be asked.

[1] Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman (1999) Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann.
[2] Ben Fry (2007), Visualizing Data: Exploring and Explaining Data with the Processing Environment. O’Reilly Media.

 
