If you give a historian code: Adventures in Digital Humanities – Jean Bauer Seminar LiveBlog

This afternoon I’m at UCL for the “If you give a historian code: Adventures in Digital Humanities” seminar from Jean Bauer of Princeton University, who is being hosted by Melissa Terras of the UCL Centre for Digital Humanities. I’ll be liveblogging so, as usual, any corrections and additions are very much welcomed. 

Melissa is introducing Jean, who is in London en route to DH 2016 in Krakow next week. Over to Jean:

I’m delighted to be here, given all of the wonderful work Melissa has been doing. I’m going to talk a bit about how I got into digital humanities, but also about how scholars in library and information science, and scholars in other areas of the humanities, might find these approaches useful.

So, this image is Benjamin West’s Treaty of Paris, 1783. This is the era that I research and what I am interested in. In particular I am interested in John Adams, the first minister of the United States – he even gets one line in Hamilton: the musical. He’s really interesting, as he was very concerned with getting thinking and processes on paper. I’m also interested in the work he did with Europe, where there hadn’t really been American foreign consuls before. And he was also working on areas of North America, making changes that locked the British out of particular trading blocs through adjustments brought about by that peace treaty – and I might add that this is a weird time to give this talk in England!

Now, the foreign service at this time kind of lost contact once they reached Europe and left the US. So the correspondence is really important and useful for understanding these changes. There are only 12 diplomats in Europe from 1775-1788, but that number grows and grows, with consuls and diplomats increasing steadily. And most of those consuls are unpaid, as the US had no money to support them. When people talk about the diplomats of this time they tend to focus on future presidents etc., and I was interested in this much wider group of consuls and diplomats. So I had a dataset of letters sent to John Jay as he was negotiating the treaty. To use that I needed to put it into some sort of data structure – so, this is it. And this is essentially the world of 1820 as expressed in code. So we have locations, residences, assignments, letters, people, etc. Within that data structure we have letters – sent to or from individuals, to or from locations, with dates assigned to them. And there are linkages here. Databases don’t handle fuzzy dates well, and I don’t want invalid dates, so I use Boolean logic here. And there is also a process for handling enclosures – right now that’s letters, but people did enclose books, shoes, statuettes – all sorts of things! And when you look at locations, these connect to “in states” and states and location information… This data set falls within the Napoleonic Wars, so none of the boundaries are stable, and the same location shifts in meaning/state depending on the date.
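Bauer doesn’t spell out how the Boolean logic works, so the following is only a minimal Python sketch. It assumes one common approach: every letter stores a full, valid date, and Boolean flags record whether the month and day are actually known. The class and field names (`FuzzyDate`, `month_known`, `day_known`, `Letter`) are hypothetical and are not taken from Project Quincy.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class FuzzyDate:
    """A date that is always valid, plus flags saying which parts the source gives.

    Unknown parts are stored as 1 (e.g. "May 1794" -> 1794-05-01 with
    day_known=False), so records still sort and validate like real dates.
    """
    value: date
    month_known: bool = True
    day_known: bool = True

    def __post_init__(self) -> None:
        # A known day inside an unknown month makes no sense, so reject it.
        if self.day_known and not self.month_known:
            raise ValueError("day_known requires month_known")

    def display(self) -> str:
        year, month = self.value.year, self.value.month
        if self.month_known and self.day_known:
            return self.value.isoformat()
        if self.month_known:
            return f"{year:04d}-{month:02d} (day unknown)"
        return f"{year:04d} (month and day unknown)"


@dataclass
class Letter:
    """Hypothetical letter record; for now enclosures are themselves letters."""
    sender: str
    recipient: str
    origin: str
    destination: str
    sent: FuzzyDate
    enclosures: list["Letter"] = field(default_factory=list)


may_1794 = FuzzyDate(date(1794, 5, 1), day_known=False)
print(may_1794.display())
```

With this design, queries and sorting always see a real date, while the flags let the interface avoid claiming more precision than the source supports.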

So, John Jay has all this correspondence between May 27 and Nov 19, 1794, and the letters are going from Europe to North America, and between the West Indies and North America. Many of these are reporting on trouble. The West Indies letters report ship seizures… And there are debts to Britain… And none of these issues get resolved in that treaty. Instead John Jay and Lord Grenville set up a series of committees – and this is the historical precedent for mediation. Which is why I was keen to understand what information John Jay had available. None of this correspondence got to him early enough. There wasn’t information there to resolve the issue, but there was enough to understand it. But there were delays for safety, for practical issues – the State Department was 6 people at this time – but the information was being collected in Philadelphia. So you have a centre collecting data from across the continent, but not able to push it out quickly enough…

And if you look at the people in these letters you see John Jay, and you see Edmund Jennings Randolph mentioned most regularly. So, I have this elaborate database and lots of ways to visualise this… which enables us to see connections, linkages, and places where different comparisons highlight different areas of interest. And this is one of the reasons I got into the Humanities. There are all these papers – usually for famous historical men – and they get digitised, including the enclosures… in a single file(!). Parsing that with a partial typescript, you start to see patterns. You see not summaries of information being shared, not aggregation and analysis, but the letters being bundled up and sent off – like a repeater note. So, building up all of this stuff… Letters are objects, they have relationships to each other, they move across space and time. You look at the papers of John Adams, or of any political leader, and they are just in order of date sent… requiring us to flip back and forth. Databases and networks allow us to follow those conversations, and to find new orders in which to read those letters.
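To make the idea of “new orders” concrete, here is a minimal sketch that reduces each letter to a (sender, recipient, date) record. It regroups the letters by correspondent pair, so that each exchange reads as a dated conversation instead of one long chronological run. The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def conversations(letters):
    """Group (sender, recipient, sent) records into threads per correspondent pair.

    The pair key is unordered, so A->B and B->A land in the same thread,
    and each thread is sorted by date rather than by archive order.
    """
    threads = defaultdict(list)
    for sender, recipient, sent in letters:
        threads[frozenset((sender, recipient))].append((sent, sender, recipient))
    return {pair: sorted(thread) for pair, thread in threads.items()}


# Hypothetical records, listed in the order an edition might print them.
letters = [
    ("Writer A", "Writer B", date(1794, 6, 2)),
    ("Writer C", "Writer B", date(1794, 6, 5)),
    ("Writer B", "Writer A", date(1794, 6, 3)),
    ("Writer C", "Writer B", date(1794, 5, 27)),
]

for pair, thread in conversations(letters).items():
    print(" & ".join(sorted(pair)))
    for sent, sender, recipient in thread:
        print(f"  {sent.isoformat()}  {sender} -> {recipient}")
```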

Now, I had a background in code before I was a graduate student. What I do now at Princeton is work with librarians and students to build new projects. We use a lot of relational databases, and network analysis… And that means a student like one I have at the moment can have a fully described, fully structured data set on a Vagrant machine that she can engage with, query, analyse, and convey to her examiners etc. Now this student was an Excel junkie, but approaching the data as a database allows us to structure the data, to think about information, the nature of sources and citation practices, and also to get major demographic data on her group and the things she’s working on.
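As a rough illustration of the move from spreadsheet to relational database, the sketch below uses Python’s built-in `sqlite3` module with an in-memory database. The three-table schema (`person`, `location`, `letter`) and the sample rows are assumptions made for the example. They are not the student’s schema or Project Quincy’s actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE letter (
    id           INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL REFERENCES person(id),
    recipient_id INTEGER NOT NULL REFERENCES person(id),
    origin_id    INTEGER REFERENCES location(id),
    sent_on      TEXT  -- ISO 8601 date string, e.g. '1794-06-02'
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Writer A"), (2, "Writer B"), (3, "Writer C")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(1, "London"), (2, "Philadelphia")])
conn.executemany("INSERT INTO letter VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 3, 1, "1794-06-02"),
    (2, 2, 3, 2, "1794-06-05"),
    (3, 1, 3, 1, "1794-07-10"),
])

# Who wrote from where, and how often: a question that is awkward in a flat sheet.
rows = conn.execute("""
    SELECT p.name, loc.name, COUNT(*) AS n
    FROM letter AS t
    JOIN person   AS p   ON p.id = t.sender_id
    JOIN location AS loc ON loc.id = t.origin_id
    GROUP BY p.name, loc.name
    ORDER BY n DESC, p.name
""").fetchall()
for sender, place, n in rows:
    print(f"{sender} wrote {n} letter(s) from {place}")
```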

Another thing we do at Princeton is work with libraries and with catalogue data – thinking about data in MARC, MODS, or METALTA records, and thinking about extracting and reformatting that data in order to query and rethink it. And we work with librarians on information retrieval, and how that could be translated to research – book history perhaps. Princeton University Library acquired the personal library of philosopher Jacques Derrida – close to 19,000 volumes (they thought it was about 15,000 until they were unpacked), so two projects are happening simultaneously. One is at the Centre for Digital Humanities, looking at how Derrida marked up the texts, and then went on to use and cite them in Of Grammatology. The other is with BibFrame – a Linked Open Data standard for library catalogues – and they are looking at books sent to Derrida, with dedications to him. Now there won’t be much overlap between those projects just now – Of Grammatology was his first book, so most of the books dedicated or gifted to him came later. But we are building our databases for both projects as Linked Open Data, all being added a book at a time, so the hope is that we’ll be able to look at any relationships between the books that he owned and the way that he was using and being gifted items. And this is an experiment to explore those connections, and to expose them via the library catalogue… But the library wants to catalogue all works, not just those with research interest. And it can be hard to connect research work, with its depth and challenge, back to the catalogue, but that’s what we are trying to do. And we want to be able to encourage more use of and access to the works, without the library having to stand behind the work or analysis of a particular scholar.

So, you can take a data structure like this, then set up your system with appropriate constraints and affordances. These need careful thought, as they will shape what you can and will do with your data later on. Continents have particular locations, boundaries, shape files. But you can’t mark out the boundaries for empires and states. The Western boundary at this time is a very contested thing indeed. In my system states are merely groups of locations, so that I can follow mercantile power, and think from a political viewpoint. But I wanted a tool with broader use, hence that other data. Locations seem very safe and neutral but they really are not; they are complex and disputed. Now for that reason I wanted this tool – Project Quincy – to have others using it, but that hasn’t happened yet… because this was very much created for my research and my research question… It’s my own little Mind Palace for my needs… But I have heard from a researcher looking to catalogue those letters, and that would be very useful. Systems like this can have interesting afterlives, even if they don’t have the uptake we want Open Source Digital Humanities tools to have. The biggest impact of this project has been that I have the schema online. Some people do use the American Foreign Correspondents databases – it is one of the few places you can find this information, especially about consuls. But having that schema shared online has been helping others to make their own systems… In that sense, the more open documentation we can do, the better all of our projects could be.
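One way to model “states as groups of locations” whose membership changes over time is to give each location–state link a date range. The sketch below assumes half-open ranges and uses simplified, year-level boundaries for Amsterdam purely as an illustration. `Membership` and `states_of` are hypothetical names, and this is not necessarily how Project Quincy stores the data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    """A location belongs to a state for the half-open date range [start, end)."""
    location: str
    state: str
    start: date
    end: Optional[date] = None  # None: still in force at the end of the study period


def states_of(location, on, memberships):
    """Return every state the location belonged to on the given date."""
    return [
        m.state
        for m in memberships
        if m.location == location and m.start <= on and (m.end is None or on < m.end)
    ]


# Simplified, year-level boundaries for illustration only; the first
# start date is clipped to the beginning of the period studied.
memberships = [
    Membership("Amsterdam", "Dutch Republic", date(1775, 1, 1), date(1795, 1, 1)),
    Membership("Amsterdam", "Batavian Republic", date(1795, 1, 1), date(1806, 1, 1)),
    Membership("Amsterdam", "Kingdom of Holland", date(1806, 1, 1), date(1810, 1, 1)),
]

print(states_of("Amsterdam", date(1794, 6, 1), memberships))
print(states_of("Amsterdam", date(1800, 1, 1), memberships))
```

Because the location itself never changes, letters attached to it keep a stable key, while the question of which state it belonged to is answered per date.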

I also created those diagrams that you were seeing – a programme that creates these allows you to make easy-to-read, easy-to-follow, annotated, colour-coded visuals. They are prettier than most database diagrams. I hope that when documentation is appealing and more transparent, it will get used more… That additional step helps people understand what you’ve made available for them… And you can use documentation to help teach someone how to make a project. So when my student was creating her schema, it was an example I could share or reference. Having something more designed was very helpful.
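The talk doesn’t name the diagramming programme. As a stand-in, this sketch emits Graphviz DOT text from a small schema description, with optional per-table colours for colour coding. The choice of Graphviz, the function name and the colour scheme are all assumptions. The output can be rendered with Graphviz’s command-line tool, e.g. `dot -Tsvg schema.dot -o schema.svg`.

```python
def schema_to_dot(tables, links, colours=None):
    """Render tables and foreign-key links as Graphviz DOT text.

    tables:  {table_name: [column, ...]}
    links:   [(from_table, to_table, label), ...]
    colours: optional {table_name: fill colour} for colour coding
    """
    colours = colours or {}
    lines = ["digraph schema {", "  node [shape=record, fontname=Helvetica];"]
    for table, columns in tables.items():
        # "{...|...}" stacks the table name above its columns; "\l" left-aligns each row.
        label = "{" + table + "|" + "".join(col + "\\l" for col in columns) + "}"
        style = f', style=filled, fillcolor="{colours[table]}"' if table in colours else ""
        lines.append(f'  {table} [label="{label}"{style}];')
    for source, target, note in links:
        lines.append(f'  {source} -> {target} [label="{note}"];')
    lines.append("}")
    return "\n".join(lines)


dot = schema_to_dot(
    {"person": ["id", "name"], "letter": ["id", "sender_id", "recipient_id", "sent_on"]},
    [("letter", "person", "sender"), ("letter", "person", "recipient")],
    colours={"letter": "lightyellow"},
)
print(dot)
```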


Q1) Can you say more about the Derrida project and that holy grail of hanging that other stuff on the catalogue record?

A1) So the BibFrame schema is not as flexible as you’d like; it’s based on MARC, but it’s Linked Open Data, and it can be expressed in RDF or JSON… And that lets us link records up. And we are working in the same library, so we can link up on people, locations, maybe also major terms, and on the accession ID number too. We haven’t tried it yet but…
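Linking the two projects on shared people, locations or accession numbers amounts to a join across two independently maintained sets of triples. The minimal sketch below assumes both datasets have already been reduced to plain (subject, predicate, object) tuples. The predicate names and identifiers are placeholders, not BibFrame vocabulary. A real implementation would more likely query RDF through a triple store or SPARQL endpoint.

```python
from collections import defaultdict

# Hypothetical triples; predicate names are placeholders, not BibFrame terms.
catalogue = {
    ("book:1", "accessionNumber", "ACC-0001"),
    ("book:1", "title", "Example title"),
    ("book:2", "accessionNumber", "ACC-0002"),
}
research = {
    ("note:7", "annotatesAccession", "ACC-0001"),
    ("note:7", "annotationType", "marginal marking"),
    ("note:8", "annotatesAccession", "ACC-9999"),  # no match in the catalogue
}


def link_on(left, left_pred, right, right_pred):
    """Pair subjects from two triple sets that share a value (e.g. an accession number)."""
    index = defaultdict(set)
    for subject, predicate, obj in left:
        if predicate == left_pred:
            index[obj].add(subject)
    return {
        (left_subject, subject)
        for subject, predicate, obj in right
        if predicate == right_pred
        for left_subject in index.get(obj, ())
    }


print(link_on(catalogue, "accessionNumber", research, "annotatesAccession"))
```

Neither dataset is modified. The links are computed on demand, which fits the picture of two structures growing side by side.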

Q1) And how do you make the distinction between the authoritative record and other data?

A1) Jill Benson’s(?) team are creating authoritative Linked Open Data records for all of the catalogue. And we are creating Linked Open Data too; we’ll put it in a relational database with an API and an endpoint to query to generate that data. Once we have something we’ll look at offering a Triple Store on an ongoing basis. So, basically it is two independent data structures growing side by side with an awareness of each other. You can connect via the API, but we are also hoping for a demo of the Derrida library in BibFrame in the next year or two. At least a couple of the books there will be annotated, so you can see data from under the catalogue.

Q1) What about the commentary or research outputs from that…

A1) So, once we have our data, we’ll make a link to the catalogue and pull in from the researcher system. The link back to the catalogue is the harder bit.

Q2) I had a suggestion for a geographic system you might be interested in called Pelagios… And I don’t know if you could feed into that – it maps historical locations, fictional locations etc.

A2) There is a historical location atlas held by the Newberry Library, so there are shapefiles. Last I looked at Pelagios it was concerned more with the ancient world.

Comment) Latest iteration of funding takes it to Medieval and Arabic… It’s getting closer to your period.

A2) One thing that I really like about Pelagios is that they have split locations from their names, which accommodates multiple names, multiple imaginings and understandings etc. It’s a really neat data model. My model is more hacked together – so in mine “London” is at the centre of modern London… That doesn’t make much sense for London, but I do something similar for Paris, where it probably makes more sense. So you could go in deeper… There was a time when I was really interested in where all of Jay’s London correspondents were… That was what got me thinking about network analysis… 60 letters are within London alone. I thought about disambiguating it more… But I was more interested in the people. So I went down a Royal Mail in London 1794 rabbit hole… And that was interesting, thinking about letters as a unit of information… Diplomatic notes fix conversations onto a piece of paper you can refer to later – capturing the information and decisions. They go back and forth… So the ways letters came and went across London – sometimes several per day, sometimes over a week within the city… is really interesting… London was and is extremely complicated.
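Here is a minimal sketch of the separation of places from names that Bauer praises in Pelagios. It assumes a simple gazetteer in which each place has one identifier and one set of coordinates, while any number of names (in different languages or periods) point at that identifier. The ids, field names and sample names are illustrative assumptions, not Pelagios’s actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    place_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class PlaceName:
    place_id: str   # points at a Place, so one place can carry many names
    name: str
    language: str


places = {
    "p1": Place("p1", 51.5074, -0.1278),   # London
    "p2": Place("p2", 48.8566, 2.3522),    # Paris
}
names = [
    PlaceName("p1", "London", "en"),
    PlaceName("p1", "Londres", "fr"),
    PlaceName("p1", "Londinium", "la"),
    PlaceName("p2", "Paris", "fr"),
]


def build_index(names):
    """Map each case-folded name to the set of place ids it can refer to."""
    index = defaultdict(set)
    for entry in names:
        index[entry.name.casefold()].add(entry.place_id)
    return index


index = build_index(names)


def resolve(name):
    """Return the place ids a name may refer to (empty if unknown)."""
    return sorted(index.get(name.casefold(), set()))


for pid in resolve("Londres"):
    print(pid, places[pid])
```

`resolve` returns a list because a single name can also point at several places, which is exactly the ambiguity that a point-per-city model has trouble expressing.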

Q3) I was going to ask about different letters. Those letters in London sound more like memos than a letter. But the others being sent are more precarious, at more time delay… My background is classics so there you tend to see a single letter – and you’d commission someone like Cicero to write a letter to you to stick up somewhere – but these letters are part of a conversation… So what is the difference in these transatlantic letters?

A3) There are lots of letters. I treat letters capaciously… If there is a “to” or “from” it’s in. So there are diplomatic notes between John Jay and George Hammond – a minister, not an ambassador, as the US didn’t warrant that. Hammond was bad at his job – he saw a war coming and therefore didn’t see value in negotiating. They exchange notes, forward conversations back and forth. My data set for my research was all the letters sent to Jay, not those sent by Jay. I wanted to see what information Jay had available. Hammond kept a copy of all his letters to Jay, as evidence for very petty disputes. The letters from the West Indies were from Nathanial Cabbot Dickinson, who was sent as an information collector for the US government. Jay was sent to Europe on the treaty… So the kick-off for Jay’s treaty is the changes that saw food supplies to the British West Indies being stopped. Hammond actually couldn’t find a ship to take evidence against admiralty courts… They had to go through Philadelphia, then through London. So that cluster of letters includes older letters. Letters from the coast include complaints from angry American consuls… There are urgent cries for help from the US. There is every possible genre… One of the things I love about American history is that Jay needs all the information he can get. When you map letters – like the Republic of Letters project at Stanford – you have this issue of someone writing to their tailor, not just important political texts. But for diplomats all information matters… Now you could say that a letter to a tailor is important, but you could also say you are looking to map the boundaries of intellectual history here… Now in my system I map duplicates sent transatlantically, as those really matter, since not all of them arrived, etc. I don’t map duplicates within London, as that isn’t as notable and is more about after-the-fact archiving.
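The duplicate rule at the end of this answer can be written as a simple filter. This sketch assumes each record notes its origin and destination continents and, for copies, the id of the original. Using “different continents” as a proxy for a transatlantic crossing is a simplification, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LetterCopy:
    letter_id: str
    origin_continent: str
    destination_continent: str
    duplicate_of: Optional[str] = None  # id of the original, if this is a copy


def copies_to_map(letters):
    """Keep originals, plus duplicates only when they crossed between continents.

    Transatlantic duplicates matter because not every copy arrived; duplicates
    that stayed local are treated as after-the-fact archiving and skipped.
    """
    kept = []
    for letter in letters:
        crossed_ocean = letter.origin_continent != letter.destination_continent
        if letter.duplicate_of is None or crossed_ocean:
            kept.append(letter)
    return kept


sample = [
    LetterCopy("L1", "North America", "Europe"),
    LetterCopy("L1-dup", "North America", "Europe", duplicate_of="L1"),
    LetterCopy("L2", "Europe", "Europe"),                      # e.g. within London
    LetterCopy("L2-dup", "Europe", "Europe", duplicate_of="L2"),
]
print([letter.letter_id for letter in copies_to_map(sample)])
```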

Q4) Did John Jay keep diaries that put this correspondence in context?

A4) He did keep diaries… I do have analysis of how John Quincy Adams wrote letters in his time. He created subject headings, he analysed them, he recreated a filing system and way of managing his letters – he’d docket his letters, noting date received. He was like a human database… Hence naming my database after him.

Q5) There are a couple of different types of use for a tool like this. There is your use, and then there is reuse of the engineering. I have correspondence earlier than Jay’s, mainly centred on London… Could I download the system and input my own letters?

A5) Yes, if you go to eafsd.org you’ll find more information there and you can try out the system. The database is Project Quincy and that’s on GitHub (GPL 3.0) and you can fire it up in Django. It comes with a nice interface. And do get in touch and I’ll update you on the system etc. It runs in the Django framework, can use any database underneath it. And there may be a smaller tractable letter database running underneath it.

Comment) On BibFrame… We have a Library and Information Studies programme in which we teach BibFrame. We set up a project with a teaching tool which is also on GitHub – it’s linked from my staff page.

Do you think any system can be generically reused?

Have you submitted this to JORS?


Who Wrote the County Surveys?

Some of the significance and much of the character of Sir John Sinclair’s ‘great pyramid’ comes from the many authors involved in reporting and writing up the surveys. In the case of the Statistical Accounts of Scotland, Sinclair drafted in local ministers to describe their parishes. Knowing their parishioners intimately, these men of the cloth were able to answer detailed questions about the place and the people, and frequently gave their individual opinions and perspectives on local tales, customs and morals.

The authors of the County Surveys, in contrast, were not of one profession or social position. The surveys were commissioned from a wide range of ‘intelligent gentlemen’, including university professors, farmers, landowners, clerics, professional writers, and political activists. Moreover, it was planned that “every farmer and gentleman in the district” would have the opportunity to read and remark on the first series, which would be revised to incorporate all their insights before final publication in the second series. It was, in other words, to be a collective undertaking by many hands, designed to provide the board with “a greater variety of information and a greater mass of instructive observations from a greater number of intelligent men for their consideration and guidance.”* The incentive for such men to give up their time and energy was not financial; indeed, several of the surveyors worked for free and most claimed only their expenses. Rather, they worked in the name of the public good and in the belief that their undertaking would be of significant value to their nation and its people.


Arthur Young, 1741-1820

While the stories of many of these contributors are lost to history, a few were historically notable individuals. The Reverend Dr. Walker, for example, who surveyed the Hebrides, was Professor of Natural History at the University of Edinburgh. A distinguished scientist with interests in botany, mineralogy and geology, and a pioneer in the study and teaching of agriculture, he had conducted exploratory tours of the Western Isles on behalf of the Board of Annexed Estates in the 1760s and 70s: a more suitable candidate for surveying these counties for the Board of Agriculture would be hard to imagine. Where Walker was a pillar of the establishment, Charles Vancouver was a more colourful figure. Like his older brother the explorer George Vancouver (who famously charted the Pacific Coast of North America in the early 1790s, and after whom the Canadian city of Vancouver is named), Charles was a traveller and frontiersman in the American colonies. Of Dutch origin, and originally a farmer, he had spent decades working the land and writing about ‘natural philosophy’ in newly settled Kentucky, before returning to the UK in the early 1790s. He would later work in the Netherlands, before returning to the Americas, using his ‘practical expertise’ in cultivation and farming to support himself. Vancouver’s friend and secretary to the Board, Arthur Young, was also an author and completed the survey for Suffolk. Young began his career in a mercantile house, but was more interested in travel, literature and politics than commerce. The author of four novels, pamphlets, magazines, and a number of travelogues, he was also interested in experimental agriculture and in the rights of agricultural workers. Although his experiments did not produce revolutionary results, as an astute social and political observer “he remains the greatest of English writers on agriculture.” (Higgs, Dictionary of National Biography, 1885-1900, Vol. 63, p. 362)

In the combined wisdom of such fascinating, experienced and erudite writers, supported by the numerous contributors whose names are lost to posterity, the county surveys offer us insights not just into the agriculture of the time but also into the intellectual milieu and social conditions of Romantic Britain.

*All quotations in this paragraph are from Appendix G of Sinclair’s 1797 Communications to the Board of Agriculture, on Subjects Relative to the Husbandry and Internal Improvement of the Country, Volume 1, pp. xlviii-xlix.


Who Bought the County Surveys?

In a recent post (‘Who Read the County Surveys?’) I wrote about the insights that book reviews can give into historical reception and reading practices. Another interesting way of exploring reception is through researching the price of a book, for the amount that booksellers charge can give clues not just to the perceived value of the text but also to the levels of disposable income available to the target markets.

The prices of the County Surveys varied, usually between 7 and 12 shillings, when they were sold in boards (this was common at the time; purchasers would then arrange for binding according to their own tastes and budgets). Using the great calculators provided on the brilliant Measuring Worth website, we can see how much this equates to in today’s money (2013 is the most recent data available), as well as how it compares to the average income and labour costs of the time.

The surveys were published between 1794 and 1817, so let’s use the year 1806, in the middle of the range, as our point of comparison. Seven shillings in 1806 equates to a real price in 2013 of £24.77. Twelve shillings equates to a real price of £42.46. On this information, the surveys seem to be priced fairly reasonably: not particularly expensive, although one would not call them cheap. This apparent affordability may be deceptive, however: in order to really benefit from the instructive comparisons between counties that the surveys were intended to reveal, purchasers would have to buy multiple volumes. Moreover, the real price only indicates the relative cost of the volume, and must be read against the incomes of the time.

The average male agricultural worker in 1806 earned somewhere between £24 7s and £38 7s* per year. Let’s base our calculations on the lower end of the spectrum. There were 20 shillings to the pound, so £24 7s was 487 shillings per year, or roughly 40.6 shillings per month: so 7 shillings is roughly 17% of the average worker’s monthly wage. The contemporary income value of £24 7s is £26,710. This gives a monthly wage of £2,225.83, and 17% of this is about £378. This changes the picture quite significantly, suggesting the relative value of the book to a worker is much higher than the ‘real price’. Would you spend £378 on a single book? What kind of person would have the means to do that?
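The arithmetic in the paragraph above can be checked step by step. This sketch uses only the figures given in the post: £24 7s per year, a 7s volume, and the £26,710 income value for 2013. Applying the rounded 17% gives the £378 quoted. Carrying the unrounded share (about 17.2%) through gives roughly £384 instead. Either way, the conclusion is unchanged.

```python
from fractions import Fraction

SHILLINGS_PER_POUND = 20

# £24 7s per year (lower end of the 1806 range) expressed in shillings.
annual_shillings = 24 * SHILLINGS_PER_POUND + 7          # 487
monthly_shillings = Fraction(annual_shillings, 12)        # about 40.58

# A 7-shilling volume as a share of one month's wage.
share = Fraction(7) / monthly_shillings                   # 84/487, about 17.2%

# Measuring Worth's 2013 income value for £24 7s, per month.
monthly_income_2013 = 26_710 / 12                         # about £2,225.83

cost_at_17_percent = 0.17 * monthly_income_2013           # the post's rounded figure
cost_unrounded = float(share) * monthly_income_2013

print(f"{float(monthly_shillings):.2f}s per month")
print(f"{float(share):.1%} of a month's wage")
print(f"£{cost_at_17_percent:.0f} (17%) vs £{cost_unrounded:.0f} (unrounded)")
```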

We know that ‘improvement’ was the pursuit of landowners and that, notoriously in the case of the Scottish Clearances, changes could be instituted at the expense of smaller tenant farmers. The figure of £378, which is for one volume rather than a set, suggests that the practical knowledge the Surveys represent was only really affordable to the relatively wealthy; common agricultural labourers likely could not have afforded the books. It thus raises interesting questions about the politics of Enlightenment improvement. To explore this further, it would be very interesting to research other reading contexts such as borrowing: were the surveys acquired by libraries of the time (such as Innerpeffray, for example), and did their members borrow the volumes?

As Measuring Worth is at pains to point out, establishing value is far from an exact science: it involves subjective interpretation and, in the case of historical values, there is obviously some speculation involved. I think these figures are interesting nonetheless, and although they do not lead to reliable conclusions, they do give a bit of a sense of the historical circumstances in which the Surveys were produced and consumed.


*This figure comes from taking the average in 1832 (the closest historical match I’ve found, from this paper by Gregory Clark, University of California) and using the Measuring Worth calculators to get the comparable wage for 1806.



1893 map of Shetland, from Cassell’s Gazetteer of Great Britain and Ireland; Published by Cassell and Company Limited, London.

One of the aims of our current project is to establish the cost and workflow requirements for creating a complete virtual collection of the County Surveys. Many of the surveys are already available in various online archives, but discovering them is not as easy as it could be, and quality and accessibility remain quite variable. In the longer term, we hope to aggregate high-quality full-text files that we can use for research-led text mining. In order to establish the projected costs and labour involved in such a project, as part of the pilot we plan to identify one or two rare surveys and digitise them according to current best practices, documenting this process for ourselves and others. Clearly, as funds are limited, it makes sense to focus on volumes that are not already available in digital form and which are rare even in print.

One such candidate is the General View of the Agriculture of the Shetland Islands by John Shirreff, which was published in Edinburgh by Constable & Co in 1814. According to one early 19th-century reviewer, this volume was of peculiarly special interest to contemporary readers, for it describes “a remote part of the British dominions, with which many readers are perhaps as little acquainted as with the Islands in the South Sea; and they exhibit a state of Society very different in several respects from that which prevails in the other provinces of Britain.” Indeed, comparing Orkney and Shetland to the wilds of the American frontier, he suggests the inhabitants of these northern islands belong to a different, less civilised time and “bring into view a stage in the progress of improvement at which the inhabitants of the South has arrived some centuries ago, and which had been long since passed over by the people of almost every other part of the Island.” (The Farmer’s Magazine 15 (Aug 1814): 343) The exoticism, snobbery and geo-political bias of these remarks seem almost comical today, but they suggest that the contents of the Shetland survey may be of particular importance to historians, given the apparently substantial differences from more ‘advanced’ mainland practices. Happily we will all be able to judge for ourselves soon, because a print copy of the Shetland survey is held here in Edinburgh at the Royal Botanic Gardens, and they have kindly agreed to allow its digitisation: we’ll post about this process once it gets underway.

The Need for ‘Improvement’: On Potato Flour



Sir John Sinclair, ‘On Potato Flour’, York Herald (1817)

One of the key motivations behind the commissioning of the County Surveys was the Enlightenment zeal for ‘improvement’, which characterised late 18th-century Britain and drove the agricultural revolution. Until this time, the fundamentals of farming had changed little over the centuries, with many farmers still using the runrig and open field systems that originated in the Middle Ages. Attempting to modernise and increase productivity, ‘improving’ farmers developed new principles, worked to cultivate ever greater areas of land, and applied new scientific thought and discoveries to their practices. These changes had vast social implications. As the notorious Highland Clearances showed, in some cases they were devastating for farming communities and rural life. Yet, as thinkers like Sinclair knew, ‘improvement’ would be key to Britain’s future, enabling the nation to support its growing population in towns and in the colonies as it became a modern industrial state.

His 1817 article ‘On Potato Flour’ gives an insight into Sinclair’s view. In it, he writes of recent experiments with drying and milling potatoes to create a cost-effective flour that could be stored for long sea voyages ‘without being injured by vermin.’ Pointing to the documentation of this process in The General View of the Agriculture of the County of Kent, he states that such new approaches must be ‘prosecuted with zeal, until so important an object as that of enabling this country to supply itself with food, from its own resources, is attained.’ Indeed, such is the importance of these new methods, he concludes, that they are ‘entitled to the attention and support of the public’. National debate on such agricultural issues is both warranted and necessary, and in this light it appears that the intended readership of, and interest in, the agricultural County Surveys is likely to have been considerably broader than we might now assume.