BL Labs Roadshow 2016

1330  Introduction
Dr Beatrice Alex, Research Fellow at the School of Informatics, University of Edinburgh

1335  Doing digital research at the British Library
Nora McGregor, Digital Curator at the British Library

The Digital Research Team is a cross-disciplinary mix of curators, researchers, librarians and programmers supporting the creation and innovative use of the British Library’s digital collections. In this talk Nora will highlight how we work with those operating at the intersection of academic research, cultural heritage and technology to support new ways of exploring and accessing our collections: getting content in digital form and online; collaborative projects; and offering digital research support and guidance.

1405  British Library Labs
Mahendra Mahey, Project Manager of British Library Labs.

The British Library Labs project supports and inspires scholars to use the British Library’s incredible digital collections in exciting and innovative ways for their research, through various activities such as competitions, awards, events and projects.

Labs will highlight some of the work that they and others are doing around digital content in libraries and also talk about ways to encourage researchers to engage with the British Library. They will present information on the annual BL Labs Competition, which closes this year on 11th April 2016. Through the Competition, Labs encourages researchers to submit their important research question or creative idea which uses the British Library’s digital content and data. Two Competition winners then work in residence at the British Library for five months, showcasing the results of their work at the annual Labs Symposium in November 2016.

Labs will also discuss the annual BL Labs Awards, which recognise outstanding work already completed that has used the British Library’s digital collections and data. This year, the Awards will commend work in four key areas: Research, Artistic, Commercial and Teaching / Learning. The deadline for entering the BL Labs Awards this year is 5th September 2016.

1420  Overview of projects that have used the British Library’s digital content and data
Ben O’Steen, Technical Lead of British Library Labs.

Labs will further present information on various projects such as the ‘Mechanical Curator’ and other interesting experiments using the British Library’s digital content and data.

1500 Coffee and networking

1530 BL Labs Awards: Research runner up project: “Palimpsest: Telling Edinburgh’s Stories with Maps”
Professor James Loxley, Palimpsest, University of Edinburgh

Palimpsest seeks to find new ways to present and explore Edinburgh’s literary cityscape, through interfaces showcasing extracts from a wide range of celebrated and lesser known narrative texts set in the city. In this talk, James will set out some of the project’s challenges, and some of the possibilities for the use of cultural data that it has helped to unearth.

1600 Geoparsing Historical Texts
Dr Claire Grover, Senior Research Fellow, School of Informatics, University of Edinburgh

Claire will talk about work the Edinburgh Language Technology Group has been doing for Jisc on geoparsing historical texts, such as the British Library’s Nineteenth Century Books and the Early English Books Online Text Creation Partnership, which is creating standardised, accurate XML/SGML-encoded electronic text editions of early print books.

1630 Finish

Feedback for the event
Please complete the following feedback form.


Digital Scholarship Day of Ideas 2014: “Data” – LiveBlog

Today I am at the University of Edinburgh Digital Humanities and Social Sciences Digital Scholarship Day of Ideas 2014, which is taking place at the Edinburgh Centre for Carbon Innovation, High Street Yards, Edinburgh. This year’s event takes, as its specialist focus, “data”. These notes have been taken live so my usual disclaimers apply and comments, questions and corrections are, as ever, very much welcomed.

Introduction: Prof Dorothy Miell, Head of College of Humanities and Social Science

I’m really pleased to welcome everybody here today. This is our third Digital Scholarship Day of Ideas and they are an opportunity to bring in interesting outside speakers, but also for all of us interested in this area to come together, to network and build relationships, and to take work forward. Again today we have a mixture of international and local speakers, and this year we are keeping us all in one room so we can all hear from those speakers. I am really glad to see such a popular take-up for the day, and the mix of attendees from across the College and Information Services.

Digital HSS, which organised this event, is a strand of work that Sian Bayne leads, and there are a series of events throughout the year in that strand, as well as these days of ideas.

Today we are going to be talking about the idea of data, particularly what data means for scholars in the humanities, how can we understand the term Big Data that we hear in the Social Sciences, and how can we use these concepts in our own work.

Sian Bayne, Associate Dean (digital scholarship), is introducing our first speaker. Annette describes herself as an “itinerant researcher”. Annette’s work focuses on internet and qualitative research methods, and the ethical aspects of internet research. I think she has a real talent for great paper titles. One of my favourites is “Undermining Data” – which today’s talk is partially based on – but I also loved that she had a paper entitled “Fieldwork in Social Media: What Would Malinowski Do?”. Anyway, I am delighted to welcome Professor Annette Markham.

Can we get beyond ‘data’? Questioning the dominance of a core term in scientific inquiry - Prof Annette Markham, Department of Informatics, Umeå University, Sweden; Department of Aesthetics & Communication, Aarhus University, Denmark; School of Communication, Loyola University, Chicago (session chair: Dr Sian Bayne)

As Sian mentioned I have spent a lot of time… I was a professor for ten years before I quit in 2007 and pushed myself across other disciplines, to push forward some philosophical work on methods. For the last 5 years or so I’ve been thinking about innovative and creative ways to think of methods that resonate better with the complexity of modern life. I work with STS – Science and Technology Studies – scholars in Denmark, informatics scholars, machine learning scholars in Boston, language scholars in Helsinki… So a real range across the disciplines.

The work today is around methods work I’ve done with colleagues over the last few years, much of it captured in a special issue of First Monday: Vol 18, No 10: Making Data – Big Data and Beyond Special Issue. And this I’m doing from a post humanist, STS, non positivist sort of perspective, thinking about the way in which data can be used to indicate that we share an understanding when actually we are understanding the same information in very different ways. For some, data can be an easy term, consistent with your world view… a word that you understand in your own method of inquiry. Data and data sets might be familiar parts of your work. We all come from somewhere, we all do research… what I say may not be new, or may be totally new… it may resonate… or not at all… but I want this to be a provocation, to make you question and think about data and our methods.

So, why me? Well, mainly I guess because I know about methods… so this entire talk is part of a bigger project where I look at method, at forms of inquiry… but looking at method directly isn’t quite right; you have to look at it from the side, from the corner of your eye… And to look at method is to look at the conditions in which we undertake inquiry in the 21st century. For many of us inquiry is shaped by funding, and funding privileges that which produces evidence, which can be archived. For many qualitative researchers this is unthinkable… a coffee stain on field notes might have meaning for you as an ethnographer, but how can that have meaning for anyone else? How can that be archivable or sharable or mineable?

And I think we also have to think about what it is that we do when we do inquiry, when we do research… to get rid of some of the baggage of inquiry – like collecting data, analysing and then writing up – as there are many forms of inquiry that don’t fit that linear approach. Another way to think of this is to think of frames, of how we frame our research. As an American scholar trained in the Chicago School of Sociology, I cannot help but cite Erving Goffman. Frames tell us to focus on something, and to ignore other things… So if I show you a picture of a frame here…. If I say Mona Lisa you might think of that painting. If I tell you to look outside of the frame you might envision the wall, or the gallery, or what sits outside that frame. And if you change the frame it changes what you see, what you focus on… so if I show you a frame diagram of a sphere and say that is a frame, a frame for research, what do you see? (some comment they see the globe, they see 3D techniques, they see movement). The frame tells us to think about certain phenomena…. and also not to think about others… if I say Mona Lisa now… we think of very different things… Similarly an atomic structure type image works as a very different type of frame – no inside or outside but all interconnected nodes… But it’s almost impossible to easily frame, again, Mona Lisa…

So, another frame – a not-quite-closed drawn circle – and this is to say that frames don’t tell you a lot about what they do… and Goffman and others say that frames work best when they are almost invisible…. like maps (except, say, the McArthur corrective map). So, by repositioning a map, or by standing in an elevator the wrong way and talking to people – as Harold Garfinkel had his students do – we have a frame that helps us look differently at what we do. “Data” can make us think we look at the same map, when we are not… Data should not be understood as a shortcut term or a metonym; it can be taken as preexisting aspects of the phenomenon when it has in fact been filtered and created through a process, and organised in some way. Not the meaning I want for my work, but not good or bad…

So I want to come back to “How are our research sensibilities being framed?”. In order to understand inquiry we have to understand three other things. (1) How do we frame culture and experience in the 21st Century; (2) How do we frame objects and processes of inquiry; (3) How do we frame “what counts” as proper and legitimate inquiry?

For me (1), as someone focused on internet studies, I think about how our research context has shifted, and how our global society has shifted, since the internet. It’s networked, for instance. But it is also interesting to note how this frame has shifted considerably since the early days of the internet… So taking an image from the Atlas of Cyberspace – an image suggesting the internet as a tunnel. But cityscapes were also common ways to understand the world. MIT suggested different ways to understand a computer interface. This is about what happened, the interests in the early days of the internet in the 90s. That playfulness and those radical ideas change as commerce becomes a standard part of the internet. Skipping forward to Facebook for instance… interfaces are easy to understand, friendly, almost all social media looks the same, almost all websites look the same… and Google is a real model for this as their interface has always been so clean…

But I think the significant issue here about socio-technical research and understanding has been shaped by these internet interfaces we encounter on a daily basis.

For me frame (2) hasn’t changed that much… two slides…. this to me represents any phenomenon or study – a whole series of different networks of nodes connected to the centre. There is no obvious starting point. It is not clear what belongs in the centre – a person, an event, a device – and there are all these entanglements characterising these relationships. And yet our methods were designed for, and work best in, the traditional anthropological fieldwork conditions… And the process is still very linear in how we understand it – albeit with iterative cycles – but it’s still presented that way. And that matters as it privileges the neat and tidy inquiry over the messy inquiry, the inquiry without clear conclusions… so how we frame inquiry hasn’t changed much in terms of inquiry methods.

Finally, and briefly, (3) my provocation is: I think we’ve gone backwards… you can go back to the 60s or earlier and look at feminist scholars and their total reunderstanding of scientific method, and situated research. But as budgets tighten, as research is funded under more conservative conditions, this stuff that isn’t well understood isn’t as popular… so we’ve seen a return to evidence based methods, to clear conclusions, to scientific process. Particularly in media coverage of research. It’s still a dominant theme…

So… What is data?

I don’t want to be glib here. The word “data” is awfully easy to toss around. It is. In everyday life this term is a metonym for lots of stuff, highly specific but unspecified stuff. It is arguably quite a powerful rhetorical term. As Daniel Rosenberg says, the use of the term data has really shifted over the last few hundred years. It appeared in the 1760s or so. Many of those associated with the word only had it appear in translations posthumously. It is derived from Latin and, in the 1760s, it was about conditions that exist before argument. Then as something that exists before analysis. And in that context data has no theoretical baggage. It cannot be questioned. It always exists… has an incontrovertible it-ness. A “fact” can be proven false. But false data is still “data”. Over time and usage “data” has come to represent the entirety of what the researcher seeks and needs in pursuit of the goal of inquiry. To consider the word in my non-positivist stance, I see data as “what is data within the more general idea of inquiry”. In the mid 1980s I was taught not to use that word: we collect materials, we collect artefacts as ethnographers… and we construct… data… see, even I used it there, so hard not to. It has been operationalised as discrete and incontrovertible.

Big data has brought critical responses out; they are timely and subtle responses… and boyd and Crawford (2011) came up with six provocations for big data. And Nancy Baym (2013) also talks about all social media metrics being a nonrepresentative partial sample. And there is an inherent ambiguity that arises from decontextualising a moment of clicking from a stream of activity and turning it into a standalone data point. Bruno Latour talked about this too, in talking about soil from the Amazon, of removing something from its context.

And this idea disturbs me, particularly when understanding social life as represented in technology. Even outside the western world, even if we don’t use technology, as Sonia Livingstone notes, we are all implicated in technology in our everyday life. So, I want to show you a very common metaphor for everyday life in the 21st century – a Samsung Galaxy SII ad. I love this ad – it’s low hanging fruit for rhetorical critique! It flattens everything – your hopes and dreams offered at equal value to services or products you might buy… everything flattened as equal into infinitesimal bits that swirl around, can be transmitted, transformed, controlled – as long as we purchase that particular phone. An interesting depiction of life as data – and of humans and their data as new. It’s not unusual, and not a problem as long as we don’t buy into it as a notion uncritically.

This ad troubles me more. This is Global Pulse, an NGO, a subcommittee of the UN, that distributes data on prices in the developing world. It follows the story of a woman affected by price shifts. So this ad… it has a lot of persuasive power and I want to be careful about this argument that I make to conclude…

I really like what we get from many big data analyses. I have nothing against big data or computational analysis. Some of the work you hear about today is extraordinary, powerful… I won’t make an argument against data, against using data to solve certain problems. I want to talk about what Kate Crawford calls “big data fundamentalism”. I wouldn’t go that far… algorithms can be powerful, but not all human experience can be reduced to data points. And not everything can be framed by big data. Data can be hugely valuable but it’s important to trouble what is included and what is missed by big data. That advert implies data can be understood as it happens. Data is always filtered, transformed, framed… from that you draw conclusions. Data operates within the larger framework for inquiry. We have to remember that we have strong and robust models for inquiry that do not focus on data as the core of inquiry. Data might be important – it should be the chorus, not the main player on the stage. The focus of non-positivist research is upon collecting the messy stuff….

And I wanted to show a visualisation, created in Gephi, by one of my colleagues who looked at Arab Spring coverage in media and social media in Sweden… In doing this, as he shifts the algorithm he is manipulating data, changing how the data appears to us, changing variables to make his case… most of the algorithms in Gephi create neat round visualisations. Alex Galloway critiques this by saying that some forms may not be representable, and this tool does not accommodate that, or encourages us to think that all networks can be visualised in that way. These visualisations and network analyses are about algorithms… So I sort of want to leave it there, to say that data functions very powerfully as a term… and that from a methodology perspective it creates a very particular frame that warrants concern, particularly when the dominant context tells us that data is the way to do inquiry.

Q&A

Q: I enjoyed that but I find you more pessimistic than I would be. That last visualization shows how different understandings of that network are possible. It’s easy to create a strawman like this, but I’ve been reading papers where videos are included… the audience can all think about different interpretations. We can click on a data point, to see that interview, to see that complex account of that point. There are many more opportunities to create richer entanglements of data… we should emphasize those, emphasize that complexity rather than hide the complexity of how that data is created.

A: Thanks for finishing my talk for me! If we consider the generative aspects of inquiry then we can use the tools to be transparent about the playfulness of interrogation, by offering multiple interpretations… I talk about a process of Borrow / Play / Move / Interrogate / Generate. So I was a bit pessimistic – that Global Pulse ad always depresses me. But I agree!

Q: I was taken by your argument that human experience cannot be reduced to a single data point… what else can it be reduced to… it implies an alternative to data… so what might that be?

A: I think that question is not one that I would ask. To me that is not the most important question. For me it’s about how we might make social change – how might I create interventions, how might I represent someone’s story. I’m not saying that there is an alternative… but that discussion of data in general puts us in that sort of terrain… and what is more interesting or important is to consider why we do research in the first place, why do we want to look for a particular phenomenon… to not let data overwhelm any other arguments.

Q: I think your talk noted that big data focuses on how people are similar and what similarities there are, whilst ethnography tends to be about difference. That makes those data-tracking approaches that cover most people particularly depressing. Is that the distinction though?

A: I think I would see it as simplification versus complexity… how do we envision inquiry in ways that try to explode the phenomenon into an even more complex set of entanglements and connections. It may be about differences but doesn’t have to be… it’s about what emerges from a more generative process… it’s an interesting reading though, I wouldn’t disagree.

Q: I wanted to share a story with you of finishing my PhD, a study of social workers when I was a social worker. I had an interview for a research post at the Scottish Government and one of the panel asked me “and how did you analyze your data” and I had never thought of my interviews and discussions as data… and since then I’ve been in academia for 20 years, but actually I’ve had to put that idea, that people are not data, aside to progress my career – holding onto the concept but learning to talk the talk…

A: I can relate to that. You hear that a lot, struggling to find the vocabulary to make your work credible and understandable to other people. With my students I help them see that the vocabulary of science is there, and has been dominant… and help them use other terms to replace the terms they use in their inquiry, in their method… these terms of mine (Borrow / Play / Move / Interrogate / Generate) to get them thinking another way, to make them look at their work in a different way from that dominant method. These become a way that people can talk about the same thing but with less weighty vocabulary, or terms that do not carry that baggage. So that’s one way I try to do that…

Crowd-sourced data coding for the social sciences: Massive non-expert coding of political texts - Prof Ken Benoit, Professor of Quantitative Social Research Methods, London School of Economics and Political Science (session chair: Prof John McInnes)

Professor John McInnes is introducing our next speaker, Professor Ken Benoit. Ken not only talks about big data but has the computational skills to work with it.

I will be showing you something very practical…. I had an idea that I’d do something live… so it could be an Epic Fail!

So I took the UKIP European Election Manifesto… converted to plain text in my text editor. Made every sentence one line… put into spreadsheet… Then I’m using CrowdFlower with some text questions… So I’ll leave that to run…
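To make that concrete, here is a minimal sketch of the preprocessing Ken describes – manifesto text to one sentence per line, ready for upload as a spreadsheet. The filename and the naive regex splitter are illustrative assumptions of mine, not his actual pipeline:

```python
# Minimal sketch: split a plain-text manifesto into one sentence per line
# and write a CSV ready for upload to a crowd-sourcing platform.
# The filename and the naive splitter are illustrative assumptions.
import csv
import re

with open("ukip_manifesto.txt", encoding="utf-8") as f:
    text = f.read()

# Naive split on sentence-ending punctuation followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

with open("manifesto_sentences.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["sentence_id", "sentence"])
    for i, sentence in enumerate(sentences, start=1):
        writer.writerow([i, sentence])
```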

So back to my talk… the goal is to measure unobservable quantities… we want to understand ideology – the “left-right” policy positions… we have theories of how people vote, that they vote for parties most proximate to their own positions. For political scientists this is a huge issue. We might also want to measure corruption, cultural values, power… but today I’m going to focus on those policy positions.

A lot of political science data is “created” by experts… a lot of it is, frankly, made up. A lot of it is about hand-coded text units – you take a text, you unitise it…. e.g. immigration policy statements… (Comparative Manifesto Project, Policy Agendas Project). Another way is solicited expert opinion (Benoit and Laver, Chapel Hill, etc.) – I worked with Laver for years looking at understanding of the policies of each party. It’s expensive work, takes an expert an hour to fill out a form… a real headache… We have expert-completed checklists (Polity, Comparative Parliamentary Democracy Dataset, Freedom House, etc.). And there are coded international events (KEDS, Penn State Event Data). And we have inductively scaled quantities (factor analysis, such as “Billy Joe Jimbon” factor analysis).

So what are some of the problems of coding using “experts”? Who are experts anyway? It’s difficult to find coders who are suitably qualified – hard to find them AND hard to train them… most of the experts coding texts tend to be PhD students who find it a pleasing thing to do whilst avoiding finishing their thesis. There can be knowledge effects, since no text is ever anonymous to an expert coder with country knowledge. Human coders are unreliable – their codings of the same text unit will vary wildly. And even single coding is relatively costly and time-consuming, so only one coder codes each text. Even when you pay the experts, they are still doing you a favour!

So I will talk about an alternative solution to this problem, and that problem is about classifying text units. The idea is to observe a political party’s policy position by content analysis of its texts, and party manifestos are the most common texts. The idea behind content analysis is breaking text into small units and then using human judgement to apply pre-defined codes, e.g. coding something as right wing policy. And usually that is done for LOTS of sentences by only ONE coder.

Tomorrow I’ll be in Berlin… the biggest (only?) game in town is the Comparative Manifesto Project (CMP). This is a huge project with 3500 party manifestos from 55 countries, covering 1945-2010 and still going. Human coders are trained and have PhDs. They break manifestos into sentences and use human judgement to apply pre-defined codes. Each sentence is assigned to one of 56 policy categories. Category percentages of the total text are used to measure policy. And each manifesto is seen by just one coder, and coded by just one coder.
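As a rough illustration of that measure, category percentages can be computed directly from the per-sentence codes; the category labels below are invented for the example:

```python
# Hedged sketch of a CMP-style measure: each sentence carries one policy
# category, and category percentages of the whole text measure policy.
# Category labels here are invented for illustration.
from collections import Counter

coded = ["welfare+", "market_economy", "welfare+",
         "environment", "market_economy", "market_economy"]

counts = Counter(coded)
percentages = {cat: 100 * n / len(coded) for cat, n in counts.items()}
print(percentages)  # e.g. market_economy: 50.0, welfare+: 33.3, ...
```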

So… what could we do? Crowd-sourcing involves outsourcing a task by distributing it to an unspecified group, usually in parts… the basic idea of this, versus expert coding, is that it reduces the expertise of each of the coders, but increases the number of coders. Distribute texts for coding partially and randomly. Increase the number of coders per sentence. Treat different coders as exchangeable – and anonymous; we don’t care if they are sitting in an internet cafe in Estonia in their underwear, or whether they engage on a day off from a bank…

The coding scheme here is a more simplified one. We applied it to 18 of the “big 3” British party manifestos from 1987 to 2010. So a sentence can be coded as Economic, Social or neither… under either of the first two categories there are further options (anti, neutral or pro), from “Very left” to “Very right”, or “Very liberal” to “Very conservative”. And there is a 10 question test with correct codings, to guide the coder and to keep them on track.

So, to get this started we wanted a comparison we understood: we wanted to compare crowd coding to expert coding. So my colleague and I, and some graduate students, coded a total of 123,000 sentences between us, with between 4 and 6 coders per manifesto, using the same system to be deployed to the crowd. This was a benchmark for the crowd-sourcing end of things. It took ages to do… that’s a lot of expert coding, and in practice you wouldn’t get this happening… For the crowdsourced codings we got almost twice as many codings…

We used an IRT-type scaling model to estimate position. We didn’t want to just take averages here… we used a multinomial method. We treat each sentence as an item to which the manifesto is responding, and the left-ness or right-ness (etc.) as a quality it exhibits. Despite that complexity, we found that a mean of means approach led to very similar results. We are trying to simplify that multinomial method… but now the results…
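A sketch of that simpler mean-of-means aggregation, with invented codes on a -2 (very left) to +2 (very right) scale:

```python
# Sketch of "mean of means": average the coders' scores for each sentence,
# then average the sentence means to get a manifesto position.
# The scale (-2 very left ... +2 very right) and codes are illustrative.

sentence_codes = {          # sentence_id -> codes from several crowd coders
    1: [-1, -2, -1],
    2: [0, 1, 0, 1],
    3: [2, 1, 2],
}

sentence_means = [sum(c) / len(c) for c in sentence_codes.values()]
position = sum(sentence_means) / len(sentence_means)
print(round(position, 3))  # a single left-right estimate for the manifesto
```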

Comparing expert codings to expert surveys on economic and social positions looks pretty good… a good correlation, particularly for the economic dimension – which is what we’d expect, and what we see.

We tested to see how best to serve up the texts… we tried the sentences in order and out of order. We found a .98 correlation, so order doesn’t matter…

For the crowd sourcing we used CrowdFlower, a front end to many crowd-sourcing platforms, not just Mechanical Turk. It uses a quality monitoring system: you have to maintain an 80% “trust” score or be rejected. Trust is maintained through “gold” questions, carefully selected and generated by experts…
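The quality-control logic, as described, reduces to a simple check; the field names here are assumptions for illustration, not the actual CrowdFlower API:

```python
# Sketch of trust-score screening: coders must keep an 80% score on the
# expert-written gold questions or their codings are rejected.

def trust_score(answers, gold):
    """Fraction of gold questions this coder answered correctly."""
    correct = sum(1 for qid, ans in answers.items() if gold.get(qid) == ans)
    return correct / len(gold)

gold = {"g1": "economic", "g2": "social", "g3": "neither"}  # expert answers
coder = {"g1": "economic", "g2": "social", "g3": "social"}

score = trust_score(coder, gold)
print(score, score >= 0.8)  # 0.666... False – this coder would be rejected
```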

So, we can go back to the live experiment… it’s 96% complete!

So, looking at results in two dimensions… if the Liberal Democrats were actually liberal they would be right on economics and left on social… but actually they are more left on economics. The Conservatives are on the right socially but getting nearer the left in some cases… but it’s not about the analysis so much as the comparison with the benchmark…

When we look at expert codings versus crowd coders… well, the points are all over the place, but we see correlations of 0.96 for the economic and 0.92 for the social dimension. So in both cases there isn’t total agreement – we either have a small crowd of experts or a bigger crowd of non-experts. It’s always an average, just a matter of scale…

So, how many coders do we need? There’s no need for 20 codings of a sentence if it’s clearly not about immigration policy… we massively oversampled, then drew subsets to estimate standard errors… we saw that as codings accumulate the uncertainty starts to collapse… The rate of collapse for experts is substantially steeper… in aggregate you need roughly five times more non-expert coders than experts. But you can get good codings with five coders…
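The oversample-then-subsample check can be mimicked with simulated codes; everything below is invented purely to show the shape of the exercise:

```python
# Sketch: draw subsets of k codings per sentence and watch the spread of
# the position estimate collapse as k grows (roughly as 1/sqrt(k)).
import random
import statistics

random.seed(42)
all_codes = [random.gauss(0.5, 1.0) for _ in range(20)]  # simulated codings

for k in (1, 3, 5, 10):
    estimates = [statistics.mean(random.sample(all_codes, k))
                 for _ in range(1000)]
    print(k, round(statistics.stdev(estimates), 3))  # spread shrinks with k
```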

So we did some tests on immigration policy… we used the 2010 British manifestos, knowing that there were two expert surveys on this dimension (but no CMP measures). We only coded whether a sentence was about immigration or not, and, if so, whether the position on immigration was positive or not. It cost about $300. We ran it again, same cost, extremely similar results…

Doing this we had a 0.96 correlation with the Benoit 2010 expert survey, a .94 correlation with the Chapel Hill survey, and between the two runs a correlation of around 0.94. It would have been higher, but the experts differed on the immigration policies of Labour and the Conservatives… these were not obvious positions in the text, but they were positions that experts knew about…

So, who are these people? Who are these crowd coders? They are from all over the world… the top countries were the USA, Britain, India and Estonia. One person coded over 10,000 sentences! Crazy person loves coding! The mean trust score rarely drops below 0.8, as you’ll be booted off if it does… You don’t pay, or get data from, those that fail. Where are these jobs being sourced? We tried Mechanical Turk… we’ve used CrowdFlower… there are huge numbers of these sites – a student looked at about 40 of them… but trust scores are great no matter how these people are sourced… Techniques are not all ideal… but coders don’t stay in the system if their trust score drops. There is no relationship between coder quality and platform…

Conclusions here: non-experts produce valid results, you just need a few more of them. Experts have variance, have noise, so experts are just another version of a crowd with higher expertise (lower variance). Repeat experiments prove that the method is reliable (and replicable). Some places require your work to be replicable… is data plus script a good way to do that? Here you really can replicate everything. You can redo in February what you did in December… with the right text you can reproduce the result. Why does this appeal? Well, it’s cheap, it’s flexible. Great for PhD students who lack expert access. And you can work independently from big organisations that have their own agenda for a study. You can try an idea, run again, tweak, see what works… and go back again… And this works for any data production job that is easily distributed into simple tasks… sign up for Mechanical Turk, be a worker, see what it’s like to actually do this… for instance transcriptions of audio tapes… it’s noisy…. a common job is that they upload 5 second clips and you transcribe them, which gives you pretty good human transcription that timestamps weave back together. Better than computer methods…

So, we are 100% finished with our UKIP crowdsourcing experiment… Interestingly 40 negative, 48 positive… needs further analysis…

Q&A

Q: In terms of checking that coders do the right thing – do you check them at the beginning, or during the process of coding?

A: Here I cheated a bit and used 126 gold questions from another experiment. You have to give a reason for each question about why it’s there – if the person doesn’t get it right then they get text to explain why that is the case… Very clear, unambiguous questions here. But when you deploy a job you can monitor how participants responded and whether they contested it… In a previous experiment we had so many contested responses for one question that I actually looked again and removed it…

Q: A very interesting talk… I am a computer scientist and I am interested in whether, now you have that huge gold data set, you have thought about using machine learning.

A: Yes, we won’t let that go to waste. The crowd data too…

Q: I am impressed but have two questions… you look at every sentence of every manifesto… they are funny things, as not every sentence is about the thing you are searching for – how do you deal with that? And a lot of what is in manifestos is sort of dog whistle stuff – with subtexts that the reader will pick up; how do you deal with that in crowdsourcing?

A: You get contextual sentences around the one you are coding; that helps indicate the relevance of that sentence, its context. In terms of the dog whistle question… people think that, but manifestos are not designed to be subtle. They actually tend to be very plain, very clear. It’s rare for that subtlety to be present. If you want a truly outrageous immigration policy, look at the BNP manifesto… every single area is about immigration, not subtle at all.

Q: I’m a linguist, I find it very interesting… and a question about tasks appropriate to crowdsourcing: those that can be broken down into small tasks, and that your participants can relate to their daily life. I am doing work on musical interpretation… I need experts because I can’t see how to do that in language, in a way that is interpretable to non-experts…

A: You can’t give something that’s complex… I couldn’t do your task… you can’t assume who your crowd is, we have very little information… we didn’t ask about language, but they wouldn’t retain that trust score without some good English language skills. And workers have a trust score across projects, so anything they can’t do they avoid, as losing that score is too costly… You could simplify the task with some sort of test for correct or incorrect interpretation… but we keep the task simple.

Q: A very interesting talk, I have a quick question about how you set the right price for these tasks… how do you do that? People come from different areas and different contexts.

A: Good question. We paid 2 US cents per sentence. We tried at 5 cents and it was done very fast but quality wasn’t better. A job at 1 cent didn’t happen fast at all. So it’s about timings and pricing of other jobs.

Q: Could you say something about the ethics of this kind of method… you are not giving much consideration to the production of these texts, so I wondered if you could talk about the ethics of this work and responsibilities as researchers.

A: Well, I didn’t ruin any rainforests, or ruin any summers. These people have signed up to terms and conditions. They are responsible for taxation in their jurisdiction. Our agreement with CrowdFlower gives them responsibility. And it’s voluntary. Hopefully no sweatshops for this… I’m receptive to the idea of what ethical concerns there could be… but I couldn’t see anything inherently wrong about the notion of crowdsourcing that would be a concern. We did run it past the ethics committee at LSE. We didn’t directly contact people; tasks are completed on the internet through a third party supplier.

Q: You were showing public domain documents… but for research documents not in the public domain how would security be handled…

A: Generally transcriptions are private… but segments are usually only 3 or 5 seconds… like reading a document from the shredder basket… the system has that data, but workers do not have access to that system.

Q: But the system does have that so you need trust in the platform…

A: Yes.

Comment from floor: companies like CrowdFlower have convinced companies to give them data – doctors’ notes etc. – and they have had to work on making sure they can assure customers about the privacy of data… as a researcher, when you go in, you can consider what is being done in that business market in comparison.

Q: Have you compared volunteer coders to paid coders? I am thinking particularly about ethical side of things and motivations, particularly given how in political tasks participants often have their own agendas. Might be interesting to do.

A: Volunteer crowdsourcing? Yes, it would be interesting to compare that…

Reading Data: Experiments in the Generative Humanities – Dr Lisa Otty, Lecturer in English Literature and Digital Humanities, University of Edinburgh (session chair: Dr Tom Mole)

Dr Tom Mole is introducing our next speaker, Dr Lisa Otty, whose interests are in the relationship between reading, writing and the technologies of transcription. She will be talking about her work on reading poetry, and the process of what happens when we read a poem.

Now, to be a literature scholar speaking at an event like this I have to acknowledge that data is not a term typically used in our field. When we think about what we are used to reading, texts are often books, poems… but a text is not necessarily a traditional material; it may also be another linguistic unit, something more complex. The Open Archival Information System model (CCSDS 2002) describes data as “a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing”. Interpretation being crucial there. When we look at texts like books or poems those are “cooked” – edited, curated, finished. Data is too often not seen as that.

Johanna Drucker – in Humanities Approaches to Graphical Display (DHQ 5.1, 2011) – talks about data as taken, not given; constructed from the phenomenological world. Data passes itself off as a priori conditions, as if it were the same as the phenomena observed, collapsing the critical gap between data collection and observation.

Some of these arguments gel with the arguments around close versus distant reading. And I think it can therefore be more productive to see data as a generative process…

Between 2009-2012 I was involved in the research project Poetry Beyond Text (University of Glasgow, and University of Kent). This was a collaborative project, so inevitably some of my reflections and insights are also collaborative, and I would like to acknowledge my colleagues’ work here. The project was looking at interpretation of poetry, and particularly visual forms of poetry such as artist books. What these works share is that they are deeply resistant to being shared as just information.

For example, Eugen Gomringer’s (1954) “silencio” is an example of how the space is more resonant than the words around it… So how do we interpret these texts? And how do our processes for interpretation affect our understanding? One method, popular in psychology, is eye tracking… a physical way of registering what you are doing. We combined eye-tracking with self-reporting. Eye tracking takes advantage of the fact that we see detail with only a small area of the retina, so a map of concentration sees those little jumps, those movements around the page. But it’s an odd process to be part of – you wear a head brace with a camera focused on your eye. You get a great deal of data from the process. Where there is more concentration, that usually indicates trickiness or challenge or interest in that section – particularly likely for challenging parts of text. From this data you can generate visualisations. (We are watching a video of the eye tracking process for poetry).
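For a sense of how such visualisations come about: raw gaze samples can be binned into a coarse grid and counted, giving the “concentration” maps described. The gaze points and cell size below are invented:

```python
# Sketch: aggregate (x, y) gaze samples into grid cells; cells with high
# counts mark regions of the page that attracted concentration.
from collections import Counter

gaze_points = [(102, 40), (105, 42), (300, 41), (104, 44), (103, 39)]
CELL = 50  # pixels per grid cell (illustrative)

heatmap = Counter((x // CELL, y // CELL) for x, y in gaze_points)
print(heatmap)  # e.g. Counter({(2, 0): 4, (6, 0): 1})
```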

Doing this we found a lot of patterns. We saw that people did focus on and understand space, but only when that space has significance in the process – in poems where space is more conceptual than mimetic. Interestingly, people who recorded high confusion also reported liking the poems much more… In experiments with post-linear poems we saw cross-linear connections. All people start with a linear reading pattern before visual reading. And that reflects the colour Stroop test – the psychology test that shows that visual information trumps linguistic information… so visual readings and habitual reading processes are hard to overcome. We are programmed to read in a certain way… our habits are only broken by obstacles or glitches in the text we are reading…

Now, talking about this project, if I talk about findings I am back in those traditional research methods… and that would be misleading. We were a cross-disciplinary team, and so I am particularly interested in focusing on that process, on how we worked on that. The eye tracking generates huge amounts of numerical data… we faced real challenges in working out how to understand, to read, this data… a useful reminder of the fact that data’s apparent neutrality has real repercussions. It’s one thing to make data open, another to enable people to work with it.

My colleagues in psychology didn’t understand our interest in visualisations of numerical eye tracking data: it is an abstraction… and you have to understand the software to understand how that abstraction works. Psychologists like to interpret the data through the numbers. They see visualisations, graphs etc. as having a rhetorical rather than analytical function. Our team were interested in that rhetorical function. We were humanists running an experiment – the framework was of hypotheses, of labs, of subjects… but the team came from a creative practice background so this sense of experiment was also in play. In its broadest terms experiments are about seeing something in process and seeing how it behaves; for scientists that means testing hypotheses, but creative experiments are rather different… For humanist analysis of these texts you have to deal with a huge number of variables, very much a contrast to traditional psychology experiments. For creative experiments there is a long tradition of work in Surrealism, Dadaism, etc. – that poetry can unleash and disrupt our traditional reading of texts… they are deliberately breaking our habits. The reader of the literary form is a potentially revolutionisable(?) subject.

In literary scholarship and the humanities, the process of reading is a social, contextualised process. In psychology reading is a biomechanical process; my colleagues in this field collapse the human and machine. A recent article by Lutz Koepnick asked Can Computers Read? (2014) and discussed the different possible understandings of what reading is for… what our ideological framework of reading means to us… computational reading is less about what computers are, more about how we invest in them and envision them.

One of the things that came out of our project was the connections between poetry and psychology, and the connections to creative experiments.

To finish I want to talk about some examples of experiments around reading and what reading can mean.

The Readers Project – John Cayley and Daniel Howe (2009–) – explores imaginative critiques of reading. Cayley is a literary scholar and has been working in digital production for some time. The Readers Project features “programmed autonomous entities”. Each reader moves through a text at different speeds and in different ways. For each part of the experiment projections are used, and they are often shown with books, a deliberate choice. A number of interfaces are available. But these readers move according to machine reading rather than biomechanical reading. Cayley terms this an exploration of vectors of reading… directions in which reading might take off. It explores and engages with new creative understandings of reading. Cayley seems to see this in an avant-garde context, with emphasis on the constructed nature of the work.

“because the project’s readers move within and are thus composed by the words within which they move, they also, effectively, write. They generate texts and the traces of their writings are offered to the project’s human readers as such, as writing, as literary art.” (Cayley, The Readers Project website).

As someone engaging with these pieces the experience is of reading with, more than processing or consuming or analysing.

Tower – by Simon Biggs and Mark Shovman (2011), working at Hive – uses knowledge of natural language processing to build visualisations. When the interactor speaks, their words spiral around them. And other texts are also present – the project is inspired by the Tower of Babel and builds up and up. Shovman’s previous work at Hive was on geometric structure. Biggs’ hope is that participants “will be enabled to reflect upon the inter-relations of the things that they are experiencing and their own contingency as part of that set of things.”

Michelle Kendrick talks about hybrids, that hybrid of human and machine interaction, the centrality of human investment in computer reading.

When I talk about this work I am overwhelmed by the rhetorical significance of words like “experiment” and the dominance of scientific research methods – this work is often wrongly interpreted as applying scientific methods to literary interpretation. But instead it is about interpretation and exploring methods of understanding and interpretation.

Q&A

Q: You talked about different disciplines coming together. Do you think there is a need for humanities researchers to understand data and computational methods?

A: I think we would all benefit from a better understanding of data and analysis, particularly as we move more and more into using digital tools. I’m not sure if that needs to be in the curriculum but it’s certainly important.

Q: One of the interesting things about reading is the idea of it being a process of encoding and decoding… but the code shifts continuously… and a challenge in experimental reading or interpretation is that literature is always experimental to some extent, because the code always changes.

A: I think the idea of reading as always being experimental is interesting… but I think that experimental writing is about disruption… less about process and more about creating challenge.

Q: I was very struck, in what you were presenting there on the Poetry Beyond Text project, by the importance of spatiality and space… so I was wondering about explicit spatial understandings – the eye tracking being a form of spatial understanding…

A: We were looking at the way people had interpreted those texts in the past, the ways people had looked at that poetry… they had talked about the structural work of the poets themselves… and we wanted to look beyond that… We wanted to find out people’s responses to some of these processes, and what the relationship was between that experience and those critical views of those texts.

Q: Did you do any work on different kinds of readers – expert readers or people who had studied these works?

A: It was quite a small group but we looked at the same people over time and we did see development over time. We worked mainly with students in literature or art and most hadn’t encountered this type of concrete poetry before but were well experienced with reading.

Q: I wanted to ask you about the ways in which we are trained to read… there are apps showing images of texts very, very quickly; are we developing skills to read quickly rather than more fully, and understand the text?

A: There was a process of showing rapid images to the eye (RSVP – rapid serial visual presentation – was the acronym) to allow you to absorb text more quickly, but in actual fact it was quite uncomfortable. We do see digital texts playing with those notions. I don’t think we will move away from slow reading, but we are seeing more of these rapid reading processes and technologies.

Chair: Kinetic Text project works in some of these ways, about focusing eye movement…

A: The text can also manipulate eye movement and therefore your reading and understanding of the text. Very interesting in that respect.

Algorithm Data and Interpretation - Dr Stephen Ramsay, Associate Professor of English at the University of Nebraska; Fellow at the Center for Digital Research in the Humanities (session chair: Prof James Loxley)

James Loxley is introducing our next speaker, Dr Stephen Ramsay.

I want to say that my mother is from Ireland, a little place west of here, and she said that if she had ever been to University it would have been to University of Edinburgh which she felt was the best in the world.

Now, I was planning to give a technical talk – I teach computer science in an English faculty. But instead I’m going to talk about data. So I’m going to start with the 1965 blackout of New York. At the time it was about disaster, groping in the dark, a city stranded. But then 9 months later the papers ran stories on the growth in birth rates, a sharp rise across hospitals across the state, all recording above average numbers of births – although one report noted that Jewish hospitals did not see an increase. Sociologists talked about the blackout as in some way responsible… three years later a sociologist published a terse statement showing no increase in births after the Great Blackout. This work looked at the average gestation period and noted that births would have been higher from June through to August, not just in August… and he found that 1966 was not unusual or remarkable. Blackout babies were a myth…

You could read this tale as a cautionary one about the misuse of data. But I think it can be read another way… the New York Times piece said something about human nature – people turning to each other when the power is out is a sad reflection on the place of television in our lives, but a hopeful narrative for humanity. And citing birth rates and data and using scientific language adds to that. And the comments about Jewish people show prejudice. But at the same time, the subsequent analysis frames the public as prone to fantasy, as uninformed, with the scholar overcoming this…

The idea of “lies, damn lies, and statistics” encourages us to always look for falsehood hiding behind truth… so we think of what stories we are being told, and what story we want to tell. It’s simple advice that is hard to follow. I want to give a different spin on this. I think that data is not automatically narrative. The way we talk about data is instructive – we talk about lists, numbers… Pride and Prejudice does not seem to be a data set unless we convert it. It gains narrative in transformation. Data can be shown to show and mean things – like stories, stories waiting to be told… data doesn’t mean anything by itself; someone has to hear what it is saying…

What does data look like in its pre-interpretive state? There is an internet site called “Found” – collecting random items such as notes, cards, love letters, shopping lists. Materials without their context. Abandoned artefacts. All can be found there. But the great glorious treasure of Found is its lists…

[small pause here for technical difficulty reasons]

These lists are just abandoned slips of paper… one for instance says:

beer

neat

dogfoot

domestic

stenga

another:

roach spray

flashlight

watermellon

The spareness and absence of context turns these data-like lists, quickly, into narrative… not all are funny… one reads:

go out for a walk with someone

speak with someone

watch tv

go out to cemetry to speak to mom

go to my room

Have you ever wanted to give your data a hug? Bram Stoker said that in writing Dracula he just wanted to write something scary… his novel is far more interesting without him, as the interpretations of others are fascinating and intriguing… Do facts matter in the humanities? In some areas… who painted a picture, when a treaty was signed… these are not contingent truth claims… surely we can say fact is a good word for those things that are not subject to debate. Scholars can debate whether a painting is by Rembrandt or his school; that debate is about establishing a fact. But facts still matter…

If we look at Rembrandt’s Night Watch, the lighting of the girl equating to that of the captain is intriguing. If he had said it meant nothing we’d probably ignore him… The signing of a treaty may be a fact, but why it occurred is much more interesting. The humanities are about that category 1 inquiry more than the category 2 fact inquiries. Often this is the critique of the humanities and the digital humanities. Jonathan Gottschall insists that the humanities should embrace scientific approaches and a sense of optimism… and sees the sciences as doing a better job of this stuff, but says “what makes literature special” should be retained… he doesn’t say what those things are. There are unsettled matters if one takes scientific approaches. Of course Gottschall’s nightmare is to understand data with the same criticality we apply to Bram Stoker, questioning its being and meaning… and I suggest we make that nightmare a reality!

[More technical issues… ]

What I wanted to show you was a list of English novels [being read to us]… It is a list, from Hoover, that organises novels in terms of the breadth of vocabulary in each. I have shown this list to many people over the last few years, including many professors… they see Faulkner and Henry James at the top and approve of that, and of Mark Twain…. and young adult novel writers at the bottom… but actually I read you the list in ascending order… Faulkner and James are at the bottom. Kipling and Lewis are at the top. And there it starts… richness is questioned… people want to point out how clearly correct the answer is, despite having given the wrong answer; some explain that the methodology is flawed or misreported… these are category 1 people being annoyed by category 2 reality…
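The kind of measure behind such a list can be sketched very simply – ranking texts by their number of distinct word types. Real comparisons control for text length, which this toy version does not, and the texts are invented:

```python
# Sketch: rank texts by vocabulary breadth (distinct word types).
import re

novels = {
    "Novel A": "the sun rose and the long day began once more upon the moor",
    "Novel B": "it was it was it was a day a day like any other day",
}

def vocab_size(text):
    return len(set(re.findall(r"[a-z']+", text.lower())))

for title in sorted(novels, key=lambda t: vocab_size(novels[t]), reverse=True):
    print(title, vocab_size(novels[title]))  # Novel A 11, Novel B 7
```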

But when we stop using it as a gotcha it is a more provocative question… each of these titles contains a thousand, a hundred thousand thoughts and connections… it is what we do… as humanists we make those connections… we ask questions of the narrative we have created… part of our problem is a general discomfort with letting the computer tell us what is so… but if we get past that we might see peculiar mappings of books as cultural objects… it might show us a way to deeper understanding of reading itself… it raises any number of questions about the development of English style… and most of all it raises questions about our discursive paradigms.

That gives us narrative possibilities we could not otherwise see. We cannot think of a text as 50k-word blocks; the computer can ONLY apprehend the text in such terms. To understand the computer as finding facts is to miss the point. It is about creating triggers to ask questions, to look at the text in new ways. This is something I came across working on Virginia Woolf’s The Waves. The structure is so orderly… and without traditional cultural narrative. And the characters speak in very similar styles, sentence structures, image patterns… some see some difference by gender or solidarity… but overall it is about unity… this is the sort of problem that attracts text analysis scholars like myself. I ran algorithmic clustering models looking for similarities unseen by scholars. On a lark we posed a simple question: “what are the words that the women in the novel use in common, that none of the men do?” It turns out that there are 9 such words. You could see that as a narrative – like a Found list – and then we did it with the men and found 120 words! Dramatic. So many words… Some critics found that disparity frightening… some think it backs up the sexism of the Western canon. Others see this as a chance to ask other questions… to try with other authors, novels, characters… if you think this way, perhaps you’ve caught the DH bug – I welcome you. But do we think we’ll find an answer to questions of gender and isolation? Do we want to answer those? The humanities want a world that is more complex, deeper than we thought. That process is a conversation…
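The Waves question itself is a set operation: words shared by all the women, minus words used by any man. A toy sketch with invented speaker vocabularies:

```python
# Sketch: words common to every woman's vocabulary that no man uses.
women = {"Susan": {"green", "earth", "hair"},
         "Jinny": {"green", "dance", "hair"},
         "Rhoda": {"green", "wave", "hair"}}
men = {"Bernard": {"phrase", "green"},
       "Neville": {"order"},
       "Louis": {"ledger"}}

shared_by_women = set.intersection(*women.values())
used_by_any_man = set.union(*men.values())
print(shared_by_women - used_by_any_man)  # {'hair'} in this toy example
```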

In 2015 the Text project will release huge volumes of literature. Perseus contains most Greek texts… there are huge new resources. Almost all the questions we might ask of these corpora have not been asked before… we can say they will transform the humanities, but that may not be true… the limiting factor is whether we choose to remain humanists in the face of such abundance… perhaps we need to be programmers, tool builders, text engineers… many more of us need to invite the new texts – lists, ngrams, maps etc. – into our ongoing conversation. We are here to talk about philosophical issues of data and these issues are critical… but we have to be engaging with these questions…. Digital humanities means databases, mark-up, watermelon…!

Q&A

Q: I am intrigued to think about how we design for the things we don’t yet know we need to know…

A: Sure, imagining what we don’t know… you inevitably build your own questions into the tools… ironically an issue for scientific methods too. The nice thing about computers is that they are fast, obedient and stupid. They will do anything we ask them to, even our most stupid ideas – huge serendipity is just baked into that! It’s a problem, but it’s amazing how the computer does that job for me, surprisingly.

Q: That was a brilliant, fascinating talk. Part of the problem with digital humanities for literature right now is that it either tells us what we do know… or it tells us what we don’t know, but then we worry that it’s wrong… The description of the richness list was part of that. I really liked your call for an ongoing discussion that includes computer-generated data… but I don’t see how we get past the current situation. Literary criticism says something is so, and expects “yes, but…”; I can see how computer-generated data sits in that… but how can data be a participant in that conversation – beyond ruling something out, or concurring with expectations?

A: Excellent point, and let’s not downplay at all the first part of your question. I saw Franco Moretti give a talk about titles getting shorter, for instance… who’d have thought?! But I think it has a lot to do with how we build our tools… I find it frustrating that we all use R, or tools designed for science or psychology… I want our tools to look more like the art-informed projects Lisa talked about. I think the humanities needs to do more like that, to generate the synergies. Tools that are more ludic.

Q: Maybe it’s about perceived barriers being quite high. An earlier speaker talked about the role of repeatability. Ambiguity in reading a poem is repeatable. If the barriers to entry are low enough for repetition and for others to play, to ask new questions, maybe that brings the data in as part of the conversation…

A: There are tools that let you play with the text more ludically. Voyant for instance. But we come with a lot of cultural baggage as humanists… there is a phenomenon whereby, no matter what someone is talking about, they give a literary critical reading of a text, but when they show a graph we all think we are scientists… there is so much cultural baggage. We haven’t learned how to be humanistic users of these tools, or to create our own tools.

Q: A question and an observation… There is a school of thought in cognitive psychology that humans are infinitely able to retrofit any narrative to any circumstances whatsoever, and that is very much what was coming through in your data… Many humanities departments have become pseudo social science departments… but if you don’t have a clear distinction between category 1 and category 2 they can end up doing their own thing…

A: I don’t want that for the humanities. I resist the social science type study of literature, of the human record or of the human condition… in my own work I move between being a literary critic and being an engineer… when it comes to writing software, that fuzzy method definition is wrong, it doesn’t work… when I am a literary critic it is about all those shades of grey, those complexities… but those different states both seem important in pursuit of the same end goal… if studying flu outbreaks, let’s not be ludic… but for Bram Stoker then we should!

Q: In my own field of politics there was a particular set of work which gave statistical data a bad name… and I wonder, in your field, whether the risk of the same is there…

A: In digital literary studies this is sometimes seen as a 25 year project to get literary profs into the digital field… but I always say that that’s not true, there’ll always be things to be done. There was a book in the 70s [presumably Fogel and Engerman’s Time on the Cross] that looked at slavery in an entirely quantitative way; it made the argument no one wanted to hear, that slavery had been extremely lucrative. Economists agreed that it was profitable. History fled from statistical methods for years after that… though historians do now all agree that it was profitable. And there is quantitative work there again/still. If I had to predict, I’d say the same trajectory for digital literary studies seems likely…

Q: I can’t resist one here… I was following the blog exchange with Kirsch where you say that scholars should code, and I wanted to ask about that…

A: OK, well Kirsch lumps me in with the positivists… I’m not quite in the devil’s party. But I teach programming and software engineering to humanists. It’s extremely divisive… My views have softened over the years… for me programming is a magnificent intellectual exercise… knowing about it seems to help understand the world. But also, if you want to do research in this area you need some technical skills. If that’s programming… well, learn what you need, whether that’s GIS, 3D graphics… if you want to build things you might need coding!

Big Data and the Co-Production of Social Scientific Knowledge - Prof Rob Procter, Professor of Social Informatics, University of Warwick (session chair: Prof Robin Williams)

Professor Robin Williams is now introducing Professor Rob Procter, our next speaker, who will be talking about his work around social informatics.

The eagle-eyed amongst you will spot my change of title – but digital is infinitely rewritable! I am working in the overlap of sociology and computational tools and methods. The first thing I want to talk about is social informatics; the second is sociology in the age of “big data” – the opportunities for sociology to respond in various different ways to this big data, and to the tools to interrogate it. The evolution of tools and methods is a key thing to look at in this area. That brings me to the Collaborative Online Social Media Observatory (COSMOS) and the tools we are developing for understanding social media… and then I want to talk about sociology beyond the academy – the co-production of social scientific knowledge. There are other types of expertise being mobilised at the moment, in looking at the computational turn things are taking. Not always a comfortable thing for social scientists…

So firstly, social informatics. What is that? Well, to me it’s the interdisciplinary study of the factors that shape the adoption and use of ICTs. And what gets me excited is how these then move into real processes. For me the emphasis is on innovation as a public, participatory process of experimentation and learning, where the meanings of technologies are collaboratively explored and co-produced. You can argue that social media is a large scale experiment in social learning… Of course, as we witness the growing scale of adoption, more people experience those processes: how social media works, how they might adopt or use it… to me this is a fascinating area to study. And because it is public and involves social media it is very easy to see what’s going on… to some extent. And generally that data is accessible for social research purposes. It is not quite that simple, but you can research without the barrier of having to pay for data if you do it in a careful way.

So these developments have led me into social media as a prime area of my research. Firstly, some work we did on the impact of Web 2.0 on scholarly communications – work with Robin Williams and James Stewart – many of us will be part of this, many of us tweet our research… but many of us are not clear what that means, what the implications are. So we did some work, got some interesting demographic data… we also did interviews with people and got ideas of why they were, and why they were not, adopting… Some were very polarised. And in parallel we looked at how scholarly publishers incorporate social media tools into their work, in order to remain key players… they do lots of experiments, and often that is focused on measuring impact and seeing the movement of their work to other audiences. Some try providing blogs on their content. But that is all with mixed success. A comment notes that it is easier to get comments on cricket reports than on research online… So it’s hard to understand and capture impact…

I’ll come back to that and to the co-creation of knowledge. But first I want to talk about the riots in England in 2011. This was work in conjunction with the Guardian newspaper. They had been given 2.5 million tweets directly by Twitter. They wanted to know whether social media was particularly vulnerable to the spread of false information – did that support calls for shutting down social media at times of crisis? So we looked at a number of different rumours known about and present in the corpus: zoo animals on the loose; the London Eye on fire; Miss Selfridge on fire; rioters attack a children’s hospital in Birmingham. I will talk about that latter example. But we wanted to ask how people use, understand and interpret social media in these circumstances, how they engage with rumours…

So this is about sociology in the age of “big data”. It calls for interpretive methods, but we can’t apply those at scale easily… so we need computational methods to focus scarce human resources. We could crowdsource some of this, but at this scale that would still be a challenge…

So first let’s look at the work of Savage and Burrows (2007), who talked about the “coming crisis of empirical sociology”: the best sociology, as they saw it, was conducted by private companies who have the greatest and most useful data sets, which sociologists could neither rival nor access. However, we might be more confident about the continuing relevance of the social sciences… social media provides a lot of born-digital data… maybe this should be entitled the “social data deluge”. There is a lot of data available, much of it freely. Meanwhile there are lots of policy initiatives to promote open data in government, for/by anyone with a legitimate usage for it. Perhaps we can be more confident about the future of academic sociology…

But if you look at the purposes this data is put to, it’s a more mixed picture… so we see analysis of social media for stock market prediction, where correlation is mistaken for causality. Perhaps more interesting are protest movements – like Occupy Wall Street – or the use of social media during the Egyptian revolution… Is it a tool for political change, a way for citizens to acquire more freedom? A way for movements to organise themselves? There is lots of discussion of these contexts. Methodologically it’s a challenge of quantity, and of methods that combine social science understanding with social media tools enabling analysis of large scale data…

So back to the riots and that rumour of a children’s hospital being attacked in Birmingham. This requires thorough work with the data, but focused where it counts.

So, what sparked this off was someone tweeting that the police were assembling in large numbers outside the hospital… therefore the hospital must be under threat. A reasonable inference.

So, methodologically: computational methods for analysing tweets are an active area of research – sentiment analysis, topic analysis. We combined a relatively simple tool looking at information flows… looking at flow from “opinion leaders” to others (e.g. via RTs). Once that information flow analysis has been done, we can take the relative sizes of flows to analyse the data, size serving as a proxy for importance… this structure, we argue, is useful for focusing human effort. And then we used coding frames for conventional qualitative content analysis to understand how Twitter was used – inductively analysing information flow content to develop a “code frame” of topics; using the code frame to categorise information flows (e.g. agreement, disagreement, etc.); and then visualising that analysis of information flows…
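
As a rough illustration of the information-flow step – grouping retweets under the tweet they propagate, then using flow size as the proxy for importance – here is a sketch; the data layout is invented, and the COSMOS pipeline is certainly more sophisticated:

```python
from collections import defaultdict

# Toy sketch of the information-flow step: group retweets under the tweet
# they propagate, then use flow size as a proxy for importance so human
# coding effort goes to the biggest flows first. Invented data layout;
# not the actual COSMOS pipeline.
tweets = [
    {"id": 1, "user": "a", "retweet_of": None},  # original claim
    {"id": 2, "user": "b", "retweet_of": 1},
    {"id": 3, "user": "c", "retweet_of": 1},
    {"id": 4, "user": "d", "retweet_of": None},  # unrelated tweet
]

flows = defaultdict(list)
for t in tweets:
    root = t["retweet_of"] if t["retweet_of"] is not None else t["id"]
    flows[root].append(t)

# Largest flows first: these get qualitative coding (the "code frame").
for root, members in sorted(flows.items(), key=lambda kv: -len(kv[1])):
    print("flow rooted at tweet", root, "size", len(members))
```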

So here we see that original tweet… you see the rumour mushroom, versions appear… bounding circles reflect information flows… and individuals and their influence… Initially tweets agree/repeat… and then we start to see common sense reasoning: those working at or near the hospital dispute the threat, others point out that the police station is next door to the hospital, providing an alternative understanding. People respond and do not just accept the rumour as true… So rumours do break quickly, BUT social media is not necessarily more vulnerable, as versions and challenges quickly appear to provide alternative likely truths. That process might be more rapid with authoritative sources – media or police in this case – adding their voice. But false information may persist longer, with potential risk to public safety – see the follow-on Pheme project.

But I wanted to talk about authoritative sources again – the police and media and how they use social media. The question is: what were the police doing on Twitter at that time? Well, another interesting case here… riots in Manchester led to people creating new accounts to draw attention to public bodies like the police, as an auxiliary service to raise awareness of what was going on. Quite an interesting use of social media when people see something like this arising.

So what these examples demonstrate is innovation as co-production… lots of people collectively experimenting, trying things out, learning about what social media can and cannot do. So I think it’s a prime example for sociologists. And we see uses are emergent, people learn as they use… and it continues to change and people reinvent their own uses… And we all do this; we have our own uses and agendas shaping our interactions.

So this work led to the development of tools for use by social scientists… COSMOS involved James S, Ewan K, etc. from Edinburgh… It would be an error to assume social media can tell us everything that takes place in the world – this data goes with crime data, demographic data, etc. The aim of COSMOS is to forge interdisciplinary working between social and computing scientists; to provide an open, sustainable platform of interoperable social media analysis tools; and to refine and evolve capabilities, providing service models compatible with the needs of diverse user communities.

There are existing tools out there for social media analysis… but many are black-box systems, and it’s hard to understand the process that is taking place. So we want those black-box processes to be opened up; they are complex, but they can be understood and explored…

So the COSMOS tools let you view timelines, look at rates and flows… select based on keywords and hashtags… view the networks of who is tweeting… and compare data with demographic data.
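
For a flavour of the timeline/rate side, a toy sketch – filter tweets by hashtag and count them per hour; the data is invented, and this is not the COSMOS code:

```python
from collections import Counter
from datetime import datetime

# Toy illustration of the "timelines and rates" idea: filter tweets by
# hashtag and count them per hour. Invented data; not the COSMOS code.
tweets = [
    {"time": "2011-08-08 21:05", "text": "#ukriots trouble in town"},
    {"time": "2011-08-08 21:40", "text": "#ukriots shops closing early"},
    {"time": "2011-08-08 22:10", "text": "all quiet here"},
]

per_hour = Counter(
    datetime.strptime(t["time"], "%Y-%m-%d %H:%M").strftime("%Y-%m-%d %H:00")
    for t in tweets
    if "#ukriots" in t["text"]
)
for hour, n in sorted(per_hour.items()):
    print(hour, n)
```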

There are also some experimental tools for geographical clustering – the way people use Twitter can show geographical patterns. Another strand is topic modelling, topic clustering… identifying tweets on the same topic. This is where NLP, and Ewan and his colleagues in Informatics, have become important.
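
To make the topic-clustering idea concrete, a minimal sketch using scikit-learn’s LDA over a bag-of-words matrix – my own illustration, not the COSMOS team’s actual NLP tooling:

```python
# Minimal topic-clustering illustration using scikit-learn's LDA over a
# bag-of-words matrix. My own sketch, not the COSMOS team's NLP tooling.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "zoo animals loose in london",
    "animals escaped from the zoo",
    "police outside childrens hospital birmingham",
    "crowds near the hospital in birmingham",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words for each inferred topic.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    print("topic", k, [terms[i] for i in weights.argsort()[-3:]])
```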

So current research is looking at: social media and civil society – social media as digital agora; “hate” speech and social media – understanding users, networks and information flows (there is a learning challenge here about people not understanding the impact and implications of their comments, perhaps a misunderstanding of social media); citizen social science – harnessing volunteer effort; social media and predictions – crime sensing, data integration and statistical modelling; suicide clusters and social media; humanitarianism 2.0 – care for the future; BBC World Service – tweeting the Olympics. And we have a wide range of collaborators and community engagement.

Let me briefly talk about social media as digital agora… it may sound implausible… many talk about social media as a force for change… opportunities to promote democracy… not just in less democratic countries, but also in democratic countries where processes don’t seem to work as well… So we are looking at social media and communication in smaller communities. And also thinking about social resilience in a day to day, small scale way… problems which, if not managed, may become bigger issues. For that we have studied Twitter in several locations, collected data, interviewed participants… and built up a network of communications. What is interesting, for instance, is that the non-governmental group @c3sc seems to have a big impact. We have to see how this all plays out… it deserves a longitudinal approach…

So, to conclude… let me talk about the lessons for academic sociology… and I think it’s about sociology beyond the academy and the role of wider players. Firstly, data journalism – I was interested in Steven’s account earlier of the 1965 press coverage of the blackout. Perhaps nowadays the way journalists are being trained might change that… journalists are increasingly data savvy. We see this through Fact Check, through the RealityCheck blog… through sourcing from social media. So too citizen journalism, used to gather evidence of what is happening… tools like Ushahidi… and a sense of empowerment for these communities… it reminds me of the notion of sousveillance… and the possibility of greater accountability… And citizen journalism in the expenses scandal – the Guardian recruited people to look at the expense claims. The journalists couldn’t do all of that themselves… so they recruited others.

So, citizen social science… in various ways (see Harris 2012, “Oh man, the crowd is getting an F in social science”). And Ken Benoit’s work, discussed earlier… we see more people coming into social science understanding…

So the boundaries of social science research production are becoming more porous; social scientific knowledge production is changing, potentially becoming more open. These developments create an opportunity to reinvigorate the project for a “public sociology” – as per Burawoy (2005) and his call “For a public sociology” – to make sociology accountable to more people, to organisations, to those in power. Ethically we need to ask what is needed and wanted, how the agenda is set, and how to deliver more meaningful and useful social science to the public.

How can we do that? New modes of scholarly communication and technology, but that’s not enough… We’ve also been working with a company on a possible programme for the BBC where social media is used to reflect on the week – a knowledge transfer concept. Also knowledge transfer in the Pheme project – for discriminating false and true information… all quite conventional… but we need other pathways to impact. With people as sensors and interpreters of social life, we need training and capacity building – in ways we have not done before. Something that has emerged in citizen science is the notion of workshops and hackathons, getting people engaged in using mundane technologies for their own research (e.g. Public Lab); we need something similar for social media tools, so people can extract the data they want, for their purposes, for their agenda… to create a more public sociology that people can do themselves. And we need to have an open dialogue about research problems.

Q&A

Q: My question is about COSMOS and the riot rumours work… within COSMOS do you have space for formal input around ethics and law? You cut close to making people identifiable and locatable. And related to that… with police in those circles… it may arouse suspicions about motives… for instance in Birmingham did the police just monitor, or did they tweet?

A: They did tweet, but not on that rumour. It is an understandable concern that collaborations make powerful state actors more powerful… for us, we want these technologies available for anyone to use… not some exclusive arrangement; they should be available to communities, third sector organisations… anyone who feels that social media may be important in their research.

Q: I was more concerned about self-led vigilantes, those who might gang up on others…

A: It is a responsibility of civil society to be aware of those dangers, to have mechanisms to avoid harm. That already exists… so if social media becomes an instrument of that, we have to respond and be aware – that is partly what the hate speech project is about… The bigger learning problem is about conduct in the social media space, and the issue that people don’t realise how conduct quickly becomes visible to a much bigger group of others… and that relates to ethics… Twitter is a public domain space, but when something is highlighted by others… we have to revisit the ethics issues time and again… for the riots study we did the usual clearance process… Like Ken, we were told it was fine… but “don’t make people identifiable” – and that is nearly impossible in social media. Not an easy thing to resolve.

Q: I’m curious about changes in social media platforms and how that affects us… moves from Facebook to Twitter to Snapchat to Instagram… how does that become apparent – it may be invisible – and how do we track that?

A: There is a fundamental issue of the sustainability of access to data from social media. It is not too much of a problem to gather data if you design harvesting appropriately for platforms’ rate limits. In terms of other platforms, and people moving to them, and changes in the modality, observability and accessibility of data… what social research needs is agreement with the providers of data that, under certain conditions of access, their data is available for research… to make access for legitimate research easy. There are efforts to archive data – the Library of Congress collects all tweets, and is likely to allow access under licence, I think – to ensure access as the use of platforms changes…
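
On “designing harvesting appropriately for rate limits”: the core pattern is just to batch requests within each rate window and sleep between windows. A hedged sketch, where fetch_page and both limits are hypothetical stand-ins for a real platform API:

```python
import time

RATE_LIMIT = 15           # hypothetical: requests allowed per window
WINDOW_SECONDS = 15 * 60  # hypothetical: length of the rate-limit window

def fetch_page(cursor):
    """Hypothetical stand-in for a platform API call.

    Returns (items, next_cursor); next_cursor is None when done."""
    return [], None

def harvest():
    cursor, collected = None, []
    while True:
        # Use at most RATE_LIMIT requests in this window...
        for _ in range(RATE_LIMIT):
            items, cursor = fetch_page(cursor)
            collected.extend(items)
            if cursor is None:
                return collected
        # ...then wait out the rest of the window before continuing.
        time.sleep(WINDOW_SECONDS)

print(len(harvest()))
```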

Edinburgh Data Science initiative – Prof Dave Robertson, Head of School of Informatics

Sian Bayne is quickly introducing Dave Robertson, who is providing a coda to today’s session.

I’m just briefly going to talk about the Edinburgh Data Science Initiative. The idea is data as the catalyst for change in multiple academic disciplines and business sectors.

So firstly the business side… big data can be very big and very fast… that can be off-putting in the humanities… And you don’t have to build something big to be part of this… I work in these areas but my models are small… and there is a stack you never see – the economic and political side of this stuff.

And here’s the other one… this is about variety and velocity – a chart from IBM – looking at predictions of the volume of data and, more interestingly, the uncertainty of data… The data sits in a few categories – enterprise data, loads of social media, and loads of sensors (the internet of things) – but uncertainty over aggregate data is getting hugely large… and that’s not in the sphere of traditional engineering, or traditional business…

The next slide here is about architectures… this is topical… it’s IBM’s Watson system… the one that won Jeopardy… it harvested loads of information and did hypothesis generation… The stack starts with very computational stuff, but the top layers look much more like humanities work and concepts…

Now, technology and society interact, and often technology pushes on society. For instance, look at Moore’s Law (computing capacity roughly doubling every couple of years) mapped against the cost of sequencing the human genome. The genome curve looks radically different: costs drop hugely in the late 2000s as a lot of effort is pushed in. And that drop in cost to $1000 per genome… that is socially important… I could sequence my genome… maybe I don’t want to. You can sequence at population scales… the machines generate a TB of data a week too – huge data being generated! And this works the other way around… sometimes technology gives you an inflection point and you have to keep up; sometimes society pushes back. A lot of time online is spent on social networks (allegedly 1/7)… now a unified channel for discovery and interaction… And the number of connected devices is zooming up…

So that’s the sort of thing that is pushing a lot of things… We have spoken to all the schools in the university… everyone reacts… you will find everyone recognising this… and you hear them saying “and it changes the way it makes me think about my research”. It’s so unusual to have such a common response…

Why is this important at Edinburgh? We have many interdisciplinary foundations at Edinburgh… All disciplines are relevant, no matter how data intensive, and we are well developed in interdisciplinary working…

And we have a whole data-driven startup ecosystem in Edinburgh… we have Silicon Walk (miicard, zonefox, etc.), Waverley Gate (Amazon, Microsoft), Appleton Tower (Informatics Ventures, feusd, Disney Research, tigerface), Evo House (FlockEdu, Lucky Frame, etc.), Quartermile (Skyscanner, IBM), Informatics, Techcube (FanDuel, Outplay, CloudSoft, etc.). A huge ecosystem here!

So, I’ll leave it there, but input and feedback are welcomed – just speak to myself and/or Kevin.

And that was it for the day…


A digital humanities workshop in four keys: medicine, law, bibliography and crime – Liveblog

This afternoon I am attending “A digital humanities workshop in four keys: medicine, law, bibliography and crime“, a University of Edinburgh Digital Humanities and Social Sciences event. I will be liveblogging throughout the event and you can keep an eye on related tweets on the #digitalhss tag. The event sees four post doctoral researchers discussing their digital humanities work.

As usual this is a liveblog so my notes may include the odd error or typo – please let me have your thoughts or corrections in the comments below!

Alison Crockford – Digital articulations: writing medicine in Edinburgh

In addition to the four keys we identified, we also thought about the ways you can engage with the wider humanities field. So in addition to medicine I will be talking about notions of public engagement.

Digital articulations plays on the idea of the crossover of humanities and medicine: both the state of being flexibly joined together and of expressing the self. The idea came from the Dissecting Edinburgh exhibition at Surgeons’ Hall. Edinburgh has a unique history of medicine when compared to other areas of the UK, but scholars don’t give much consideration to regional history and how medicine in an area may be reflected in literature. So you get British texts or anthologies with maybe one or two Scottish writers bundled in. Edinburgh is one of the most prominent cities in the history of medicine. My own research is concerned with the late 19th century, but this trend really goes back at least as far as the fifteenth century. As an early career researcher I can’t access the multimillion pound grants from the ESRC you might need… so digital humanities became a kind of natural platform. I wanted to build a better, more transhistorical perspective on literature and medicine; I would need input from specialists across those areas, and I would also need ways to visualise this research in a way that would make sense to researchers and other audiences. I was considering building an anthology and spoke to a colleague creating a digital anthology. I chose to do it this way with a tool called Omeka, in part because of its accessibility to other audiences. Public engagement is seen as increasingly favourable, particularly for early career researchers; I’m interested in tools to foster research, but also in doing so in digital spaces that are public, and what that means.

I don’t have a background in digital humanities, and there doesn’t seem to be a single clear definition. But I’m going to talk about some of the possibilities: what drives a project, how does that influence the result, etc. I will take my cues from Matthew Kirschenbaum’s 2010 essay on digital humanities and English literature. He sees digital humanities as concerned with scholarship and pedagogy being more public, more collaborative, and more connected to infrastructure.

I was reassured to know I am not alone in looking at this issue and in having questions; there was a blog post on HASTAC – the Humanities, Arts, Science and Technology Alliance and Collaboratory – looking at the intersection between the digital humanities and public engagement, despite that organisation being already active in that space. I get the sense that this topic comes up in passing, but perhaps only recently have there been deliberate reflections on its implications.

The Digital Humanities Manifesto 2.0 talks about increasingly public spheres. There’s a kind of derision in Kirschenbaum’s take on digital humanities and public engagement. I’m not sure public engagement deserves such derisive treatment, even though I am concerned about how public engagement and similar value judgements are increasingly chipping away at the humanities. But there is more potential there…

Many digital humanities tools are web based apps; they are potentially public spaces, and there are implications for our perspectives on any digital humanities, or indeed any humanities, work. For instance the Oxford digital humanities conference last year, looking at impact, nonetheless talked about public engagement as something more than just dissemination – something richer: thinking about the participation of your audience, their needs and interests, not just your own.

Bowarst [check this name] states that humanities scholars may risk letting existing technologies dictate their work, rather than being the inventors and designers of their tools. Whilst we may be more likely to be adopters, I do not think that is always the case, nor necessarily a problem. Working as Wikipedian in Residence at NLS I have been impressed with the number of GLAM collaborations embracing a range of existing kit: Flickr, WordPress, Omeka, Drupal.

Omeka is designed for non-technical users; it is based around templates and editable content, and is about the presentation of materials. Many Omeka sites are designed for researchers, those already interested, who will see it as a tool for their research but not for wider audiences (e.g. digitising historical serialised fiction, and depictions of disability in nineteenth century literature). But these can look samey as websites; there are limitations without design support. However, look at Lincoln at 200, or indeed the George Arthur Plimpton rare book and manuscript page versus the Treasures of the New York Public Library website, which is more visual and appealing. So I am interested in having the appeal of a public orientated website with the quality of a scholarly tool.

Looking at Gothic Past we see something that is both visual and of quality. You can save materials. The plugins, the opportunities for discourse etc. in Omeka open up public engagement in richer ways…

Returning to medical humanities… I think it has inherent links to public engagement; it helps enhance understanding of perceptions of health and illness, and its impact can be universal. Viewing medicine through the lens of literature enables a massively diverse audience, who have their own interests, experiences and perspectives to share. Giving a local focus also connects to the large community interested in local history. And designing the resource for that diverse audience with these many perspectives will help shape the tool. Restricting a resource to researchers…

Q&A

Q) Really interesting, particularly the problems of digital humanities and research… Could you say more about Omeka and how you plan to use it?
A) I have a wish list for what I want to make from Omeka. I would like logins, the ability to save material, and user-added content and keywords to drive the site, so that there is input from other audiences – not just researchers but also public audiences. For instance exhibitions around digital patienthood. I hope to be a good customer. If you don’t have the technological skills, you still have to put in the time to understand the software, to create good briefs; two months in I’m still working with the web team to create a good resource. I want to be a good customer so that I get what I want without making the team’s life hell!

Q) What do you think being a good client means for our students? Bergson [check this name] mentions that the more we rely on existing technologies, the harder it becomes to think outside the box.
A) I think some of those coming up behind me have a better understanding of things digital… but those are the corporately driven websites, and they don’t necessarily look beyond that. Maybe you need something akin to research methods training, looking at open source materials and resources. But realistically that may not be possible.

Q) I wanted to ask abut the way the digital humanities is perceived as a thing. In your public engagement work is that phrase used?
A) I think largely people think that these are the humanities and these are digital tools. There are parallel conversations in humanities and in cultural contexts… the idea of the digital library just being the library. So this doesn’t seem to be specific to academia; it is a struggle for others too to work out how to incorporate the digital into their experience.
Q) So we are already post digital?
A) Kind of… the idea of a digital resource from a library being a different tool doesn’t really seem to be something you actively consider; you just see a cool tool.

Q) do you think the schism between research and public engagement exists in the cultural sector?
A) They have a better potential chance to do that. They must provide materials for research and also for public engagement and public audiences. We think about research first and sharing further second, but these organisations think inherently about their audiences, and the resources are great for research – for instance the historical Post Office directory research. The sector is a good place to look to see what we might do.

Chen Wei Zhu – Rethinking property: copyright law and digital humanities research

Chen Wei did his research on open source, but spent much of that time at the British Library.

I will be doing a whistle-stop tour of copyright law, mainly drawing on the non-digital. Just to set the scene… when did the digital humanities start? 1946 is a convenient start date: an Italian Jesuit priest (Father Roberto Busa) set out to index the massive works of Thomas Aquinas; these were digitised, put onto CD-ROM and are now online. But at that time the term wasn’t “digital humanities” but “humanities computing”. I tried Google’s n-gram viewer and, based on that corpus, “humanities computing” comes in in the 1970s while “digital humanities” emerges in the 1990s. Humanities computing is still hugely used, but it will be interesting to see when “digital humanities” becomes dominant. A health warning here… the corpus is best between the 1820s and 1922: in the US, works published up to 1922 are out of copyright, but in Europe materials published before then may still be in copyright. And another health warning… OCR isn’t reliable before the 1820s because of print inconsistencies and changes – e.g. the long “s” misread as “f”. The long s fell out of use after The Times dropped it in 1803. So there is much data to clean up.
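
To make the long-s problem concrete, a toy normalisation sketch – the lookup table is illustrative only, and real cleanup needs dictionary checks, since some f/s pairs (“fun”/“sun”) are genuinely ambiguous:

```python
import re

# Toy sketch of cleaning one common OCR artefact in pre-1820s print:
# the long s ("ſ"), often misread as "f". The lookup table below is
# illustrative; real cleanup needs dictionary checks because some
# f-for-s pairs are genuinely ambiguous.
LONG_S_FIXES = {"fuch": "such", "fome": "some", "cafe": "case"}

def normalise(text: str) -> str:
    # Map the long-s glyph itself straight to "s".
    text = text.replace("\u017f", "s")
    # Then repair known f-for-s misreadings via the (toy) lookup table.
    return re.sub(r"\b(" + "|".join(LONG_S_FIXES) + r")\b",
                  lambda m: LONG_S_FIXES[m.group(1)], text)

print(normalise("in fuch a cafe"))  # -> "in such a case"
```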

So what are digital humanists’ opinions and understanding of copyright? I feel that digital humanities scholars are quite frustrated – e.g. Burdick et al 2012 see it this way. Cohen and Rosenzweig 2005 see it as an issue of things never being fixed? [check this reference]

The US copyright office is shut down at the moment – the US federal government closure included the copyright office. It is still saying it is shut… there will be a huge backlog for registering copyright.

So how did copyright law begin? What is the connection between the Loch Ness Monster and copyright? The story goes that St Columba was not only the first to sight Nessie, but also the first person engaged in a copyright dispute. There is a mythical connection too…

Columba, party to that first copyright dispute, is sometimes called the patron saint of copyright – a huge misunderstanding; he is more the first pirate, copying a manuscript without the permission of his tutor. When he was caught secretly copying the book of psalms, St Finnian was very angry and wanted to restrict the copy. The ruling: “to every cow belongs her calf, therefore to every book belongs its copy”. So this was the first copyright case. Columba had the decision go against him, and he rose up against the king, leading to something of a bloodbath.

Now, in this case neither Finnian nor Columba was a clear author, and no publishing was planned or taking place. So skip forward to 12th century China, where we see Cheng Sheren, the first publisher to register their copyright. We see a picture like pre-18th century England, where the publisher has the copyright. In China, as in 16th and 17th century England, it is all about censorship, not copyright in any other sense.

The Statute of Anne 1710 is the first copyright act which brings in the rights of authors and does not include censorship clauses – the first modern copyright law. But author-based copyright didn’t really take off until the early nineteenth century; it needed another ethos. Only as authors come to be seen as romantic geniuses in the romantic age does this model take off. Publishers recede to the background to manage economic aspects, and authors move to the forefront.

Enter, stage left, the Authors Guild. So, Authors Guild vs HathiTrust (2012). The Authors Guild has around 8,000 members at present. The encouraging decision was that the district judge recognised a fair use defence for HathiTrust digitising copies of texts. The judge recognised two types of transformation: full text search, and accessibility of text. That is a very important aspect of the ruling. And the judge was convinced of the fair use defence. Some humanities scholars made submissions; Matthew Jockers did an analysis of the use of digitised text.

Where are we? We started from the year 1550 and ended in 2012. The meaning of “copy” has changed: is digitisation the same as copying by hand? As digital humanists and copyright lawyers we have to reimagine the role of copyright, and the role of the author in copyright. We could see authors as intellectual property owners – we didn’t see “intellectual property” emerge as a term until the 1960s, with an influential book and the setting up of WIPO, but that idea does change our thinking on copyright to some extent. But we also see “open source”, coined in 1998… there is parallel growth there… We are more stewards and custodians than exclusive intellectual property owners.

Q&A

Q) Just to be a pedant here… your discussion of the romantic author… I think you got it reversed… the law precedes the author by a distance. In the 18th century original works – poems, epic poems like the work of Alexander Pope etc. – were published for the sake of rank, for gentlemen and royal sponsors, as books of vellum, extremely expensive… The way the publishers got around the cost of publishing these expensive texts was to republish out-of-copyright works, recycled materials (including Shakespeare), cheap material on recycled rag paper. When new works appear, when paper costs drop, then you see new types of writing replacing old writing, and publishers have little say… and in the early nineteenth century you see authors assert power. The profit and capitalisation of ideas in the republishing of works is crucial to the current Authors Guild debate.

A) I’m glad you mentioned Alexander Pope – he is party to a 1741 case [Pope v Curll]. Almost all cases from the 1710s onwards are between publishers, but Pope actually sued his publisher at that time. So that is a gradual change… going into the nineteenth century.

Q) US versus UK?
A) A divergence of law… In the US, the old copyright act gave a 56-year term, in place until 1978… anything published before 1923 is out of copyright. In the UK it is 70 years after the author’s death; in Canada, 50 years – hence the sheet music sites hosted in Canada. Material can be out of copyright in Canada but not in the UK, yet you can access it from the UK. Copyright is definitely territorial, but internet access is not.

Q) interesting you raised music, a whole other complicated history there.
A) Absolutely, very complex. For instance Stravinsky’s work was very difficult for him to copyright because of Russia’s take on property.

Q) On the ease of violating copyright law… working for Wikipedia and Wikimedia UK… it can be twisted around. At the NLS we frequently have conversations about releasing digitised materials. In the UK, unlike the US, newly digitised material has new rights attached. But we have just been putting content out there.
Comment) The British Library lets you use copies from print runs of less than 3,000 copies, but if you have an ebook contract you have to pay huge sums for an image.
Q) It costs more to enforce copyright and fees. The NLS has a non-commercial clause for digitised materials; usually we won’t charge if people come and ask us. But the cost of enforcement can be higher than the cost of pursuing it. Is this unique to digital?

Gregory Adam Scott – The digital bibliography of Chinese Buddhism as a research and reference tool

Gregory is a digital humanities post doctoral fellow at IASH; his doctorate looked at printing and publishing in early Buddhist cultures. His talk has a new title: “Building and rebuilding a digital catalogue for modern Chinese Buddhism”.

I chose this title inspired by Jorge Luis Borges’ “The Library of Babel”, containing the sum of all possible knowledge, versions with all typographic mistakes, the catalogue itself… I evoke this to represent the challenge we face today in looking at mountains of data; whilst the text may be less random, we still risk becoming lost in our own library of babel.

My own work looks at a narrower range of data. I began with the digital catalogue of Chinese Buddhism, cataloguing texts from between 1866 and 1950. But first a whistle-stop tour of printing and religious printing in China. A woodblock print edition of the Diamond Sutra from 868 CE remains the earliest printed text that records its year of printing. In pre-modern East Asian print history, religious texts were some of the most frequently printed texts. Carving the printing blocks of the Korean Buddhist canon was an enormous undertaking in terms of time, cost and political support. Often the costs were supported by the idea that contributing to publishing religious works was part of a merit economy, bringing good things to you and to your family, which could then be gifted to others – so these texts often include a credit to donors in which they dedicate the texts to loved ones.

Yang Wenhui (1837-1911) and his students published hundreds of texts and thousands of copies; he was a hugely influential lay Buddhist publisher. The introduction of movable type and western printing processes was hugely important: more work was printed in a thirty-seven year window than in the previous two thousand years. This is great in terms of accessing primary sources, but problematic for understanding printing cultures. We see publishers opening up. The history of modern China is peppered with conflict and political and cultural change, and religious studies were often overlooked in the move towards secularisation; this is now slowly changing. Libraries were often free of key religious texts, and it can be particularly hard to track the history of print in this period because of variance in the names of contributors, of texts, and of cataloguing.

So I wanted to go back to original sources to understand what had been published. I started with five key sources which had created bibliographies based on accessing original materials rather than relying on secondary sources. There were still errors and inconsistencies. I merged these together where appropriate. I wanted to maintain citations so that the original published sources could be accessed and the work could be understood properly.

I did this by transcribing the data. I used a simple, bare-bones method with XML, separating the data from the display of the data. If someone wants to transform the data, this format will allow them to do that. It is used simply: tags and descriptions are as human readable as possible – I want future researchers to be able to understand this. I also used Python for some automated tasks, for indexing some of these texts.
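
A hedged sketch of what such a bare-bones, human-readable XML record and a small Python indexing task might look like – the element names are my guesses for illustration, not the project’s actual schema:

```python
import xml.etree.ElementTree as ET

# Hedged sketch of a bare-bones, human-readable XML record and a small
# Python indexing task. The element names are illustrative guesses, not
# the project's actual schema.
record_xml = """
<record id="0001">
  <title>金剛經</title>
  <contributor role="editor">楊文會</contributor>
  <year>1902</year>
  <source>Bibliography A, p. 12</source>
</record>
"""

rec = ET.fromstring(record_xml)
print(rec.get("id"), rec.findtext("title"), rec.findtext("year"))
for c in rec.findall("contributor"):
    print(c.get("role"), c.text)
```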

Looking at the web interface that I put online: it uses PHP, the same stack as Omeka, and the database runs on SQL. There is a search interface where you can enter Chinese keywords, and eventually you will be able to search by year or pairs of years. It returns an index number, title, involved author etc. – simple but helpful information. It includes 2,328 entries, and the spike in 1902 is very evident. Each item has its own static HTML page that is easy to cite and includes all the information I know about the text. So far I think this resource has been useful in producing data to point the way towards future work… less the end of research, more the beginning. This work has let me see previously undiscovered texts; you can also look across trends, across connections, at the relationships to the larger historical picture. It could also be applied to other disciplines and regions.
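
The site itself runs on PHP with a SQL database; purely as an illustration of the kind of keyword search described (a Chinese keyword in; index number, title, contributor, year out), here is a Python/SQLite analogue with invented rows:

```python
import sqlite3

# Python/SQLite analogue of the keyword search described (the real site
# runs on PHP with a SQL database). The rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE texts (idx INTEGER, title TEXT, contributor TEXT, year INTEGER)")
conn.executemany("INSERT INTO texts VALUES (?, ?, ?, ?)", [
    (1, "金剛經", "楊文會", 1902),
    (2, "大乘起信論", "楊文會", 1898),
])

# Search by a Chinese keyword in the title.
for row in conn.execute(
        "SELECT idx, title, contributor, year FROM texts WHERE title LIKE ?",
        ("%金剛%",)):
    print(row)
```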

All of my input to this project is provided under Creative Commons (non-commercial). Bibliographic data isn’t copyrightable, as it is factual knowledge, but the collection of it could be seen as an original work, so I’ve stated that it is my work and that I am happy for others to use it.

As for the reason there is such a spike in 1902: where a date is not known, an item is assigned to that default year, so that all texts have a date.

This catalogue is different from book suppliers’ data, as the purpose is so different: my research use is not oriented to purchase in the same way. I want to add features and finesse this somewhat, but my dream is of doing what I’d call “biblio-biographies” – seeing the appearance of a text over time, seeing where it appears in publishers’ catalogues… and how the pricing and presentation change. For instance, looking at the Diamond Sutra we see different numbers of editors; one publisher offers a special price for 1,000 copies. I used bibliographic sources, but there are so many more forms and formats that I will need to consider, and each source will need to be treated differently. Adverts may appear for publications that were never produced. I have moved from bibliography, to catalogue, to something else.

Q&A

Q) Why not use existing catalogue tools?
A) Nothing had the right sort of fields – the roles of authors, editors, etc. are very different, and the data is not in a standard format. I considered MARC, but it would be relatively easy to transform the XML to MARC.

Q) Are you thinking about that next stage, about having ways for more people to contribute?
A) I have been involved in a wiki-based dictionary of Chinese Buddhism; we opened it up to colleagues and nothing happened – only us, the co-editors, contributed. A big issue is about getting credit for your work, which may be the barrier to contribution.
Comment) Have a look at the website BRANCH on nineteenth century literature; they asked for short articles and campaigned for inclusion in the MLA bibliographies, and that helps with prestige. You just need big names to write one thing…

Q) Could you say something more about the other sources?
A) There are periodicals – a huge number of them. A lot of these focus in on particular printings of texts, some include advertisements, etc., so these texts point off to other nodes and records.

Q) You talked about deliberately designing your catalogue for onward transformation; have you thought about how you will move forward with the structure for the data?
A) I’m not sure yet, but I will stick to the principle that simple, reusable and transformable are good.
Comment) you might want to look at records of music and musical performance.
A) I’ll keep that in mind. Readings of these texts are often referred to as performances, so that may be a useful parallel.

Louise Settle – Digitally mapping Crime in Edinburgh, 1900-1939

Louise is a digital humanities post doctoral fellow at IASH and her work builds upon her PhD research on gender and crime in the nineteenth century.

I want to talk about digital technologies and the visualisation of data, particularly the visualisation of spatial data. I will draw upon my own research data on prostitution, and consider the potential for data analysis.

My thesis looked at prostitution in Scotland from 1892 to 1939. The first half looked at the work of reformers, and the second half at how that impacted on the lives of women at this time. So why do crime statistics matter? Well, they set prostitution in context, recording changes and changing attitudes. My data comes from the burgh court records: where arrests took place, where police looked for arrests, and the locations of brothels at this time. Obviously I’m only looking at offences – so at the women who were caught – and that’s important in terms of understanding the data. Because these were paper records, not digitised, I looked at four years only, coinciding with census years, or the years with full data nearest the census years.

I used Edinburgh Map Builder, developed as part of the Visualising Urban Geographies project led by Professor Richard Rodger, who helped me use this tool, although it is a very simple tool to use. It allows you to use NLS historical maps, Google Maps and your own data. There are a range of maps available, so you pick the right map; you can zoom in and out and find the appropriate area to focus on. To map the addresses, you input your data either manually or by uploading a spreadsheet, and then you press “start geocoding” to have your records appear on the map. You can change pin colours etc. and calculate the distance between different points. Do have a look and play around with it yourself.
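
For the curious, the geocoding step (address string in, coordinates out) can be sketched with the geopy library against a modern gazetteer – Map Builder itself works with NLS historical maps, so this is only an analogue, and the addresses are examples:

```python
import time
from geopy.geocoders import Nominatim

# Illustrative analogue of the "start geocoding" step, using geopy's
# Nominatim geocoder (a modern gazetteer, unlike the NLS historical
# maps that Map Builder itself uses).
geolocator = Nominatim(user_agent="edinburgh-crime-mapping-demo")

addresses = ["The Mound, Edinburgh", "Calton Hill, Edinburgh"]
for addr in addresses:
    loc = geolocator.geocode(addr)
    if loc is not None:
        print(addr, "->", loc.latitude, loc.longitude)
    time.sleep(1)  # be polite to the free geocoding service
```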

The visual aspect is a very simple and clear way to explore your subject, and the visual element is particularly good for non-specialist audiences, but it also helps you spot trends and patterns you may not have noticed before. So, looking at maps of my data from 1903, 1911, 1921 and 1931: the maps visualise the location of offences, and for example it was clear from the maps that the locations changed over time, particularly the move from the Old Town to the New Town. In 1903 offences are spread across the city. In 1911 there are many more offences, particularly around the Mound. In 1921 the move to the New Town is further evident. By 1931 the New Town shift is more evident still, with some on Calton Hill too.

The visual patterns tell us a lot, in the context of the research, about the social geography of Edinburgh. The Old Town is often seen as a working class area and the New Town as a middle class area. Prostitution appears to move towards the centre, but that is also where the entertainment venues, the shopping areas and the tourist areas were. This tells us there is more work there. The women keep being arrested there, but that does not deter them: small fines and prison spells did not deter. Entertainment locations were more important than policing policies. You can see that a project that is not necessarily about geography has benefited from that spatial analysis aspect.

If you have spatial information in your own research then do have a look at Edinburgh Map Builder. But if you have data for elsewhere in the UK you can use Digimap, which includes both contemporary and historical maps. There are workshops at Edinburgh University, and the website is on the bottom there. That’s UK-wide. And a new thing I’ve been playing with is HistoryPin – this uses historical photography. You can set up profiles, add pictures, pins, etc. and plot them according to location. You can plot particular events, from your computer or smartphone, and look at historical images and data. So I have been plotting prostitution-related locations such as the Kosmo Club and the coffee stalls on The Mound. You can add your data and plot it on the map. It is a very easy site to use, and for this idea of public engagement it is a great tool.

Q&A

Q) I was quite interested in those visual tools and the linking of events, tying them to geographical places. There are also other ways to visualise, such as social network maps, and I wonder how it would be to map those in your work – there must be social connections there. Social network analysis can look very similar… I wanted to know if you have considered that or come across that sort of linkage.
A) I haven’t but that sounds really exciting.

Q) I wanted to ask you about the distribution and policing. If one were to return to the maps: there are some marked differences in the number of offences – arrests? – how much detail did you take out of it? You said the women were going back and were not deterred. In 1911 there are markedly different numbers. But even at the times when there was actually more policing towards the Old Town, the police were just sticking to the main routes. So was the Old Town a lawless zone at that time, with police not wanting to venture into dark alleys? And how long does Edinburgh’s tolerance zone persist? And it’s curious to see all this without Leith! Now the city operates as one, but perhaps before the amalgamation of the authorities there wasn’t such a direct displacement effect?
A) In terms of Leith, it was occurring there too. The suggestion has been that prostitution was informally tolerated in the Old Town… I don’t disagree that it happened in the Old Town, but my argument is that it is also happening in the New Town, and measures there don’t stop it when they should. And my research also sees the police not always caring, and judges and juries moving for reform rather than harsher sentences. Cafes and ice cream parlours were a cause of concern in Glasgow in 1911, which may impact the figures then. The 1903 records are not complete, so that year may be an outlier, as the general trend is of decreasing offences over time…

Q) About the visualisation tool: there is a tremendous amount of interest in those maps – are these maps important for research design, for research questions? Or would you wish for a tool with more possibility for contextualisation – for instance statistics from authorities etc. to interpret your findings? What possibilities are there for researchers to have these tools yield more?
A) The maps are interesting, and they are more appealing, but they need to be used with tables, charts, statistics. If I were presenting on the work I would have included those other factors. And in 1903 you lose some density when all the dots are in the same place. An interactive tool to handle that would be great.

Comment) What is so attractive about visualisation is speed and efficiency, but that also means there is a risk of concluding too quickly, of not necessarily reflecting the reality of prostitution – the reader may read your map of offences in that way; that will be easy to do, but the methodology can be dull to people, and that can mean misunderstandings.
A) absolutely. This needs to be in context.

Q) Could you have layers comparing income against offences etc.? And have you found any projects that are developing more complex approaches…
A) The big project is the Edinburgh Atlas; there is a mini-conference on hidden histories and geographies of Edinburgh, on mapping crime – it’s on the IASH mailing list – and there are others doing that.

Q) You talked about women seduced by foreigners in Edinburgh?
A) In Edinburgh there was concern about Italians at ice cream parlours; Brazilians were the concern in Glasgow. And in Edinburgh there was also a German Jewish pimp of concern as well.

Discussion more widely…

Comment) I’m primarily a learning technologist and I spend my life trying to get people to start from the activity they want to undertake, and not from the tools. I found it refreshing that you all started with your data and looked for tools with the right affordances. How did you find you were helped with that search for a tool?
Louise) It was human contacts. I saw a lecture from Professor Richard Rodger.
Ally) It was similar for me; I found software through a contact but found it hard to find what else was out there. It basically came down to Omeka or Drupal, which the web team knew about. But it would have been great to know what was out there, what the differences are, what resources there are. Even looking through DHNow and DH Quarterly there isn’t a sense of easily identifying the options for tools. That can be a bit of an issue.
Greg) I used the tools colleagues were using to build my own…
Comment) HCI has the notion of affordances – what a tool easily enables you to do and what else it could enable you to do. Is there something there about describing affordances for the humanities? My sense is that often they are pitched towards the sciences; sometimes even the terminology varies, so understanding affordances varies.
Ally) Sometimes developing your own tools is good, but even a little knowledge and terminology lets me get better results from these tools; if I come to these tools and these colleagues with no knowledge then I will not have a successful outcome. I want to really explore Omeka so that I feel confident and able with it.

Question) Have the tools changed your research questions or ways of working?
Louise) not me
Ally) For me they have. I was introduced to the nineteenth century disability reader digital anthology, and knowing what was possible with the tools changed what I wanted to do with my project – to some degree. The basic aim – wanting to know more about late nineteenth century medical history – hasn’t changed. But the project has.
Wei Chen) I find the legal documents, Creative Commons licenses etc. most useful; I was able to be involved in the first version of the Chinese Creative Commons license.
Greg) it hasn’t changed my questions but the scale of work possible and how I might explore it has changed for me.

Question) What advice would you give for people thinking about digital tools for research?
Greg) don’t be afraid to just try things out, work out what’s possible…
Louise) do ask for help, do take advantage of courses…

Question) I was struck by the issue of time when you gave your presentations. Have you reflected on the use of time in the process, how to use it creatively and constrain it? And how has that use of time perhaps changed your view of hard copy materials?
Ally) With digital projects you can find the time used grows. You should not underestimate the time necessary. But at the same time I would otherwise spend hours and hours leafing through texts to answer a research question. I want to use this tool to reduce the time to find the data I need, to access it, to interpret it. And this project is about developing this tool to benefit myself and others later. You need to step back and be realistic about what is possible.
Louise) That's part of the issue with digital humanities. My work will be in a traditional book format, but the Historypin work, very engaging as it is, does not count towards a career, towards a job. That's a challenge for digital humanities and for early career researchers; it's why our scholarships are so good.
Wei Chen) And there is the distant versus close reading difference. Close reading still has a role, but distant reading allows us to interrogate that reading, to find that resource, etc.
Greg) Nothing we are doing is unrecognisable as research, but we are able to perhaps examine more material, or to do things more quickly. We are not doing everything differently, we are using new tools in our work.

Question) Do you think this investment in tools is changing the humanities as a result of this temporal and labour investment? Ally, you talked about putting off other work…
Ally) Well, I am doing research. You always have to manage many projects at once, and there will be an impact. But I chose the digital path because time and financial limitations changed what was possible; it could have been done another, very expensive way. So I'm not putting off research, where I would probably be spending years collating information… Instead I am setting something up to facilitate my own research in the future. And on the relationship between distant and close reading: that divide isn't as fiery as it appears.
Comment) The superficial view of the digital is happening in teaching. Universities jump on the digitisation bandwagon in a way that changes how humanists are employed, how software is copyrighted and licensed. All these tools help universities save money. One can overreact… but realignments of labour and resources make not so positive inroads…
Ally) It's a huge problem; I have huge concerns about the University's MOOC programme. There was discussion of open access, of bringing in individuals to talk about what this means…
Louise) not sure but I know colleagues are concerned.
Wei Chen) Open access is about economic growth, not hardcore humanist values. Humanist values should be at the core for digital humanists; there will be an increasingly curatorial role for all formats of material.
Comment) a comment about critical engagement…

Question) One of my concerns about this sort of work, and the work in geography, is the ways of making and curating an archive. I was wondering about the length of time an archive is available after a project. There was a BBC project, Save Our Sounds, which finished and the map is no longer accessible… So who looks after and preserves data?
Greg) I think it's hard to "lose" data; it's about implementation, not methods.
Ally) I think it's about how digital humanities adopts tools, about reflecting on the project aftermath. When looking into project funding you don't want that tool lost. It's not an issue of methodology or individuals, but it has implications for future archiving.
Comment) which is why Greg’s work in XML matters
Me) And the use of research data management plans and research data repositories helps ensure planning and curating of data at the outset, and long term access and sustainability.


Digital Scholarship Day of Ideas 2

Today I am blogging from the University of Edinburgh Digital Scholarship Day of Ideas 2, a day long look at research in the digital humanities and social sciences. You can find out more on the event on the Digital HSS website. As usual these are live blog posts so apologies for any spelling errors, typos, etc. And please do leave your comments and corrections here.

Professor Dorothy Miell, Head of the College of Humanities and Social Sciences, is introducing the day. Last year we shaped the day around external speakers, but we are well aware that there is such a wealth of work taking place here in Edinburgh, so this year we have reshaped the event to include more input from researchers here in Edinburgh, with break out sessions and discussion time. The event is part of a programme of events in the Digital HSS thread, led by Sian Bayne. The programme includes workshops and a range of other events. Just yesterday a group of us were discussing how to take forward this work, how to help groups gather around applications for grants, developing fora for postgraduates, etc. If you have any ideas please do contact Sian and let her know.

Our first speaker is Tara McPherson who is based in the School of Cinematic Arts at USC in Los Angeles. She is a researcher on cinema and gender. Her new media research concentrates on computation, gender and race as well as new paradigms of publishing and authorship.

Scholarship across scales: humanities research in a networked world – Dr Tara McPherson, School of Cinematic Arts, University of Southern California

We are often told we are living in an era of big data, of large digital data sets and the speed of their expansion. And so much of this work is created by citizens, "vernacular archives" such as Flickr and YouTube. And those spaces are the data for emerging scholars. And we are already further along in how big data and linked data can support scholarship. There is a project called DataONE – Data Observation Network for Earth – a grant-funded project for scientists, the grand archive of knowledge. This is the sort of data aggregation Foucault warned us about! But it's not just the scientists. In the humanities we also have huge data sets; the Holocaust Testimony video collection is an example of that – we can use it as visual evidence in a way that was previously unavailable to us. Study of expression, of memory, of visual aspects can be explored alongside more traditional ways of exploring those testimonies. And we can begin to ask ourselves what happens when we begin to visualise big data in new ways. If communication is increasingly in forms like video, what are the opportunities for scholarship to take advantage of that new material, the vernaculars, and what does it mean that we can now have interpretation presented in parallel to evidence? Whilst many humanities scholars have been sceptical about the combination of human and machine interpretations, there are rich possibilities for thinking about these not as alternative forms but as a continuum. And we will see shifts in how we collaborate, in sharing the outcomes of our knowledge. Rather than thinking of our outputs as texts, as publications, we also need to think about data sets, about software – stuff that exists at multiple levels, from bite size records (metadata that records our work, for instance) to book size, to bigger. And we need to think about how we credit work, how we recognise effort, how we assess that work. How do we reward and assess innovation – how do we do that for research that may not lead to immediate articles but be much longer, much bigger in scale?

Going back to DataONE, there is a sub project called eBird, a tool to allow birdwatchers to gather data on birds. They are somewhat ahead of the game in thinking about crowdsourced science. Colleagues at Dartmouth are starting to look at crowdsourcing data. My son plays a game that lets you fold proteins, contributing to scientific research. There are examples from Wikipedia to protein folding to metadata games, etc., which also challenge traditional publishing. The Shakespeare Quarterly challenges peer review with an open process – an often challenging form of peer review. Gary Hall and colleagues at Goldsmiths are also innovating with open journals. But we also see a change away from academic knowledge as something which should be locked away, a move away from the book as fetish object, etc. In the UK we saw JISC fund livingbooksaboutlife.org – drawn from open access science but curated by humanists and scientists.

And we see information that can be discovered and represented in many ways. We can get hung up on Google or library catalogue search dynamics, but actually searches can be quite different. So for something like Textmap we get an idea of different modes of discovering, browsing and searching the archive, opportunities for academics to reinterpret and reuse data. The opportunity to manipulate and reuse data gives our archives much more fluidity. We can engage on many different registers. You can imagine the Shoah Foundation archive which I showed earlier having a K-12 interface, as well as interfaces for researchers, for publishers, etc. Some may be functional interfaces but some may be much more playful, more experimental.

Humanities scholars and artists are helping to design some of these spaces. The tools will not take the form that we need them to, as particular humanities scholars, unless we are part of that process. We often don't think of ourselves as having that role but we have to shape these ways to communicate our data, to visualise it, etc. Humanities scholars have spent years interpreting text, visual aspects, emotion, embodiment; we are extremely well placed to contribute, to help build better tools, better visualisations, etc. There is no natural fit between the design of the database and the work of the humanities researcher. Data can have inconsistencies, nuances, multiple interpretations; they don't easily fit into a database, but databases can be designed to handle that. Mukurtu (www.mukurtu.org) is an ethnographic database and exploration space; the researcher has worked with the World Intellectual Property Organization and indigenous groups to record and access data according to their knowledge protocols, reflecting kinship relations and codings of trust. We also have much to learn from experimental interactive design. The Open Ended Group (openendedgroup.com) do large scale digitisation. They have digitised a huge closed Detroit factory and used 3D visualisation. It's for an experimental art space, not a science museum. It's a powerful piece to experience and inhabit and explores the grammars of visuality. It's not about literal reinterpretation but creative and immersive explorations.

Another example: Sharon Daniel – a database driven documentary from IV drug users in a needle exchange programme in San Francisco, 100 hours of audio to be explored through the interface, a work in Vectors. Vectors is a journal I edit, an experiment on the boundary of humanities research, visual interpretation and screen culture. Can you play an argument like a video game? Can you be immersed in an argument like a film? Another example here is an audio exploration of one of the largest women's prisons in California, curated to make an argument about our complicity in the rhetoric of imprisonment by the state. The piece has a tree based structure which allows exploration based on where you have been. You can navigate the piece through a variety of themes. You can follow one woman's story through the archive in a variety of ways, and incarceration and the paradigms on which it depends. The piece is quite different to a typical journal article – it will be different every time, which raises interesting questions for the assessment of scholarship. It's fairly typical of what else is in the archive. We pair scholars with minimal or no programming experience with design and programming staff in the lab. A fantastic co-creative process but not scalable, especially as many of these pieces are in Flash. But we have identified many research questions and areas for exploration here.

I work in a cinema school, looking at visual cultures. We found we needed tools; we didn't want to build tools, but the scholarly interpretation needed by our scholars does not fit into existing rigid structures. Since we began to work in this area we've moved to thinking about the potential around vernacular knowledge, collaboration with the Shoah Foundation, temporal and geographical maps from Hypercities that let you explore materials in space and time. And from those partnerships we have formed a group, the Alliance for Networking Visual Culture (scalar.usc.edu/anvc), funded by the Mellon Foundation(?), with partners from the Internet Archive, the Shoah Foundation, traditional humanities research centres, design partners, and 8 university presses to explore non-traditional scholarly publications; those presses have committed to publishing these born digital scholarly materials. And you can begin to think about scholarship across scales, with new combinations, ways to draw in the archives. Traditionally humanities scholars have a vampiric relationship with the archive! We can imagine that in the world of Linked Data the round tripping of our scholarly knowledge back to the archive might become quicker and more effective. So we've been building a prototype… this is a born digital book about YouTube by a media scholar, which takes the form of YouTube. It's an open access book but peer reviewed in the same way as any other. So we have built a platform called "Scalar", a publishing platform for scholars who use visual materials. Anyone can log in, to play with the software, to try to create and engage with it. It's connected to archives – partners, YouTube, Vimeo, etc., and particularly to Critical Commons, an archive that includes some commercial materials (under US copyright law) and also links to the metadata around that material. And it lets you create different structures that allow you to take multiple paths through materials, through data, more like a scholarly form but not necessarily in linear routes. So, for example, "We are all children of Algeria" by Nicholas Mirzoeff. He had a book coming out in print, but when it was submitted the Arab Spring took place and was very relevant to the book, so he created a companion piece. As you build a piece on Scalar a number of visualisations are generated on the fly to show you data on the content of the book: a visual Table of Contents, metadata, the paths, etc. Another recent project, "The Nicest Kids in Town", on American Bandstand, includes video that couldn't be in the book. Also Diana Taylor and the Hemispheric Institute…

Henry Jenkins and colleagues' interactive book on digital cultures. Third World Majority, an activist archive with scholarly expert pathways through that archive, blurring the boundary between edited collection and archival collection. And the Knotted Line blurs public humanities and public curation. It explores incarceration in the US and is based on the Scalar API with their own interface, which is quite tactile.

These tools allow us to explore the outputs of scholarly research in different ways, the relationship to evidence, but also to think about teaching differently. See the programme in humanities and media studies, at the intersection of theory and practice, where students must "make" a dissertation rather than write a dissertation. See also Rethinking Learning, a series of cards and materials from which students could create peer to peer learning; it is also a dissertation. The author, Jeff Watson, will be in a tenure-track role in Canada in the fall. Susana Ruiz has created a dissertation prototype which is a model of learning around games and video archives. Both of these projects look at new possibilities for teaching and learning.

We are building tools here for humanities scholars, not "digital" humanities scholars. We build upon rich traditions of scholarly citation and annotation. Our evidence can live side by side with the analysis, which increases the potential rigour of scholarship: the reader has far more opportunity to question or assess those arguments. And the user/reader has an opportunity to remix. This isn't about watering down our scholarship or making it ritzy; rather it is about making our scholarship flexible to an ever changing world and accessible in new ways.

Q&A

Q1 – Richard Coyne, Architecture & ECA) You raised the question of citation and academic and scholarly practices. Visual materials can be difficult to cite in that way.

A) We tried stuff out here. A Flash project is really hard to quote; accessing a specific audio file in Sharon Daniel's work is really challenging. But in Scalar each object has a unique identifier and URI, and you can export as XML and PDF, and you can use the API. It's a traditional relational database with quite an idiosyncratic semantic layer on top. So you can build interesting stuff because of that combination.
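To make the citation point concrete, here is a minimal Python sketch of what quoting an object by its URI could look like. The URL and the format=json parameter are assumptions for illustration only, not Scalar's documented API.

```python
# A sketch of citing by URI: fetch a machine-readable representation
# of a single page so a citation can point at exactly one node.
import requests

# Hypothetical page URI, invented for illustration.
PAGE_URI = "https://scalar.usc.edu/works/example-book/example-page"

def fetch_page_metadata(uri):
    """Request a structured representation of one addressable object.
    The 'format=json' parameter is an assumption, not a documented flag."""
    response = requests.get(uri, params={"format": "json"}, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    metadata = fetch_page_metadata(PAGE_URI)
    print(metadata)  # stable URI + metadata = a citable unit
```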

Q2) You talked about emotion. There can be excitement around this sort of material but for some there is a sense of fear around knowing how to engage, particularly when incorporating into our own curricula and research. We can be quite traditional when we return to our desks. Any simple start up ramps to get through the fear barrier?

A2) It's been a slog, even at USC, dealing with visual rhetorics and argument. We have an institute in visual literacy for practice based PhDs and interactive undergraduate and postgraduate programmes. We have guidelines and rubrics developed there for multimedia work and assessment, and those have been useful rubrics for other schools in the university. At university level, for the tenure and promotion committee, we have created criteria for assessing digital scholarship, the different ways to evaluate that work. The issue is less the form of the work and more assessing the contribution of such a wide range of collaborators with very different skills. We have borrowed from the sciences but that's not a simple mapping; there are issues. We have had only four digital media PhDs completed so far but all have gone on to good things. Visual temporality has traditions that it can draw upon… it will be an unevenly distributed move for the next 10 years or so at least.

Q3 – Clara O'Shea, School of Education) On the engagement with a living archive, and the role of the scholar in that – what are the ethical implications? And in what ways is your work changing the way scholars assess their own work?

A3) I'm just starting to look at assessing the role of the digital archive and the radical shift in purpose from the traditional archive. The library is about access, the archive about preservation; digitally that split isn't as relevant. Ethically it is very tricky though. The Shoah Foundation recorded materials long before the web; this was set up by Steven Spielberg. Now, participants did sign away their rights to materials, but we have been working with the board of the Shoah Foundation on what is and is not appropriate to do with the materials. There are projects for kids to remix video, so we have developed an ethical editing guideline for those students. At Dartmouth, with that metadata game, there has been a need to really think about the ethical and quality implications – exploring by layer, the difference between "expert" and crowdsourced, is one way that has been handled. In terms of scholars it changes the relationship to evidence and to scholars' own work. So, back to the Shoah material: they have a policy of not providing transcripts as they want researchers to actually watch the video, to understand hesitancy and emotion. They have had scholars who have gotten students to make transcripts for them and analysed those, and the Shoah Foundation queried the analysis and whether the scholars had seen the films. When those scholars actually watched the films their experience and analysis was quite different.

I was trained as a feminist film scholar at a time when it was hard to find the film. I had read about the films before seeing them, often long before, and you could be left wondering if the scholarship you had read was based on the same thing. Having the evidence there changes that, gives you a more direct relationship. Also writing small sections of arguments, writing more modularly, is what you start to do rather than the long form structures we are used to, and that can be really appropriate for humanities scholars in some areas.

And now many thank yous and onto breakouts. I am going to Breakout 2, chaired by Professor Robin Williams:
I will be talking about a project from the last three years looking at electronic literature as a model for creative innovation and practice. It's mainly about networked communities of data analysts and practitioners. I was looking at ideas, concepts and new ontologies, of creativity in particular, and focusing on co-creation and collaboration. I say that is novel but really it isn't: co-creation and collaboration pre-date the digital era, pre-date publishing, in craftsmanship traditions. I was looking at both amateur and professional artists and practitioners, in transnational, transcultural contexts. How do we use the internet to create, say, art? So this is about exploring process, creativity, community, these sorts of aspects.
We came across the idea of creativity as a social ontology. Creativity as "an activity of exchange that enables (creates) people and communities" (Simon Biggs). You need interaction in the making process in this sort of ontology. In the communities I engaged with, creativity was a subsequent activity of the collaborative community; they were interested in the making process rather than the objects of the making. Ethnographically I took a post-modern multi-sited approach as a framework: follow the community; follow the artefact; follow the metaphor; follow the story; follow the life; follow the conflict; and I added the idea of follow the line (follow the rhizome). The communities are dynamic, changing, they move in different directions. The same with the voices, however many there are within those communities… The fieldwork was very nomadic, both offline and online. I started following one community, then found many others connected. I followed online but also offline (within Europe). I looked at a network physically based in London; other communities started in New Zealand, moved to Germany, Italy, etc., and their online presences moved beyond this.
I was looking at the idea of a "creative land" sat between place, artefact and practice. The practices are connected through a community of bodies that make these assemblages happen. I look at the theoretical approach by (?) of creative lands. I didn't just look at the creation of objects but also the creation of communities, looking at creativity as synergy and assemblage. So I looked at Furtherfield.org, probably the largest digital arts community in Europe. They have an offline gallery in London where I undertook fieldwork in January 2011, and this is still ongoing. The name comes from the idea of being further than the leftfield; their basis is political, rooted in the politics of the late 1970s, but also in criticism of the commercialism of the Young British Artists and Saatchi's influence on the arts. I looked at the daily activities, how they communicated their activities, and it is very equally distributed, not hierarchical. For example one co-founder, Marc Garrett, talked of the community as "the medium" for this work. The artists involved could come from sound, to network, to cyberperformance, quite an open approach by Furtherfield. They have created the idea of DIWO – Do-It-With-Others – for the making of art and artistic practice. This is defined on the website and clearly requires social interaction and collaboration as part of this work; it is about heterarchy. The DIWO ethos is about contemporary forms of collaboration, an open and political praxis, about peer-to-peer processes for learning, sharing knowledge and making knowledge. And the idea of media art ecologies – based on Bucht, who believes in a continuum of humans and environment, and on Gregory Bateson, who talked about ecologies of mind, as multifunctional and different ideas and cultures coming together to make an assemblage.
The particular projects using digital platforms tend to focus on social change, particularly environmental change. And there is a movement called "make-shift", two groups, one around the world, one in Exeter. They do cyberperformances, and they have an open source "App Space" performance space for video, materials, tweets, etc. This is one kind of process, of use of ideas. The artists have particular materials for performance, including facilities to allow multiple audiences, multiple mixing, multiple points of access to be part of the performance. Another performance brings in comments from Facebook, as well as the performer's belongings from the last 5 years, juxtaposing this with other forms of collection.
Another project, Read/Write Reality and their work Art is Open Source. Their idea is creating academies of knowledge. They share the knowledge of how to use open source tools to make art. So one project of Art is Open Source uses ubiquitous realities movies with WordPress. Their work is about co-creation and collaboration. I am also looking at AOS: Ubiquitous Pompeii through autoethnographic processes. This works with high school children in Pompeii, looking at designing and imagining possibilities to see the city in different ways. And co-creating and remixing material with schools. Using ubiquitous technology to co-create cities. It is still about peer-to-peer processes, about co-design… We are seeing the process of working together. The largest and best known project of Art is Open Source is La Cura – the call for a cure for a brain tumour, sharing medical information and scans etc. openly on the web.
Q&A
Q1) We have a project on open source and film, how do people engaging in these works actually make money from them?
A1) Furtherfield are using crowdfunding, education projects etc. to keep running. Art is Open Source runs educational and other projects and provides funding to make some of these projects happen.
Q2) You write in scholarly journals etc. Did the keynote give you thoughts about how the projects you look at might be written up in new ways?
A2) Yes, I think one thing that is interesting is the idea of being open source but I would also like to see collaborative writing. The monograph is all about me. But I would like to see multi voice texts and would like to look at this for sure.

Copyright, authorship and ownership in digital co-creative practices – Dr Smita Kheria

My work arose from Penny's previous project. Some of the participants will be common to Penny's presentation just now. My research interest is in exploring the norms of collaborative practices so far as copyright and ownership are concerned. I am a copyright lawyer and I am interested in how authors relate to copyright law in their practices. Copyright law poses two problems: firstly, how it conceives authorship and how that author is credited; and secondly, how collaborative authors are perceived and how that works in practice, particularly in emerging collaborative processes online.

So, just to ensure we are all in the same place: copyright protects the work, and it must be an original work – there must be some originality, some effort, skill and judgement. Usually the author is the first owner; they are the copyright holder and hold the economic rights. In collaborative work there are particular assumptions. In co-authorship – for example distinct chapters in a book – each author has the rights for their own contribution. When joint authorship is perceived, a collaborative authorship, then all contributors have rights in the whole, but there are no distinctions within the concept of a joint author. And that has implications for the perception of authorship.

Last year Penny and I worked on a six month AHRC project looking at the creation and publication of the "Digital Manual", looking at authority, authorship and voice, explored through interviews and focus groups. Participants were working with open source mechanisms. We asked participants – and creators – what the role and meaning of collaborative authorship was for them, what they felt about this, rules of attribution, etc. And we found no set rules here, only some ideas of how they should perceive authorship. There were some commonalities across all four communities – which included make-shift (from UpStage) and Art is Open Source. What they created was built in real time, changing regularly, grounded heavily in collaboration. In the first case study, on Art is Open Source, we saw a very hands off approach to authorship and ownership. They are a network; they provide open source platforms and software, and also ran a fake competition in the project we were looking at. They were clear about the ownership of the platform and the software – open source and GPL licensed. But as authors they wanted to disappear: they don't want control, and do not mind what others do with the material they have created. So for instance a book which came out of the project was discussed; they felt forced to be on the cover by the publishers. They did take responsibility for the process but didn't want to engage with what was made from what they made available. They felt attribution was important, generally, but they were not concerned about attribution of their own work.

This was very different to Sauti ya Wakulima. This is a collaborative knowledge base set up with a group of farmers in Tanzania who share materials gathered via smartphone; there is an ongoing community around farming practices, climate change, etc. The person who set up this project took a very active role in terms of the content created and in the platform, etc. He spoke to the farmers about the licensing of content, which was made available under Creative Commons. His own perception of authorship was different: he did see himself as the author of the software, although he talks about using others' materials and code. He was the author, but noted "not everything came from my own mind".

Looking at UpStage, from make-shift: the platform is totally open. But what about the performances? Well, they left that to the performers. There was no licence fee payment option within the platform, for instance. Performance organisers used the term "brokers" of collaborative performances in the space but, when asked about the performance, and capture of the performance for instance, they conceived of themselves as authors. They wanted to disassociate themselves from notions of authorship but that was very much their own perception. And there was ambiguity about images contributed around performances as well.

And the final case study was FLOSS Manuals, a collection of manuals on free and open source software. It is entirely open and editable, a collaborative publishing platform with a lot of manuals there. When editing videos we had taken in this work, I actually used one of their manuals for my own work. The platform is open, but what about the content? The platform takes a very active role in the content. They have clear licensing, using the GPL: anyone can publish, sell, or reuse content. Within the community creating the manuals there was no consensus; the licence was imposed by the platform owners. And the creative community here radically expanded attribution – anyone who had done anything at all (a single letter, a font face, etc.) was credited. There was some uncertainty when we spoke to them, as the community was unsure about attribution and licensing.

This was a small study but it is clear that collaboration and co-creating has huge implications for perceptions of authorship and huge relevance for copyright law.

Q&A

Q1 – Ewan Klein, Informatics) A comment more than a question: GPL does not let you do what you like. But do you think that Creative Commons would have provided a trail of attribution in the right way?

A1) Yes, Creative Commons would allow that, but not all of those we spoke to had the same feeling about attribution, about how work should be attributed and whether it is to be attributed at all. And under the law some contributions may not be a copyright work (e.g. one line in a manual); here attribution and copyright ownership would be split. Do you attribute the collective or the individuals? The farmers went for collaborative attribution… that solves the problem but not the issue of who should be attributed.

Q2 – Chris Speed) Something here to do with reciprocity. In terms of commons, in common land… implicit models of not taking all your sheep… could that translate to copyright?

A2) Reciprocity did come up as a suggested basis on which attribution could be made. But how do you assess reciprocity? This comes back to Robin's question of funding. All of these projects were started by grants, and thereafter funded by second jobs, projects, PhDs, voluntary contributors. So if people come in voluntarily, is attribution the least you can do (e.g. FLOSS)? Or if you get a performance out of it, is that reciprocity enough? Now, these were very different projects and that needs bearing in mind, but those differences were interesting.

Simon: There is a model in open source software of attribution. In open source films we see this work at first, but it falls apart at the interface between enthusiasm and creation and longer term sustainability.

Penny: FLOSS is an interesting one. This is sort of a benevolent dictator model. He was reluctant to be involved. They do not have money, looking in different directions… This open source, almost utopian community have realised that they need funding to continue.

Smita: and they had an issue. They could publish those manuals but so could anyone else. It would be good to go back in a year’s time to see what had happened.

– And a break whilst I spoke at the Scottish Crucible –

“It’s a computer m’lord”: law and regulation for the digital economy – Prof Burkhard Schafer

I have come in a little late here, but Burkhard is talking about new forms of data, such as monitoring data on older people for the monitoring of their health, and the potential ethical and legal concerns. What if you use technology to help people with their memory – what if it has legal issues? What if it leads to a criminal investigation? New forms of data collection invalidate traditional metaphors, traditional divisions of law.

I am based at the law school, notoriously the scene of a crime – the body snatchers of Edinburgh. The law tried to manage the supply side, and that led to…

Regulation through Architecture (Larry Lessig) – they restricted access, they built fencing around graves, they patented thick metal coffins that allowed you to view the decomposition before burying, to defeat the body snatchers. I call this DRM (Death Risk Management!). But this does relate to the loss of things that are precious. There was a case of a father who gave his daughter, who was dying of cancer, a phone with an unlimited voicemail box. But the phone was in her name and when she died the messages were deleted. He took legal action, but this is not an easy case.

Whose assets are they, whose privacy is at stake? What happens to digital artefacts after death? This is complex. This work is part of a multidisciplinary research project, not just informaticians and lawyers but also anthropologists, sociologists, etc. We came up with radical suggestions far from those of the judges. For instance the "Dead Man's Switch" – a way to wipe your hard drive and remove embarrassing stuff on your death. There were joke companies promising to look after pets in the case of the rapture, to ensure your pets were taken care of by good atheists. But there are serious questions about a service here… about legal liability when taking action on behalf of a dead person.

What about disintermediation? The body snatchers were banned, so they cut out the middle man, killing for bodies rather than digging them up. But could it happen again? Well, child trafficking and sex abuse sit in some of the same places of preying on the naive. We work in this area, looking at ways to understand the roles of social workers, teachers and police so that they can extract the information they need to evidence a case without breaching data protection law or compromising privacy. This is one of our more technical projects, around encryption. And it includes consideration of risk to informants – what can be shared and how – to make sure that there is sharing of necessary data without exposing those in responsible roles as informers on their clients or communities.
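The talk does not detail the encryption scheme, so here is a toy Python sketch of one way selective, field-level disclosure could work, using the cryptography library. The record fields and the one-key-per-field design are assumptions for illustration, not the project's actual protocol.

```python
# Toy sketch: encrypt each field of a record under its own key, so an
# investigator can be given the key for one field (the observation)
# without the informant's identity ever being decryptable by them.
from cryptography.fernet import Fernet

record = {
    "informant": "name withheld",                        # must stay protected
    "observation": "child seen unattended on 3 May",     # may need sharing
}

# One key per field means disclosure can happen field by field.
keys = {field: Fernet.generate_key() for field in record}
encrypted = {
    field: Fernet(keys[field]).encrypt(value.encode())
    for field, value in record.items()
}

# Share only the 'observation' key with the investigator...
shared_key = keys["observation"]
print(Fernet(shared_key).decrypt(encrypted["observation"]).decode())
# ...while the 'informant' ciphertext remains opaque to them.
```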

Robots bring deep seated problems. They will be something more than machines; they change how we think about or interact with technology. To give an example: is it appropriate, legally or ethically, to give someone suffering with Alzheimer's a robot that speaks like her husband, even if it comforts her? It may be justifiable emotionally but it is a massive deception. Similarly, is it ethical to have robots looking like people – should that be another law of robotics?

Meanwhile we have SenseCam devices that automatically take images of the wearer's day. Alzheimer's patients have been given these so they can go through their day and work through the images with their support worker – to remember what they have done; this seems to have benefits for retrieval. They use these devices on dogs too (for more fun purposes). Legally… well, in galleries, theatres, movies… photography is banned, but should there be an overriding right to take pictures? In Germany public buildings are copyrighted and images cannot be taken. We let guide dogs go where other dogs cannot; maybe this is a similar justification.

And a final example: David Valentine records his performances "Duellists" and "The Commercial" in public space – demands made on the council for CCTV films of his performance, on the basis of his performer's rights. Legally in the UK this is complex!

Q&A

Q1 – Jen Ross, School of Education) In the recent release of Google Glass some restaurants and businesses banned Google Glass, and I'm wondering about the social response to and impact of these technologies.

A) Google "St Patrick's Day Google Glass" for an amusing example. One of the concerns I have is that these are being designed in health settings and medical settings but are being designed for live blogging. This is sort of a trojan horse for changing privacy laws and expectations. "Private" has its origins in the Latin for robbing time from others; we expect to be able to be alone. It's fine if we are OK with having images taken, etc. But without the ability to be alone – if privacy is a public good, not a private good – then we may not want people to give it up so easily. It becomes very complicated. Lots of frivolous uses trying to get public use on the back of essentially medical technologies.

Q2) I worked on a project with Charles Raab on data sharing. A thing I found in that context is that once you've released data into that space… you've talked about the advocacy role of the social worker… but once released, how do you retrench into your social role?

A2) It's not surprising that in cases of child abuse the evidence was there but had not been shared. Rules have been changed but it still doesn't work; people find a way around that. If I don't trust the recording mechanism I don't share the data. If I'm concerned about the use of my data then I don't write it down any longer. All the evidence we've found from the social scientists and the political scientists is that technology doesn't change that. In our approach people respond to requests rather than dumping all their data, since otherwise they just won't comply, in all manner of creative ways. And it's a distributed system rather than a centralised one, for the same reason.

Letting your digits do the walking: on the road with Ben Jonson, 1618 & 2013 – Prof James Loxley and Dr Anna Groundwater

We are at the beginning of our digital journey in comparison to others who have been talking today. I will tell you a bit about the manuscript we are looking at, its significance, and the journey we think it could take us on. In 1618 Ben Jonson walked from London to Edinburgh on foot – an extended walk with no surviving evidence until James Loxley came across an account by a walking companion, a treasure trove of primary evidence for researchers and a window into life along the Great North Road. So I will talk a bit about how we can recreate that world, and understand it, using primary and digital resources.

My experience of digital online resources as a user was as a beginner. I physically dug around in regional and national archives along the Great North Road. Digital catalogues have really helped me to do this; they have allowed me to achieve much more and in a much more cost effective manner. Tools like EEBO have helped me speed up the collation of materials online, to gather biographical information alongside literary texts. Most apposite here is EDINA's Digimap – I've been using it on a daily basis as a way to reinterpret and consider networks and social spaces in early modern Britain.

And the literature allows us to understand social spaces, social practices. We can look at practices of hospitality at that time, the experience Jonson was having. Welbeck Abbey, for instance, is discussed in the manuscript, with specific descriptions of taking over the house from Sir William. There is also mention of Mr Bonner, the Sheriff in Newcastle. Some of this text we have been able to verify. We have been able to use the OED to understand some of the terminology, e.g. hullock, a wine for very important people.

The texts also provide a history of cultural interests, the antiquarianism of tourism and travel: the places visited, the castles, buildings and grand houses along the way, and the route taken, from Belvoir Castle through to Pettifour Well in Kinghorn. Edinburgh Castle, for instance, was one of his stops. We can use art and images of that era to recreate that voyage. We can physically make these journeys, but we can make these journeys digitally too – the digital journey remaking the mental and physical connections of that historical journey.

Over to James: I will touch on the dimensions of the project which have emerged as we have been going along, dimensions of which we have become aware. This was a digital project right from the start. Since we have been talking about the project and the manuscript, many have asked how the manuscript came to light and why this has happened now. The story is a disappointing one. In fact it involved me sitting down to consider the potential of a digitised set of catalogues, done by the National Archives, which are catalogues of archives around the UK, in a project called Access to Archives. This allowed discovery of collections and the structure of collections. As I was looking through materials and how they worked, I was able to find a literary manuscript and where it sat in the collection… it seemed to refer to Ben Jonson, but the spelling was such that no one searching for him would have found it. There was no rummaging in archive attics. But we have been further exploring digital dimensions.
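As an aside on why a variant spelling defeats catalogue search: a small Python sketch contrasting exact and fuzzy matching, using the standard library's difflib. The variant spellings below are invented for illustration, not the actual catalogue entry.

```python
# Exact string matching misses every variant spelling of a name;
# fuzzy matching scores candidates by similarity and surfaces them.
import difflib

catalogue_names = ["Johnson", "Ionson", "Jhonson", "Jameson"]  # invented variants

exact_hits = [name for name in catalogue_names if name == "Jonson"]
print(exact_hits)   # [] - an exact search finds nothing

fuzzy_hits = difflib.get_close_matches("Jonson", catalogue_names, n=4, cutoff=0.8)
print(fuzzy_hits)   # ['Johnson', 'Jhonson', 'Ionson'] - variants recovered
```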

Because we have a journey here – it is not like Boswell's account of Samuel Johnson but is instead a list of people, places, food, etc. – we can see dimensions that are not classically those a literary scholar is looking for: what we might call a quantifiable text, I suppose. For instance, the account gives the time a day's journey began, the time of arrival, and the locations. So we can take a distance of 9.5 miles and a time of 3 hours and work out what the walking pace was. Jonson seems to be at about 3.17 mph (the modern human average is 3.3 mph). An interesting one, since Jonson in his own notes says he is around 20 stone. Maybe something is not quite right there?
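As a quick check of the arithmetic behind those figures, using only the numbers reported above:

```python
# 9.5 miles covered in 3 hours, as recorded in the account.
distance_miles = 9.5
time_hours = 3.0

pace_mph = distance_miles / time_hours
print(f"{pace_mph:.2f} mph")  # 3.17 mph, just under the ~3.3 mph modern average
```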

We don't know who wrote the account; we have candidates, but the companion is still anonymous. We can work out the height of the companion using surviving architectural drawings of a venue visited. We can work out that he is 5'5!

We are inevitably working with small data here. Having places, times, distances, speed etc. allows us to visualise the journey in ways we maybe would not have been able to before, a manifestation beyond the annotated text. We've initially been exploring that in terms of a map (see blogs.hss.ed.ac.uk/ben-jonsons-walk). This initial map on our website gives a sense of the places visited (via map pins), and on those pins we include the time they were there and notes, which are growing as metadata (excellent sweet water at York!). This is a starting point to begin to map out the data the walk has presented us with; it is really at the "rehearsal" stage. There is a performative aspect to this walk – Jonson is greeted by crowds, by property owners, etc. People have told us that we must reenact the walk! So we are doing a virtual walk: on 8th July he will tweet in real time on Twitter, and that will be linked into the map and the information on the blog site, an interaction between those channels. Hopefully Ben will get into conversations as he is on his way; that's part of what we'd like to do!
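To illustrate the kind of pin-plus-metadata map described here, a minimal Python sketch using the folium library. The coordinates, notes and file name are invented for illustration; this is not the project's actual mapping stack.

```python
# Plot the walk's stops as map pins, each carrying its time/notes
# metadata in the popup, and save a shareable HTML map.
import folium

stops = [  # illustrative data only
    {"place": "York", "lat": 53.959, "lon": -1.082,
     "note": "excellent sweet water at York!"},
    {"place": "Edinburgh", "lat": 55.953, "lon": -3.189,
     "note": "journey's end"},
]

walk_map = folium.Map(location=[55.0, -2.0], zoom_start=6)
for stop in stops:
    folium.Marker(
        location=[stop["lat"], stop["lon"]],
        popup=f"{stop['place']}: {stop['note']}",  # the growing pin metadata
    ).add_to(walk_map)

walk_map.save("jonson_walk.html")
```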

We are already thinking about the possibilities of expanding this for future projects. There is an example called Mapping the Lakes: a team at Lancaster University made this, tracking Thomas Gray's and Coleridge's journeys around the Lakes, created with a GIS to visualise the walks. They have mapped the obvious markers but have also tried to map more subjective things such as the mood of the walk. You can look at the walks separately or together. That seems a way of thinking about the literary journey that we would like to develop for ourselves. We would like to think beyond the map we are "performing" this summer… There is clearly an interplay between sites and routes… some are easier to map and work out than others. In some places there was a guide to take them on their way, so in those stretches it is very hard to find the obvious route. We are thinking also about how the mapping of the journey could bring in different possibilities, views, prospects, meanings of sites, etc. We haven't represented those on the map but we would love to, particularly to compare their walk to modern walks. How do different models of the walk undertaken "for the sake of it" compare? And how can we take that walk, preserve that experience, feed in other materials, etc.? We hope to be able to approach the AHRC for follow on funding and we would love to talk to anyone interested in the spatiality of walking who might be interested in engaging.

Q&A

Q1) A connection: Joseph DeLappe, an artist in the US, recreated Gandhi's walk using a treadmill hooked up to a Second Life avatar and reproduced it there… a possible digital precursor.

A1) An interesting possibility – you could get gradients in, perhaps. There are analogues or comparators out there to explore. There is a deepening attention and intensifying interest in the process and practice of walking, and how that carries with it expectations and kinds of appropriate representational modelling: doing some justice to spatiality without assuming a single model is all that we need… we need to weave different senses of the spatial within the literary walk.

Q2 – Rocio) A comment on the idea of the walk: you could make it a collective walk – ask people in the surrounding areas to do a bit of it, make it interactive and have them add their part of the journey… if you can't do it yourself.

A) Exactly what we hope to do. Want to bring in local history societies and walking groups etc. on the old roads and feed that in.

Old light on new media: medieval practices in the digital age – Dr Eyal Poleg

We are working on a project called Manuscript Studies in an Interoperable Digital Environment, funded by the Mellon Foundation. We have found interesting parallels between the reading of medieval manuscripts and digital reading practices. Perhaps we can learn from medieval practices to think about developing digital practices. In many ways printed books are an interim step between practices we see across old and new media.

Let's start with hypertext. Hypertext is very common in medieval manuscripts, particularly in the Bible. The problem with the New Testament is the Gospels: how do you jump from one parallel episode to another? You can explore a version at the University of Toronto, for instance. In the manuscript era we get the Eusebian canons: numbers in the margins of each episode let you use the canon tables to jump from one Gospel to another, very similar to clicking on a link. This starts something new in exploring the text.
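As a toy model of how a canon table works as a link structure: a lookup from one passage to its parallels, much like a hyperlink's target list. The entries below use modern chapter numbers for illustration; the actual canons keyed on Ammonian section numbers.

```python
# A canon table as a dictionary: look up a passage, get its parallels
# in the other Gospels - following a marginal number into the table
# and back out, like following a link.
canon_table = {
    ("Matthew", 14): [("Mark", 6), ("Luke", 9), ("John", 6)],     # feeding of the 5000
    ("Matthew", 21): [("Mark", 11), ("Luke", 19), ("John", 12)],  # entry into Jerusalem
}

def parallels(gospel, chapter):
    """Return the parallel passages for a given Gospel passage."""
    return canon_table.get((gospel, chapter), [])

print(parallels("Matthew", 14))
# [('Mark', 6), ('Luke', 9), ('John', 6)]
```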

In the 12th century there is a beautiful text in France. It is a working manuscript: it has physical cut and paste. It shows the authors wrestling with technology, experimenting with navigating the text, inventing references. And this ties into the "late medieval bible" – the Gutenberg Bible is a replication of one of these bibles. The innovation of these bibles is evident in the chapter divisions; previously there were no divisions in the text. From 1230 onwards, with the help of Stephen Langton, the Archbishop of Canterbury, we have the chapter divisions, and we begin to get Book and Chapter references. This fits the mindset of Christian exegetes at the time, of the linkages within the Bible. And this linking took off like wildfire – the most efficient way to link and navigate. When we think about hypertext in the medieval period we have to also think of the web of allusions that people had. When reading a text, for example a psalter, there is an interaction of text, image and sound. For monks, reading the text created a world of allusion. So we can, using digital technology, replicate that to an extent: by adding the musical strata of the text, intricate links that evoke the memory of the men and women who would read these texts.

The wiki is a structure we also see in medieval texts. Even now, the interaction one has with a printed book is limited. In the Middle Ages books were different; they were communal objects, even for the monks. Annotations were seen to add value to the text, a communal project of reading. You can read generations of commentators through the margins of the text. The way it took place – and this is worth considering – is by giving ample space to interact, to comment on the text: space deliberately left, interlinear and marginal glosses, spaces for comments and annotation. You can see the different hands, texts, monks reflected in the communal commenting on the text. And you see some commentators responding to each other. In one manuscript in Glasgow an O character has been vandalised; a later reader found this offensive and erased it for future readers… so how much interaction, erasure and change to the text do we allow readers? That would have been a nice image…

There is also a sort of open code emerging in manuscripts. A printed book is not that open. But looking across copies of the same manuscript we see differences – some are errors, some are changes by the scribe. In the Middle Ages the scribe assumed the text could be faulty and tried to correct it; the text was in flux. Scholars use this to reproduce the text, and we can also explore connections between one manuscript and another. But of course, what is a text? What is a changed text? What is a fixed text?

And finally we have non-linear texts here, something that can be created now in digital environments: not necessarily beginning, middle and end. Navigation can be very different. For instance, a medieval teaching manual uses images and associated ideas for exploration, but these are non-linear; the images point us in directions within the text. And this ties into a late medieval aesthetic vision of allusions, the idea of a network of allusions.

Q&A

Q1) This is a fascinating talk; there are several very orchestrated ways to explore medieval manuscripts that this relates to. You touch on websites reflecting print books, not necessarily taking advantage of the multimodal opportunities of the web.

A1) That was the starting point of the project. Mellon saw medieval manuscripts increasingly being digitised, but people were using them as printed texts, and it wanted to look at new ways of working. So for instance you can see the Summarium, a prototype that uses TEI to annotate a non-linear version of the texts, in a communal way.

Q2) Is there a connection between the idea of hypertext in medieval texts and the role of the church as an information system? There have been times where the physical church acted as an information system for state information etc.; I'm not sure if that is true of the medieval era.

A2) In the Middle Ages, unlike the Reformation, this is less about enforcement and more about the reality of texts. You live the texts. Monks especially live and breathe the text and information: you wake and pray seven times a day, you are surrounded by images, you are embedded within the textuality.

Q3) Do you find any dilution of the texts in transferring them to digital technologies? I am sure that institutions are very careful about this.

A3) This is not an issue for us. The texts are not of interest to religious institutions today. Very early or very late texts might be an issue, but these are not.

Q4) Have you ever come across work on Roman law reception in the middle ages in codex form? I think he came to similar conclusions analysing legal texts as hypertext and wikis. He has a secular model of the same phenomenon.

A4) Yes I wasn’t aware of that but I will be interested to have the references. The manuscript texts were a little behind legal texts but it would be very interesting to compare.

And now onto the closing from Sian Bayne, saying that it really has been a day of new ideas, very inspiring. And thank yous to the audience, the organisers and of course all of our speakers.

 
