Digital Scholarship Day of Ideas 2018 – Live Blog

Today I am at the Digital Scholarship Day of Ideas, organised by the Digital Scholarship programme at University of Edinburgh. I’ll be liveblogging all day so, as usual, I welcome additions, corrections, etc. 

Welcome & Introduction – Melissa Terras, Professor of Digital Cultural Heritage, University of Edinburgh

Hi everyone, it is my great pleasure to welcome you to the Digital Day of Ideas 2018 – I’ve been on stage here before as I spoke at the very first one in 2012. I am introducing the day but want to give my thanks to Anouk Lang and Professor James Loxley for putting the event together and their work in supporting digital scholarship. Today is an opportunity to focus on digital research methods and work.

Later on I am pleased that we have speakers from sociology and economic sociology, and the nexus of that with digital techniques, areas which will feed into the Edinburgh Futures Institute. We’ll also have opportunity to talk about the future of digital methods, and particularly what we can do here to support that.

Lynn Jameson – Introduction

Susan Halford is professor of sociology but also director of the institution-wide Web Science Institute.

Symphonic Social Science and the Future of Big Data Analytics – Susan J Halford, Professor of Sociology & Director of Web Science Institute, University of Southampton

Abstract: Recent years have seen ongoing battles between proponents of big data analytics, using new forms of digital data to make computational and statistical claims about the social world, and many social scientists who remain sceptical about the value of big data, its associated methods and claims to knowledge. This talk suggests that we must move beyond this, and offers some possible ways forward. The first part of the talk takes inspiration from a mode of argumentation identified as ‘symphonic social science’ which, it is suggested, offers a potential way forward. The second part of the talk considers how we might put this into practice, with a particular emphasis on visualisation and the role that this could play in overcoming disciplinary hierarchies and enabling in-depth interdisciplinary collaboration.

It’s a great pleasure to be here in very sunny Edinburgh, and to be speaking to such a wide ranging audience. My own background is geography, politics, English literature, sociology and in recent years computer sciences. That interdisciplinary background has been increasingly important as we start to work with data, new forms of data, new types of work with data, and new knowledge – but let’s query that – from that data. All this new work raises significant challenges especially as those individual fields come from very different backgrounds. I’m going to look at this from the perspective of sociology and perhaps the social sciences, I won’t claim to cover all of the arts and humanities as well.

My talk today is based on work that I have been doing with Mike Savage on “big data” and the new forms of practice emerging around these new forms of data, and the claims being made about how we understand the social world. In this world there has been something of a stand off between data scientists and social scientists. Chris Anderson (in 2008), a writer for Wired, essentially claimed “the data will speak for itself” – you won’t need the disciplines. Many have pushed back hard on this. The push back is partly methodological: these data do not capture every aspect of our lives, they capture partial traces, often lacking in demographic detail (do we care? sociologists generally do…) and we know little of its promise. And it is very hard to work with this data without computational methods – tools for pattern recognition generally, not usually thorough sociological approaches. And these methods can produce concerning, sometimes ethically problematic, results that are presented as unproblematic. So, this is highly challenging. John Goldthorpe says “whatever big data may have for “knowing capitalism” its value to social science has… remained open to questions…”.

Today I want to move beyond that stand off. The divisiveness and siloing of disciplines is destructive for the disciplines – it’s not good for social science and it’s not good for big data analytics either. From a social science perspective, that position marginalises social sciences, sociology specifically, and makes us unable to take part in this big data paradigm which – love it or loathe it – has growing importance, influence, and investment. We have to take part in this for three major reasons: (1) it is happening anyway – it will march forward with or without us; (2) these new data and methods do offer new opportunities for social sciences research; and (3) we may be able to shape big data analytics as the field emerges – it is very much in formation right now. It’s also really bad for data science not to engage with the social sciences… Anderson and others made these claims ten years ago… Reality hasn’t really borne that out. In commercial contexts – recommendations, behaviour tracking and advertising – the data and analysis is doing that. But in actually drawing understanding from the world, it hasn’t really happened. And even the evangelists have moved on… Wired itself has moved to saying “big data is a tool, but should not be considered the solution”. Jeff Hammerbacher (co-credited for coining the term “data science” in 2008) said in 2013 “the best minds of my generation are thinking about how to make people click ads… that sucks”.

We have a wobble here, a real change in the discourse. We have a call for greater engagement with domain experts. We have a recognition that data are only part of the picture. We need to build a middle ground between those two positions of data science and social science. This isn’t easy… It’s really hard for a variety of reasons. There are bodies buried here… But rather than focus on that, I want to focus on how we take big steps forward here…

The inspiration here are three major social science projects: Bowling Alone (Robert Putnam); The Spirit Level – Richard Wilkinson and Kate Pickett; Capital – Thomas Piketty. These projects have made huge differences, influencing public policy and in the case of Bowling Alone, really reshaped how governments make policy. These aren’t by sociologists. They aren’t connected as such. The connection we make in our paper is that we see a new style of social science argumentation – and we see it as a way that social scientists may engage in data analytics.

There are some big similarities between these books. They are all data driven. Think about how sociology at the end of the 20th century was highly theoretical… At the beginning of the 21st century we see data driven works. And they haven’t done their own research generating data here, they have drawn on existing research data. Piketty has drawn together diverse tax data… But also Jane Austen quotes… Not just mixed methods but huge repurposing. These books don’t make claims for causality based on data alone, their claims for causality are supported by theory. However they present data throughout, supporting their arguments. Data is key, with images to hold the data together. There is a “visual consistency”. The books each have a key graph that essentially summarises the book. Putnam talks about social capital, Piketty talks about the rise and fall of wealth inequality in the 20th century.

In each of these texts data, method and visualisation are woven into a repeat refrain, combined with theory as a composite whole to make powerful arguments about the nature of social life and social change over the long term. We call this a “Symphonic Aesthetic” as different instruments and refrains build, come in and go… and the whole is greater than the sum of the parts.

OK, that’s an observation about the narrative… But why does that matter? We think it’s a way to engage with and disrupt big data. There are similarities: re-purposing multiple and varied “found” data sources; an emphasis on correlation; use of visualisation. There are differences too: theoretical awareness; choice of data; temporality is different – big data has huge sets of data looking at tiny focused and often real time moments. Social Science takes long term comparisons – potentially over 100 years. The role of correlation is different. Big data analytics looks for a result (at least in the early stage), in symphonic aesthetics there is a real interest in correlation through statistical and theoretical understandings. Practice of visualisation varies as well. In big data it is the results, in symphonic aesthetics it is part of the process, not the end of the process.

Those similarities are useful but there is much still to do: symphonic authors do not use new forms of digital data, their methods cannot simply be applied, big data demand new and unfamiliar skills and collaborations. So I want to talk about the prospective direction of travel around data; method; theory; visualisation practice.

So, firstly, data. If we talk about symphonic aesthetics we have to think about critical data pragmatism. That is about lateral thinking – redirection of what data exist already. And we have to move beyond naivety – we cannot claim they are “naturally occurring” mirrors/telescopes etc. They are deliberately social-technical constructions. And we need to understand what the data are and what they are not: socio-technical processes of data construction (eg carefully constructed samples); understanding and using demographic biases (go with the biases and use the data as appropriate, rather than claiming they are representative; or maybe ignore that, look at network construction, flows, mobilities – e.g. John Murrey’s work).

Secondly method. We have to be methodologically plural. Normally we do mixed methods – some quantitative, some qualitative. But most of us aren’t yet trained for computational methods, and that is a problem. Many of the most interesting things about these data – their scale, complexity etc. – are not things we can accommodate in our traditional methods. We need to extend our repertoire here. So social network analysis has a long and venerable history – we can apply the more intensive smaller version of large scale social network analysis. But we also need machine learning – supervised (with training sets) and unsupervised (without). This allows you to seek evidence of different perhaps even contradictory patterns. But also machine learning can help you find the structures and patterns in the data – which you may well not know in data sets at this scale.

We have this quote from Amir Goldberg (2015): “sociologists often round up the usual suspects. They enter the metaphorical crime scene every day, armed with strong and well-theorised hypotheses about who the murderer should or at least plausibly might be.”

To be very clear I am not suggesting we outsource analysis to computational methods: we need to understand what the methods are doing and how.

Thirdly, theory. We have to use abductive reasoning – a constant interplay between data, method and theory. Initial methods may be informed by initial hunches, themes, etc. We might use those methods to see if there is something interesting there… Perhaps there isn’t, or perhaps you build upon this. That interplay and iterative process is, I suspect, something sociologists already do.

So, how do we bring this all together in practice? Most sociologists do not have a sophisticated understanding of the methods; and most computer scientists may understand the methods but not the theoretical elements. I am suggesting something end to end, with both sociologists and computer scientists working together.

It isn’t the only answer but I am suggesting that visualisation becomes an analytical method, rather than a “result”. And thinking about a space for work where both sociological and computer science expertise are equally valid rather than combative. At best visualisations are “instruments for reasoning about quantitative information. Often the most effective way to describe, explore and summarise a set of numbers – even a very large set – is to look at pictures of those numbers” (Tufte 1998). Visualisations as interdisciplinary boundary objects. Beyond a mode of argumentation… visualisation becomes a mode of practice.

An example of this was a visualisation of the network of a hashtag that was collaborative with my colleague Ramin, which developed over time as we asked each other questions about how the data was presented and what that means…

In conclusion, sociology flourished in the C20th. Developing methods, data and theory that gave us expertise in “the social” (a near monopoly). This is changing – new forms of data, new forms of expertise… And claims being made which we may, or may not, think are valid. And that stands on the work of sociologists. But there is some promise in the idea of symphonic aesthetic: for data science – data science has to be credible and there is recognition of that – see for instance Cathy O’Neil’s work on data science, “Weapons of Math Destruction” which also pushes in this direction; for sociological research – but not all of it, these won’t be the right methods for everyone; for public sociology – this is being used in lots of ways already, algorithm sentencing debates, Cambridge Analytica… There is a real place for sociologists to reshape sociology in the public understanding. There are big epistemological implications here… Changing the data and methods changes what we study… But it has always been like that. Big data can do something different – not necessarily better, but different.

Q&A

Q1) I was really interested in your comments about visualisations as a method… Johanna Drucker talks about visual technology and visual discourse – and issues of visualisations as being biased towards positivistic approaches, and advocates for getting involved in the design of visualisation tools.

A1) I’m familiar with these concepts. That work I did with Ramin is early speculative work… But it builds and is based on classic social network analysis so yes, I agree, that reflects some issues.

Q2 – Tim Squirrell) I guess my question is about the trade off between access and making meaningful critiques. Often sociology is about critiquing power and methods by which power is transmitted. The more data proliferates, the more the data is locked behind doors – like the kind of data Facebook holds. And in order to access that data you have to compromise the kinds of critiques you can make. How do you navigate that narrow channel, to make critiques without compromising those…

A2) The field is quite unsettled… It looked settled a year ago but I think Cambridge Analytica will have major impact… That may make the doors more closed… Or perhaps we will see these platforms – for instance Facebook – understanding that to retain credibility it has to create a segregation between their own use of the data, and research (not funded by Facebook), so that there is proper separation. But I’m not naive about how that will work in practice… Maybe we have to tread a careful line… And maybe that does mean not being critical in all the ways we might be, in every paper. Empirical data may help us make critical cases across the diverse range of scholarship taking place.

Q3 – Jake Broadhurst) Data science has been used in the social world already, how do we keep up and remain relevant?

A3) It is a pressing challenge. The academy does not have the scale or capacity to address data science in the way the private sector does. One of the big issues is ethics… And how difficult it is for academics to navigate ethics of social media and social data. And it is right that we are bound to ethical processes in a way data scientists and even journalists do not need to. But it is also absolutely right that our ethics committees have to understand new methods, and the realities of the gold standard consent and other options where that is not feasible.

The discussion we are having now, in the wake of Cambridge Analytica, is crucial. Two years ago I’d ask students what data they felt was collected, they just didn’t know. And understanding that is part of being relevant.

Q4 – Karen Gregory) If you were taking up a sociology PhD next year, how would you take that up?

A4) My official response would be that I’d do a PhD in Web Science. We have a programme at University of Southampton, taking students from a huge array of backgrounds, and giving them all the same theoretical and methodological backgrounds. They then have to have 2 supervisors, from at least 2 different disciplines for their PhD.

Q5 – Kate Orton Johnson) How do we tackle the structures of HE that prevent those interdisciplinary projects, creating space, time, collaborative push to create the things that you describe?

A5) It’s a continuous struggle. Money helps – we’ve had £10m from EPSRC and that really helps. UKRI could help – I’m sceptical but hopeful about interdisciplinary possibilities here. Having PhD supervision across really different disciplines is a beautiful thing, you learn so much and it leads to new things. Universities talk about interdisciplinary work but the reality doesn’t always match up. Money helps. Interdisciplinary research helps. Collaboration on small scales – conference papers etc. also help.

Q6 – David, research in AI and Law) I found your comments about dialogues between data scientists and social scientists… How can you achieve similar with law scholars and data scientists… Especially if trying to avoid hierarchical issues. Law and data science is a really interesting space right now… GDPR but also algorithmic accountability – legal aspects of equality, protected categories, etc. Very few users of big data have faced up to the risks of how they use the data, and potential for legal challenge on the basis of discrimination. You have to find joint enthusiasm areas, and fundable areas, and that’s where you have to start.

The Economics Agora Online: Open Surveys and the Politics of Expertise – Tod van Gunten, Lecturer in Economic Sociology, University of Edinburgh

Abstract: In recent years, research centres in both the United States and United Kingdom have conducted open online surveys of professional economists in order to inform the public about expert opinion.  Media attention to a US-based survey has centred on early research claiming to show a broad policy consensus among professional economists.  However, my own research shows that there is a clear alignment of political ideology in this survey.  My talk will discuss the value and limitations of these online surveys as tools for informing the public about expert opinion.

Workshops: Parallel workshop sessions – please see descriptors below.

  • Text Analysis for the Tech Beginner
  • An Introduction to Digital Manufacture – Mike Boyd (uCreate Studio Manager, UoE)
  • ‘I have the best words’: Twitter, Trump and Text Analysis – Dave Elsmore (EDINA)
  • An Introduction to Databases, with MariaDB & Navicat – Bridget Moynihan (LLC, UoE)
  • Introduction to Data Visualisation in Processing – Jules Rawlinson (Music, ECA, UoE)
  • Jupyter Notebooks and The University of Edinburgh Noteable service – Overview and Introduction – James Reid (EDINA)
  • Obtaining and working with Facebook Data – Simon Yuill (Goldsmiths)

Round Table Discussion

  • Melissa Terras, Professor of Digital Cultural Heritage
  • Kirsty Lingstadt, Head of Digital Library and Depute Director of Library and University Collections
  • Ewan McAndrew, Wikimedian in Residence
  • Tim Squirrell, PhD Student, Science, Technology and Innovation Studies

 


Working with the British Library

This morning I’m at the “Working with the British Library’s Digital Content, Data and Services for your research (University of Edinburgh)” event at the Informatics Forum to hear about work that has been taking place at the British Library Labs programme, and with BL data recently. I’ll be liveblogging and, as usual, any comments, questions, and corrections are welcome.

Introduction and Welcome – Professor Melissa Terras

Welcome to this British Library Labs event, this is about work that fits into wider work taking place and coming here at Edinburgh. British Library Labs works in a space that is changing all the time, and we need to think about how we as researchers can use digital content and this kind of work – and we’ll be hearing from some Edinburgh researchers using British Library data in their work today.

“What is British Library Labs? How have we engaged researchers, artists, entrepreneurs and educators in using our digital collections” – Ben O’Steen, Technical Lead, British Library Labs

We work to engage researchers, artists, entrepreneurs and educators to use our digital collections – we don’t build stuff, we find ways to enable access and use of our data.

The British Library isn’t just our building in St Pancras, we also have a huge document supply and storage facility in Boston Spa. At St Pancras we don’t just have the collections, we have space to work, we have reading rooms, and we have five underground floors hidden away there. We also have a public mission and a “Living Knowledge Vision” which helps us to shape our work

British Library Labs has been running for four years now, funded by the Andrew W. Mellon Foundation, and we are in our third funded phase where we are trying to make this business as usual… So the BL supports the reader who wants to read 3 things, and the reader who wants to read 300,000 things. To do that we have some challenges to face to make things more accessible – not least to help people deal with the sheer scale of the collections. And we want to avoid people having to learn unfamiliar formats and methodologies which are about the library and our processes. We also want to help people explore the feel of collections, their “shape” – what’s missing, what’s there, why and how to understand that. We also want to help people navigate data in new ways.

So, for the last few years we have been trying to help researchers address their own specific problems, but also trying to work out if that is part of a wider problem, to see where there are general issues. But a lot of what we have done has been about getting started… We have a lot of items – about 180 million – but any count we have is always an estimate. Those items include 14m books, 60m patents, 8m stamps, 3m sound recordings… So what do researchers ask for…

Well, researchers often ask for all the content we have. That hides the failure that we should have better tools to understand what is there, and what they want. That is a big ask, but that means a lot of internal change. So, we try to give researchers as much as we have… Sometimes that’s TBs of data, sometimes GBs… And data might be all sorts of stuff – not just the text but the images, the bindings, etc. If we take a digitised item we have an image of the cover, we have pictures, we have text, we also have OCR for these books – when people ask for “all” the book – is that the images, the OCR or both? One of those is much easier to provide…

Facial recognition is quite hot right now… That was one of the original reasons to access all of the illustrations – I run something called the Mechanical Curator to help highlight those images – they asked if they could have the images – so we now have 120m images on Flickr. What we knew about images was the book, and the page. All the categorisation and metadata now there has been from people and machines looking at the data. We worked with Wikimedia UK to find maps, using manual and machine learning techniques – kind of in competition – to identify those maps… And they have now been moved into georeferencing tools (bl.uk/maps) and fed back to Flickr and also into the catalogue… But that breaks the catalogue… It’s not the best way to do this, so that has triggered conversations within the library about what we do differently, what we do extra.

As part of the crowdsourcing I built an arcade machine – and we ran a game jam with several usable games to categorise or confirm categories. That’s currently in the hallway by the lifts in the building, and was the result of work with researchers.

We put our content out there under a CC0 license, and then we have awards to recognise great use of our data. And this was submitted – the official music video for Hey There Young Sailor, made using that content! We also have the Off the Map competition – a curated set of data for undergraduate gaming students based on a theme… Every year there is something exceptional.

I mentioned library catalogue being challenging. And not always understanding that when you ask for everything, that isn’t everything that exists. But there are still holes…. When we look at the metadata for our 19th century books we see huge amounts of data in [square brackets] meaning the data isn’t known but is the best suggestion. And this becomes more obvious when we look at work researcher Pieter Francois did on the collection – showing spikes in publication dates at 5 year intervals… Which reflects the guesses at publication year that tend to be e.g. 1800/1805/1810. So if you take intervals to shape your data, it will be distorted. And then what we have digitised is not representative of that, and it’s a very small part of the collection…
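The five-year spikes described above can be checked for in any list of catalogue years. The following is a minimal sketch in Python, not the method Pieter Francois used: the function name `heaping_ratio` and the sample years are invented for illustration. It compares the share of records falling on multiples of five with the share expected if years were spread evenly.

```python
def heaping_ratio(years, step=5):
    """Ratio of observed to expected records whose year is a multiple of `step`.

    A value near 1.0 means no heaping; values well above 1.0 suggest that
    many dates were estimated and rounded to the nearest `step` years.
    """
    years = list(years)
    if not years:
        raise ValueError("no years supplied")
    on_step = sum(1 for y in years if y % step == 0)
    # Expected share on a multiple of `step` is 1/step, so
    # observed / expected = (on_step / n) / (1 / step) = on_step * step / n
    return on_step * step / len(years)


# Invented example data: one firmly dated book per year for 1800-1819,
# plus twelve books whose bracketed dates were guessed as round years.
known = list(range(1800, 1820))
guessed = [1800, 1805, 1810, 1815] * 3

print(heaping_ratio(known))            # evenly spread years
print(heaping_ratio(known + guessed))  # guessed dates pile up on round years
```

A ratio well above 1 is a hint that binning the data into five or ten year intervals will inherit the cataloguers' guesses rather than real publication patterns.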

There is bias in digitisation then, and we try to help others understand that. Right now our digitised collections are about 3% of our collections. Of the digitised material 15% is openly licensed. But only about 10% is online. About 85% of our collections can only be accessed “on site” as licenses were written pre-internet. We have been exploring that, and exploring what that means…

So, back to use of our data… People have a hierarchy of needs from big broad questions down to filtered and specific queries… We have to get to the place where we can address those specific questions. We know we have messy OCR, so that needs addressing.

We have people looking for (sometimes terrible) jokes – see Victorian Humour run by Bob Nicholson based on his research – this is stuff that can’t be found with keywords…

We have Katrina Navickas mapping political activity in the 19th Century. This looks different but uses the same data and the same platform – using Jupyter Notebooks. And we have researchers looking at black abolitionists. We have SherlockNet trying to do image classification… And we find work all over the place building on our data, on our images… We found a card game – Moveable Type – built on our images. And David Normal building montages of images. We’ve had the Poetic Places project.

So, we try to help people explore. We know that our services need to be better… And that our services shape expectations of the data – and can omit and hide aspects of the collections. Exploring data is difficult, especially with collections at this scale – and it often requires specific skills and capabilities.

British Library Labs working with University of Edinburgh and University of St Andrews Researchers

“Text Mining of News Broadcasts” – Dr. Beatrice Alex, Informatics (University of Edinburgh)

Today I’ll be talking about my work with speech data, which is funded by my Turing fellowship. I work in a group who have mainly worked with text, but this project has built on work with speech transcripts – and I am doing work on a project with news footage, and dialogues between humans and robots.

The challenges of working with speech include particular characteristics: short utterances, interjections; speaker assumptions – different from e.g. newspaper text; turn taking. Often transcripts miss sentence boundaries, punctuation or case distinctions. And there are errors introduced by speech recognition.

So, I’m just going to show you an example of our work which you can view online – https://jekyll.inf.ed.ac.uk/geoparser-speech/. Here you can do real time speech recognition, and this can then also be run through the Edinburgh Geoparser to look for locations and identify their locations on the map. There are a few errors and, where locations haven’t been recognised in the speech recognition, they also don’t map well. The steps in this pipeline are speech recognition… ASR then Google Text Restoration, and then text and data mining.

So, at the BL I’ve been working with Luke McKernan, lead curator for news and moving images. I have had access to a small set of example news broadcast files for prototype development. This is too small for testing/validation – I’d have to be onsite at BL to work on the full collection. And I’ve been using the CallHome collection (telephone transcripts) and BBC data which is available locally at Informatics.

So looking at an example we can see good text recognition. In my work I have implemented a case restoration step (named entities and sentence initials) using rule based lexicon lookup, and also using Punctuator 2 – an open source tool which adds punctuation. That works much better but isn’t up to an ideal level there. Meanwhile the Geoparser was designed for text so works well but misses things… Improvement work has taken place but there is more to do… And we have named entity recognition in use here too – looking for location, names, etc.
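To make the case restoration step more concrete, here is a minimal sketch of rule based lexicon lookup in Python. It is not the project's implementation: the `LEXICON` entries and the `restore_case` function are invented for illustration, and it assumes the transcript is lower-cased and already has punctuation (for example added by a separate punctuation tool).

```python
import re

# Hypothetical lexicon mapping lower-cased forms to their cased spelling.
LEXICON = {
    "edinburgh": "Edinburgh",
    "bbc": "BBC",
    "british library": "British Library",
}


def restore_case(text, lexicon=LEXICON):
    """Restore case in a lower-cased, punctuated transcript."""
    # 1. Named entities: try longer entries first so multi-word names
    #    are matched before any shorter entry they might contain.
    for key in sorted(lexicon, key=len, reverse=True):
        cased = lexicon[key]
        pattern = r"\b" + re.escape(key) + r"\b"
        text = re.sub(pattern, lambda m, c=cased: c, text)
    # 2. Sentence initials: capitalise the first letter of the text and
    #    the first letter after sentence-final punctuation.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(restore_case("the bbc reported from edinburgh. the british library was closed."))
```

A plain lookup like this cannot tell a place name from an ordinary word with the same spelling, which is one reason such a step is typically combined with named entity recognition.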

The next steps are to test the effect of ASR quality on text mining (using CallHome and BBC broadcast data) using formal evaluation; improve the text mining on speech transcript data based on further error analysis; and longer term plans include applications in the healthcare sector.

Q&A

Q1) Could this technology be applied to songs?

A1) It could be – we haven’t worked with songs before but we could look at applying it.

“Text Mining Historical Newspapers” – Dr. Beatrice Alex and Dr. Claire Grover, Senior Research Fellow, Informatics (University of Edinburgh) [Bea Alex will present Claire’s paper on her behalf]

Claire is involved in an Administrative Data Research Centre Scotland project looking at local Scottish newspapers, to text mine them and connect them to other work. Claire managed to get access to the BL newspapers through Cengage and Gale – with help from the University of Edinburgh Library. This isn’t all of the BL newspaper collection, but part of it. This collection of data is also now available for use by other researchers at Edinburgh. Issues we had here were that access to more recent newspapers is difficult, and the OCR quality. Claire’s work focused on three papers in the first instance, from Aberdeen, Dundee and Edinburgh.

Claire adapted the Edinburgh Geoparser to process the OCR format of the newspapers and added local gazetteer resources for Aberdeen, Dundee and Edinburgh from OS OpenData. Each article was then automatically annotated with paragraph, sentence, word mark-up; named entities – people, place, organisation; location; geo coordinates.

So, for example, a scanned item from the Edinburgh Evening News from 1904 – it’s not a great scan and the OCR is OK but erroneous. Named entities are identified, locations are marked. Because of the scale of the data Claire took just one year from most of the papers and worked with a huge number of articles, announcements, images etc. She also drilled down into the geoparsed newspaper articles.

So for Aberdeen in 1922 there were over 19 million word/punctuation tokens and over 230,000 location mentions. Claire then used frequency methods and concordances to understand the data. For instance she looked for mentions of Aberdeen placenames by frequency – and that shows the regions/districts of Aberdeen – Torry, Woodside, and also Union Street… Then Claire dug down again… Looking at Torry the mentions included Office, Rooms, Suit, etc, which gives a sense of the area – a place people rented accommodation in. In just the news articles (not ads etc) then for Torry it’s about Council, Parish, Councillor, politics, etc.
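As a rough illustration of the frequency and concordance methods mentioned above, here is a minimal Python sketch. It is not Claire's pipeline: the helper names and the `sample` text are invented, and it only handles single-word placenames (so "Union Street" would need phrase matching).

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", text)


def place_frequencies(tokens, places):
    """Count mentions of each single-word placename, ignoring case."""
    wanted = {p.lower() for p in places}
    return Counter(t.lower() for t in tokens if t.lower() in wanted)


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each mention."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, standing in for OCR output from newspaper adverts.
sample = "Rooms to let in Torry. Office in Torry near Union Street. Woodside rooms wanted."
tokens = tokenize(sample)

print(place_frequencies(tokens, ["Torry", "Woodside"]))
for left, word, right in concordance(tokens, "Torry", width=2):
    print(f"{left:>12} [{word}] {right}")
```

Reading the words either side of each mention is what surfaces patterns like the Office, Rooms and Council contexts described for Torry.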

Looking at concordances, Claire looked at “fish”, for instance, to see what else was mentioned and, in summary, she noted that the industry was depressed after WW1; there was unemployment in Aberdeen and the fishing towns of Aberdeenshire; that there was competition from German trawlers landing Icelandic fish; that there were hopes to work with Germany and Russia on the industry; and that government was involved in supporting the industry and taking action to improve it.

With the Dundee data we can see the Topic Modelling that Claire did for the articles – for instance clustering of cars, police, accidents etc; there is a farming and agriculture topic; sports (golf etc)… And you can look at the headlines from those topics and see how they reflect the identified topics.

So, next steps for this work will include: improving text analysis and geoparsing components; getting access to more recent newspapers – there is missing infrastructure for larger data sets but we are working on this; scaling up the system to process the whole data set and store text mining output; tools to summarise content; and tools for search – filtering by place, date, linguistic context – tools beyond the command line.

“Visualizing Cultural Collections as a Speculative Process” – Dr. Uta Hinrichs, Lecturer at the School of Computer Science (University of St Andrews)

“Public Private Digitisation Partnerships at the British Library” – Hugh Brown, British Library Digitisation Project Manager

“The Future of BL Labs and Digital Research at the Library” – Ben O’Steen

Conclusion and wrap up

 


Data Fest Data Summit 2018 – Day Two LiveBlog

Today I am back at the Data Fest Data Summit 2018, for the second day. I’m here with my EDINA colleagues James Reid and Adam Rusbridge and we are keen to meet people interested in working with us, so do say hello if you are here too! 

I’m liveblogging the presentations so do keep an eye here for my notes, updated throughout the event. As usual these are genuinely live notes, so please let me know if you have any questions, comments, updates, additions or corrections and I’ll update them accordingly. 

Intro to Data Summit Day 2 – Maggie Philbin

We’ve just opened with a video on Ecometrica and their Data Lab supported work on calculating water footprints. 

I’d like to start by thanking our sponsors, who make this possible. And also I wanted to ask you about your highlights from yesterday. These include Eddie Copeland from Nesta’s talk, discussion of small data, etc. 

Data Science for Societal Good — Who? What? Why? How? –  Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton

Data science has a huge impact for the business world, but also for societal good. I wanted to talk about the 5 i’s of data science for social good:

  1. Interest
  2. Insight
  3. Inspiration
  4. Innovation
  5. Ignition

So, number one is Interest. The data can attract people to engage with a problem. Everything we do is digital now. And all this information is useful for something. No matter what your passion, you can follow this as a data scientist. I wanted to give an example here… My background is astrophysics and I love teaching people about the world, but my day job has always been other things. About 20 years ago I was working in data science at NASA and we saw an astronomical – and I mean it, we were NASA – growth in data. And we weren’t sure what to do with it, and a colleague told me about data mining. It seemed interesting but I just wasn’t getting what the deal was. We had a lunch talk from a professor at Stanford, and she came in and filled the board with equations… She was talking about the work they were doing at IBM in New York. And then she said “and now I’m going to tell you about our summer school” – where they take inner city kids who aren’t interested in school, and teach them data science. Deafening silence from the audience… And she said “yes, we teach the staff data mining in the context of what means most for these students, what matters most”. And she explained: street basketball. So IBM was working on software called IBM Advanced Scout, specifically predicting basketball strategy. And the kids loved basketball enough that they really wanted to work in math and science… And I loved that, but what she said next changed my life.

My PhD research was on colliding galaxies. It was so exciting… I loved teaching and I was so impressed with what she had done. These kids she was working with had peer pressure not to be academic, not to study. This school had a graduation rate of less than 50%. Their mark of success for their students was their graduation rate – of 98%. I was moved by that. I felt that if this data science has this much power to change lives, that’s what I want to do for the rest of my life. So my life, and those of my peers, has been driven by passion. My career has been as much about promoting data literacy as anything else.

So, secondly, we have insight. Traditionally we collect some data points but we don’t share this data, we are not combining the signals… Insight comes from integrating all the different signals in the system. That’s another reason for applying data to societal good, to gain understanding. For example, at NASA, we looked at what could be combined to understand environmental science, and all the many applications, services and knowledge that could be delivered and drive insight from the data.

Number three on this list is Inspiration. Inspiration, passion, purpose, curiosity, these motivate people. Hackathons, when they are good, are all about that. When I was teaching, the group projects where the team members were all alike did the worst and were the least interesting. When the team is diverse in the widest sense – people who know nothing about Python, R, etc. can bring real insights. So, for example my company runs the “Data Science Bowl” and we tackle topics like Ocean Health, Heart Health, Lung Cancer, drug discovery. There are prizes for the top ten teams, this year there is a huge computing prize as well as a cash prize. The winners of our Heart Health challenge were two Wall Street Quants – they knew math! Get involved!

Next, innovation. Discovering new solutions and new questions. Generating new questions is hugely exciting. Think about the art of the possible. The XYZ of Data Science Innovation is about precision data, precision for personalised medicine, etc.

And fifth, ignition. Be the spark. My career came out of looking through a telescope back when I lived in Yorkshire as a kid. My career has changed, but I’ve always been a scientist. That spark can create change, can change the world. And big data, IoT and data scientists are partners in sustainability. How can we use these approaches to address the 17 Sustainable Development Goals? And there are 229 Key Performance Indicators to measure performance – get involved. We can do this!

So, those are the five i’s. And I’d like to encapsulate this with the words of a poet…. Data scientists – and that’s you even if you don’t think you are one yet. You come out of the womb asking questions of the world. Humans do this, we are curious creatures… That’s why we have that data in the first place! We naturally do this!

“If you want to build a ship, don’t drum up people to gather wood and don’t assign them tasks and work, but rather teach them to yearn for the vast and endless sea” – Antoine de Saint-Exupery.

This is what happened with those kids. Teach people to yearn for the vast and endless sea, then you’ll get the work done. Then we’ll do the hard work.

Slides are available here: http://www.kirkborne.net/DataFest2018/

Q&A

Comment, Maggie Philbin) I run an organisation, TeenTech, and that point that you are making, of starting where the passion actually is, is so important.

KB) People ask me about starting in data science, and I tell them that you need to think about your life, what you are passionate about and what will fuel and drive you for the rest of your life. And that is the most important thing.

Q1) You touched on a number of projects, which is most exciting?

A1) That’s really hard, but I think the Data Bowl is the most exciting thing. A few years back we had a challenge looking at how fast you can measure “heart ejection fraction – how fast the heart pumps blood out” but the way that is done, by specialists, could take weeks. Now that analysis is built into the MRI process and you can instantly re-scan if needed. Now I’m an astronomer but I get invited to weird places… And I was speaking to a conference of cardiac specialists. A few weeks before my doctor diagnosed me with a heart issue…. And that it would take a month to know for sure. I only got a text giving me the all clear just before I was about to give that talk. I just leapt onto that stage to give that presentation.

The Art Of The Practical: Making AI Real – Iain Brown, Lead Data Scientist, SAS

I want to talk about AI and how it can actually be useful – because it’s not the answer to everything. I work at SAS, and I’m also a lecturer at Southampton University, and in both roles look at how we can use machine learning, deep learning, AI in practical useful ways.

We have the potential for using AI tools for good, to improve our lives – many of us will have an Alexa for instance – but we have to feel comfortable sharing our data. We have smart machines. We have AI revolutionising how we interact with society. We have a new landscape which isn’t about one new system, but a whole network of systems to solve problems. Data is a sellable asset – there is a massive competitive advantage in storing data about customers. But especially with GDPR, how is our data going to be shared with organisations, and others? That matters for individuals, but also for organisations. As data scientists there is the “can” – how can the data be used; and the “should” – how should the data be used. We need to understand the reasons and value of using data, and how we might do that.

I’m going to talk about some examples here, but I wanted to give an overview too. We’ve had neural networks for some time – AI isn’t new but dates back to the 1950s. Machine learning came in in the 1980s, deep learning in the 2010s, and cognitive computing now. We’ve also had Moore’s Law changing what is theoretically possible but also what is practically feasible over that time. And that brings us to a definition “Artificial Intelligence is the science of training systems to emulate human tasks through learning and automation”. That’s my definition, you may have your own. But it’s about generating understanding from data, that’s how AI makes a difference. And these systems have to help the decision making process. That has to be something we can utilise.

Automation of process through AI is about listening and sensing, about understanding – that can be machine generated but it will have human involvement – and that leads to an action being made. For instance we are all familiar with taking a picture, and that can be looked at and understood. For instance with a bank you might take an image of paperwork and passports… Some large banks check validity of clients with a big book of pictures of blacklisted people… Wouldn’t it be better to use systems to achieve that? Or it could be a loan application or contract – they use application scorecards. The issue here is interpretability – if we make decisions we need to know why and the process has to be transparent so the client understands why they might have been rejected. You also see this in retail… Everything is about the segment of one. We all want to be treated as individuals… How does that work when you are one of millions of individuals? What is the next thing you want? What is the next thing you want to click on? Shop Directory, for instance, have huge ranges of products on their website. They have probably 500 pairs of jeans… Wouldn’t it be better to apply their knowledge of me to filter and tailor what I see? Another example is the customer complaint on webchat. You want to understand what has gone wrong. And you want to intervene – you may even want to do that before they complain at all. And then you can offer an apology.

There are lots of applications for AI across the board. So we are supporting our customers on the factors that will make them successful in AI: data, compute, skillset. And we embed AI in our own solutions, making them more effective and enhancing user experience. Doing that allows you to begin to predict what else might be looked at, based on what you are already seeing. We also provide our customers with extensible capabilities to help them meet their own AI goals. You’ll be aware of AlphaGo, it only works for one game, and that’s a key thing… AI has to be tailored to specific problems and questions.

For instance we are working on a system looking at optimising the experience of watching sports, eliminating the manual process of tagging in a game. This isn’t just in sport, we are also working in medicine and in lung cancer, applying AI in similar 3D imaging ways. When these images can be shared across organisations, you can start to drive insights and spot anomalies. It’s about collaborating, bringing data from different areas, places where an issue may exist. And that has social benefit for all of us. Another fun example – with something like wargaming you can understand the gamer, the improvements in gameplay, ways to improve the mechanics of how game play actually works. There has to be an intrinsic and extrinsic agreement to use that data to make that improvement.

If you look at a car insurer and the claims process, that’s typically handled through a call centre. But what if you take a picture of the car as a way to quickly assess whether that claim will be worth making, and how best to handle that claim?

I value the application, the ways to bring AI into real life. How we make our experiences better. It’s been attributed to Voltaire, and also to Spiderman, that “with great power comes great responsibility”. I’d say “with great data power comes great responsibility” and that we should focus on the “should” not the “could”.

Q&A

Comment) A correction on AlphaGo: AlphaZero plays chess etc. It’s without any further human interaction or change.

Q1) There is this massive opportunity for collaboration in Scotland. What would SAS like to see happen, and how would you like to see people working together?

A1) I think collaboration through industry, alongside academia. Kirk made some great points about not focusing on the same perspectives but on the real needs and interest. Work can be siloed but we do need to collaborate. Hack events are great for that, and that’s where the true innovation can come from.

Q2) What about this conference in 5 years’ time?

A2) That’s a huge question. All sorts of things may happen, but that’s the excitement of data science.

Socially Minded Data Science And The Importance Of Public Benefits – Mhairi Aitken, Research Fellow, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh

I have been working in data science and public engagement around data and data science for about eight years and things have changed enormously in that time. People used to think about data as something very far from their everyday lives. But things have really changed, and people are aware and interested in data in their lives. And now when I hold public events around data, people are keen to come and they mention data before I do. They think about the data on their phones, the data they share, supermarket loyalty cards. These may sound trivial but I think they are really important. In my work I see how these changes are making real differences, and differences in expectations of data use – that it should be used ethically and appropriately but also that it will be used.

Public engagement with data and data science has always been important but it’s now much easier to do. And there is much more interest from funders for public engagement. That is partly reflecting the press coverage and public response to previous data projects, particularly NHS data work with the private sector. Public engagement helps address concerns and avoid negative coverage, and to understand their preferences. But we can be even more positive with our public engagement, using it to properly understand how people feel about their data and how it is used.

In 2016 myself and colleagues undertook a systematic review of public responses to sharing and linking of health data for research purposes (Aitken, M et al 2016 in BMC medical ethics, 17 (1)). That work found that people need to understand how data will be used, they particularly need to understand that there will be public benefit from their data. In addition to safeguards, secure handling, and a sense of control, they still have to be confident that their data will be used for public benefits. They are even supportive if the benefit is clear but those other factors are faulty. Trust is core to this. It is fundamental to think about how we earn public trust, and what trust in data science means.

Public trust is easy to define. But what about “public benefit”? Often people will talk about things like Tesco Clubcard when they think of benefit from data – there is a direct tangible benefit there in the form of vouchers. But what is the public benefit in a broader and less direct sense? When we ask about public benefit in the data science community we often talk about economic benefits to society through creating new data-driven innovation. But that’s not what the public think about. For the public it can be things like improvements to public services. In data-intensive health research there is an expectation of data leading to new cures or treatments. Or that there might be feedback to individuals about their own conditions or lifestyles. But there may be undefined or unpredictable potential benefits to the public – it’s important not to define the benefits too narrowly, but still to recognise that there will be some.

But who is the “public” that should benefit from data science? Is that everyone? Is it local? National? Global? It may be as many as possible but what is possible and practical? Everyone whose data is used? That may not be possible. Perhaps vulnerable or disadvantaged groups? Is it a small benefit for many, or a large benefit for a small group? Those who may benefit most? Those who may benefit the least? The answers will be different for different data science projects. That will vary for different members of the public. But if we only have these conversations within the data science community we’ll only see certain answers, we won’t hear from groups without a voice. We need to engage the public more with our data science projects.

So, closing thoughts… We need to maintain a social license for data science practices and that means continual reflection on the conditions for public support. Trust is fundamental – we don’t need to make the public trust us, we have to actually be trustworthy and that means listening, understanding and responding to concerns, and being trustworthy in our use of data. Key to this is finding public benefits of data science projects. In particular we need to think about who benefits from data science and how benefits can be maximised across society. Data scientists are good at answering questions of what can be done but we need to be focusing on what should be done and what is beneficial to do.

Q&A

Q1) How does private industry make sure we don’t leave people behind?

A1) Be really proactive about engaging people, rather than waiting for an issue to occur. Finding ways to get people interested. Making it clear what the benefits are to people’s lives. There can be cautiousness about opening up debate being a way to open up risk. But actually we have to have those conversations and open up the debate, and learn from that.

Q2) How do we put in enough safeguards that people understand what they consent to, without giving them too much information or scaring them off with 70 checkboxes?

A2) It is a really interesting question of consent. Public engagement can help us understand that, and guide us around how people want to consent, and what they want to know. We are trying to answer questions where we don’t always have the answers – we have to understand what people need by asking them and engaging them.

Q3) Many in the data community are keen to crack on but feel inhibited. How do we take the work you are doing and move sooner rather than later?

A3) It is about how we design data science projects. You do need to take the time first to engage with the public. It’s very practical and valuable to do at the beginning, rather than waiting until we are further down the line…

Q3) I would agree with that… We need to do that sooner rather than later rather than being delayed deciding what to do.

Q4) You talked about concerns and preferences – what are key concerns?

A4) Things you would expect on confidentiality, privacy, how they are informed. But also what is the outcome of the project – is it beneficial or could it be discriminatory, or have a negative impact on society? It comes back to delivering public benefits – they want to see outcomes and impact of a piece of work.

 

Automated Machine Learning Using H2O’s Driverless AI – Marios Michailidis, Research Data Scientist, H2O.ai

I wanted to start with some of my own background. And I wanted to talk a bit about Kaggle. It is the world’s biggest predictive modelling competition platform with more than a million members. Companies host data challenges and competitors from across the world compete to solve them for prizes. Prizes can be monetary, or participation in conferences, or you might be hired by companies. And it’s a bit like tennis – you gain points and go up in the ranking. And I was able to be ranked #1 out of a half million members there.

So, a typical problem is image classification. Can I tell a cat from a dog from an image? That’s very doable, you can get over 95% accuracy and you can do that with deep learning and neural nets. And you differentiate and classify features to enable that decision. Similarly a typical problem may be classifying different bird song from a sound recording – also very solvable. You also see a lot of text classification problems… And you can identify texts by particular writers from their style and vocabulary (e.g. Voltaire vs Moliere). And you see sentiment analysis problems – particularly for marketing or social media use.

To win these competitions you need to understand the problem, and the metric you are being tested on. For instance there was an insurance problem where most customers were renewing, so there was more value in splitting the problem into two – one for renewals, and then a model for others. You have to have a solid testing procedure – really strong validation environment that reflects what you are being tested on. So if you are being tested on predictions for 3 months in the future, you need to test with past data, or test that the prediction is working to have the confidence that what you do will be appropriately generalisable.
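
To make the validation point concrete: if the test is about predicting three months ahead, one common approach is to hold out the most recent three months of known data and validate on that. The sketch below assumes rows of (date, value) pairs; the function name, the 90-day horizon and the generated data are illustrative assumptions, not anything shown in the talk.

```python
from datetime import date, timedelta


def time_based_split(rows, cutoff, horizon_days=90):
    """Train on rows before the cutoff; validate on the horizon after it."""
    end = cutoff + timedelta(days=horizon_days)
    train = [row for row in rows if row[0] < cutoff]
    valid = [row for row in rows if cutoff <= row[0] < end]
    return train, valid


# Hypothetical data: one observation every 30 days during 2017.
rows = [(date(2017, 1, 1) + timedelta(days=30 * i), i) for i in range(12)]
train, valid = time_based_split(rows, cutoff=date(2017, 9, 1))

print(len(train), "training rows,", len(valid), "validation rows")
```

The key design choice is that no validation row is earlier than any training row, so the check mimics predicting the future rather than filling in the past.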

You need to handle the data well. Your preprocessing, your feature engineering, which will let you get the most out of your modelling. You also need to know the problem-specific elements and algorithms. You need to know what works well. But you can look back for information to inform that. You of course need access to the right tools – the updated and latest software for best accuracy. You have to think about the hours you put in and how you optimize them. When I was #1 I was working 60 hours on top of my day job!

Collaborate – data science is a team sport! It’s not just about splitting the work across specialisms, it’s about uncovering new insights by sharing different approaches. You gain experience over time, and that lets you focus your efforts where they will bring the best gain. And then use ensembling – combine the methods optimally for the best performance. And you can automate that…
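
One of the simplest forms of ensembling is a weighted average of the predictions from several models. The sketch below is only that simple case, with made-up prediction values and equal weights; competition-grade ensembling (and whatever Driverless AI does internally) is more sophisticated.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical predicted probabilities from two different models.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.8]
blended = blend([model_a, model_b], weights=[1, 1])

print(blended)
```

In practice the weights would be tuned on a validation set rather than fixed by hand.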

And that brings us to H2O’s Driverless AI which automates AI. It’s an AI that creates AI. It is built by a group of leading machine learning engineers, academics, data scientists, and Kaggle Grandmasters. It handles data cleaning and feature engineering. It uses cutting edge machine learning algorithms. And it optimises and combines them. And this is all through a hypothesis testing driven approach. And that is so important as if I try a new feature or a new algorithm, I need to test it… And you can exhaustively find the best transformations and algorithms for your data. This allows solving of many machine learning tasks, and it is all in parallel to make it very fast.

So, how does it work? Well you have some input data and you have a target variable. You set an objective or success metric. And then you need some allocated computing power (CPU or GPU). Then you press a button and H2O Driverless AI will explore the data, it will try things out, it will provide some predictions and model interpretability. You get a lot of insight including the most predictive features. And the other thing is that you can do feature engineering, you can extract this pipeline, these feature transformations, then use them with your own modelling.

Now, I have a minute long demo here…. where you upload data, and various features and algorithms are being tried, and you can see the most important features… Then you can export the scoring pipeline etc.

This work has been awarded Technology of the Year by InfoWorld, and it has been featured in the Gartner report.

You can find out more on our website: https://www.h2o.ai/driverless-ai/ and there is lots of transparency about how this works, how the model performs etc. You can download a free trial for 3 weeks.

Q&A

Q1) Do you provide information on the machine learning models as well?

A1) Once we finish with the score, we build a second model, which is simple, to predict that score. The focus of that is to explain why we have shown this score. And you can see why you have this score with this model… That second interpretability model is slightly less automated. But I encourage others to look online for similar – this is one surrogate model.
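
The surrogate idea can be sketched in a few lines: take the scores from a complex model and fit a much simpler model to those scores, so the simple model's parameters can be read as an explanation. Everything below is an illustrative assumption (a toy stand-in for the complex model, one input feature, an ordinary least-squares line); it is not how Driverless AI builds its interpretability model.

```python
def black_box(x):
    """Toy stand-in for a complex model whose scores we want to explain."""
    return 3.0 * x + 0.5 * (x % 2)


def fit_linear_surrogate(xs, scores):
    """Least-squares line through the black-box scores: (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = list(range(10))
scores = [black_box(x) for x in xs]
slope, intercept = fit_linear_surrogate(xs, scores)

print("score is roughly", round(slope, 2), "* x +", round(intercept, 2))
```

The surrogate is fitted to the model's scores rather than to the original target, which is what makes it an explanation of the model and not a competing model.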

Q2) Can I reproduce the results from H2O?

A2) Yes. You can download the scoring pipeline, it will generate the code and environment to replicate this, see all the models, the data generated, and you can run that script locally yourself – it’s mainly Python.

Q3) That stuff is insane – probably very dangerous in the hands of someone just learning about machine learning! I’d be tempted to throw data in… What’s the feedback that helps you learn?

A3) There is a lot of feedback and also a lot of warnings – so if test data doesn’t look enough like training data for instance. But the software itself is not educational on its own – you’d need to see webinars, look at online materials but then you should be in a good position to learn what it is doing and how.

Q4) You talked about feature selection and feature engineering. How robust is that?

A4) It is all based on hypothesis testing. But you can’t test everything without huge compute power. So we have a genetic algorithm that generates combinations of features, tests them, and then tries something else if that isn’t working.
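
A genetic search over feature combinations can be sketched roughly as follows: keep a population of candidate feature sets, score them, keep the best half, and create new candidates by mutating the survivors. The scoring function, feature names and parameters here are all hypothetical; in a real system the score would be a validation metric from a trained model, and this is not H2O's actual algorithm.

```python
import random


def evolve_feature_sets(features, score, generations=20, population=8, seed=0):
    """Tiny genetic search: keep the best half, mutate it, repeat."""
    rng = random.Random(seed)
    pop = [
        frozenset(f for f in features if rng.random() < 0.5)
        for _ in range(population)
    ]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[: population // 2]
        # Mutation: add or remove one randomly chosen feature per survivor.
        children = [parent ^ {rng.choice(features)} for parent in survivors]
        pop = survivors + children
    return max(pop, key=score)


# Hypothetical score: reward useful features, penalise noisy ones.
USEFUL = {"age", "income"}


def toy_score(subset):
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


features = ["age", "income", "noise_1", "noise_2", "noise_3"]
best = evolve_feature_sets(features, toy_score)

print(sorted(best), toy_score(best))
```

Because the best candidates always survive to the next generation, the top score never gets worse from one generation to the next.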

Q5) Can you output a model as, e.g., a deserialised JSON object? Or use it as an API?

A5) We have various outputs but not JSON. Best to look on the website as we have various ways to do these things.

Coming up:

 

Innovation Showcase

Matt Jewell, R&D Engineer, Amiqus

Carlos Labra, CEO & Co-Founder, Particle Analytics

Martina Pugliese, Data Science Lead, Mallzee

Steven Revill, CEO & Co-Founder, Urbantide


Data Fest Data Summit 2018 – Liveblog

Intro to the Data Lab – Gillian Docherty, The Data Lab CEO

Welcome to Data Summit 2018. It’s great to be back. Last year we had 25 events with 2000 people, but this year we’ve had 50 events and hope to reach over 3500 people. We’ve had kids downloading data from the space station, we’ve had events on smart meters, on city data… Our theme this year is “Data Warrior” – a data warrior is someone with a passion and a drive to make value from data. You are data warriors. And you’ll see some of our data warriors on screen here and across the venue.

Our whole event is made possible by our sponsors, by Scottish Enterprise and Scottish Government. So, let’s get on with it!

Our host for the next two days is the wonderful and amazing Maggie Philbin, who you may remember from Tomorrow’s World; she has also had an amazing career in media and is chair of UK Digital Skills.

Intro to the Data Summit – Maggie Philbin

Maggie is starting by talking to people in the audience to find out who they are and what they are here for… 

It will be a fantastic event. We have some very diverse speakers who will be talking about the impact of data on society. We have built in lots of opportunities for questions – so don’t hesitate! For any more information do look at the app or use the hashtag #datafest18 or #datasummit18.

I am delighted to introduce our speaker who is back by popular demand. She is going to talk about her new BBC Four series Contagion, which starts tonight.

The Pandemic – Hannah Fry

Last year I talked about data for social good. This year I’m going to talk about a project we’ve been doing to look at pandemics and how disease spreads. When we first started to think about this, we wanted to see how much pandemic disease is in people’s minds. And it turns out… Not many.

 

[Redacted pending show broadcast tonight – then this post will be updated!]

Business Transformation: using the analytics value chain – Warwick Beresford-Jones, Merkle Aquila

I’ll be talking about the value chain. This is:

Data > Insight > Action > Value (and repeat)

Those first two aspects are “generation” and the latter two are “deployment”. We are good at the first two, but not so much the action and value aspects. So we take a different approach, thinking right to left, which allows faster changes. Businesses don’t always start with an end in mind, but we do have accessible data, transformative insights, organisational action, and integrated technology. In many businesses much of the spend is on technology, rather than the stage where change takes place, where value is generated for the business. So a business needs to understand why they are investing and what the purpose of this is.

I want to talk more about that but first I want to talk about the NBA and the three point line, and how moving that changed the game by changing basket attempts… And that was a tactical decision of whether to score more points, or concede fewer points, enabling teams to find the benefit in taking the long shot. Cricket and football similarly use the value chain to drive benefit, but the maths work differently in terms of interpreting that data into actions and tactics.

Moving back to business… That right to left idea is about thinking about the value you want to derive, the action required to do that, and the insights required to inform those actions, then the data that enables that insight to be generated.

Sony looked at data and customer satisfaction and wanted to reduce their range down from 15 to 4 handsets. But the data showed the importance of camera technology – and many of you will now have Sony technology in the cameras in your phones, and they have built huge value for their business in that rationalisation.

BA wanted to improve check in experiences. They found business customers were frustrated at the wait, but also families didn’t feel well catered for. And they decided to trial a family check in at Heathrow – that made families happier, it streamlined business customers’ experience, and staff feedback has also been really positive. So a great example of using data to make change.

So, what questions should you be asking?

  • What are the big things that can change our business and drive value?
  • Can data analytics help?
  • How easy will it be to implement the findings?
  • How quickly can we do it?

Q&A

Q1) In light of the scandal with Facebook and Cambridge Analytica, do you think that will impact people sharing their data, how their data can be used?

A1) I knew that was coming! It’s really difficult… And everyone is also looking at the impact of GDPR right now. With Facebook and LinkedIn there is an exchange there in terms of people and their data and the service. If you didn’t have that you’d get generic broadcast advertising… So it depends if people would rather see targeted and relevant advertising. But then some of what Facebook and Cambridge Analytica have done is not so good…

Q2) How important is it for the analysts in an organisation to be able to explain analytics to a wider audience?

A2) Communication is critical, and I’d say just as important as the technical work.

Q3) What are the classic things people think they can do with data for their business, but which are actually really hard and unrealistic?

A3) A few years ago I was meeting with a company, and they gave an example of when Manchester United had a bad run, and Paddy Power had put up a statue of Alex Ferguson with a “do not break glass” sign, and they asked how you can have that game changing moment. And that is really hard to do.

Q4) You started your business at your kitchen table… And now you have 120 people working for you. How do you do that growth?

A4) It’s not as hard as you think, but you have to find the right blend of raw talent with experience – lots of tricky learning.

 


IT Futures 2017 Liveblog

Today I’m at the IT Futures Conference 2017, an annual University of Edinburgh conference. I’m chairing a session later but I’ll otherwise be liveblogging our wonderful speakers.

John Lee is introducing the day – which is being recorded – and also noting today’s hashtag which you should definitely keep your eye on today: #itfutures.

John: Today’s event is about Scaling and Transformation and there is a lot to challenge ourselves with, we hope there will be a lot for us to think about and reflect upon over the Christmas break.

Our first speaker today is Melissa Terras, who recently joined us from UCL as our new Professor of Digital Cultural Heritage.

University Technology Futures: the View from a Newbie at the UoE – Professor Melissa Terras, UoE College of Arts, Humanities and Social Sciences

There are two ways to do these things: the show and tell or saying something more meaningful. I hope to do the latter today.

So, I went from studying Greek sculpture to doing hardcore machine learning in my PhD and research. I then went to UCL where I was one of the founders of the UCL centre for Digital Humanities, working on

I will be directing “digital stuff” at the College of Arts, Humanities and Social Sciences, and working heavily with the Edinburgh Futures Institute which is leading data driven innovation for the College of Arts, Humanities and Social Sciences. So, futures… There are a lot of those… So many futures initiatives and organisations but also we face a rather uncertain future… And we will be looking at these issues at the EFI, how to deal with this uncertain future and the changing information environment. And of course the word comes from financial markets, it is speculative. When you think to the future you see speculative fiction imagining what might happen, but what does this mean for us as a University?

If I’d given this talk a few years ago it would have been quite different. The internet is changing as an environment and it has become a less pleasant place to be over the last few years. I’ve actually done some grieving for the internet I grew up with… I’ve been online since I was 17 and a lot has changed. But let’s be more positive, what will we do to equip ourselves for this information environment?

So, let’s start with the students – those people we criticise for not being able to buy a house because they are buying too many avocados… Let’s start with ethics… I’ve been working on a project called Digital Library Futures – looking at usage stats of who borrows what, and that comes with issues of anonymity, huge ethical issues, huge data protection issues. These are the conversations we have to have with our students to understand what we can and should do.

I’ve said it before but… All data is history. It comes with a cultural background, a societal history… We do this in historical studies all the time, but do we do this with our informatics students? We’ve been doing some work at UCL on the Times Digital Archive (1785-2010) which looks at how men and women are talked about… If you use this as a training corpus for machine learning you are embedding the bias and historical issues into that learning. Even historic information has a real impact on current computational work and approaches.

Which brings me to diversity… There is a lovely piece of nineteenth century newspaper analytics identifying images from newspapers… But only white men. There were images of women and non-white people in those papers but machine learning hasn’t recognised them. This is so important in how we use and train machine learning and what computational methods we use…

And then there is context and understanding what you engage with… There are the sites that let you automatically insert yourself in a range of images – without any idea of provenance or context. Or the Twitter bots that will give your profile image a smile… A huge shout out here for librarians.

What about academics? Well all of the above! But also… We need to understand what is happening

How locked down the digital environment is – there are things I can’t do with my desktop, and then three days later it changes. I’m working on an EU handwriting recognition project and it’s hard to install the software I’m writing. To enable data driven innovation we have to give people flexibility – if you don’t do that people do workarounds and that’s where security issues start to come in. We need to ensure we have the access to do this work.

The other thing I wanted to mention is the Jeremy Bentham Panopticon… Whether through diary systems… And also lecture recording… And the change in rules that students can record anything and what that means for what we say… How you talk about your work changes when that is recorded. Being recorded at any time by students, what does that mean for students… And what does that mean for students from, say, Turkey… Anything we do can potentially be recorded at any one time. You may think that I’m being paranoid. There have been all sorts of threats, death threats, scandal, etc. when something is broadcast and shared. How do we support staff and students if something goes wrong? So we have to understand that challenge, to engage with difficult topics.

I’m a great believer in the university looking after its own data… What does the university do to archive its own websites… What can we do to best look after our own information environment – our work, our data, our web content.

So, we have a bright future ahead. But it’s a complicated future. We have to be aware of all of this, we have a role to be the place to go for truth when truth is being debated… And that’s where the Edinburgh Futures Institute comes in. We are still developing our work – keep an eye on the website, https://efi.ed.ac.uk/. It has huge potential and a real opportunity to be a beacon of light and truth at a time when the world really needs that. And I am hugely excited to be here and in a role that can help shape that.

Q&A

Q1) You talked about light and truth… What about openness… And being closed about some things… How do you provide spaces that are both open and closed and safe?

A1) I am a firm believer in Open Data and Open GLAM, but I think it’s about equipping people with the skills to understand when and how and what framework you can share under. It’s not about closing things off but about being tooled up as an individual. The Open Data and Open Science agenda tends to be about projects post-peer review when they are ready to share. I was talking with a colleague here working on the history of censorship and she isn’t on Twitter because of the abuse she’d get for her work – and that is the right decision for that context… Having those skills to decide is important.

Q2) Thinking about the GDPR coming in, as a newbie, how do you think the University is prepared, and how do staff manage their own digital environment in that context?

A2) I am on committees at Edinburgh, I was on similar at UCL, and I have sat as an external person on similar groups at Oxford. Across all universities there is a need to help staff understand the legal requirements, and the significance of them. These things are generally understood better when something goes wrong… In a way that’s the “Daily Mail” test – will what we are doing be at risk of appearing there?! But I have been cheered by what I have seen over the last few weeks here, and where the thinking is at.

Mr Stefan Hyttfors

I thought I would start by telling you about my 21 year old son who is a university student. He lives away from home… This summer we sat down together to have this great barbeque, to talk about his plans for the summer… About what he would do for a summer job… And he said “no, I won’t get a summer job” and that surprised us as he had lots of plans, and they require money… But he said “it’s fine! I have this crypto currency wallet” and he had 2 bitcoin – which last summer was worth about $5000. And I wanted to start with that… He questioned what is money, is paper money real? It’s belief, we believe it has value because it has been there for a long time… We have symbols… the dollar, the pound, the krona, the Euro… We don’t believe in the paper anymore but we believe in the banks, we check on our phones. We don’t ever see our money as a thing… We know what they owe us, as long as we believe in that system, it works. He said he doesn’t believe in that system – it’s dysfunctional and it will be disrupted… It is an inefficient system… I believe in crypto currency. And his bitcoin is worth more like $37k, so he was right, he didn’t need a summer job.

What Melissa told us about education is right, if we want to create new citizens… We do know that in the future we have huge problems… We have climate change. We don’t know if we can cope with that yet… There are ways to change your impact: eat less meat; fly less; drive an electric car or ditch the car altogether. There is one way to trump all that: have fewer children! We are in this time where the best way to save the future is to stop having kids… Which is strange… Surely a better, faster idea would be suicide? Zero carbon emissions! But this is serious… We need to understand and think about how we think about the future, about what we can do… I’m in a hotel tonight, and the hotel has a sign to reuse the towels to save the planet… But the planet will be fine for millions of years… We have to think about the future of humanity, and that’s about sustainability in all senses – environment, diversity, equality… If we don’t do that we will have more divide, more people scared about human futures.

And now we have the internet. The internet is a stupid network… For thousands of years we collaborated in hierarchy… Better to be part of that at any level rather than being alone. But now we have a decentralised network… It’s all of us and everything, in a mess… And since we are connected in a mess and not a hierarchy, we don’t need a boss… So I have experience, and I can tell my son how to address issues in the world… But what if I’m wrong… That means there is no boss, no teacher, who has the power to say what should happen, innovation is at the edges… In universities you pushed out ideas, you had the power; companies too pushed things out. But now innovation is in the edges… There is no boss now. It’s decentralised, that’s the whole point… This is how crypto currencies are being established right now… Rather than having trust in just one bank… Let’s instead trust in all of us, keeping transactions across millions of ledgers, there is no middle man, no one database to hack anymore… This couldn’t work without network effect. In any university or country we need to have scale… This took off about 10 years ago… This summer was the tenth anniversary of the launch of the first smartphone, and it’s an amazing product launch from Steve Jobs – who points out current “smart” phones which are all about hardware, which can’t be easily changed as the world changes… He said then that we’d fixed the issue for computers but not for phones… Well we are still just at the beginning. Things are still changing…

The world is changing from hardware to software… Not just phones… From a University building to software… From products to services… This means we can’t think of the future in a linear fashion… In a corporation they talk about growth, in a country it’s GDP growth… in our lives we see our ages go up but it’s an odd way to mark things… I might instead celebrate the years I have left to live to keep me focused on what matters… Whatever we work on we do everything a little bit better all the time, we compete on scalable efficiency… If we are more efficient than competitors we are safe. This is a model that is seen as best practice right now… But that applies until we find a new way to address the issue… That is probably technology but may well not be devices… For instance I don’t need to own a car now, I can use Uber… That’s a new technology. New stuff is new! The world changes… And that always appears in “S” curves…. First it doesn’t work, we ridicule it… Then leaders are learners… That’s where we need a university to study and explore – there would be no new practice without it… Then we learn and adjust… and eventually it takes off and quickly thanks to network effects.

But what if I’m the blue (steady upwards) line here… What if I don’t know how to solve the problem… When the red line crosses the blue line, the blue line is over… This is a bit like the Christmas Pig in Sweden – all looks good until Christmas! Right now we have big organisations going out of business… The disruptors are our unicorn companies… You get disruption because you do something very very good with efficiency in mind… And you get disrupted because they find a totally different way to solve a problem. We see this in media – newspapers, music, film. And now we see it in retail… We see lots of large retail brands ticking along, busy, doing well… And then Amazon performing so much more successfully. Eric Hoffer says “In times of change learners inherit the earth; while the learned find themselves beautifully equipped to deal with a world that no longer exists”.

As humans we always solve our problems with technology. So in 1914 we have the Ford Model T launched… We have huge adoption growth, a few years of decline during the second world war, but by 1991 we are at 91% adoption… You have 76 years to adopt the technology… But right now the S curves are like rockets! An idea appears and it is adopted hugely fast! And we don’t need to shift products anymore, we can ship ideas… Artificial Intelligence is about creating machines that do not need to be programmed… Maybe you heard about the defeat of a Go champion by a Google algorithm. This isn’t chess, Go is a game with 10 to the 5 variations, which has been taught from generation to generation. And that was last year, now there’s a new version of that algorithm – AlphaGo Zero – which learns the game from nothing and in 40 days learned enough to win 100 games in a row against the previous algorithm… What AI learns from us may only slow us down…

It’s scary though! We worry “Will robots take our jobs?” but that’s stupid. We are the creators. We solve problems with technology, we are part of technology… If you think about your day, your experience, how you think about life… Think about electricity and what would happen if you took that away, what that would mean for our lives… It’s hard to imagine that though. Douglas Adams described it: what you have now, that’s what has always been… But everything invented after the age of 35 is just not normal… We take for granted the technology we have available to us. Technology is part of us. It’s not robots or human beings, it’s still us and what we want to do with technology…

When I was growing up computers were the size of a room… It wasn’t accessible or cheap, it was a huge mainframe… Now we’ve moved to mobile, to wearable, to technology that can be embedded in us as well… Your grandkids will talk about you, and think you know nothing… We will have new problems… Technology will tell us not to have another beer because it will knock 15 minutes off your life… Your insurance company may stop covering you… That’s a new problem… Maybe privacy becomes the currency in the new world…

So, as we think ahead think about one word, think about dematerialisation. Digitisation means the marginal cost goes down… It goes down over time… What is the marginal cost of taking pictures now? It’s zero! But you used to just have 24 shots to use, or maybe 36… It was a bigger cost… You didn’t take lots of them… And two years later you finish the film and send it off… Now our toddlers can take 2000 self portraits a day! We talk about healthcare in those terms of unaffordability now, maybe we afford it through digitisation…

One more example we hear about is the automobile industry… Cars were complex… Now they are smaller, lighter, autonomous… We only have a driver now because the law requires a human in charge… Today you say “look at that guy, he’s texting and driving!”, but in less than 10 years time you’ll say “look at that guy, he’s driving!” People are the inefficient part… 1.2 million people die in traffic accidents… We don’t know how to drive… But how do we deal with this… This traffic cop pulls up the Google Car and he doesn’t know what to do… No-one is in charge… But if we need fewer cars, we make fewer cars… That means the automobile industry will decline… We need to move from physical ownership of cars to the shared infrastructure for getting around. And that can be ok. But that won’t work when policy makers force us to stay in the past, to protect the old way of doing something…

Same with education… If you grow up in Uganda you just need access to the internet… You can take one of 250 courses at Harvard for free online… You don’t need the concrete building. It doesn’t matter how much political power you have, technology beats politics… It trumps politics and borders… Online there is no Brexit… It’s not just corporations but also individuals that have access to technology. We can solve big problems this way. That means that the issue isn’t technology but humanity… Do we want sustainability, equality, space to explore… Do we want to see GDP growth? What do we believe in as a society… We have fantastic economic growth… GDP is growing… More people on the planet than ever before have access to technology, to healthcare, to vaccines. But non-humans… Oceans, forests, etc. are dying, we are clearing land to support us farming meat. We have huge air pollution issues. If we keep going on that blue line we won’t have water, air, forests to support us, we all depend on them eventually… No matter what you believe in…

There is one thing we can all relate to… There are 7.5Bn people on just one planet… No matter what business or education or purpose model we have, we have to solve our problems within that limit… Until 1986 we were just about sustainable… Right now we are using 1.6 planets worth of resources… We have to create much more with much less… Some things, some business models, some GDP standards have to shrink not grow. It’s pretty clear that my son and his generation are aware of this, they see that old model doesn’t work, that it doesn’t make them happy… We have all these things, but we don’t have happiness… That’s not my opinion, that’s the World Health Organisation’s opinion. We have a huge number of people with depression, we had 800,000 suicides last year. A lot of things are pointing the wrong way… This is why future generations think old models are bullshit. They see that there must be a better way to do it… Stephen Hawking says that “history teaches us what didn’t work” – we have to come up with better conclusions… If we are at this point in history when sustainability means no babies… Then we clearly have to change… From an educational perspective I think it is clear I don’t need a university, or a teacher… I need a network. Perhaps the university or the teacher can be a helpful node in this network… But it has to be about creating a better future, rather than preserving an old model.

Response – Jen Ross

What we have from Stefan is an opportunity to reflect on what we need to do as educators to consider different sorts of materiality. We have to educate not just with technology but about it. We have to see technology as deeply integrated with society and our values. This has implications for what we do as an organisation as well… How do we want students to respond to this new world… People at this university talk about the future in a lot of interesting ways. Posing interesting questions… This year Sian Bayne and I led a course on digital futures, and the Near Future Teaching project is looking at what teaching of the future should be… These conversations are happening. And this organisation is already thinking about ethical issues… And I want to ask you about being creative and critical in these discussions, and who can you talk to about the ideas today?

Q&A

Q1) I noticed in Stefan’s presentation a self-driving car… Am I correct in saying that a self-driving car slowed when passing two females… and is that an example of bias in the algorithm?

A1) I have an autopilot on my car… and you get used to that quickly… That makes me dangerous in my wife’s car – I forget I am in charge. What Melissa raised is important in terms of bias embedded… Maybe Alpha Go can teach us something about teaching the algorithm… Maybe we can learn something new… It’s an amazing time to be alive. Thinking about the future as a destination makes the present an obstacle… We know what the future will be like because this moment is the future…

Q2) One of the interesting things about being in this room is that people here work on systems… The internet isn’t stupid… That’s a live issue in the debate over net neutrality… That’s likely to break at some point… People have been trying to keep the network stupid but what happens when that breaks, what happens without net neutrality…

A2) I don’t know it has to break… But in a decentralised network there is no way to stop it… So big organisations doing things to individuals doesn’t work this way… You could only shut down blockchain by cutting power… And that’s hard to do… Most of the blockchain miners are in China and not in the big cities… I don’t believe in paranoid scenarios where you have evil Trump, evil Google… As soon as they do something bad enough… We go somewhere else… I refer to bitcoin as it’s a really interesting example. Big banks have a business model that depends on all the big people… So how do you close down a network like Bitcoin… You could do that by paying them to opt out… But that would cost £300bn right now. I do see huge problems with protectionism, because of populism, because of inequality. We have enough stuff but we don’t share it well enough… People get scared and then we go for protectionism and nationalism… I don’t claim to have an answer…

Q3) I was meeting with union heads yesterday and AI came up and the potential for disruption or job losses… I’d like to hear your view on the total amount of meaningful work and jobs over time… Any thoughts on how to deal or think about that.

A3) It’s a valid and important question. What is a meaningful job? Gallup says that only 13% of the workforce is really engaged in their role… Most people do “robot jobs”. That should mean that that opens up… As long as job loss means free time rather than our future being screwed, that’s fine… As long as people believe that we need jobs and politicians argue about creating jobs… It’s easy… Ignore technology… that will create jobs… The issue is sharing resources and the outcome… But that’s not easy… And more time means more time to think about the meaning of life. I don’t have a boss or a job as such. I’m curious, I travel, I’m essentially a student… And what I do funds my life… Let’s talk about sharing resources as a problem… We have a system that has served us well… But now we are scared of missing out… That’s the thing about Trump and Brexit… People are scared… We have to realise that and address it…

And with that we go to coffee… 

 


Digital and Information Literacy Forum 2017

Today I am at the Scottish Government for the Digital and Information Literacy Forum 2017.

Introduction from Jenny Foreman, Scottish Government: Co-chair of community of practice with Cleo Jones (who couldn’t be here today). Welcome to the 2017 Digital and Information Literacy Forum!

Scottish Government Digital Strategy – Cat Macaulay, Head of User Research and Service Design, Scottish Government

I am really excited to speak to you today. For me libraries have never just been about books, but about information and bringing people together. At high school our library was split between a 3rd and 4th year section and a 5th and 6th year section, and from the moment I got there I was desperate to get into the 5th and 6th year section! It was about place and people and knowledge. My PhD later on was on interaction design and soundscapes, but in the context of the library and seeking information… And that morphed into a project on how journalists use information at The Scotsman – and the role of the library and the librarian in their clippings library. In Goffman terms it was this backstage space for journalists to rehearse their performances. There was talk of the clippings library shutting down and I argued against that as it was more than just those clippings.

So, that’s the personal bit, but I’ll turn to the more formal bit here… I am looking forward to discussions later, particularly the panel on Fake News. Information is crucial to allowing people to meaningfully, equally and truly participate in democracy, and to be part of designing that. So, the importance of digital literacy is crucial to participation in democracy. And for us in the digital directorate, it is a real priority – for reaching citizens and for librarians and information professionals to support that access to information and participation.

We first set out a digital strategy in 2011, but we have been refreshing our strategy and it is about putting digital at the heart of what we do. Digital is not about technology, it’s a cultural issue. We moved before from agrarian to industrial society, and we are now in the process of moving from an industrial to a digital society. We are aiming to deliver inclusive economic growth, reform public services, tackle inequalities and empower communities, and prepare people for the future workplace. Digital and information literacy are core skills for understanding the world and the future.

So our first theme is the Digital Economy. We need to stimulate innovation and investment, we need to support the digital technologies industries, and we need to increase the digital maturity of all businesses. Scotland is so dependent on small businesses and SMEs that we need our librarians and information professionals to be able to support that maturity of all businesses.

Our second theme is Data and Innovation. For data we need to increase public trust in holding data securely and using/sharing appropriately. I have a long term medical issue and I see the time it takes to get appointments set up, to share information between people so geographically close to each other – across the corridor. That lack of trust is core to why we still rely on letters and faxes in these contexts.

In terms of innovation, CivTech brings together the public sector teams and tech start-ups to develop solutions to real problems, and to grow and expand services. We want to innovate and learn from the wider tech and social media context.

The third theme is Digital Public Services, the potential to simplify and standardise ways of working. Finding common technologies/platforms built and procured once. And designing services with citizens to meet their needs. Information literacy skills and critical questioning are at the heart of this. You have to have that literacy to really understand the problems, and to begin to be looking at addressing that, and co-designing.

The fourth theme is Connectivity. Improving superfast broadband, improving coverage in rural areas, increasing the 4G coverage.

The fifth theme is Skills. We need to build a digitally skilled nation. I spent many years in academia – no matter how “digital native” we might assume students to be, actually we’ve assumed essentially that because someone can drive a car, they can build a car. We ALL need support for finding information, how to judge it and how to use it. We all need to learn and keep on learning. We also need to promote diversity – ensuring we have more disabled people, more BAME people, more women, working in these areas, building these solutions… We need to promote and enhance that, to ensure everyone’s needs are reflected. Friends working in the third sector in Dundee frequently talk about the importance of libraries to their service users, libraries are crucial to supporting people with differing needs.

The sixth theme is Participation. We need to enable everybody to share in the social, economic and democratic opportunities of digital. We need to promote inclusion and participation. That means everyone participating.

And our final theme (seven) is Cyber Security. That is about the global reputation for Scotland as a secure place to work, learn and do business. That’s about security, but it is also about trust and addressing some of those issues I talked about earlier.

So, in conclusion, this is a strategy for Scotland, not just Scottish Government. We want to be a country that uses digital to maximum effect, to enable inclusion, to build the economy, to positively deliver for society. It is a living document and can grow and develop. Collective action is needed to ensure nobody is left behind; we all remain safe, secure and confident about the future. We all need to promote that information and digital literacy.

Q&A
Q1) I have been involved in information literacy in schools – and I know in schools and colleges that there can be real inconsistency about how things are labeled as “information literacy”, “digital literacy”, and “digital skills”. I’m slightly concerned there is only one strand there – that digital skills can be about technology skills, not information literacy.

A1) I echo what you’ve just said. I spent a year in a Life Sciences lab in a Post Doc role studying their practice. We were working on a microscopy tool… And I found that the meaning of the word “image” was understood differently by Life Scientists and Data Scientists. Common terminology really matters. And indeed semantic technologies enable us to do that in new ways. But it absolutely matters.

Q2, Kate SVCO) We are using a digital skills framework developed that I think is also really useful to frame that.

A2) I’m familiar with that work and I’d agree. Stripping away complexity and agreeing on common terms and approaches is a core focus of what we are doing.

Q3) We have been developing a digital skills framework for colleges and for the student lifecycle. I have been looking at the comprehensive strategy for schools and colleges by the Welsh Government… Are there plans for similar?

A3) I know there has been work taking place but I will take that back.

Q4) I thought that the “Participation” element was most interesting here. Information literacy is key to enabling participation… Say what you like about Donald Trump but he has made the role of information literacy in democracy very vital and visible. Scotland is in a good place to support information literacy – there are many in this room who have done great work in this area – but it needs resourcing to support it.

A4) My team focuses on how we design digital tools and technologies so that people can use them. And we absolutely need to look at how best to support those that struggle. But it is not just about how you access digital services… How we describe these things, how we reach out to people… I remember being on a bus in Dundee and hearing a guy saying “Oh, I’ve got a Fairer Scotland Consultation leaflet… What the fuck is a Consultation?!”. I’ve had some awkward conversations with my teenage boys about Donald Trump, and Fake News. I will follow up with you afterwards – I really welcome a conversation about these issues. At the moment we are designing a whole new Social Security framework – not a thing most other governments have had to do – and so we really have to understand how to make that clear.

Health Literacy Action Plan Update – Blythe Robertson, Policy Lead, Scottish Government

The skills, confidence, knowledge and understanding to interact with the health system and maintain good health is essentially what we mean in Health Literacy. Right now there is a huge focus in health policy on “the conversation”. And that’s the conversation between policy makers and practitioners and people receiving health care. There is a model of health and care delivery called “More than Medicine” – this is a memorable house-shaped visual model that brings together organisational processes and arrangements, health and care professionals, etc. At the moment though the patient has to do at least as much as the medical professional, with hoops to jump through – as Cat talked about before…

Instructions can seem easy… But then we can all end up at different places [not blogged: an exercise with paper, folding, eyes closed].

Back when computers first emerged you needed to understand a lot more about computer languages, you had to understand how it worked… It was complex, there was training… What happened? Well rather than training everyone, instead they simplified access – with the emergence of the iPad for instance.

So, this is why we’ve been trying to address this with Making it easy: A health literacy action plan for Scotland. And there’s a lot of text… But really we have two images to sum this up… The first (a woman looking at a hurdle)… We’ve tried to address this by creating a nation of hurdlers… But we think we should really let people walk through/remove those hurdles.

Some statistics for you: 43% of English working age adults will struggle to understand instructions to calculate a childhood paracetamol dose. There is a lot bound up here… Childhood health literacy is important. Another stat/fact: Half of what a person is told is forgotten. And half of what is remembered is incorrect. [sources: several cited health studies which will be on Blythe’s slides]. At the heart of the issue is that a lot of information is transmitted… then you ask “Do you understand?” and of course you say “yes”, even if you don’t. So, instead, you need to check information… That can be as simple as rephrasing a question to e.g. “Just so I can check I’ve explained things clearly can you tell me what you’ve understood” or similar.

We did a demonstrator programme in NHS Tayside to test these ideas… So, for instance, if you wander into Ninewells hospital you’ll see a huge board of signs… That board is blue and white text… There is one section with yellow and blue… That’s for Visual Impairment, because that contrast is easier to see. We have the solution but… People with visual impairment come to other areas of the hospitals. So why isn’t that sign all done in the same way with high contrast lettering on the whole board? We have the solution, why don’t we just provide it across the board? That same hospital sent out some appointment letters asking patients to comment and tell them about any confusion… And there were many points where that happened. For instance if you need the children’s ward… You need to know to follow signs for Paediatrics first… There isn’t a consistency of naming… Or a consistency of colour. So, for instance Maternity Triage is a sign in red… It looks scary! Colours have different implications, so that really matters. You will be anxious being in hospital – consistency can help reduce the levels of anxiety.

Letters are also confusing… They are long. Some instructions are in bold, some are small notes at the bottom… That can mean a clinic running 20 minutes late… Changing what you emphasise has a huge impact. It allows the health care provision to run more smoothly and effectively. We workshopped an example/mock up letter with the Scottish Conference for Learning Disability. They came up with clear information and images. So very clear to see what is happening, includes an image of where the appointment is taking place to help you navigate – with full address. The time is presented in several forms, including a clock face. And always offer support, even if some will not need it. Always offer that… Filling in forms and applications is scary… For all of us… There has to be contact information so that people can tell you things – when you look at people not turning up to appointments, the reasons were that they didn’t know how to contact people, they didn’t know that they could change the appointment, that they wanted to contact them but they didn’t want to make a phone call, or even that because they were already in for treatment they didn’t think they needed to explain why they weren’t at their outpatients appointment.

So, a new action plan is coming called “Making it easier”. That is about sharing the learning from Making it Easy across Scotland. To embed ways to improve health literacy in policy and practice. To develop more health literacy responsive organisations and communities. Design supports and services to better meet people’s health literacy levels. And that latter point is about making services more responsive and easier to understand – frankly I’d like to put myself out of a job!

So, one area I’d like to focus on is the idea of “Connectors” – the role of the human information intermediary is fundamental. So how can we take those competencies and roll them out across the system… In ways that people can understand… Put people in contact with digital skills, the digital skills framework… Promoting understanding. We need to signpost with confidence, and to have a sense that people can use this kind of information. Looking at librarians as a key source of information that can help support people’s confidence.

In terms of implementation… We have at (1) a product design and at (3) “Scaled up”. But what is at step (2)? How do we get there… Instead we need to think about the process differently… Starting with (1) a need identified, then planned, structured resources co-developed for success, and then having it embedded in the system… I want to take the barriers out of the system.

And I’m going to finish with a poem: This is bad enough by Elspeth Murray, from the launch of the cancer information reference group of the South East Scotland Cancer Network 20 January 2016.

Q&A

Q1) I’m from Strathclyde, but also work with older people and was wondering how much health literacy is part of the health and social care integration?

A1) I think ultimately that integration will help, but with all that change it is challenging to signpost things clearly… But there is good commitment to work with that…

Q2) You talked about improving the information – the letters for instance – but is there work more fundamentally questioning the kind of information that goes out? It seems archaic and expensive that appointments are done through posted physical letters… Surely better to have an appointment that is in your diary, that includes the travel information/map….

A2) Absolutely, NHS Lothian are leading on some trial work in this area right now, but we are also improving those letters in the interim… It’s really about doing both things…

Cat) And we are certainly looking at online bookings, and making these processes easier, but we are working with older systems sometimes, and issues of trust as well, so there are multiple aspects to addressing that.

Q3) Some of those issues would be practically identical for educators… Teachers or lecturers, etc…

A3) I think that’s right. Research from University of Maastricht mapped out the 21 areas across Public and Private sectors in which these skills should be embedded… And I think those three areas of work can be applied across those areas… Have to look at design around benefits, we have some hooks around there.

Cat) Absolutely part of that design of future benefits for Scotland.

Panel Discussion – Fake News (Gillian Daly – chair; Lindsay McKrell (Strathclyde); Sean McNamara (CILIPS); Allan Lindsay (Young Scot))

Sean: CILIPS supports the library and information science community in Scotland, including professional development, skills and ethics. Some years ago “information literacy” would have been more about university libraries, but now it’s across the board an issue for librarians. Librarians are less gatekeepers of information, and more about enabling those using their libraries to seek and understand information online, how to understand information and fake news, how to understand the information they find even if they are digitally confident in using the tools they use to access that information.

Allan: Young Scot is Scotland’s national charity for information literacy. We work closely with young people to help them grow and develop, and influence us in this area. Fake News crops up a lot. A big piece of work we are involved in is the 5 Rights project, which is about rights online – that isn’t just for young people but significantly about their needs. Digital literacy is key to that. We’ve also worked on digital skills – recently with the Carnegie Trust and the Prince’s Trust. As an information agency we reach people through our website – and we ensure young people are part of creating content in that space.

Lindsay: I’d like to talk about digital literacy as well as Fake News. Digital literacy is absolutely fundamental to supporting citizens to be all that they can be. Accessing information without censorship, and a range of news, research, citizenship test information… That is all part of public libraries service delivery and we need to promote that more. Public libraries are navigators for a huge and growing information resource, and we work with partners in government, in third sector, etc. And our libraries reach outside of working hours and remote areas (e.g. through mobile libraries) so we have unique value for policy makers through that range and volume of users. Libraries are also well placed to get people online – still around 20% of people are not online – and public libraries have the skills to support people to go online, gain access, and develop their digital literacy as well. We can help people find various sources of information, select between them, to interpret information and compare information. We can grow that with our reading strategies, through study skills and after school sessions. Some libraries have run sessions on fake news, but I’m not sure how well supported these have been. We are used to displaying interesting books… But why aren’t our information resources similarly well designed and displayed – local filterable resources for instance… Maybe we should do some of this at national level, not just at local council level. SLIC have done some great work, what we need now is digital information with a twist that will really empower citizens and their information literacy…

Gillian Daly: I was wondering, Allan, how do you tackle the idea of the “Digital Native”? This idea of innate skills of young people?

Allan: It comes up all the time… This presumption that young people can just do things digitally… Some are great but many young people don’t have all the skills they need… There are misconceptions from young people themselves about what they can and cannot do… They are on social media, they have phones… But do they have an understanding of how to behave, how to respond when things go wrong… There is a lot of responsibility for all of us that just because young people use these things, doesn’t mean they understand them all. Those misconceptions apply across the board though… Adults don’t always have this stuff sorted either. It’s dangerous to make assumptions about this stuff… Much as it’s dangerous to assume that those from lower income communities are less well informed about these things, which is often not correct at all.

Lindsay: Yes, we find the same… For instance… Young people are confident with social media… But can’t attach a document for instance…

Comment from HE org: Actually there can be learning in both directions at University. Young people come in with a totally different landscape to us… We have to have a dialogue of learning there…

Gillian: Dialogue is absolutely important… How is that being tackled here…

Sean: With school libraries, those skills to transfer from schools to higher education is crucial… But schools are lacking librarians and information professionals and that can be a barrier there… Not just about Fake News but wider misinformation about social media… It’s important that young people have those skills…

Comment: Fake News doesn’t happen by accident… It’s important to engage with IFLA guide to spot that… But I think we have to get into the territory of why Fake News is there, why it’s being done… And the idea of Media and Information Literacy – UNESCO brought those ideas together a few years ago. There is a vibrant GATNO organisation, which would benefit from more Scottish participation.

Allan: We run a Digital Modern Apprenticeship at Young Scot. We do work with apprentices to build skills, discernment and resilience to understand issues of fake news and origins. A few weeks back a young person commented on something they had seen on social media… At school for me “Media Studies” was derided… I think we are eating our words now… If people had those skills and were equipped to understand that media and creation process. The wider media issues… Fake News isn’t in some box… We have to be able to discern mainstream news as well as “Fake News”. Those skills, confidence, and ability to ask difficult questions to navigate through these issues…

Gillian: I read a very interesting piece by a journalist recently, looking to analyse Fake News and the background to it, the context of media working practice, etc. Really interesting.

Cat: To follow that up… I distinctly remember in 1994 in The Scotsman about the number of times journalists requested clippings that were actually wrong… Once something goes wrong and gets published, it stays there and repopulates… Misquotations happen that way for instance. That sophisticated understanding isn’t about right and wrong and more about the truthfulness of information. In some ways Trump is doing a favour here, and my kids are much more attuned to accuracy now…

Gillian: I think one of the scariest things is that once the myth is out, it is so hard to dispel or get rid of that…

Comment: Glasgow University has a Glasgow Media Group and they’ve looked at these things for years… One thing they published years ago, “Bad News”, looked at for instance the misrepresentation of Trade Unionists in news sources, for a multitude of complex reasons.

Sean: At a recent event we ran we had The Ferret present – those fact checking organisations, those journalists in those roles to reflect that.

Jenny: The Ferret has fact checking on a wonderful scale to reflect the level of fakeness…

Gillian: Maybe we need to recruit some journalists to the Digital and Information Literacy Forum.

And on that, with many nods of agreement, we are breaking for lunch


UoE Information Security Awareness Week 2017: Keynotes Session

This afternoon I’m at the Keynote Session for Information Security Awareness Week 2017 where I’ll be speaking about Managing Your Digital Footprint in the context of security. I’ll be liveblogging the other keynotes this afternoon.

The event has begun with a brief introduction from Alistair Fenemore, UoE’s Chief Information Security Officer, and from his colleague David Creighton Offord, the organiser for today’s event.

Talk by John Whitehouse, PWC Cyber Security Director Scotland covering the state of the nation and the changing face of Cyber Threat

I work at PWC, working with different firms who are dealing with information security and cyber security. In my previous life I was at Standard Life. I’ve seen all sorts of security issues so I’m going to talk about some of the things I’ve seen, trends, I’ll explain a few key concepts here.

So, what is cybersecurity… People imagine people in basements with balaclavas… But it’s not that at all…

I have a video here…

(this is a late night comedy segment on the Sony hack where they ask people for their passwords, to tell them if it’s strong enough… And how they construct them… And/or the personal information they use to construct that…)

We do a lot of introductions for boards… We talk about technical stuff… But they laugh at that video and then you point out that these could all be people working in their companies…

So, there is technical stuff here, but some of the security issues are simple.

We see huge growth due to technology, and that speaks to businesses. We are going to see 1 billion connected devices by 2020, and that could go really really wrong…

There is real concern about cyber security, and they have concerns about areas including cloud computing. The Internet of Things is also a concern – there was a study that found that the average connected device has 25 security vulnerabilities. Dick Cheney had to have his pacemaker reprogrammed because it was vulnerable to hacking via Bluetooth. There was an NHS hospital in England that had to pause a heart surgery when the software restarted. We have hotel rooms accessible via phones – that will come to homes… There are vulnerabilities in connected pet feeders for instance.

Social media is used widely now… In the TalkTalk breach we found that news of the breach had leaked via speculation just 20 seconds after the breach occurred – that’s a big challenge to business continuity planning where one used to plan that you’d perhaps have a day’s window.

Big data is coming with regulations, threats… Equifax lost over 140 million records – and executives dumped significant stock before the news went public which brings a different sort of scrutiny.

Morrisons were sued by their employees for data leaked by an annoyed member of staff – I predict that big data loss could be the new PPI as mass claims for data loss take place. So maybe £1000 per customer per data breach for each customer… We do a threat intelligence service by looking on the dark net for data breach. And we already see interest in that type of PPI class suit approach.

The cyber challenge extends beyond the enterprise – on shore, off shore; 1st through to 4th parties. We’ve done work digging into technology components and where they are from… It’s a nightmare to know who all your third parties are… It’s a nightmare and a challenge to address.

So, who should you be worried about? Threat actors vary… We have accidental loss, Malware that is not targeted, and hacker hobbyists in the lowest level of sophistication, through to state sponsored attacks at the highest level of sophistication. Sony were allegedly breached by North Korea – that firm spends astronomical amounts on security and that still isn’t totally robust. Target lost 100 million credit card details through a third party air conditioner firm, which a hacker used to get into the network, and that’s how the loss occurred. And when we talk organised crime we are talking about really organised crime… One of the Ukrainian organised crime groups were offering a Ferrari for their employee of the month prize for malware. We are talking seriously Organised. And serious financial gain. And it is extremely hard to trace that money once it’s gone. And we see breaches going on and on and on…

Equifax is a really interesting one. There are 23 class action suits already around that one and that’s the tip of the iceberg. There has been a lot of talk of big organisations going under because of cyber security, and when you see these numbers for different companies, that looks increasingly likely. Major attacks lead to real drops in share prices and real impacts on the economy. And there are tangible and intangible costs of any attack… From investigation and remediation through to CEOs and CTOs losing their jobs or facing prison time – at that level you can be personally liable in the event of an attack.

In terms of the trends… 99% of exploited vulnerabilities (in 2014) had been identified for more than a year, some as far back as 1999. Wannacry was one of these – firms had 2 months notice and the issues still weren’t addressed by many organisations.

When we go in after a breach, typically the breach has been taking place for 200 days already – and that’s the breaches we find. That means the attacker has had access and has been able to explore the system for that long. This is very real and firms are dealing with this well and really badly – some real variance.

One example, the most successful bank robbery of all time, was the attack on the Bangladesh Central Bank in Feb 2016 through the SWIFT network. The fraudulent instructions totalled over US $900 million, mostly laundered through casinos in Macau. The analysis identified that malware was tailored for the target organisation based on the printers they were using, which scrubbed all entry and exit points in the bank. The US Secret Service found that there were three groups – two inside the bank, one outside executing the attack.

Cyber security concerns are being raised, but how can we address this as organisations? How do we invest in the right ways? What risk is acceptable? One challenge for banks is that they are being asked to use Fintechs and SMEs working in technology… But some of these startups are very small and that’s a real concern for heads of securities in banks.

We do a global annual survey on security, across about 10,000 people. We ask about the source of compromise – current employees are the biggest by some distance. And current customer data, as well as IPR, tend to be the data that is at risk. We also see Health and Social Care adopting more technology, and having high concern, but spending very little to counter the risks. So, with Wannacry, the NHS were not well set up to cope and the press love the story… But they weren’t the target in any way.

A few Mythbusters for you…

Anti-Virus software… We create Malware to test our clients’ set up. We write malware that avoids AVs. Only 10-15% of malware will be caught with Anti-Virus software. There is an open source tool, Veil-Framework, that teaches you how to write that sort of Malware so that you can understand the risks. You should be using AV, but you have to be aware that malware goes beyond that (and impacts Macs too)… There is a malware SaaS business model on the darknet – as an attacker you’ll get a guarantee for your malware’s success and support to use it!

Myth 2: we still have time to react. Well, no, the lag from discovery to impacting you and your set up can be minutes.

Myth 3: well it must have been a zero day that got us! True Zero Day exploits are extremely rare/valuable. Attacker won’t use one unless target is very high value and they have no other option. They are hard to use. Even NSA admits that persistence is key to successful compromise, not zero day exploits. The NSA created EternalBlue – a zero day exploit – and that was breached and deployed out to these “good guys” as Wannacry.

Passwords… They are a thing of the past I think. 2-factor authentication is more where we are at. Passphrases and strength of passphrases is key. So complex strings with a number and a site name at the end is recommended these days. Changing every 30 days isn’t that useful – it’s so easy to bruteforce the password if lost – much better to have a really strong hash in the first place.
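[An aside from me, not from the talk: a minimal sketch of why passphrase length matters so much against brute forcing. It assumes every symbol is chosen uniformly at random, and the 7,776-word list size is my own illustrative assumption (it is the size of Diceware-style word lists), not a figure John gave.]

```python
import math


def search_space_bits(pool_size: int, length: int) -> float:
    """Bits of entropy for `length` symbols drawn uniformly at random
    from a pool of `pool_size` possible symbols."""
    return length * math.log2(pool_size)


# Hypothetical comparison: 8 random lowercase letters versus 4 random words
# picked from an assumed 7,776-word list.
short_password = search_space_bits(26, 8)
passphrase = search_space_bits(7776, 4)

print(f"8 lowercase letters: {short_password:.1f} bits")
print(f"4 random words:      {passphrase:.1f} bits")
```

Each extra bit doubles the number of guesses an attacker needs, so the estimate only holds if the words or characters really are picked at random rather than built from personal information.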

Phishing email is huge. We think about 80% of cyber attacks start that way. Beware spoofed addresses, or extremely small changes to email addresses.
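[An aside from me, not from the talk: a rough sketch of how a mail filter might flag those "extremely small changes" to an address. The function name, the 0.8 threshold and the example domains are all my own assumptions, and real products use far more careful checks than a string similarity score.]

```python
from difflib import SequenceMatcher


def looks_spoofed(address: str, trusted_domains: set, threshold: float = 0.8) -> bool:
    """Flag an address whose domain is NOT trusted but is suspiciously
    similar to a trusted one (e.g. a single swapped character)."""
    domain = address.rsplit("@", 1)[-1].lower()
    if domain in trusted_domains:
        return False  # exact match: this check has nothing to say
    return any(
        SequenceMatcher(None, domain, trusted).ratio() >= threshold
        for trusted in trusted_domains
    )


trusted = {"example.com"}  # hypothetical organisation domain
for sender in ("ceo@example.com", "ceo@examp1e.com", "someone@university.org"):
    print(sender, "->", "suspicious" if looks_spoofed(sender, trusted) else "no flag")
```

Only the middle address is flagged: it differs from the trusted domain by one character, whereas the last one is simply a different domain.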

We had a client that had an email from their “finance director” about urgently paying money to an account, which was only spotted because someone in finance noticed the phrasing… “the chief exec never says “Thanks”!”

Malware trends: our strong view is that you should never ever pay for a Ransomware attack.

I have another video here…

(In this video we have people having their “mind read” for some TV show… It was uncanny… And included spending data… But it wasn’t psychic… It was data that they had looked up and discovered online… )

It’s not a nice video… This is absolutely real… This whole digital footprint. We do a service called Digital Footprinting for senior execs in companies, and you have to be careful about it as they can give so much away by what you and those around you post… It’s only getting worse and more pointed. There are threat groups going for higher value targets, they are looking for disruption. We think that the Internet of Things will open up the attack surface in whole new ways… And NATS – the Air Traffic people – they are thinking about drones and the issues there around fences and airspace… How do you prepare for this? Take the connected home… These fridges are insecure, you can detect if the door is opened or not, and so detect if the owner is at home or not… The nature of threats is changing so much…

In terms of trends the attacks are moving up the value chain… Retail bank clients aren’t interesting compared to banks’ finance systems, more to exchanges or clearing houses. It’s about value of data… Data is maybe $0.50 for email credentials; a driving license is maybe $25… and upwards the price goes depending on value to the attackers…

So, a checklist for you and your work: (missed this but delighted that digital footprint was item 1)

Finally, go have a look at your phone and how much data is being captured about you… Check your iPhone frequent locations. And on Android check Google Location History. The two biggest companies in the world, Google and Facebook, are free, and they are free because of all the data that they have about you… But the terms of service… Paypal’s are longer than Hamlet. If you have a voice control TV from Samsung and you sign those, you agree to always on and sharable with third parties…

So, that’s me… Hopefully that gave you something to ponder!

Q&A

Q1) What does PWC think about Deloitte’s recent attack?

A1) Every firm faces these threats, and we are attacked all the time… We get everything thrown at us… And we try to control those but we are all at risk…

Q2) What’s your opinion on cyber security insurance?

A2) I think there is a massive misunderstanding in the market about what it is… Some policies just cover recovery, getting a response firm in… When you look at Equifax, what would that cover… That will put insurers out of business. I think we’ll see government backed insurance for things like that, with clarity about what is included, and what is out of scope. So, if, say, SQL Injection is the cause, that’s probably negligence and out of scope…

Q3) What role should government have in protecting private industry?

A3) The national cyber security centre is making some excellent progress on this. Backing for that is pretty positive. All of my clients are engaging and engaged with them. It has to be at that level. It’s too difficult now at lower levels… We do work with GCHQ sharing information on upcoming threats… Some of those are state sponsored… They even follow working hours in their source location… Essentially there are attack firms…

Q4) (I’m afraid I missed this question)

A4) I think Microsoft in the last year have transformed their view… My honest view is that clients should be on Windows 10; it’s a gamechanger for security. Firms will do analysis on patches and service impacts… But they delayed that a bit long. I have worked at a firm with a massively complex infrastructure, and it sounds easy to patch but it can be quite difficult to do that in practice, and it can put big operational systems at risk. As a multinational bank for instance you might be rolling out to huge numbers of machines and applications.

Talk by Kami Vaniea (University of Edinburgh) covering common misconceptions around Information Security and how to avoid them

My research is on the usability of security and why some failings are happening from the point of view of an average citizen. I do talks to community groups – so this presentation is a mixture of that sort of content and proper security discussion.

I wanted to start with misconceptions as system administrators… So I have a graph here of where there is value to improving your password; then the range in which having rate limits on password attempts; and the small area of benefit to the user. Without benefits you are in the deadzone.

OK, a quick question about URL construction… http://facebook.mobile.com? Is it Facebook’s website, Facebook’s mobile site, AT&T’s website, or Mobile’s website. It’s the last one by construction. It’s both of the last two if you know AT&T own mobile.com. But when you ask a big audience they mainly get it right. Only 8% can correctly differentiate http://facebook.profile.com vs http://profile.facebook.com. Many users tend to just pick a big company name regardless of location in URLs. A few know how to correctly read subdomain URLs. We did this study on Amazon Mechanical Turk – so that’s a skewed sample of more technical people. And that URL understanding has huge problematic implications for phishing email.
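[An aside from me, not from Kami's slides: a minimal sketch of reading these hostnames right to left. The helper name is mine, and taking "the last two labels" is a deliberate simplification that breaks for suffixes such as .co.uk; a real check would need the Public Suffix List.]

```python
from urllib.parse import urlsplit


def naive_site(url: str) -> str:
    """Return the last two labels of the hostname, a naive stand-in
    for the domain that actually controls the site."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])


for url in (
    "http://facebook.mobile.com",
    "http://facebook.profile.com",
    "http://profile.facebook.com",
):
    print(url, "->", naive_site(url))
```

Run on the three URLs from the talk, this gives mobile.com, profile.com and facebook.com: whoever owns the rightmost part owns the site, and anything to the left is just a subdomain they chose.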

We also tried http://twitter.com/facebook.com. Most people could tell that was Twitter (not Facebook). But if I used “@” instead of “/” people didn’t understand, thought it was an email…
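[An aside from me, not from the talk: a small sketch of how a standard URL parser splits those two forms. I am assuming the "@" variant was written as http://twitter.com@facebook.com; the `describe` helper is my own.]

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a URL into the pieces that matter for 'whose site is this?'."""
    parts = urlsplit(url)
    return {"userinfo": parts.username, "host": parts.hostname, "path": parts.path}


# With "/", facebook.com is only a path on twitter.com.
print(describe("http://twitter.com/facebook.com"))

# With "@", everything before the "@" is treated as a user name,
# and the host actually contacted is facebook.com.
print(describe("http://twitter.com@facebook.com"))
```

So the "@" form is not an email address at all: the text before the "@" is login information, and the real destination is whatever follows it.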

On the topic of email… Can we trust the “from” field? No. Can we trust a “this email has been checked for viruses…” box? No. Can you trust the information on the source URL for a link in the email, that is shown in the bottom of the browser? Yes.

What about this email – a Security alert for your linked Google account email? Well this is legitimate… Because it’s coming from accounts.google.com. But you knew this was a trick question… Phishing is really tricky…

So, a shocking percentage of my students think that “from” address is legitimate… Tell your less informed friends how easily that can be spoofed…
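[An aside from me, not from the talk: a minimal sketch of why the "from" address proves nothing. It only builds a message in memory and sends nothing; the names and addresses are made up. The "From" header is simply text supplied by whoever composes the message.]

```python
from email.message import EmailMessage
from email.utils import parseaddr


def build_message(claimed_from: str) -> EmailMessage:
    """Compose a message whose From header is whatever the sender claims."""
    msg = EmailMessage()
    msg["From"] = claimed_from  # nothing here verifies ownership of the address
    msg["To"] = "accounts@example.org"
    msg["Subject"] = "Urgent payment"
    msg.set_content("Please pay this invoice today.")
    return msg


msg = build_message("Finance Director <director@example.com>")
display_name, address = parseaddr(str(msg["From"]))
print(display_name, "|", address)
```

Checks such as SPF, DKIM and DMARC are applied by receiving mail servers to catch some of this, but the header on its own is just a claim.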

What about Google. Does Google know what you type as you type it and before you hit enter? Yes, it does… Most search engines send text to their servers as you write it. Which means you can do fun studies on what people commonly DON’T post to Facebook!

A very common misconception is that opening web pages, emails, pdfs, and docs is like reading physical paper… So why do they need patching?

Let’s look at an email example… I don’t typically get emails with “To protect your privacy, Thunderbird has blocked remote content in this message” from a student… This showed me that a 1 pixel invisible image had come with the email… which pinged the server if I opened it. I returned the email and said he had a virus. He said “no, I used to work in marketing and forgot that I had that plugin set up”.
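[An aside from me, not from the talk: a rough sketch of the kind of check a mail client could make before loading remote content. The class name, the sample HTML and the tracker URL are my own inventions, and this is not how Thunderbird itself is implemented.]

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect <img> tags whose source would be fetched from a remote server."""

    def __init__(self):
        super().__init__()
        self.remote_images = []  # list of (src, is_one_pixel) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        if src.startswith(("http://", "https://")):
            one_pixel = attr.get("width") == "1" and attr.get("height") == "1"
            self.remote_images.append((src, one_pixel))


def find_remote_images(html: str) -> list:
    finder = RemoteImageFinder()
    finder.feed(html)
    return finder.remote_images


sample = '<p>Hello</p><img src="https://tracker.example/open?id=123" width="1" height="1">'
print(find_remote_images(sample))
```

Fetching that image is what "pings the server", which is why blocking remote content by default stops the sender learning that the email was opened.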

Websites are made of many elements from many sources. Mainly dynamically… And there are loads of trackers across those sites. There is a tool called Lightbeam that will help you track the sites you go to on purpose, and all the other sites that track you. That’s obviously a privacy issue. But it is also a security problem. The previous speaker spoke about supply chains at Target, this is the web version of this… That supply chain gets huge when you visit, say, six websites.

So, a quiz question… I go to Yahoo, I hit reload… Am I running the same code as a moment ago…? Well, it’s complicated… I had a student run a study on this… And how much changes… Within a week about half of the top 200 sites had changed their javascript. I see trackers change between individual reloads… But it might change, it might not…
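One way such a comparison could be done (my own sketch, not the method used in the student’s study) is to fingerprint each script with a hash and compare two snapshots of the same page. The script URLs and contents below are invented; only hashlib from the standard library is assumed.

```python
import hashlib


def fingerprint(script_text):
    """Return a SHA-256 digest identifying one version of a script."""
    return hashlib.sha256(script_text.encode("utf-8")).hexdigest()


def changed_scripts(before, after):
    """Compare two snapshots mapping script URL -> script text.

    Returns the sorted URLs that were added, removed, or modified.
    """
    changed = []
    for url in sorted(set(before) | set(after)):
        old, new = before.get(url), after.get(url)
        if old is None or new is None or fingerprint(old) != fingerprint(new):
            changed.append(url)
    return changed


# Hypothetical snapshots of one page taken a moment apart.
first_load = {"site.example/app.js": "render();", "ads.example/t.js": "track(1);"}
reload = {"site.example/app.js": "render();", "ads.example/t.js": "track(2);"}
print(changed_scripts(first_load, reload))
```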

So we as users access a first party website, then they access third party sites… So they access ad servers and that sells that user, and an ad is returned, with an image (sometimes with code). Maybe I bid to a company, that bids out again… This is huge as a supply chain and tracking issue…

So the Washington Post, for instance, covering the yahoo.com malware attack showed that malicious payloads were being delivered to around 300k users per hour, but only about 9% (27k users per hour) were affected – they were the ones that hadn’t updated their systems. How did that attack take place? Well rather than attack, they just bought an ad and ran malware code.

There is a tool called Ghostery… It’s brilliant and useful… But it’s run by the ad industry and all the trackers are set the wrong way. Untick those all and then it’s fascinating… They tell you about page load and all the components involved in loading a page…

To change topic…

Cookies! Yes, they can be used to track you across web sites. But they can’t give you malware as is. So… I will be tackling the misconception that cookies are evil… And I’m going to try to convince you otherwise. Tracking can be evil… But cookies are kind of an early example of privacy by design…

It is 1994. The internet cannot remember anyone between page loads. You have an interaction with a web server that has absolutely no memory. Cookies help something remember between page loads and web pages… Somehow a server has to know who you are… But back in 1994 you just open a page and look at it, that’s the interaction point…

But companies wanted shopping baskets, and memory between two page reloads. There is an obvious technical solution… You just give every browser a unique identifier… Great! The server remembers you. But the problem is a privacy issue across different servers… So, Netscape implemented cookies – small text strings the server could ask the browser to remember and give back to it later…
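A minimal sketch of that “remember this string and give it back later” exchange, assuming Python’s standard http.cookies module. The cookie name, value and helper functions are made up for illustration. The server sends a Set-Cookie header, and on the next request the browser returns the same text in a Cookie header.

```python
from http.cookies import SimpleCookie


def server_sets_cookie(name, value):
    """Build the response header asking the browser to remember a string."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def server_reads_cookie(cookie_header, name):
    """Read one value back from the Cookie header a browser sends later."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel else None


print(server_sets_cookie("basket", "abc123"))                      # response header
print(server_reads_cookie("basket=abc123; theme=dark", "basket"))  # later request
```

Because the browser holds the string, the user can see it and delete it, which a server-side unique identifier would not allow.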

Cookies have some awesome properties: they are client visible; third party tracking is client visible too; there is an opt out (delete) option on a per-site basis; a cookie is only readable by the site that set it; and they allow for public discussion of tracking…

… Which is why Android/iOS both went with the unique ID option. And that’s how you can be tracked. As a design decision it’s very different…

Now to some of the research I work on… I believe in getting people to touch stuff, to interact with it… We can talk to each other, or mystify, but we need to actually have people understand this stuff. So we ran an outreach activity to build a website, create a cookie, and then read the cookie out… Then I give a second website… To let people try to understand how to change their names on one site, not the other… What happens when you view them in Incognito mode… And then exploring cookies across sites. And how that works…

Misconception: VPNs solve all privacy and security problems. Back at Indiana I taught students who couldn’t code… And that was interesting… They saw VPNs as magic fairy dust. And they had absorbed this idea that anyone can be hacked at any time… They got that… But that had resulted in “but what’s the point”. That worries me… In the general population we see media coverage of attacks on major companies… And the narrative that attacks are inevitable… So you end up with this problem…

So, I want to talk about encryption and why it’s broken and what that means for VPNs. I’m not an encryption specialist. I care about how it works for the user.

In encryption we want (1) communication between you and the other party to be confidential and unchanged, so no-one can read what you sent and no one can change what you sent; and (2) to know who we are talking to. And that second part is where things can be messed up. You can make what you think is the secure connection to the right person, but could be a secure connection to the wrong person – a man in the middle attack. A real world example… You go to a coffee shop and use wifi to request the BBC news site, but you get a wifi login page. That’s essentially a man in the middle attack. That’s not perhaps harmful, it’s normal operating procedure… VPNs basically work like this…
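As a small illustration of point (2), here is a sketch using Python’s standard ssl module (the function name is my own). A default client context insists on a valid certificate and checks that the certificate matches the host name you asked for; that check is the piece that tells you who is on the other end.

```python
import ssl


def strict_client_context():
    """Create a TLS client context with the library's default checks."""
    # The defaults require a trusted certificate AND a matching host name.
    return ssl.create_default_context()


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(ctx.check_hostname)                    # name must match the site requested
```

If either check is switched off, the connection can still be encrypted, but possibly to the wrong party.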

So, an example of what really happened to a student… I set up a page that just had them creating a very simple cookie page… I was expecting something simple… But one of them submitted a page with a bit of javascript… it is basically injecting code so if I connect to it, it will inject an ad to open in my VPN…. So in this case a student logged in to AnchorFree – magic fairy dust – and sees a website and injects code that is what I see when they submit the page in Blackboard Learn…

VPNs are not magic fairy dust. The University runs an excellent VPN – far better for coffee shops etc!

So, I like to end with some common advice:

  • Install anti virus scanner. Don’t turn off Windows 8+ automatically installed AV software… I ran a study where 50% of PhD students had switched off that software and firewalls…
  • Keep your software updated – best way to stay safe
  • Select a strong passcode for important things you use all the time
  • For less important things that you use rarely, use a password manager… Best to have different passwords between them…
  • Software I use:
    • Ad blockers – not just ads, reduce lots of extra content loading. The more websites you visit the more vulnerable you are
    • Ghostery and Privacy Badger
    • Lightbeam
    • Password Managers (LastPass, 1Password and KeePass are most recommended)
    • 2-factor like Yubikey – extra protection for e.g. Facebook.
    • If you are really serious: uMatrix and NoScript BUT it will break lots of pages…

Q&A

Q1) It’s hard to get an average citizen to do everything… How do you get around that and just get the key stuff across…

A1) Probably it’s that common advice. The security community has gotten better at looking at the 10 key things. Google did a study with Blackhats Infosec conference about what they would do… And asked on Amazon Mechanical Turk about what they would recommend to friends. About the only common answer amongst blackhats was “update your software”. But actually there is overlap… People know they should change passwords, and should use AV software… But AV software didn’t show on the Blackhat list… But 2-factor and password managers did…

Q2) What do you think about passwords… long or complex or?

A2) We did a study maybe 8 years ago on mnemonic passwords… And found that “My name is Inigo Montoya, you killed my father, prepare to die” was by far the most common. The issue isn’t length… It’s entropy. I think we need to think server side about how many other users have used the same password (based on encrypted version), and you need something that fewer than 3 people use…
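A very rough sketch of that server-side idea, under my own assumptions rather than anything specified in the talk: the server keeps a count per keyed fingerprint of each chosen password and refuses a password once it would be shared by 3 users. The secret key, the threshold constant and the class name are all hypothetical, and a real system would still store properly salted hashes for login.

```python
import hashlib
import hmac
from collections import Counter

PEPPER = b"hypothetical-server-secret"  # assumption: a key known only to the server
MAX_USERS_PER_PASSWORD = 3              # "fewer than 3 people" may share a password


def fingerprint(password):
    """Keyed digest so equal passwords can be counted without storing them."""
    return hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).hexdigest()


class PopularityCheck:
    """Track how many accounts have chosen each password."""

    def __init__(self):
        self.counts = Counter()

    def try_register(self, password):
        """Accept the password only if it stays below the sharing limit."""
        fp = fingerprint(password)
        if self.counts[fp] + 1 >= MAX_USERS_PER_PASSWORD:
            return False
        self.counts[fp] += 1
        return True


check = PopularityCheck()
quote = "My name is Inigo Montoya"
print([check.try_register(quote) for _ in range(3)])
```

A long but popular phrase gets rejected on the third attempt, while a shorter password nobody else has picked is accepted, which is the sense in which length is not the issue.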

Q2) So more about inability to remember it…

A2) And it depends on threat type… If someone knows you, your dog, etc… Then it’s easier… But if I can pick a password for a long time I might invest in it – but if you force people to change passwords they have to remember it. There was a study that people using passwords a lot use some affirmations, such as “I love God”… And again, hard to know how you protect that.

Q3) What about magic semantic email links instead of passwords…

A3) There is some lovely work on just how much data is in your email… That’s a poor man’s version of the OAuth idea of getting an identity provider to authenticate the user. It’s good for the user, but that is one higher-stakes login then… And we see SMS also being a mixed bag and being subject to attack… Ask a user though… “there’s nothing important in my email”.

Q4) How do you deal with people saying “I don’t have anything to hide”?

A4) Well I start with it not being about hiding… It’s more, why do you want to know? When I went to go buy a car I didn’t dress like a professor, I dressed down… I wanted a good price… If I have a lot of time I will refer them to Daniel Solove’s Nothing to Hide.

Talk by Nicola Osborne (EDINA) covering Digital Footprints and how you can take control of your online self

And that will be me… So keep an eye out for tweets from others on the event hashtag: #UoEInfoSec.

Repository Fringe 2017 (#rfringe17) – Day One Liveblog

Welcome – Janet Roberts, Director of EDINA

My colleagues were explaining to me that this event came from an idea from Les Carr that there should be not just one repository conference, but also a fringe – and here we are at the 10th Repository Fringe on the cusp of the Edinburgh Fringe.

So, this week we celebrate ten years of repository fringe, the progress we have made over the last 10 years to share content beyond borders. It is a space for debating future trends and challenges.

At EDINA we established the OpenDepot to provide a space for those without a repository… That has now migrated to Zenodo… and the challenges are changing, around the size of data, how we store and access that data, and what those next generation repositories will look like.

Over the next few days we have some excellent speakers as well as some fringe events, including the Wiki Datathon – so I hope you have all brought your laptops!

Thank you to our organising team from EDINA, DCC and the University of Edinburgh. Thank you also to our sponsors: Atmire; FigShare; Arkivum; ePrints; and Jisc!

Opening Keynote – Kathleen Shearer, Executive Director, COAR: Raising our game – repositioning repositories as the foundation for sustainable scholarly communication

Theo Andrew: I am delighted to introduce Kathleen, who has been working in digital libraries and repositories for years. COAR is an international organisation of repositories, and I’m pleased to say that Edinburgh has been a member for some time.

Kathleen: Thank you so much for inviting me. It’s actually my first time speaking in the UK and it’s a little bit intimidating as I know that you folks are really ahead here.

COAR now has about 120 members. Our activities fall into four areas: presenting an international voice so that repositories are part of a global community with diverse perspectives. We are being more active in training for repository managers, something which is especially important in developing countries. And the other area is value added services, which is where today’s talk on the repository of the future comes in. The vision here is about

But first, a rant… The international publishing system is broken! And it is broken for a number of reasons – there is access, and the cost of access. The cost of scholarly journals goes up far beyond the rate of inflation. That touches us in Canada – where I am based, in Germany, in the UK… But much more so in the developing world. And then we have the “Big Deal”. A study of University of Montreal libraries by Stephanie Gagnon found that of 50k subscribed-to journals, really there were only 5,893 unique essential titles. But often those deals aren’t opted out of as the key core journals separately cost the same as that big deal.

We also have a participation problem… Juan Pablo Alperin’s map of authors published in Web of Science shows a huge bias towards the US and the UK, a seriously reduced participation in Africa and parts of Asia. Why does that happen? The journals are operated from the global North, and don’t represent the kinds of research problems in the developing world. And one Nobel Prize winner notes that the pressure to publish in “luxury” journals encourages researchers to cut corners and pursue trendy fields rather than areas where there are those research gaps. That was the case with Zika virus – you could hardly get research published on that until a major outbreak brought it to the attention of the dominant publishing cultures, then there was huge appetite to publish there.

Timothy Gowers talks about “perverse incentives” which are supporting the really high costs of journals. It’s not just a problem for researchers and how they publish, it’s also a problem of how we incentivise researchers to publish. So, this is my goats in trees slide… It doesn’t feel like goats should be in trees… Moroccan tree goats are taught to climb the trees when there isn’t food on the ground… I think of the researchers able to publish in these high end journals as being the lucky goats in the tree here…

In order to incentivise participation in high end journals we have created a lucrative publishing industry. I’m sure you’ve seen the recent Guardian article: “is the staggeringly profitable business of science publishing bad for science”. Yes. For those reasons of access and participation. We see very few publishers publishing the majority of titles, and there is a real

My colleague Leslie Chan, funded by the International Development Council, talked about openness not just being about gaining access to knowledge but also about having access to participate in the system.

On the positive side… Open access has arrived. A recent study (Piwowar et al 2017) found that about 45% of articles published in 2015 were open access. And that is increasing every year. And you have probably seen the May 27th 2016 statement from the EU that all research they fund must be open by 2020.

It hasn’t been a totally smooth transition… APCs (Article Processing Charges) are very much in the mix and part of the picture… Some publishers are trying to slow the growth of access, but they can see that it’s coming and want to retain their profit margins. And they want to move to all APCs. There is discussion here… There is a project called OA2020 which wants to flip from subscription based to open access publishing. It has some traction but there are concerns here, particularly about sustainability of scholarly comms in the long term. And we are not sure that publishers will go for it… Particularly one of them (Elsevier) which exited talks in The Netherlands and Germany. In Germany the tap was turned off for a while for Elsevier – and there wasn’t a big uproar from the community! But the tap has been turned back on…

So, what will the future be around open access? If you look across APCs and the average value… If you think about the relative value of journals, especially the value of high end journals… I don’t think we’ll see lesser increases in APCs in the future.

At COAR we have a different vision…

Lorcan Dempsey talked about the idea of the “inside out” library. Similarly a new MIT Future of Libraries Report – published by a broad stakeholder group that had spent 6 months working on a vision – came up with the need for libraries to be an open, trusted, durable, interdisciplinary, interoperable content platform. So, like the inside out library, it’s about collecting the output of your organisation and making it available to the world…

So, for me, if we collect articles… We just perpetuate the system and we are not in a position to change the system. So how do we move forward at the same time as being kind of reliant on that system?

Eloy Rodrigues, at Open Repository earlier this year, asked whether repositories are a success story. They are ubiquitous, they are adopted and networked… But then they are also using old, pre-web technologies; mostly passive recipients; limited interoperability making value added systems hard; and not really embedded in researcher workflows. These are the kinds of challenges we need to address in next generation of repositories…

So we started a working group on Next Generation Repositories to define new technologies for repositories. We want to position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication, on top of which we want to be able to add layers of value added services. Our principles include distributed control to guard against failure, change, etc. We want this to be inclusive, and reflecting the needs of the research communities in the global south. We want intelligent openness – we know not everything can be open.

We also have some design assumptions, with a focus on the resources themselves, not just associated metadata. We want to be pragmatic, and make use of technologies we have…

To date we have identified major use cases and user stories, and shared those. We determined functionality and behaviours; and a conceptual model. At the moment we are defining specific technologies and architectures. We will publish recommendations in September 2017. We then need to promote it widely and encourage adoption and implementation, as well as the upgrade of repositories around the world (a big challenge).

You can view our user stories online. But I’d like to talk about a few of these… We would like to enable peer review on top of repositories… To slowly incrementally replace what researchers do. That’s not building peer review in repositories, but as a layer on top. We also want some social functionalities like recommendations. And we’d like standard usage metrics across the world to understand what is used and how… We are looking to the UK and the IRUS project there as that has already been looked at here. We also need to address discovery… Right now we use metadata, rather than indexing full text content… So content can be hard to get to unless the metadata is obvious. We also need data syncing so that hubs, indexing systems, etc. reflect changes in the repositories. And we also want to address preservation – that’s a really important role that we should do well, and it’s something that can set us apart from the publishers – preservation is not part of their business model.

So, this is a slide from Petr Knoth at CORE – a repository aggregator – who talks about expanding the repository, and the potential to layer all of these additional services on top.

To make this happen we need to improve the functionality of repositories: to be of and not just on the web. But we also need to step out of the article paradigm… The whole system is set up around the article, but we need to think beyond that, deposit other content, and ensure those research outputs are appropriately recognised.

So, we have our (draft) conceptual model… It isn’t around siloed individual repositories, but around a whole network. And some of our draft recommendations for technologies for next generation repositories. These are a really early view… These are things like: ResourceSync; Signposting; Messaging protocols; Message queue; IIIF presentation API; OAuth; Webmention; and more…

Critical to the widespread adoption of this process is the widespread adoption of the behaviours and functionalities for next generation repositories. It won’t be a success if only one software or approach takes these on. So I’d like to quote a Scottish industrialist, Andrew Carnegie: “strength is derived from unity…. “. So we need to coalesce around a common vision.

And it isn’t just about a common vision, science is global and networked and our approach has to reflect and connect with that. Repositories need to balance a dual mission to (1) showcase and provide access to institutional research and (2) be nodes in a global research network.

To support better networking in repositories, in Venice in May we signed an International Accord for Repository Networks, with networks from Australasia, Canada, China, Europe, Japan, Latin America, South Africa, United States. For us there is a question about how best we work with the UK internationally. We work with OpenAIRE but maybe we need something else as well. The networks across those areas are advancing at different paces, but have committed to move forward.

There are three areas of that international accord:

  1. Strategic coordination – to have a shared vision and a stronger voice for the repository community
  2. Interoperability and common “behaviours” for repositories – supporting the development of value added services
  3. Data exchange and cross regional harvesting – to ensure redundancy and preservation. This has started but there is a lot to do here still, especially as we move to harvesting full text, not just metadata. And there is interest in redundancy for preservation reasons.

So we need to develop the case for a distributed community-managed infrastructure, that will better support the needs of diverse regions, disciplines and languages. Redundancy will safeguard against failure. With less risk of commercial buy out. Places the library at the centre… But… I appreciate it is much harder to sell a distributed system… We need branding that really attracts researchers to take part and engage in the system…

And one of the things we want to avoid… Yesterday it was announced that Elsevier has acquired bepress. bepress is mainly used in the US and there will be much thinking about the implications for their repositories. So not only should institutional repositories be distributed, but they should be different platforms, and different open source platforms…

Concluding thoughts here… Repositories are a technology and technologies change. What it’s really promoting is a vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system. This is really the future of libraries in the scholarly communication community. This is what libraries should be doing. This is what our values represent.

And this is urgent. We see Elsevier consolidating, buying platforms, trying to control publishers and the research cycle, we really have to move forward and move quickly. I hope the UK will remain engaged with this. And I look forward to your participation in our ongoing dialogue.

Q&A

Q1 – Les Carr) I was very struck by that comment about the need to balance the local and the global. I think that’s a really major opportunity for my university. Everyone is obsessed about their place in the global university ranking, their representation as a global university. This could be a real opportunity, led by our libraries and knowledge assets, and I’m really excited about that!

A1) I think the challenge around that is trying to support common values… If you are competing with other institutions it’s not always an incentive to adopt systems with common technologies, measures, approaches. So there needs to be a benefit for institutions in joining this network. It is a huge opportunity, but we have to show the value of joining that network. It’s maybe easier in the UK, Europe, Canada. In the US they don’t see that value as much… They are not used to collaborating in this way and have been one of the hardest regions to bring onboard.

Q2 – Adam ?) Correct me if I’m wrong… You are talking about a Commons… In some way the benefits are watered down as part of the Commons, so how do we pay for this system, how do we make this benefit the organisation?

A2) That’s where I see that challenge of the benefit. There has to be value… That’s where value added systems come in… So a recommender system is much more valuable if it crosses all of the repositories… That is a benefit and allows you to access more material and for more people to access yours. I know CORE at the OU are already building a recommender system in their own aggregated platform.

Q3 – Anna?) At the sharp end this is not a problem for libraries, but a problem for academia… If we are seen as librarians doing things to or for academics that won’t have as much traction… How do we engage academia…

A3) There are researchers keen to move to open access… But it’s hard to represent what we want to do at a global level when many researchers are focused on that one journal or area and making that open access… I’m not sure what the elevator pitch should be here. I think if we can get to that usage statistics data there, that will help… If we can build an alternative system that even research administrators can use in place of impact factor or Web of Science, that might move us forward in terms of showing this approach has value. Administrators are still stuck in having to evaluate the quality of research based on journals and impact factors. This stuff won’t happen in a day. But having standardised measures across repositories will help.

So, one thing we’ve done in Canada with the U15 (top 15 universities in Canada)… They are at the top of what they can do in terms of the cost of scholarly journals so they asked us to produce a paper for them on how to address that… I think that issue of cost could be an opportunity…

Q4) I’m an academic and we are looking for services that make our life better… Here at Edinburgh we can see that libraries are naturally the consistent point of connection with the repository. Does that translate globally?

A4) It varies globally. Libraries are fairly well recognised in Western countries. In the developing world there are funding and capacity challenges that make that harder… There is also a question of whether we need repositories for every library… Can we do more consortia repositories or similar?

Q5 – Chris) You talked about repository supporting all kinds of materials… And how they can “wag the dog” of the article

A5) I think with research data there is so much momentum there around making data available… But I don’t know how well we are set up with research data management to ensure data can be found and reused. We need to improve the technology in repositories. And we need more resources too…

Q6) Can we do more to encourage academics, researchers, students to reuse data and content as part of their practice?

A6) I think the more content we have at Commons level, the more it can be reused. We have to improve discoverability, and improve the functionality to help that content to be reused… There is huge machine reuse of content – I was speaking with Petr Knoth about this – but that isn’t easy to do with repositories…

Theo) It would be really useful to see Open Access buttons more visible, using repositories for document delivery, etc.

Chris Banks, Director of Library Services, Imperial College – Focusing upstream: supporting scholarly communication by academics

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. v2.juliet – A Model For SHERPA’s Mid-Term Infrastructure. Adam Field, Jisc
  2. CORE Recommender: a plug in suggesting open access content. Nancy Pontika, CORE
  3. Enhancing Two workflows with RSpace & Figshare: Active Data to Archival Data and Research to Publication. Rory Macneil, Research Space and Megan Hardeman of Figshare
  4. Thesis digitisation project. Gavin Willshaw, University of Edinburgh
  5. ‘Weather Cloudy & Cool Harvest Begun’: St Andrews output usage beyond the repository. Michael Bryce, University of St Andrews

Impact and the REF panel session

Brief for this session: How are institutions preparing for the next round of the Research Excellence Framework #REF2021, and how do repositories feature in this? What lessons can we learn from the last REF and what changes to impact might we expect in 2021? How can we improve our repositories and associated services to support researchers to achieve and measure impact with a view to the REF? In anticipation of the forthcoming announcement by HEFCE later this year of the details of how #REF2021 will work, and how impact will be measured, our panel will discuss all these issues and answer questions from RepoFringers.

Pauline Jones, REF Manager and Head of Strategic Performance and Research Policy, University of Edinburgh

Anne-Sofie Laegran, Knowledge Exchange Manager, College of Arts, Humanities and Social Sciences, University of Edinburgh

Catriona Firth, REF Deputy Manager, HEFCE

Chair: Keith McDonald, Assistant Director, Research and Innovation Directorate, Scottish Funding Council

10×10 presentations

  1. National Open Data and Open Science Policies in Europe. Martin Donnelly, DCC
  2. IIIF: you can keep your head while all around are losing theirs! Scott Renton, University of Edinburgh
  3. Reference Rot in theses: a HiberActive pilot. Nicola Osborne, EDINA
  4. Lifting the lid on global research impact: implementation and analysis of a Request a Copy service. Dimity Flanagan, London School of Economics and Political Science
  5. What RADAR did next: developing a peer review process for research plans. Nicola Siminson, Glasgow School of Art
  6. Edinburgh DataVault: Local implementation of Jisc DataVault: the value of testing. Pauline Ward, EDINA
  7. Data Management & Preservation using PURE and Archivematica at Strathclyde. Alan Morrisson, University of Strathclyde
  8. Open Access… From Oblivion… To the Spotlight? Dawn Hibbert, University of Northampton
  9. Automated metadata collection from the researcher CV Lattes Platform to aid IR ingest. Chloe Furnival, Universidade Federal de São Carlos
  10. The Changing Face of Goldsmiths Research Online. Jeremiah Spillane, Goldsmiths, University of London

Chair: Ianthe Sutherland, University Library & Collections

ReCon 2017 – Liveblog

Today I’m at ReCon 2017, giving a presentation later today (flying the flag for the unconference sessions!) but also looking forward to a day full of interesting presentations on publishing for early career researchers.

I’ll be liveblogging (except for my session) and, as usual, comments, additions, corrections, etc. are welcomed. 

Jo Young, Director of the Scientific Editing Company, is introducing the day and thanking the various ReCon sponsors. She notes: ReCon started about five years ago (with a slightly different name). We’ve had really successful events – and you can explore them all online. We have had a really stellar list of speakers over the years! And on that note…

Graham Steel: We wanted to cover publishing at all stages, from preparing for publication, submission, journals, open journals, metrics, alt metrics, etc. So our first speakers are really from the mid point in that process.

SESSION ONE: Publishing’s future: Disruption and Evolution within the Industry

100% Open Access by 2020 or disrupting the present scholarly comms landscape: you can’t have both? A mid-way update – Pablo De Castro, Open Access Advocacy Librarian, University of Strathclyde

It is an honour to be at this well attended event today. Thank you for the invitation. It’s a long title but I will be talking about how things are progressing towards this goal of full open access by 2020, and to what extent institutions, funders, etc. are able to introduce disruption into the industry…

So, a quick introduction to me. I am currently at the University of Strathclyde library, having joined in January. It’s quite an old university (founded 1796) and a medium size university. Previous to that I was at the Hague working on the EC FP7 Post-Grant Open Access Pilot (OpenAIRE) providing funding to cover OA publishing fees for publications arising from completed FP7 projects. Maybe not the most popular topic in the UK right now but… The main point of explaining my context is that this EU work was more of a funder’s perspective, and now I’m able to compare that to more of an institutional perspective. As a result of this pilot there was a report commissioned by a British consultant: “Towards a competitive and sustainable open access publishing market in Europe”.

One key element in this open access EU pilot was the OA policy guidelines which acted as key drivers, and made eligibility criteria very clear. Notable here: publications to hybrid journals would not be funded, only fully open access; and a cap of no more than €2000 for research articles, €6000 for monographs. That was an attempt to shape the costs and ensure accessibility of research publications.

So, now I’m back at the institutional open access coalface. Lots had changed in two years. And it’s great to be back in this space. It is allowing me to explore ways to better align institutional and funder positions on open access.

So, why open access? Well in part this is about more exposure for your work, higher citation rates, compliance with grant rules. But also it’s about use and reuse, including by researchers in developing countries, practitioners who can apply your work, policy makers, and the public and tax payers who can access your work. In terms of the wider open access picture in Europe, there was a meeting in Brussels last May where European leaders called for immediate open access to all scientific papers by 2020. It’s not easy to achieve that but it does provide a major driver… However, across these countries we have EU member states with different levels of open access. The UK, Netherlands, Sweden and others prefer “gold” access, whilst Belgium, Cyprus, Denmark, Greece, etc. prefer “green” access, partly because the cost of gold open access is prohibitive.

Funders’ policies are a really significant driver towards open access. Funders include Arthritis Research UK, Bloodwise, Cancer Research UK, Breast Cancer Now, British Heart Foundation, Parkinson’s UK, Wellcome Trust, Research Councils UK, HEFCE, European Commission, etc. Most support green and gold, and will pay APCs (Article Processing Charges) but it’s fair to say that early career researchers are not always at the front of the queue for getting those paid. HEFCE in particular have a green open access policy, requiring research outputs from any part of the university to be made open access, otherwise they will not be eligible for the REF (Research Excellence Framework) and, as a result, compliance levels are high – probably top of Europe at the moment. The European Commission supports green and gold open access, but typically green as this is more affordable.

So, there is a need for quick progress at the same time as ongoing pressure on library budgets – we pay both for subscriptions and for APCs. Offsetting agreements, which discount subscriptions by APC charges, are one way to do this and could be a good solution. There are pros and cons here. In principle it will allow quicker progress towards OA goals, but it will disproportionately benefit legacy publishers. It brings publishers into APC reporting – right now APCs are sometimes invisible to the library as they are paid by researchers, so this is a shift and a challenge. It’s supposed to be a temporary stage towards full open access. And it’s a very expensive intermediate stage: not every country can or will afford it.

So how can disruption happen? Well one way to deal with this would be the policies – suggesting not to fund hybrid journals (as done in OpenAire). And disruption is happening (legal or otherwise) as we can see in Sci-Hub usage, which comes from all around the world, not just developing countries. Legal routes are possible in licensing negotiations. In Germany there is a Projekt Deal being negotiated. And this follows similar negotiations by openaccess.nl. At the moment Elsevier is the only publisher not willing to include open access journals.

In terms of tools… The EU has just announced plans to launch its own platform for funded research to be published. And Wellcome Trust already has a space like this.

So, some conclusions… Open access is unstoppable now, but still needs to generate sustainable and competitive implementation mechanisms. But it is getting more complex and difficult to disseminate to researchers – that’s a serious risk. Open Access will happen via a combination of strategies and routes – internal fights just aren’t useful (e.g. green vs gold). The temporary stage towards full open access needs to benefit library budgets sooner rather than later. And the power here really lies with researchers, whom OA advocates aren’t always able to keep informed. It is important that you know which are open and which are hybrid journals, and why that matters. And we need to think about whether informing authors on where it would make economic sense to publish is beyond the remit of institutional libraries.

To finish, some recommended reading:

  • “Early Career Researchers: the Harbingers of Change” – Final report from Ciber, August 2016
  • “My Top 9 Reasons to Publish Open Access” – a great set of slides.

Q&A

Q1) It was interesting to hear about offsetting. Are those agreements one-off? continuous? renewed?

A1) At the moment they are one-off and intended to be a temporary measure. But they will probably mostly get renewed… National governments and consortia want to understand how useful they are, how they work.

Q2) Can you explain green open access and gold open access and the difference?

A2) In Gold Open Access, the author pays to make their paper open on the journal website. If that’s a hybrid – so subscription – journal you essentially pay twice, once to subscribe, once to make open. Green Open Access means that your article goes into your repository (after any embargo), into the world wide repository landscape (see: https://www.jisc.ac.uk/guides/an-introduction-to-open-access).

Q3) As much as I agree that choices of where to publish are for researchers, there are other factors. The REF pressures you to publish in particular ways. Where can you find more on the relationships between different types of open access and impact? I think that can help?

A3) Quite a number of studies. For instance is APC related to Impact factor – several studies there. In terms of REF, funders like Wellcome are desperate to move away from the impact factor. It is hard but evolving.

Inputs, Outputs and emergent properties: The new Scientometrics – Phill Jones, Director of Publishing Innovation, Digital Science

Scientometrics is essentially the study of science metrics and evaluation of these. As Graham mentioned in his introduction, there is a whole complicated lifecycle and process of publishing. And what I will talk about spans that whole process.

But, to start, a bit about me and Digital Science. We were founded in 2011 and we are wholly owned by Holtzbrinck Publishing Group, who owned Nature group. Being privately funded we are able to invest in innovation by researchers, for researchers, trying to create change from the ground up. Things like labguru – a lab notebook (like rspace); Altmetric; Figshare; readcube; Peerwith; transcriptic – IoT company, etc.

So, I’m going to introduce a concept: The Evaluation Gap. This is the difference between the metrics and indicators currently or traditionally available, and the information that those evaluating your research might actually want to know. Funders might. Tenure panels – hiring and promotion panels. Universities – your institution, your office of research management. Government, funders, policy organisations, all want to achieve something with your research…

So, how do we close the evaluation gap? Introducing altmetrics. It adds to academic impact with other types of societal impact – policy documents, grey literature, mentions in blogs, peer review mentions, social media, etc. What else can you look at? Well you can look at grants being awarded… When you see a grant awarded for a new idea, then the researcher publishes… someone else picks it up and publishes… That can take a long time so grants can tell us before publications. You can also look at patents – a measure of commercialisation and potential economic impact further down the line.

So you see an idea germinate in one place, work with collaborators at the institution, spreading out to researchers at other institutions, and gradually out into the big wide world… As that idea travels outward it gathers more metadata, more impact, more associated materials, ideas, etc.

And at Digital Science we have innovators working across that landscape, along that scholarly lifecycle… But there is no point having that much data if you can’t understand and analyse it. You have to classify that data first to do that… Historically that was done by subject area, but increasingly research is interdisciplinary, it crosses different fields. So single tags/subjects are not useful, you need a proper taxonomy to apply here. And there are various ways to do that. You need keywords and semantic modeling and you can choose to:

  1. Use an existing one if available, e.g. MeSH (Medical Subject Headings).
  2. Consult with subject matter experts (the traditional way to do this, could be editors, researchers, faculty, librarians who you’d just ask “what are the keywords that describe computational social science”).
  3. Text mine abstracts or full text articles (using the content to create a list from your corpus with bag of words/frequency of words approaches, for instance, to help you cluster and find the ideas, with a taxonomy emerging).
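
As a rough illustration of the bag of words/frequency approach in option 3, here is a minimal Python sketch. It is not Digital Science’s method: the abstracts, the tiny stopword list and the function names are all invented for this example. It simply counts how many documents each term appears in, which is one crude way to surface candidate keywords.

```python
import re
from collections import Counter

# Hypothetical abstracts standing in for a real corpus (invented for illustration).
ABSTRACTS = [
    "Open access publishing and citation metrics in social science.",
    "Text mining of social media data for computational social science.",
    "River management policy and flood risk in rural catchments.",
]

# A deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"and", "in", "of", "for", "the", "a"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(documents, n=5):
    """Count how many documents each term appears in; return the n most common."""
    doc_freq = Counter()
    for doc in documents:
        # set() so a term repeated within one document is only counted once
        doc_freq.update(set(tokenize(doc)))
    return doc_freq.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(ABSTRACTS):
        print(term, count)
```

Terms shared across several abstracts rise to the top of the list; subject experts would still need to review and curate them before they became a taxonomy.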

Now, we are starting to take that text mining approach. But to be of use that data needs to be cleaned and curated. So we hand curated a list of institutions to go into GRID: Global Research Identifier Database, to understand organisations and their relationships. Once you have that all mapped you can look at ISNI, CrossRef databases etc. And when you have that organisational information you can include georeferences to visualise where organisations are…

An example that we built for HEFCE was the Digital Science BrainScan. The UK has a dual funding model where there is both direct funding and block funding, with the latter awarded by HEFCE and distributed according to the most impactful research as understood by the REF. So, for our BrainScan, we mapped research areas, connectors, etc. to visualise subject areas, their impact, and clusters of strong collaboration, to see where there are good opportunities for funding…

Similarly we visualised text mined impact statements across the whole corpus. Each impact is captured as a coloured dot. Clusters show similarity… Where things are far apart, there is less similarity. And that can highlight where there is a lot of work on, for instance, management of rivers and waterways… And these weren’t obvious as across disciplines…
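
To make the idea of “similarity” between text mined statements concrete, here is a small Python sketch using cosine similarity over word counts. This is an assumption for illustration only: the talk did not say which similarity measure was used, and the example statements below are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return a word-count vector (as a Counter) for the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical impact statements, invented for illustration.
river_a = bag_of_words("flood management of rivers and waterways")
river_b = bag_of_words("management of rivers for flood defence")
music = bag_of_words("community choirs and wellbeing")

print(cosine_similarity(river_a, river_b))  # larger: much shared vocabulary
print(cosine_similarity(river_a, music))    # smaller: very little overlap
```

In a visualisation like the one described, pairs with higher scores would sit closer together and pairs with lower scores further apart.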

Q&A

Q1) Who do you think benefits the most from this kind of information?

A1) In the consultancy we have clients across the spectrum. In the past we have mainly worked for funders and policy makers to track effectiveness. Increasingly we are talking to institutions wanting to understand strengths, to predict trends… And by publishers wanting to understand if journals should be split, consolidated, are there opportunities we are missing… Each can benefit enormously. And it makes the whole system more efficient.

Against capital – Stuart Lawson, Birkbeck University of London

So, my talk will be a bit different. The arguments I will be making are not in opposition to any of the other speakers here, but are about critically addressing the current ways we are working, and how publishing works. I have chosen to speak on this topic today as I think it is important to make visible the political positions that underlie our assumptions and the systems we have in place today. There are calls to become more efficient but I disagree… Ownership and governance matter at least as much as the outcome.

I am an advocate for open access and I am currently undertaking a PhD looking at open access and how our discourse around this has been coopted by neoliberal capitalism. And I believe these issues aren’t technical but social and reflect inequalities in our society, and any company claiming to benefit society but operating as commercial companies should raise questions for us.

Neoliberalism is a political project to reshape all social relations to conform to the logic of capital (this is the only slide, apparently a written and referenced copy will be posted on Stuart’s blog). This system turns us all into capital, entrepreneurs of our selves – quantification, metricification whether through tuition fees that put a price on education, turn students into consumers selecting based on rational indicators of future income; or through pitting universities against each other rather than working collaboratively. It isn’t just overtly commercial, but about applying ideas of the market in all elements of our work – high impact factor journals, metrics, etc. in the service of proving our worth. If we do need metrics, they should be open and nuanced, but if we only do metrics for people’s own careers and perform for careers and promotion, then these play into neoliberal ideas of control. I fully understand the pressure to live and do research without engaging and playing the game. It is easier to choose not to do this if you are in a position of privilege, and that reflects and maintains inequalities in our organisations.

Since power relations are often about labour and worth, this is inevitably part of work, and the value of labour. When we hear about disruption in the context of Uber, it is about disrupting the rights of workers and labour unions, it ignores the needs of the people who do the work, it is a neo-liberal idea. I would recommend seeing Audrey Watters’ recent presentation for University of Edinburgh on the “Uberisation of Education”.

The power of capital in scholarly publishing, and neoliberal values in our scholarly processes… When disruptors align with the political forces that need to be dismantled, I don’t see that as useful or properly disruptive. Open Access is a good thing in terms of open access. But there are two main strands of policy… Research Councils have given over £80m to researchers to pay APCs. Publishing open access does not require payment of fees, there are OA journals that are funded in other ways. But if you want the high end visible journals they are often hybrid journals and 80% of that RCUK money has been spent on hybrid journals. So work is being made open access, but right now this money flows from public funds to a small group of publishers – who take a 30-40% profit – and that system was set up to continue benefitting publishers. You can share or publish to repositories… Those are free to deposit and use. The concern with OA policy is the connection to the REF, it constrains where you can publish and what they mean, and they must always be measured in this restricted structure. It can be seen as compliance rather than a progressive movement toward social justice. But open access is having a really positive impact on the accessibility of research.

If you are angry at Elsevier, then you should also be angry at Oxford University and Cambridge University, and others for their relationships to the power elite. Harvard made a loud statement about journal pricing… It sounded good, and they have a progressive open access policy… But it is also bullshit – they have huge amounts of money… There are huge inequalities here in academia and in relationship to publishing.

And I would recommend strongly reading some history on the inequalities, and the racism and capitalism that was inherent to the founding of higher education so that we can critically reflect on what type of system we really want to discover and share scholarly work. Things have evolved over time – somewhat inevitably – but we need to be more deliberative so that universities are more accountable in their work.

To end on a more positive note, technology is enabling all sorts of new and inexpensive ways to publish and share. But we don’t need to depend on venture capital. Collective and cooperative running of organisations in these spaces – such as the cooperative centres for research… There are small scale examples that show the principles, and that this can work. Writing, reviewing and editing is already being done by the academic community, let’s build governance and process models to continue that, to make it work, to ensure work is rewarded but that the driver isn’t commercial.

Q&A

Comment) That was awesome. A lot of us here will be here to learn how to play the game. But the game sucks. I am a professor, I get to do a lot of fun things now, because I played the game… We need a way to have people able to do their work that way without that game. But we need something more specific than socialism… Libraries used to publish academic data… Lots of these metrics are there and useful… And I work with them… But I am conscious that we will be fucked by them. We need a way to react to that.

Redesigning Science for the Internet Generation – Gemma Milne, Co-Founder, Science Disrupt

Science Disrupt run regular podcasts, events, a Slack channel for scientists, start ups, VCs, etc. Check out our website. We talk about five focus areas of science. Today I wanted to talk about redesigning science for the internet age. My day job is in journalism and I think a lot about start ups, and about how we can influence academia, and how success manifests itself in the internet age.

So, what am I talking about? Things like Pavegen – power generating paving stones. They are all over the news! The press love them! BUT the science does not work, the physics does not work…

I don’t know if you heard about Theranos which promised all sorts of medical testing from one drop of blood, attracted millions in investment, and it all fell apart. But it too had tons of coverage…

I really like science start ups, I like talking about science in a different way… But how can I convince the press, the wider audience what is good stuff, and what is just hype, not real… One of the problems we face is that if you are not engaged in research you either can’t access the science, or can’t read it even if you can access it… This problem is really big and it influences where money goes and what sort of stuff gets done!

So, how can we change this? There are amazing tools to help (Authorea, Overleaf, protocols.io, figshare, publons, labworm) and this is great and exciting. But I feel it is very short term… Trying to change something that doesn’t work anyway… Doing collaborative lab notes a bit better, publishing a bit faster… OK… But is it good for sharing science? Thinking about journalists and corporates, they don’t care about academic publishing, it’s not where they go for scientific information. How do we rethink that… What if we were to rethink how we share science?

AirBnB and Amazon are on my slide here to make the point of the difference between incremental change vs. real change. AirBnB addressed issues with hotels, issues of hotels being samey… They didn’t build a hotel, instead they thought about what people want when they travel, what matters to them… Similarly Amazon didn’t try to incrementally improve supermarkets… They did something different. They dug to the bottom of why something exists and rethought it…

Imagine science was “invented” today (ignore all the realities of why that’s impossible). But imagine we think of this thing, we have to design it… How do we start? How will I ask questions, find others who ask questions…

So, a bit of a thought experiment here… Maybe I’d post a question on reddit, set up my own sub-reddit. I’d ask questions, ask why they are interested… Create a big thread. And if I have a lot of people, maybe I’ll have a Slack with various channels about all the facets around a question, invite people in… Use the group to project manage this project… OK, I have a team… Maybe I create a Meet Up Group for that same question… Get people to join… Maybe 200 people are now gathered and interested… You gather all these folk into one place. Now we want to analyse ideas. Maybe I share my question and initial code on GitHub, find collaborators… And share the code, make it open… Maybe it can be reused… It has been collaborative at every stage of the journey… Then maybe I want to build a microscope or something… I’d find the right people, I’d ask them to join my Autodesk 360 to collaboratively build engineering drawings for fabrication… So maybe we’ve answered our initial question… So maybe I blog that, and then I tweet that…

The point I’m trying to make is, there are so many tools out there for collaboration, for sharing… Why aren’t more researchers using these tools that are already there? Rather than designing new tools… These are all ways to engage and share what you do, rather than just publishing those articles in those journals…

So, maybe publishing isn’t the way at all? I get the “game” but I am frustrated about how we properly engage, and really get your work out there. Getting industry to understand what is going on. There are lots of people inventing in new ways… You can use stuff in papers that isn’t being picked up… But see what else you can do!

So, what now? I know people are starved for time… But if you want to really make that impact, that you think is more interesting… I understand there is a concern around scooping… But there are ways to do that… And if you want to know about all these tools, do come talk to me!

Q&A

Q1) I think you are spot on with vision. We want faster more collaborative production. But what is missing from those tools is that they are not designed for researchers, they are not designed for publishing. Those systems are ephemeral… They don’t have DOIs and they aren’t persistent. For me it’s a bench to web pipeline…

A1) Then why not create a persistent archived URI – a webpage where all of a project’s content is shared. 50% of all academic papers are only read by the person that published them… These stumbling blocks in the way of sharing… It is crazy… We shouldn’t just stop and not share.

Q2) Thank you, that has given me a lot of food for thought. The issue of work not being read, I’ve been told that by funders so very relevant to me. So, how do we influence the professors… As a PhD student I haven’t heard about many of those online things…

A2) My co-founder of Science Disrupt is a computational biologist and PhD student… My response would be about not asking, just doing… Find networks, find people doing what you want. Benefit from collaboration. Sign an NDA if needed. Find the opportunity, then come back…

Q3) I had a comment and a question. Code repositories like GitHub are persistent and you can find a great list of code repositories and meta-articles around those on the Journal of Open Research Software. My question was about AirBnB and Amazon… Those have made huge changes but I think the narrative they use now is different from where they started – and they started more as incremental change… And they stumbled on bigger things, which looks a lot like research… So… How do you make that case for the potential long term impact of your work in a really engaging way?

A3) It is the golden question. Need to find case studies, to find interesting examples… a way to showcase similar examples… and how that led to things… Forget big pictures, jump the hurdles… Show that bigger picture that’s there but reduce the friction of those hurdles. Sure those companies were somewhat incremental but I think there is genuinely a really different mindset there that matters.

And we now move to lunch. Coming up…

UNCONFERENCE SESSION 1 

This will be me, so don’t expect an update for the moment…

SESSION TWO: The Early Career Researcher Perspective: Publishing & Research Communication

Getting recognition for all your research outputs – Michael Markie

Make an impact, know your impact, show your impact – Anna Ritchie

How to share science with hard to reach groups and why you should bother – Becky Douglas

What helps or hinders science communication by early career researchers? – Lewis MacKenzie

PANEL DISCUSSION

UNCONFERENCE SESSION 2

SESSION THREE: Raising your research profile: online engagement & metrics

Green, Gold, and Getting out there: How your choice of publisher services can affect your research profile and engagement – Laura Henderson

What are all these dots and what can linking them tell me? – Rachel Lammey

The wonderful world of altmetrics: why researchers’ voices matter – Jean Liu

How to help more people find and understand your work – Charlie Rapple

PANEL DISCUSSION


Somewhere over the Rainbow: our metadata online, past, present & future

Today I’m at the Cataloguing and Indexing Group Scotland event – their 7th Metadata & Web 2.0 event – Somewhere over the Rainbow: our metadata online, past, present & future.

Paul Cunnea, CIGS Chair is introducing the day noting that this is the 10th year of these events: we don’t have one every year but we thought we’d return to our Wizard of Oz theme.

On a practical note, Paul notes that if we have a fire alarm today we’d normally assemble outside St Giles Cathedral but as they are filming The Avengers today, we’ll be assembling elsewhere!

There is also a cupcake competition today – expect many baked goods to appear on the hashtag for the day #cigsweb2. The winner takes home a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 (list price £55).

Engaging the crowd: old hands, modern minds. Evolving an on-line manuscript transcription project / Steve Rigden with Ines Byrne (not here today) (National Library of Scotland)

Ines has led the development of our crowdsourcing side. My role has been on the manuscripts side. Any transcription is about discovery. For the manuscripts team we have to prioritise digitisation so that we can deliver digital surrogates that enable access, and to open up access. Transcription hugely opens up texts but it is time consuming and that time may be better spent on other digitisation tasks.

OCR has issues but works relatively well for printed texts. Manuscripts are a different matter – handwriting, ink density, paper, all vary wildly. The REED(?) project is looking at what may be possible but until something better comes along we rely on human effort. Generally the manuscript team do not undertake manual transcription, but do so for special exhibitions or very high priority items. We also have the challenge that so much of our material is still under copyright so cannot be done remotely (but can be accessed on site). The expected user community generally can be expected to have the skill to read the manuscript – so a digital surrogate replicates that experience. That being said, new possibilities shape expectations. So we need to explore possibilities for transcription – and that’s where crowd sourcing comes in.

Crowd sourcing can resolve transcription, but issues with copyright and data protection still have to be resolved. It has taken time to select suitable candidates for transcription. In developing this transcription project we looked to other projects – like Transcribe Bentham which was highly specialised, through to projects with much broader audiences. We also looked at transcription undertaken for the John Murray Archive, aimed at non specialists.

The selection criteria we decided upon were:

  • Hands that are not too troublesome.
  • Manuscripts that have not been re-worked excessively with scoring through, corrections and additions.
  • Documents that are structurally simple – no tables or columns for example where more complex mark-up (tagging) would be required.
  • Subject areas with broad appeal: genealogies, recipe books (in the old crafts of all kinds sense), mountaineering.

Based on our previous John Murray Archive work we also want the crowd to provide us with structured text, so that it can be easily used, by tagging the text. That’s an approach that is borrowed from Transcribe Bentham, but we want our community to be self-correcting rather than doing QA of everything going through. If something is marked as finalised and completed, it will be released with the tool to a wider public – otherwise it is only available within the tool.

The approach could be summed up as keep it simple – and that requires feedback to ensure it really is simple (something we did through a survey). We did user testing on our tool, and it particularly confirmed that users just want to go in and use it, and for it to be intuitive – that’s a problem with transcription and mark up so there are challenges in making that usable. We have a great team who are creative and have come up with solutions for us… But meanwhile other projects have emerged. If the REED project is successful in getting machines to read manuscripts then perhaps these tools will become redundant. Right now there is nothing out there or in scope for transcribing manuscripts at scale.

So, let’s take a look at Transcribe NLS.

You have to log in to use the system. That’s mainly to help limit potentially malicious or erroneous data. Once you log into the tool you can browse manuscripts, you can also filter by the completeness of the transcription, the grade of the transcription – we ummed and ahhed about including that but we thought it was important to include.

Once you pick a text you click the button to begin transcribing – you can enter text, special characters, etc. You can indicate if text is above/below the line. You can mark up where the figure is. You can tag whether the text is not in English. You can mark up gaps. You can mark that an area is a table. And you can also insert special characters. It’s all quite straight forward.
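
To give a feel for what marked-up transcription output could look like, here is a small Python sketch that builds one tagged passage with the standard library. The tag set the NLS tool actually uses was not shown, so this borrows a few element names from TEI (add, gap, foreign) as an assumption, and the sample text is invented.

```python
import xml.etree.ElementTree as ET


def build_passage():
    """Build one transcribed passage with an addition above the line,
    an unreadable gap, and a phrase that is not in English."""
    passage = ET.Element("p")
    passage.text = "Arrived at the inn "

    # Text the writer inserted above the line
    added = ET.SubElement(passage, "add", {"place": "above"})
    added.text = "this evening"
    added.tail = " and paid "

    # A stretch the transcriber could not read
    gap = ET.SubElement(passage, "gap", {"reason": "illegible"})
    gap.tail = " shillings, "

    # A phrase in another language (Latin here)
    foreign = ET.SubElement(passage, "foreign", {"xml:lang": "la"})
    foreign.text = "pro tempore"
    return passage


if __name__ == "__main__":
    print(ET.tostring(build_passage(), encoding="unicode"))
```

Keeping the tag set this small is in the spirit of the “keep it simple” approach: transcribers only mark the handful of features that matter for reuse.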

Q&A

Q1) Do you pick the transcribers, or do they pick you?

A1) Anyone can take part but they have to sign up. And they can indicate a query – which comes to our team. We do want to engage with people… As the project evolves we are looking at the resources required to monitor the tool.

Q2) It’s interesting what you were saying about copyright…

A2) The issue of copyright here is about sharing off site. A lot of our manuscripts are unpublished. We use exceptions such as the 1956 Copyright Act for old works whose authors had died. The selection process has been difficult, working out what can go in there. We’ve also cheated a wee bit.

Q3) What has the uptake of this been like?

A3) The tool is not yet live. We think it will build quite quickly – people like a challenge. Transcription is quite addictive.

Q4) Are there enough people with palaeography skills?

A4) I think that most of the content is C19th, where handwriting is the main challenge. For much older materials we’d hit that concern and would need to think about how best to do that.

Q5) You are creating these documents that people are reading. What is your plan for archiving these?

A5) We do have a colleague considering and looking at digital preservation – longer term storage being more the challenge. As part of normal digital preservation scheme.

Q6) Are you going for a Project Gutenberg model? Or have you spoken to them?

A6) It’s all very localised right now, just seeing what happens and what uptake looks like.

Q7) How will this move back into the catalogue?

A7) Totally manual for now. It has been the source of discussion. There was discussion of pushing things through automatically once transcribed to a particular level but we are quite cautious and we want to see what the results start to look like.

Q8) What about tagging with TEI? Is this tool a subset of that?

A8) There was a John Murray Archive transcription project, including mark up and tagging. There was a handbook for that. TEI is huge but there is also TEI Lite – the JMA used a subset of the latter. I would say this approach – that subset of TEI Lite – is essentially TEI Very Lite.

Q9) Have other places used similar approaches?

A9) Transcribe Bentham is similar in terms of tagging. The University of Iowa Civil War Archive has also had a similar transcription and tagging approach.

Q10) The metadata behind this – how significant is that work?

A10) We have basic metadata for these. We have items in our digital object database and simple metadata goes in there – we don’t replicate the catalogue record but ensure it is identifiable, log date of creation, etc. And this transcription tool is intentionally very basic at the moment.

Coming up later…

Can web archiving the Olympics be an international team effort? Running the Rio Olympics and Paralympics project / Helena Byrne (British Library)

Managing metadata from the present will be explored by Helena Byrne from the British Library, as she describes the global co-ordination of metadata required for harvesting websites for the 2016 Olympics, as part of the International Internet Preservation Consortium’s Rio 2016 web archiving project.

Statistical Accounts of Scotland / Vivienne Mayo (EDINA)

Vivienne Mayo from EDINA describes how information from the past has found a new lease of life in the recently re-launched Statistical Accounts of Scotland.

Lunch

Beyond bibliographic description: emotional metadata on YouTube / Diane Pennington (University of Strathclyde)

Diane Pennington of Strathclyde University will move beyond the bounds of bibliographic description as she discusses her research about emotions shared by music fans online and how they might be used as metadata for new approaches to search and retrieval.

Our 5Rights: digital rights of children and young people / Dev Kornish, Dan Dickson, Bethany Wilson (5Rights Youth Commission)

Young Scot, Scottish Government and 5Rights introduce Scotland’s 5Rights Youth Commission – a diverse group of young people passionate about their digital rights. We will hear from Dan and Bethany what their ‘5Rights’ mean to them, and how children and young people can be empowered to access technology, knowledgeably, and fearlessly.

Playing with metadata / Gavin Willshaw and Scott Renton (University of Edinburgh)

Learn about Edinburgh University Library’s metadata games platform, a crowdsourcing initiative which has improved descriptive metadata and become a vital engagement tool both within and beyond the library. Hear how they have developed their games in collaboration with Tiltfactor, a Dartmouth College-based research group which explores game design for social change, and learn what they’re doing with crowd-sourced data. There may even be time for you to set a new high score…

Managing your Digital Footprint : Taking control of the metadata and tracks and traces that define us online / Nicola Osborne (EDINA)

Find out how personal metadata, social media posts, and online activity make up an individual’s “Digital Footprint”, why they matter, and hear some advice on how to better manage digital tracks and traces. Nicola will draw on recent University of Edinburgh research on students’ digital footprints which is also the subject of the new #DFMOOC free online course.

16:00 Close

Sticking with the game theme, we will be running a small competition on the day, involving cupcakes, book tokens and tweets – come to the event to find out more! You may be lucky enough to win a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 – list price £55! What more could you ask for as a prize?

The ticket price includes refreshments and a light buffet lunch.

We look forward to seeing you in April!
