Somewhere over the Rainbow: our metadata online, past, present & future

Today I’m at the Cataloguing and Indexing Group Scotland event – their 7th Metadata & Web 2.0 event – Somewhere over the Rainbow: our metadata online, past, present & future.

Paul Cunnea, CIGS Chair, is introducing the day, noting that this is the 10th year of these events: we don't have one every year, but we thought we'd return to our Wizard of Oz theme.

On a practical note, Paul notes that if we have a fire alarm today we would normally assemble outside St Giles' Cathedral, but as they are filming The Avengers there today, we'll be assembling elsewhere!

There is also a cupcake competition today – expect many baked goods to appear on the hashtag for the day #cigsweb2. The winner takes home a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 (list price £55).

Engaging the crowd: old hands, modern minds. Evolving an on-line manuscript transcription project / Steve Rigden with Ines Byrne (not here today) (National Library of Scotland)


Ines has led the development of our crowdsourcing side; my role has been on the manuscripts side. Any transcription is about discovery. The manuscripts team have to prioritise digitisation so that we can deliver digital surrogates that enable and open up access. Transcription hugely opens up texts, but it is time-consuming, and that time may be better spent on other digitisation tasks.

OCR has issues but works relatively well for printed texts. Manuscripts are a different matter – handwriting, ink density, paper all vary wildly. The REED(?) project is looking at what may be possible, but until something better comes along we rely on human effort. Generally the manuscripts team do not undertake manual transcription, except for special exhibitions or very high priority items. We also have the challenge that so much of our material is still under copyright, so it cannot be transcribed remotely (though it can be accessed on site). The anticipated user community can generally be expected to have the skill to read the manuscript – so a digital surrogate replicates that experience. That being said, new possibilities shape expectations. So we need to explore possibilities for transcription – and that's where crowdsourcing comes in.

Crowdsourcing can resolve transcription, but issues with copyright and data protection still have to be resolved, and it has taken time to select suitable candidates for transcription. In developing this transcription project we looked to other projects – from Transcribe Bentham, which was highly specialised, through to projects with much broader audiences. We also looked at transcription undertaken for the John Murray Archive, aimed at non-specialists.

The selection criteria we decided upon were:

  • Hands that are not too troublesome.
  • Manuscripts that have not been re-worked excessively with scoring through, corrections and additions.
  • Documents that are structurally simple – no tables or columns for example where more complex mark-up (tagging) would be required.
  • Subject areas with broad appeal: genealogies, recipe books (in the old crafts-of-all-kinds sense), mountaineering.

Based on our previous John Murray Archive work we also want the crowd to provide us with structured text, so that it can be easily used – contributors do this by tagging the text. That's an approach borrowed from Transcribe Bentham, but we want our community to be self-correcting rather than us doing QA of everything going through. If something is marked as finalised and completed, it will be released with the tool to a wider public – otherwise it is only available within the tool.

The approach could be summed up as "keep it simple" – and that requires feedback to ensure it really is simple (something we checked through a survey). We did user testing on our tool; it particularly confirmed that users just want to go in and use it, so it has to be intuitive – that's a problem with transcription and mark-up, so there are challenges in making that usable. We have a great team who are creative and have come up with solutions for us… But meanwhile other projects have emerged. If the REED project is successful in getting machines to read manuscripts then perhaps these tools will become redundant. Right now there is nothing out there, or in scope, for transcribing manuscripts at scale.

So, let's take a look at Transcribe NLS.

You have to log in to use the system. That's mainly to help deter malicious or erroneous contributions. Once you log into the tool you can browse manuscripts; you can also filter by the completeness of the transcription and by the grade of the transcription – we ummed and ahhed about including that, but we thought it was important to include.

Once you pick a text you click the button to begin transcribing – you can enter text, special characters, etc. You can indicate if text is above/below the line. You can mark up where a figure is. You can tag text that is not in English. You can mark up gaps. You can mark that an area is a table. And you can also insert special characters. It's all quite straightforward.
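
To make the tagging concrete: a marked-up passage might look something like the sketch below. This is a minimal illustration using standard TEI element names (add, unclear, gap, foreign); the actual tag set and storage format of Transcribe NLS aren't shown in the talk.

```python
import xml.etree.ElementTree as ET

# A marked-up line of transcription, in the spirit of "TEI Very Light".
# Element names follow common TEI usage; the real Transcribe NLS tags
# are not specified in the talk, so treat these as placeholders.
snippet = """<p>Set out for the summit at dawn
  <add place="above">with two guides</add>,
  the weather being <unclear>tolerable</unclear>.
  <gap reason="illegible"/> reached the col by noon,
  <foreign xml:lang="gd">Beinn Nibheis</foreign> in full view.</p>"""

root = ET.fromstring(snippet)
# Tally the tagged features - e.g. to report how "finished" a page is
for tag in ("add", "unclear", "gap", "foreign"):
    print(tag, len(root.findall(f".//{tag}")))
```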

Q&A

Q1) Do you pick the transcribers, or do they pick you?

A1) Anyone can take part but they have to sign up. And they can indicate a query – which comes to our team. We do want to engage with people… As the project evolves we are looking at the resources required to monitor the tool.

Q2) It’s interesting what you were saying about copyright…

A2) The issue of copyright here is about sharing off-site. A lot of our manuscripts are unpublished. We use exceptions such as the 1956 Copyright Act for old works whose authors have died. The selection process has been difficult, working out what can go in there. We've also cheated a wee bit…

Q3) What has the uptake of this been like?

A3) The tool is not yet live. We think it will build quite quickly – people like a challenge. Transcription is quite addictive.

Q4) Are there enough people with palaeography skills?

A4) I think that most of the content is C19th, where handwriting is the main challenge. For much older materials we’d hit that concern and would need to think about how best to do that.

Q5) You are creating these documents that people are reading. What is your plan for archiving these?

A5) We do have a colleague considering and looking at digital preservation – longer-term storage being more the challenge – as part of our normal digital preservation scheme.

Q6) Are you going for a Project Gutenberg model? Or have you spoken to them?

A6) It’s all very localised right now, just seeing what happens and what uptake looks like.

Q7) How will this move back into the catalogue?

A7) Totally manual for now. It has been a source of discussion – there was talk of pushing things through automatically once transcribed to a particular level, but we are quite cautious and want to see what the results start to look like.

Q8) What about tagging with TEI? Is this tool a subset of that?

A8) The John Murray Archive included mark-up and tagging, and there was a handbook for that. TEI is huge, but there is also TEI Light – the JMA used a subset of the latter. I would say this approach – a subset of TEI Light – is essentially TEI Very Light.

Q9) Have other places used similar approaches?

A9) Transcribe Bentham is similar in terms of tagging. The University of Iowa Civil War Archive has also had a similar transcription and tagging approach.

Q10) The metadata behind this – how significant is that work?

A10) We have basic metadata for these. We have items in our digital object database and simple metadata goes in there – we don't replicate the catalogue record but ensure it is identifiable, log the date of creation, etc. And this transcription tool is intentionally very basic at the moment.

Coming up later…

Can web archiving the Olympics be an international team effort? Running the Rio Olympics and Paralympics project / Helena Byrne (British Library)

Managing metadata from the present will be explored by Helena Byrne from the British Library, as she describes the global co-ordination of metadata required for harvesting websites for the 2016 Olympics, as part of the International Internet Preservation Consortium's Rio 2016 web archiving project.

Statistical Accounts of Scotland / Vivienne Mayo (EDINA)

Vivienne Mayo from EDINA describes how information from the past has found a new lease of life in the recently re-launched Statistical Accounts of Scotland.

Lunch

Beyond bibliographic description: emotional metadata on YouTube / Diane Pennington (University of Strathclyde)

Diane Pennington of Strathclyde University will move beyond the bounds of bibliographic description as she discusses her research about emotions shared by music fans online and how they might be used as metadata for new approaches to search and retrieval

Our 5Rights: digital rights of children and young people / Dev Kornish, Dan Dickson, Bethany Wilson (5Rights Youth Commission)

Young Scot, Scottish Government and 5Rights introduce Scotland's 5Rights Youth Commission – a diverse group of young people passionate about their digital rights. We will hear from Dan and Bethany what their '5Rights' mean to them, and how children and young people can be empowered to access technology knowledgeably and fearlessly.

Playing with metadata / Gavin Willshaw and Scott Renton (University of Edinburgh)

Learn about Edinburgh University Library’s metadata games platform, a crowdsourcing initiative which has improved descriptive metadata and become a vital engagement tool both within and beyond the library. Hear how they have developed their games in collaboration with Tiltfactor, a Dartmouth College-based research group which explores game design for social change, and learn what they’re doing with crowd-sourced data. There may even be time for you to set a new high score…

Managing your Digital Footprint: Taking control of the metadata and tracks and traces that define us online / Nicola Osborne (EDINA)

Find out how personal metadata, social media posts, and online activity make up an individual’s “Digital Footprint”, why they matter, and hear some advice on how to better manage digital tracks and traces. Nicola will draw on recent University of Edinburgh research on students’ digital footprints which is also the subject of the new #DFMOOC free online course.

16:00 Close

Sticking with the game theme, we will be running a small competition on the day, involving cupcakes, book tokens and tweets – come to the event to find out more! You may be lucky enough to win a copy of Managing Metadata in Web-scale Discovery Systems / edited by Louise F Spiteri. London : Facet Publishing, 2016 – list price £55! What more could you ask for as a prize?

The ticket price includes refreshments and a light buffet lunch.

We look forward to seeing you in April!


Jisc Digifest 2017 Day Two – LiveBlog

Today I’m still in Birmingham for the Jisc Digifest 2017 (#digifest17). I’m based on the EDINA stand (stand 9, Hall 3) for much of the time, along with my colleague Andrew – do come and say hello to us – but will also be blogging any sessions I attend. The event is also being livetweeted by Jisc and some sessions livestreamed – do take a look at the event website for more details. As usual this blog is live and may include typos, errors, etc. Please do let me know if you have any corrections, questions or comments. 

Part Deux: Why educators can’t live without social media – Eric Stoller, higher education thought-leader, consultant, writer, and speaker.

I’ve snuck in a wee bit late to Eric’s talk but he’s starting by flagging up his “Educators: Are you climbing the social media mountain?” blog post. 

Eric: People who are most reluctant to use social media are often those who are also reluctant to engage in CPD, to develop themselves. You can live without social media, but social media is useful and important. Why is it important? It is used for communication, for teaching and learning, in research, in activism… Social media gives us a lot of channels to do different things with, that we can use in our practice… And yes, they can be used in nefarious ways, but so can any other media. People are often keen to see particular examples of how they can use social media in their practice in specific ways, but how you use things in your practice is always going to be specific to you, different, and that's ok.

So, thinking about digital technology… "Digital is people" – as Laurie Phipps is prone to say… Technology enhanced learning is often tied up with employability, but there is a balance to be struck between employability and critical thinking. So, what about social media and critical thinking? We have to teach students how to determine if an online source is reliable or legitimate – social media is the same way… And all of us can be caught out. There was a piece in the FT about the chairman of Tesco saying unwise things about gender, race, etc. And I tweeted about this – but I said he was the CEO – and it got retweeted and included in a Twitter Moment… But it was wrong. I did a follow-up tweet and apologised, but I was contributing to that…

Whenever you use technology in learning it is related to critical thinking so, of course, that means social media too. How many of us here did our educational experience completely online? Most of us did our education in the "sage on the stage" manner; that's what was comfortable for us… And change can be uncomfortable (see e.g. tweets from @msementor).

If you follow the NHS on Twitter (@NHS) then you will know it is phenomenal – they have a different member of staff guest posting to the account. Including live tweeting an operation from the theatre (with permissions etc. of course) – if you are a medical student this is very interesting. Twitter is the delivery method now, but maybe in the future it will be HoloLens or Oculus Rift Live or something. Another thing I saw about a year ago was Phil Baty (Times Higher Education – @Phil_Baty) talking about Liz Barnes revealing that every academic at Staffordshire will use social media and will build it into performance management. That really shows an organisation that is looking forward and trying new things.

Do any of you take part in the weekly #LTHEchat? They were having chats about considering participation in that chat as part of staff appraisal processes. That's really cool. And why wouldn't social media and digital be a part of that?

So I did a Twitter poll asking academics what they use social media for:

  • 25% teaching and learning
  • 26% professional development
  • 5% research
  • 44% posting pictures of cats

The cool thing is you can do all of those things and still be using it in appropriate educational contexts. Of course people post pictures of cats… Of course you do… But you use social media to build community. It can be part of building a professional learning environment… You can use social media to lurk and learn… To reach out to people… And it's not even creepy… A few years back I could say "I follow you" and that would be weird and sinister… Now it's like "That's cool, that's Twitter". Some of you will have been using the event hashtag and connecting there…

Andrew Smith, at the Open University, has been using Facebook Live for teaching. How many of your students use Facebook? It’s important to try this stuff, to see if it’s the right thing for your practice.

We all have jobs… Usually when we think about networking and professional networking we often think about LinkedIn… Any of you using LinkedIn? (yes, a lot of us are). How about blogging on LinkedIn? That’s a great platform to blog in as your content reaches people who are really interested. But you can connect in all of these spaces. I saw @mdleast tweeting about one of Anglia Ruskin’s former students who was running the NHS account – how cool is that?

But, I hear some of you say, Eric, this blurs the social and the professional. Yes, of course it does. Any of you have two Facebook accounts? I'm sorry, you violate the terms of service… And yes, of course social media blurs things… Expressing the full gamut of our personality is much more powerful. And it can be amazing when senior leaders model for their colleagues that they are a full human, talking about their academic practice, their development…

Santa J. Ono (@PrezOno/@ubcprez) is a really senior leader but has been having mental health difficulties and tweeting openly about that… And do you know how powerful that is for his staff and students that he is sharing like that?

Now, if you haven't seen the Jisc Digital Literacies and Digital Capabilities models, you really need to take a look. You can use these to shape and model development for staff and students.

I did another poll on Twitter asking "Agree/Disagree: Universities must teach students digital citizenship skills" (85% agree) – now we can debate what "digital citizenship" means… Have any of you ever gotten into it with a troll online? Those words matter, they affect us. And digital citizenship matters.

I would say that you should not fall in love with digital tools. I love Twitter, but that's a private company, with shareholders, with its own issues… And it could disappear tomorrow… And I'd have to shift to another platform to do the things I do there…

Do any of you remember YikYak? It was an anonymous geosocial app… and it was used controversially and for bullying… So they introduced handles… But their users rebelled! (and they reverted)

So, Twitter is great but it will change, it will go… Things change…

I did another Twitter poll – which tools do your students use on a daily basis?

  • 34% snapchat
  • 9% Whatsapp
  • 19% Instagram
  • 36% use all of the above

A lot of people don't use Snapchat because they are afraid of it… When Facebook first appeared the response was that it's silly, we wouldn't use it in education… But we have moved on from that…

There is a lot of bias about Snapchat. @RosieHare posted "I'm wondering whether I should Snapchat #digifest17 next week or whether there'll be too many proper grown ups there who don't use it." Perhaps we don't use these platforms yet, maybe we'll catch up… But will students have moved on by then? There is a professor in the US who was using Snapchat with his students every day… You take your practice to where your students are. According to GlobalWebIndex (Q2–3 2016) over 75% of teens use Snapchat. There are policy challenges there, but students are there every day…

Instagram – 150 million people engage with daily Stories, so that's a powerful tool, and easier to start with than Snapchat. Again, a space where our students are.

But perfection leads to stagnation. You have to try and not be fixated on perfection. Being free to experiment, being rewarded for trying new things, that has to be embedded in the culture.

So, at the end of the day, the more engaged students are with their institution – at college or university – the more successful they will be. Social media can be about doing that, about the student experience. All parts of the organisation can be involved. There are so many social media channels you can use. Maybe you don't recognise them all… Think about your students. A lot will use WhatsApp for collaboration, for coordination… Facebook Messenger, some of the Asian messaging spaces… Any of you use Reddit? Ah, the nerds have arrived! But again, these are all spaces you can develop your practice in.

The web used to involve having your birth year in your username (e.g. @purpledragon1982); it was open… But we see this move towards WhatsApp, Facebook Messenger, WeChat, these different types of spaces, and there is huge growth predicted this year. So, you need to get into the sandbox of learning, get your hands dirty, make some stuff and learn from trying new things #alldayeveryday

Q&A

Q1) What audience do you have in mind… Educators or those who support educators? How do I take this message back?

A1) You need to think about how you support educators, how you do sneaky teaching… How you do that education… So… You use the channels, you incorporate the learning materials in those channels… You disseminate on Medium, say… And hopefully they take that with them…

Q2) I meet a strand of students who reject social media and some technology in a straight edge way… They are in the big outdoors, they are out there learning… Will they not be successful?

A2) Of course they will. You can survive, you can thrive without social media… But if you choose to engage in those channels and spaces… You can be successful… It's not an either/or.

Q3) I wanted to ask about something you tweeted yesterday… That Prensky’s idea of digital natives/immigrants is rubbish…

A3) I think I said “#friendsdontletfriendsprensky”. He published that over ten years ago – 2001 – and people grasped onto that. And he’s walked it back to being about a spectrum that isn’t about age… Age isn’t a helpful factor. And people used it as an excuse… If you look at Dave White’s work on “visitors and residents” that’s much more helpful… Some people are great, some are not as comfortable but it’s not about age. And we do ourselves a disservice to grasp onto that.

Q4) From my organisation… One of my course leaders found their emails were not being read, asked students what they should use, and they said “Instagram” but then they didn’t read that person’s posts… There is a bump, a challenge to get over…

A4) In the professional world email is the communications currency. We say students don’t check email… Well you have to do email well. You send a long email and wonder why students don’t understand. You have to be good at communicating… You set norms and expectations about discourse and dialogue, you build that in from induction – and that can be email, discussion boards and social media. These are skills for life.

Q5) You mentioned that some academics feel there is too much blend between personal and professional. From work we’ve done in our library we find students feel the same way and don’t want the library to tweet at them…

A5) Yeah, it’s about expectations. Liverpool University has a brilliant Twitter account, Warwick too, they tweet with real personality…

Q6) What do you think about private social communities? We set up WordPress/BuddyPress thing for international students to push out information. It was really varied in how people engaged… It’s private…

A6) Communities form where they form. Maybe ask them where they want to be communicated with. Some WhatsApp groups flourish because that's the cultural norm. And if it doesn't work you can scrap it and try something else… And see what works…

Q7) I wanted to flag up a YikYak study at Edinburgh on how students talk about teaching, learning and assessment on YikYak, that started before the handles were introduced, and has continued as anonymity has returned. And we’ll have results coming from this soon…

A7) YikYak may rise and fall… But that functionality… There is a lot of beauty in those anonymous spaces… That functionality – the peers supporting each other through mental health… It isn’t tools, it’s functionality.

Q8) Our findings in a recent study were about where the students are, and how they want to communicate. That changes, it will always change, and we have to adapt to that ourselves… Do you want us to use WhatsApp or WeChat… It's about following the students and where they prefer to communicate.

A8) There is balance too… You meet students where they are, but you don’t ditch their need to understand email too… They teach us, we teach them… And we do that together.

And with that, we’re out of time… 


Jisc Digifest 2017 Day One – LiveBlog

Liam Earney is introducing us to the day, with the hope that we all take something away from the event – some inspiration, an idea, the potential to do new things. Over the past three Digifest events we've taken a broad view. This year we focus on technology expanding and enabling learning and teaching.

LE: So we will be talking about questions we asked through Twitter and through our conference app with our panel:

  • Sarah Davies, head of change implementation support – education/student, Jisc
  • Liam Earney, director of Jisc Collections
  • Andy McGregor, deputy chief innovation officer, Jisc
  • Paul McKean, head of further education and skills, Jisc

Q1: Do you think that greater use of data and analytics will improve teaching, learning and the student experience?

  • Yes 72%
  • No 10%
  • Don’t Know 18%

AM: I'm relieved at that result, as we think it will be important too. And it is backed up by evidence emerging from the US and Australia around data analytics use in retention and attainment. There is a much bigger debate around AI and robots; around Learning Analytics there is that debate about how human and data, human and machine, can work together. We have several sessions in that space.

SD: Learning Analytics has already been around its own hype cycle… We had huge headlines about the potential about a year ago, but now we are seeing much more in-depth discussion, discussion around making sure that our decisions are data-informed… There is concern around the role of the human here, but the tutors, the staff, are the people who access this data and work with students, so it is about human and data together – and that's why adoption is taking a while, as they work out how best to do that.

Q2: How important is organisational culture in the successful adoption of education technology?

  • Total make or break 55%
  • Can significantly speed it up or slow it down 45%
  • It can help but not essential 0%
  • Not important 0%

PM: Where we see education technology adopted we do often see that organisational culture can drive technology adoption. An open culture – for instance Reading College’s open door policy around technology – can really produce innovation and creative adoption, as people share experience and ideas.

SD: It can also be about what is recognised and rewarded. About making sure that technology is more than what the innovators do – it’s something for the whole organisation. It’s not something that you can do in small pockets. It’s often about small actions – sharing across disciplines, across role groups, about how technology can make a real difference for staff and for students.

Q3: How important is good quality content in delivering an effective blended learning experience?

  • Very important 75%
  • It matters 24%
  • Neither 1%
  • It doesn’t really matter 0%
  • It is not an issue at all 0%

LE: That’s reassuring, but I guess we have to talk about what good quality content is…

SD: I think materials – good quality primary materials – make a huge difference; there are so many materials we simply wouldn't have had (any) access to 20 years ago. But it's also about good online texts and how they can change things.

LE: My colleague Karen Colbon and I have been doing some work on making more effective use of technologies… Paul, you have been involved in FELTAG…

PM: I was pleased when FELTAG came out three years ago, but I think only now have we moved on from the myth that 10% online makes blended learning… and towards a proper debate about what blended learning is, what is relevant, not just what is described. And the need for good quality support to enable that.

LE: What’s the role for Jisc there?

PM: I think it’s about bringing the community together, about focusing on the learner and their experience, rather than the content, to ensure that overall the learner gets what they need.

SD: It’s also about supporting people to design effective curricula too. There are sessions here, talking through interesting things people are doing.

AM: There is a lot of room for innovation around the content. If you are walking around the stands there is a group of students from UCL who are finding innovative ways to visualise research, and we’ll be hearing pitches later with some fantastic ideas.

Q4: Billions of dollars are being invested in edtech startups. What impact do you think this will have on teaching and learning in universities and colleges?

  • No impact at all 1%
  • It may result in a few tools we can use 69%
  • We will come to rely on these companies in our learning and teaching 21%
  • It will completely transform learning and teaching 9%

AM: I am towards the 9% here; there are risks, but there is huge reason for optimism. There are some great companies coming out, and working with them increases the chance that this investment will benefit the sector. Startups are keen to work with universities, to collaborate. They are really keen to work with us.

LE: It is difficult for universities to take that punt, to take that risk on new ideas. Procurement, governance, are all essential to facilitating that engagement.

AM: I think so. But I think if we don’t engage then we do risk these companies coming in and building businesses that don’t take account of our needs.

LE: Now that’s a big spend taking place for that small potential change that many who answered this question perceive…

PM: I think there are savings that will come out of those changes potentially…

AM: And in fact that potentially means saving money on tools we currently use by adopting new ones, and investing that in staff…

Q5: Where do you think the biggest benefits of technology are felt in education?

  • Enabling or enhancing learning and teaching activities 55%
  • In the broader student experience 30%
  • In administrative efficiencies 9%
  • It’s hard to identify clear benefits 6%

SD: I think many of the big benefits we've seen over the last 8 years have been around things like online timetables – the wider student experience and administrative spaces. But we are also seeing that, when used effectively, technology can really enhance the learning experience. We have a few sessions here around that. Key here is the digital capabilities of staff and students: awareness, confidence, understanding fit with disciplinary practice. Lots here at Digifest around digital skills. [sidenote: see also our new Digital Footprint MOOC which is now live for registrations]

I’m quite surprised that 6% thought it was hard to identify clear benefits… There are still lots of questions there, and we have a session on evidence based practice tomorrow, and how evidence feeds into institutional decision making.

PM: There is something here around the Apprenticeship Levy, which is about to come into place. A surprisingly high percentage of employers aren't aware that they will be paying it! Technology has a really important role here for teaching, learning and assessment, but also for tracking and monitoring around apprenticeships.

LE: So, with that, I encourage you to look around, chat to our exhibitors, craft the programme that is right for you. And to kick that off here is some of the brilliant work you have been up to. [we are watching a video – this should be shared on today’s hashtag #digifest17]


Making Edinburgh the First Global City of Learning – Prof. Jonathan Silvertown Liveblog

This afternoon I am delighted to be at the Inaugural Lecture of Prof. Jonathan Silvertown from the School of Biological Sciences here at the University of Edinburgh.

Vice Chancellor Tim O’Shea is introducing Jonathan, who is Professor of Evolutionary Ecology and Chair in Technology Enhanced Science Education, and who came to Edinburgh from the Open University.

Now to Jonathan:

Imagine an entire city turned into an interactive learning environment. Where you can learn about the birds in the trees, the rock beneath your feet. And not just learn about them, but contribute back to citizen science, to research taking place in and about the city. I refer to A City of Learning… As it happens, Robert Louis Stevenson used to do something similar, carrying two books in his pocket: one for reading, one for writing. That's the idea here. Why do this in Edinburgh? We have the most fantastic history, culture and place.

Edinburgh has an incredible history of enlightenment, and The Enlightenment. Indeed it was said that you could, at one point, stand on the High Street and shake the hands of 50 men of genius. On the High Street now you can shake Hume (his statue) by the toe, and I shall risk quoting him: "There is nothing to be learned from a professor which is not to be met within books". Others you might have met then include Joseph Black, and also James Hutton, known as the "father of modern geology". Hutton walked up along the crags to a section now known as "Hutton's Section" (an unconformity to geologists) where he noted sandstone and, above it, volcanic rock. He interpreted this as showing that rocks accumulate by ongoing processes that can be observed now. That's science: you can work out what happened in the past by understanding what is happening now. And from that he concluded that the earth was far older than the 6,000 years Bishop Ussher had calculated. In his book The Theory of the Earth he coined the phrase "No vestige of a beginning, no prospect of an end". And that supported the emerging idea of evolutionary biology, which requires a long history to work. That all happened in Edinburgh.

Edinburgh also has a wealth of culture. The Old and New Towns are a UNESCO World Heritage site. Edinburgh has the Fringe Festival, the International Festival, the Book Festival, the Jazz Festival… And then there is the rich literary heritage of Edinburgh – as J.K. Rowling says, "It's impossible to live in Edinburgh without sensing its literary heritage". Indeed if you walk in the Meadows you will see a wall painting celebrating The Prime of Miss Jean Brodie. And you can explore this heritage yourself through the LitLong website and app. The LitLong team text-mined thousands of books against a gazetteer of Edinburgh places, extracting 40,000 snippets of text associated with pinpoints on the map. And you can do this on an app on your phone. Edinburgh is an extraordinary place for all sorts of reasons…

And a place has to be mapped. When you think of maps these days, you tend to think of Google. But I have something better… OpenStreetMap is to a map what Wikipedia is to the Encyclopedia Britannica. When my wife and I moved into a house in Edinburgh it wasn't on Ordnance Survey or Google Maps, but it was almost immediately on OpenStreetMap. It's Open because there are no restrictions on use, so we can use it in our work. Not all cities are so blessed… Geographic misconceptions are legion: if you look at one of the maps in the British Library you will see the Cable and Wireless Great Circle Map – a map that is both out of date and prescient. It is old and outdated, but does display the cable and wireless links across the world… The UK isn't the centre of the globe as this map shows; wherever you are standing is the centre of the globe now. And Edinburgh is international. At last year's Edinburgh Festival the Deep Time event projected the words "Welcome, World" just after the EU Referendum. Edinburgh is a global city, and the University of Edinburgh is a global university.
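
As a concrete aside on what "no restrictions on use" enables: anyone can query OpenStreetMap data programmatically. Here is a minimal sketch using the public Overpass API in Python; the bounding box and the choice of museums are illustrative, not from the talk.

```python
import requests

# Fetch museums in central Edinburgh from OpenStreetMap via the public
# Overpass API. The bounding box (south, west, north, east) is illustrative.
query = """
[out:json][timeout:25];
node["tourism"="museum"](55.94,-3.21,55.96,-3.17);
out body;
"""
resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": query})
resp.raise_for_status()
for node in resp.json()["elements"]:
    tags = node.get("tags", {})
    print(tags.get("name", "unnamed"), node["lat"], node["lon"])
```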

Before we go any further I want to clarify what I mean by learning when I talk about making a city of learning… For Kolb (1984) learning is "how we transform experience into knowledge"; it is learning by discovery. And, wearing my evolutionary hat, it's a major process of human adaptation. Kolb's learning cycle takes us from Experience, to Reflect (observe), to Conceptualise (ideas), to Experiment (test), and back to Experience. It is of course also the process of scientific discovery.

So, let's apply that cycle of learning to iSpot, to show that experiential learning and discovery in action, and what extraordinary things it can do. iSpot is designed to crowdsource the identification of organisms (see Silvertown, Harvey, Greenwood, Dodd, Rosewell, Rebelo, Ansine & McConway 2015). If I see "a white bird" it's not that exciting, but if I know it's a Kittiwake then that's interesting – has it been seen before? Are they nesting elsewhere? You can learn more from that. So you observe an organism, you reflect, you start to get comments from others.

So, we have over 60,000 registered users of iSpot, 685k observations, 1.3 million photos, and we have identified over 30,000 species. There are many, many stories contained within that, but I will share just one. An observation came in from South Africa: a picture of some seeds with a note, "some children in Zululand just ate some of these seeds and are really ill". 35 seconds later someone thousands of miles away in Cape Town identified the plant, and others agreed on the ID. And the next day the doctor who posted the image replied to say that the children were OK, but that it happens a lot, and knowing what plant the seeds were from helps them do something about it. It wasn't what we set this up to do, but that's a great thing to happen…

So, I take forward to this city of learning the lessons of a borderless community; of the virtuous circle of learning, which empowers and engages people to find out more; and of encouraging repurposing – letting people use the space as they want and need (we have added extra functions to support that over time in iSpot).

Learning and discovery lend themselves to research… So I will show you two projects demonstrating this, which give us lessons to take forward into Edinburgh City of Learning. EvolutionMegalab.org was created at the Open University to mark Darwin's double centenary in 2009, but we also wanted to show that evolution is happening right now in your own garden… The snails in your garden have colours and banding patterns with known genetics, and we know about evolution in the field – we know what conditions favour which snails. So we asked the public to help us test the hypothesis about the snails. We had about 10,000 populations of snails captured, half of which were there already, half contributed by citizens over a single year. We had seen, over the last 50 years, an increase in yellow-shelled snails, which do not warm up too quickly. We would expect brown snails further north, yellow snails further south. So was that correct? Yes and no: there was an increase in sand dunes, but not elsewhere. But we also saw a change in banding patterns, and we didn't know why… So we went back to pre-Megalab data, and the effect was there before too, but hadn't previously been looked for.
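
Statistically, the test described is essentially a comparison of shell-colour proportions between the historical and citizen-collected samples. A toy sketch of that kind of test follows; the counts below are invented for illustration, not the Megalab data.

```python
from scipy.stats import chi2_contingency

# Invented counts for illustration only - NOT the Megalab data.
#                yellow  brown/pink
historical   = [420, 580]   # populations sampled in earlier decades
citizen_2009 = [510, 490]   # populations contributed by citizens

chi2, p, dof, expected = chi2_contingency([historical, citizen_2009])
print(f"chi2={chi2:.2f}, p={p:.4f}")  # a small p suggests the proportions differ
```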

Lessons from Megalab included that all can contribute, that it must be about real science and real questions, and that data quality matters. If you are ingenious about how you design your project, then all people can engage and contribute.

Third project, briefly: this is Treezilla, the monster map of trees – which we started in 2014, just before I came here – and the idea is that we have a map of the identity, size and location of trees and, with that, we can start to look at the ecosystem impact of these trees: they capture carbon, they can ameliorate floods… And luckily my colleague Mike Dodd spotted some software that could be used to make this happen. So one of the lessons here is that you should build on existing systems, building projects on top of projects, rather than having everything happen at the same time.

So, this is the Edinburgh Living Lab, a collaboration between schools, and the kinds of projects they do include bike counters and traffic – visualised and analysed – which gives the Council information on traffic in a really immediate way that can allow them to take action. This set of projects around the Living Lab really highlighted the importance of students being let loose on data, on ideas around the city. The lessons here are that we should be addressing real-world problems, that public engagement is an important part of this, and that we are no longer interdisciplinary, we are "post-disciplinary" – as is much of the wider world of work, and these skills will go with these students from the Living Lab.

And so to Edinburgh Cityscope, a project with synergy across learning, research and engagement. Edinburgh Cityscope is NOT an app, it is an infrastructure. It is the stuff out of which other apps and projects will be built.

So, the first thing we had to do was make Cityscope future-proof. When we built iSpot the iPhone hadn't been heard of; now maybe 40% of you here have one, and we've probably already had peak iPhone. We don't know what will be used in five years' time. But there are aspects people will always need… They will need data. What kinds of data? For synergy and place we need maps. And maps can have layers – you can relate the nitrogen dioxide to the traffic, you can compare the trees… So Edinburgh Cityscope is mappable. And you need a way to bring these things together: a workbench. Right now that includes Jupyter, but we are not locked in, so we can change in future if we want to. And we have our data and our code open on GitHub. And then finally you need a presentation layer – a place to disseminate what we do to our students and colleagues, and what they have done.
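
To make the "maps with layers" idea concrete, here is a minimal sketch of layering two datasets on a web map inside a Jupyter notebook. folium is one common choice for this; whether Cityscope itself uses it isn't stated, and the coordinates and values are invented.

```python
import folium

# Base map centred on Edinburgh; each dataset becomes a toggleable layer.
m = folium.Map(location=[55.9533, -3.1883], zoom_start=13)

traffic = folium.FeatureGroup(name="Bike counters")  # invented example data
folium.CircleMarker([55.951, -3.190], radius=8,
                    popup="1,240 bikes/day").add_to(traffic)

trees = folium.FeatureGroup(name="Trees")  # invented example data
folium.Marker([55.955, -3.180], popup="Wych elm").add_to(trees)

traffic.add_to(m)
trees.add_to(m)
folium.LayerControl().add_to(m)  # toggle layers, relate one to another
m  # renders inline in a Jupyter notebook
```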

So, in the last six months we've made progress on data – using the Scottish Government open data portal we have lung cancer registrations that can be mapped and changes seen. We can compare and investigate, and our students can do that. We have the SIMD (Scottish Index of Multiple Deprivation) map… I won't show you a comparison as it has hardly changed in decades – one area has been in poverty since around 1900. My colleague Lesley McAra is working in public engagement, with colleagues here, to engage in ways that make this better, that make changes.

The workbench has been built. It isn't pretty yet… You can press a button to create a notebook. You can send your data to a phone app – pulling data from Cityscope and showing it in an app. You can start a new tour blog – which anybody can do. And you can create a survey to gather new information…

So let me introduce one of these apps. Curious Edinburgh is an app that allows you to learn about the history of science in Edinburgh, to explore the city. The genius idea – and I can say genius because I didn't build it; Niki and the folks at EDINA did – is that you can create a tour from a blog. You fill in forms, essentially. And there is an app which you can download for iOS, and a test version for Android – the full one coming for the Edinburgh International Science Festival in April. Because this is an Edinburgh Cityscope project I've been able to use the same technology to create a tour of the botanical gardens for use in my teaching. We used to give out paper; now we have this app we can use in teaching, in new ways… And I think this will be very popular.
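
The "tour from a blog" pattern is roughly: structured blog posts become tour stops that the app pulls down. Here is a sketch of that idea against the standard WordPress REST API; the site URL is hypothetical, and the real Curious Edinburgh pipeline may well differ.

```python
import requests

SITE = "https://example-tour-blog.org"  # hypothetical site URL

# The standard WordPress REST API exposes posts as JSON; each post
# (title + rendered content) can be treated as one tour stop.
posts = requests.get(f"{SITE}/wp-json/wp/v2/posts",
                     params={"per_page": 20}).json()

tour = [{"title": p["title"]["rendered"],
         "html": p["content"]["rendered"]}
        for p in posts]

for stop in tour:
    print(stop["title"])
```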

And the other app we have is Fieldtrip, a survey tool borrowed from EDINA’s FieldTrip Open. And that allows anyone to set up a data collection form – for research, for social data, for whatever. It is already open, but we are integrating this all into Edinburgh Cityscope.

So, this seems a good moment to talk about the funding for this work. We have had sizable funding from Information Services. The AHRC has funded some of the Curious Edinburgh work, and the ESRC has funded work, a small part of which Edinburgh Cityscope will be using in building the community.

So, what next? We are piloting Cityscope with students – in the Festival of Creative Learning this week, in Informatics. And then we want to reach out to form a community of practice, including schools, community groups and citizens. And we want to connect with cultural institutions and industry – we are already working with the National Museum of Scotland. And we want to interface with the Internet of Things – anything with a chip in it, really. You can interact with your heating system from anywhere in the world – that's the Internet of Things, things connected to the web. And I'm keen on creating an Internet of Living Things. The Atlas of Living Scotland displays all the biological data of Scotland on the map. But data gets out of date; it would be better if it updated in real time. So my friend Kate Jones from UCL is working with Intel on real-time data from bats – allowing data to be captured through connected sensors. And also in that space, Graham Stone (Edinburgh) is working on a project called Edinburgh Living Landscape, which is about connecting up green spaces to improve biodiversity…

So, I think what we should be going for is for recognition of Edinburgh as the First UNESCO City of Learning. Edinburgh was the first UNESCO City of Literature and the people who did that are around, we can make our case for our status as City of Learning in much the same way.

So that’s pretty much the end. Nothing like this happens without lots and lots of help. So a big thanks here to Edinburgh Cityscope’s steering group and the many people in Information Services who have been actually building it.

And the final words are written for me: Four Quartets, T.S. Eliot:

“We shall not cease from exploration

And the end of all our exploring 

Will be to arrive where we started

And know the place for the first time”


ETAG Digital Solutions for Tourism Conference 2016

This morning I'm at the Edinburgh Tourism Action Group's Digital Solutions for Tourism Conference 2016. Why am I along? Well, EDINA has been doing some really interesting cultural heritage projects for years.

Introduction / James McVeigh, Head of Marketing and Innovation, Festivals Edinburgh

Welcome to our sixth Digital Solutions for Tourism Conference. In those last six years a huge amount has changed, and our programme reflects that: it will highlight much of the work in Edinburgh, but also pick up what is taking place in, and rolling out to, the wider world.

So, we are in Edinburgh. The home of the world’s first commercially available mobile app – in 1999. And did you also know that Edinburgh is home to Europe’s largest tech incubator? Of course you do!

Welcome / Robin Worsnop, Rabbie's Travel, Chair, ETAG

We’ve been running these for six years, and it’s a headline event in the programme we run across the city. In the past six years we’ve seen technology move from business add on to fundamental to what we do – for efficiency, for reach, for increased revenue, and for disruption. Reflecting that change this event has grown in scope and popularity. In the last six years we’ve had about three and a half thousand people at these events. And we are always looking for new ideas for what you want to see here in future.

We are at the heart of the tech industry here too, with Codebase mentioned already, Skyscanner, and the School of Informatics at the University of Edinburgh, all of which attract people to the city. As a city we have free wifi around key cultural venues, on the buses, etc. It is more and more ubiquitous for our tourists to have access to free wifi. And technology is becoming more and more about how those visitors enhance their visit and experience of the city.

So, we have lots of fantastic speakers today, and I hope that you enjoy them and you take back lots of ideas and inspiration to take back to your businesses.

What is new in digital and what are the opportunities for tourism / Brian Corcoran, Director, Turing Festival

There's some big news for the tech scene in Edinburgh today: Skyscanner has been bought by a Chinese company for 1.5bn. And FanDuel just merged with its biggest rival last week. So huge things are happening.

So, I thought technology trends and bigger trends – macro trends – might be useful today. I'll be looking at this through the lens of the companies shaping the world.

Before I do that, a bit about me, I have a background in marketing and especially digital marketing. And I am director of the Turing Festival – the biggest technology festival in Scotland which takes place every August.

So… There are really two drivers of technology… (1) tech companies and (2) users. I’m going to focus on the tech companies primarily.

The big tech companies right now include: Uber, disrupting the transport space; Netflix – for streaming and content commissioning; Tesla – disrupting transport and energy usage; Buzzfeed – influential with a huge readership; Spotify – changing music and music payments; banking… No-one has yet disrupted banking, but they will soon… Maybe just parts of banking… we shall see.

And no-one is influencing us more than the big five. Apple, mainly through the iPhone. I've been awaiting a new MacBook for five years… Apple are keeping PCs for top-end/power users, but also saying most users are not content producers, they are passive users – they want/expect us to move to iPads. It's a mobile device (running iOS) and a real shift. The iPhone 7 got coverage for headphones etc., but the cameras didn't get much discussion – it is basically set up for augmented reality with two cameras. AirPods – the cable-less headphones – are essentially a new wearable, like/after the Apple Watch. And we are also seeing Siri opening up.

Over at Google… Since Google's inception the core has been search, the Google search index and ranking. And they are changing it for the first time ever, really… They are building a mobile-only search index – and they aren't just building it, they are prioritising it. Mobile is really the big tech trend. And in line with that we have their Pixel phone – a phone they are manufacturing themselves… That's getting them back into hardware after their Google Glass misstep. And Google Assistant is another part of the Pixel phone – a Siri competitor… Another part of us interacting with phones, devices, data, etc. in a new way.

Microsoft is one of the big five that some think shouldn't be there… They have made some missteps… They missed the internet. They missed – and have written off – phones (and Nokia). But they have moved to Surface – another mobile device. They have abandoned Windows and moved to Microsoft 365. They bought LinkedIn for $26bn (in cash!). One way this could affect us: LinkedIn has all this amazing data… but it is terrible at monetising it. That will surely change. And then we have HoloLens – which means we may eventually have some mixed reality actually happening.

Next in the big five is Amazon. Some very interesting things there… We have Alexa – the digital assistant service. They have, as a device, Echo – essentially a speaker and listening device for your home/hotel etc. Amazon will be in your home, listening to you all the time… I'm not going to go there! And we have Amazon Prime… and also Prime Instant Video: Amazon moving into television. Netflix and Amazon compete with each other, but more with traditional TV. And they are moving from ad income to subscriptions. Interesting to think where TV ad spend will go – it's about half of all ad spend.

And Facebook. They are at ad saturation risk, and pushing towards video ads. With that in mind they may also become the de facto TV platform. Do they have a new editorial responsibility? With fake news etc., are they a tech company? Are they a media company? At the same time they are caving completely to Chinese state surveillance requests. And Facebook are trying to diversify their ecosystem so they continue to outlast their competitors – with Instagram, WhatsApp, Oculus, etc.

So, that's a quick look at tech companies and what they are pushing towards. For us, as users, the big moves have been towards messaging – Line, WeChat, Messenger, WhatsApp, etc. These are huge, and there has been a big move towards messaging. And that's important if we are trying to reach the fabled millennials as our audience.

And then we have Snapchat. It's really impenetrable for those over 30. They have 150M daily active users, 1bn snaps daily, 10bn videos daily. They are the biggest competitor to Facebook, and to its ad revenue. They have also gone for wearables – in a cheeky, cool, upstart way.

So, we see 10 emergent patterns:

  1. Mobile is now *the* dominant consumer technology, eclipsing PCs. (Apple makes more from the iPhone than all their other products combined; it is the most successful single product in history).
  2. Voice is becoming an increasingly important UI. (And it is interesting how answers there connect to advertising).
  3. Wearables bring tech into ever-closer physical and psychological proximity to us. It’s now on our wrist, or face… Maybe soon it will be inside you…
  4. IoT is getting closer, driven by the intersection of mobile, wearables, APIs and voice UI. Particularly seeing this in smart home tech – switching the heat on away from home is real (and important – it’s -3 today), but we may get to that promised fridge that re-orders…
  5. Bricks and mortar retail is under threat, and although we have some fulfillment challenges, they will be fixed.
  6. Messaging marks a generational shift in communication preferences – asynchronous preferred.
  7. AR and VR will soon be commonplace in entertainment – other use cases will follow… But things can take time. The Apple Watch went from unclear use case to clear health, sports, etc. use cases.
  8. Visual communications are replacing textual ones for millennials: Snapchat defines that.
  9. Media is increasingly in the hands of tech companies – TV ads will be disrupted (Netflix etc.)
  10. TV and ad revenue will move to Facebook, Snapchat etc.

What does this all mean?

Mobile is crucial:

  • Internet marketing in tourism now must be mobile-centric
  • Ignore Google mobile index at your peril
  • Local SEO is increasing in importance – that’s a big opportunity for small operators to get ahead.
  • Booking and payments must be designed for mobile – a hotel saying “please call us”, well Millennials will just say no.

It’s unclear where new opportunities will be, but they are coming. In Wearables we see things like twoee – wearable watches as key/bar tab etc. But we are moving to a more seamless place.

Augmented reality is enabling a whole new set of richer, previously unavailable interactive experiences. Pokemon Go has opened the door to location-based AR games. That means previously unexciting places can be made more engaging.

Connectivity though, that is also a threat. The more mobile and wearables become conduits to cloud services and IoT, the more the demand for free, flawless internet connectivity will grow.

Channels? Well, we've always needed to go where the market is. It's easier to identify where they are now… But we need to adapt to customers' behaviours and habits, and their preferences.

Moore's law: overall processing power for computers will double every two years (Gordon Moore, Intel, 1965)… And I wonder if that may also be true for us too.
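
For a sense of how fast that claim compounds: capacity after t years scales by 2^(t/2), so a decade is a 32-fold increase. A two-line check:

```python
# Moore's law as stated: doubling every two years => 2 ** (t / 2) after t years.
for years in (2, 10, 20):
    print(f"{years} years -> {2 ** (years / 2):.0f}x")
# 2 years -> 2x, 10 years -> 32x, 20 years -> 1024x
```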

Coming up…

Shine the Light – Digital Sector (5 minutes each) 

Joshua Ryan-Saha, The Data Lab – data for tourism

Brian Smillie, Beezer – app creation made affordable and easy

Ben Hutton, XDesign – is a mobile website enough?

Chris Torres, Director, Senshi Digital – affordable video

Case Study – Global Treasure Apps and Historic Environment Scotland / Lorraine Sommerville and Noelia Martinez, Global Treasure Apps

Apps that improve your productivity and improve your service / Gillian Jones, Qikserve

Virtual reality for tourism / Alexander Cole, Peekabu Studios

Using Data and Digital for Market Intelligence for Destinations and Businesses / Michael Kessler, VP Global Sales, Review Pro

Tech Trends and the Tourism Sector

Jo Paulson and Jon-Paul Orsi, Edinburgh Zoo – Pokemon Go

Rob Cawston, National Museum of Scotland – New Galleries and Interactive Exhibitions

Wrap Up / James McVeigh, Festivals Edinburgh



Association of Internet Researchers AoIR2016: Day 4

Today is the last day of the Association of Internet Researchers Conference 2016 – with a couple fewer sessions but I’ll be blogging throughout.

As usual this is a liveblog so corrections, additions, etc. are welcomed. 

PS-24: Rulemaking (Chair: Sandra Braman)

The DMCA Rulemaking and Digital Legal Vernaculars – Olivia G Conti, University of Wisconsin-Madison, United States of America

Apologies, I’ve joined this session late so you miss the first few minutes of what seems to have been an excellent presentation from Olivia. 

Property and ownership claims are made out of distinctly American values… grounded in general ideals, evocations of the Bill of Rights, or asking what Ben Franklin would say… framing the DMCA as being contrary to the very foundations of the United States. Another theme was the idea that once you buy something you should be able to edit it as you like. Indeed, a related theme is the idea of "tinkering as a liberatory endeavour". And you see people claiming that it is a basic human right to make changes and tinker, to tweak your tractor (or whatever). Commentators are not trying to appeal to the nation state; they are trying to perform the state, to make rights claims, to enact the rights of the citizen in a digital world.

So, John Deere made a statement that tractor buyers have an “implied license” to their tractor – they don’t own it outright. And that raised controversies as well.

So, the final register rule was that the farmers won: they could repair their own tractors.

But the vernacular legal formations allow us to see the tensions that arise between citizens and the rights holders. And that also raises interesting issues of citizenship – and of citizenship of the state versus citizenship of the digital world.

The Case of the Missing Fair Use: A Multilingual History & Analysis of Twitter’s Policy Documentation – Amy Johnson, MIT, United States of America

This paper looks at the multilingual history and analysis of Twitter’s policy documentation. Or: policies as uneven scalar tools of power alignment. And this comes from the idea of thinking of Twitter as more than just one whole, complete, overarching platform. There is much research now on moderation, but understanding this type of policy allows you to understand some of the distributed nature of platforms. Platforms draw lines when they decide which laws to transform into policies, and then again when they think about which policies to translate.

If you look across a list of Twitter policies, there is an English language version. Of this list it is only the Fair Use policy and the Twitter API limits that appear only in English. The API policy makes some sense, but the Fair Use policy does not. And Fair Use only appears really late – in 2014. The platform was set up in 2005, and many other policies came in in 2013… So what is going on?

So, here is the Twitter Fair Use Policy… Now, before I continue, I want to say that this translation (and the lack of it) for this policy is unusual. Generally all companies – not just tech companies – translate into FIGS: French, Italian, German and Spanish. And Twitter does not do this. But this is in contrast to the translations of the platform itself. And I want to talk particularly about translations into Japanese and Arabic. Now the Japanese translation came about through collaboration with a company that gave Twitter opportunities to expand into Japan. Arabic was not put in place until 2011, around the Arab Spring. And the translation wasn’t done by Twitter itself but by another organisation set up to do this. So you can see that there are other actors here playing into translations of platform and policies. So these iconic platforms are shaped in some unexpected ways.

So… I am not a lawyer but… Fair Use is a phenomenon that creates all sorts of internet lawyering. Typically there are four factors of fair use (Section 107 of the US Copyright Act of 1976): purpose and character of use; nature of the copyrighted work; amount and substantiality of the portion used; and effect of the use on the potential market for or value of the copyrighted work. And this is very much an American law, from a legal-economic point of view. And the US is the only country that has Fair Use law.

Now there is a concept of “Fair Dealing” – mentioned in passing in Fair Use – which shares some characteristics. There are other countries with Fair Use law: Poland, Israel, South Korea… Well, they point to the English language version of the policy. What about Japanese, which has a rich reuse community on Twitter? It also points to the English policy.

So, policies are not equal in their policyness. But why does this matter? Because this is where the rule of law starts to break down… We cannot assume that the same policies apply universally.

But what about parody? Why bring this up? Well, parody is tied up with the idea of Fair Use and creative transformation. Comedy is a protected Fair Use category. And Twitter has a rich seam of parody. And indeed, if you Google for the fair use policy, the “People also ask” section has as its first question: “What is a parody account?”

Whilst Fair Use wasn’t there as a policy until 2014, parody unofficially had a policy in 2009, an official one in 2010, then updates, and another version in 2013 for the IPO. Biz Stone writes about lawyers, when he was at Google, saying of fake accounts “just say it is parody!”, and about the importance of parody. And indeed the parody policy has been translated much more widely than the Fair Use policy.

So, policies select bodies of law and align platforms to those bodies of law, in varying degrees and depending on specific legitimation practices. Fair Use is strongly associated with US law, and embedding that in the translated policies would align Twitter more to US law than they want to be. But parody has roots in free speech, and that is something that Twitter wishes to align itself with.

Visual Arts in Digital and Online Environments: Changing Copyright and Fair Use Practice among Institutions and Individuals – Patricia Aufderheide and Aram Sinnreich, American University, United States of America

Patricia: Aram and I have been working with the College Art Association, which brings together a wide range of professionals and practitioners in art across colleges in the US. They had a new code of conduct and we wanted to speak to them, a few months after that code of conduct was released, to see if it had changed practice and understanding. This is a group that uses copyrighted work very widely. And indeed one-third of respondents avoid, abandon, or are delayed in work because of issues with copyrighted material.

Aram: four-fifths of CAA members use copyrighted materials in their work, but only one-fifth employ fair use to do that – most always or usually seek permission. And of those that use fair use there are some that always or usually use Fair Use. So there are real differences here. So, Fair Use is valued if you know about it and understand it… but a quarter of this group aren’t sure if Fair Use is useful or not. Now there is that code of conduct. There is also some use of Creative Commons and open licenses.

Of those that use copyrighted materials… 47% never use open licenses for their own work – there is a real reciprocity gap. Only 26% never use others’ openly licensed work, and only 10% never use others’ public domain work. Respondents value creative copying… 19 out of 20 CAA members think that creative appropriation can be “original”, and despite this group seeking permissions they don’t feel that creative appropriation should necessarily require permission. This really points to an education gap within the community.

And 43% said that uncertainty about the law limits creativity. They think they would appropriate works more, they would publish more, they would share work online… These mirror fair use usage!

Patricia: We surveyed this group twice – in 2013 and in 2016. Much stays the same but there have been changes… In 2016, two-thirds had heard about the code, and a third had shared that information – with peers, in teaching, with colleagues. Their associations with the concept of Fair Use are very positive.

Aram: The good news is that use of the code does lead to change, even within 10 months of launch. This work was done to try and show how much impact a code of conduct has on understanding… And really there were dramatic differences here. From the 2016 data, those who are not aware of the code look a lot like those who are aware but have not used the code. But for those who use the code, there is a real difference… And more are using fair use.

Patricia: There is one thing we did outside of the survey… There have been dramatic changes in the field. A number of universities have changed journal policies to be default Fair Use – Yale, Duke, etc. There has been a lot of change in the field. Several museums have internally changed how they create and use their materials. So, we have learned that education matters – behaviour changes with knowledge confidence. Peer support matters and validates new knowledge. Institutional action, well publicized, matters. The newest are most likely to change quickly, but the most veteran are in the best position – it is important to have those influencers on board… And teachers need to bring this into their teaching practice.

Panel Q&A

Q1) How many are artists versus other roles?

A1 – Patricia) About 15% are artists, and they tend to be more positive towards fair use.

Q2) I was curious about changes that took place…

A2 – Aram) We couldn’t ask whether the code made you change your practice… But we could ask whether they had used fair use before and after…

Q3) You’ve made this code for the US CAA, have you shared that more widely…

A3 – Patricia) Many of the CAA members work internationally, but the effectiveness of this code in the US context is that it is about interpreting US Fair Use law – it is not a legal document but it has been reviewed by lawyers. But copyright is territorial, which makes this less useful internationally as a document. If copyright were more straightforward, that would be great. There are rights of quotation elsewhere, there is fair dealing… And Canadian law looks more like Fair Use. But the US is very litigious, so if something passes Fair Use checking, that’s pretty good elsewhere… But otherwise it is all quite territorial.

A3 – Aram) You can see in data we hold that international practitioners have quite different attitudes to American CAA members.

Q4) You talked about the code, and changes in practice. When I talk to filmmakers and documentary makers in Germany, they were aware of Fair Use rights but didn’t use them, as they are dependent on TV companies buying their work and wanting every part of the rights cleared… They don’t want to hurt relationships.

A4 – Patricia) We always do studies before changes and it is always about reputation and relationship concerns… Fair Use only applies if you can obtain the materials independently… But then the question may be whether rights holders will be pissed off next time you need to licence content. What everyone told me was that we can do this but it won’t make any difference…

Chair) I understand that, but that question is about use later on, and demonstration of rights clearance.

A4 – Patricia) This is where a change in US errors and omissions insurance makes a difference – that protects them. The film and television makers’ code of conduct helped insurers engage and feel confident enough to provide that new type of insurance clause.

Q5) With US platforms, as someone in Norway, it can be hard to understand what you can and cannot access and use on, for instance, YouTube. Also, will the algorithmic filtering processes of platforms take into account that they deal with content in different territories?

A5 – Aram) I have spoken to Google Counsel about that issue of filtering by law – there is no difference there… But monitoring…

A5 – Amy) I have written about legal fictions before… They are useful for thinking about what a “reasonable person” is – and that can vary by jury and location, so writing that into policies helps to shape that.

A5 – Patricia) The jurisdiction is where you create, not where the work is from…

Q6) There is an indecency case in France which they want to try in French court, but Facebook wants it tried in US court. What might the impact on copyright be?

A6 – Aram) A great question, but this type of jurisdictional law has been discussed for over 10 years without any clear conclusion.

A6 – Patricia) This is a European issue too – Germany has good exceptions and limitations, France has horrible exceptions and limitations. There is a real challenge for pan-European law.

Q7) Did you look at all at the impact of advocacy groups who encouraged writing in/completion of replies on the DMCA? And was there any big difference between the farmers and car owners?

A7) There was a lot of discussion on the digital right to repair site, and that probably did have an impact. I did work on Net Neutrality before. But in any of those cases I take out the boilerplate, and see what commenters add directly – but there is a whole other paper to be done on boilerplate texts and how they shape responses and the terms of additional comments. It wasn’t that easy to distinguish between farmers and car owners, but it was interesting how individuals established credibility. Farmers talked about the value of fixing their own equipment, of being independent, of a history of ownership. Car mechanics, by contrast, establish technical expertise.

Q8) As a follow up: farmers will have had a long debate over genetically modified seeds – and the right to tinker in different ways…

A8) I didn’t see that reflected in the comments, but there may well be a bigger issue around micromanagement of practices.

Q9) Olivia, I was wondering if you were considering not only the rhetorical arguments of users, but also the way the techniques and tactics they used are received on the other side… What are the effective tactics there? Or can you locate the limits of the effectiveness of layperson vernacular strategies?

A9) My goal was to see what frames of argument looked most effective. I think in the case of the John Deere DMCA case that wasn’t that conclusive. It can be really hard to separate the NGO from the individual – especially when NGOs submit huge collections of individual responses. I did a case study on non-consensual pornography which was more conclusive in terms of which strategies were effective. The discourses I look at don’t look like legal discourse, but I look at the tone and content people use. So, on revenge porn, the law doesn’t really reflect user practice for instance.

Q10) For Amy, I was wondering… Is the problem that Fair Use isn’t translated… Or the law behind that?

A10 – Amy) I think Twitter in particular have found themselves in a weird middle space… Then the exceptions wouldn’t come up. But having it in English is the odd piece. That policy seems to speak specifically to Americans… But you could argue they are trying to impose it (maybe that’s a bit too strong) on all English-speaking territories. On YouTube all of the policies are translated into the same languages, including Fair Use.

Q11) I’m fascinated by vernacular understanding, and then by the experts who are in the round tables, who specialise in these areas. How do you see vernacular discourse used in more closed/smaller settings?

A11 – Olivia) I haven’t been able to take this up, as so many of those spaces are opaque. But in the 2012 rulemaking there were some direct quotes from remixers. And there was a suggestion around DVD use that people should videotape the TV screen… and that seemed unreasonably onerous…

Chair) Do you foresee a next stage where you get to be in those rooms and do more on that?

A11 – Olivia) I’d love to do some ethnographic studies, to get more involved.

A11 – Patricia) I was in Washington for the DMCA hearings and those are some of the most fun things I go to. I know that the documentary filmmakers have complained about the cost of participating… But a technician from the industry gave 30 minutes of evidence on the 40 technical steps to handle analogue film pieces of information… And to show that it’s not actually broadcast quality. It made them gasp. It was devastating and very visual information, and they cited it in their ruling… And similarly in the John Deere case the car technicians made an impact. By contrast a teacher came in to explain why copying material was important for teaching, but she didn’t have either people or evidence of what the difference is in the classroom.

Q12) I have an interesting case if anyone wants to look at it, around Wikipedia’s Fair Use issues around multimedia. Volunteers pre-emptively take a stricter approach, as they don’t want lawyers to come in on that… And the Wikipedia policies reflect that. There is also automation through bots to delete content without a clear Fair Use exception.

A12 – Aram) I’ve seen Fair Use misappropriated on Wikipedia… Copyrighted images used at low resolution and claimed as Fair Use…

A12 – Patricia) Wikimania has all these people who don’t want to deal with law on copyright at all! Wikimedia lawyers are in a really difficult position.


Association of Internet Researchers AoIR 2016: Day Two

Today I am again at the Association of Internet Researchers AoIR 2016 Conference in Berlin. Yesterday we had workshops, today the conference kicks off properly. Follow the tweets at: #aoir2016.

As usual this is a liveblog so all comments and corrections are very much welcomed. 

Platform Studies: The Rules of Engagement (Chair: Jean Burgess, QUT)

How affordances arise through relations between platforms, their different types of users, and what they do to the technology – Taina Bucher (University of Copenhagen) and Anne Helmond (University of Amsterdam)

Taina: Hearts on Twitter: In 2015 Twitter moved from stars to hearts, changing the affordances of the platform. They stated that they wanted to make the platform more accessible to new users, but that impacted on existing users.

Today we are going to talk about conceptualising affordances. In its original meaning an affordance is conceived of as a relational property (Gibson). For Norman, perceived affordances were more the concern – thinking about how objects can exhibit or constrain particular actions. Affordances are not just visual clues or possibilities, but can be felt. Gaver talks about these technology affordances. There are also social affordances – talked about by many – mainly about how poor technological affordances have impact on societies. It is mainly about the impact of technology and how it can contain and constrain sociality. And finally we have communicative affordances (Hutchby): how technological affordances impact on communities and communication practices.

So, what about platform changes? If we think about design affordances, we can see that there are different ways to understand this. The official reason given for the design change was about the audience, affording sociality of community and practices.

Affordances continues to play an important role in media and social media research. They tend to be conceptualised as either high-level or low-level affordances, with ontological and epistemological differences:

  • High: affordance in the relation – actions enabled or constrained
  • Low: affordance in the technical features of the user interface – reference to Gibson but they vary in where and when affordances are seen, and what features are supposed to enable or constrain.

Anne: We want to now turn to a platform-sensitive approach, expanding the notion of the user –> different types of platform users: end-users, developers, researchers and advertisers – there is a real diversity of users and user needs and experiences here (see Gillespie on platforms). So, in the case of Twitter there are many users and many agendas – and multiple interfaces. Platforms are dynamic environments – and that differentiates social media platforms from Gibson’s environments. Computational systems driving media platforms are different: social media platforms adjust interfaces to their users through personalisation, A/B testing, and algorithmic organisation (e.g. Twitter recommending people to follow based on interests and actions).

In order to take a relational view of affordances, and do that justice, we also need to understand what users afford to the platforms – as they contribute, create content, and provide data that enables use, development and income (through advertisers) for the platform. Returning to Twitter… the platform affords different things for different people.

Taking medium-specificity of platforms into account we can revisit earlier conceptions of affordance and critically analyse how they may be employed or translated to platform environments. Platform users are diverse and multiple, and relationships are multidirectional, with users contributing back to the platform. And those different users have different agendas around affordances – and in our Twitter case study, for instance, that includes developers and advertisers, users who are interested in affordances to measure user engagement.

How the social media APIs that scholars so often use for research are—for commercial reasons—skewed positively toward ‘connection’ and thus make it difficult to understand practices of ‘disconnection’ – Nicolas John (Hebrew University of Israel) and Asaf Nissenbaum (Hebrew University of Israel)

Consider this… On Facebook… If you add someone as a friend they are notified. If you unfriend them, they are not. If you post something you see it in your feed; if you delete it, that is not broadcast. They have a page called World of Friends – they don’t have one called World of Enemies. And Facebook does not take kindly to app creators who seek to surface unfriending and removal of content. Facebook is, like other social media platforms, significantly biased towards positive friending and sharing actions. And that has implications for norms and for our research in these spaces.

One of our key questions here is what we can’t know about…

Agnotology is defined as the study of ignorance. Robert Proctor talks about this in three terms: native state – childhood for instance; strategic ploy – e.g. the tobacco industry on health for years; lost realm – the knowledge that we cease to hold, that we lose.

I won’t go into detail on critiques of APIs for social science research, but as an overview the main critiques are:

  1. APIs are restrictive – they can cost money, we are limited to a percentage of the whole – Burgess and Bruns 2015; Bucher 2013; Bruns 2013; Driscoll and Walker
  2. APIs are opaque
  3. APIs can change with little notice (and do)
  4. Omitted data – Baym 2013 – now our point is that these platforms collect this data but do not share it.
  5. Bias to present – boyd and Crawford 2012

Asaf: Our methodology was to look at some of the most popular social media spaces and their APIs. We were looking at connectivity in these spaces – liking, sharing, etc. And we also looked for the opposite traits – unliking, deletion, etc. We found that social media had very little data, if any, on “negative” traits – and we’ll look at this across three areas: other people and their content; me and my content; commercial users and their crowds.

Other people and their content – APIs tend to supply basic connectivity – friends/following, grouping, likes. Almost no historical content – except Facebook, which shares when a user has liked a page. Current state only – disconnections are not accounted for. There is a reason not to share this data – privacy concerns perhaps – but that doesn’t explain my not being able to find this sort of information about my own profile.

Me and my content – negative traits and actions are hidden even from ourselves. Success is measured – likes and sharing, of you or by you. Decline is not – disconnections are lost connections… except on Twitter, where you can see analytics of followers – but no names there, and not in the API. So we are losing who we once were but are not anymore. Social network sites do not see fit to share information over time… Lacking disconnection data is an ideological and commercial issue.

Commercial users and their crowds – these users can see much more of their histories, and the negative actions online. They have a different regime of access in many cases, with the ups and downs revealed – though you may need to pay for access. Negative feedback receives special attention. Facebook offers the most detailed information on usage – including blocking and unliking information. Customers know more than users do – compare Pages vs. Groups.

Nicholas: So, implications. What Asaf has shared shows the risk for API-based research… where researchers’ work may be shaped by the affordances of the API being used. Any attempt to capture negative actions – unlikes, choices to leave or unfriend – is frustrated. If we can’t use APIs to measure social media phenomena, we have to use other means. So, unfriending is understood through surveys – time consuming and problematic. And that can put you off exploring these spaces – it limits research. The advertiser-friendly user experience distorts the space – it’s like the stock market only reporting the rises, except for a few super wealthy users who get the full picture.

A biography of Twitter (a story told through the intertwined stories of its key features and the social norms that give them meaning, drawing on archival material and oral history interviews with users) – Jean Burgess (Queensland University of Technology) and Nancy Baym (Microsoft Research)

I want to start by talking about what I mean by platforms, and what I mean by biographies. Here platforms are these social media platforms that afford particular possibilities; they enable and shape society – we heard about the platformisation of society last night – but their governance and affordances are shaped by their own economic existence. They are shaping and mediating socio-cultural experience, and we need to better understand the values and socio-cultural concerns of the platforms. By platform studies we mean treating social media platforms as spaces to study in their own right: as institutions, as mediating forces in the environment.

So, why “biography” here? First we argue that whilst biographical forms tend to be reserved for individuals (occasionally companies and race horses), they are about putting the subject in the context of relationships and its place in time, and that context shapes the subject. Biographies are always partial though – based on unreliable interviews and information, they quickly go out of date, and just as we cannot get inside the heads of those who are the subjects of biographies, we cannot get inside many of the companies at the heart of social media platforms. But (after Richard Rogers) understanding changes helps us to understand the platform.

So, in our forthcoming book, Twitter: A Biography (NYU 2017), we will look at competing and converging desires around e.g. the @, RT, and #. Twitter’s key features are key characters in its biography. Each has been a rich site of competing cultures and norms. We drew extensively on the Internet Archive, bloggers, and interviews with a range of users of the platform.

Nancy: When we interviewed people we downloaded their archive with them and talked through their behaviour and how it had changed – and many of those features and changes emerged from that. What came out strongly is that no-one knows what Twitter is for – not just amongst users but also amongst the creators – you see that today with Jack Dorsey and Anne Richards. The heart of this issue is whether Twitter is about sociality and fun, or a very important site for sharing important news and events. Users try to negotiate why they need this space, what it is for… They start squabbling, saying “Twitter, you are doing it wrong!”… Changes come with backlash and response, changed decisions from Twitter… But that is also accompanied by the media coverage of Twitter, and by the third party platforms built on Twitter.

So the “@” is at the heart of Twitter for sociality and Twitter for information distribution. It was imported from other spaces – IRC most obviously – as were other features. One of the earliest things Twitter incorporated was the @ and the links back… Originally you could see everyone’s @ replies, and that led to feed clutter – although some liked seeing unexpected messages like this. So, Twitter made a change so you could choose. And then they changed again so that you automatically did not see replies from those you don’t follow. So people worked around that with “.@” – which created conflict between the needs of the users, the ways they make the platform usable, and the way the platform wants to make the space less confusing to new users.

The “RT” gave credit to people for their words, and preserved the integrity of those words. At first this wasn’t there, and so you had huge variance – the RT, the manually spelled out retweet, the hat tip (HT). Technical changes were made, then you saw the number of retweets emerging as a measure of success, changing cultures and practices.

The “#” is hugely disputed – it emerged through hashtags.org: you couldn’t follow them in Twitter at first, but Twitter incorporated them to fend off third party tools. They are beloved by techies, and hated by user experience designers. And they are useful, but they are also easily co-opted by trolls – as we’ve seen on our own hashtag.

Insights into the actual uses to which audience data analytics are put by content creators in the new screen ecology (and the limitations of these analytics) – Stuart Cunningham (QUT) and David Craig (USC Annenberg School for Communication and Journalism)

The algorithmic culture is well understood as a part of our culture. There are around 150 items on Tarleton Gillespie and Nick Seaver’s recent reading list and the literature is growing rapidly. We want to bring back a bounded sense of agency in the context of online creatives.

What do I mean by “online creatives”? Well we are looking at social media entertainment – a “new screen ecology” (Cunningham and Silver 2013; 2015) shaped by new online creatives who are professionalising and monetising on platforms like YouTube, as opposed to professional spaces, e.g. Netflix. YouTube has more than 1 billion users, with revenue in 2015 estimated at $4 billion per year. And there are a large number of online creatives earning significant incomes from their content in these spaces.

Previously online creatives were bound up with ideas of democratic participative cultures, but we want to offer an immanent critique of the limits of data analytics/algorithmic culture in shaping SME from within the industry, on both the creator (bottom up) and platform (top down) side. This is an approach to social criticism that exposes the way reality conflicts not with some “transcendent” concept of rationality but with its own avowed norms, drawing on Foucault’s work on power and domination.

We undertook a large number of interviews and from that I’m going to throw some quotes at you… There is talk of information overload – of what one might do as an online creative presented with a wealth of data. Creatives talk about “non-scalable practices” – the importance of, and time required for, engaging with fans and subscribers. Creatives talk about at least half of a working week being spent on high-touch work like responding to comments, managing trolls, and dealing with challenging responses (especially for creators whose kids are engaged in their content).

We also see cross-platform engagement – and an associated major scaling of workload. There is a volume issue on Facebook, and the use of Twitter to manage that. There is also a sense of unintended consequences – scale has destroyed value. Income might be $1 or $2 for 100,000s or millions of views. There are inherent limits to algorithmic culture… But people enjoy being part of it, and reflect a real entrepreneurial culture.

In one or two sentences, the history of YouTube can be seen as a sort of clash of NorCal and SoCal cultures. Again, no-one knows what it is for. And that conflict has been there for ten years. And you also have the MCNs (Multi-Channel Networks) who are caught like the meat in the sandwich here.

Panel Q&A

Q1) I was wondering about user needs and how that factors in. You all drew upon it to an extent… And the dissatisfaction of users around whether needs are listened to or not was evident in some of the case studies here. I wanted to ask about that.

A1 – Nancy) There are lots of users, and users have different needs. When platforms change, some users are angry and others are happy. We have different users with very different needs… Both of those perspectives are user needs; they both call for responses to make their needs possible… The conflict and challenges, how platforms respond to those tensions, and how efforts to respond raise new tensions… that’s really at the heart here.

A1 – Jean) In our historical work we’ve also seen that some users voices can really overpower others – there are influential users and they sometimes drown out other voices, and I don’t want to stereotype here but often technical voices drown out those more concerned with relationships and intimacy.

Q2) You talked about platforms and how they developed (and I’m afraid I didn’t catch the rest of this question…)

A2 – David) There are multilateral conflicts about what features to include and exclude… And what is interesting is thinking about what ideas fail… With creators you see economic dependence on platforms and affordances – e.g. versus PGC (Professionally Generated Content).

A2 – Nicholas) I don’t know what user needs are in a broader sense, but everyone wants to know who unfriended them, who deleted them… And they want a dislike button, or an unlike button… The response was strong, but “this post makes me sad” doesn’t answer that, and there is no “you bastard for posting that!” button.

Q3) Would it be beneficial to expose unfriending/negative traits?

A3 – Nicholas) I can think of a use case for why unfriending would be useful – for instance wouldn’t it be useful to understand unfriending around the US elections. That data is captured – Facebook know – but we cannot access it to research it.

A3 – Stuart) It might be good for researchers, but is it in the public good? In Europe and with the Right to be Forgotten should we limit further the data availability…

A3 – Nancy) I think the challenge is that mismatch of only sharing good things, not sharing and allowing exploration of negative contact and activity.

A3 – Jean) There are business reasons for positivity versus negativity, but it is also about how the platforms imagine their customers and audiences.

Q4) I was intrigued by the idea of the “medium specificity of platforms” – what would that be? I’ve been thinking about devices and interfaces and how they are accessed… We have what we think of as a range, but actually we are used to using really one or two platforms – e.g. the Apple iPhone – in terms of design, icons, etc., and the possibilities of the interface, and what happens when something is made impossible by the interface.

A4 – Anne) With “medium specificity” we are talking about the platform itself as medium. Moving beyond the end user and user experience. We wanted to take into account the role of the user – the platform also has interfaces for developers, for advertisers, etc., and we wanted to think about those multiple interfaces, where they connect, how they connect, etc.

A4 – Taina) It’s a great point about medium specificity, but for me it’s more about platform specificity.

A4 – Jean) The integration of mobile web means the phone iOS has a major role here…

A4 – Nancy) We did some work with couples who brought in their phones, and when one had an Apple and one had an Android phone we actually found that they often weren’t aware of what was possible in the social media apps as the interfaces are so different between the different mobile operating systems and interfaces.

Q5) Can you talk about algorithmic content and content innovation?

A5 – David) In our work with YouTube we see forms of innovation that are very platform specific, around things like Vine and Instagram. And we also see counter-industrial forms and practices. So, in the US, we see vlogging and first person accounts of lives… beauty, unboxing, etc. But if you map content innovation you see (similarly) this taking the form of gaps in mainstream culture – in India that’s stand up comedy for instance. Algorithms are then looking for qualities and connections based on what else is being accessed – creating a virtuous circle…

Q6) Can we think of platforms as unstable, as having not quite such a uniform sense of purpose and direction…

A6 – Stuart) Most platforms are very big in terms of their finances… If you compare that to 20 years ago, the big companies knew what they were doing! Things are much more volatile now…

A6 – Jean) That’s very common in the sector, except maybe on Facebook… Maybe.


Association of Internet Researchers AoIR 2016 – Day 1 – José van Dijck Keynote

If you’ve been following my blog today you will know that I’m in Berlin for the Association of Internet Researchers AoIR 2016 (#aoir2016) Conference, at Humboldt University. As this first day has mainly been about workshops – and I’ve been in a full day long Digital Methods workshop – we do have our first conference keynote this evening. And as it looks a bit different to my workshop blog, I thought a new post was in order.

As usual, this is a live blog post so corrections, comments, etc. are all welcomed. This session is also being videoed so you will probably want to refer to that once it becomes available as the authoritative record of the session. 

Keynote: The Platform Society – José van Dijck (University of Amsterdam) with Session Chair: Jennifer Stromer-Galley

 


Association of Internet Researchers AoIR 2016: Day 1 – Workshops

After a few weeks of leave I’m now back and spending most of this week at the Association of Internet Researchers (AoIR) Conference 2016. I’m hugely excited to be here as the programme looks excellent with a really wide range of internet research being presented and discussed. I’ll be liveblogging throughout the week starting with today’s workshops.

I am booked into the Digital Methods in Internet Research: A Sampling Menu workshop, although I may be switching session at lunchtime to attend the Internet rules… for Higher Education workshop this afternoon.

The Digital Methods workshop is being chaired by Patrik Wikstrom (Digital Media Research Centre, Queensland University of Technology, Australia) and the speakers are:

  • Erik Borra (Digital Methods Initiative, University of Amsterdam, the Netherlands),
  • Axel Bruns (Digital Media Research Centre, Queensland University of Technology, Australia),
  • Jean Burgess (Digital Media Research Centre, Queensland University of Technology, Australia),
  • Carolin Gerlitz (University of Siegen, Germany),
  • Anne Helmond (Digital Methods Initiative, University of Amsterdam, the Netherlands),
  • Ariadna Matamoros Fernandez (Digital Media Research Centre, Queensland University of Technology, Australia),
  • Peta Mitchell (Digital Media Research Centre, Queensland University of Technology, Australia),
  • Richard Rogers (Digital Methods Initiative, University of Amsterdam, the Netherlands),
  • Fernando N. van der Vlist (Digital Methods Initiative, University of Amsterdam, the Netherlands),
  • Esther Weltevrede (Digital Methods Initiative, University of Amsterdam, the Netherlands).

I’ll be taking notes throughout but the session materials are also available here: http://tinyurl.com/aoir2016-digmethods/.

Patrik: We are in for a long and exciting day! I won’t introduce all the speakers as we won’t have time!

Conceptual Introduction: Situating Digital Methods (Richard Rogers)

My name is Richard Rogers, I’m professor of new media and digital culture at the University of Amsterdam and I have the pleasure of introducing today’s session. So I’m going to do two things, I’ll be situating digital methods in internet-related research, and then taking you through some digital methods.

I would like to situate digital methods as a third era of internet research… I think all of these eras thrive and overlap but they are differentiated.

  1. Web of Cyberspace (1994-2000): Cyberstudies was an effort to see difference in the internet, the virtual as distinct from the real. I’d situate this largely in the 90’s and the work of Steve Jones and Steve (?).
  2. Web as Virtual Society? (2000-2007) saw virtual as part of the real. Offline as baseline and “virtual methods” with work around the digital economy, the digital divide…
  3. Web as societal data (2007-) is about “virtual as indication of the real”, online as baseline.

Right now we use online data about society and culture to make “grounded” claims.

So, if we look at Allrecipes.com Thanksgiving recipe searches on a map we get some idea of regional preference; or, if we look at Google data in more depth, we get this idea of internet data as grounding for understanding culture, society, tastes.

So, we had this turn in around 2008 to “web as data” as a concept. When this idea was first introduced, not all were comfortable with the concept. Mike Thelwall et al (2005) talked about the importance of grounding the data from the internet. So, for instance, Google Flu Trends can be compared to Wikipedia traffic, etc. And with these trends we also get the idea of “the internet knows first”, with the web predicting other sources of data.

Now I do want to talk about digital methods in the context of digital humanities data and methods. Lev Manovich talks about Cultural Analytics. It is concerned with digitised cultural materials, with materials clusterable in a sort of art historical way – by hue, style, etc. And so this is a sort of big data approach that substitutes “continuous change” for periodisation and categorisation for continuation. So, this approach can, for instance, be applied to Instagram (Selfiexploration), looking at mood, aesthetics, etc. And then we have Culturomics, mainly through the Google Ngram Viewer. A lot of linguists use this to understand subtle differences as part of distant reading of large corpuses.

And I also want to talk about e-social sciences data and method. Here we have Webometrics (Thelwall et al) with links as reputational markers. The other tradition here is Altmetrics (Priem et al), which uses online data to do citation analysis, with social media data.

So, at least initially, the idea behind digital methods was to be in a different space: the study of online digital objects, and also natively online methods – methods developed for the medium. And natively digital is meant in a computing sense here: in computing, software has a native mode when it is written for a specific processor, so these are methods specifically created for the digital medium. We also have digitised methods – those which have been imported and migrated from offline, adapted slightly to the online.

Generally speaking there is a sort of protocol for digital methods: Which objects and data are available? (links, tags, timestamps); how do dominant devices handle them? etc.

I will talk about some methods here:

1. Hyperlink

For hyperlink analysis there are several methods. The Issue Crawler software, still running and working, enables you to see links between pages, the direction of linking, aspirational linking… For example, a visualisation of an Armenian NGO shows the dynamics of an issue network, showing the politics of association.

The other method that can be used here takes a list of sensitive sites, using Issue Crawler, and then parses them through an internet censorship service. And there are variations on this that indicate how successful attempts at internet censorship are. We do work on Iran and China, and I should say that we are always quite thoughtful about how we publish these results because of their sensitivity.
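Note: to make the general idea concrete, here is a minimal sketch – not the Issue Crawler’s actual algorithm, which does more sophisticated co-link analysis – that crawls a seed list (placeholder URLs), extracts outbound links, and builds a directed graph whose in-degrees hint at the politics of association:

```python
# A minimal sketch of hyperlink network analysis (not the Issue Crawler's
# actual algorithm): fetch seed pages, extract outbound links, and build a
# directed graph of inter-site linking.
import requests
import networkx as nx
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

SEEDS = ["https://example.org", "https://example.net"]  # placeholder seeds

graph = nx.DiGraph()
for seed in SEEDS:
    try:
        html = requests.get(seed, timeout=10).text
    except requests.RequestException:
        continue  # skip unreachable seeds
    for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        source = urlparse(seed).netloc
        target = urlparse(urljoin(seed, anchor["href"])).netloc
        if target and target != source:  # only inter-site links matter here
            graph.add_edge(source, target)

# Sites most linked-to by the seed set - a crude reputational marker.
print(sorted(graph.in_degree, key=lambda pair: pair[1], reverse=True)[:10])
```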

2. The website as archived object

We have the Internet Archive and we have individual archived web sites. Both are useful, but researcher use is not terribly significant, so we have been doing work on this. See also a YouTube video called “Google and the politics of tabs” – a technique to create a movie of the evolution of a webpage in the style of timelapse photography. I will be publishing soon about this technique.

But we have also been looking at historical hyperlink analysis – giving you the context that you won’t see represented in archives directly. This shows the connections between sites at a previous point in time. We also discovered that the “Ghostery” plugin can be used with archived websites – for trackers and for code. So you can see the evolution and use of trackers on any website/set of websites.
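Note: the Internet Archive’s CDX API can be queried directly for this sort of work; a minimal sketch (the target URL is a placeholder) that lists monthly snapshots of a page, which could feed a timelapse-style analysis like the one described above:

```python
# A sketch: list Wayback Machine captures of a page via the Internet
# Archive's CDX API (the target URL is a placeholder), e.g. as input for a
# timelapse-style movie of a page's evolution.
import requests

params = {
    "url": "example.org",        # placeholder target site
    "output": "json",
    "from": "2006",
    "to": "2016",
    "filter": "statuscode:200",  # only successful captures
    "collapse": "timestamp:6",   # at most one capture per month (YYYYMM)
}
rows = requests.get("http://web.archive.org/cdx/search/cdx",
                    params=params, timeout=30).json()
if rows:
    header, captures = rows[0], rows[1:]
    ts, original = header.index("timestamp"), header.index("original")
    for capture in captures:
        print(f"https://web.archive.org/web/{capture[ts]}/{capture[original]}")
```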

6. Wikipedia as cultural reference

Note: the numbering is from a headline list of 10, hence the odd numbering… 

We have been looking at the evolution of Wikipedia pages, understanding how they change. It seems that pages shift from neutral to national points of view… So we looked at Srebrenica and how that is represented. The pages here have different names, indicating differences in the politics of memory and reconciliation. We have developed a triangulation tool that grabs links and references and compares them across different pages. We also developed comparative image analysis that lets you see which images are shared across articles.
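Note: a rough, hedged equivalent of that triangulation step can be scripted against the MediaWiki API – the sketch below compares the external references cited by the English and Dutch versions of the article (the titles are illustrative and may not match the exact pages the project used):

```python
# A sketch: compare the external links cited by two language versions of
# a Wikipedia article, via the MediaWiki API (titles are illustrative).
import requests

def external_links(lang, title):
    api = f"https://{lang}.wikipedia.org/w/api.php"
    params = {"action": "query", "titles": title, "prop": "extlinks",
              "ellimit": "max", "format": "json"}
    pages = requests.get(api, params=params, timeout=30).json()["query"]["pages"]
    # each extlinks entry holds the URL under the "*" key in this format
    return {link["*"] for page in pages.values()
            for link in page.get("extlinks", [])}

en = external_links("en", "Srebrenica massacre")
nl = external_links("nl", "Val van Srebrenica")  # titles differ per language
print("shared references:", len(en & nl))
print("only in the English article:", len(en - nl))
```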

7. Facebook and other social networking sites

Facebook is, as you probably well know, a social media platform that is relatively difficult to pin down at a moment in time. Trying to pin down the history of Facebook is very hard – it hasn’t been in the Internet Archive for four years, and the site changes all the time. We have developed two approaches: one for social media profiles and interest data as a means of studying cultural taste and political preference, or “postdemographics”; and “networked content analysis”, which uses social media activity data as a means of studying the “most engaged with content” – that helps with the fact that profiles are no longer available via the API. To some extent the API drives the research, but then, taking a digital methods approach, we need to work with the medium and find which possibilities are there for research.

So, one of the projects undertaken in this space was elFriendo, a MySpace-based project which looked at the cultural tastes of “friends” of Obama and McCain during their presidential race. For instance, Obama’s friends best liked Lost and The Daily Show on TV; McCain’s liked Desperate Housewives, America’s Next Top Model, etc. Very different cultures and interests.

Now the networked content analysis approach, where you quantify and then analyse, works well with Facebook. You can look at pages and use data from the API to understand the pages and groups that liked each other, to compare memberships of groups, etc. (at the time you were able to do this). In this process you could see specific administrator names, and we did this with right wing data, working with a group called Hope not Hate, who recognised many of the names that emerged here. Looking at the most liked content from groups you also see the shared values, cultural issues, etc.

So, you could see two areas of Facebook Studies: Facebook I (2006-2011), about presentation of self – profiles and interests studies (with ethics); and Facebook II (2011-), which is more about social movements. I think many social media platforms are following this shift – or would like to. So in Instagram Studies, Instagram I (2010-2014) was about selfie culture, but it has shifted to Instagram II (2014-), concerned with antagonistic hashtag use for instance.

Twitter has done this and gone further… Twitter I (2006-2009) was about an urban lifestyle tool (its origins) and “banal” lunch tweets – their own tagline of “what are you doing?”, a connectivist space. Twitter II (2009-2012) moved to elections, disasters and revolutions: the tagline is “what’s happening?” and we have metrics and “trending topics”. Twitter III (2012-) sees Twitter as a generic resource tool, with commodification of data, stock market predictions, elections, etc.

So, I want to finish by talking about work on Twitter as a storytelling machine for remote event analysis. This is an approach we developed some years ago around the Iran election crisis. We made a tweet collection around a single Twitter hashtag – which is no longer done – and then ordered it by most retweeted (the top 3 for each day), presented in chronological (not reverse) order. And we then showed those on huge displays around the world…
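Note: the ordering logic described here is simple enough to sketch – group a tweet collection by day, keep the three most retweeted per day, and present them chronologically. The field names and ISO timestamp format below are my assumptions for illustration, not the project’s actual data model:

```python
# A sketch of the "storytelling machine" ordering: the three most
# retweeted tweets per day, presented in chronological order. Field names
# and ISO timestamps are assumptions for illustration.
from itertools import groupby

def storyline(tweets, per_day=3):
    tweets = sorted(tweets, key=lambda t: t["created_at"])
    story = []
    for day, batch in groupby(tweets, key=lambda t: t["created_at"][:10]):
        top = sorted(batch, key=lambda t: t["retweet_count"], reverse=True)
        story.extend((day, t["retweet_count"], t["text"]) for t in top[:per_day])
    return story

sample = [  # placeholder data in the assumed shape
    {"created_at": "2009-06-13T09:00:00", "retweet_count": 812, "text": "..."},
    {"created_at": "2009-06-13T11:30:00", "retweet_count": 94, "text": "..."},
    {"created_at": "2009-06-14T10:15:00", "retweet_count": 403, "text": "..."},
]
for day, count, text in storyline(sample):
    print(day, count, text)
```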

To take you back to June 2009… Mousavi holds an emergency press conference. Voter turnout is 80%. SMS is down. Mousavi’s website and Facebook are blocked. Police use pepper spray… The first 20 days of most popular tweets form a good, succinct summary of the events.

So, I’ve taken you on a whistle stop tour of methods. I don’t know if we are coming to the end of this. I was having a conversation the other day about how the Web 2.0 days are over really – the idea that the web is readily accessible, that APIs and data are there to be scraped… That’s really changing. This is one of the reasons the app space is so hard to research. We are moving again to user studies to an extent. What the Chinese researchers are doing involves convoluted processes to get at the data, for instance. But there are so many areas of research that can still be done. Issue Crawler is still out there, and other tools are available at tools.digitalmethods.net.

Twitter studies with DMI-TCAT (Erik Borra)

I’m going to be talking about how we can use the DMI-TCAT tool to do Twitter studies. I am here with Emile den Tex, one of the original developers of this tool, alongside Erik Borra.

So, what is DMI-TCAT? It is the Digital Methods Initiative Twitter Capture and Analysis Toolset, a server side tool which aims at robust and reproducible data capture and analysis. The design is based on two ideas: that captured datasets can be refined in different ways; and that the datasets can be analysed in different ways. Although we developed this tool, it is also in use elsewhere, particularly in the US and Australia.

So, how do we actually capture Twitter data? Some of you will have some experience of trying to do this. As researchers we don’t just want the data, we also want to look at the platform in itself. If you are in industry you get Twitter data through a “data partner”, the biggest of which by far is GNIP – owned by Twitter as of the last two years – and then you just pay for it. But it is pricey. If you are a researcher you can go to an academic data partner – DiscoverText or Hexagon – they are also resellers, but they are less costly. And then the third route is the publicly available data – the REST APIs, Search API, and Streaming APIs. These are, to an extent, the authentic user perspective, as most people use these… We have built around these, but the available data and APIs shape and constrain the design and the data.

For instance the “Search API” prioritises “relevance” over “completeness” – but as academics we don’t know how “relevance” is being defined here. If you want to do representative research then completeness may be most important. If you want to look at how Twitter prioritises the data, then that Search API may be most relevant. You also have to understand rate limits… These can constrain research, as different data has different rate limits.
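Note: to make that trade-off concrete, here is a minimal sketch of a call to the v1.1 Search API as it stood at the time (the bearer token is a placeholder, and the API has since changed) – the result_type parameter is exactly where relevance-versus-completeness surfaces, and the response headers expose the rate-limit state:

```python
# A sketch of a Twitter v1.1 Search API call as it stood around 2016
# (this API has since changed); the bearer token is a placeholder.
import requests

TOKEN = "YOUR_BEARER_TOKEN"  # placeholder application-only auth token
resp = requests.get(
    "https://api.twitter.com/1.1/search/tweets.json",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={
        "q": "#aoir2016",
        "count": 100,             # the maximum per request
        "result_type": "recent",  # vs "popular"/"mixed": the completeness-
                                  # versus-relevance trade-off in one flag
    },
    timeout=30,
)
# The rate-limit state is exposed in the response headers.
print("remaining calls this window:",
      resp.headers.get("x-rate-limit-remaining"))
for tweet in resp.json().get("statuses", []):
    print(tweet["id_str"], tweet["text"][:80])
```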

So there are many layers of technical mediation here, across three big actors: the Twitter platform – and its APIs and technical data interfaces; DMI-TCAT (extraction); and the output types. And those APIs and technical data interfaces are significant mediators here, so it is important to understand their implications in our work as researchers.

So, onto the DMI-TCAT tool itself – more on this in Borra & Rieder (2014) (doi:10.1108/AJIM-09-2013-0094). They talk about “programmed method” and the idea of the methodological implications of the technical architecture.

What can one learn if one looks at Twitter through this “programmed method”? Well: (1) Twitter users can change their Twitter handle, but their IDs will remain identical – sounds basic, but it’s important to understand when collecting data. (2) The length of a Tweet may vary beyond the maximum of 140 characters (mentions and URLs). (3) Native retweets may have their top level text property shortened. (4) Unexpected limitations: support for new emoji characters can be problematic. (5) It is possible to retrieve a deleted tweet.

So, for example, a tweet can vary beyond 140 characters. The retweet of an original post may be abbreviated… Now we don’t want that, we want it to look as it would to a user. So, we capture it in our tool in the non-truncated version.
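Note: one common way to recover the full text – not necessarily TCAT’s exact implementation – is to rebuild it from the retweeted_status object that native retweets carried in the v1.1 API:

```python
# A common approach (not necessarily TCAT's exact code) to undoing
# retweet truncation: rebuild the text from the embedded original, which
# the v1.1 API carried in full under retweeted_status.
def full_text(tweet):
    original = tweet.get("retweeted_status")
    if original:  # a native retweet: the embedded original is not truncated
        return "RT @{}: {}".format(original["user"]["screen_name"],
                                   original["text"])
    return tweet["text"]

truncated = {  # placeholder tweet in the v1.1 JSON shape
    "text": "RT @example: a long tweet that the API cut off at the 140-ch…",
    "retweeted_status": {
        "user": {"screen_name": "example"},
        "text": "a long tweet that the API cut off at the 140-character limit",
    },
}
print(full_text(truncated))
```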

And, on the issue of deletion and withholding: there are tweets deleted by users, and there are tweets which are withheld by the platform – and the withholding is a country by country issue. But you can see tweets only available in some countries. A project that uses this information is “Politwoops” (http://politwoops.sunlightfoundation.com/), which captures tweets deleted by US politicians and lets you filter to specific states, party, position. Now there is an ethical discussion to be had here… We don’t know why tweets are deleted… We could at least talk about it.

So, the tool captures Twitter data in two ways. First, there are direct capture capabilities (via the web front-end) which allow tracking of users and capture of public tweets posted by these users; tracking of particular terms or keywords, including hashtags; and getting a small random sample (approx 1%) of all public statuses. Secondary capture capabilities (via scripts) allow further exploration, including user IDs, deleted tweets, etc.

Twitter as a platform has a very formalised idea of sociality, the types of connections, parameters, etc. When we use the term “user” we mean it in the platform defined object meaning of the word.

Secondary analytical capabilities, via script, also allows further work:

  1. Support for geographical polygons to delineate geographical regions for tracking particular terms or keywords, including hashtags.
  2. A built-in URL expander, following shortened URLs to their destination, allowing further analysis, including of which statuses point to the same URLs (see the sketch after this list).
  3. Downloading media (e.g. videos and images) attached to particular Tweets.
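Note: the URL expansion step (item 2 above) is straightforward to sketch with an HTTP client that follows redirects:

```python
# A sketch of a URL expander: follow a shortened URL's redirect chain to
# its destination, so statuses pointing at the same target can be grouped.
import requests

def expand(url):
    try:
        # HEAD keeps it cheap; allow_redirects follows the whole chain.
        return requests.head(url, allow_redirects=True, timeout=10).url
    except requests.RequestException:
        return url  # leave unresolvable URLs as they are

print(expand("https://bit.ly/example"))  # placeholder short URL
```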

So, we have this tool but what sort of studies might we do with Twitter? Some ideas to get you thinking:

  1. Hashtag analysis – users, devices, etc. Why? They are often embedded in social issues. (A sketch of hashtag and mention counting follows this list.)
  2. Mentions analysis – users mentioned in contexts, associations, etc., allowing you to e.g. identify expertise.
  3. Retweet analysis – most retweeted per day.
  4. URL analysis – the content that is most referenced.
So Emile will now go through the tool and how you’d use it in this way…

Emile: I’m going to walk through some main features of the DMI TCAT tool. We are going to use a demo site (http://tcatdemo.emiledentex.nl/analysis/) and look at some Trump tweets…

Note: I won’t blog everything here as it is a walkthrough, but we are playing with timestamps (the tool uses UTC), search terms, etc. We are exploring hashtag frequency… In that list you can see Benghazi, TPP, etc. Now, once you see a common hashtag, you can go back and query the dataset again for that hashtag/search term… And you can filter down… And look at “identical tweets” to find the most retweeted content. 

Emile: Erik called this a list making tool – it sounds dull but it is so useful… And you can then put the data through other tools. You can put tweets into Gephi. Or you can do exploration… We looked at the Getty Parks project, scraped images, reverse Google image searched those images to find the originals, checked the metadata for the camera used, and investigated whether the cost of a camera was related to the success in distributing an image…

Richard: It was a critique of user generated content.

Analysing Social Media Data with TCAT and Tableau (Axel Bruns)

Analysing Network Dynamics with Agent Based Models (Patrik Wikström)

Tracking the Trackers (Anne Helmond, Carolin Gerlitz, Esther Weltevrede and Fernando van der Vlist)

Multiplatform Issue Mapping (Jean Burgess & Ariadna Matamoros Fernandez)

Analysing and visualising geospatial data (Peta Mitchell)

 


A Mini Adventure to Repository Fringe 2016

After 6 years of being Repository Fringe‘s resident live blogger, this was the first year that I haven’t been part of the organisation or amplification in any official capacity. From what I’ve seen, though, my colleagues from EDINA, University of Edinburgh Library, and the DCC did an awesome job of putting together a really interesting programme for the 2016 edition of RepoFringe, attracting a big and diverse audience.

Whilst I was mainly participating through reading the tweets to #rfringe16, I couldn’t quite keep away!

Pauline Ward at Repository Fringe 2016

This year’s chair, Pauline Ward, asked me to be part of the Unleashing Data session on Tuesday 2nd August. The session was a “World Cafe” format and I was asked to help facilitate discussion around the question: “How can the repository community use crowd-sourcing (e.g. Citizen Science) to engage the public in reuse of data?” – so I was along wearing my COBWEB: Citizen Observatory Web and social media hats. My session also benefited from what I gather was an excellent talk on “The Social Life of Data” earlier in the event from Erinma Ochu (who, although I missed her this time, is always involved in really interesting projects, including several fab citizen science initiatives).

 

I won’t attempt to reflect on all of the discussions during the Unleashing Data Session here – I know that Pauline will be reporting back from the session to Repository Fringe 2016 participants shortly – but I thought I would share a few pictures of our notes, capturing some of the ideas and discussions that came out of the various groups visiting this question throughout the session. Click the image to view a larger version. Questions or clarifications are welcome – just leave me a comment here on the blog.

Notes from the Unleashing Data session at Repository Fringe 2016

 

If you are interested in finding out more about crowd sourcing and citizen science in general then there are a couple of resources that may be helpful (plus many more resources and articles if you leave a comment/drop me an email with your particular interests).

This June I chaired the “Crowd-Sourcing Data and Citizen Science” breakout session for the Flooding and Coastal Erosion Risk Management Network (FCERM.NET) Annual Assembly in Newcastle. The short slide set created for that workshop gives a brief overview of some of the challenges and considerations in setting up and running citizen science projects:

Last October the CSCS Network interviewed me on developing and running Citizen Science projects for their website – the interview brings together some general thoughts as well as specific comment on the COBWEB experience:

After the Unleashing Data session I was also able to stick around for Stuart Lewis’ closing keynote. Stuart has been working at Edinburgh University since 2012 but is moving on soon to the National Library of Scotland so this was a lovely chance to get some of his reflections and predictions as he prepares to make that move. And to include quite a lot of fun references to The Secret Diary of Adrian Mole aged 13 ¾. (Before his talk Stuart had also snuck some boxes of sweets under some of the tables around the room – a popularity tactic I’m noting for future talks!)

So, my liveblog notes from Stuart’s talk (slightly tidied up but corrections are, of course, welcomed) follow. Because old Repofringe live blogging habits are hard to kick!

The Secret Diary of a Repository aged 13 ¾ – Stuart Lewis

I’m going to talk about our bread and butter – the institutional repository… Now my inspiration is Adrian Mole… Why? Well we have a bunch of teenage repositories… EPrints is 15 ½; Fedora is 13 ½; DSpace is 13 ¾.

Now Adrian Mole is a teenager – you can read about him on Wikipedia [note to fellow Wikipedia contributors: this, and most of the other Adrian Mole-related pages could use some major work!]. You see him quoted in two conferences to my amazement! And there are also some Scotland and Edinburgh entries in there too… Brought a haggis… Goes to Glasgow at 11am… and says he encounters 27 drunks in one hour…

Stuart Lewis at Repository Fringe 2016

Stuart Lewis illustrates the teenage birth dates of three of the major repository platforms as captured in (perhaps less well-aged) pop hits of the day.

So, I have four points to make about how repositories are like/unlike teenagers…

The thing about teenagers… People complain about them… They can be expensive, they can be awkward, they aren’t always self aware… Eventually though they usually become useful members of society. So, is that true of repositories? Well ERA, one of our repositories, has gotten bigger and bigger – over 18k items… and over 10k paper theses currently being digitised…

Now teenagers also start to look around… Pandora!

I’m going to call Pandora the CRIS… And we’ve all kind of overlooked their commercial background because we are in love with them…!

Stuart Lewis at Repository Fringe 2016

Stuart Lewis captures the eternal optimism – both around Mole’s love of Pandora, and our love of the (commercial) CRIS.

Now, we have PURE at Edinburgh which also powers Edinburgh Research Explorer. When you looked at repositories a few years ago, it was a bit like Freshers Week… The three questions were: where are you from; what repository platform do you use; how many items do you have? But that’s moved on. We now have around 80% of our outputs in the repository within the REF compliance window (three months of acceptance)… And that’s a huge change – volumes of material are open access very promptly.

So,

1. We need to celebrate our success

But are our successes as positive as they could be?

Repositories continue to develop. We’ve heard good things about new developments. But how do repositories demonstrate value – and how do we compare to other areas of librarianship?

Other library domains use different numbers. We can use these to give comparative figures. How do we compare to publishers for cost? What’s our CPU (Cost Per Use)? And what is a good CPU? £10, £5, £0.46… But how easy is it to calculate – are repositories expensive? That’s a “to do” – to divide the cost of running the repository by its usage as recorded in IRUS. I would expect it to be lower than publishers, but I’d like to do that calculation.
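Note from me: the arithmetic Stuart is proposing is straightforward once you have the two inputs – a sketch with entirely hypothetical figures:

```python
# Hypothetical figures: what it costs to run the repository for a year,
# and annual usage as reported by IRUS. Real numbers would come from
# local accounts and IRUS statistics.
annual_cost_gbp = 50_000
annual_downloads = 250_000

cost_per_use = annual_cost_gbp / annual_downloads
print(f"Cost Per Use: £{cost_per_use:.2f}")  # £0.20 in this made-up case
```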

The other side of this is to become more self-aware… Can we gather new numbers? We only tend to look at deposit and use from our own repositories… What about our own local consumption of OA (the reverse)?

Working within new e-resource infrastructure – http://doai.io/ – lets us see where open versions are available. And we can integrate with OpenURL resolvers to see how much of our usage can be fulfilled.
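Note: DOAI works as a DOI resolver, so checking for an open version amounts to seeing where it redirects you – a rough Python sketch, with a made-up DOI (and my assumption that a plain HTTP redirect is all there is to it):

```python
import requests

def open_access_location(doi):
    """Ask doai.io where an open version of a DOI lives, by inspecting
    the redirect target rather than following it."""
    resp = requests.get(f"http://doai.io/{doi}",
                        allow_redirects=False, timeout=10)
    return resp.headers.get("Location")

print(open_access_location("10.1000/example"))  # made-up DOI
```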

2. Our repositories must continue to grow up

Do we have double standards?

Hopefully you are all aware of the UK Text and Data Mining Copyright Exception, which came into force on 1st June 2014. As universities we have massive access to electronic resources, and we can text and data mine those.

Some do a good job here – e.g. Gale Cengage Historic British Newspapers: an additional payment buys all the data (images + XML text) on hard drives for local use. We are working with local Informatics LTG staff to (geo)parse the data.

Some are not so good – basic APIs allow only simple searches… but not complex queries (e.g. you can use a search term, but you can’t, say, search by sentiment).

And many publishers do nothing at all….

So we are working with publishers to encourage and highlight the potential.

But what about our content? Our repositories are open, with extracted full-text, and the data can be harvested… Sufficient, but is it ideal? Why not allow bulk download in one click… You can – for example – download all of Wikipedia (if you want to). We should be able to do that with our repositories.
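Note: the standard harvesting route for repositories is OAI-PMH – here is a minimal Python sketch of listing records from an endpoint (the base URL is a placeholder; a full harvester would also page through resumption tokens):

```python
import requests
import xml.etree.ElementTree as ET

BASE = "https://repository.example.ac.uk/oai"  # placeholder endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# ListRecords returns a page of metadata records in Dublin Core.
resp = requests.get(BASE, params={"verb": "ListRecords",
                                  "metadataPrefix": "oai_dc"}, timeout=30)
root = ET.fromstring(resp.content)
for header in root.iter(OAI + "header"):
    identifier = header.find(OAI + "identifier")
    if identifier is not None:
        print(identifier.text)
```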

3. We need to get our house in order for Text and Data Mining

When will we be finished though? That depends on what we do with open access. What should we be doing with OA? Where do we want to get to? Right now we have mandates so it’s easy – green and gold. With gold there is pure gold or hybrid… Mixed views on hybrid. We can also publish locally for free. Then for green there are local or disciplinary repositories… For gold – pure or hybrid – we pay APCs (some local options are free)… In hybrid we can also do offsetting, discounted subscriptions, and voucher schemes. And for green we have the UK Scholarly Communications Licence (Harvard-style)…

But which of these forms of OA are best?! Is choice always a great thing?

We still have outstanding OA issues. Is a mixed-modal approach OK, or should we choose a single route? Which one? What role will repositories play? What is the ultimate aim of Open Access? Is it “just” access?

How and where do we have these conversations? We need academics, repository managers, librarians, publishers to all come together to do this.

4. Do we know what a grown-up repository looks like? What part does it play?

Please remember to celebrate your repositories – we are in a fantastic place, making a real difference. But they need to continue to grow up. There is work to do with text and data mining… And we have more to do… To be a grown up, to be in the right sort of environment, etc.

 

Q&A

Q1) I can remember giving my first talk on repositories in 2010… When it comes to OA I think we need to think about what is cost effective, what is sustainable, why are we doing it and what’s the cost?

A1) I think in some ways that’s about what repositories are versus publishers… Right now we are essentially replicating them… And maybe that isn’t the way to approach this.

And with that Repository Fringe 2016 drew to a close. I am sure others will have already blogged their experiences and comments on the event. Do have a look at the Repository Fringe website and at #rfringe16 for more comments, shared blog posts, and resources from the sessions. 
