A Summer of New Digital Footprints…

It has been a while since I’ve posted something other than a liveblog here but it has been a busy summer so it seems like a good time to share some updates…

A Growing Digital Footprint

Last September I was awarded some University of Edinburgh IS Innovation Fund support to develop a pilot training and consultancy service to build upon the approaches and findings of our recent PTAS-funded Managing Your Digital Footprint research project.

During that University of Edinburgh-wide research and parallel awareness-raising campaign we (my colleague – and Digital Footprint research project PI – Louise Connelly of IAD/Vet School, myself, and colleagues across the University) sought to inform students of the importance of digital tracks and traces in general, particularly around employment and “eProfessionalism”. This included best practice advice around use of social media, personal safety and information security choices, and thoughtful approaches to digital identity and online presences. Throughout the project we were approached by organisations outside of the University for similar training, advice, and consulting around social media best practices and that is how the idea for this pilot service began to take shape.

Over the last few months I have been busy developing the pilot, which has involved getting out and about delivering social media training sessions for clients including NHS Greater Glasgow and Clyde (with Jennifer Jones); for the British HIV Association (BHIVA) with the British Association for Sexual Health and HIV (BASHH) (also with Jennifer Jones); developing a “Making an Impact with your Blog” Know How session for the lovely members of Culture Republic; leading a public engagement session for the very international gang at EuroStemCell, and an “Engaging with the Real World” session for the inspiring postgrads attending the Scottish Graduate School of Social Science Summer School 2016. I have also been commissioned by colleagues in the College of Arts, Humanities and Social Sciences to create an Impact of Social Media session and accompanying resources (the latter of which will continue to develop over time). You can find resources and information from most of these sessions over on my presentations and publications page.

These have been really interesting opportunities and I’m excited to see how this work progresses. If you do have an interest in social media best practice, including advice for your organisation’s social media practice, developing your online profile, or managing your digital footprint, please do get in touch and/or pass on my contact details. I am in the process of writing up the pilot and looking at ways my colleagues and I can share our expertise and advice in this area.

Adventures in MOOCs and Yik Yak

So, what next?

Well, the Managing Your Digital Footprint team have joined up with colleagues in the Language Technology Group in the School of Informatics for a new project looking at Yik Yak. You can read more about the project, “A Live Pulse: Yik Yak for Understanding Teaching, Learning and Assessment at Edinburgh”, on the Digital Education Research Centre website. We are really excited to explore Yik Yak’s use in more depth as it is one of a range of “anonymous” social networking spaces that appear to be emerging as important alternative spaces for discussion as mainstream social media spaces lose favour/become too well inhabited by extended families, older contacts, etc.

Our core Managing Your Digital Footprint research also continues… I presented a paper, co-written with Louise Connelly, at the European Conference on Social Media 2016 this July on “Students’ Digital Footprints: curation of online presences, privacy and peer support”. This summer we also hosted visiting scholar Rachel Buchanan of University of Newcastle, Australia who has been leading some very interesting work into digital footprints across Australia. We are very much looking forward to collaborating with Rachel in the future – watch this space!

And, more exciting news: my lovely colleague Louise Connelly (University of Edinburgh Vet School) and I have been developing a Digital Footprint MOOC which will go live later this year. The MOOC will complement our ongoing University of Edinburgh service (run by IAD) and external consultancy work (led by us in EDINA). You can find out much more about that in this poster, presented at the European Conference on Social Media 2016, earlier this month…

Preview of Digital Footprint MOOC Poster

Alternatively, you could join me for my Cabaret of Dangerous Ideas 2016 show….

Cabaret of Dangerous Ideas 2016 - If I Googled You, What Would I Find? Poster

The Cabaret of Dangerous Ideas runs throughout the Edinburgh Fringe Festival but every performance is different! Each day academics and researchers share their work by proposing a dangerous idea, a provocative question, or a challenge, and the audience are invited to respond, discuss, ask difficult questions, etc. It’s a really fun show to see and to be part of – I’ve now been fortunate enough to be involved each year since it started in 2013. You can see a short video on #codi2016 here:

In this year’s show I’ll be talking about some of those core ideas around managing your digital footprint, understanding your online tracks and traces, and reflecting on the type of identity you want to portray online. You can find out more about my show, If I Googled You What Would I Find, in my recent “25 Days of CODI” blog post:

25 Days of CoDI: Day 18

You’ll also find a short promo film for the series of data, identity, and surveillance shows at #codi2016 here:

So… A very busy summer of social media, digital footprints, and exciting new opportunities. Do look out for more news on the MOOC, the YikYak work and the Digital Footprint Training and Consultancy service over the coming weeks and months. And, if you are in Edinburgh this summer, I hope to see you on the 21st at the Stand in the Square!



A Mini Adventure to Repository Fringe 2016

After 6 years of being Repository Fringe’s resident live blogger this was the first year that I haven’t been part of the organisation or amplification in any official capacity. From what I’ve seen though my colleagues from EDINA, University of Edinburgh Library, and the DCC did an awesome job of putting together a really interesting programme for the 2016 edition of RepoFringe, attracting a big and diverse audience.

Whilst I was mainly participating through reading the tweets to #rfringe16, I couldn’t quite keep away!

Pauline Ward at Repository Fringe 2016


This year’s chair, Pauline Ward, asked me to be part of the Unleashing Data session on Tuesday 2nd August. The session was a “World Cafe” format and I was asked to help facilitate discussion around the question: “How can the repository community use crowd-sourcing (e.g. Citizen Science) to engage the public in reuse of data?” – so I was along wearing my COBWEB: Citizen Observatory Web and social media hats. My session also benefited from what I gather was an excellent talk on “The Social Life of Data” earlier in the event from Erinma Ochu (who, although I missed her this time, is always involved in really interesting projects including several fab citizen science initiatives).


I won’t attempt to reflect on all of the discussions during the Unleashing Data Session here – I know that Pauline will be reporting back from the session to Repository Fringe 2016 participants shortly – but I thought I would share a few pictures of our notes, capturing some of the ideas and discussions that came out of the various groups visiting this question throughout the session. Click the image to view a larger version. Questions or clarifications are welcome – just leave me a comment here on the blog.

Notes from the Unleashing Data session at Repository Fringe 2016

Notes from the Unleashing Data session at Repository Fringe 2016

Notes from the Unleashing Data session at Repository Fringe 2016


If you are interested in finding out more about crowd sourcing and citizen science in general then there are a couple of resources that may be helpful (plus many more resources and articles if you leave a comment/drop me an email with your particular interests).

This June I chaired the “Crowd-Sourcing Data and Citizen Science” breakout session for the Flooding and Coastal Erosion Risk Management Network (FCERM.NET) Annual Assembly in Newcastle. The short slide set created for that workshop gives a brief overview of some of the challenges and considerations in setting up and running citizen science projects:

Last October the CSCS Network interviewed me on developing and running Citizen Science projects for their website – the interview brings together some general thoughts as well as specific comment on the COBWEB experience:

After the Unleashing Data session I was also able to stick around for Stuart Lewis’ closing keynote. Stuart has been working at Edinburgh University since 2012 but is moving on soon to the National Library of Scotland so this was a lovely chance to get some of his reflections and predictions as he prepares to make that move. And to include quite a lot of fun references to The Secret Diary of Adrian Mole aged 13 ¾. (Before his talk Stuart had also snuck some boxes of sweets under some of the tables around the room – a popularity tactic I’m noting for future talks!)

So, my liveblog notes from Stuart’s talk (slightly tidied up but corrections are, of course, welcomed) follow. Because old Repofringe live blogging habits are hard to kick!

The Secret Diary of a Repository aged 13 ¾ – Stuart Lewis

I’m going to talk about our bread and butter – the institutional repository… Now my inspiration is Adrian Mole… Why? Well we have a bunch of teenage repositories… EPrints is 15 ½; Fedora is 13 ½; DSpace is 13 ¾.

Now Adrian Mole is a teenager – you can read about him on Wikipedia [note to fellow Wikipedia contributors: this, and most of the other Adrian Mole-related pages could use some major work!]. You see him quoted in two conferences to my amazement! And there are also some Scotland and Edinburgh entries in there too… Brought a haggis… Goes to Glasgow at 11am… and says he encounters 27 drunks in one hour…

Stuart Lewis at Repository Fringe 2016

Stuart Lewis illustrates the teenage birth dates of three of the major repository softwares as captured in (perhaps less well-aged) pop hits of the day.

So, I have four points to make about how repositories are like/unlike teenagers…

The thing about teenagers… People complain about them… They can be expensive, they can be awkward, they aren’t always self-aware… Eventually though they usually become useful members of society. So, is that true of repositories? Well ERA, one of our repositories, has gotten bigger and bigger – over 18k items… and over 10k paper theses currently being digitized…

Now teenagers also start to look around… Pandora!

I’m going to call Pandora the CRIS… And we’ve all kind of overlooked their commercial background because we are in love with them…!

Stuart Lewis at Repository Fringe 2016

Stuart Lewis captures the eternal optimism – both around Mole’s love of Pandora, and our love of the (commercial) CRIS.

Now, we have PURE at Edinburgh which also powers Edinburgh Research Explorer. When you looked at repositories a few years ago, it was a bit like Freshers Week… The three questions were: where are you from; what repository platform do you use; how many items do you have? But that’s moved on. We now have around 80% of our outputs in the repository within the REF compliance window (3 months of acceptance)… And that’s a huge change – volumes of materials are open access very promptly.


1. We need to celebrate our success

But are our successes as positive as they could be?

Repositories continue to develop. We’ve heard good things about new developments. But how do repositories demonstrate value – and how do we compare to other areas of librarianship?

Other library domains use different numbers. We can use these to give comparative figures. How do we compare to publishers for cost? What’s our CPU (Cost Per Use)? And what is a good CPU? £10, £5, £0.46… But how easy is it to calculate – are repositories expensive? That’s a “to do” – to take the cost to run/IRUS cost. I would expect it to be lower than publishers, but I’d like to do that calculation.
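
[A quick sketch of that “to do”, as I understand it: cost per use is simply the cost of running the repository over a period divided by the number of uses in the same period. The figures below are invented purely for illustration, and treating a “use” as a download counted by a usage statistics service is my assumption rather than something Stuart specified.]

```python
def cost_per_use(annual_running_cost: float, annual_uses: int) -> float:
    """Return cost per use: total running cost divided by number of uses."""
    if annual_uses <= 0:
        raise ValueError("annual_uses must be a positive number")
    return annual_running_cost / annual_uses


# Hypothetical figures, for illustration only (not real repository data).
running_cost_gbp = 50_000.0   # assumed yearly cost of running the repository
downloads = 250_000           # assumed yearly downloads (the "uses")

cpu = cost_per_use(running_cost_gbp, downloads)
print(f"Cost per use: £{cpu:.2f}")  # prints "Cost per use: £0.20"
```

The same function could be run against a publisher subscription cost and its usage count to get the comparative figure discussed in the talk.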

The other side of this is to become more self-aware… Can we gather new numbers? We only tend to look at deposit and use from our own repositories… What about our own local consumption of OA (the reverse)?

Working within new e-resource infrastructure – http://doai.io/ – lets us see where open versions are available. And we can integrate with OpenURL resolvers to see how much of our usage can be fulfilled.

2. Our repositories must continue to grow up

Do we have double standards?

Hopefully you are all aware of the UK Text and Data Mining Copyright Exception that came out from 1st June 2014. We have massive massive access to electronic resources as universities, and can text and data mine those.

Some do a good job here – Gale Cengage Historic British Newspapers: additional payment to buy all the data (images + XML text) on hard drives for local use. Working with local informatics LTG staff to (geo)parse the data.

Some are not so good – basic APIs allow only simple searches… But not complex queries (e.g. could use a search term, but not e.g. sentiment).

And many publishers do nothing at all….

So we are working with publishers to encourage and highlight the potential.

But what about our content? Our repositories are open, with extracted full-text, data can be harvested… That is sufficient, but is it ideal? Why not offer bulk download in one click… You can – for example – download all of Wikipedia (if you want to). We should be able to do that with our repositories.

3. We need to get our house in order for Text and Data Mining

When will we be finished though? Depends on what we do with open access? What should we be doing with OA? Where do we want to get to? Right now we have mandates so it’s easy – green and gold. With gold there is Pure or Hybrid… Mixed views on Hybrid. Can also publish locally for free. Then for green there are local or disciplinary repositories… For Gold – Pure, Hybrid, Local we pay APCs (some local option is free)… In Hybrid we can do offsetting, discounted subscriptions, voucher schemes too. And for green we have UK Scholarly Communications License (Harvard)…

But which of these forms of OA are best?! Is choice always a great thing?

We still have outstanding OA issues. Is a mixed-modal approach OK, or should we choose a single route? Which one? What role will repositories play? What is the ultimate aim of Open Access? Is it “just” access?

How and where do we have these conversations? We need academics, repository managers, librarians, publishers to all come together to do this.

4. Do we know what a grown-up repository looks like? What part does it play?

Please remember to celebrate your repositories – we are in a fantastic place, making a real difference. But they need to continue to grow up. There is work to do with text and data mining… And we have more to do… To be a grown up, to be in the right sort of environment, etc.



Q1) I can remember giving my first talk on repositories in 2010… When it comes to OA I think we need to think about what is cost effective, what is sustainable, why are we doing it and what’s the cost?

A1) I think in some ways that’s about what repositories are versus publishers… Right now we are essentially replicating them… And maybe that isn’t the way to approach this.

And with that Repository Fringe 2016 drew to a close. I am sure others will have already blogged their experiences and comments on the event. Do have a look at the Repository Fringe website and at #rfringe16 for more comments, shared blog posts, and resources from the sessions. 


Social Media for Learning in Higher Education 2015 (#SocMedHE15) Conference – LiveBlog

Today I’m here at Sheffield Hallam University for Social Media for Learning in Higher Education 2015 (follow #SocMedHE15) where Louise Connelly (from UoE Royal (Dick) Veterinary School) and I will be presenting some of our Managing Your Digital Footprint research later today.

I’ll be liveblogging but, as the wifi is a little variable, there may be a slight delay in these posts. As usual, as this is a liveblog, corrections are welcome.


At the moment we are being welcomed to the day by Sheffield Hallam’s Pro Vice Chancellor, who is highlighting that there are 55 papers from 38 HEIs. The hope is that today will generate new conversations and communities, and for those to keep going – and the University is planning to run the conference again next year.

Keynote by Eric Stoller

We are starting with a very heavily Star Wars themed video introducing Eric and his talk….

When he thinks about his day it has no clear pattern, and includes a lot of watching videos, exploring what others are doing… And I’m a big fan of Twitter polls (he polls the room – a fair few of us use them) and when you poll people about how universities are using social media we are seeing use for marketing and communications, teaching and learning, a whole range of activities…

There is such a range of channels out there… Snapchat, how many of you are Snapchatters? (fair few) and how many of you take screen shots? How about Reddit… yeah, there are a few of us, usually the nerdy folk… YikYak… I’m avoiding that to avoid Star Wars spoilers right now… Lots of sites out there…

And now what we say online matters. That is game changing… We have conversations in this auditorium and that doesn’t get shared beyond the room… But online our comments reach out beyond this room… And that can be where we get into trouble around our digital identity. We can really thank Marc Prensky for really messing things up here with his Digital Natives idea… Dave White brilliantly responded to that, though few seemed to read it!

But there are some key issues here. Social media blurs professional and personal identities…

My dad was checking out Facebook but he’s not on Facebook, he was using my mother’s account… My parents have given me a range of interesting examples of people blurring between different spaces… So my mom added me on Facebook… Is she my friend? I think she has a different designation. I got on there and she already had 8 friends – how did they get there first? Anyway she is experiencing Facebook in a way that I haven’t for years… My mom joined Facebook in 2014 (“I wanted to make sure it wasn’t a fad”) and when you have 8 friends you truly see everything… She sees people that she doesn’t know making fun of, saying snarky things to, her child (me)… We’ve never really had a space where we have that blurring of people. So, my mom hops into a comment thread to defend me… And then people make fun of her… So I have to defend her… We haven’t really adapted and evolved our ways of being professional, of managing relationships for this space yet.

One thing we haven’t come to terms with is the idea of leadership in social media. No matter who you are you can educate, promote, etc. One of my favourite leaders on social media is in the US, president of the University of Cincinnati (@PrezOno). He has a lot of followers and engagement. Typically if your academics, your leaders, are using social media and sharing their work and insights, that says a lot about the organisational culture you are trying to build and encourage.

When you are thinking about employability (and man, you can’t miss this University’s employability office)… It’s about personal brand – what you post and say matters… It’s being human.

Facebook has been around 11 years now, it’s massive… There are over 1 billion users… In fact in September there were over 1 billion in a single day. But people don’t use it in the same ways they did previously… Look at institutions with an older cohort and Facebook is where it’s at.

I have this quote from the University of Edinburgh’s Managing Your Digital Footprint account that 90% of bosses use Facebook to vet candidates… Which is potentially an issue… As students don’t always post that carefully or with an awareness of how their comments appear later on…

As a consultant I tell people not to fall in love with one platform, but I’m a little in love with Twitter. And there are really interesting things taking place there. We have things like #LTHEchat – a discussion of technology in education. And this is a space where comments are kind of preserved… But that can include silly comments, things we don’t want to stick around. And I love when universities connect students to alumni… We have to think about criticality and digital literacy in these spaces too…

Different spaces also work for different uses… Some love Vine, those 6 second videos. And when we think about teaching we want to talk about storytelling: some of the YouTube vloggers are a great place to learn about creating narrative and story. So, for instance, Casey Neistat, a vlogger who has also directed commercials for brands like Nike, is a great person to watch. For example his video on Haters and Losers… [we are now watching videos]

How many of you are on LinkedIn? [we mostly are] I assume those not on LinkedIn don’t have a job… There are huge amounts of useful stuff on there, including organisational pages… But it doesn’t always have a great reputation [shows a meme about adding you as a connection]. This is a space where we get our recommendations, our endorsements. Right now LinkedIn is a powerful place. LinkedIn is the only major social media site where there are more users aged 30-49 than 18-29 [stat from Pew Research]. How many here work in employability or careers? You get that thing where students only approach you 5 minutes before they leave… They should really be getting on LinkedIn earlier. People can be weird about adding their students – it’s not about adding your students as friends, it’s an opportunity to recommend and support each other – much better there than Rate My Professor.

I wanted to show this tweet from the Association of Colleges that “soft skills should be called human skills. Soft makes it sound inferior, which we all know they’re not”. Those soft skills are part of what we do with social media…

When I moved to the UK – my wife got a promotion – and I, as a consultant, had all my networks in the US… But I also had social media contacts in the UK… And I was able to use LinkedIn groups, connections, etc. to build relationships in the UK, to find my way into the higher education sector here. I was talking to a LinkedIn rep last week at Princeton… What do you think the number one activity is on LinkedIn? It’s lurking… And I did a lot of strategic lurking…

So, we have these new spaces but we also have some older online spaces to remember…. So, for instance, what happens when you Google yourself? And that’s important to do… Part of what students are doing when they build up their profile online is to be searchable… To have great presence there.

And email still matters. How many of you love email? [one does] And how many of us have checked email today? [pretty much all]. We are all professional email checkers in a way… Email works if we do it right… But we don’t. We send huge long messages, we reply all to unsubscribe… It’s not surprising if students don’t get that [cue a tweet showing that an email tactically bearing a subject line about free football tix was, miraculously, received by students].

How many of you are concerned about privacy on social media? It’s always a huge concern. We have spaces like Snapchat – ephemeral except some of you take screen shots – and Yik Yak. We’ve already had issues with Yik Yak – a lecturer walked out when she saw horrible things people were posting about her… But Yik Yak tends to be sex and drugs and Netflix… Also a lot of revision…

And we have Periscope. Twitter owns it now, so who knows where that will go… It’s a powerful tool to have… You can livestream video from anywhere, which used to be hugely difficult and expensive. And you get comments and discussion.

And you don’t need to always do social media by posting, there is so much to listen and learn from…

The student experience is holistic. Just as social media blurs personal and professional selves, the same thing happens with teaching and learning and higher education. These are not separate entities in an organisation now… academic advising, careers services, induction/orientation, first year success, mental health/wellness…. So much learning happens in this space, and it’s not necessarily formal…

There is no such thing as a digital native… there are people learning and trying things…

So, now, some Q&A.


Q1) When you see lecturers named on YikYak… Can you really just ignore it?

A1) On YikYak the community can downvote unpleasant bad things. In the US a threat can be prosecuted [also in the UK, where hate speech laws also apply]. But if I say something insulting it’s not necessarily illegal… It’s just nasty… You get seasonal trolling – exam time, venting… But we have to crack the nut about why people are doing and saying this stuff… It’s not new, the app just lets us see it. So you can downvote. You can comment (positively). We saw that with Twitter, and we still see that on Twitter. People writing on pointed issues still get a lot of abuse… Hate speech, bullying, it’s not new… it’s bigger than social media… It’s just reflected by social media.

Q2) On the conference hashtag people are concerned about going into the open spaces… and particularly the ads in these spaces…

A2) I am a big fan of adblock in Chrome. But until this stuff becomes a public utility, we have to use the tools that have scale and work the best. There are tools that try to be Facebook and Twitter without the ads… It’s like telling people to leave a party and go to an empty room… But if you use Google you are being sold… I have so much commercial branded stuff around me. When our communications are being sold… That gets messy… Instagram a while back wanted to own all the photos shared but there was a revolt from photographers and they had to go back on that… The community changed that. And you have to block those who do try to use you or take advantage (e.g. generating an ad that says Eric likes University of Phoenix, you should too… ).

Q3) I find social media makes me anxious, there are so many issues and concerns here…

A3) I think we are in a world where we need discipline about not checking our phone in the middle of the night… Don’t let these things run your life… If anything causes you anxiety you have to manage that, you have to address that… You all are tweeting, my phone will have notifications… I’ll check it later… That’s fine… I don’t have to reply to everyone…

Q4) You talked about how we are all professional emailers… To what extent is social media also part of everybody’s job now? And how do we build social media in?

A4) In higher ed we see digital champions in organisations… Even if not stated. Email is assumed in our job descriptions… I think social media is starting to weave in in the same ways… We are still feeling out how social media fits into the fabric of our day… The learning curve at the beginning can feel steep if everything is new to you… Twitter took me a year or two to embed in my day, but I’ve found it effective, efficient, and now it’s an essential part of my day. But it’s nice when communication and engagement is part of a job description, it frees people to do that with their day, and ties it to their review process etc.

Workshops 1: Transforming learning by understanding how students use social media as a different space – Andrew Middleton, Head of Academic Practice and Learning Innovation, LEAD, Sheffield Hallam University

I’m assuming that, having come to a conference on social media in learning, you are passionate about learning and teaching… And I think we have to go back to first principles…

Claudia Megele (2015) has, I think, got it spot on about pedagogy. We are experiencing “a paradigm shift that requires a comprehensive rethink and reconceptualisation of higher education in a rapidly changing socio-technological context where the definition straddles formal and informal behaviours” [check that phrasing].

When we think about formal, that tends to mean spaces like we are in at the moment. Michael Eraut makes the point that non-formal is different, something other than the formal teaching and learning space. One way to look at this is to think about disruption, and disrupting the formal. Because of the media and technologies we use, we are disrupting the formal… In that keynote everyone was in what Eric called the “praying” position – all on our phones and laptops… We have changed in these formal spaces… Through our habits and behaviours we are changing our idea of formal, creating our own (parallel) informal space. What does that mean for us as teachers? We have to engage in this non-formal space. From provided to self-constructed, from isolated to connected learning, from directed to self-determined, from construction to co-construction, from impersonal to social, and from the abstract and theoretical to authentic and practical (our employers brief our students through YouTube, through tweet chats – e.g. a student oncology tweet chat).


11:20-11:35 – Refreshment Break

11:35-12:05 – Short Papers 1

12:10-12:40 – Short Papers 2

12:40-13:40 – Lunch

13:40-14:40 – Workshops 2 (afternoon) 

14:40-14:55 – Refreshment Break

14:55-15:25 – Short Papers 3

15:30-16:00 – Short Papers 4

16:00 – Conference ends


Apply to become EDINA’s new Social Media Officer!

I am very excited to announce that the advert for our new EDINA Social Media Officer job (full time, 2 year fixed term) has just gone live on the University of Edinburgh jobs site! Read the full ad, and apply, here.

As some of you will be aware I moved into a new role at EDINA, as Jisc MediaHub Service Manager and Digital Education Manager, back in February (a role that I share with my lovely new colleague Lorna Campbell). I am still passionate about social media and communication of course, but I have officially handed on the Social Media Officer baton ready for someone new…

So, what can I say to encourage you to apply?

Well firstly, EDINA is a lovely place to work – we are a friendly bunch and the organisation is big enough to include a diverse range of people with super skills and expertise, but it’s still small enough to get to know everyone, find out what we’re all working on, etc. As an organisation we work on some fantastic online services and really innovative projects, which means that there are loads of great opportunities to communicate and engage using both mainstream and emerging social media channels.

As EDINA is based at the University of Edinburgh we also benefit from the wisdom and opportunities across Information Services, and the wider organisation. Although you’ll see more on pay, terms, and holiday entitlement in the job ad I should add that EDINA also benefits from some excellent in-house baking as part of an ongoing charity bake sale!

The Social Media Officer role was created back in 2009 and I have to say that I hugely enjoyed my time in the role so heartily recommend it! My colleagues have always been enthusiastic about exploring new technologies and ways to communicate, and are a skilled and experienced bunch so, whilst the job has evolved reflecting the maturity of social media tools, and their use as core communications channels, it remains an exciting post with lots of interesting opportunities. And the role sits in our User Support team, a very welcoming crew genuinely committed to providing the best experience for our users, including thousands of students, staff and researchers across (and sometimes beyond) the UK HE and FE sectors.

As you’ll see from the ad, our new Social Media Officer will have a particular focus on communicating our EU FP7-funded COBWEB: Citizen Observatory Web project, which means engaging with citizen science and local communities across several UNESCO Biosphere pilot locations in Wales, Greece and Germany. That also means working with a wider range of communications channels and approaches, and working with colleagues in an excellent group of partner organisations across Europe – and that means there’s likely to be a wee bit of travel too!

So, please do take a look at the job ad, see if it might be right for you (or someone you know), and get applying!

Edit: Please note that applications close at 5pm on Tuesday 9th June 2015.

All those important links… 


University Business magazine mention for EDINA and UoE

Last month I had a request through for an interview on social media for University Business magazine, which focuses (as the title suggests) on the business and administration side of universities. That request proved to be a really good opportunity to look back and reflect on what has been happening with social media across the last 5-10 years, including some awesome innovative activities at the University of Edinburgh, many of which – such as social media guidance and advice – EDINA have been part of.

The front cover of the latest issue (81) of University Business magazine.

I’m really pleased to see that some of my comments on the use of social media at Edinburgh and in the wider HE sector have made it into the latest issue (Issue 81, pp 65-8). And I’m particularly glad to see that the Managing Your Digital Footprint campaign is part of those comments as it is a really ambitious project that will hopefully have findings of use for the much wider sector.

You can read the full article – which looks at social media at a number of institutions – online here (pages 65-68).


New project: Managing Your Digital Footprint

This Monday (29th September 2014) the Managing Your Digital Footprint project launched across the University of Edinburgh. I’m hugely excited about this project as it is a truly cross-University initiative that has been organised by a combination of academic departments, support services and the student association all working together. Indeed, huge thanks and respect are due to Louise Connelly at IAD for bringing this ambitious project together.

I am representing EDINA across both of the project’s strands: a digital footprint awareness-raising campaign for all students (UG, PGT, ODL, PhD) which is led by the Institute for Academic Development (IAD) in collaboration with EDINA, the Careers Service, EUSA, Information Services, and other University departments; and a research project, a collaboration between IAD, the School of Education, EDINA and EUSA, which will examine how students are managing their digital footprints, where such management is lacking, and what this might mean for future institutional planning to build student competence in this area.

Before saying more about the project it is useful to define what a “digital footprint” might be. The best way to start that is with this brilliant wee video made specially for the campaign:

Click here to view the embedded video.

Digital footprints, or the tracks and traces you leave across the internet, are a topic that frequently comes up in my day to day role as social media officer, and are also the focus of a guest week I provide for the MSc in Digital Education’s IDEL (Introduction to Digital Environments for Learning) module. Understanding how your privacy and personal data (including images, tags, geo locations) are used is central to making the most appropriate, effective, and safe use of social media, or any other professional or personal presences online. Indeed if you look to danah boyd’s work on teens on Facebook, or Violet Blue’s writings on real name policies on Google+ you begin to get a sense of the importance of understanding the rules of engagement, and the complexities that can arise from a failure to engage, or from misunderstanding and/or a desire to subvert the rules and expectations of these spaces. What you put online, no matter how casually, can have a long-term impact on the traces, the “footprints”, that you leave behind long after you have moved on from the site/update/image/etc.

When I give talks or training sessions on social media I always try to emphasize the importance of doing fewer things well, and of providing accurate and up to date bios, ensuring your privacy settings are as you expect them to be, and (though it can be a painful process) properly understanding the terms and conditions to sites that you are signing up for, particularly for professional presences. Sometimes I need to help those afraid to share information to understand how to do so more knowledgeably and safely, sometimes it is about helping very enthusiastic web/social media users to reflect on how best to manage and review their presences. These are all elements of understanding your own digital footprints – though there are many non-social media related examples as well. And it is clear that, whilst this particular project is centered on the University of Edinburgh, there is huge potential here for the guidance, resources, reflections and research findings from the Managing Your Digital Footprint project to inform best practice in teaching, support and advice, and policy making across the HE sectors.

So, look out for more on my contributions to the Managing Your Digital Footprint campaign – there should be something specifically looking at issues around settings very soon. In the meantime, anyone reading this who teaches/supports or who is a student at the University of Edinburgh should note that there will also be various competitions, activities, workshops, resources and advice throughout 2014-2015, which will focus on how to create and manage a positive online presence (digital footprint), and which should support students in: professional networking; finding the right job; collaborating with others; keeping safe online; managing their privacy and the privacy of others; setting up effective social media profiles; and using social media for research and impact.

The Digital Footprint project logo – anyone based at the University of Edinburgh will be seeing a lot of this over the coming months!

The research strand of the project is also underway but don’t expect anything more about that for a wee while – there will be a lot of data collection, analysis and writing up to do before we are ready to share findings. I’ll make sure to share updates and links here as appropriate. And, of course, questions and comments are welcome – just add yours to this post.

Find out more


European Conference on Social Media (#ECSM2014) Day 2 LiveBlog

Today I am, once again, at the European Conference on Social Media (#ECSM2014) at the University of Brighton. I will be presenting my paper, “Learning from others’ mistakes: how social media etiquette distorts informal learning online” this afternoon but until then I will be blogging the talks I attend. As usual this is a live blog so please let me know if you spot any errors or omissions and I’ll be happy to fix them.

Taking Education into Cyberspace – Chaos, Crisis and Community – John Traxler

I think my title here is slightly polluted by my perspective as a lecturer in mobile learning, but I was trying to capture two thoughts that were colliding in my head: how increasingly educationally problematic cyberspace actually is; but also the idea of moving away from increasingly pointless short term technology driven projects and working with long term projects with the UN etc. concentrating on the impact of cyberspace in other cultures and languages. And part of that is because technology is culturally specific.

So, in some respects, looking back on my work it has been about the possibilities for the use of mobile technologies and cyberspace in learning. About extending the reach of education, opening up education to those who may otherwise have challenges to access. So that would apply to my work in Kenya, where issues are about infrastructure, but also can be about socio-economic and cultural barriers. But we also see mobile technology enabling more personalised, more location specific, opportunities for education. And mobile technology can change the ways in which education can be understood or theorised. In the UK Diana Laurillard’s conversational model is widely accepted and that tends to be about a simple set of checkboxes in some senses now, so it is important to continue pushing the theory, extending it. And I think that it is important to engage learners, particularly disenfranchised learners. So we want to challenge existing theories and reach out and reconsider education.

But in some ways that can be a backwards looking process, the idea that it is different… an elite technology that should be researched and then trickled down, the JISC type model of thinking about technology/funding, and that approach can get us into a treadmill of forever trying out “innovation” technologies and missing the bigger picture that these scarce technologies are actually adopted by the wider world as commonplace, familiar. The rest of the world may be using this stuff in ways that may be challenging what we do as educationalists. Even if you regard the world of education as merely servicing the economy – and I’m not saying that I do – then the economy is changing wildly. The economy is driven by digital technologies at a personal level – the tools, technologies, etc. that we use personally – but also in the sense of business models, the way that resources are changing. Bandwidth is like discovering oil – the 4G spectrum sales by government are like North Sea Oil all over again. It is changing the economy, and the things that move around in the economy.

And in terms of what happens at national level, I work with UNRWA, which works with the Palestinian refugee community, and in that context the Israeli state governs mobile infrastructure so the technology there is a political issue.

So, digital changes what we trade, and keep, and value. So an example here: if you take tasers, which you can buy as retail items in the US, they now have decorative holsters with MP3 players – the manufacturers say “putting the cute in electrocute”! That’s a whole artefact that never existed before the digital world!

Mobile also changes the nature of work, of supervision. The work that shapes the economy is being shaped by the ubiquitous mobile access to work, the changing patterns of access to information and connectivity. And we increasingly see the idea of “performative support” – where information, guidance, support comes from within cyberspace. This is a step beyond just-in-time learning. It is like the Hitchhiker’s Guide idea of a Babel Fish and that really challenges the idea of learning, the reasons for learning. That has advantages but it also deskills people: if judgements are made for you, you can lose your autonomy in planning your work routes or priorities for instance. Or your skills may no longer have the same value.

And we see increasing amounts of user generated learning in cyberspace. So, for example, podcasts. You can learn pretty much anything, and from sources outside of academia. I listen to a great deal of late Byzantine and medieval history for instance, very little from academia or from the BBC. But those sources may not be accurate or authoritative. And you also see communities – like World of Warcraft – of discussion, production, translation, so many interactions. I could argue these people are developing metacognitive skills, but we also see communities with a shared interest, understanding, corpus, which seem to replicate what we do in academia but do so wholly separately. Similarly we can think about citizen journalism: the idea that people can capture images, text, audio of an event as it happens. They can share and transmit it without mediation from government or media. People mistakenly talk about it being democratic, I think it’s more demotic. It’s not mediated by traditional institutions BUT it is mediated by Facebook or YouTube or other large and often fairly opaque organisations. But this is a change. The spin of the London bombings citizen journalism was about plucky Londoners, blitz spirit etc. But from another perspective, from the Middle East say, you could spin this as brave jihadists spreading chaos. And that points to the importance of criticality. The abundance of materials means that our students really have to be able to sort and sift these types of media. We see increasing transience of information – the canon is not defined by middle aged European white men but by something more democratic, and that raises challenges. We see partial, complex, transient viewpoints and information and we have to be able to deal with that.

But that’s a really middle class European view. And I’m interested in other views, and at a number of levels. I think education and cyberspace interact with language, identity, culture. If I look at the way UNESCO or USAID look at education, they see it as delivered by the centre or as delivered by the state. Computers used to industrialise education to some extent. But most of Africa is safe from e-learning. But most of Africa is not safe from mobile and that is problematic. The interface alters the relationship between languages – QWERTY keyboards or alphanumeric keyboards shift the balance between, say, Chinese characters and the English language or transliterated language. It changes the expression of language, and alters the balance. If English is easier to use you may use it in preference to Cyrillic or Arabic etc. I saw this with young kids in Cambodia – and there was also a cultural cachet there in using English/American tools and language.

And we see indigenous languages and peoples and technologies connecting with each other. Fragile language communities connected to the global economy can, again, privilege English and threaten those languages. I have worked with communities in Namibia, and their language is about both words and gesture… so for past and future tense they gesture rather than having different words. But mobile interfaces are not designed for their gestures – probably not ours either. We thought, as part of an EU project, that we could customise interfaces to localise them but I am ashamed of the idea that replacing a teacup with a coffee cup is enough, that concept. There does seem to be a real difference between functional or procedural languages versus object orientated languages and how we communicate in cyberspace. Is cyberspace irremediably infected with our values? But then those fragile language communities also appropriate technologies to preserve languages. The Tuva have a dictionary of their language, many of the Native American Nations have dictionaries… but then the issue of ownership for this captured, preserved language information comes up and potentially raises new issues of fragility.

But in terms of communities in Europe, the point that worries me there, is that what is accessed through cyberspace is our vision and not theirs. The state often tries to impose values. We had a project with Roma traveller communities and mobile learning… were we being helpful or were we trying to overwrite their values and communication traditions?

And we also see the idea of skeuomorphism – old fashioned technologies or analogies in interfaces – the floppy disc to save a file – so cyberspace is polluted by language but also by the iconography of our past and working history, and not anyone else’s. And cyberspace and the education that opens up is complex. The Arab or Muslim world is not fragile but there are concerns that our technologies are somehow a trojan horse for Western Christian cultural values for instance.

But returning to a more conventional thread here… mobile technologies are changing our perceptions of time. You could argue timeliness was invented by John Knox. The literature talks of that paradigm of time being challenged by mobile technologies – we can reschedule our lives, we don’t have to obey Greenwich or Newtonian time. A colleague of mine writes about TV channels in Norway… everyone used to watch the same thing and that gave them a sense of identity… there isn’t that ontological security anymore. And if you look at how connected we are to other parts of the world… I was in Australia, my family were going to bed, in another country my publisher was just getting up… and that’s challenging. We work in fixed institutions in a world that is no longer fixed.

A few years ago I read an article called No Dead Air by Michael Bull talking about how there is a change with mobile technologies – we carry our own communities, music, and exist in a sort of bubble. The places we inhabit are reconfigured by the opportunities cyberspace gives us. That’s a real challenge for education, our institutions are fixed and located. There is also literature on how technology is changing social practices, learning new gestures to live in new spaces. So body languages when we overhear things on the train, enforced eavesdropping. We have a new set of what Goffman calls “tie signs”, gestures to signify importance or discomfort – around, say, placement of mobiles on tables. And we have this idea of “absent presence” (Gergen) where people are in the room, but also on email, twitter, etc. But there is an upside to that too – that same concept brings absent others into the room, into the presence.

And we have new ethics, new humour, hierarchies, all different in different communities. I am sure there is humour that doesn’t fly in the World of Warcraft community, say. And we don’t always understand them. And one example we get is the idea of the “missed call” – the call you are not supposed to answer! (e.g. from a taxi driver). We also have the idea of “moral panics” – around literacy, around spelling, about child sex, etc.

So if education is to realise the opportunities of cyberspace we need to think about technology as going into a foreign country. You see JISC Legal developing approaches like this. Facebook, if it were a country, would be third largest in the world, so it really is another country. And we see different attachment to devices – a girl in the Guardian was quoted as saying she’d rather lose a kidney than her mobile – it’s not like the desktop route to cyberspace. And there was a reference to mobiles being like our privates, in terms of our privacy, protective instinct, etc. You also see naming of children in KwaZulu-Natal like “handsfree”, “simcard” etc. The world we see on mobiles is not what we are used to…

And another of the downsides… here is a tool designed for guilty New York cat owners tracking their cat. But that also means surveillance of children, by the state; you could refer to Lyotard, or Foucault’s Panopticon here. And you can make an argument that cyberspace is a kind of post modernity: partial, subjective, Bauman’s liquid modernity… you can be apocalyptic about it. Modernity is founded on language and learning as benign, as good things… and this depiction can undermine that.


Q: You talked about QWERTY keyboards leading to English dominance but has the development of other interfaces, haptic interfaces, made a difference, or could it?

A: I suspect not as I think the market is against it… not that I’ve heard of…

Q: Even with Japanese and Chinese manufacturers making this?

A: Market is not universal though so can happen in one place, but not translating to other native communities, other languages.

Q: Mobiles are about multitasking… but meditation can be another way to become smart… do you see any contradiction between these two ways of becoming smart?

A: Well I have an issue with the idea of mobile learning as a kind of creed, something united there in learning or how we deliver it; I’m more inclined to talk about learning with mobiles. I’m also not sure about multitasking… some researchers would say we are time slicing in ever smaller parts.

Q: In your last part of the talk you talked about mobile as fragmenting experience… are there positive educational aspects there?

A: There is a reformist view of it being the same old stuff, but sexier. Or an apocalyptic view that the institution and education system is bust. There is also a sort of broader view that the world is beset by crisis… debt, deforestation, etc… what is the relationship with technologies… are we complicit?

Q: So I guess I was thinking in terms of actual practice. Many of us are within the academy, teaching… we are in a state of transition… students can pull in Google if we are lucky, Facebook if we are unlucky, during our teaching…

A: That’s the bit I’m not sure about, whether we can co-opt or appropriate what is going on, or whether that is a symptom that the education system is bust!

Q: The thing about saying it’s gone bust… if you see education as being about transmission of information then of course it is bust. But if it is about inspiring people, understanding the process of certain skills… then it is not bust at all. The technology is only a tool for delivery.

A: That would be a reformist view I think. There is all that information, we can recognise the restrictions, the limitations. We can adapt the metacognitive skills, the inspiration… but do we have a monopoly there versus, say, the World of Warcraft?!

Welcome to Porto – Anabela Mesquita

We are now hearing about the European Conference on Social Media 2015 (scheduled for 9-10th July) location, Porto, from Anabela Mesquita who will be hosting next year’s event in Portugal. I won’t capture that in detail here but having chatted with Anabela over the last few days I am quite sure that it will be a lovely location and that she and her organisation, the Polytechnic Institute of Porto, Portugal, will be wonderful hosts for the second ECSM. Anabela promises sunshine, good food, a beautiful river and the sea.

The event will take place at ISCAP, founded in 1886, one of seven schools in the Polytechnic Institute of Porto. ISCAP is a business school with almost 4000 students across undergraduate, graduate, specialised and post graduate programmes and short courses, crossing areas of business, marketing, commerce, and languages. It includes four research centres: Intercultural Studies; Economic Sciences and Taxation; Communication and Education; Technologies and Information Services. Social media bridges all of those courses and research centres. ISCAP participates in several European projects including a number in lifelong learning areas.

Issues of Using Information Communication Technologies in Higher Education – Paul Oliver and Emma Clayes, University of Highlands and Islands, UK

When we looked into the literature on the use of ICTs in HE we saw that Reynol (2013) found a complex relationship between Facebook and student engagement, and that Facebook use can be negatively related to academic performance and time spent preparing for class. Gikas and Grant (2013) found students concerned about the lack of formal training or support given by their institutions. We took these and other studies into account in our design of this study.

We felt that there were common concerns arising around use of ICTs, especially social media, in education but ethics and the views of staff involved were two areas that we felt had been overlooked. So we wanted to focus on practical and ethical issues, looking at the schools of music and social sciences.

We decided to use surveys to explore student and staff views. We decided to use focus groups as previous studies had used these. And we wanted discussions focused around issues we were interested in, so 6 questions were drafted. We used quota sampling and that was very much about convenience sampling – so no particular social media enthusiasms of those volunteering really. And we conducted two focus groups for each school, that was to reflect the in-person as well as the online student experience/course delivery models. The conversations were transcribed and then key themes identified for positive/negative views in particular.

So, what were the findings? Well it seemed only staff were concerned with ethical issues, for instance whether all students would be included in these technologies and the importance of not excluding some students. But there were concerns across both staff and students around ease of access, as many experienced challenges accessing VLEs. And although many were positive about the use of social media, they also reported distractions associated with the use of social media.

So, the social sciences staff were fairly positive around the use of ICTs in Higher Education, particularly social media. There were some concerns around our VLE and its functionality and ease of access. And also concerns about students needing to get used to the VLE. One staff member commented that we are preparing students for the world of work, and that means they do not get to choose what technologies they use, they need to be able to use the chosen tool. Another staff member was concerned about the tone of communication in different spaces, and boundaries there – for instance on Facebook.

Alongside that positivity there were concerns about potential problems of inclusion, legal issues such as those arising from inappropriate posts, and concerns around bullying.

For social sciences students the majority expressed favourable views on ease of access of social media, particularly in comparison to institutional ICTs. They commented, for instance, on the difficulty of commenting and navigating discussions on Blackboard. But they voiced concerns about distractions. They commented that they found it difficult to work from home with the distraction of things like Facebook.

With the staff from music there were really two extremes. One used Facebook with their students because that was the best way to get in touch with them. Considered the space the real world, what others do, and that’s beneficial for students. But another staff member uses Blackboard and was only happy to use Facebook if a specific page for the course. And another spoke about social media being called social media for a reason, it is for social use not for educational use.

For the students there were complaints about accessing webmail and the VLE from home. That was a big issue for students and, being based in the Highlands and Islands they can be very widely distributed geographically so that issue of access was a surprise that way. And there were mixed views around feeling comfortable with using social media for education – not all were equally comfortable with the idea. There were also some interesting ideas – one suggested banning Facebook to eliminate distractions. Another suggested a mobile app that feeds social media through it, or to integrate all ICTs into Facebook. One suggested the great idea of letting social media feed into Blackboard, which seemed like a constructive idea.

So, in conclusion, there was a really mixed set of views here. Students and staff have different but important views with regards to the use of ICTs in education. Access really seems to be critical – Blackboard is a good product but unreliable access is a really key barrier for staff and students. The study did highlight potential problems that institutions may face with regards to ethical and practical issues. We did hear concerns about inclusion, but very few people voiced these; we were surprised at the lack of concerns. And there was an asymmetry of use – some staff used social media very freely and openly whilst others wanted many more barriers in that use. That variation was an issue, and could give a sense of exclusion to some. I think we need to think about guidance. We used to have a blanket ban on social media, now it’s quietly encouraged but I think guidance and training is needed. We need to think about digital inclusion too.

More reflection and metrics on what takes place would be good. However, it may be that social media may always be somewhat informally used in education… as long as alternatives are in place is that a problem? And is it possible to set up features on institutional VLEs to obtain the best of both worlds? To make those key communications elements easier to use, more social media like.

And whilst there are practical issues here we also need to think about what is actually needed or wanted by students. Some really felt social media was a distraction – we can assume all students want social media engagement but that’s not necessarily the case. The most fruitful area moving forward is to think of that bridging the formal and the informal…


Q: When talking about social media were the students thinking about engaging with staff and peers on Facebook, or using pages for courses etc.? If there was mixed use, might that explain the mixed results?

A: Some of our questions were about thinking about variation of approach, how staff engage with students in different ways in different classes. But we found that Facebook tends to be used by students only, set up by them and with no tutor interaction – and it’s not clear the tutors want that.

Q: Some institutions use Facebook pages for particular courses, as a private space, so that conversation is focused in one place.

A: That can work but there are real issues of access and inclusion. But it’s the bridging of informal and formal that we need to look at.

Q: Is Blackboard looking at logins via Facebook?

A: We’d like to see that. In terms of ethics that’s the difficulty as Blackboard is a safe enclosed space.

Ranking the authenticity of social network members – Dan Ophir, Ariel University, Israel

I am looking for something exceptional – exceptional behaviour – to rank authenticity. I am using some tools here including syntax analysis, quantitative semantics, etc. The aim of this is to find the truth, the authentic internet users. Some parallels here with polygraphs perhaps.

The methodology here is based on a computer assisted cognitive behavioural therapy methodology. CBT was originally developed for psychological treatment and can also be used in measuring the probability of an individual’s identity, their conversational or behavioural markers. You can see this in chat examples – where exaggeration might be a marker – or in cross-examination transcripts where certain use of language or emotional responses can, through CBT methodologies, help to identify the individual.

BNF (Backus Normal Form, also known as Backus-Naur Form) is a computer science concept. In computer science we use programming languages to create a form of truth, very defined concepts. So the Backus Normal Form is about simple notation symbols. This is about defining different elements. For instance you define a digit. Then you may define a number as being a digit, or a digit followed by a number (having already defined a digit). This is about declaring terms and doing so in clear and consistent ways using a particular syntax. That’s the principle of this BNF, a metalanguage for other languages. But the challenge is to have a natural language syntax for analysis. So, for instance, we can describe a text with BNF – breaking sentences into noun phrases, verb phrases, auxiliary, adverb, etc. So, from this natural language syntax we can build a derivation/parsing tree to understand the sentence in a way that avoids misunderstanding.
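
For readers who haven’t met BNF before, here is a minimal sketch of the digit/number example in Python. The grammar in the comments and the function names are my own illustrative assumptions, not taken from Dan’s slides; the point is just that each rule is defined in terms of rules already declared.

```python
# Illustrative BNF grammar (an assumption, not the speaker's own notation):
#   <digit>  ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
#   <number> ::= <digit> | <digit> <number>

DIGITS = set("0123456789")


def is_digit(text: str) -> bool:
    """<digit>: exactly one character from 0-9."""
    return len(text) == 1 and text in DIGITS


def is_number(text: str) -> bool:
    """<number>: a digit, or a digit followed by a number."""
    if is_digit(text):
        return True
    # Recursive case: first character is a digit, the rest is a number.
    return len(text) > 1 and is_digit(text[0]) and is_number(text[1:])
```

Each function mirrors one grammar rule, which is roughly what a hand-written recursive descent recogniser does; a natural language grammar would need many more rules but the same idea applies.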

Another concept in our world is quantitative semantics. Ranking the words in the vocabulary according to some measure of significance. So, again, we can use BNF system to understand quantitative semantics such as determining terms, extremal terms, maximal terms, etc. This helps us understand the strength of a term. So we see a gradual escalation of terms. You can understand positive or negative terms, you are ranking the semantics on a spectrum of values. You can also look at connecting terms, auxiliary terms.
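
To make that idea of a spectrum a bit more concrete, here is a small hedged sketch. The word list and the -3 to +3 scores are entirely invented for illustration (the talk did not give a lexicon), and the function names are hypothetical.

```python
# Hypothetical intensity scores on a -3..+3 scale, invented for illustration.
INTENSITY = {
    "terrible": -3, "bad": -2, "poor": -1,
    "fine": 1, "good": 2, "wonderful": 3,
}


def rank_terms(words):
    """Return the known words ordered from most negative to most positive."""
    known = [w.lower() for w in words if w.lower() in INTENSITY]
    return sorted(known, key=lambda w: INTENSITY[w])


def strongest_term(words):
    """Return the known word with the largest absolute intensity, or None."""
    known = [w.lower() for w in words if w.lower() in INTENSITY]
    return max(known, key=lambda w: abs(INTENSITY[w]), default=None)
```

Words that are not in the lexicon are simply ignored here; a real system would need a far larger vocabulary and some handling of connecting and auxiliary terms.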

Now we move into the psychological model, which is supported by the lexical model, so we can use the 10 Cognitive Distortion Thought Categories that are at the centre of CBT methods. With these tools you can take a sentence and detect the thought categories present. And you can use BNF structures to define those thought categories so that the computer has a precise definition of what I am looking for. I am seeking sentences constructed in a particular way in order to understand the user and to rank that user. Different thought categories will therefore have different structures – again definable for use by the computer in parsing user texts.

So, we have these patterns, and they are tested for validity. We can then use pattern matching, based on these patterns, in order to analyse texts. So from this you can use substitution to recommend correction or more moderate terms; you can evaluate and measure deviation, etc.
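The match-then-substitute idea can be sketched as below. This is my own simplified illustration, not the speaker's patterns: it assumes a single thought category built around absolute terms (such as "always" and "never"), and the `MODERATE` mapping and function names are hypothetical.

```python
import re

# Hypothetical mapping from absolute terms to more moderate alternatives.
MODERATE = {
    "always": "often",
    "never": "rarely",
    "everyone": "many people",
    "nobody": "few people",
}

# One pattern that matches any of the absolute terms as a whole word.
PATTERN = re.compile(r"\b(" + "|".join(MODERATE) + r")\b", re.IGNORECASE)


def find_absolutes(sentence):
    """List the absolute terms matched in the sentence, lower-cased."""
    return [m.group(1).lower() for m in PATTERN.finditer(sentence)]


def moderate(sentence):
    """Substitute each absolute term with its more moderate counterpart."""
    return PATTERN.sub(lambda m: MODERATE[m.group(1).lower()], sentence)
```

A real system would presumably match on parsed sentence structure rather than single words, but the workflow is the same: detect the pattern, then suggest a replacement.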

From all these models we can build a workflow for processing texts in order to make our rankings; some aspects will be iterative as the computer makes a decision. So, the English version of this work lets you rank the intensity of the meaning of the words.

And with that I am off to the next session as it relates to COBWEB. Look out for tweets on the remainder of Dan’s talk from other attendees. 

NatureNet – Crowdsourcing design citizen science data using a tabletop in a Nature Preserve – Tom Yeh, University of Colorado, Boulder, USA

I will be talking about a socio-technical infrastructure here for nature. So, citizen science is, broadly, about democratising science education and fostering students’ understanding of how science can be relevant to their lives and communities. And this is a type of crowdsourcing where individuals engage in scientific processes without needing any specific scientific background or training.

So, NatureNet is a citizen science system for studying bio-diversity in nature preserve settings. So, at this conference we have heard lots of presentations on particular platforms. Our project is based around mobile devices, desktop machines and, particularly, table top technologies. Some of these platforms like Twitter and Facebook tend to occur in non Face to Face ways. We wanted to see what fitted those gaps, that opportunity to use table tops and face to face interactions.

So, we were working with the Aspen Nature Park. It is a hugely popular attraction in the summer. So, you can check out a phone at the site, you can take pictures, observations, ask questions – which many do, etc. and collect notes as you walk around. And you can comment and discuss the observation. So, we identified four main motivators to encourage participation in this project:

1. Personal interest

2. Self advocacy

3. Self promotion

So, when the visitor comes back to the visitor centre they can access the table top, they can explore resources, have discussion… they can engage in a face to face way around the table – rather than all having heads down on phones. They can see the pictures they have taken. They can do a kind of face to face social media here, they can engage and share there, they can comment. And answers and discussions can take place, feedback can come back on those questions and comments gathered in the field.

Now, that’s the model the first time they visit, but what happens after that? They have different motivations to take part in the future. If you paid attention in Jennifer Preece’s talk earlier [which I missed] you’ll remember that participation is about membership, feedback, ownership, and acknowledgement. So, for instance after you visit the park, you can look at the website and might reflect back, engage etc.

So, this whole project is about participation in scientific endeavours. And another way of motivating people…

4. gamification

So in terms of crowdsourcing design. These are similar design processes to individual and team design processes but also includes social networking. So we came up with a design model that allows people to add comments and discussion. And we get our users leaving comments and feedback as part of this system, and we use this feedback in our design model.

So, this design model is about collecting ideas, allowing comments on ideas, selecting ideas, implementing ideas (to test effectiveness), integrating ideas, evaluating ideas, and modifying the design. So next time people come back to the park they should see ideas being integrated back into the platform. That will give them some ownership of the platform and some acknowledgement for their participation. One suggestion we have had is for participants to be able to track comments and whether they have or have not been responded to. And we also want some voting on those comments – not just about the science here but crowdsourcing the platform.

This isn’t just a stand alone thing but about the development of scientific dispositions (Clegg and Kolodner 2014, Borda 2007, etc.). In terms of how this can be developed in learners Calabrese-Barton (1998) and Chinn and Malhotra (2001) found that engaging learners in authentic inquiry relevant to their lives enables them to develop scientific dispositions. But Fisher and Giaccardi (2006), Hong and Page (2004), Maher et al (2014 in press), Page (2007), Yip et al (2013) found that engaging learners in the development of tools and activities that support their scientific engagement is also crucial. And that’s why we are doing this, and seeing the tools as continually evolving.


Q: Is this specific to the Aspen location?

A: It is now but we also hope to test in two other sites in order to compare how it works there.

Q: We have a project called the Open Science Lab, it came out of a project around personal inquiry with young people. The tool coming out of that is being developed by Mike Sharples and it would be worth you being aware of that if you are not already.

Q: What is the scientific aspects in this project – you are crowdsourcing the interface development but how do comments and questions etc. feed back to scientists/data collection?

A: We involved naturalists in the park to crowdsource design of learning activities in the park, but we hope to develop that out towards other citizen science activities. But we want the ideas to help shape relevance of scientific inquiry. People don’t easily identify these sorts of ideas… almost tricking them into giving good ideas.

Combining Social Media and Collaborative E-Learning for Developing Personal Knowledge Management – Tiit Elenurm, Estonian Business School

I started using e-learning tools in the year 2000. At first my focus was on collaborative application of elearning. At that time we used baker(?) but moved over to Moodle and there have been lots of shifts in tools over time, always trying new collaborative aspects, focused on knowledge exchange. So I will look at some of those and how they relate to e-learning.

So this leads to my research question of “What are the experiential learning opportunities and challenges of combining social media and learning applications in the academic context of business studies?” and I’m particularly interested in entrepreneurs. I will talk about 6 applications of social media and learning and their use in developing personal knowledge management skills of entrepreneurs.

So, Marshall McLuhan (1962) was the first to popularise the term “global village”. For an entrepreneur the main challenge is about whether we rely only on face to face interactions, or whether we could use social media for becoming and remaining successful entrepreneurs. We could set up a successful venture in our local area, but if we want to work with someone in Australia then only face to face contact would be expensive, so we need to be able to gain trust using social media. So we really wanted to study this.

And my point of view around these applications is to think about the benefit of entrepreneurs. We have to understand the entrepreneurial orientations, and whilst some literature suggests only one orientation we have a model of three which we think you see:

– Imitative orientation – looking out to what works as their trigger

– Individual innovative orientation – they maybe do not need so much networking

– Co-creative orientation – students and entrepreneurs focused on core creative work – And when we think about limitations and benefits of applications this is probably particularly important as a group.

So we use a self-assessment questionnaire for specifying entrepreneurial orientations – a departure point for local and cross-border business opportunity when linking the entrepreneurship education to social media applications. So, we have run various training courses related to social media, less so with eCommerce or eMarketing as that’s often reflecting positions of established businesses. We really want to reach at the idea of business opportunities of a networker in a broader sense – networking for self-development in order to understand new business trends and opportunities, networking for building personal brands, support for starting businesses, and also how to defend a network against colonising marketeers/players.

So, if I place myself in the position of a small start up or entrepreneur I don’t expect to be an expert in every social media site or domain. Choices have to be made, and the same is true for trainers – there needs to be a more limited relevant focus.

So I looked at six learning and social media combination tools, their challenges and opportunities:

1. Discussion forums in the Moodle learning environment – for knowledge sharing between students studying international business and knowledge management, very much encouraged by tutors and teaching staff. Encouraged to discuss and exchange with other students. There are many good tools but initially there was much discussion about how much transparency there should be around homework assignments and grades, and what can be learned from them. These are good tools and don’t require mainstream social media but they don’t cover everything.

2. Assignment for finding and reflecting on MOOCs – now we are trying to take this next step to open up learning. We have tested this as assignments in courses, for students to find MOOCs and take them. So, in future we have a special elective where students studying the entrepreneurship MBA have to find a MOOC course to fill a gap in our curriculum – they have to prove it will do that. They have to study that MOOC. And then we have blended learning sessions where they have to demonstrate lessons learnt from the MOOC. If they prove to us and other students that that has been a valuable contribution, then we give points/credit from us. That I think will open up the learning space much more than this first approach. And it is really opening up the curriculum! Perhaps we will develop the curriculum with these courses if we agree.

3. Sharing user experience about preferred social media sites and new online networking opportunities in the course blog – reflecting changing trends in social media use.

4. Tricider for online brainstorming – I have used the tool for years… but some students really opened up my mind here! They asked fellow students on exchange from Barconi to do it together… so this could be a closed or an open community… perhaps in future more opportunities for open approaches like this.

5. X-Culture online project work – who creates online teams? In the US the lecturer chooses teams, we are trying this out to see what works

6. Cross-border online teams for assisting enterprises in their internationalisation efforts – teams have to work together from Helsinki and Barconi and that is a challenging task to do, finding right skills.

And, in conclusion, when creating experiential learning paths of learners by applying social media, it is useful to take into consideration the readiness of the learners for co-creative entrepreneurship, their online knowledge sharing experience and their disposition to trust co-operation partners in cyberspace. So, in some ways these experiments where students create teams, experiment with them, they are very valuable. X-Culture is also valuable though as it is about building trust and teams with people they have never met.

So, the other main important conclusion for me is that minds should be opened up, not just of lecturers but also of students. Many use social media to connect with those who they already know and connect to. Very few proactively use it to find new partners and new contacts. So we have to encourage them to look forwards, not just backwards. To look to their career and knowledge management prospects. And there is the challenge of finding the balance of deliberation and self-regulation in social media and learning. If student judgement is high for the MOOCs task then why have a university? And what is the balance of face to face and virtual reality/activity?

Exploring User Behaviour and Needs in Q&A Communities – Smitashree Choudhury, Knowledge Media Institute, Open University

We are mainly a computer science department and we wanted to conduct a small research study on user behaviour needs. We wanted to understand user needs in the online communities – why they need contributions, why they contribute. And exploring the relationship between actual behaviour and possible latent needs driving those behaviours. And we wanted to consider if theories of human motivation might explain user needs and behaviour (Maslow).

There have been a number of studies of motivations for using different social media sites. We used Maslow’s theory, a fundamental motivational theory. And that makes use of Maslow’s hierarchy of needs and mapping to online needs. From physiological; security needs; needs for belongingness; need for self-esteem; need for self-actualisation. But it needs some translation to the online world. Physiological needs may be about basic needs such as access to the internet. Security may be about security or privacy. And belongingness will be about groups and sense of belonging and participation. And self-esteem connects to reputation, honour, badges in use in these sites. In terms of self actualisation the online communities may fulfil that, but looking at behavioural indicators.

So we started by looking at data from SAP community network. This is a global network where problems are shared. They reward users who contribute significantly, they have a monitoring system for that. There are 32k users. Has run from 2004-2010. There were 427k posts, 34 different forums, 95k threads and many more replies.

So we wanted to seek features that might indicate motivation and behaviour. Factors including community age, how long a user is active in a community; post frequency; initiation; reply; self-reply; number of questions answered; in-degree – how popular are the users and who do they get replies from; tie strength; forum focus – different communities attended by user; topic focus; content quality (reputation points).

So, some statistics of those features… as in many communities a small number of users create a large amount of the content. 10% of users contributed 74% of content. 50% of users were active for less than 10 days of activity. 30% of users never replied to others. 35% of users have never asked a question, maybe they come to contribute and help others. And 70% of users had no reputation points – gives an idea of quality distribution.
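As an aside, a concentration figure like "10% of users contributed 74% of content" can be computed from per-user post counts roughly as follows. This is my own minimal sketch, not the authors' code: the function name `top_share` and the toy counts in the usage note are made up for illustration and are not the SAP community data.

```python
def top_share(post_counts, fraction=0.10):
    """Share of all posts written by the most active `fraction` of users."""
    if not post_counts:
        return 0.0
    # Rank users from most to least active.
    ranked = sorted(post_counts, reverse=True)
    # Always include at least one user in the "top" group.
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0
```

With ten toy users whose post counts are `[74, 3, 3, 3, 3, 3, 3, 3, 3, 2]`, the top 10% is the single most active user, who wrote 74 of the 100 posts.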

So we did a simple exploratory factor analysis (EFA) of the features to try to see where correlations might occur. From that we basically found four or five factors that describe all of the behaviours.

1. Socially active users/engagers

2. Askers and replies – a measure of community contribution.

3. short-term but active users

4. experienced users

5. Reputation/expert users

So with these factors we found users with these attributes cluster quite differently… we can see that helping behaviours are quite evident in those factors. So, do we see the need hierarchy? In order to investigate we extracted the patterns into time patterns – 16w timeline for each user to see progress for each user… It showed users having multiple behavioural characteristics over time. If we went to user level we saw aggregate community level… the community shows same level of needs in each category.

Then we looked at need evolution. 16% started with basic information needs. 51% start interactively by both initiating and contributing to other users. 12% of users start with high reputation score. 46% of users maintain same order of needs throughout community life. 25% move from lower to higher order needs. 28% move in the reverse direction.

So, this suggests that the users in online communities may not follow a rigid hierarchy… Even Maslow says that it is a combination of needs – you may have more than one at play at the same time. There are also some limitations here, we did not directly involve the community, which may have changed things. But behavioural analysis does provide insight into users’ intentions of participation at different times.

Summary of issues raised during the conference and presentation of the best PhD Paper and Best Poster Awards – Sue Greener and Asher Rospigliosi

Asher: Sue and I wanted to bring together our thoughts… we noted a lot of stuff up on a wall, which you can see in this image, and we really want to focus on a few key things: do not make assumptions being a big part of this…

Sue: We have a few challenges here… So, firstly… the issue of ubiquity is known, is part of our world but it does really raise challenges…we came across several people at this conference who have twitter accounts but have never used them! Is that normal for academics to research something we haven’t tried ourselves? If we are seriously researching social media, as Farida said, you need to do this stuff first. We’ve said this about elearning for years, but how do we make meaning in our research without trying stuff out.

Asher: we also wanted to talk about ease and access… and issues around Twitter. So many of us use Twitter as a source of data – it’s easy to access, open, accessible. We can’t reach that closed data from other platforms… we can get Twitter data without complexity… maybe… but actually we need computer science to manage that and that changes what kind of projects we do, what skills we need. And again these are issues raised by Farida, as were issues of who we hear about in this space, how reliable data is… and there can be huge differentiation between what data you get depending on your source, your API, whether you have firehose access.

Sue: thirdly the issue of clash of worlds, clash of dimensions! I’ve had lots of comments about a mix of elements, not so much a blend. We saw that social media links across disciplines… I think that can be good, to bring people together. But we can find clashes there… in the education world Facebook might be great but it’s not owned by the sector, we have to think about the commercial world… risk management… we need to consider commerce, learning world of academics, and learning world of students. And student experiences can be quite different. And you have the institutional perspective… and the analytical perspective. You have governments watching, tracking, potentially shaping our destiny without us even knowing. So we have to critically examine that before we can say “I know that”.

Asher: fourthly the pace of change of the technology world, of social media is breathtaking. Several times I thought about the route to get a PhD… how long that takes to establish methodological approach, collect data, reflect on that data… if that had been on MySpace and you came out with a PhD around that it might be a bit disconcerting… stuff like SnapChat, Instagram, WeChat… We started by talking about Twitter and being involved… increasingly the new technologies and interfaces will change rapidly. I don’t know anyone using Google Glass yet but I’m sure it won’t be long before whole conferences may be full of people here… and so there is the issue of relevance and currency. I would say that you should be open, recording what you think at the moment or shortly afterwards – like Nicola has done in her liveblog – because you have established and shared what you are doing, particularly important if you are doing a PhD in this area.

Sue: Fifthly there is the issue of language, terminology and definitions too. This is a really shifting time… we don’t have the definitions… we find it hard as academics to talk about what we write.

Asher: this morning we had this idea of, Sue calls it, “QWERTY Lock” and how that may influence our behaviour.

Sue: And David Gurteen asked us to think about better smarter conversations online… we need a shared language to talk about social media and what’s going on, and we need to establish that…

Asher: Farida talked about images, how under-represented they are in the literature. Ben Shneiderman also raised the issue of visual literacy when talking about visualisation and big data. And this week, along with Twitter, we have also seen a number of images being shared, a lot of information there.

Sue: So there are five challenges to take away… but the main thing here is that, isn’t this an exciting time to be exploring and researching social media! And, as you may imagine we have been collecting everything as we go – tweets, images, storify etc. So go to http://blogs.brighton.ac.uk/brightsoc/ for all of those.

Asher: So, stay in touch with us and each other and we welcome any feedback you may have…

Comment from audience: I’ve really enjoyed this. When we first talked about this year at ALT-C, an e-learning conference with many of the same faces again… and it was so fun to be at an event with such a mix of areas and with topics outside of my normal work area!

Sue Nugus: we are going to give the prizes for the best PhD Paper and the best Poster. We had some great posters today. At some conferences people can feel that posters don’t count in some sort of way, but that’s not true – you can learn so much from the posters and speaking with the poster authors. And I am so pleased that we had such excellent posters that really reinforced that! And the best poster goes to Sue Beckingham and the team from Sheffield Hallam University for their poster The SHU Social Media CoLab.

We also wanted to thank Avril Loveless for chairing and organising the PhD Colloquium. There were some fantastic presentations which gave the judges a very hard time. But the unanimous winner was Jennifer Forestal from Northwestern University, and her paper was “From Demos to Data: Social Media, Software Architecture and Public Space”.

Finally I would like to thank Asher and Sue for being so up for organising this first ECSM conference, they have been wonderful.

Asher: And huge thanks both to Sue Nugus, to Sue Gardner and to all of the academic and technical support teams here who have helped make the event possible!

And with that we are all done! It was a really stimulating and useful conference for me and I look forward, hopefully, to going along to ECSM 2015 and meeting with this lovely community again soon!


European Conference on Social Media 2014 (#ECSM2014) Day One LiveBlog

Today I am at the European Conference on Social Media (#ECSM2014) at the University of Brighton. I will be presenting my paper, “Learning from others’ mistakes: how social media etiquette distorts informal learning online” this afternoon but until then I will be blogging the talks I attend. As usual this is a live blog so please let me know if you spot any errors or omissions and I’ll be happy to fix them.

After a welcome to the event from Sue Nugus of acpi, we are now hearing from Bruce Brown, Pro Vice Chancellor of Research at University of Brighton, welcoming us and stating that everything is up for grabs right now, a really important historic moment in time making this a really important conference which we are delighted to be hosting! Over 35 countries are represented here today, welcome! We are a post 92 University here but we have had a lot of success in research, and have a really exciting research agenda here particularly around arts and humanities. If I mention “impact” to UK colleagues here I can see a bit of a dark cloud looming… I chair the main panel for Arts and Humanities nationally, in which I have a group in Arts and Society and Commerce who met in Edinburgh yesterday, and I think you will be pleasantly surprised by just how much impact there is in these fields. So, I wish you well for a great conference.

Asher Rospigliosi, University of Brighton

Myself and Sue Greener, who’ll join me in a moment, have been working together for the last 12 years or so. Although we are located in a business school we have focused on e-learning, on the impact of the internet on everyday life. We were therefore very keen to look beyond the business world, to the wider range of how social media is impacting on life. We deliberately start with Farida Vis, who we are delighted to have here speaking about big data, for that reason. We also wanted to recognise the impact on business and changing business practices which is why we are delighted that our second keynote comes from David Gurteen.

Dr Sue Greener, University of Brighton

And the other side of what we are looking at today is learning, because we learn through social media all the time. So learn, discuss… and read about what happened at yesterday’s Social Media Showcase.

Farida Vis – The Evolution of Research on Social Media

As has already been said this is a really important moment, and something of a crunch point of academia, industry and government really coming together around social media. Social media research is becoming mainstream and visible across research and across sectors in different ways.

So, a few provocations…

Increasingly social media is becoming synonymous with big data. The tracks and traces we leave online mean that social media research is increasingly needing to engage with or at least acknowledge this big data. And real time analytics are an important part of this. What do they mean for academia and the time frames we are used to? How quickly can we produce findings, and findings which are robust… there are ways in which our work is being broken up and being challenged.

I was pleased to see the word cloud of keywords for papers and note lots of mentions of Facebook and LinkedIn and not so much Twitter. That would be good to see… in the literature we are seeing a real focus on particular platforms… Twitter seems to be a dominant platform there but social media is not Twitter, we have to be careful how we extrapolate from one platform to others… I think this is partly to do with attention and real time aspects. Other platforms that get researched a lot less have a very different dynamic. A site like Pinterest isn’t as concerned with real time, it works quite differently. We have to be careful how we build this field collectively.

So, where are the research questions, when we talk about social media? And big data? Often we are data driven – what is available to us not a series of critical research questions that lead to data, to tools. And social media research, at least in the early days, was a lot about how to get a handle on this data, how to deal with it… but we are now moving to a phase where we need to think about the theory. We can no longer get away with being theory-light.

And some other issues that come up time and time again, not least in relation to the Facebook contagion study, are issues around research ethics – do we need new ethical frameworks, do we need more agile ethics, how do we apply traditional ethics in a new research space. There are questions of methods. There are issues of sampling. And something I think we still haven’t really grappled with is data sharing… when you deal with social media data it is data you cannot share with other researchers and that has real implications… For instance Twitter are really homing in on data use. Twitter, since they went public, have become very much concerned with selling data, which is their business plan. That means for us as researchers we have real challenges with sharing proprietary data sets. And real issues with regards to open data and transparency, and with the funding councils. Making applications for research funding you are expected to talk about data sharing and that means proprietary data is a real problem.

It’s brilliant to see so much research on social media… but less good to see a lack of funding for social media research. Both the AHRC and ESRC talked about funding a research centre last year, but for various reasons that funding never made it to a call… the funding calls could do a lot more to fund specific social media research. The ESRC are moving into their third phase of Big Data funding, but none specifically for social media, despite it being a major big data topic.

So, what is the future for this research field? In some ways we have this tension between huge enthusiasm and interest, there is a lot of excitement and innovation happening, but that has to be underpinned by a funding but also training framework to underpin this research.

I just want to talk for a while about where I have come from in this research field, and where I see this going… and how some future of social media may be going. I got involved in social media fairly early on. I did a PhD on the Israel-Palestine conflict, focusing on the representation of victims. And in 2005 when Hurricane Katrina happened there was the issue of the media representation of victims there, particularly two press pieces representing black people as thieves, white people as victims. There was a real backlash from the blogosphere and I found that community, that voice online, really fascinating and exciting, providing a voice for those not being represented.

Similarly in 2008 the Fitna: the battle YouTube controversy sparked a response from a community that was not getting its voice heard elsewhere. Again this was very interesting, and I was moving through the platforms. And in 2011 the London Riots were getting blamed on social media, particularly Twitter, and I became involved in work investigating those claims, the Reading the Riots project.

So, my research was becoming about data, big data sets, and that meant requiring new tools, new approaches, collaborations with others. When I looked at Flickr in 2005 the scale was several hundred images, doable by hand, small scale. By Fitna there were 1413 videos and 700 individuals. You cannot collect all of those. And in social media there is this beguiling idea that because you can see the data, it will be easy to capture that data. So for YouTube I had to work with computer scientists to get at that data.

And by 2011 we were asked by the Guardian to look at the riots tweets – a data set of 2.6 million tweets – and that meant a whole lot of computer science. So over that period we were really moving into needing far more fire power, more computing power, and computer science input.

So, coming back to Reading the Riots… the Guardian were given this data set. Twitter were uncomfortable, as a brand, to be linked to the riots, particularly before the Olympics. They were happy to be linked to the Arab Spring, but not those riots. But the Guardian didn’t know what to do with that data, and this work was in the context of a parliamentary enquiry… We formed a multidisciplinary team, led by Rob Procter, and that was work with real and immediate relevance.

Something very personal to add here… I feel that I am something of a “border runner”, working within academia, with government, with industry. In my own time I sit on a World Economic Forum Council on Social Media. What is interesting in this moment is trying to have these discussions across these sectors, bringing perspectives from academia to industry… and I think that border running is really important.

So, back to big data. Gartner (in Sicular, 2013) defines big data as being about volume, velocity and variety. And there is a huge industry built around “social data” and “listening platforms” but many of these are Black Box systems, not suitable for academic work where you want to understand what takes place beyond the screen. So there is a great set of provocations and challenges to big data from boyd and Crawford: about the mythology that big data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth and authenticity based on scale. They highlight the importance of critiquing claims of objectivity of data.

There are issues of the overwhelming focus on quantitative methods. And does data answer questions it was not designed to answer? How can we be sure we are asking the right research questions? We shouldn’t put data before research questions. And there are inherent biases in large linked error prone datasets, a really complex area. And there is a focus on text and numbers that can be mined algorithmically. Natural Language Processing works on stuff that can be mined, but what happens with that data we can’t easily mine? And I will talk a little on data fundamentalism…

Data fundamentalism is about the notion that correlation always indicates causation, that massive data sets and predictive analytics always reflect “objective truth”. The idea and belief in the existence of objects. And in that we can fail to situate ourselves in relation to that big data. And where are the critical big data studies? This is an important call to arms I think.

So, how do we ground online data? It’s important to foreground data and what we think the data can tell us. There is a tension in where people want to ground their data. When we talk about social media we need to think about whether we want to ground the participants as citizens, in their offline context as people. Governments do want to understand individuals as people. So, do we ground social media users in the real world, as citizens, in the offline world? Or do we want to ground our online users in that online world, social media users as social media users? So a Facebook user in the context of other Facebook users… this idea of grounding in the online world was pioneered by Richard Rogers in his research methodologies. So, for instance, in the riots one of the big key Twitter users was “Lord Voldemort” and, whilst there is a real person behind that account, it really points to those tensions of how we understand the grounding, whether offline or online.

Important considerations:

1. Asking the right question – research should be question driven rather than data driven. And honestly there is something troubling about the Riots work – it started with the data and it was donated by a company, it goes against many of my provocations here. But we have to be open to using the data that is made available – Twitter is fairly transparent in its data ecosystem and what is available.

2. Accept poor data quality and users gaming metrics – once online metrics have value users will try to game them. Approach this data with huge suspicion. Try to ensure that you critically investigate that data, ensure what you think you have is what you actually have.

3. Limitations of tools – they are often built in disconnected ways… they may be built by people with expertise other than your own research perspective… dealing much more with user requirements in tool building is central, but as researchers we also have to be much better at describing the limitations.

4. Transparency – researchers should be upfront about limitations of research and research design. Can the data answer the questions? Increasingly we struggle to know what the limitations actually are – factors include what companies give us access to, what limitations we have as researchers, as well as others we don’t envisage, even if trying to be transparent.

I wanted to talk about a paper I wrote on Big Data and APIs (Vis 2013), and those aspects we can be unaware of… I am very keen that we have to be clear about how we create this data… it isn’t ready and waiting for us. We co-create that data. We need to be much more aware of APIs, of the tools that we use. So for instance Twitter lets you access three free APIs (Streaming, Search, REST), you have to understand from the outset which you need and what implications that has, and often you may want all three APIs. There are a number of API sampling problems. Now, if you have a lot of money to spend – as commercial companies will do – you can access the “FIREHOSE” – all of the tweets. But the Streaming API is a 1% random sample of the firehose… but it’s not totally random. I spoke to them and gave them a grilling on this. Twitter could do a whole lot better to explain how that 1% is being selected, what is and is not included, so that we understand what it is we are dealing with. From the Search or Streaming API, if you are not rate limited in a timeframe, you may actually be collecting all the data. So the implications will all depend on the type of data you are tracking. If you tracked all the tweets from this conference we are unlikely to generate 3 million tweets… collecting all the tweets through Search or Streaming means we might get 100% of the data or very near to it… but for a major event like the Arab Spring or Riots it’s a very different beast.
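The point about rate limits can be made concrete with a rough back-of-envelope sketch. It assumes, as a simplification that Twitter has not confirmed, that a Streaming API connection delivers every matching tweet until matches exceed about 1% of total platform volume, and only a capped share after that. The function name and all of the volumes below are hypothetical illustrations, not real measurements.

```python
def estimated_coverage(matching_per_minute, total_per_minute, cap_fraction=0.01):
    """Rough share of matching tweets a capped stream might deliver.

    Assumption (not documented behaviour): delivery is capped at
    cap_fraction of the total platform volume per minute.
    """
    if matching_per_minute <= 0:
        return 1.0  # nothing to collect, so nothing is missed
    cap = total_per_minute * cap_fraction
    return min(1.0, cap / matching_per_minute)


# Hypothetical volumes, for illustration only.
conference = estimated_coverage(matching_per_minute=50, total_per_minute=350_000)
major_event = estimated_coverage(matching_per_minute=7_000, total_per_minute=350_000)

print(f"conference hashtag: {conference:.0%} of matching tweets")
print(f"major event: {major_event:.0%} of matching tweets")
```

Under these assumptions a small conference hashtag sits well below the cap, while a major event does not, which is why the same collection method can mean very different things for different research questions.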

But it gets more complicated… this data is absolutely the backbone of monetising these platforms. We are seeing new business models around enriched metadata. We did, until recently, see three big players here: Datasift, GNIP and Topsy. But GNIP has been bought by Twitter, Topsy by Apple… we can see a tweet for instance, but the metadata will tell us the context – how many followers the person has, what the connections are, etc. And that’s where the value is… so we have seen the emergence of a social data industry. We saw Social Data Week take place last year. Big Boulder, traditionally organised by GNIP but last month’s was ostensibly organised by Facebook, is another big key conference here. So this is some of the wider context in which some of our research is taking place, we are at the mercy of this industry, and how data is made available.

So… is this new enriched metadata that companies sell/want to sell actually useful? For academia, industry and government we are all interested in location and influence – geolocation and how influential users are and how their networks look, where the key nodes and influencers are for sales but also for spreading policies or curbing negative spread.

So, the difference between social media and social data. Last year Martin Hawksey spotted that when you sent a query to the Twitter search API you used to get a small amount of data back, but now they are giving you about four times more data: much much more context, to help you understand better what individuals are doing. But I get suspicious when I see this… is this stuff they could give us before? Where is it from? Is some of it made up?

New Profile Geo Enrichment – a GNIP product that came out last year… on Twitter you can click the geolocation pin to switch on for all of your tweets to be giving an exact Lat/Long geolocation. This is the gold standard of geo location. But only 1% of Twitter users are comfortable to give away their location all the time… and this is a really skewed group of users. So 2-3% of tweets in Firehose have geolocation. And those tend to be early adopters who are comfortable with sharing that data, do not have privacy concerns. So this new GNIP tool uses your biography and the location you can state there to parse your proxy location. The crucial thing here is that many many Twitter users do give a location in their biography… so a company like GNIP claim you can hear from all Twitter users, that the data is representative… to find the people discussing your brand in which location… This tool also parses tweets that mention a location…. now you don’t have to think too hard to see some issues there. This “enriched” metadata product mashes together gold standard geo location data with all this other stuff.
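One way to avoid treating mashed-together location data as uniform is to keep track of where each location came from and report the breakdown. The sketch below is only illustrative: the `geo_source` field and its values are hypothetical names invented for this example, not fields from any real GNIP or Twitter payload, and the records are made up.

```python
from collections import Counter

# Hypothetical records: "geo_source" is an illustrative field name,
# not a field from any real GNIP or Twitter payload.
tweets = [
    {"id": 1, "geo_source": "exact_latlong"},
    {"id": 2, "geo_source": "profile_location"},
    {"id": 3, "geo_source": "profile_location"},
    {"id": 4, "geo_source": "mentioned_place"},
    {"id": 5, "geo_source": None},
]


def location_breakdown(records):
    """Count records by how their location was obtained."""
    return Counter(r["geo_source"] or "none" for r in records)


breakdown = location_breakdown(tweets)
gold_standard_share = breakdown["exact_latlong"] / len(tweets)

print(breakdown)
print(f"exact lat/long share: {gold_standard_share:.0%}")
```

Reporting the share of exact locations alongside any geographic finding makes it clearer how much of the analysis rests on proxy locations.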

And there is another issue… people often delete tweets, and they often delete tweets with exact locations. In principle the Twitter API will send data to the tool but you have responsibility to check that. People batch delete locations. They also delete content.

Back to that Geo Enrichment of profiles… they are linking data and talk about “unlocking” demographic data and other information that is not otherwise possible with activity location. But how do we conduct the checks and balances we want and need to do to actually use that data in research?

We are obsessed with influence, ranking, lists… and we are also increasingly concerned with how influential we are as individuals on social media, maybe not everyone in this audience but a lot of social media users. So you have companies like Klout who rank you on social media influence… but change your scores based on which tools you connect. And they would create dark profiles – harvesting data and creating profiles even if you are not interested. Mining your data and processing and profiling you whether or not you want to. And the results of who influences you, and who you influence can be bizarre… people you’ve never seen are apparently your top influences. Direct Messages appear in key moments…

And Klout is a gamified space… they reward users for giving data… more data = more influential?? And of course there is the tension between online or offline influence. Up until recently Justin Bieber had a perfect Klout score… is he really more influential than, say, Obama, offline?! And you can buy your Klout score… the site Fiverr for instance lets you buy a Facebook Girlfriend, or boost your Klout scores… this stuff is out there… these tools exist…

So in April 2013 Mitt Romney decided to buy 100,000 extra followers in one day… a huge spike in one day was suspicious and he was found out. There are as many as 20 million fake follower accounts out of the 200 million active users – that’s from last year – so 10% of the Twittersphere are fake followers. And that doesn’t count spoof accounts. If we think about offline data sets… these should make us incredibly nervous… but we forget to be critical about this stuff and we should be.
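The arithmetic and the "suspicious spike" idea can be sketched in a few lines. This is a minimal illustration only: the daily follower gains are invented numbers, the function name is hypothetical, and flagging anything above ten times the median is an arbitrary threshold chosen for the example, not an established detection method.

```python
from statistics import median

# Share of fake accounts, using the figures quoted in the talk.
fake_share = 20_000_000 / 200_000_000


def flag_spikes(daily_gains, factor=10.0):
    """Return the indexes of days whose gain exceeds factor x the median gain."""
    baseline = max(median(daily_gains), 1)  # avoid a zero threshold
    return [i for i, gain in enumerate(daily_gains) if gain > baseline * factor]


# Hypothetical daily follower gains; the fourth day contains a bulk purchase.
gains = [3_000, 2_800, 3_100, 100_000, 2_900]
suspicious_days = flag_spikes(gains)

print(f"fake share: {fake_share:.0%}")
print("suspicious day indexes:", suspicious_days)
```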

One more word on Klout… GNIP is now partnered with Klout… we can now buy Twitter data with Klout scores… and those could really skew our research.

We really need to be better at describing the limitations of our data. We have to see APIs as data makers: once data is linked it is very hard to untangle how metadata is constructed and where problems might be. That includes deleted content – people delete for many different reasons. And we need to think of ourselves as data makers as well. And when creating a dataset it is important to describe how it was made, what the limitations are. You have to be suspicious of your data, to verify it, to describe that process. And how do we do that in a standard journal article – perhaps we have to have a more detailed account elsewhere of how our data was created.
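A more detailed account of how a dataset was created could be as simple as a small structured record kept alongside the data. The sketch below is one possible shape for such a record, not an established standard: the class name, field names, query and dates are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetProvenance:
    """Hypothetical record of how a social media dataset was made."""
    api: str
    query: str
    collected_from: str
    collected_to: str
    rate_limited: bool
    known_limitations: list = field(default_factory=list)

    def to_json(self):
        """Serialise the record so it can be published with the dataset."""
        return json.dumps(asdict(self), indent=2)


# Illustrative values only.
record = DatasetProvenance(
    api="Streaming",
    query="#examplehashtag",
    collected_from="2014-07-01",
    collected_to="2014-07-02",
    rate_limited=False,
    known_limitations=[
        "deleted tweets not re-checked",
        "no exact geolocation for most tweets",
    ],
)

print(record.to_json())
```

A file like this can sit next to the dataset or in supplementary material when a journal article has no room for the full description.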

Tools as data makers… I increasingly see research projects designed around tools that will get them the data. That massively narrows the scope of what we are looking at… if that’s what we do, what kind of research landscape are we building. I essentially see the same Twitter tool being built over and over again. We do have to focus on the questions. So we really need to understand this as a very dynamic field where humans and tools co-create data. And we have to avoid thinking about social media as lots of data, and that it is for people who work with data to build those. Instead we have to have a good understanding of the platforms themselves. What kind of domain expertise do we need in this field? To do Twitter research you need to understand the platform, you also need to be a user of that platform.

So, what’s the future? Well we need to address what gets left out – all the stuff we are not looking at right now. One thing that gets left out is images, very little research on images but 750 million images shared daily, not reflected in research. Images grab our attention, key to engagement for companies. iPhone world’s number one camera. Top cameras on Flickr all iPhones. A camera used to be for special occasion. Smartphones are always on us… we take selfies, everyday snaps, but also witness to events. And smartphone penetration is really quite high – 65% in US, similar in UK – going along with this is mobile web access, and that’s shifting what we could look at… And we see a rise of platforms focusing on visual content… Pinterest, Tumblr, Instagram, Vine, Snapchat. But academia just getting a handle on Twitter… and we have to move on again. And so many issues of ethics. We have issues of ephemerality… how do we research Snapchat? Through interviews with users? Through using it directly ourselves? Snapchat is a really important new player. 400 million images shared every day… we should be researching these areas.

In response to how images are being used Twitter has changed how we see images… now showing them inline. And saw a huge boost in RTs for inline pictures – changing practice in platform and in user behaviour, so important changes.

So, in the future, we need to think about pitfalls, limitations, and think about what are we not researching. I will be working on an image project in the next year. Images are not easy to mine. Maybe we are avoiding things it is hard to draw meaning out from. Images do, however, have huge interest in industry – and may move way ahead of academia, though we can learn a lot from certain developments. We need to switch our focus to understanding what all of this means… why are people doing this? How do we understand this social world?

Social Media for Informal Minority Language Learning: Welsh Learners’ Practices – Ann Jones, Institute of Educational Technology, Open University

This is an educational case study on minority language learning, specifically Welsh. This will be quite a straightforward talk on the challenges, the literature and this case study. We are quite a small number in this room but here is a map to locate Wales and to get a sense of Welsh speakers by local authority… So Cardiff is south. Aberystwyth, with a high number of Welsh speakers, is in mid Wales.

Welsh is a very old language, from about the 6th century. It was the main language until the 1900s. Now about 20% of the population speak Welsh (~560k). The distribution is very uneven. In Cardiff it’s 8% of people, in Aberystwyth it’s 42%, in Caernarfon it’s 88%. So 2 challenges… small number of speakers, and uneven distribution. As a learner wanting to practice that can be tricky.

We thought about how one might be able to overcome this a bit online. Last year Lamy and Zourou (2013) “Social networking for language education: two particular foci: identity and community building” was really helpful, and included focus on minority and heritage language learning. Zourou (2012) talks about 3 terms for language learning:

1. social media as a set of tools

2. social network sites

3. language learning communities – more than just tools to learn the language, sometimes including peer assessment.

And I’ve also drawn on Conole and Alevizou (2010) and their typology for SM for language learning. And they talk about media sharing; instant message, conversation and chat; social networking; blogging and microblogging. In the study I looked at, microblogging was quite important, even for beginners. But for me I needed to add another aspect…

In terms of studies of Welsh there is quite a lot on the status of Welsh on social media, not so much for learning, so what happens if you are a bilingual speaker – do you speak Welsh or do you speak English? Honeycutt and Cunliffe (2010) looked at Facebook, found quite a lot of use… groups that ranged from tiny numbers, to those in the thousands… later studies haven’t been quite so positive though. And on the social identity of Welsh learners (Prosser 1986) looked at how Welsh isn’t usually learned in order to communicate, it is about identity and your relationship to Welsh identities.

So, an informal Welsh learning case study. This was a small study with quite lengthy interviews with 12 learners. This gives you an indication of what they were doing – all made some use of social media. Even beginners used Twitter – if only to follow a tweet of the day in Welsh. Some also used email if they felt confident and it was about exchange with another learner.

There was one community that has grown up, 30k participants, called Say Something in Welsh. It talks about Welsh learning podcasts. It emphasises communication skills. It was used by and referred to by many of my participants. They have two courses, a forum, a weekly newsletter, and an Online Eisteddfod – encouraging learners to take pictures, write plays, etc. And they also run physical bootcamps for intensive speaking practice. There are local meetings. There are 30,000 users. And it is run by passionate people so that forum is very actively monitored.

So I want to give you some examples of social media use. Media sharing is an obvious one, and tends to come top in informal language learning. They were watching and sharing TV, often via app, and they watched kids programmes and programmes for learners. But as their learning progressed it changed. So a participant talked about listening to a documentary and understanding a little bit for the first time. Many listened to Radio Cymru – at work it didn’t distract them but felt it was training their ears. And there were materials on YouTube, music downloads, BBC resources for learners, etc.

In terms of instant messaging and chat, even if not that well progressed, were used. Emailing was part of this. Skype was particularly useful – both audio/video and text chat. And texting also part of the mix. And the forum included hugely detailed and caring discussions of detailed language use, such as correct use of “i”.

The social media spaces here were basically only Facebook. A participant here talks about having a Welsh Facebook page – and using the spell checker as part of that process, quite a sophisticated learning use. And learners talked about using Facebook to bring learners together… for instance Welsh learners in England who used Say Something in Welsh to set up and support meet up groups – see Welsh Learners in England Facebook Page for instance. An online space and advertising that complements in-person activities and meetings.

I mentioned that there was a really active forum on SSIW. There is real encouragement, sharing of experiences, etc. One of my interviewees talked about going to Wales for a week, looking for resources, and downloading resources onto his smartphone. And how he was using that. And he talks about going into a shop and being understood. So access to that online course and community has been key to his understanding of the language.

Conclusions. I did a small scale study here. All use social media but their use varies. And it changes from a beginner to when you become more experienced. Most commonly they shared media, used it to interact, used SNS – usually Facebook. SNS successful in connecting learners. Experienced learners particularly creative in supporting other learners – perhaps because of the identity of Welsh learners. SSIW has been particularly successful.


Q: Have you looked at how Welsh learners adopt new English words… when we have new words related to technology and whether there are common words…

A: People do ask each other. Perhaps similar to other minority languages there is a board of language, and when new words emerge they discuss what they should call that… some are quite amusing. “Microdon” was the word for Microwave, but popularly known as “Poptiping” because of the noise it makes.

Q: Was there a spread beyond the group, that people were drawn in?

A: I didn’t look at it, people at Glamorgan did, but I’m not sure that it did. They say online communities often mirror offline groups. For welsh community some mirroring. Different for learners though. Don’t

Q: Social media communities around politics are often the most active – do you think that the political aspect of learning and speaking Welsh is important – would the community work similarly for other minority languages without that political aspect or is that political baggage important?

A: lovely quote I had about technology as a boon but also a real issue – because community is so big online. Welsh government funded rugby union for bilingual website and they hadn’t done it… they are located down south. Has to be a real push. And meanwhile remote communities still don’t have broadband, meanwhile driving with dongle to do homework… definitely a significant political element there.

Social media initial public offerings (IPOs): Failure and Success Factors – Piotr Wisniewski, Warsaw School of Economics, Poland

I will be talking about social media commercialisation, the learning curve and some of the investment challenges. The Global Social Media Index. And some takeaways from key IPOs.

Social media organisations increasingly tapped public stock markets yet, despite appeal and improving economics, the success of several high profile IPOs has been rather lacklustre. Social media have been very popular with younger generation but this is changing. We see them setting trends in the economy. We see projected demands as role of social networking rises. Their primary focus fuels expected growth – the young will become more affluent over time. They are seen as democratic resources because of their ease of access.

From an investment point of view social media can be seen as facilitators of existing offline operations. But you can also look at social media as an asset for investment per se, and that’s my concern.

You have seen growing awareness of social media by industry, and adoption of them. There are critical challenges though: business metrics and KPIs are difficult for social media. Social media stocks represent very different business models so hard to benchmark them against each other. And that makes it hard to put a safe valuation on them. Further many business models have been hard to monetise. They have been popular with users but it is hard to monetise that. Most social media companies are “hit driven” so they have to innovate to remain relevant and interesting to stock holders.

Global Social Media Index: the companies primarily looked at to see the trends for investment stories tend to be those with public status and global outreach. Not only local presence but a global dimension. Which usually, because of languages, have to mean sites in global languages.

In terms of the SOCL Key Components we see a real focus on US and Chinese companies, Facebook and LinkedIn significant here.

Some social media stocks got off to an inauspicious start, they are seen as highly volatile stocks. We see most indices outpacing Social Media stocks initially, at their floatation, but then they recover losses over time. We see quite a bit of volatility but we see more favourable Sharpe Ratio… So they have gained ground in terms of risk adjustment over time. Looking at SOCL financials we see LinkedIn as one of the most highly valued stocks, partly about the variance in business models.
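For readers unfamiliar with the Sharpe Ratio, it is commonly computed as the average return in excess of a risk-free rate, divided by the standard deviation of those excess returns, so a volatile stock needs higher returns to score well. The sketch below uses that common textbook form, without annualisation; the return series are invented for illustration and are not taken from the talk or from the SOCL index.

```python
from statistics import mean, stdev


def sharpe_ratio(period_returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free_rate for r in period_returns]
    return mean(excess) / stdev(excess)


# Hypothetical monthly returns, for illustration only.
social_media_stock = [0.02, -0.01, 0.03, 0.00]
broad_index = [0.01, 0.00, 0.01, 0.00]

print(f"social media stock: {sharpe_ratio(social_media_stock):.2f}")
print(f"broad index: {sharpe_ratio(broad_index):.2f}")
```

In this made-up example the stock has the higher average return but the lower ratio, because its returns swing more widely.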

As we look at the IPOs, many floatations were made when no clear path to commercialisation and monetisation could be seen by investors. Timing of some IPOs was not so good in some cases. And there were issues of IPO management – aggressive pricing made it difficult to successfully list them on the stock market.

I would say the conclusions that could be drawn from the information on IPOs… whoever brings them onto the market has to pay attention to timing, timing is critical. Pre-IPO integration is important, to make the route to commercialisation more clear. And the IPO management has to be done better in order to limit the mishaps that occurred in, say, the Facebook IPO. Has to be a more coherent process and perhaps with more conservatism on the pricing side.


Q: Why are social media attractive on the markets?

A: See a broadening and widening of customer base. Public markets are susceptible to trends, to public interests. The stories behind the IPOs are attractive. We have a young customer base, a loyal customer base.

Q: How are they valued? What is the product?

A: Earnings, cashflow, projections for instance. The service is networking among people, the product is advertising on the whole. Some applications are paid for… some models more commercially viable than others. Investors have those doubts too… looking for clear path to commercial success. LinkedIn is valued high for that reason… may be too high…

Q: Is LinkedIn so high because it has a more traditional model, a recognisable recruitment model almost

A: It has high quality users, graduates, professionals, and high quality networks that are particularly of interest to investors.

Q: Has the perception of Facebook and transparency changed since the IPO?

A: Arguably it is more transparent now, since the offering. But still questions about commercialising and monetising. But they have come a long way.

Pro-Am Writing: Towards a framework for new media influence on Old Journalism – Andrew Duffy, Nanyang Technological University, Singapore 

I started here by looking at travel writing, the professional travel writers – often armed with trusty notebooks – and the amateur travel bloggers, usually armed with laptops. And you might ask whether this is a serious area of study… but media frameworks influence public perception and reflect public opinion (Curran 2002), media shapes world view, provides shared symbols and language (Keller 2002). And the media can change perceptions and behavioural intentions (Hsu and Song 2013).

But let’s turn that around… tourists are now an important media source for the public (Duffy 2014). The ambulant traveller can tell the travel writer where to go and what to do when they get there. So I came up with three research questions on the user generated sites. So far we have looked at 18 travel journalism students from the UK, Finland, Singapore, China and Taiwan. They planned their articles before a travel-writing practicum to Istanbul. I did a survey and one hour interviews on their experiences.

The first thing they did was to look at background information. And I was surprised at how very vague they were… “about Istanbul”, “Turkish culture”, etc. They looked up sites they had heard about “Blue Mosque”, “Hagia Sofia”. They also did specific travel article searches… for “Traditional Turkish Hamam”, “Istanbul moustache transport”. Everything coming back was mainstream, they wanted to be different… so finally they searched for off the beaten track information.

Now they mostly started with Wikipedia/Wikitravel. They were a bit embarrassed and nervous about them. But as a basic starting point it was worth doing it. None mentioned the collaborative nature of those sites. Then they went to Lonely Planet forum and TripAdvisor. Seen as trusted but often seeing only the obvious stuff. And a smaller percentage of students went to blogs by travellers and residents – seen as insider’s viewpoint, authentic… but also seen as rather boring because they were every day. A dichotomy there.

Motivations for using UGC… students noted for Wikipedia – “anybody could write it, hell even I could write it” as if that were the last word in dubious authorship. For Trip Advisor it was seen as a method of verification. Often people start with the top 10… or if they want to be obscure they look down at number 53 or something more obscure.

For blogs it was important for the reader to decide whether or not the author was on the same wavelength, the same personality as the individual. They make a quick assessment. But seen as giving you new information you don’t find anywhere else. And at the end I had to prod them about whether they used Facebook, Tumblr, Twitter… we are told they are digital natives, told to use Twitter or Instagram… but they go there as a last resort. One said that Twitter might be up to date but wasn’t sure how to search it… another used Facebook pages for a club to find out about it… but it surprised me how grudgingly they used these spaces.

UGC is sought for alternative travel ideas, off the beaten track and real life as it is lived, an authentic traveller experience. Instead it delivers mainstream attractions (no social reporting), reconfirms existing knowledge first, authentic tourist experience. This desire really focused down their trips into really mainstream activities.

So I’m trying to put together a framework for future studies, good practice professional journalism values combined with UGC equivalents. So for news “impact on many” would equate to “must see, must do”. All of these students researched using Google, no one questioned results on front page. Four went to a page sponsored by a hotel, none of them noticed. They were not aware of SEO. These are communications students, they should know better. So what is the influence of UGC on travel journalism? Well many of these factors add up to popular, mainstream, recentness, and a focus on personal experience… that limits how people see the outside world. Self trumps destination – we are producing a generation of travellers that place themselves above their destination. Classic news value in journalism is objectivism, but subjective experience is the outcome from this authenticity as a gold standard factor in UGC. Quite an interesting aspect.

It pushes towards mainstream activities, replicates mainstream media conventions – research on NYT travel pictures sent by users found both those replicating conventions and the jokey tropes for instance. A real focus on tourist activities. A focus on personal experience and the self. In the mainstream rather than the independent. In theory the internet should be freeing us from monolithic media makers, but the opposite seems to be happening. And, as I mentioned, they didn’t really discuss the effect of SEO and how that pointed them towards the mainstream. I came across a great tool that forces you to page 11 of Google – to see the soft white underbelly of the internet.

So they wish to blaze a trail when they actually follow in others’ footsteps.

Q: Are those journalistic frameworks still relevant, is that idea of objectivity still relevant when mainstream media is moving to subjective terms, columnists etc. Objectivity isn’t what is seen in the same way now, may be influenced by social media but much bigger than that.

A: I did think I’d be asked about that. These values are from textbooks, long standing values. Whilst these students may want to end up being a columnist, they have to do that objective stuff, that socialisation, to enter the media, to reach that point.

Comment: reminds me of Ira Glass concept of “The Gap”- the idea that you ape a style you like but have great difficulty creating to that level until you have had a lot of practice.

Q: Could the lack of use of Twitter be about students seeing Twitter as a messaging service? My students certainly see it that way.

A: I don’t think so, as journalism students they see Twitter as an information source, but they didn’t search them.

Q: Why did students trust Trip Advisor, and use in preference to Booking.com or similar.

A: Partly because it is so well known, it also appeared very high up in the search results. But they were embarrassed about using it, like Wikipedia, as created by amateurs. Much more comfortable looking at journalistic sources and newspapers, especially British newspapers, appear highly in search results.

Q: Let’s flip this round a bit… what would you do as a travel site to be used more?

A: If I was going to do well paid consultancy for travel websites I would tell them that they should use the first person. They saw third person as promotional in tone. They much prefer first person “If they did, then I could do it too”. Why can’t you write in first person in a blog style on the Istanbul website? Need to break away from third person.

Q: Doesn’t that link back to the point made earlier to the objective versus subjective voice. They prefer subjective account.

A: This was the revelation to me… that thing of subjectivity being what they look for, that being the internet way… the impact on journalism is likely to be a significant thing.

David Gurteen – Towards Smarter Socially Mediated Conversations

Let me take you back 12 years… I used to go to talks in London on knowledge management. And afterwards we would go to the pub to chat. Some were good but many were not so good… And on those nights the pub was the best bit, that was where the real connections and learning took place. And so, I decided to set up Gurteen Knowledge Cafes and that’s what I do now, I travel the world arranging these sorts of discussion events. People started to ask me about having those conversations online, but I was focused on face to face engagement. But when I was asked to speak here, I thought about what I would really see as being important to creating the right sort of online environment for good conversations.

For those of you familiar with the cafe it’s a really simple process… a way of getting people together around conversation on a topic of mutual interest. It’s a very open format. Typically a speaker makes a short presentation, poses a short question. People gather in groups for conversation. And ideally we come together to share those conversations, what we have learned from them.

More by accident than anything I have ended up running these cafes across the world – in the UK, Spain, Norway, Russia, USA. etc. I could share many many stories. I ran a Knowledge Sharing Workshop in Jakarta in 2007, but I’d run one the day before in the Dutch Embassy. I realised that English language skills were not great and that meant people dried up, the conversations were not going to work. So I realised that I didn’t need to talk, I let the group engage in their own language, and my host indicated how it was going on. I learned the importance of allowing people to converse in their own tongue. Even when you know a foreign language well it can be hard to have fluid chat.

And a year later in Malaysia, in 2008, I ran a cafe as part of an IBM workshop. What I find is that at the end of the first conversation it’s good to move people to other groups… I did that here and nobody moved at all… my immediate reaction was curiosity… my host, who was Chinese, said “don’t worry, I know the culture! I’ll make them move for you!”. So I said to go ahead. He told them to stand up, and then asked a few to change tables. And no one moved. And someone there said they didn’t want to move and that I had said that I didn’t make them do anything, and they didn’t want to. They had all arrived in their own groups, they didn’t want to leave their comfort zones. People are not always relaxed about talking to strangers. In future I’ll try asking everyone to move…

In Thailand a week later (2008) I had a big sign up but a small group arrived, the rest wanted to watch and were doing so via a web cam. And when it came to conversations the Americans, Brits, Aussies, Indians joined in big conversations. Thai people engaged in small groups but not in that big group. A real lesson there for me about the comfort of speaking outside your own group versus inside your group.

And the most moving one for me, in Abu Dhabi in 2011, I ran a session with Arab men and Arab women. They weren’t really mixing but I asked them to mix a bit. At the end one man came up to me, quite agitated, quite upset. He said he had, until that day, only spoken to his mum, his wife, his nieces, four women. And I realised how much we don’t know about each others’ histories and backgrounds.

There are so many stories, I’ve boiled it down to key barriers:

  • Poor English – quality and confidence of English
  • Fear of loss of face, of looking foolish, of other dominant people
  • Fear of causing someone else to lose face, particularly people in authority
  • Deference to authority, I saw two people at a workshop in Singapore not engaging but the next day they were hugely involved and the difference was that the CEO wasn’t there the second day which meant no risk of looking foolish or making them look foolish
  • Humility – fear that the individual doesn’t have anything to add, to say of worth
  • Culture – a Chinese woman I met in Norway talked about education as being about sitting quietly, sitting on hands, the teacher talked at them and they could never ask questions, and they were taught to never ever question superiors. She knew that that wasn’t what she wanted to do but it was ingrained.

These traits are dominant in SE Asian Cultures but also exist in our Western Cultures.

These last few years, as I’ve become more interested in conversation, I’ve started to investigate the research on conversation and I just want to draw out some highlights. In “Why is conversation so easy?” (Garrod and Pickering) the researchers find that humans have evolved for conversation, rather than monologue. Influence of group size – above about 5 people it is no longer a conversation but a series of mini presentations or monologues (Fay, Garrod and Carletta). Small groups engage, larger groups tend not to. “Friends (and sometimes enemies) with cognitive benefits” (Ybarra, Winkielman, Yeh, Burnstein, Kavanagh) – I’d never thought of that before but I have found that having some friendly ice breaking chat at the beginning of a session really changes the energy. And on social sensitivity (Williams Woolley, Chabris, Pentland, Hashmi, Malone) the researchers find that groups where one person dominates are less collectively intelligent than groups where the conversational turns are more equally distributed.

So this and other research I have read about has made the cafe evolve… and I have established principles that underlie any good conversation:

  • Relaxed, non-threatening, open conversation (close to a pub or cafe conversation)
  • Everyone equal; no table leaders or report back
  • No one forced to do anything  – it’s ok to just listen
  • Trust people to talk about what is important  – it’s ok to go off-topic. For conversation to be engaging it has to have a flow of its own.
  • No capture of outcomes – outcomes are what people take away in their heads.

So, the question I have for myself, that I’d like to share with you, is what does this mean for online discussion forums and a potential virtual knowledge cafe? How would I do it given all of those issues. Now, I may not be so bright here but there are many issues here…

English tends to be the dominant language. Large number of people. Open to anyone. No idea who is in the forum. Do not know the people. No idea of the authority figures. No idea of the trolls. Everything is recorded. Maybe not surprising that we have the 90:9:1 law (90% lurk/read; 9% occasionally engage; 1% of really active users). Perhaps not surprising given the experiences of the conversation sessions I talked about before.

And then we think about the nature of many forum conversations: posts tend to be monologues; posts often very lengthy; grandstanding; responses carefully thought through; more debate/argument than dialogue; trolls and “intellectual trolls” thrive; easy to misunderstand someone; not easy to correct misunderstandings.

So, what’s the solution?

I don’t have the answers but I have some ideas. I think we need safe spaces where people can speak in their own language. I think you do need to have some conversations that are peer only. I think you need to know who is in the room, make it clear who is in the forum. The ability to edit or delete posts – to get rid of something that goes too far. Do not store threads for long. Small groups – of 3 or 4 people – and I don’t see anything on the web that does that. Permission to join conversations. Limit the size of posts – Twitter we use to some extent… it is not a conversational tool though. Perhaps limiting a forum to 500 characters would work. Real time discussions may make things more useful.

So… Randomised Coffee Trials…

In large organisations not easy for people to connect and build relationships. RCTs pair people at random for coffee once a week. Bank of England connects 4 people and call it “Coffee Fours”. SABMiller have pub chats! Lots of companies and organisations like NESTA trying this. But there is also telepresence as an option – would like to try out at my cafe some time.
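As a rough illustration of the pairing idea, here is a minimal Python sketch of how a weekly random draw might work. The function name, the names in the list and the rule for handling an odd person out are all assumptions made for this example, not how any of the organisations mentioned actually run their schemes.

```python
import random


def coffee_groups(people, group_size=2, seed=None):
    """Shuffle people and split them into groups of roughly group_size."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    groups = [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]
    # Assumed rule: fold a leftover single person into the previous group
    # so that nobody ends up having coffee alone.
    if len(groups) > 1 and len(groups[-1]) == 1:
        leftover = groups.pop()
        groups[-1].extend(leftover)
    return groups


if __name__ == "__main__":
    staff = ["Ana", "Ben", "Chen", "Dita", "Eli", "Fay", "Gus"]  # made-up names
    for group in coffee_groups(staff, group_size=2):
        print(" + ".join(group))
```

Setting group_size=4 would give something closer to the “Coffee Fours” variant, and passing a seed makes a given week's draw repeatable.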

Before I finish I want to ask you a question… How do you think we could improve engagement in online forums and how do we improve the quality of those conversations?


Q: A comment and a feeling of camaraderie: working in India I have faced the same issues of hierarchy and fear of loss of face. At first I tried to impose my way of doing things. But when I let go and let them do it their way, that was a huge change.

A: The real issue here, is how do we do this online… but we don’t see the dark side of who is online.

Q: We had a quick conversation and what we came up with is that the visual cue is so important. Online you need some sort of visual cue to connect to the other person.

A: yes, and these telepresence machines seem the best option thus far.

Q: I have a solution. We teach online, we have students all over the world. We use WebEx and Blackboard. We share a question early in the week and students can then post on forums, or can use that real time chat online, students then roll with it, people do chime in. Small groups of no more than 10.

A: I’ll try and chat with you later.

Q: I’m glad that Pat has mentioned the live web conferencing – I was going to mention tools like Skype or Google+ Hangouts. But I also wanted to raise the issue of text and the permanence of text. If your main format for conversation is textual then it carries less permanence, it is more ephemeral. So I recognise that barrier but I think that barrier may be shifting. It seems odd for text to be deemed more permanent than the chat in the pub – which you certainly can’t go back and delete or correct.

A: I do also do a lot of conversing via text but it is a major barrier for many people, the idea that what they say could be quoted back verbatim to them or held against them.

Twitter based Analysis of Public, Fine-grained emotional reactions to Significant Events – Dr Martin Sykora

So I’m talking today about some research funded by the EPSRC. I will be talking about the background, including the software we developed in house for this work, and I will say a bit about the analysis, the data  analysis we have done, and I will also talk about some of our future work.

In terms of the significance, I wanted to talk about the significance of social media which has been really interesting over the last two years as it has been taken up. In Meier 2011 we see an Egyptian activist talking about use of social media to change the world. And we also see Twitter as a way to poll public opinion (O’Connor et al 2010; Tumasjan et al 2010) and that can be a real issue as well. And we do see social media breaking the news – not always the case but it is genuinely disruptive. And there is also a big commercial interest in social media – companies like Attensity, Crimson Hexagon, Sysomos, Socialradar, Radian6 etc. All that attention is appealing to commercial companies. We also see the crisis mapping communities interested in social media. And the security services monitoring social media (Sykora 2013).

Social media streams allow us to observe a large number of spontaneous real-time interactions and varied expression of opinion, often fleeting and private (Miller 2011). And unprecedented opportunity to study human communication. And we wanted to study a range of emotions and a range of heterogeneous emotional measures.

So we have created software called EMOTIVE and the emotions we used there used Ekman’s 6 basic emotions (Anger, Disgust, Fear, etc) as well as Shame. And we decided not to use lexicons but instead to build an ontology – a map of words so richer than a list of words. And basically what we did was we said what emotional terms and expressions people could use with basic emotions. We allowed for intensifiers, for negation, etc. We have over 800 words, phrases, and substring matching as well. This system analyses around 2000 tweets per second.

We built the ontology with an English Language and Literature PhD level research associate, with training in linguistics and discourse analysis during a three month time window. They looked at 600MB of cleaned tweets on 63 different UK-specific topics/search-terms datasets. We focused on explicit declarations of emotions. And we tested that and reviewed it. And we built a Natural Language Processing pipeline. This starts with data pulled in from the Twitter API, we had terms we wanted to monitor live so we collected new tweets repeatedly, polling regularly. For most events we caught most tweets, for some the rate limiting will have meant missed tweets.

So, the pipeline included checking whether a verb or a noun – helpful for understanding meaning of expressions. We used a tree often used in spell checkers to quickly match words and phrases… which means it is very fast! And you can use this to spit out the appropriate basic emotions. We checked this tool against manual and other techniques. It performed to good or excellent accuracy. So, we had a system so we decided to run this across some events. We used the Twitter Search REST API 1.1 and continuously retrieved during an event. And often the search term or hashtag chosen to find good data set, often trending. This was about being on top of the news and initiating the process – e.g. Nelson Mandela’s death – and ask for the system to gather tweets and being careful to do that in the right place (e.g. putting names in quotes).
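To make the matching step a little more concrete, here is a minimal Python sketch of phrase matching with a word-level trie (prefix tree), plus very simple handling of intensifiers and negation. This is only a guess at the general shape: the talk did not name the tree structure, and the phrases, weights, two-word look-back window and function names below are all invented for illustration rather than taken from EMOTIVE. The part-of-speech check mentioned in the talk is left out.

```python
import re

# Toy lexicon: every entry here is invented for illustration, not taken from EMOTIVE.
EMOTION_PHRASES = {
    ("angry",): "anger",
    ("fed", "up"): "anger",
    ("scared",): "fear",
    ("heart", "broken"): "sadness",
    ("ashamed",): "shame",
}
INTENSIFIERS = {"very": 1.5, "so": 1.5, "really": 1.5}  # assumed weights
NEGATIONS = {"not", "never", "no"}
END = None  # key marking "a phrase ends here"; tokens are strings so it cannot clash


def build_trie(phrases):
    """Store each phrase word by word so shared prefixes share nodes."""
    root = {}
    for words, emotion in phrases.items():
        node = root
        for word in words:
            node = node.setdefault(word, {})
        node[END] = emotion
    return root


def longest_match(trie, tokens, start):
    """Return (emotion, length) for the longest phrase at tokens[start], else (None, 0)."""
    node, best = trie, (None, 0)
    for length, token in enumerate(tokens[start:], start=1):
        node = node.get(token)
        if node is None:
            break
        if END in node:
            best = (node[END], length)
    return best


def score_tweet(text, trie):
    """Sum a weight per emotion, skipping negated matches (assumed rule)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {}
    i = 0
    while i < len(tokens):
        emotion, length = longest_match(trie, tokens, i)
        if emotion is None:
            i += 1
            continue
        window = tokens[max(0, i - 2):i]  # up to two words before the match
        if not any(word in NEGATIONS for word in window):
            weight = 1.0
            for word in window:
                weight *= INTENSIFIERS.get(word, 1.0)
            scores[emotion] = scores.get(emotion, 0.0) + weight
        i += length
    return scores


if __name__ == "__main__":
    trie = build_trie(EMOTION_PHRASES)
    print(score_tweet("I am so fed up with this!", trie))
    print(score_tweet("I am not scared", trie))
```

Each token is looked up once per trie level, so the cost of matching depends on phrase length rather than on the number of phrases in the lexicon, which fits the speed claim made in the talk.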

We did this for 25 distinct events, over 1.5 million tweets. And there are 28 separate datasets from this (http://emotive.lboro.ac.uk/resources/ECSM2014/). Not all tweets have an emotion though, about 12% do, standard deviation of 9%. But the five most emotional datasets related to particular news stories – mainly about the nurse who committed suicide in Australia, mainly shame. And death of Daniel Pelka also very emotional tweets. But something more positive – Chinese new year – did trigger lots of emotions. And some hashtags more emotional than others, even around the same event (#september11 anniversary tweets less emotional than those tagged #twintowers).

Around the Woolwich incident we see really interesting ranges of emotions – anger at Anjem Choudary after appearance on newsnight. Sadness, disgust and surprise around the incident itself.

Looking at the September 11th anniversary in 2013 we had a range of sadness and shock. But a few odd blips of happiness – some casually mocking, some claiming to be from terrorists. And then you have some odd tweets – more quirky mixes of surprise or disgust.

And we then have a graph of emotions across a number of events – #JamesGandolfini; Ariel Sharon; Daniel Pelka; Nelson Mandela. Mandela was ill for some time but surprise a strong emotion around his death. A reasonably high level of happiness around Ariel Sharon for instance.

But I want to go back to the death of that nurse. We have a lot of sadness, of shame, of disgust. The ones associated with her person high for sadness and shame. For the radio station you see happiness highish – use of sarcasm there but not for her personally because that didn’t seem appropriate for her.

So there were some basic correlations… we saw happiness-sadness negatively correlated (-.614). Anger-confusion are correlated (.444). anger-disgust (.370) etc. But interesting to see how these emotions correlate with mentions in tweets (-.402) – interesting but based on a small data set. So we want to analyse a much bigger data set.
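For anyone wanting to see what a figure like -.614 represents, here is a small Python sketch of a Pearson correlation between two emotions across event datasets. The talk did not say which correlation coefficient was used, so Pearson is an assumption, and the numbers in the lists are invented shares of emotional tweets, not values from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example: one value per event dataset.
    happiness = [0.40, 0.10, 0.05, 0.30, 0.15]
    sadness = [0.05, 0.35, 0.50, 0.10, 0.30]
    print(round(pearson(happiness, sadness), 3))
```

A value near -1 means the two emotions rarely rise together across events, a value near +1 means they tend to appear together, and with only a couple of dozen datasets any single coefficient should be read cautiously, as the speaker notes.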

The other thing we did was clustering, looking for similarities of events based purely on emotional responses. So we saw bank holiday and chinese new year cluster together… some less obvious connections – Daniel Pelka, woolwich, horse meat and g8 summit. Interesting emotional clusters here, quite interesting.
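The clustering method was not named in the talk, so the following Python sketch simply assumes a basic agglomerative approach with average linkage and Euclidean distance. The emotion profiles are invented numbers chosen to mimic the kind of grouping described, and the function name is hypothetical.

```python
from math import dist

# Invented emotion profiles: share of tweets showing (happiness, sadness, anger).
profiles = {
    "bank holiday": (0.60, 0.05, 0.05),
    "chinese new year": (0.55, 0.05, 0.10),
    "horse meat": (0.05, 0.10, 0.45),
    "g8 summit": (0.05, 0.15, 0.40),
}


def cluster(profiles, k):
    """Repeatedly merge the two closest clusters (average linkage) until k remain."""
    clusters = [[name] for name in profiles]

    def gap(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i].extend(merged)
    return clusters


if __name__ == "__main__":
    for group in cluster(profiles, 2):
        print(group)
```

Events end up in the same group only because their emotional profiles are close, regardless of topic, which is why less obvious pairings can emerge.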

So, we have this tool. We want to look at racism for instance. Our future work will want to be with more data although, as far as we know, this is the biggest study looking at emotions. And we want to look at emotions over time and how they change.


Q: To follow up on question on timings of events and picking up trends… different times of day seems to change engagement online… people may not engage when they are at work.

A: A good point. We did look at volume of tweets over time so, for instance for September 11th anniversary you see activity all day, but daytime in US you see peaks. But that was a day and a bit only. But Prism and NSA was over a month. Mandela five days after his death were still quite active. But when we do time series analysis we will focus more on that.

Q: The reason I mention it is because you want the best data you can for when engagement is high.

A: It could affect outcome but we had the issue of not that much data in some cases, less of an issue. For us it was just data collection. Could be important in other studies.

Q: What did you do with tweets with more than one emotion in them?

A: We took it case by case, so we assumed you are expressing both…

Q: Is there a range for the emotion?

A: Like a score? Yes, the literature is there. There is a range for each expression, intensifiers etc. and we used that to work out the scoring of that intensity. And we have stronger and less strong words.

Q: But if you have one word showing both fear and disgust together?

A: Independent scores, yes, for both.

Using Twitter for What? – Lemi Baruh, Koc University, Turkey

This is a very small study on how people use Twitter – or report using Twitter – during Gezi Protests. This was part of the Cosmic project that looks at social media in crisis situations.

A bit of background: Turkey is ranked 154th out of 179 countries in terms of press freedom according to 2013 figures – it has gotten worse in the last year. Critics argue that the Turkish media companies have mainly changed hands in the last 7 years, the influence of the ruling president. And at the end of May in 2013 a relatively small sit-in protest against the removal of trees for a new redevelopment project in Taksim square was violently evicted. Protests spread around Turkey. Agenda evolved to move onto media and media bias (e.g. Turkish CNN ran a Penguin documentary during protest), often expressed via social media.

So we did a quick study with an online survey administered via Qualtrics. Survey conducted between June 10-June 29th. 10 days after protest started – as it took 9 days for ethics approval. We sent email invites and shared via social media. It took 15 minutes to complete and out of 890 who started the survey, 230 completed. 64% female, mean age 28. 54% indicated being students at higher education institution, Internet use of 4 hrs per day, politically active. In many ways this group did not represent Turkey in any way, even the protestors, but it gives us some indications and insights.

We asked the sample how they got news, before the protests they mainly used websites of newspapers, social media and some TV. But after the protests began a huge drop in use of websites of newspapers and big rise in social media usage. They didn’t necessarily trust it… but they needed up to date information, and a desire for first hand information. They reply to email, to a tweet, want to verify what is really happening. About 20% of respondents said that mass media did not cover the protests, another 16% said mass media were biased. These individuals talked about filtering and finding information themselves. For some social media was about getting the feeling of participation…

And when we asked about activities performed on Twitter during the protests we had them report that they frequently read tweets from accounts they follow, reading tweets from accounts that they do not follow, retweets and tweets were less often done. And we saw a lot of people undertaking information verification. They verify with friends on location, they check with multiple sources online, and they check with mass media/news sites. That is despite individuals saying that they did not trust mass media. Some people did searches for information, some did direct background checks.

So in terms of the results. We had respondents indicating the extent to which they would categorise their use of Twitter during Gezi Protests along a continuum from “voicing your opinions” to “share news/updates”.

In analysing the data we identified four types of Twitter users. Close to half were “Update Hubs” – getting information in, sharing onwards with minimal opinion in. Then we had about 22% of update seekers – using Twitter to read news/updates and for learning about what others have shared. Then Opinion Seekers (19%) seeking opinions. That remaining Voice Makers group (around 17%) were the actual opinion makers.

We compared these segments around uses and gratifications, focusing on surveillance, self-expression, relationship maintenance, connectivity. The opinion makers didn’t just use Twitter to share their opinions, but also to build their networks. And in terms of types of activities we saw a few significant differences. We saw most retweets from Update Hubs. Replying to tweets much higher in Voice Makers group.

The Opinion Seekers had significantly lower trust in information from Twitter than members of the other segments, interested in information verification, consciously checking information through multiple sources before resharing information. Voice Makers are less likely to cross check.

Conclusion. Well the main drivers of Twitter use here were mistrust in mainstream media, the desire for access to direct information, willingness to spread information and voice their opinions. Preference for Twitter did not necessarily mean that users trusted social media as a source of information. Cross checking across different social media was commonplace.

And the four segments, whilst all motivated to get information, had quite different preferences and characters.

And finally I would like to acknowledge my co-author Hayley Watson at Trilateral Research and Consulting in the UK, and the European Union for funding this research.


Q: I know your survey sample was skewed but how representative do you generally think that those who were tweeting about these protests were, compared to the wider Turkish population, or those interested in those protests?

A: The people actively tweeting during the event were like this skewed sample… but after the event the pro government side started tweeting much more actively. It’s reported that the current ruling party has actually recruited thousands of people to tweet on their behalf… we have reportedly got professional trollers for the party… Has shifted post event and now both sides are likely to be tweeting.

Q: Did you include data from those who did not complete your survey?

A: No, many of our respondents stopped when we ask about political views… they were happy to talk about social media but not about their politics.

Q: How did you do segmentation?

A: We used two step cluster analysis rather than hierarchical clusters – the latter I tried first, but didn’t work well for this data. Also tried random forest decision tree with the data – decided not to predict anything!

Q: Looking at your title.. As a marketer I am much more interested in segmentation and why you are focusing on particular controversial events.

A: The reason why this is happening is because this is a project funded by the European Union, we saw an opportunity to gather data for our research on crisis. But on the other hand we have just completed collection of data on American audiences on more general twitter using segmentation analysis. We did some work on privacy preferences which was quite revealing.

A Case Study of the Impact of Instructional Design on Blogging and terms Networks in Teacher-Training Course – Minoru Nakayama, Tokyo Institute of Technology, Japan

Social media can be useful in university courses – online discussion using blogs, wikis, discussion boards etc and can allow discussion and sharing of knowledge about the given concept with classmates, and promote critical thinking and interactive learning (Leh et al 2012, 2013). Good for fostering class discussion, attractive features of social media technology, sharing and collaborative filtering (Educause 2005). But the effectiveness may depend on the type of activity and that’s where instructional design comes in. And in terms of learning topics it can sometimes be useful to use a concept map concept.

So, how do you use concept mapping idea in online discussion? Well mapping discussion content (postings) to the concept map. Lexical analysis can illustrate relationships in discussion texts and individual term networks (Rabbany et al 2012). So we undertook a small case study in an online course looking at whether the online discussion can be illustrated using lexical graph visualisation techniques, and what features of this were.

The online course is fully online, at graduate level, on “Instructional Technology” which looks at how to design an online course. There are a series of assignments for the final projects which include discussion boards and blogs. They have specific blogging tasks which include a content task – a lesson plan for online course (to be posted to their blog); critique – every participant did a critique of two peers’ content and was required to address good/strength points; and the third task was suggestions  – every participant made suggestions for peers.

So in the case of critiques, the participants were required to address only good/strong points and suggestions. There were options for more controlled (critique) and open ended (suggestions) entries here.

In terms of participants only five students gave consent for us to use their postings so a fairly small sample and covering several different blog types. So, with that data, we undertook lexical analysis and mapping. We used TreeTagger to extract nouns and extracted consecutive nouns as 2-grams. These co-occurrence relationships can be summarised in an adjacency matrix, and that can then be illustrated as a directed graph. So you can see a score for each noun indicating the points of connection, and that can be graphed…. most nouns have some form of connection. We also gather this type of map in order to analyse the texts, and we can then look for points of connection, density of connections etc.
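As a rough sketch of the mapping step, the Python below takes noun sequences that have already been extracted (standing in for the tagger output), counts consecutive noun pairs, and builds an adjacency matrix plus a simple connection score per noun. The example nouns, the function names and the choice of weighted in-plus-out degree as the score are assumptions for illustration, since the talk did not specify how the score was computed.

```python
from collections import Counter, defaultdict


def noun_bigrams(noun_sequences):
    """Count directed pairs of consecutive nouns across all posts."""
    edges = Counter()
    for nouns in noun_sequences:
        edges.update(zip(nouns, nouns[1:]))
    return edges


def adjacency_matrix(edges):
    """Return (terms, matrix) where matrix[i][j] counts term i followed by term j."""
    terms = sorted({term for pair in edges for term in pair})
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for (src, dst), count in edges.items():
        matrix[index[src]][index[dst]] = count
    return terms, matrix


def degree(edges):
    """Assumed connection score: weighted count of incoming plus outgoing links."""
    totals = defaultdict(int)
    for (src, dst), count in edges.items():
        totals[src] += count
        totals[dst] += count
    return dict(totals)


if __name__ == "__main__":
    # Invented noun sequences, one list per blog post.
    posts = [
        ["lesson", "plan", "objective"],
        ["lesson", "plan", "assessment"],
    ]
    edges = noun_bigrams(posts)
    terms, matrix = adjacency_matrix(edges)
    print(terms)
    for row in matrix:
        print(row)
    print(degree(edges))
```

A post where most links run through a few nouns would show up as a few high scores, while a more scattered post would spread the scores more evenly.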

So, looking at graphs we can compare the number of words and number of 2-grams for both the critique and suggestions, looking for similarities, complexities, etc. and differences between critique and suggestions. When you look for closeness you see a real scattering of the critique and suggestions. And most terms were centralised in critiques. By contrast in the suggestions there is a much less central pattern to the use of words.

So in terms of understanding the blog communication, that allows us to build rubrics with specific criteria for particular activities. The lexical analysis can be used to directly evaluate post – concept map can represent term networks. Some features of the postings in terms are measured. The analysis can be applied to the course design – so we can compare the appropriateness of the online discussion design – the controlled versus open ended tasks. And some difference in centralisation of term mapping were observed.


Q: That’s a nice objective measure of different learning activities. Were all posts analysed for lexical analysis. Did you differentiate between the key posts and the social conversations. That conclusion that the closed tasks led to more focused discussion is good, it’s plausible, but may be missing sociable stuff.

A: This is a fully online course… students will discuss things beyond the course. We can analyse all of those posts but we didn’t in this work. The blog post can also be evaluated in other ways and the text analysis compared.

Q: These were graduate students, in what kind of class…

A: This is part of a teacher training course.

Q: I wonder if that makes a difference in terms of the types of posting being done, would it make a difference in your results?

A: They are students in instructional design. They may use social media that we cannot measure. But using the blog posts for the Instructional Design course gives us a point of focus to analyse.

A Massive Open Online Courses Odyssey: A confessional account – Alejandro Ramirez, Carleton University, Ottawa, Canada

Firstly thank you for being here, because a confessional account requires an audience! And my title has both aspects of learning… Odyssey – the original way to transmit knowledge – but also MOOCs! The most modern of learning.

I came to think about technology in education when we were redesigning the curriculum in my school, and we decided to start using social media tools as they needed that competence in these areas. And I was about to go on Sabbatical when the MOOCs exploded! I thought there was a lot of hype taking place but there was always the worry part of the idea being that they may have to force people to use MOOCs. So I decided to spend my sabbatical researching MOOCs. So I thought that I should start by learning more about distance learning, and to think about the context of MOOCs.

MOOCs are not a revolution, it’s more of an evolution. We now have students very savvy with technology – they engage all day long… or they could be engaging in technology without even going into the technology. So it’s not a revolution… and it takes a long period of time for things to change. Technology is more reliable today and that gives us competence to use it properly. If we have MOOCs today it’s because we have Wedemeyer that came up with the concept of distance learning. And that concept is about transposing what we do in these rooms today into a distance setting – engaging in conversations with each other, to learn things, to ask questions, how can technology enable that to occur? That’s the promise of distance education.

I teach a course at first year and a course at fourth year. You see a real change in the students. In first year you see them think the university will be the answer to all the questions that they have, and at the end they realise that it is up to them to make the change, to learn, to take those skills into the future with them. Notes in the 21st century is taking a picture on the iPhone. It’s about remembering the content, things are changing, they expect me to change too.

So I looked at various aspects of the research and decided I could use ethnography. Van Maanen (2011) advocates immersion as a student, so, I registered for a MOOC. That was the best way to understand what that experience is. So I registered in a MOOC to immerse myself. And I needed to keep track of the observer in me so that I could track the process. I wanted to be more aware of the process of learning using technology.

So, these big MOOCs were offered for major universities to reach out to wider audiences. You can view a list of courses, you have to create an account, and that’s it, you have registered. At first I was a bit skeptical that the Massive part might be an issue. At the end of the day I knew I would sit alone in my room doing the work. The Online part isn’t different from Open University or distance learning so I wanted to focus on the massive. What were my assumptions about what would happen in this course? I did the process, I did the homework, I viewed the lectures… and I recorded what had changed, how my expectations had changed. So, the first course was offered in fall of 2012, running September to November. It was offered by UC Berkeley via edX. That course was a foundation in Artificial Intelligence and also get hands on experience implementing AI algorithms in a video-game themed context. It included coding in Python which I hadn’t done before, I learned that online to do the course.

On day one I met the massive impact of the Massive factor. I had a question and there are lots of names, and TAs but there was no email address for them. There is a forum. I had no answer to my question. I still haven’t had an answer. That could have caused me to abandon the course, many did. 100k were registered. Less than 10% would finish the course. And of that only 5% had credit for it and passed. But that is the model. And we need to understand why, and what are the expectations for that…

So, to see what happens elsewhere I registered in a course on data science, offered by Washington University via Coursera. I started to engage but the same thing happened. Again you cannot ask a direct question of anyone, you have to use the forum.

Whilst I waited for that course to take place I was invited to take an ICT in Education course in Spring 2013, offered by UNAM via Coursera, in Spanish. But it was more or less the same thing. And there was more or less the same issue. And I needed to create two email accounts in order to be able to take part. I spent most of my day going through videos… these are really very good. Universities have learned from the YouTube generation and from TED. There are subtitles, you can pause videos. They are spiced up with some tests to make sure you are listening. Most questions and assignments need to be done only by watching the videos. And one of the things they have learned is that only the students who already have a degree actually watch the videos, others skip them. Not great. BUT it is a computer mediated teaching where the facility is within the tools that we use. But we have forgotten we can use technology to really engage the students. If we are able to capture the engagement of the students and reflect that back, see patterns, and maybe do that so that they can actually learn. Right now that is not available.

So in terms of my conclusions I see that computer mediated learning has had some missed opportunities. The computer is a means to an end… when it works… you want a conversation, the computer is just the means. It is the adaptable tool to help you to use the computer to achieve your goals and needs. And so there is opportunity there.

The second thing is that the computer is the other observer in an ethnography that we cannot use. It tracks what you are doing. And that could be used as feedback to the students, for them to understand habits and patterns.

And, since they are free, MOOC providers are not upset about students abandoning their courses. Hopefully we in universities can use them more effectively because they are great ways to engage and spread information. We can use the technology to learn to do things better, since our students are eager to use this technology.

I think that the future of MOOCs will be when we take out more than we pay for…


Q: sort of a question and observation. The original MOOCs were about collaboration and sharing and totally based on social media… and we are all a bit upset with this style of MOOC. But Harvard did a super one using a lot of social tools built in… it was about 2010… it was damn good and before the MOOC surge. I don’t think this style of MOOC is a dead end but I think we should be researching other ways of teaching crowds.

A: I think universities are presenting this option as a way to do research. But they miss the opportunity to empower the students who have signed up to the course. Maybe if they told me that they wanted me to be a research subject in that course, it would be different. If the issue is up front we can learn from that… bit of that history of massive opportunity, maybe it will change. We have to recognise that things have changed.

Q: I didn’t quite understand to which extent students acted as mentors to other students… a logical way to do this stuff at scale, based on pre-exercises perhaps. And secondly in our business school we have a different approach where our staff can select MOOCs and report on what is learned, have workshops on how to adopt and use this knowledge. And also that idea of credit bearing courses, the paying for credits. MOOC seen as input to knowledge sharing in classroom.

A: I learned in this research that we have technologies to empower students, not to allow me to suddenly teach 10,000 students. But a bit of your comment before… yes there were groups that started to emerge from these MOOCs. I had an invitation in Facebook for people taking this course, at this time, in a given language and lots of communities popped up like that. But when posting questions etc. there were so many threads… overwhelming… scale was so huge. There are opportunities but they need more management. But I like the idea of having it blended, bringing the MOOC back into the classroom.

Learning from others’ mistakes: how social media etiquette distorts informal learning online – Me!

A link to the Prezi will appear here shortly once all happily synced from my machine to the web – currently the web is a version behind!


Student Social Media Showcase (#SSMS2014) and Mixed Methodologies Seminar (#ECSM2014)

Today I am at the Student Social Media Showcase (#SSMS2014) and the Mixed Methodologies Seminar, both precursors to the European Conference on Social Media (#ECSM2014) which I will be at until Friday. I’ll try to liveblog most of the conference days but today I’ll be posting notes as this is a loosely structured day. The Showcase, being Storified here, brings together both students and academic delegates of the conference and, for the student social media showcase, over 100 local school children as well as local businesses and apprenticeship schemes operating in Sussex. Both the conference and today’s events have been organised by the Brighton Business School, at University of Brighton.

This morning, while the kids have been experimenting in the creativity suite, I have met the organiser of ECSM2015 (which will be in Portugal), and we have been hearing about the DV8 Sussex Apprenticeship scheme which has been placing students, aged 16 to 23, in businesses from very small cafes to big social media agencies, on specific digital media and social media apprenticeships. They spend four days a week at their employer, and one day a week at college taking a number of social media, digital media, and marketing modules. It sounds like a really interesting scheme and the two students we met this morning seemed like great representatives of the scheme – they will be running hands on experiments in running mini campaigns for the students.


Asher, one of the main organisers, is talking about social media and how central it is in business and marketing, and the business school’s recognition of the centrality of social media in our day to day lives. Today the focus is on what social media means for us, for the kids in the audience, and for jobs. And Asher is also talking about some work on “what is it students get out of studying?”, we think that the most important thing is learning how to learn… if we give you a seminar on Snapchat, it will be out of date in 6 months time, so the important thing to learn is how to research this stuff, how to learn about it, and how to think about what social media can do in business, in media, in the arts.  And as you look at the displays around the building you will see work by students that demonstrates that.

Sue: When we knew we would be hosting this event we went out looking for partners from the local community. We knew that the research conference would bring in people from across the world, but we also wanted to pull in local graduates and near graduates, but also local employers, and schools. We want to see how this all works, and we plan to do it again and again every year. We should have lots of spontaneous conversations… talk to anyone, see what they do, what they use… And there will be stuff every hour in this theatre – and we have five students you can talk to right away…

Tom English: I will be talking about Snapchat and ASOS, and how ASOS could use Snapchat to sell their clothes

Cecilia: I’ll talk about Zara and how they use Facebook, Twitter and Pinterest to communicate with customers

Abiola Oduwasi: I’ll be talking about how people prepare to present themselves for the jobs market – graduates and recruiters

Sean Fitzsimons: promoting your writing and journalism through social media

Alice Britton: I did a project on how Bagelman, a local business, used social media for their business


Running throughout the venue today are screens showing digital media presentations from students. Some nice case studies that I’ve already seen included a presentation on beauty bloggers and brands’ use of sponsored posts – where the blogger receives direct or indirect benefit from the brand for writing about them. Some examples were shown and some research suggesting that consumers find reviews useful no matter whether or not they have been paid for was quoted – an interesting finding in the blurry authenticity space that is social media. More on that in Lu, Chang and Chang 2014.

Brighton Fuse Project: Why Social Media needs all your skills – Dr Jonathan Sapsed

It will be good to talk to you today about this project, the Brighton Fuse Project, a research project looking at new media, digital media, and creative industries in Brighton. There is real clustering of these industries in Brighton – you see it in Shoreditch, in Bristol and Bath to some extent, Salford, etc. There is no one big company in Brighton drawing people in – unlike the BBC in Salford – so we wanted to see what was drawing them to Brighton, what attracts them. And we also saw that these companies need people like you (the teens in the audience), and all your skills.

This was a £1.5 million project with University of Brighton, University of Sussex, National Centre for Universities and Skills, BBC Academy, etc. involved. And Ed Vaizey welcomed this report and its findings on the Brighton CDIT. We’ve had a lot of interest because we looked at how the creative industries and skills really intersect with business. And we’ve also seen a huge investment made in Brighton to encourage these industries, to improve infrastructure and the quality of office space for these high growth creative businesses. These sorts of things can be exposed through this kind of research, and you can then talk about how to address this.

So what is “fusion”? Well the combination of creative design skills and digital technology skills, the mix of artists, programmers, and business skills. One of our participants from Plug-in Media talked about how important the relationship between creativity and tech is. And we’ve known that idea, that concept of fused content, is important for a long time for converging platforms – games, tv, mobile, online, etc. But we didn’t know the extent to which this fusion was needed in sectors like social media. So lots of these digital media companies who have been running since the 1990s are increasingly adding design skills, social media skills, it’s about working out what the company desires, what they will want next, how a campaign can engage people more, to sell more. So you need those sensibilities of the analytical, segments, and patterns of search but also the creative skills and sensibilities for this space.

We looked at entrepreneurs… those who did their first degree in Arts and Humanities or Design are about 48% of the entrepreneurs. That was a bit of a surprise. And those with more degrees, with PhDs, their businesses often were doing even better. And whilst STEM and Computing folks were also doing well, it was equally as well as those from Arts and Humanities backgrounds.

But we also found that some firms are more fused than others. Some – about a third – are specialist so only really employ developers, or only really employ designers. About a third have some mix, and then we have the “super fused” who are dependent on having a tightly integrated mix of these skills. In terms of what types of companies are represented here… the Digital Agencies are more likely to be super fused, as are design services. And the least fused were arts organisations – but that’s probably a good thing, they need to be specialists in my opinion. On the whole fused businesses correlated positively with innovation and turnover growth. The super fused firms grow three times faster than unfused companies. That mix is very important.

So, looking at business models, the firm iCrossing, probably the second biggest digital agency in terms of employees in Brighton, do lots of work as “creative technologists” for various big firms, including Rolls Royce. Now they have a small customer base, they are happy with sales levels, but they want their brand to be more popular… [brief break as kids leave] So Rolls Royce is an example of a company not looking at sales as a measure. But they had 14 measures of engagement in social media – really playing into the geeky side of what they do, the craftsmanship is shared via YouTube videos and shares of those… so it’s about good creative skills, how to make that interesting and enticing engagement, that is needed. So those 14 measures also get used for triggering payments to iCrossing. Each time they meet a target there, they get paid. So iCrossing employs programmers, journalists, copywriters, graphic designers, film makers. They are looking for “Creative Technologies” job roles.

And an iCrossing campaign – which I can show now the kids are gone – was for Ann Summers and around paid search (YouTube: Ann Summers: Sexy Paid Search). So this was about using high interest news related web searches that hijack that news story by triggering related ads – for the budget, the BA Strike in particular – and got a good reception and impact for clients – click throughs, media coverage, a huge boost in profile etc. So for that client they have that client on a retainer – giving space for creative ideas, something thought of on the fly. That’s a particularly useful space for experimentation, for lateral thinking, for trying stuff out that is clever rather than high tech, trendy stuff perhaps. Counter intuitive stuff.

We found high levels of innovation in the cluster… and we used the types of innovation used in the European Innovation Survey… usually they find 60-65% innovation but for this cluster in Brighton  99% innovation. And more innovation in super fused companies. And 37% of firms allow time for personal projects – and that allows space for unexpected products and services for the firms.

Fusion is linked to innovation but… it’s not new to the world technology, traditional R&D, or protected by patents. Instead it’s service-oriented, continuously attending to user-experience and design. The value is hard to capture, in spite of £231m revenues across the 500 companies we looked at.

In terms of location… these organisations work for some local firms: 40% ish of the companies do local, often business to business, work for each other. A good 56% work for clients in London. And about a quarter work for international clients. And these firms are relatively young… the average respondent is 41.7 years old, two thirds of respondents are in their 30s and 7.8% in their 20s. And there are real cross overs of backgrounds… some have STEM backgrounds (22.89%) but many are from Arts and Humanities, Design, Business Management or Economics… but some have, say, stage management degrees… and they bring that creative background to bear on their work.

And the people working in these companies… only 8.4% always lived in Brighton. Many moved to Brighton for the lifestyle (e.g. one of the most successful web company CEOs cited Britain’s only Vegetarian Shoe Shop as a reason he moved to Brighton!), many for personal reasons. Rarely do they move to find a job, for professional reasons… we think that is starting to change… there’s a kind of second wave here… many of these companies started in the 90s and they need people like you guys to be part of that next wave… And Ian Elwick, Founder-Manager of Brighton Media Centre and The Werks, cites the support, the peer communities, these physical co-working spaces, those types of aspects as being important to these communities [we are now watching video – findable on the AHRC website along with the report – on these types of spaces, how they foster knowledge sharing and “being a good corporate citizen in the modern world”].

There are a lot of different styles of network events… there are cheese and wine events… but those are not so much about help, collaboration, contracting in a business sense… and those engaging in those benefit in material terms… So, a good example. Black Rock Studio, a big developer which was acquired by Disney. They did so well for 10 years they were bought by Disney… something happened… probably a failure of marketing for two big games… closed in 2011… made all of their 279 staff redundant… but a whole group of “black pebbles”, companies started by former employees, set up… and they create apps, small games, smaller scale stuff… some work for hire… some bought out by a big Shoreditch company… they meet up, they help each other out, they use social networks online and offline, supportive culture there that is so important to clusters. Though fusion tends to be weak at community level, strong at a business and project level.

But it’s not all perfect news… some risks and barriers facing these companies. Fused firms face skills barriers, they find it hard to find the right skilled candidates. Easy in Brighton to recruit good design hirees, but paid search, product managers, etc. are not skills easily found. Sometimes they have to hire more technical roles through London. That limits growth. They find it hard to find the right people with the right skills… and larger firms perceive artistic community as a barrier… perhaps too laid back, too bohemian according to some. The recession and skills barriers were the main issues facing these firms at the time of the report.

But a key conclusion for us is that arts and humanities is key to interdisciplinary interaction and innovation and economic growth… but the HE system can be quite set against interdisciplinarity, often fields of study are quite separate and that’s not a good fit for creating these fused individuals. And this is a really organic cluster in Brighton, it’s hard to create that sort of effect artificially… policy makers often want to support a wide geographic range of locations but we think they should fund succeeding clusters more, to stimulate growth there…. to let that growth be organic…


Q: You didn’t mention Brighton SEO… are you aware of any other conferences or similar happening that cement Brighton as a digital hub…

A: There are lots of those but they tend to be very segmented and just known to that sector. In September Reasons to be Creative… and another which Warren Ellis is involved in, dConstruct… lots of these things… Twitter is the place to look for these things… a lot more smaller meet ups, in pubs, etc. and a great way to meet and make connections and find jobs, etc. That stuff leads to pub chat… I know one guy, now a senior manager for Electronic Arts in Montreal, who got the leads that led to that job through a pub chat…

Q: If you were designing a module or similar what would you include to address gaps… stuff to support such clusters in future…

A: We’ve talked a lot about this… but a lot of the message that comes from businesses themselves is that comfort with technical and creative sides is essential. And knowing how to manage a project, to be organised, to show leadership, also key. And we’ve thought about ways to best deliver that… practitioners say that graduates aren’t industry ready… and you ask them to help and to get involved in course design… and they are too busy to help… But the bureaucracy of developing courses, and the existence of disciplinary silos, can be the enemy of those sorts of skills…

Asher: if you are a graduate and you have experience of creative writing but never done SEO… or vice versa… what are the first steps to being part of this fused economy?

A: A lot of these skills are very much self-taught… a lot of people learn in that way. A lot of people hire someone they know with those skills and pay them for a morning to teach them on an ad hoc basis – as courses often exist that help with that. And they learn through others…

Information Visualization for Knowledge Discovery: Big Insights from Big Data – Ben Shneiderman, Professor of Computer Science at University of Maryland

One of the fun things here I think is the breadth of types of people involved in these spaces, as we heard before in Jonathan’s talk. Steve Jobs used to talk about his work being at the intersection of technology and the liberal arts. I am based at the Human Computer Interaction Laboratory, an interdisciplinary research community of Computer Science, Information Studies, but also Psychology, Sociology, Education, Journalism, and the wonderful Maryland Institute for Technology in the Humanities. Now many of you may know me from the book Designing the User Interface. Now the stuff you will be talking about at this conference was a real driver for the most recent update, in 2010, to that text. More than 5bn people have mobile phones now and they are changing the world, the way that we interact around health, around community. We have mobile, desktop, web, cloud. We have diverse users, diverse applications… so many opportunities to explore the world around us…

Now today I am going to talk about “Big Data”. In 2012 a release from Obama, announcing a Big Data initiative and talking about visualisation, talks about developing scalable algorithms for processing imperfect data in distributed data stores, and creating effective human-computer interaction tools. So we need to be teaching the key skills of visual reasoning, which we don’t usually teach… In 1999 we published a collection of papers on information visualisation. That area has now massively grown so it is no longer possible to capture in a book – the web gathers that whole world of papers that is emerging. But we do get some new directions… Jim Thomas and Kristin Cook wrote about the concept of Visual Analytics, Illuminating the Path, in 2004 (online for free). And in Europe Daniel Keim wrote on visual analytics (also available for free).

Now… one of our graduates set up an information visualisation company called Spotfire, growing a business out of their research work. For instance a visualisation showing Retinol’s role in embryos in vision – a rare example of a single image acting as an important research finding. That’s a rare occasion… but that tool became well known for genomic, biomedical, oil and gas discovery, etc. So…. increasingly visual tools are being used… we see a move to large display walls (10M to 100M pixels) helping productivity… Bloomberg uses arrays of 8 screens with very fixed windows having huge value… we see radiology workstations with multiple displays to see a brain scan… some with 16 displays showing last week’s as well as this week’s scans… these sorts of workspaces are becoming common – multiple people sharing, collaborating, around multiple screens.

We are also seeing small screens (1M pixels and less) having a real impact… mobile screens with data such as Google’s expansive transportation interfaces through their maps, and historical data on that… There is a huge amount of data, our job as designers is to organise that, to understand data needed to make decisions…

So, the information visualisation mantra (and I once wrote this a dozen times in a paper – now cited over 27k times!):

  • Overview – the full range of items
  • Zoom and Filter – let the user do that, find what they want…
  • Details-on-demand – let the user drill into the data

The most compelling part here is the centrality of the human user. It’s not just about the algorithm…

And if we think about the last 50 years of Scientific visualisation in 1D Linear (Document Lens; SeeSoft, Info Mural), 2D Map (GIS, ArcView, PageMaker; Medical Imagery) and 3D World (CAD, Medical, Molecules, Architecture) forms… and they have a great future. And we now have the new area of Information visualisation… often about multi-variable data (Spotfire, Tableau, Qliktech, Visual Insight), Temporal (LifeLines, TimeSearcher, Palantir, DataMontage); Tree (Cone/Cam/Hyperbolic/SpaceTree/Treemap); Network (Pajek, UCINet, NodeXL, Gephi, Tom Sawyer). Loads of blogs here that are worth a read: Flowing Data; Perceptual Edge; Etc.

So, let me go to the first demo… traditionally we often look at temporal data… for instance Stock Market Data. So… overview first… so looking at a year… February has a lot of uncertainty. Now you (an audience member) mentioned a “spike”… is that a spike upwards? Or downwards? We don’t have the right language for visual reasoning yet! Now we can zoom into this data… look through this data…. seek patterns… Information visualisation allows you to see new patterns, new changes, to ask new questions. So with this [demo] visualisation you can create a pattern and look for that in your data set… but people were interested in how one might do the opposite – make a pattern and explore by inverses of that pattern… that’s thought patterns you can’t explore on paper and you can do it rapidly, and readjust them on a screen… You can try out and test hypotheses easily with these tools – and you can try this out, look for “TimeSearcher”. TimeSearcher was designed to do time series for stocks, wealth, genes, and to work with large data sets and allow the user to really shape interactions.

Now another tool we built was LifeLines, an attempt to create a visualisation for Patient Histories – with the overview acting as routes into that medical history, to understand changes, medications, interactions… And one of the nice things I like is that visualisations can also show you what isn’t there… harder to do algorithmically… but you can see gaps that might be concerns, questions, it’s a starting point…. we thought one patient was good, but a million patients would be better… so we worked with some data from the Pediatric Trauma Centre in Washington DC, using a tool we built called EventFlow (also free to download). The hospital (via video recordings then transcribed) record initial checks – airway, breath sounds, distal and central pulse in the first few minutes… and then you get longer for the secondary checks… Looking over a large set of data (216 patients) you can get a sense of how quickly secondary checks occurred… And you can spot anomalies in how staff conducted checks – not dangerous perhaps but not the hospital’s protocol…. And you can see all the ways that these patients have been seen, how they vary… the most common variance was starting the disability check before secondary checks… there are some repetitions… some took ages to get their checks done.

So talking about Treemaps… that was our work… for instance SmartMoney Stock Data… looking at a terrible day you see a single blip of good activity – a real clear contrast… often you see patterns that are more subtle… but that visual training happens when data is spatially fixed, when you can spot change…

Treemap: Newsmap (work by Marcos Weskamp) looks at global news items and the number of online articles on a given topic… you can compare countries’ coverage directly… again, a free to use/explore visualisation.

And we did some work with the Hive Group on tree maps for Nutritional Analysis. Spotfire added tree maps in 2007, Tableau now has it. The New York Times have used tree maps now. And a German researcher developed the idea of Voronoi tree maps – they look cool and organic but can be hard to read. There is a design aesthetic aspect here, these look cool but it is hard to compare the size of spaces.

Manuel Lima has a great site called VisualComplexity.com with thousands of network visualisations…

And the work we did was in a tool called NodeXL, it’s free to download and use, and it’s a network overview for discovery and exploration in Excel… designed to show interactions and connections between people… So for instance it can be used to see voting in the US Senate… And you can use NodeXL to directly import from Facebook, Twitter, YouTube etc… feel free to create another importer tool… So one of our first experiments was for #WIN09 Conference back in 2009… and you could see from the 80 people in the room a kind of split between two groups of people – computer scientists and sociologists – and in the tweets you saw that clearly shown… just one cross over in a graduate student!

And that sort of connecting and cross over issue is even more compelling in political discourse… So we did this for the #GOP tweets… you could see a very cohesive densely connected group of Republicans. A less connected group of Democrats. And a few cross over people… but they talk within their group with very little interaction between them. Cross over only via Politico. Media consumed between these groups otherwise really diverged…
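To make the "within group versus cross over" idea concrete, here is a minimal sketch in plain Python. It is not how NodeXL works internally; the account names, the group labels and the `edge_mix` helper are all hypothetical, and it assumes each account has already been assigned to a group.

```python
# Hypothetical reply/mention edges between accounts (made-up data).
edges = [
    ("r1", "r2"), ("r2", "r3"), ("r1", "r3"),  # dense first group
    ("d1", "d2"),                              # sparser second group
    ("r3", "d1"),                              # a single cross over tie
]

# Assumed group label for every account.
group_of = {"r1": "A", "r2": "A", "r3": "A", "d1": "B", "d2": "B"}


def edge_mix(edges, group_of):
    """Count ties inside groups, ties between groups, and the bridging accounts."""
    within, between = 0, 0
    bridges = set()
    for a, b in edges:
        if group_of[a] == group_of[b]:
            within += 1
        else:
            between += 1
            bridges.update((a, b))
    return within, between, sorted(bridges)


within, between, bridges = edge_mix(edges, group_of)
print(f"within-group ties: {within}, cross over ties: {between}, bridges: {bridges}")
```

A low share of cross over ties, concentrated in a handful of bridging accounts, is the numeric counterpart of the two separated clumps you see in the picture.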

But, this work kinda works…. but not a great way to visualise… using grapes for inspiration we tried to restructure around smaller clusters, separations, etc. in a more clear to view way…. for instance used in looking at #SOTU (State of the Union Address).

And… a researcher called Scott Dempwolf who looks at Innovation Networks… he took data on companies, patents, grants from government agencies… 26k edges, 11k nodes…. so he has created a beautiful visualisation for Pennsylvania Innovation Networks… but hard to read…. so we tried to break this down a bit…. found a major pair of nodes who hold a lot of patents…. And you see real cluster of some of the big players in innovation…. Westinghouse Electric and the Navy being key drivers here…. So drilling down you see the big players…

We asked Scott to show us something on Maryland…. he created a visualisation for our lab…. again looking at connections and gaps… we can also look at innovation in Chicago to see how we see clusters here… You begin to see the finer grained structure more clearly when you have a visual way into the data…

Recently we published this on the Pew website – you can see NodeXL Gallery for more of this sort of data – looked at Twitter network structures: polarised crowds; tight crowds; brand clusters; broadcast network; community clusters; and support networks… for those doing customer support via Twitter…

So, you can read more. You can find out about our Social Media Research Group. And we also want to talk about not only business but also other spheres in which these tools can help, for instance the UN Millennium Development Goals… Some progress towards their goals… Bill Gates is helping with next goals… The Gates Foundation is a big user of NodeXL… in that presentation earlier we saw visualisations via Bar Charts but understanding interactions is key here.


Q: I’m sure over the next few days we’ll see a lot of papers with statistical analysis… what would your advice be for business and finance academics to get papers more visual, and get published…

A: A good question. You do see Science and Nature moving to printed visualisations… they are static…we have a long way to go to make those interactive… by contrast the web and blogs are much more interactive and visual… and increasingly you see that supplemental stuff – video or interactive website – online. Science encourages you to have a website, data if possible, and visualisation tools with your papers. Actually  there is an annual competition around visualisation run by Science and partners…

Q: This is on errors and potential for misrepresentation… with many of these tools there is so much potential to accidentally misrepresent the data…

A: You are right of course… statistics can lie, data can lie, and visualisations can lie… you can use colour, labelling, etc. in misleading ways. But for any visualisation I think an intelligent understanding can reduce that impact. But the majority of datasets I get into my office have errors that the person whose data set it is didn’t know about…. I was looking at emergency room admissions data recently… 8 patients in that data were 999 years old… those kinds of errors are widely found in data, or a patient admitted 14 times, but discharged only twice… And you have people using flawed data to predict sales but miss one month when their sale is on! Statistics without visualisations risk never spotting that error… visualisation provides a sort of microscope, telescope… new ways to explore and understand our data. And you need a new sort of literacy, that concept of visual reasoning. And the tools have made that possible…
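The two kinds of error mentioned in that answer (impossible ages, admissions that do not match discharges) can also be caught with very simple checks once you know to look for them. A minimal sketch in Python follows; the records, field names, the 120 year threshold and both helper functions are assumptions for illustration, not anything from the speaker's data or tools.

```python
from collections import Counter

# Hypothetical admissions log (made-up records and field names).
records = [
    {"patient_id": "p1", "age": 34, "event": "admit"},
    {"patient_id": "p1", "age": 34, "event": "discharge"},
    {"patient_id": "p2", "age": 999, "event": "admit"},   # placeholder age
    {"patient_id": "p2", "age": 999, "event": "discharge"},
    {"patient_id": "p3", "age": 71, "event": "admit"},
    {"patient_id": "p3", "age": 71, "event": "admit"},
    {"patient_id": "p3", "age": 71, "event": "admit"},
    {"patient_id": "p3", "age": 71, "event": "discharge"},
]

MAX_PLAUSIBLE_AGE = 120  # assumed cut-off for a believable age


def implausible_ages(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Return the ids of patients whose recorded age is outside 0..max_age."""
    return sorted({r["patient_id"] for r in rows if not 0 <= r["age"] <= max_age})


def unbalanced_stays(rows):
    """Return {patient_id: (admits, discharges)} where the counts cannot be right.

    One more admission than discharges is allowed (the patient may still be
    in hospital); anything else is flagged for a human to look at.
    """
    admits = Counter(r["patient_id"] for r in rows if r["event"] == "admit")
    discharges = Counter(r["patient_id"] for r in rows if r["event"] == "discharge")
    flagged = {}
    for pid in sorted(set(admits) | set(discharges)):
        gap = admits[pid] - discharges[pid]
        if gap < 0 or gap > 1:
            flagged[pid] = (admits[pid], discharges[pid])
    return flagged


print(implausible_ages(records))
print(unbalanced_stays(records))
```

Checks like these only find the errors you thought to write a rule for, which is the point being made about visualisation: a plot of the same data tends to surface the anomalies nobody anticipated.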

Q: You talked about a lack of vocabulary… what should we be using?

A: We have a tool, not quite as polished as a shape finder, but the question is can you make a measure of the spikiness of each spike? In books you see standards about what is and is not a spike. During a discussion a student suggested something brilliant… using the angles within the spike to find sharp spikes, and also areas of fall and rise. So we have started to explore this sort of stuff… but of course volatility can be a measure… but there are interesting shapes that we can use and explore here… you have concepts like “value line”, sizes of plateau. It’s a rich space we’ve only just started to explore in the shape finder.
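
The student's suggestion of using the angle within a spike can be sketched roughly as follows. This is one possible reading rather than the speaker's tool: it assumes evenly spaced points, treats any point higher than both neighbours as a spike, and measures the angle at the peak between the lines to its two neighbours (a smaller angle means a sharper spike). The function name and the `dx` spacing parameter are hypothetical, and the angle depends on how the two axes are scaled.

```python
import math


def peak_angles(values, dx=1.0):
    """Map the index of each strict local maximum to its peak angle in degrees."""
    angles = {}
    for i in range(1, len(values) - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        if mid > left and mid > right:
            # Vectors from the peak to its left and right neighbours.
            ax, ay = -dx, left - mid
            bx, by = dx, right - mid
            cos_theta = (ax * bx + ay * by) / (
                math.hypot(ax, ay) * math.hypot(bx, by)
            )
            # Guard against tiny floating point overshoot outside [-1, 1].
            cos_theta = max(-1.0, min(1.0, cos_theta))
            angles[i] = math.degrees(math.acos(cos_theta))
    return angles


series = [0, 1, 0, 0, 10, 0]
angles = peak_angles(series)
for index, angle in angles.items():
    print(index, round(angle, 2))
```

In this toy series the tall spike at index 4 gets a much smaller angle than the gentle bump at index 1, so sorting by angle would rank the sharpest spikes first.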

Q: In terms of the methodology to create these models… I am interested in customer journeys between social media channels, capturing those touch points between platforms…

A: You have some systems, like Klout, that give you numeric data… but we are interested in networks here…. IBM did a project with their internal networks of these things, of connections in discussion. My colleague did work with emails, to see cohesiveness of discussions… but we are only 5 or 6 or 7 years into this social media world… but it’s definitely an opportunity to do good… And again there is an effort from the National Cancer Institute to use social media to make health related opportunities, for smoking cessation, obesity reduction, etc…. to get changes through use of social media… And you see media networks evolve. Jenny Preece and I wrote a paper called “From Reader to Leader”… On Wikipedia only 1/10th of 1% ever make an effort… and only 11,000 admins…. so we need to understand the dynamics of that… how one goes through that path, what the motivations, rewards, recognition are, to encourage people along that path… The sciences of the natural world have been successful for 400 years but I think the science of the made world, of social structures, etc. is the science of the next 100 years.

Q: You mentioned bar charts etc. in my presentation earlier. We have looked at new ways to present this data… info graphics etc… there are a lot for quantitative data but fewer for qualitative data…

A: Well one step back…. it’s not about visualising your data…. it’s about your goal, your question, what are you trying to answer… in your data there was clearly more there… a simple taste of what’s possible… the network structure of these communities might be interesting…. so it might be a geographic relationship… but you need to know the questions first, and use that to decide what you need, what you will find in the data, how you make new opportunities happen.


Mixed Methodologies Seminar – Professor Dan Remenyi 

Dan Remenyi is introducing himself as an itinerant academic, who teaches research methods at various universities and also supervises PhD students.

When I completed my PhD, rather late in life, I felt the most interesting part was the research methodologies but I felt like I needed to learn more in that area, and had a lot to learn. I have supervised a lot of PhDs now and most actually use “mixed methods” but, a bit like “reflection”, you needed to do this stuff… you have to do that… these days you can’t just do it, you actually have to write about it, to describe that stuff. If you use the phrase “mixed methods” about your research – and I’m going to counsel you not to necessarily do that – you have to be able to say why you did that, what that means, what the implications are…

So today we will talk about what Mixed Methods really is, and how you talk about it… You should all have had the slides in advance… I took those slides and put them into Wordle… you can see I’ll be talking about Data, about Mixed Methods, and about Synthesis… Now… as I progress down this road of talking about research methodology I’ve learned that it is so important to understand the vocabulary of the research world, how to use them appropriately…. Some are easy perhaps but some are much more tricky. You should know these… I suggest you create your own glossary where you really pin down your own understanding of these words… You need to know what they mean, you need to be able to defend your work.

Now, let’s talk Mixed Methods… Well this is an expression, some call it a misnomenclature – it really doesn’t explain what it does (a bit like Life Insurance, or Jumbo Shrimp, some often refer to “military intelligence” as the same type of misnomer!). Why? Well there is almost no way that methods can be mixed. What we mean is using both qualitative and quantitative data to make a convincing argument… In the previous talk the speaker talked about charts, visualisations, and that the research question is absolutely key. And that’s the case in methods… but think slightly wider than that… in actual fact when we do research the research enables us to understand better the research question, and come up with possible answers for it…

So what is usually meant by Mixed Methods is that combination of qualitative and quantitative data in research. In your research you need to be contributing to the academy, both in terms of the findings and the theoretical aspects of the field. And you have to convincingly make your case. There is still a lot of confusion about Mixed Methods. Researchers sometimes lose sight of the fact that evidence, of whatever sort, is a constituent of the argument which underpins the findings. The challenging part is bringing these different dimensions of the argument into a convincing whole.

At its heart Mixed Methods is a research design issue. You can adjust that plan as you go along, academia is essentially about self-improvement… your plan will always emerge and evolve as you go along. A research design might start with what data you require to answer the question, then think about how you will collect it. How will you analyse it? How will you use it to establish some findings? And increasingly you are expected to interpret those findings, to talk about what the implications of your research are.

So the term Mixed Methods is being used in two senses…

  • There is an emerging school of thought, or community of practice, that argues for the use of mixed methods research design.
  • There is the research practice which has been in place for decades which has called upon researchers to use different methods at different times, stages, phases in their research. Indeed it is hard to use an entirely quantitative approach in research.

Now, not all researchers welcome the concept of Mixed Methods… some think you have to be world class and that you cannot be world class both quantitatively and qualitatively…. the aspiration is to be world class but I think you can be extremely competent at both. But the philosophical argument is trickier… the ontological argument is that you can either be a realist – the positivist, quantitative type road – or a relativist, and that takes you down the more constructivist, analytical route. In reality we are often a combination of both…

Now the key person in this area, he has made it his own, is Creswell. He says you cannot tell your story unless you can put together the numbers behind your research and tell the stories behind those numbers. He says that numbers never speak for themselves… you have to be able to see the numbers and the facts in context. Paulos (1998) talks about statistics as being uninterpretable without context, background and their origins… without those they cannot be properly understood…

An example here… stats on home runs in the US Baseball league show increasing numbers of home runs… what’s happening? More matches? More training? More reporting of games? Changes in recording measures? More rewards for better players? Stand out players like Babe Ruth? But a more important reason… they banned cheating! Generally Baseball was played in the afternoons… and the light got dimmer… flood lights weren’t great… pitchers started messing with the ball, spitting on it, rubbing it in the dirt… and the batter couldn’t see the ball…. How will you know that just looking at numbers? You won’t, you need some other form of research to understand that data. (For more on these stats Dan recommends Bill Bryson’s book A Short History of Nearly Everything – a great book for PhD students to read as, essentially, a history of science. And his book One Summer: 1927 includes those statistics… in that book the most important thing is Charles Lindbergh flying the Atlantic….)

Now, there is another phrase you need to be aware of and that is “Multiple Methods”… If you are using multiple methods in the qualitative arena then some say you are using Multiple Methods, that Mixed Methods is exclusively for the combination of Qualitative and Quantitative Methods. You also hear Combined Methods, Hybrid Methods, and (from an audience member) Multi-Level Methods.

A few really important distinctions… At the highest level research can either be Theoretical or Empirical. Theoretical research is based on secondary data, data that has previously been published, and already-established ideas, and you create something new from those existing ideas. Empirical Research is about the collecting of data. Now data is a hugely contested term, there is a surprising lack of papers on data… when I questioned what data was in a statistics department they thought I was mad but data is a really tricky term, I’ll come back to this.

Now, theoretical research is highly linked to empirical research, but always relating that back to theory, and using existing empirical data.

And then we have the two major paradigms. There is Positivist, which is about the quantitative world, numbers (mostly); the process is deductive so there are hypotheses that you are attempting to reject (you try to reject it, if you don’t you accept it pro tem), it’s interpretation with a “little i”. And we have Interpretivist approaches… an inductive process, uses a wide range of data… and it’s about taking that data and attempting to form a hypothesis from it. Now the vast majority of research is deductive, a faster process. An inductive approach can take longer and require much more data… Now… Mixed Methods sits between these, straddling both positivist and interpretivist perspectives. And following a side chat on mathematical methods, mathematics fits not quite anywhere into these research paradigms… The concept of Occam’s Razor is useful here: the idea that the simplest explanation is best… In general we can never say we have proved something… the only thing that is certain is that we know what we don’t know… But we can say “the evidence suggests”, or “it appears from the evidence”… that can be said… much harder to say that “the evidence shows this is true”.

Now… a comment on Qualitative and Quantitative research and how they differ…

In Quant: You articulate the research question, you collect evidence, you process evidence (questionnaire) – only after you have collected data, and you produce findings…

In Qual a learning loop is involved: you articulate the research question, you collect evidence (interview), you understand the question as you process the evidence and you really have a loop, you learn as you go, and you do produce your findings.

There are alternative approaches too… Action research often takes an iterative approach for instance.

Of course Mixed Methods can be used in theoretical work… you might collect data to support a theoretical perspective. And Mixed Methods are particularly useful in interdisciplinary work. And it can also be useful in applied research, where there are blurred boundaries between topics…

So we have 12 steps in research design:

Setting the course

  • 1. Field of study exploration and conceptualisation
  • 2. Literature review
  • 3. Research question
  • 4. Research design

Moving the project forward

  • 5. Data acquisition …………………… when is triangulation relevant?
  • 6. Data management
  • 7. Data analysis
  • 8. Presentation of findings

Completion Issue

  • 9. Theory development
  • 10. Research question resolution
  • 11. Implications for practice
  • 12. Limitation and future research

Each step informs the next step, although the research process is not a waterfall-based project.

Remember that to do competent academic research we not only have to understand our data and our analysis of it but we also have to understand all of the arguments in the body of knowledge, and we have to be able to articulate that. And that has to feed into the research design.

There are different ways to approach Mixed Methods research…. One way is to start with qualitative data as a way to reach understanding, and to design a quantitative instrument (e.g. a questionnaire) that is then deployed and leads to findings… It’s a big deal to create a questionnaire from scratch! And in this approach each step is distinct. You take two steps… one step followed by another… the mixing is very minimal…

But there is no reason not to take a different approach… You use an established research instrument to gather data, then you conclude that stage, and you take a qualitative approach next, in order to reach your findings. That’s a perfectly respectable Mixed Methods approach.

Now you can also take what they call a “supportive mixed methods” design… here you have overlap between types of research, you can benefit from understanding the data of one type in your work collecting data of another type. Now I like metaphor… so take the buttress (flying and not)…. someone pointed out to me that the way that Cathedrals are built is fundamentally unstable… will push the walls out… and that’s why buttresses, and flying buttresses came about. And I like to think of scientific discovery as not always standing on its own without data from a variety of different sources. Multiple sources of validation are always welcome… they act like buttresses… (and now we have a side chat in which Dan makes the point that doctoral students should not touch longitudinal studies… “that’s a different methodological world”).

You should know that academic research gives you a great deal of flexibility in what you do. It is based on peer review – your papers will be seen by at least two people reviewing it – but there is a lot of flexibility as to how you do it. Paul Feyerabend wrote a famous book, a difficult book, called “Against Method”. And in that book he says there is only one universally accepted academic research method, and that is “anything goes”! It doesn’t mean you can be sloppy… it means no one can tell you how you must do your research, or what you cannot do… you can do it your way as long as you can convincingly argue your case, and show that you are contributing to the academic body. As long as you can argue that your methods got you to the right answer, you have to be able to argue your methods, to justify them… I had someone come up for examination who had done 35 interviews… a particularly tough examiner who said he needed more… but how many do you need? Well you need as many as you need before you reach the point of data saturation… you have to be able to justify the number that is acceptable. As it happened this guy went out and found a whole load of papers showing that 35 could be a valid number… this is part of why you have to understand the literature… you have to have read everything that can be read about your topic… And the other thing about academic research is that you have a lot of flexibility but you have to use the language consistently, and to understand the meaning of those words… we had a chat before about what it means to be longitudinal… it means an extended period of time… is that 3 months? 3 years? 3 weeks? For anthropologists they conduct ethnography, they talk about a lived experience… how many of us in the business or management world truly have a lived experience… Ethnography is, as a word, taking liberties there… but we can talk about being “ethnographically informed”, by the same token we could talk about “a longitudinal type study”.
Teet was talking about interviews over a few months as being not a snapshot… but argued appropriately you could use some of that longitudinal language… Because, as we’ve said, we have to make a clear and justifiable case for our choice of methods… We have so many methods but you have to be clever about how you put your argument together…

So… back to a third model for Mixed Methods… this is a parallel or converging Mixed Model… Where you undertake quantitative and qualitative research in parallel… now I have gone light on talking about “triangulation” here… some people love that term, some hate it… to be precise the word is borrowed from land surveyors who use various tools to map particular features, measuring from different angles… social scientists have borrowed that term to talk about different perspectives… now when I did my research 25 years ago I was told triangulation was a way to resolve conflicts and contradictions in the data… that is nonsense… by being able to look at things through different perspectives, different lenses, different data, different people… you get a richer understanding of the question, of the issues involved. Now some say the term “triangulation” is too positivist, that something like corroboration is better…. I don’t really mind… more perspectives is usually better. BUT…. it is tempting to believe that the more panoramic the view, the better… and that may often be the case, but is not always true…. Sometimes putting all this extremely rich view into a cohesive whole can be really problematic… Research does not seek complexity for its own sake… If you have a credible answer to the research question from one or two data sources then the job is probably done… Answering the research question is the paramount issue.

So in this third approach, the parallel or converging mixed method design… we will get two sets of data, from two different sources, and bring them together into an argument… and we will draw on both sets of data to draw our conclusions… There’s no other sense in which we want to mix it… Now in the literature you will see some discussion of putting numbers into words and vice versa but I am not convinced by that. Some critical issues… were the two different data collection strategies driven by the same research question? If not, then why not? Was the same research logic used for both – i.e. inductive or deductive? And are the results commensurable? They don’t have to be but you will have to argue your case well, you have to change your argument and explain any contradictory results. And again, you have to answer the research question.

Now, reflection is central to research. It has always been necessary. But it’s now really important to be able to discuss it… Reflection may be defined as a process of questioning the range of activities and thinking which have been performed by the researcher in order to surface any inadequacies or bias which may be present in research. And why you have come to the conclusions you have come to.

Reflexivity – and the piece in MIS Quarterly is worth reading – is about seeing the interrelationships between the sets of assumptions, biases and perspectives that underpin the different facets of the research undertaken. So you might ask yourself what assumptions are at play when you start your research? All research starts with assumptions that there will be an answer to the question, that that question is worth answering, and that the process of answering that research question will change you, will develop you to a higher level in the case of a doctorate for instance. Reflexivity is about understanding that, about understanding biases… nobody likes to feel that they are biased… but you can’t get away from the fact of what you are… so I’m a white, British, elderly, academic… all of those mean expectations and values… I might work against those but there are always some residues there… You also want to ask yourself what values of yours affect your research? So all of us have the shared values that knowledge is important for instance, we want to learn more. As someone in academia you also have to believe there is some value in sharing, that’s part of being an academic… you could explore all of that much further of course… but that’s what we mean by reflexivity.

Some mixed methods researchers talk about integrating the qualitative and the quantitative data so that an overarching analysis can be performed… so about how and when you mix the data… now I argue that we are really talking about synthesising the arguments. And the test of an argument is whether it convinces… There are various types of evidence which include data, authority and logical inference… So in academia argument is used to support theoretical conjectures. The way we learn is influenced by the Greeks… Socrates, regarded as close to a tramp, walking around picking arguments, who developed the idea of the dialectic… and that is how academia works… you articulate a thesis… you float an idea, then someone does the “ah, but…”, they correct the idea or take the antithesis… and then you put those together, you synthesise them, and create a new idea… and that re-articulation of thesis starts a new cycle… that’s an ancient concept that still underpins academia.

Now, Teet earlier mentioned a model like an Advanced Mixed Methods Design, something which may result in a case study, experiment or action research project. But what actually determines the method? This can be influenced by your background… an engineer may not want to work in qualitative research, a humanist may not want to undertake complex equations… So it may be about the scale of the work required, the skills that you have and, in the case of doctoral students it may also be about the influence of the supervisor or culture of the institution.

And with that, we are done.



Digital Scholarship Day of Ideas 2014: “Data” – LiveBlog

Today I am at the University of Edinburgh Digital Humanities and Social Sciences Digital Scholarship Day of Ideas 2014 which is taking place at the Edinburgh Centre for Carbon Innovation, High Street Yards, Edinburgh. This year’s event takes, as its specialist focus, “data”. These notes have been taken live so my usual disclaimers apply and comments, questions and corrections are, as ever, very much welcomed.

Introduction: Prof Dorothy Miell, Head of College of Humanities and Social Science

I’m really pleased to welcome everybody here today. This is our third Digital Scholarship Day of Ideas and they are an opportunity to bring in interesting outside speakers, but also for all of us interested in this area to come together, to network and build relationships, and to take work forward. Again today we have a mixture of international and local speakers, and this year we are keeping us all in one room so we can all hear from those speakers. I am really glad to see such a popular take up for the day, and mixing from across the college and Information Services.

Digital HSS, which organised this event, is work that Sian Bayne leads and there are a series of events throughout the year in that strand, as well as these events.

Today we are going to be talking about the idea of data, particularly what data means for scholars in the humanities, how can we understand the term Big Data that we hear in the Social Sciences, and how can we use these concepts in our own work.

Sian Bayne, Associate Dean (digital scholars) is introducing our first speaker. Annette describes herself as an “itinerant researcher”. Annette’s work focuses on internet and qualitative research methods, and the ethical aspects of internet research. I think she has a real talent for great paper titles. One of my favourites is “Undermining Data” – which today’s talk is partially based on – but I also loved that she had a paper entitled “Fieldwork in Social Media: What would Malinowski do?”. Anyway, I am delighted to welcome Professor Annette Markham.

Can we get beyond ‘data’? Questioning the dominance of a core term in scientific inquiry - Prof Annette Markham, Department of Informatics, Umeå University, Sweden; Department of Aesthetics & Communication, Aarhus University, Denmark; School of Communication, Loyola University, Chicago (session chair: Dr Sian Bayne)

As Sian mentioned I have spent a lot of time… I was a professor for ten years before I quit in 2007 and pushed myself across other disciplines, to push forward some philosophical work on methods. For the last 5 years or so I’ve been thinking about innovative and creative ways to think of methods to resonate better with the complexity of modern life. I work with STS – Science and Technology Studies – scholars in Denmark, Informatics scholars, Machine Learning scholars in Boston, Language scholars in Helsinki… So a real range across the disciplines.

The work today is around methods work I’ve done with colleagues over the last few years, much is captured in a special issue of First Monday: Vol 18, No 10: Making Data – Big Data and Beyond Special Issue. And this I’m doing from a post humanist, STS, non positivist sort of perspective, thinking about the way in which data can be used to indicate that we share an understanding when actually, we are understanding the same information in very different ways. For some data can be an easy term, consistent with your world view… a word that you understand in your own method of inquiry. Data and data sets might be familiar parts of your work. We all come from somewhere, we all do research… what I say may not be new, or may be totally new… it may resonate… or not at all… but I want this to be a provocation, to make you question and think about data and our methods.

So, why me, well mainly I guess because I know about methods… so this entire talk is part of a bigger project where I look at method, at forms of inquiry… but looking at method directly isn’t quite right, but looking at it from the side, from the corner of your eye… And to look at method is to look at the conditions in which we undertake inquiry in the 21st century. For many of us inquiry is shaped by funding, and funding privileges that which produces evidence, which can be archived. For many qualitative researchers this is unthinkable… a coffee stain on field notes might have meaning for you as an ethnographer but how can that have meaning for anyone else? How can that be archivable or sharable or mineable?

And I think we also have to think about what it is that we do when we do inquiry, when we do research… to get rid of some of the baggage of inquiry – like collecting data, analysing and then writing up as there are many forms of inquiry that don’t fit that linear approach. Another way to think of this is to think of frames, of how we frame our research. As an American scholar trained in the Chicago School of Sociology I cannot help but cite Erving Goffman. They both tell us to focus on something, and to ignore other things… So if I show you a picture of a frame here…. If I say Mona Lisa you might think of that painting. If I tell you to look outside of the frame you might envision the wall, or the gallery, or what sits outside that frame. And if you change the frame it changes what you see, what you focus on… so if I show you a frame diagram of a sphere and say that is a frame, a frame for research what do you see? (some comment they see the globe, they see 3D techniques, they see movement). The frame tells us to think about certain phenomena…. to also not think about others… if I say Mona Lisa now… we think of very different things… Similarly an atomic structure type image works as a very different type of frame – no inside or outside but all interconnected nodes… But it’s almost impossible to easily frame, again, Mona Lisa…

So, another frame – a not-quite-closed drawn circle – and this is to say that frames don’t tell you a lot about what they do… and Goffman and others say that frames work best when they are almost invisible…. like maps (except say the McArthur Corrective Map). So, by repositioning a map, or by standing in an elevator the wrong way and talking to people – as Harold Garfinkel had his students do – we have a frame that helps us look differently at what we do. “Data” can make us think we look at the same map, when we are not… Data may not be understood as a shortcut term or a metonym, it could be taken rather as preexisting aspects of the phenomenon – that have been filtered and created through a process, and organised in some way. Not the meaning I want for my work but not good or bad…

So I want to come back to “How are our research sensibilities being framed?”. In order to understand inquiry we have to understand three other things. (1) How do we frame culture and experience in the 21st Century; (2) How do we frame objects and processes of inquiry; (3) How do we frame “what counts” as proper and legitimate inquiry?

For me (1), as someone focused on internet studies, I think about how our research context has shifted, and how our global society has shifted, since the internet. It’s networked for instance. But also interesting to note how this frame has shifted considerably since the early days of the internet… So taking an image from the Atlas of Cyberspace – an image suggesting the internet as a tunnel. But city scapes were also common ways to understand the world. MIT suggested different ways to understand a computer interface. This is about what happened, the interests in the early days of the internet in the 90s. That playfulness and those radical ideas change as commerce becomes a standard part of the internet. Skipping forward to Facebook for instance… interfaces are easy to understand, friendly, almost all social media looks the same, almost all websites look the same… and Google is a real model for this as their interface has always been so clean…

But I think the significant issue here is that socio-technical research and understanding have been shaped by these internet interfaces we encounter on a daily basis.

For me frame (2) hasn’t changed that much… two slides…. this to me represents any phenomenon or study – a whole series of different networks of nodes connected to the centre. There is no obvious starting point. Not clear what belongs in the centre – a person, an event, a device – and there are all these entanglements characterising these relationships. And yet our methods were designed for and work best in the traditional anthropological fieldwork conditions… And the process is still very linear in how we understand it – albeit with iterative cycles – but it’s still presented that way. And that matters as it privileges the neat and tidy inquiry over the messy inquiry, the inquiry without clear conclusions… so how we frame inquiry hasn’t changed much in terms of inquiry methods.

Finally, and briefly, (3) my provocation is: I think we’ve gone backwards… you can go back to the 60s or earlier and look at feminist scholars and their total reunderstanding of scientific method, and situated research. But as budgets tighten, as research is funded under more conservative conditions this stuff that isn’t well understood isn’t as popular… so we’ve seen a return to evidence based methods, to clear conclusions, to scientific process. Particularly in media coverage of research. It’s still a dominant theme…

So… What is data?

I don’t want to be glib here. The word “data” is awfully easy to toss around. It is. In every day life this term is a metonym for lots of stuff, highly specific but unspecified stuff. It is arguably quite a powerfully rhetorical term. As Daniel Rosenberg says the use of the term data has really shifted over the last few hundred years. It appeared in the 1760s or so. Many of those associated with the word only had it appear in translations posthumously. It is derived from Latin and, in the 1760s, it was about conditions that exist before argument. Then as something that exists before analysis. And in that context data has no theoretical baggage. It cannot be questioned. It always exists… has an incontrovertible it-ness. A “fact” can be proven false. But false data is still “data”. Over time and usage “data” has come to represent the entirety of what the researcher seeks and needs in pursuit of the goal of inquiry. To consider the word in my non-positivist stance, I see data as “what is data within the more general idea of inquiry”. In the mid 1980s I was taught not to use that word, we collect materials, we collect artefacts as ethnographers… and we construct… data… see even I used it there, so hard not to. It has been operationalised as discrete and incontrovertible.

Big data has brought critical responses out, they are timely and subtle responses… and boyd and Crawford (2011) came up with six provocations for big data. And Nancy Baym (2013) also talks about all social media metrics being a nonrepresentative partial sample. And that there is an inherent ambiguity that arises from decontextualising a moment of clicking from a stream of activity and turning it into a stand alone data point. Bruno Latour talked about this too, in talking about soil from the Amazon, of removing something from its context.

And this idea disturbs me, particularly when understanding social life as represented in technology. Even outside the western world, even if we don’t use technology, as Sonia Livingstone notes, we are all implicated in technology in our everyday life. So, I want to show you a very common metaphor for everyday life in the 21st century – a Samsung Galaxy SII ad. I love this ad – it’s low hanging fruit for rhetorical critique! It flattens everything – your hopes and dreams offered at equal value to services or products you might buy… and flattens as equal in not infinitesimal bits that swirl around, can be transmitted, transformed, controlled – as long as we purchase that particular phone. An interesting depiction of life as data – and humans and their data as new. It’s not unusual and not a problem as we don’t buy into it as a notion, uncritically.

This ad troubles me more. This is Global Pulse, an NGO, a sub-committee of the UN, that distributes data on prices in the developing world. It follows the story of a woman affected by price shifts. So this ad… it has a lot of persuasive power and I want to be careful about this argument that I make to conclude…

I really like what we get from many big data analyses. I have nothing against big data or computational analysis. Some of the work you hear about today is extraordinary, powerful… I won’t make an argument about data, about data to solve certain problems. I want to talk about what Kate Crawford talks about as “big data fundamentalism”. I wouldn’t go that far… algorithms can be powerful but not all human experience can be reduced to data points. And not everything can be framed by big data. Data can be hugely valuable but it’s important to trouble what is included and what is missed by big data. That advert implies data can be understood as it happens. Data is always filtered, transformed, framed… from that you draw conclusions. Data operates within the larger framework for inquiry. We have to remember that we have strong and robust models for inquiry that do not focus on data as the core of inquiry. Data might be important – it should be the chorus not the main player on the stage. The focus of non-positivist research is upon collecting the messy stuff….

And I wanted to show a visualisation, created in Gephi, by one of my colleagues who looked at Arab Spring coverage in media and social media in Sweden… In doing this as he shifts the algorithm he is manipulating data, changing how the data appears to us, changing variables to make his case… most of the algorithms of Gephi create neat round visualisations. Alex Galloway critiques this by saying that some forms may not be representable, and this tool does not accommodate that, or encourages us to think that all networks can be visualised in that way. These visualisations and network analyses are about algorithms… So I sort of want to leave it there, to say that data functions very powerfully as a term… and that from a methodology perspective it creates a very particular frame that warrants concern, particularly when the dominant context tells us that data is the way to do inquiry.


Q: I enjoyed that but I find you more pessimistic than I would be. That last visualization shows how different understandings of that network are possible. It’s easy to create a strawman like this but I’ve been reading papers where videos are included in papers… the audience can all think about different interpretations. We can click on a data point, to see that interview, to see that complex account of that point. There are many more opportunities to create richer entanglements of data… we should emphasize those, emphasize that complexity rather than hide the complexity of how that data is created.

A: Thanks for finishing my talk for me! If we consider the generative aspects of inquiry then we can use the tools to be transparent about the playfulness of interrogation, by offering multiple interpretations… I talk about a process of Borrow / Play / Move / Interrogate / Generate. So I was a bit pessimistic – that Global Pulse ad always depresses me. But I agree!

Q: I was taken by your argument that human experience cannot be reduced to a single data point… what else can it be reduced to… it implies an alternative to data… so what might that be?

A: I think that question is not one that I would ask. To me that is not the most important question. For me it’s about how we might make social change – how might I create interventions, how might I represent someone’s story. I’m not saying that there is an alternative… but that discussion of data in general puts us in that sort of terrain… and what is more interesting or important is to consider why we do research in the first place, why do we want to look for a particular phenomenon… to not let data overwhelm any other arguments.

Q: I think your talk noted that big data focuses on how people are similar and what similarities there are, whilst ethnography tends to be about difference. That makes those data tracking that cover most people particularly depressing. Is that the distinction though?

A: I think I would see it as simplification versus complexity… how do we envision inquiry in ways that try to explode the phenomenon into even a more complex set of entanglements and connections. It may be about differences but doesn’t have to be… it’s about what emerges from a more generative process… it’s an interesting reading though, I wouldn’t disagree.

Q: I wanted to share a story with you of finishing my PhD, a study of social workers when I was a social worker. I had an interview for a research post at the Scottish Government and one of the panel asked me “and how did you analyze your data” and I had never thought of my interviews and discussions as data… and since then I’ve been in academia for 20 years but actually I’ve had to put that idea, that people are not data, aside to progress my career – holding onto the concept but learning to talk the talk…

A: I can relate to that. You hear that a lot, struggling to find the vocabulary to make your work credible and understandable to other people. With my students I help them see that the vocabulary of science is there, and has been dominant… and to help them use other terms to replace the terms they use in the inquiry, in their method… these terms of mine (Borrow / play / move / interrogate / generate) to get them thinking another way, to make them look at their work in a different way from that dominant method. These become a way that people can talk about the same thing but with less weighty vocabulary, or terms that do not carry that baggage. So that’s one way I try to do that…

Crowd-sourced data coding for the social sciences: Massive non-expert coding of political texts - Prof Ken Benoit, Professor of Quantitative Social Research Methods, London School of Economics and Political Science (session chair: Prof John McInnes)

Professor John McInnes is introducing our next speaker, Professor Ken Benoit. Ken not only talks about big data but has the computational skills to work with it.

I will be showing you something very practical…. I had an idea that I’d do something live… so it could be an Epic Fail!

So I took the UKIP European Election Manifesto… converted to plain text in my text editor. Made every sentence one line… put into spreadsheet… Then I’m using CrowdFlower with some text questions… So I’ll leave that to run…
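
The preparation step described here (plain text, one sentence per line, loaded into a spreadsheet) could be sketched roughly as follows. This is only an illustration and not the speaker's actual workflow: the splitting rule is deliberately naive, and the function names, column names and sample text are all made up for the example.

```python
import csv
import io
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentences_to_csv(text):
    """Return CSV text with a header row and one row per sentence."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sentence_id", "sentence"])  # hypothetical column names
    for i, sentence in enumerate(split_sentences(text), start=1):
        writer.writerow([i, sentence])
    return buffer.getvalue()


# Invented sample text, not taken from any real manifesto.
sample = "We will cut taxes. We will fund schools! What about borders?"
rows = split_sentences(sample)
csv_text = sentences_to_csv(sample)
```

A real manifesto would need a more careful splitter (abbreviations, bullet lists, headings), which is presumably why the speaker did this by hand in a text editor.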

So back to my talk… the goal is to measure unobservable quantities… we want to understand ideology – the “left-right” policy positions… we have theories of how people vote, that they vote to parties most proximate to their own positions. For political scientists this is a huge issue. We might also want to measure corruption, cultural values, power… but today I’m going to focus on those policy positions.

A lot of political science data is “created” by experts… a lot of it is, frankly, made up. A lot of it is about hand-coded text units – you take a text, you unitise it…. e.g. immigration policy statements… (Comparative Manifesto Project, Policy Agenda Project). Another way is Solicited Expert Opinion (Benoit and Laver, Chapel Hill, etc) – I worked with Laver for years looking at understanding of policies of each party. It’s expensive work, takes an expert an hour to fill out a form… real headache… We have expert-completed checklists (Polity, Comparative Parliamentary Democracy Dataset, Freedom House, etc.). And there are Coded International events (KEDS, Penn State Event Data). And we have inductively scaled quantities (factor analysis such as “Billy Joe Jimbon Factoral analysis).

So what are some of the problems of coding using “experts”. Who are experts anyway? Difficult to find coders who are suitably qualified. It’s hard to find them AND hard to train them… most of the experts coding texts tend to be PhD students who find it a pleasing thing to do whilst avoiding finishing their thesis. There can be knowledge effects since no text is ever anonymous to an expert coder with country knowledge. Human coders are unreliable – their codings of the same text unit will vary wildly. And even single coding is relatively costly and time-consuming. So only one coder codes each text. Even when you pay the experts, they are still doing you a favour!

So I will talk about an alternative solution to this problem, and that problem is about classifying text units. So the idea is to observe a political party’s policy position by content analysis of its texts. And party manifestos are the most common texts. The idea behind content analysis is breaking text into small units and then using human judgement to apply pre-defined codes. e.g. coding something as right wing policy. And usually that is done for LOTS of sentences by only ONE coder.

Tomorrow I’ll be in Berlin… the biggest (only?) game in town is the Comparative Manifesto Project (CMP). This is a huge project with 3500 party manifestos from 55 countries from 1945-2010 though still going. Human coders are trained and have PhDs. They break manifestos into sentences, human judgement to apply pre-defined codes. Each sentence assigned to one of 56 policy categories. Category percentages of the total text are used to measure policy. And each manifesto is seen by just one coder, and coded by just one coder.
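
To make "category percentages of the total text" concrete, here is a minimal sketch of that calculation. It assumes each sentence has already been given exactly one category label; the function name and the three labels are hypothetical and are not the CMP's actual 56 categories.

```python
from collections import Counter


def category_percentages(codes):
    """Share of sentences per category, as a percentage of all coded sentences."""
    total = len(codes)
    if total == 0:
        return {}
    counts = Counter(codes)
    return {category: 100.0 * n / total for category, n in counts.items()}


# One label per sentence; the labels are invented for illustration.
codes = ["economy", "welfare", "economy", "immigration"]
shares = category_percentages(codes)
```

With a single coder per manifesto, as described above, these percentages carry no estimate of coder disagreement, which is the gap the crowd-sourced approach tries to fill.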

So… what could we do… crowd-sourcing involves outsourcing a task by distributing it to an unspecific group, usually in parts… the basic idea of this, versus expert coding, is that it reduces the expertise of each of the coders, but increases the number of coders. Distribute texts for coding partially and randomly. Increase the number of coders per sentence. Treat different coders as exchangeable – and anonymous, and we don’t care if they are sitting in an internet cafe in Estonia in their underwear, or whether they engage on a day off from a bank…

The coding scheme here is to have a more simplified coding scheme. We applied it to 18 of the “big 3” British party manifestos from 1987 to 2010. So a sentence can be coded as Economic, Social or neither… under either of the first two categories there are further options (anti, neutral or pro) from “Very left” to “Very right”, or “Very liberal” to “Very conservative”. And there is a 10 question test to show correct codings, to guide the coder and to keep them on track.

So, to get this started we wanted a comparison we understood. We wanted to compare crowd coding to expert coding. So my colleague and I, and some graduate students, coded a total of 123,000 sentences between us… With between 4 and 6 coders per manifesto and using the same system to be deployed to the crowd. This was a benchmark for the crowd sourcing end of things. This took ages to do… we did that…. that’s a lot of expert coding… and in practice you wouldn’t get this happening… For the crowdsourced codings we got almost twice as many codings…

We used an IRT type scaling model to estimate position. We didn’t want to just take averages here… we used a multinomial method here. We treat each sentence as an item, to which the manifesto is responding, and the left or rightness (etc) as a quality they exhibit. Despite that complexity we found that a mean of means approach led to very similar results. We are trying to simplify that multinomial method… but now the results…
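
The "mean of means" shortcut mentioned here can be sketched as: average the coders' scores for each sentence, then average those sentence means for each manifesto. This is an illustration of that simpler approach only, not the IRT-type model. The numeric scale (-2 for "very left" to +2 for "very right"), the function name and the toy codings are assumptions.

```python
from collections import defaultdict
from statistics import mean


def mean_of_means(codings):
    """codings: iterable of (manifesto, sentence_id, score) tuples.

    Step 1: mean score per sentence across its coders.
    Step 2: mean of those sentence means per manifesto.
    """
    by_sentence = defaultdict(list)
    for manifesto, sentence_id, score in codings:
        by_sentence[(manifesto, sentence_id)].append(score)

    by_manifesto = defaultdict(list)
    for (manifesto, _sentence_id), scores in by_sentence.items():
        by_manifesto[manifesto].append(mean(scores))

    return {manifesto: mean(values) for manifesto, values in by_manifesto.items()}


# Toy codings on an assumed -2 (very left) to +2 (very right) scale.
codings = [
    ("A", 1, -2), ("A", 1, -1), ("A", 1, 0),  # sentence mean -1
    ("A", 2, 1),                               # sentence mean 1
    ("B", 1, 2), ("B", 1, 1),                  # sentence mean 1.5
]
positions = mean_of_means(codings)
```

Averaging per sentence first means a sentence that happened to attract many coders does not count for more than a sentence that attracted few.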

Comparing expert codings to expert surveys on economic and social positions look pretty good.. good correlation for economic particularly a thing that we’d expect – and we see.

We tested to see how best to serve up results… we tried the sentences in order and out of order. Found .98 correlation so order doesn’t matter…

For the crowd sourcing we used CrowdFlower, a front end to many crowd-sourcing platforms, not just Mechanical Turk. Uses a quality monitoring system so that you have to maintain an 80% “trust” score or be rejected. Trust maintained through “gold questions” carefully selected and generated by experts…
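
A trust score of this kind can be imagined as the share of gold questions a coder gets right, with coders below the threshold dropped. The sketch below is a guess at the general idea and is not CrowdFlower's actual scoring: the function names, the data layout and the question IDs are hypothetical, and only the 0.8 threshold comes from the talk.

```python
def trust_score(answers, gold):
    """Fraction of the gold questions this coder saw that they answered correctly.

    answers: {question_id: label} for one coder
    gold:    {question_id: correct_label} chosen by experts
    Returns None if the coder saw no gold questions.
    """
    seen = [q for q in answers if q in gold]
    if not seen:
        return None
    correct = sum(1 for q in seen if answers[q] == gold[q])
    return correct / len(seen)


def trusted_coders(all_answers, gold, threshold=0.8):
    """Keep coders whose trust score is at or above the threshold."""
    keep = []
    for coder, answers in all_answers.items():
        score = trust_score(answers, gold)
        if score is not None and score >= threshold:
            keep.append(coder)
    return keep


# Hypothetical gold questions and coder answers.
gold = {"g1": "economic", "g2": "social", "g3": "neither", "g4": "economic", "g5": "social"}
all_answers = {
    "coder_a": {"g1": "economic", "g2": "social", "g3": "neither", "g4": "economic", "g5": "neither"},
    "coder_b": {"g1": "social", "g2": "social", "g3": "neither", "g4": "neither", "g5": "social"},
}
kept = trusted_coders(all_answers, gold)
```

Here coder_a gets 4 of 5 gold questions right and is kept, while coder_b gets 3 of 5 and is dropped.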

So, we can go back to the live experiment… it’s 96% complete!

So, looking at results in two dimensions… if Liberal Democrats were actually Liberal would be right of economics and left of social… but actually they are more left on economics. Conservatives on the right socially but getting nearer the left in some cases… but it’s not about the analysis so much as the comparison with the benchmark…

When we look at expert codings versus crowd coders… well the points are all over the place but we see correlations of 0.96 for economic, 0.92 for social dimensions. So in both cases there isn’t total agreement – we either have a small crowd of experts or a bigger crowd of non experts. It’s always an average but just a matter of scale…
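
The correlations quoted here compare one set of position estimates with another, manifesto by manifesto. A plain Pearson correlation, which is probably but not certainly the statistic reported, can be computed as below. The function name and toy numbers are illustrative and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy estimates for three manifestos (invented numbers).
expert_positions = [1, 2, 3]
crowd_positions = [2, 4, 6]
r = pearson(expert_positions, crowd_positions)
```

A high correlation says the two methods order and space the manifestos similarly; it does not require the two scales to have the same units.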

So, how many coders do we need? No need for 20 codes for a sentence if it’s clearly not about immigration policy… we did massively over sample, then drew sub sets there for standard error… we saw that estimates from our errors the uncertainty starts to collapse… The rate of collapse for experts is substantially steeper… for aggregate of these two processes you need five times more non-expert coders than experts. But you can run good codings with five coders…
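
One way to read the "five times more non-expert coders" figure: if each coder's score is treated as an independent noisy reading and readings are simply averaged, the standard error of the average is the square root of (variance / number of coders). Matching the experts' precision then takes coders in proportion to the ratio of variances. The sketch below works through that arithmetic under those assumptions. It is not the subsampling procedure used in the study, and the variances and target precision are invented so that the ratio comes out at five.

```python
import math


def standard_error(variance, n_coders):
    """Standard error of the mean of n independent codings with the given variance."""
    return math.sqrt(variance / n_coders)


def coders_needed(variance, target_se):
    """Smallest number of coders whose averaged score reaches the target standard error."""
    return math.ceil(variance / target_se ** 2)


# Hypothetical per-coder variances, chosen so non-experts are 5x noisier.
expert_variance = 1.0
crowd_variance = 5.0
target = 0.5

experts = coders_needed(expert_variance, target)
crowd = coders_needed(crowd_variance, target)
```

Under these made-up numbers four experts or twenty crowd coders give the same precision, which is the sense in which a crowd is an expert panel with more noise per member.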

So we did some tests for immigration policy… used 2010 British manifestos, knowing that there were two expert surveys on this dimension (but no CMP measures). Only coded immigration or not, and if immigration is positive or not. Cost about $300. Ran again, same cost, extremely similar results…

Doing this we had 0.96 correlation with Benoit 2010 expert survey. .94 correlation with Chapel Hill Survey. And between the two runs correlation of around 0.94. Would have been higher… the experts differed between the immigration policies of Labour and Conservative… were not obvious positions in the text… but they had positions that experts knew about…

So, who are these people? Who are these crowd coders? They are from all over the world… the top countries were USA, Britain, India and Estonia. One person coded over 10,000 sentences! Crazy person loves coding! The mean trust score rarely drops below 0.8 as you’ll be booted off if it does… You don’t pay or get data from those that fail. Where are these jobs being sourced? We tried Mechanical Turk… we’ve used Crowd Flower… there are huge numbers of these sites – a student looked at about 40 of these sites… but trust scores are great no matter how these people are sourced… Techniques are not all ideal… but they don’t stay in the system if trust score changes. No relationship between coder quality and platform…

Conclusions here. Non experts produce valid results, just need a few more of them. Experts have variance, have noise, so experts are just another version of a crowd with higher expertise (lower variance). Repeat experiments prove that the method is reliable (and replicable). Some places require your work to be replicable… is data plus script a good way to do that? Here you really can… You can replicate everything here. You can redo in February what you did in December… with the right text you can reproduce the result. Why does this appeal? Well it’s cheap, it’s flexible. Great for PhD students who lack expert access. And you can work independently from big organisations that have their own agenda for a study. You can try an idea, run again, tweak, see what works… Can go back again… And this works for any data production job that is easily distributed into simple tasks… sign up for Mechanical Turk, be a worker, see what it’s like to actually do this… for instance for transcriptions of audio tapes… it’s noisy…. a common job is that they upload 5 second clips and you transcribe that… gives you pretty good human transcription that timestamps weave back together. Better than computer method…

So, we are 100% finished with our UKIP crowdsourcing experiment… Interestingly 40 negative, 48 positive… needs further analysis…


Q: In terms of checking coders do the right thing – do you check them at the beginning or do you check during the process of codings?

A: Here I cheated a bit… used 126 gold questions from another experiment. You have to give a reason for each question about why it’s there – if the person doesn’t get it right then they get text to explain why that is the case… Very clear unambiguous questions here. But when you deploy a job you can monitor how participants responded or if they contested it… In a previous experiment we had so many contested responses that I actually looked again and removed it…

Q: A very interesting talk… I am a computer scientist and I am interested in whether now you have that huge gold data set you have thought about using machine learning.

A: Yes, we won’t let that go to waste. The crowd data too…

Q: I am impressed but have two questions… you look at every sentence of every manifesto… they are funny things as not every sentence is about the thing you are searching for – how do you deal with that? And a lot of what is in manifestos are sort of dog whistle things – with subtexts that the reader will pick up, how do you deal with that in crowdsourcing?

A: You get contextual sentences around the one you are coding, that helps indicate the relevance of that sentence, it’s context. In terms of the dog whistle question… people think that but manifestos are not designed to be subtle. They actually tend to be very plain, very clear. It’s rare for that subtlety to be present. Want truly outrageous immigration policy look at the BNP manifesto… every single area is about immigration, not subtle at all.

Q: I’m a linguist, I find it very interesting… and a question about tasks appropriate to crowdsourcing. Those that can be broken down into small tasks, and that your participants can relate to their daily life. I am doing work on musical interpretation… I need experts because I can’t see how to do that in language, in a way that is interpretable to non experts…

A: You can’t give something that’s complex… I couldn’t do your task… you can’t assume who your crowd is, we have very little information… we didn’t ask about language but they wouldn’t retain that trust score without some good English language skills. But workers have a trust score across projects so anything they can’t do they avoid as losing that score is too costly… You could simplify the task with some sort of task that can test correct or incorrect interpretation… but we keep the task simple.

Q: A very interesting talk, I have a quick question about how you set the right price for these tasks… how do you do that? People come from different areas and different contexts.

A: Good question. We paid 2 US cents per sentence. We tried at 5 cents and it was done very fast but quality wasn’t better. A job at 1 cent didn’t happen fast at all. So it’s about timings and pricing of other jobs.

Q: Could you say something about the ethics of this kind of method… you are not giving much consideration to the production of these texts, so I wondered if you could talk about the ethics of this work and responsibilities as researchers.

A: Well I didn’t ruin any rainforests, or ruin any summers. These people have signed up for terms and conditions. They are responsible for taxation in their jurisdiction. Our agreement with Crowdflower gives them responsibility. And it’s voluntary. Hopefully no sweatshops for this… I’m receptive to the idea of what ethical concerns could be… but couldn’t see anything inherently wrong about the notion of crowdsourcing that would be a concern. Did run past ethics committee at LSE. Didn’t directly contact people, completing tasks on the internet through third party supplier.

Q: You were showing public domain documents… but for research documents not in the public domain how would security be handled…

A: Generally transcriptions are private… but segments are usually 3 or 5 segments… like reading a document from the shredder basket… the system has that data but workers do not have access to that system.

Q: But the system does have that so you need trust in the platform…

A: Yes.

Comment from floor: companies like Crowdflower have convinced companies to give them data – doctors notes etc. they have had to work on making sure they can assure customers about privacy of data… as a researcher when you go in you can consider what is being done in that business market in comparison

Q: Have you compared volunteer coders to paid coders? I am thinking particularly about ethical side of things and motivations, particularly given how in political tasks participants often have their own agendas. Might be interesting to do.

A: Volunteer crowdsourcing? Yes, it would be interesting to compare that…

Reading Data: Experiments in the Generative Humanities – Dr Lisa Otty, Lecturer in English Literature and Digital Humanities, University of Edinburgh (session chair: Dr Tom Mole)

Dr Tom Mole is introducing our next speaker, Dr Lisa Otty whose interests are in the relationship between reading, writing and the technologies of transcription. And she will be talking about her work on Reading Poetry, and the process of what happens when we read a poem.

Now to be a literature scholar speaking at an event like this I have to acknowledge that data is not a term typically used in our field. When you think about what we are used to reading, texts are often books, poems… but a text is not necessarily a traditional material but may also be another linguistic unit, something more complex. The Open Archival Information System reference model (CCSDS 2002) describes data as “a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing”. Interpretation being crucial there. When we look at texts like books or poems those are “cooked” – edited, curated, finished. Data is too often not seen as that.

Johanna Drucker – in Humanities Approaches to Graphical Display (DHQ 5.1 2011) talks about data as Taken Not Given, Constructed from the Phenomenal World. Data passes itself off as a priori conditions, as if same as phenomena observed, collapsing the critical gap between the data collection and observation.

Some of these arguments gel with some of the arguments around close versus distant reading. And I think it can therefore be more productive to see data as a generative process…

Between 2009-2012 I was involved in the research project Poetry Beyond Text (University of Glasgow, and University of Kent). This was a collaborative project so inevitably some of my reflections and insights are also collaborative and I would like to acknowledge my colleagues’ work here. The project was looking at interpretation of poetry, and particular visual forms of poetry such as artist books. What these works share is that they are deeply resistant to being shared as just information.

For example Eugen Gomringer’s (1954) “silencio” is an example of how the space is more resonant than the words around it… So how do we interpret these texts? And how do our processes for interpretation affect our understanding? One method, popular in psychology, is eye tracking… a physical way of registering what you are doing. We combined eye-tracking with self-reporting. Eye Tracking takes advantage of the movements of a small area of the retina. So a map of concentration sees those little jumps, those movements around the page. But it’s an odd process to be part of – you wear a head brace with a camera focused on your eye. You get a great deal of data from the process. Where there is more concentration that usually indicates trickiness or challenge or interest in that section – particularly likely for challenging parts of text. From this data you can generate visualisations. (We are watching a video of eye tracking process for poetry).

Doing this we found a lot of patterns. We saw that people did focus and understand space, but only when that space has significance in the process. In poems where space is more conceptual than mimetic. But interestingly people who recorded high confusion also reported liking them much more… With experiments with post linear poems the cross-linear connections. All people start with a linear reading pattern before visual reading. And that reflects the colour strip test – psychology test that shows that visual information trumps linguistic information… so visual readings and habitual reading processes are hard to overcome. We are programmed to read in a certain way… our habits are only broken by obstacles or glitches in the text we are reading…

Now talking about this project if I talk about findings I am back in that traditional research methods… and that would be misleading. We were a cross disciplinary team and so I am particularly interested in focusing on that process, on how we worked on that. The eye tracking data generates huge amounts of numerical data… we faced real challenges in understanding how to understand, to read this data… a useful reminder of the fact that data’s apparent neutrality has real repercussions. It’s one thing to make data open, another to enable people to work with it.

My colleagues in psychology didn’t understand our interest in visualisations of numerical eye tracking data, it is an abstraction… and you have to understand the software to understand how that abstraction works. Psychologists like to interpret the data through the numerical data. They see visualisations, graphs etc. as having a rhetorical rather than analytical function. Our team were interested in that rhetorical function. We were humanists running an experiment – the framework was of hypotheses, of labs, of subjects… but the team came from creative practice background so this sense of experiment was also in play. In its broadest terms experiments are about seeing something in process and seeing how it behaves, for scientists about testing hypotheses in this way, creative experiments rather different… For humanist analysis of these texts you have to deal with a huge number of variables, very much a contrast to traditional psychology experiments. For creative experiments there is a long tradition of work in surrealism, dadaism, etc. that poetry can unleash and disrupt our traditional reading of texts… they are deliberately breaking our habits. The reader of the literary form is a potentially revolutionasible(?) subject.

In Literary scholarship and humanities the process of reading is a social, contextualised process. In psychology reading is a biomedical process, my colleagues in this field collapse the human and machine. A recent article by Lutz Koepnick asked Can Computers Read? (2014) and discussed the different possible understandings of what reading is for… what our ideological framework of reading means to us… computational reading is less about what computers are, more about how we invest in them and envision them.

One of the things that came out of our project was the connections between poetry and psychology, and the connections to creative experiments.

To finish I want to talk about some examples of experiments around reading and what reading can mean.

The readers project – John Cayley and Daniel Howe (2009 – ) their work explores imaginative critiques of reading. Cayley is a literary scholar and has been working in digital production for some time. The readers project features “programmed autonomous entities”. Each reader moves through a text at different speeds and in different ways. So for each part of the experiment projections are used, and they are often shown with books, a deliberate choice. A number of interfaces are available. But these readers move according to machine reading rather than biomechanical reading. Cayley terms this an exploration of vectors of reading… directions in which reading might take off. It explores and engages with new creative understandings of reading. This seems to be seen by Cayley in avant garde context. Emphasis on constructed nature of the work.

“because the project’s readers move within and are thus composed by the words within which they move, they also, effectively, write. They generate texts and the traces of their writings are offered to the project’s human readers as such, as writing, as literary art.” (Cayley, The Readers Project website).

As someone engaging with these pieces the experience is of reading with, more than processing or consuming or analysing.

Tower – by Simon Biggs and Mark Shovman (2011), working at Hive, uses knowledge of natural language processing to build visualisations. When the interactor speaks their words spiral around them. And other texts are also present – the project is inspired by the Tower of Babel and builds up and up. Shovman’s previous work at Hive was on geometric structure. Biggs’ hope is that participants “will be enabled to reflect upon the inter-relations of the things that they are experiencing and their own contingency as part of that set of things.”

Michelle Kendrick talks about hybrids, that hybrid of human and machine interaction, the centrality of human investment in computer reading.

When I talk about this work I am overwhelmed by the rhetorical significance of words like “experiment” and the dominance of scientific research methods – the first interpretation of this work is often wrongly around seeing the work as applying scientific methods to literary interpretation. But instead this work is about interpretation and exploring methods of understanding and interpretation.


Q: You talked about different disciplines coming together. Do you think there is a need for humanities researchers to understand data and computational methods?

A: I think we would all benefit from a better understanding of data and analysis, particularly as we move more and more into using digital tools. I’m not sure if that needs to be in the curriculum but it’s certainly important.

Q: One of the interesting things about reading is the idea of it being a process of encoding and decoding… but the code shifts continuously… and a challenge in experimental reading or interpretation is that literature is always experimental to some extent because the code always changes.

A: I think the idea of reading as always being experimental… I think that experimental writing is about disruption… less about process but more about creating challenge.

Q: I was very struck in what you were presenting there in the Poetry Beyond Text project about the importance of spatiality and space… so I was wondering about explicit spatial understandings – the eye tracking being a form of spatial understanding…

A: We were looking at the way that people had been interpreting those texts in the past, in the ways people had looked at that poetry in the past… they had talked about the structural work of the poets themselves… and we wanted to look beyond that…We wanted to find out people’s responses to some of these processes, and what the relationship was between that experience and those critical views of those texts.

Q: Did you do any work on different kinds of readers – expert readers or people who had studied these works?

A: It was quite a small group but we looked at the same people over time and we did see development over time. We worked mainly with students in literature or art and most hadn’t encountered this type of concrete poetry before but were well experienced with reading.

Q: I wanted to ask you about the ways in which we are trained to read… there are apps showing images of texts very very quickly, are we developing skills to read quickly rather than more fully and understand the text.

A: There was a process of rapid image showing to the eye (RSVP was the acronym) – to allow you to absorb more quickly but in actual fact that was quite uncomfortable. We do see digital texts playing with those notions. I don’t think we will move away from slow reading but we are seeing more of these rapid reading processes and technologies.

Chair: Kinetic Text project works in some of these ways, about focusing eye movement…

A: The text can also manipulate eye movement and therefore your reading and understanding of the text. Very interesting in that respect.

Algorithm Data and Interpretation - Dr Stephen Ramsay, Associate Professor of English at the University of Nebraska; Fellow at the Center for Digital Research in the Humanities (session chair: Prof James Loxley)

James Loxley is introducing our next speaker, Dr Stephen Ramsay.

I want to say that my mother is from Ireland, a little place west of here, and she said that if she had ever been to University it would have been to University of Edinburgh which she felt was the best in the world.

Now I was planning to teach a technical talk – I teach computer science in an English faculty. But instead I’m going to talk about data. So I’m going to start with the 1965 blackout of New York. At the time it was about disaster, groping in the dark, a city stranded. But then 9 months later they ran stories on the growth in birth rates, a sharp rise across hospitals across the state. All recording above average numbers of births. Although one report noted that Jewish hospitals did not see an increase. Sociologists talked about the blackout as in some way responsible… three years later a sociologist published a terse statement showing no increase in births after the Great Blackout. This work looked at average gestation period and noting that births would have been higher from June through to August, not just in August… but he found that 1966 was not unusual or remarkable. Black Out Babies were a myth…

You could read this tale as a cautionary one about the misuse of data. But I think this can be read another way… the New York Times piece said something about human nature – people turning to each other when the power is out is a sad reflection on the place of television in our life, but a hopeful narrative for humanity. And citing birth rates and data and using scientific language adds to that. And the comments about Jewish people show prejudice. But at the same time that subsequent analysis frames the public as prone to fantasy, as uninformed, with the scholar overcoming this…

The idea of “lies, damn lies, and statistics” encourages us to always look for falsehood hiding behind truth… so we think of what stories we are being told, and what story we want to tell. It’s simple advice that is hard to do. I want to give a different spin on this. I think that data is narrative automatic. The way we use data is instructive – we talk about lists, numbers… Pride and Prejudice does not seem to be a data set unless we convert it. It gains narrative in transformation. The data can be shown to show and mean things – like stories, stories waiting to be told… data doesn’t mean anything by itself, someone has to hear what it is saying…

What does data look like in its pre-interpretive state? There is an internet site called “Found” – collecting random items such as notes, cards, love letters, shopping lists. Materials without their context. Abandoned artefacts. All can be found there. But the great glorious treasure of Found is its lists…

[small pause here for technical difficulty reasons]

These lists are just abandoned slips of paper… one for instance says:







roach spray



The spareness and absence of context quickly turns these data-like lists into narrative… not all are funny… one reads:

go out for a walk with someone

speak with someone

watch tv

go out to cemetry to speak to mom

go to my room

Have you ever wanted to give your data a hug? Bram Stoker said in writing Dracula he just wanted to write something scary… his novel is far more interesting without him as the interpretations of others are fascinating and intriguing… Do facts matter in the humanities? In some areas… who painted a picture, when a treaty was signed… these are not contingent truth claims… surely we can say fact is a good word for those things that are not subject to debate. Scholars can debate whether a painting is by Rembrandt or his school, that debate is about establishing a fact. But facts still matter…

If we look at Rembrandt’s Night Watch the lighting of the girl equating to that of the captain is intriguing. If he said it meant nothing we’d probably ignore him… The signing of a treaty may be a fact but why it occurred is much more interesting. Humanities are about that category 1 inquiry more than the category 2 fact inquiries. Often this is the critique of the humanities and the digital humanities, Jonathan Gottschall insists that the humanities should embrace scientific approaches and sense of optimism… And sees the sciences as doing a better job of this stuff but that “what makes literature special” should be retained… he doesn’t say what those things are. There are unsettled matters if one takes scientific approaches. Of course Gottschall’s nightmare is to understand data with the same criticality we apply to Bram Stoker, questioning its being and meaning… and I suggest we make that nightmare a reality!

[More technical issues… ]

What I wanted to show you was a list of English Novels [being read to us]… It is a list, from Hoover, that organises novels in terms of the breadth of their vocabulary. I have shown this list to many people over the last few years, including many professors… they see Faulkner and Henry James at the top and approve of that and of Mark Twain…. and young adult novel writers at the bottom… but actually I read you the list in ascending order… Faulkner and James are at the bottom. Kipling and Lewis are at the top. And there it starts… richness is questioned… people want to point out how clearly correct the answer is, despite having given the wrong answer; some explain that the methodology is flawed or misreported… these are category 1 people being annoyed by category 2 reality…
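As a rough illustration of the kind of list described here, the sketch below ranks texts by vocabulary breadth, counted as the number of distinct word forms in a fixed-size opening sample. The function names, the simple tokenising rule and the toy texts are all assumptions made for illustration; this is not Hoover’s actual method or data.

```python
import re

def vocabulary_breadth(text, sample_size=50_000):
    """Count distinct word forms in the first `sample_size` words of `text`."""
    # Assumption: a word is any run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words[:sample_size]))

def rank_by_breadth(texts, sample_size=50_000):
    """Return titles in ascending order of vocabulary breadth."""
    return sorted(texts, key=lambda title: vocabulary_breadth(texts[title], sample_size))

# Hypothetical toy "novels", invented for this example only.
toy_texts = {
    "Novel A": "the cat sat on the mat and the cat sat still",
    "Novel B": "a quick brown fox jumps over one lazy sleeping dog",
}
ranking = rank_by_breadth(toy_texts)
print(ranking)
```

Using the same sample size for every text matters, since a longer text will almost always contain more distinct words than a shorter one.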

But when we stop using it as a Gotcha it is a more provocative question… each of these titles contains a thousand, a hundred thousand thoughts and connections… it is what we do… as humanists we make those connections… we ask questions of the narrative we have created… part of our problem is a general discomfort with letting the computer tell us what is so… but if we stop doing that we might see peculiar mappings of books as cultural objects… it might show us a way to deeper understanding of reading itself… it raises any number of questions about the development of English style… and most of all it raises questions of our discursive paradigms.

That gives us narrative possibilities we could not see. We cannot think of text as 50k word blocks. The computer can ONLY apprehend the text in such terms. To understand the computer as finding facts is to miss the point. It is about creating triggers to ask questions, to look at the text in new ways. This is something I came across working on Virginia Woolf’s The Waves. The structure is so orderly… and without traditional cultural narrative. And they speak in very similar styles, sentence structures, image patterns… some see some difference between gender or solidarity… but overall it is about unity… this is the sort of problem that attracts text analysis scholars like myself. I ran algorithm clustering models looking for similarities unseen by scholars. On a lark we posed a simple question… “what are the words that the women in the novel use in common, that none of the men do?” and it turns out that there are 9 such words. Could see that as a narrative – like a Found list – and then we did it with men and found 120 words! Dramatic. So many words… Some critics found that disparity frightening… some think it backs up sexism of western canon. Others see this as a chance to ask other questions… to try with other authors, novels, characters… if you think this way, perhaps you’ve caught the DH bug, I welcome you. But do we think we’ll find an answer to questions of gender and isolation? Do we want to answer those? The humanities want a world that is more complex, deeper than we thought. That process is a conversation…
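The question posed here (“what are the words that the women in the novel use in common, that none of the men do?”) maps neatly onto set operations. The sketch below is only one possible reading of it: the function name, the speaker labels and the word lists are invented for illustration and are not taken from the novel or from the speaker’s own software.

```python
def exclusive_shared_words(group, others):
    """Words used by every speaker in `group` and by no speaker in `others`.

    Both arguments map a speaker label to the list of words that speaker uses.
    `group` must contain at least one speaker.
    """
    shared = set.intersection(*(set(words) for words in group.values()))
    used_by_others = set().union(*(set(words) for words in others.values()))
    return shared - used_by_others

# Hypothetical speakers and vocabularies, invented for this example only.
women = {
    "W1": ["sea", "light", "door", "bird"],
    "W2": ["light", "sea", "garden"],
    "W3": ["sea", "light", "bird"],
}
men = {
    "M1": ["sea", "book", "door"],
    "M2": ["book", "street"],
}

women_only = exclusive_shared_words(women, men)
men_only = exclusive_shared_words(men, women)
print(sorted(women_only), sorted(men_only))
```

Running the same function in both directions is what produces the two lists that can then be compared, as with the 9 and 120 words mentioned in the talk.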

In 2015 the Text project will release huge volumes of literature. Perseus contains most Greek texts… there are huge new resources. Almost all questions we ask of these corpuses have not been asked before… we can say they will transform the humanities but that may not be true… the limiting factor is whether we choose to remain humanists in the face of such abundance… perhaps we need to be programmers, tool builders, text engineers… many more of us need to invite the new texts – lists, ngrams, maps etc. – into our ongoing conversation. We are here to talk about philosophical issues of data and these issues are critical… but we have to be engaging with these questions…. Digital humanities means databases, mark up, watermelon…!


Q: I am intrigued to think about how we design for the things we don’t know what we need to know…

A: Sure, imagining what we don’t know… you inevitably build your own questions into the tools… ironically an issue for scientific methods. The nice thing about computers is that they are fast, obedient and stupid. They will do anything we ask them to, even our own most stupid ideas, huge serendipity just baked into that! It’s a problem but it’s amazing how the computer does that job for me, surprisingly.

Q: That was a brilliant fascinating talk. Part of the problem with digital humanities for literature right now is that it either tells us what we do know… or it tells us what we don’t know but then we worry that it’s wrong… The description of the richness list was part of that. I really liked your call for an ongoing discussion that includes computer generated data… but I don’t see how we get past the current description. If all literary criticism says something is so, and expects “yes, but…” I can see how computer generated data sits in that… but how can data be a participant in that conversation – beyond ruling something out, or concurring with expectations.

A: Excellent point and let’s not downplay at all the first part of your question. I saw Franco Moretti give a talk about titles getting shorter for instance… who’d have thought?! But I think it has a lot to do with how we build our tools… I find it frustrating that we all use R, or tools designed for science or psychology… I want our schools to look more like the art-informed projects Lisa talked about. I think the humanities needs to do more like that, to generate the synergies. Tools that are more ludic.

Q: It may be about perceived barriers being quite high. An earlier speaker talked about the role of repeatability. Ambiguity reading a poem is repeatable. If barriers to entry are low enough for repetition and for others to play, to ask new questions, maybe that brings the data in as part of the conversation…

A: There are tools that let you play with the text more ludically. Voyant for instance. But we come with a lot of cultural baggage as humanists… there is a phenomenon that… no matter what they are talking about they give a literary critical reading of a text but when they show a graph we all think we are scientists… there is so much cultural baggage. We haven’t learned how to be humanistic users of these tools, or to create our own tool.

Q: A question and an observation… There is a school of thought in cognitive psychology that humans are infinitely able to retrofit any narrative to any circumstances whatsoever, and that is very much what was coming through your data… Many humanities departments have become pseudo social sciences departments… but if you don’t have a clear distinction between category 1 and category 2 they can end up doing their own thing…

A: I don’t want the humanities. I resist the social science type study of literature, the study of human record or of the human condition… when we are talking about… in my own work I move between being a literary critic and being an engineer… when it comes to writing software that method definition is wrong, it doesn’t work… when I am a literary critic it is about all those shades of grey, those complexities… but those different states both seem important in pursuit of that end goal… if studying flu outbreaks let’s not be ludic… but for Bram Stoker then we should!

Q: In my own field of politics there was a particular set of work which gave statistical data a bad name… and I wonder whether in your field the same risk is there…

A: In digital literary studies this is sometimes seen as a 25 year project to get literary profs into the digital field… but I always say that that’s not true, there’ll always be things to be done. There was a book in the 70s that looked at slavery in an entirely quantitative way, it made the argument no one wanted to hear, that slavery had been extremely lucrative. Economists said that it’s profitable. History fled from statistical methods for years after that… but they do all agree that that was profitable. And there is quantitative work there again/still. If I had to predict I’d say the same thing for digital literary studies does seem likely…

Q: I can’t resist one here… I was following a blog by Kirsch where you say that scholars should code and I wanted to ask about that…

A: OK, well Kirsch lumps me in with the positivists… I’m not quite in the devil’s party. But I teach programming and software engineering to humanists. It’s extremely divisive… My views have softened over the years… for me programming is a magnificent intellectual exercise… knowing about it seems to help understand the world. But also if you want to do research in this area you need some technical skills. If that’s programming… well learn what you need whether that’s GIS, 3D Graphics… if you want to build things you might need coding!

Big Data and the Co-Production of Social Scientific Knowledge - Prof Rob Procter, Professor of Social Informatics, University of Warwick (session chair: Prof Robin Williams)

Professor Robin Williams is now introducing Professor Rob Procter, our next speaker, talking about his work around social informatics.

The eagle eyed amongst you will spot my change of title – but digital is infinitely rewritable! I am working in the overlap of sociology and computational tools and methods. So, the second thing I want to talk about is Sociology in the age of “big data”. I think what this demonstrates is the opportunities for sociology to respond in various different ways to this big data, and tools to interrogate that data. The evolving of tools and methods is a key thing to look at in the area. So that brings me to the Collaborative Online Social Media Observatory (COSMOS) and tools we are developing for understanding social media… and then I want to talk about Sociology beyond the academy – the co-production of social scientific knowledge. But there are other types of expertise being mobilised at the moment, in looking at the computational turns things are taking. Not always a comfortable thing for social scientists…

So firstly Social Informatics. So what is that? Well to me it’s the inter-disciplinary study of factors that shape adoption and use of ICTs. And what gets me excited is how these then move into real processes. And for me the emphasis on innovation as public, participatory process of experimentation and learning where meanings of technologies are collaboratively explored and co-produced. In social media you can argue that this is a large scale experiment in social learning… Of course as we witness growing scale of adoption more people experience those processes: how social media works, how they might adopt or use it… to me this is a fascinating area to study. And because it is public and involves social media it is very easy to see what’s going on… to some extent. And generally that data is accessible for social research purposes. It is not quite that simple but you can research without barriers of having to pay for data if you do it in a careful way.

So these developments have led me into social media as a prime area of my research. So firstly some work we did on the impact of Web 2.0 on scholarly communications – work with Robin Williams and James Stewart – many of us will be part of this, many of us tweet our research… but many of us are not clear of what that means, what the implications are. So we did some work, got some interesting demographic research… we also did interviews with people and got ideas of why they were, and why they were not adopting… Some very polarised. And in parallel we looked at how scholarly publishers incorporate social media tools into their work, in order to remain key players… they do lots of experiments and often that is focused on measuring impact and seeing the movement of their work to other audiences. Some try providing blogs on their content. But that is all with mixed success. A comment notes that it is easier to get comments on cricket reports than on research online… So it’s hard to understand and capture impact…

I’ll come back to that and about co-creation of knowledge. But first I want to talk about the riots in England in 2011. This was work in conjunction with the Guardian Newspaper. They had been given 2.5 million tweets directly by Twitter. They wanted to know if social media was particularly vulnerable for sharing false information, did that support calls for shutting down social media at times of crisis? So we looked at a number of different rumours known about and present in the corpus: zoo animals on the loose; London Eye on fire; Miss Selfridge on fire; rioters attack a children’s hospital in Birmingham. I will talk about that latter example. But we wanted to ask about how people use and understand and interpret social media in these circumstances, how they engage with rumours…

So this is about sociology in the age of “big data”. It calls for interpretive methods but we can’t do that at scale easily… so we need computational methods to focus scarce human resources. We could crowdsource some of this but at this scale that would still be a challenge…

So firstly let’s look at the work of Savage and Burrows (2007), who talked about the “coming crisis of empirical sociology” because the best sociology, as they saw it, was conducted by private companies who have the greatest and most useful data sets which sociologists could not rival nor access. However we might be more confident about the continuing relevance of social sciences… social media provides a lot of born digital data… maybe this should be entitled the “social data deluge”. There is a lot of data available, much of it freely available. Meanwhile lots of policy initiatives to promote open data in government for/by anyone with a legitimate usage for it. Perhaps we can be more confident about the future of academic sociology…

But if you see the purpose this data is put to, it’s a more mixed picture… so we see analysis of social media for stock market prediction. But here correlation is mistaken for causality. Perhaps more interesting are protest movements – like Occupy Wall Street – or use of social media during the Egyptian revolution… It is a tool for political change, a way for citizens to acquire more freedom and change? Is it a movement to organise themselves? Lots of discussion of these contexts. Methodologically it’s a challenge of quantity, and methods that combine social science understanding with social media tools enabling analysis of large scale data…

So back to that rumour from the riots and that rumour of a children’s hospital being attacked in Birmingham. This requires thorough work with the data, but focused where it counts.

So, what sparked this off was someone tweeting that the police were assembling in large numbers outside the hospital… therefore the hospital must be under threat. A reasonable inference.

So, methodologically we undertook computational methods for analysing tweets in an active area of research: sentiment analysis; topic analysis. We combine a relatively simple tool looking at information flows… and then looking at flow from “opinion leaders” to others (e.g. RTs). Once that information flow analysis has been done we can then take those relative sizes to analyse that data, size as proxy for importance… this structure, we argue, is relatively useful for focusing human effort. And then we used coding frames for conventional qualitative methods of content analysis to understand how Twitter was used – to inductively analyse information flow content to develop a “code frame” of topics; use code frame to categorise information flows (e.g. agreement, disagreement, etc.); and then we used visualisation around that analysis of information flows…
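To make the “size as proxy for importance” step concrete, here is a minimal sketch that groups tweets into information flows (an original tweet plus its retweets) and ranks the flows by size, so that human coders can start with the largest. The record layout (`id`, `retweet_of`), the function name and the sample records are assumptions for illustration, and the sketch assumes every retweet points directly at the original tweet; it is not the project’s actual tool.

```python
from collections import Counter

def information_flows(tweets):
    """Group tweets into flows keyed by the original tweet's id.

    Returns (original_id, flow_size) pairs, largest flow first.
    Assumption: `retweet_of` is None for an original tweet, otherwise
    it holds the id of the original tweet being retweeted.
    """
    sizes = Counter()
    for tweet in tweets:
        root = tweet["id"] if tweet.get("retweet_of") is None else tweet["retweet_of"]
        sizes[root] += 1
    return sizes.most_common()

# Hypothetical records, invented for this example only.
sample_tweets = [
    {"id": 1, "retweet_of": None},
    {"id": 2, "retweet_of": 1},
    {"id": 3, "retweet_of": 1},
    {"id": 4, "retweet_of": None},
    {"id": 5, "retweet_of": 4},
    {"id": 6, "retweet_of": None},
]
flows = information_flows(sample_tweets)
print(flows)
```

The ranked list only says where to look first; the coding of each flow as agreement, disagreement and so on remains a job for human readers.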

So here we see that original tweet… you see the rumour mushroom, versions appear… bounding circles reflect information flows… and individuals and their influence… Initially tweets agree/repeat… and we then start to see common sense reasoning: those working or nearby dispute the threat, others point out that the police station is next door to the hospital thus providing alternative understanding. People respond and do not just accept the rumour as true… So rumours do break quickly BUT they are not necessarily more vulnerable as versions and challenges quickly appear to provide alternative likely truth. That process might be more rapid with authoritative sources – media or police in this case – adding their voice. But false information may persist longer, with potential risk to public safety – see follow on Pheme project.

But I wanted to talk about authoritative sources again. The police and media and how they use social media. The question is what were the police doing on Twitter at that time? Well another interesting case here… riots in Manchester led to people creating new accounts to draw attention to public bodies like the police, as an auxiliary service to raise awareness of what was going on. Quite an interesting use of social media where we see something like this arising.

So what these examples demonstrate is innovation as a co-production… lots of people collectively experimenting, trying out things, learning about what social media can and cannot do. So I think it’s a prime example for sociologists. And we see uses are emergent, people learn as they use… and it continues to change and people reinvent their own uses… And we all do this, we have our own uses and agenda shaping our interactions.

So this work led to development of tools for use by social scientists… COSMOS involved James S, Ewan K, etc. from Edinburgh… It would be an error to assume social media can tell us everything that takes place in the world – this data goes with crime data, demographic data, etc. The aim of COSMOS is to forge interdisciplinary working between social and computing scientists. To provide open, sustainable platform for interoperable social media analysis tools. And refine and evolve capabilities, provide service models compatible with needs of diverse user communities.

There are existing tools out there for social media analysis… but many are blackbox systems, it’s hard to understand that process that is taking place. So we want those blackbox processes to be opened up, they are complex but can be understood and explored…

So the Cosmos Tools let you view timelines, to look at rates and flows… to look for selection based on keywords and hashtags… and to view the networks of who is tweeting… and to compare data with demographic data.
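As a rough idea of what a hashtag-selected timeline involves, the sketch below counts tweets containing a given hashtag per hour. It is not COSMOS code: the function name, the record layout (`created_at`, `text`) and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

def hourly_timeline(tweets, hashtag):
    """Count tweets containing `hashtag` (case-insensitive) per hour.

    Returns (hour, count) pairs in chronological order.
    """
    tag = hashtag.lower()
    counts = Counter()
    for tweet in tweets:
        # Collect the hashtags in the text, ignoring trailing punctuation.
        tags = {
            word.lower().rstrip(".,!?")
            for word in tweet["text"].split()
            if word.startswith("#")
        }
        if tag in tags:
            hour = tweet["created_at"].replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return sorted(counts.items())

# Hypothetical records, invented for this example only.
sample = [
    {"created_at": datetime(2011, 8, 8, 20, 5), "text": "example one #Riots"},
    {"created_at": datetime(2011, 8, 8, 20, 40), "text": "example two #riots!"},
    {"created_at": datetime(2011, 8, 8, 21, 15), "text": "example three #riots"},
    {"created_at": datetime(2011, 8, 8, 21, 30), "text": "unrelated #weather"},
]
timeline = hourly_timeline(sample, "#riots")
print(timeline)
```

Keeping a step like this small and readable is one way of avoiding the black-box problem: anyone can see exactly which tweets were counted and why.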

Also some experimental tools around geographical tools for clustering. The way people use Twitter can show geographical patterns. Another factor is about topic modelling, topic clustering… identifying tweets on the same topic. This is where NLP and Ewan and his colleagues in Informatics have become important.

So current research looking at: Social media and civil society – social media as digital agora; “hate” speech and social media – understanding users, networks and information flows – a learning challenge here about people not understanding impact and implications of their comments, perhaps a misunderstanding of social media… ; citizen social science – harnessing volunteer effort; social media and predictions – crime sensing, data integration and statistical modelling; suicide clusters and social media; humanitarianism 2.0 – care for the future; BBC World Service – tweeting the Olympics. And we have a wide range of collaborators and community engagement.

Let me briefly talk about social media as digital agora… may sound implausible… many talk about social media as a force for change… opportunities to promote democracy… not just in less democratic countries, but also democratic countries where processes don’t seem to work as well… So we are looking at social media in communicative, in smaller communities. And also thinking about social resilience in a day to day small scale way… problems which if not managed may become bigger issues. For that we have studied Twitter in several locations, collected data, interviewed participants… and built up a network of communications. What is interesting, for instance, is that non governmental group @c3sc seems to have big impact. We have to see how this all plays out… deserves longitudinal approach…

So, to conclude… let me talk about the lessons for academic sociology… and I think it’s about sociology beyond the academy and the role of wider players. Firstly data journalism – was interested in Stephen’s 1965 press accounts of the blackout earlier. Perhaps nowadays the way journalists are being trained might change that… journalists are increasingly data savvy. We see this through Fact Check, through RealityCheck blog… through sourcing from social media. So is citizen journalism, used to gather evidence of what is happening… tools like Ushahidi… and a sense of empowerment for these communities… reminds me of notion of sousveillance… and the possibility of greater accountability… And Citizen Journalism in the expenses scandal – the Guardian recruited people to look at the expense claims. The journalists couldn’t do that externally… so recruited others.

So, citizen social science… in various ways (see Harris 2012 “Oh man, the crowd is getting an F in social science”). And Ken Benoit’s work discussed earlier… we see more people coming into social science understanding…

So the boundaries of social science research production are becoming more porous, social scientific knowledge production is changing, potentially becoming more open. These developments create an opportunity to reinvigorate the project for a “public sociology” – as per Burawoy (2005) and his call “For a public sociology”. To make sociology accountable to more people, to organisations, to those in power. Ethically we need to ask what is needed and wanted, how the agenda is set, how to deliver more meaningful and useful social sciences to the public.

How can we do that? New modes of scholarly communications, technology, but it’s not enough… we’ve also been working with a company on a  possible programme for the BBC where social media is used to reflect on the week, a knowledge transfer concept. Also knowledge transfer in the Pheme project – for discriminating false and true information… all quite conventional… but we need other pathways to impact… with people as sensors and interpreters of social life, training and capacity building – in ways we have not done before, and something that has emerged in science and citizen science has been the notion of workshops, hackathons, getting people engaged in using mundane technologies for their own research (e.g. Public Lab), we need something similar for tools, social media, to extract data they want for their purposes for their agenda… to create more public sociology that people can do themselves. And we need to also have an open dialogue about research problems.


Q: My question is about COSMOS and the riot rumours stuff… within COSMOS do you have space for formal input around ethics and law… you cut close to making people identifiable and locatable. And related to that… with police in those circles… may arouse suspicions about motives… for instance in Birmingham did police just monitor or did they tweet.

A: They did tweet but not on that rumour. It is an understandable concern that collaborations make powerful state actors more powerful… for us we want these technologies available for anyone to use them… not some exclusive arrangement, should be available to communities, third sector organisations… anyone who feels that social media may be important in their research.

Q: I was more concerned about self-led vigilantes, those who might gang up on others…

A: A responsibility of civil society to be aware of those dangers, to have mechanisms to avoid harm. It does exist already… so if social media becomes instrument of that we have to respond and be aware – partly what hate speech project is about… Bigger learning problem is about conduct in social media space. And the probable issue that people don’t realise how conduct quickly becomes visible to much bigger group of others… and that relates to ethics… Twitter is public domain space but when something is highlighted by others… we have to revisit the ethics issues time and again… for the study for the riots we did the usual clearance process… Like Ken we were told it was fine… but don’t make identifiable but that is nearly impossible in social media. Not an easy thing to resolve.

Q: I’m curious about changes in social media platforms and how that affects us… moves from Facebook to Twitter to Snapchat to Instagram… how does that become apparent, may be invisible, how do we track that…

A: There is a fundamental issue of sustainability of access to data from social media. Not too much of a problem to gather data if you design harvesting appropriately for their rate limits. In terms of other platforms, and people moving to them, and changes in modality and observability and accessibility of data… what social research needs is agreement with providers of data that, under certain conditions of access, that their data is available for research… to make access for legitimate data easy. There are efforts to archive data – Library of Congress collects all tweets. Likely to allow access under license I think, to ensure access to platforms as use of platforms change…

Edinburgh Data Science initiative – Prof Dave Robertson, Head of School of Informatics

Sian Bayne quickly introducing Dave Robertson providing a coda to today’s session.

I’m just briefly going to talk about the Edinburgh Data Science Initiative. The ideas being data as the catalyst for change in multiple academic disciplines and business sectors.

So firstly the business side… big data can be very big and very fast… that can be off-putting in the humanities… And you don’t have to build something big to be part of this… I work in these areas but my models are small… and there is a stack you never see – economic and political side of this stuff.

And here’s the other one… this is about variety and velocity – a chart from IBM – looking at predictions of the volume of data and, more interestingly, the uncertainty of data… And the data sits in a few categories… Enterprise Data, loads of Social Media, and loads of Sensors (internet of things)… but uncertainty over aggregate data is getting hugely large… and that’s not in the sphere of traditional engineering, or traditional business…

The next slide here is about architectures… this is topical… it’s IBM’s Watson system… this is the one that won Jeopardy… harvested loads of information and hypothesis generation… This stack starts with very computational stuff but the top layers look much more like humanities work and concepts…

Now technology and society interact. Often technology pushes on society. For instance if we look at Moore’s Law (memory in your computer doubles every year) mapped against the cost of mapping the human genome. It looks radically different, costs drop hugely in the late 2000s as a lot of effort is pushed in here. And that drop in cost to $1000 per unit… that is socially important… I could sequence my genome… maybe I don’t want to. You can sequence at population scales… machines generate a TB of data a week too – huge data being generated! And this works the other way around… sometimes technology gives you an inflection point and you have to keep up, sometimes society pushes back. A lot of time online is spent on social networks (allegedly 1/7)… now a unified channel for discovery and interaction… And the number of connected devices is zooming up…

So that’s the sort of thing that is pushing a lot of things… A lot of people have spoken to all the schools in the university… everyone reacts… you will find everyone recognising this… and you hear them saying “and it changes the way it makes me think about my research”. That’s so unusual to have such a common response…

Why this is important at Edinburgh… We have many interdisciplinary foundations at Edinburgh… All are relevant, no matter how data intensive, but we are well developed in interdisciplinary working…

And we have a whole data-driven start-up ecosystem in Edinburgh… we have Silicon Walk (miiCard, ZoneFox, etc.), Waverley Gate (Amazon, Microsoft), Appleton Tower (Informatics Ventures, feusd, Disney Research, tigerface), Evo House (FlockEdu, Lucky Frame, etc.), Quartermile (Skyscanner, IBM), Informatics, Techcube (FanDuel, Outplay, CloudSoft, etc.). A huge ecosystem here!

So, I’ll leave it there but input, feedback welcomed, just speak to myself and/or Kevin.

And that was it for the day…

Related resources: