UoE Information Security Awareness Week 2017: Keynotes Session

This afternoon I’m at the Keynote Session for Information Security Awareness Week 2017, where I’ll be speaking about Managing Your Digital Footprint in the context of security. I’ll be liveblogging the other keynotes this afternoon.

The event has begun with a brief introduction from Alistair Fenemore, UoE’s Chief Information Security Officer, and from his colleague David Creighton Offord, the organiser for today’s event.

Talk by John Whitehouse, PWC Cyber Security Director Scotland covering the state of the nation and the changing face of Cyber Threat

I work at PWC, working with different firms who are dealing with information security and cyber security. In my previous life I was at Standard Life. I’ve seen all sorts of security issues so I’m going to talk about some of the things I’ve seen, trends, I’ll explain a few key concepts here.

So, what is cybersecurity… People imagine people in basements with balaclavas… But it’s not that at all…

I have a video here…

(this is a late night comedy segment on the Sony hack where they ask people for their passwords, to tell them if it’s strong enough… And how they construct them… And/or the personal information they use to construct that…)

We do a lot of introductions for boards… We talk about technical stuff… But they laugh at that video and then you point out that these could all be people working in their companies…

So, there is technical stuff here, but some of the security issues are simple.

We see huge growth due to technology, and that speaks to businesses. We are going to see 1 billion connected devices by 2020, and that could go really, really wrong…

There is real concern about cyber security, and they have concerns about areas including cloud computing. The Internet of Things is also a concern – there was a study that found that the average connected device has 25 security vulnerabilities. Dick Cheney had to have his pacemaker reprogrammed because it was vulnerable to hacking via Bluetooth. There was an NHS hospital in England that had to pause a heart surgery when the software restarted. We have hotel rooms accessible via phones – that will come to homes… There are vulnerabilities in connected pet feeders, for instance.

Social media is used widely now… In the TalkTalk breach, news of the breach leaked via speculation just 20 seconds after it occurred – that’s a big challenge to business continuity planning, where you used to assume you’d have perhaps a day’s window.

Big data is coming with regulations, threats… Equifax lost over 140 million records – and executives dumped significant stock before the news went public which brings a different sort of scrutiny.

Morrisons were sued by their employees over data leaked by a disgruntled member of staff – I predict that big data loss could be the new PPI, as mass claims for data loss take place. So maybe £1000 per customer per breach… We run a threat intelligence service that looks on the dark net for breached data. And we already see interest in that type of PPI-style class action approach.

The cyber challenge extends beyond the enterprise – onshore and offshore, 1st through to 4th parties. We’ve done work digging into technology components and where they are from… It’s a nightmare just to know who all your third parties are, let alone a challenge to address the risk.

So, who should you be worried about? Threat actors vary… We have accidental loss, malware that is not targeted, and hacker hobbyists at the lowest level of sophistication, through to state-sponsored attacks at the highest. Sony were allegedly breached by North Korea – that firm spends astronomical amounts on security and it still isn’t totally robust. Target lost 100 million credit card details through a third-party air conditioning firm, which a hacker used to get into the network – that’s how the loss occurred. And when we talk organised crime, we are talking about really organised crime… One Ukrainian organised crime group was offering a Ferrari as their employee-of-the-month prize for malware. We are talking seriously organised, and serious financial gain. And it is extremely hard to trace that money once it’s gone. And we see breaches going on and on and on…

Equifax is a really interesting one. There are 23 class action suits already around that one and that’s the tip of the iceberg. There has been a lot of talk of big organisations going under because of cyber security, and when you see these numbers for different companies, that looks increasingly likely. Major attacks lead to real drops in share prices and real impacts on the economy. And there are tangible and intangible costs of any attack… From investigation and remediation through to CEOs and CTOs losing their jobs or facing prison time – at that level you can be personally liable in the event of an attack.

In terms of the trends… 99% of exploited vulnerabilities (in 2014) had been identified for more than a year, some as far back as 1999. WannaCry was one of these – firms had two months’ notice and the issues still weren’t addressed by many organisations.

When we go in after a breach, typically the breach has been taking place for 200 days already – and those are just the breaches we find. That means the attacker has had access and has been able to explore the system for that long. This is very real, and firms are dealing with it both well and really badly – there’s some real variance.

One example – the most successful bank robbery of all time – was the attack on the Bangladesh Central Bank in Feb 2016 through the SWIFT network. The fraudulent instructions totalled over US $900 million, mostly laundered through casinos in Macau. The analysis identified that the malware was tailored to the target organisation based on the printers they were using, and that it scrubbed all entry and exit points in the bank. The US Secret Service found that there were three groups involved – two inside the bank, one outside executing the attack.

Cyber security concerns are being raised, but how can we address this as organisations? How do we invest in the right ways? What risk is acceptable? One challenge for banks is that they are being asked to use fintechs and SMEs working in technology… But some of these startups are very small, and that’s a real concern for heads of security in banks.

We do a global annual survey on security, across about 10,000 people. We ask about the source of compromise – current employees are the biggest by some distance. And current customer data, as well as IPR, tend to be the data most at risk. We also see Health and Social Care adopting more technology, having high concern, but spending very little to counter the risks. So, with WannaCry, the NHS were not well set up to cope and the press loved the story… But they weren’t the target in any way.

A few Mythbusters for you…

Anti-virus software… We create malware to test our clients’ set-ups. We write malware that evades AV. Only 10-15% of malware will be caught by anti-virus software. There is an open source tool, Veil-Framework, that teaches you how to write that sort of malware so that you can understand the risks. You should be using AV, but you have to be aware that malware goes beyond that (and impacts Macs too)… There is a malware SaaS business model on the darknet – as an attacker you’ll get a guarantee for your malware’s success and support to use it!

Myth 2: we still have time to react. Well, no, the lag from discovery to impacting you and your set up can be minutes.

Myth 3: well, it must have been a zero day that got us! True zero-day exploits are extremely rare and valuable. An attacker won’t use one unless the target is very high value and they have no other option. They are hard to use. Even the NSA admits that persistence is key to successful compromise, not zero-day exploits. The NSA created EternalBlue – a zero-day exploit – and that was stolen and deployed against the “good guys” as WannaCry.

Passwords… They are a thing of the past, I think. 2-factor authentication is more where we are at. Passphrases, and the strength of passphrases, are key. Complex strings with a number and a site name at the end are what’s recommended these days. Changing every 30 days isn’t that useful – it’s easy to brute-force the password if it’s lost – much better to have a really strong hash in the first place.
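
The passphrase point comes down to entropy, which a back-of-envelope calculation makes concrete. This is only a sketch: the pool sizes below (72 printable symbols, a 7776-word diceware-style list) are illustrative assumptions, not figures from the talk.

```python
import math

# Back-of-envelope entropy: bits = length * log2(size of the symbol pool).
def entropy_bits(pool_size, length):
    return length * math.log2(pool_size)

# 8 characters drawn from ~72 printable symbols (upper, lower, digits, punctuation)
complex_password = entropy_bits(72, 8)
# 5 words drawn from a 7776-word diceware-style list
passphrase = entropy_bits(7776, 5)

print(round(complex_password, 1))  # ~49.4 bits
print(round(passphrase, 1))        # ~64.6 bits - length beats "complexity"
```

On these assumptions the five-word passphrase is roughly 15 bits stronger, i.e. tens of thousands of times harder to brute-force, than the eight-character "complex" string.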

Phishing email is huge. We think about 80% of cyber attacks start that way. Beware spoofed addresses, or extremely small changes to email addresses.

We had a client that had an email from their “finance director” about urgently paying money to an account, which was only spotted because someone in finance noticed the phrasing… “the chief exec never says “Thanks”!”

Malware trends: our strong view is that you should never, ever pay up in a ransomware attack.

I have another video here…

(In this video we have people having their “mind read” for some TV show… It was uncanny… And included spending data… But it wasn’t psychic… It was data that they had looked up and discovered online… )

It’s not a nice video… This is absolutely real… This whole digital footprint. We run a service called Digital Footprinting for senior execs in companies, and you have to be careful about it, as they can give so much away through what they and those around them post… It’s only getting worse and more pointed. There are threat groups going for higher value targets, looking for disruption. We think that the Internet of Things will open up the attack surface in whole new ways… And NATS – the air traffic people – are thinking about drones and the issues there around fences and airspace… How do you prepare for this? Take the connected home… These fridges are insecure: you can detect whether the door has been opened, and so whether the owner is at home or not… The nature of threats is changing so much…

In terms of trends, the attacks are moving up the value chain… Retail bank clients aren’t interesting compared to banks’ finance systems, or to exchanges or clearing houses. It’s about the value of data… Data is maybe $0.50 for email credentials; a driving licence is maybe $25… and upwards the price goes depending on value to the attackers…

So, a checklist for you and your work: (missed this but delighted that digital footprint was item 1)

Finally, go have a look at your phone and how much data is being captured about you… Check your iPhone’s frequent locations. And on Android check Google Location History. Two of the biggest companies in the world, Google and Facebook, are free, and they are free because of all the data they have about you… But the terms of service… PayPal’s are longer than Hamlet. If you have a voice-controlled TV from Samsung and you accept those terms, you agree to always-on listening, shareable with third parties…

So, that’s me… Hopefully that gave you something to ponder!


Q1) What does PWC think about Deloitte’s recent attack?

A1) Every firm faces these threats, and we are attacked all the time… We get everything thrown at us… And we try to control those but we are all at risk…

Q2) What’s your opinion on cyber security insurance?

A2) I think there is a massive misunderstanding in the market about what it is… Some policies just cover recovery, getting a response firm in… When you look at Equifax, what would that cover… That will put insurers out of business. I think we’ll see government backed insurance for things like that, with clarity about what is included, and what is out of scope. So, if, say, SQL Injection is the cause, that’s probably negligence and out of scope…

Q3) What role should government have in protecting private industry?

A3) The national cyber security centre is making some excellent progress on this. Backing for that is pretty positive. All of my clients are engaging and engaged with them. It has to be at that level. It’s too difficult now at lower levels… We do work with GCHQ sharing information on upcoming threats… Some of those are state sponsored… They even follow working hours in their source location… Essentially there are attack firms…

Q4) (I’m afraid I missed this question)

A4) I think Microsoft in the last year have transformed their view… My honest view is that clients should be on Windows 10 – it’s a game-changer for security. Firms will do analysis on patches and service impacts… But they delayed that a bit too long. I have worked at a firm with a massively complex infrastructure; it sounds easy to patch, but it can be quite difficult in practice, and it can put big operational systems at risk. At a multinational bank, for instance, you might be rolling out to huge numbers of machines and applications.

Talk by Kami Vaniea (University of Edinburgh) covering common misconceptions around Information Security and how to avoid them

My research is on the usability of security and why some failings are happening from the point of view of an average citizen. I do talks to community groups – so this presentation is a mixture of that sort of content and proper security discussion.

I wanted to start with misconceptions we have as system administrators… So I have a graph here showing the range in which there is value in improving your password; the range in which rate limits on password attempts protect you; and the small area of actual benefit to the user. Outside those ranges you are in the dead zone.

OK, a quick question about URL construction… http://facebook.mobile.com? Is it Facebook’s website, Facebook’s mobile site, AT&T’s website, or mobile.com’s website? It’s the last one, by construction. It’s both of the last two if you know AT&T own mobile.com. When you ask a big audience they mainly get it right. But only 8% can correctly differentiate http://facebook.profile.com vs http://profile.facebook.com. Many users tend to just pick a big company name regardless of its location in the URL. Only a few know how to correctly read subdomains in URLs. We did this study on Amazon Mechanical Turk – and that’s a skewed sample of more technical people. That misunderstanding of URLs has huge problematic implications for phishing email.

We also tried http://twitter.com/facebook.com. Most people could tell that was Twitter (not Facebook). But if I used “@” instead of “/” people didn’t understand, thought it was an email…
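
Both quiz answers can be checked mechanically with Python's standard `urllib.parse`. This is a sketch to illustrate the parsing rules; `evil.example` is an invented placeholder domain.

```python
from urllib.parse import urlsplit

# Ownership of a hostname reads right to left: the registered domain is
# the last two labels, so facebook.mobile.com belongs to mobile.com.
print(urlsplit("http://facebook.mobile.com/").hostname)   # facebook.mobile.com
print(urlsplit("http://profile.facebook.com/").hostname)  # profile.facebook.com

# A path component never changes which site you are talking to...
print(urlsplit("http://twitter.com/facebook.com").hostname)  # twitter.com

# ...but an "@" does: everything before it is treated as a username,
# and the real host is whatever follows it.
print(urlsplit("http://twitter.com@evil.example/").hostname)  # evil.example
```

That last case is exactly why the "@" version of the quiz confuses people: the familiar name appears first, but the browser connects to the host after the "@".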

On the topic of email… Can we trust the “from” field? No. Can we trust a “this email has been checked for viruses…” box? No. Can you trust the information on the source URL for a link in the email, that is shown in the bottom of the browser? Yes.

What about this email – a Security alert for your linked Google account email? Well this is legitimate… Because it’s coming from accounts.google.com. But you knew this was a trick question… Phishing is really tricky…

So, a shocking percentage of my students think that “from” address is legitimate… Tell your less informed friends how easily that can be spoofed…
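
To see just how easy that spoofing is, note that the "From" header is simply text the sender writes: nothing in the message format verifies it. A minimal sketch with Python's standard library (all addresses here are invented examples; authenticating senders is the job of SPF/DKIM/DMARC on the receiving side, which is applied inconsistently in practice):

```python
from email.message import EmailMessage

# Build a message claiming to be from anyone at all - the "From" header
# is just a string chosen by whoever composes the message.
msg = EmailMessage()
msg["From"] = "Chief Executive <ceo@example-corp.com>"  # invented address
msg["To"] = "finance@example-corp.com"
msg["Subject"] = "Urgent payment"
msg.set_content("Please pay the attached invoice today.")

print(msg["From"])  # Chief Executive <ceo@example-corp.com>
```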

What about Google? Does Google know what you type as you type it, before you hit enter? Yes, it does… Most search engines send text to their servers as you write it. Which means you can do fun studies on what people commonly DON’T post to Facebook!

A very common misconception is that opening web pages, emails, pdfs, and docs is like reading physical paper… So why do they need patching?

Let’s look at an email example… I don’t typically get emails with “To protect your privacy, Thunderbird has blocked remote content in this message” from a student… This showed me that a 1 pixel invisible image had come with the email… which pinged the server if I opened it. I returned the email and said he had a virus. He said “no, I used to work in marketing and forgot that I had that plugin set up”.
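
A minimal sketch of how such a tracking pixel works, using only the Python standard library. The `/open?id=alice` token, host, and port are invented for illustration: the sender embeds `<img src="http://tracker.example/open?id=alice">` in the email, and the mail client fetching that invisible image tells the server the message was opened, by whom, and when.

```python
import http.server
import threading
import urllib.request

# The smallest valid payload a tracking pixel needs: a 1x1 transparent GIF.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

opens = []  # each request recorded here is one "email was opened" event

class PixelHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        opens.append(self.path)  # the path carries a per-recipient token
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), PixelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Simulate a mail client rendering the embedded <img> tag:
data = urllib.request.urlopen(f"http://127.0.0.1:{port}/open?id=alice").read()
server.shutdown()

print(opens)  # the sender now knows the message was opened, and by which token
```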

Websites are made of many elements from many sources, mainly loaded dynamically… And there are loads of trackers across those sites. There is a tool called Lightbeam that will help you track the sites you go to on purpose, and all the other sites that track you. That’s obviously a privacy issue. But it is also a security problem. The previous speaker spoke about supply chains at Target; this is the web version of that… The supply chain gets huge when you visit, say, six websites.

So, a quiz question… I go to Yahoo, I hit reload… Am I running the same code as a moment ago? Well, it’s complicated… I had a student run a study on this, looking at how much changes… In a week, about half of the top 200 sites had changed their JavaScript. I see trackers change between individual reloads… It might change, it might not…
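
One simple way to check whether you are running the same script as before is to compare content digests, which is roughly what a study like that has to do at scale. A sketch (the two script snapshots are invented placeholders):

```python
import hashlib

# Two snapshots of the "same" third-party script, fetched a week apart.
snapshot_monday = b"function track(){ send('v1'); }"
snapshot_friday = b"function track(){ send('v2'); }"

def fingerprint(script_bytes):
    # A stable content digest: if it changes, you are running different code.
    return hashlib.sha256(script_bytes).hexdigest()

print(fingerprint(snapshot_monday) == fingerprint(snapshot_friday))  # False
```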

So, as a user you access a first party website, which then accesses third party sites… Those access ad servers, which auction off that user; an ad is returned, with an image (sometimes with code). Maybe the bid goes to a company, which bids out again… This is huge as a supply chain and tracking issue…

So the Washington Post, for instance, covering the yahoo.com malware attack, showed that malicious payloads were being delivered to around 300k users per hour, but only about 9% (27k) of users per hour were affected – they were the ones that hadn’t updated their systems. How did that attack take place? Rather than attacking Yahoo directly, they just bought an ad and ran malware code in it.

There is a tool called Ghostery… It’s brilliant and useful… But it’s run by the ad industry and all the trackers are set the wrong way by default. Untick them all and then it’s fascinating… It tells you about page load and all the components involved in loading a page…

To change topic…

Cookies! Yes, they can be used to track you across websites. But they can’t give you malware as such. So… I will be tackling the misconception that cookies are evil… And I’m going to try to convince you otherwise. Tracking can be evil… But cookies are kind of an early example of privacy by design…

It is 1994. The internet cannot remember anyone between page loads. You have an interaction with a web server that has absolutely no memory. Cookies let something be remembered between page loads and web pages… Somehow a server has to know who you are… But back in 1994 you just open a page and look at it – that’s the whole interaction…

But companies wanted shopping baskets, and memory between two page reloads. There is an obvious technical solution… You just give every browser a unique identifier… Great! The server remembers you. But the problem is a privacy issue across different servers… So, Netscape implemented cookies – small text strings the server could ask the browser to remember and give back to it later…
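
Python's standard `http.cookies` module can illustrate that exchange: the server sends a `Set-Cookie` header, and the browser echoes the value back on later requests to the same site, giving the memoryless server a way to recognise the visitor. The `basket`/`abc123` values below are invented examples.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to remember a basket ID for this site only.
response_cookie = SimpleCookie()
response_cookie["basket"] = "abc123"
response_cookie["basket"]["path"] = "/"
set_cookie_header = response_cookie["basket"].OutputString()
print(set_cookie_header)  # basket=abc123; Path=/

# Browser side: on every later request to the SAME site, the stored string
# is parsed and sent back, so the server can link the requests together.
request_cookie = SimpleCookie()
request_cookie.load(set_cookie_header)
print(request_cookie["basket"].value)  # abc123
```

Note that the whole mechanism is a visible text string the browser controls, which is exactly the "privacy by design" property discussed next.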

Cookies have some awesome properties: they are client-visible; third party tracking is client-visible too; there is an opt-out (delete) option on a per-site basis; a cookie is only readable by the site that set it; and all of this allows for public discussion of tracking…

… Which is why Android/iOS both went with the unique ID option. And that’s how you can be tracked. As a design decision it’s very different…

Now to some of the research I work on… I believe in getting people to touch stuff, to interact with it… We can talk to each other, or mystify, but we need to actually have people understand this stuff. So we ran an outreach activity to build a website, create a cookie, and then read the cookie out… Then I give a second website… To let people try to understand how to change their names on one site, not the other… What happens when you view them in Incognito mode… And then exploring cookies across sites. And how that works…

Misconception: VPNs solve all privacy and security problems. Back at Indiana I taught students who couldn’t code… And that was interesting… They saw VPNs as magic fairy dust. And they had absorbed this idea that anyone can be hacked at any time… They got that… But that had resulted in “but what’s the point”. That worries me… In the general population we see media coverage of attacks on major companies… And the narrative that attacks are inevitable… So you end up with this problem…

So, I want to talk about encryption, why it’s broken, and what that means for VPNs. I’m not an encryption specialist; I care about how it works for the user.

In encryption we want (1) the communication between you and the other party to be confidential and unchanged – no one can read what you sent and no one can change it; and (2) to know who we are talking to. And that second part is where things can be messed up. You can make what you think is a secure connection to the right person, but it could be a secure connection to the wrong person – a man-in-the-middle attack. A real world example… You go to a coffee shop and use the wifi to request the BBC news site, but you get a wifi login page instead. That’s essentially a man-in-the-middle attack. It’s not necessarily harmful – it’s normal operating procedure… And VPNs basically work like this too…

So, an example of what really happened to a student… I set an exercise that just had them creating a very simple cookie page… I was expecting something simple… But one of them submitted a page with a bit of JavaScript in it – code that had been injected by their VPN. The student had logged in to AnchorFree – magic fairy dust – which injects ads into the pages it serves, and that injected code is what I saw when they submitted the page in Blackboard Learn…

VPNs are not magic fairy dust. The University runs an excellent VPN – far better for coffee shops etc!

So, I like to end with some common advice:

  • Install an anti-virus scanner. Don’t turn off the AV software automatically installed with Windows 8+… I ran a study where 50% of PhD students had switched off that software and their firewalls…
  • Keep your software updated – best way to stay safe
  • Select a strong passcode for important things you use all the time
  • For less important things that you use rarely, use a password manager… Best to have different passwords for each…
  • Software I use:
    • Ad blockers – they don’t just block ads, they reduce lots of extra content loading. The more websites you visit, the more vulnerable you are
    • Ghostery and Privacy Badger
    • Lightbeam
    • Password managers (LastPass, 1Password and KeePass are the most recommended)
    • 2-factor authentication, like a YubiKey – extra protection for e.g. Facebook.
    • If you are really serious: uMatrix and NoScript – BUT they will break lots of pages…


Q1) It’s hard to get an average citizen to do everything… How do you get around that and just get the key stuff across?

A1) Probably it’s that common advice. The security community has gotten better at identifying the top 10 key things. Google did a study with blackhats at an infosec conference about what they would do, and asked on Amazon Mechanical Turk what people would recommend to friends. About the only common answer amongst the blackhats was “update your software”. But actually there is overlap… People know they should change passwords, and should use AV software… Though AV software didn’t show up on the blackhat list – 2-factor and password managers did…

Q2) What do you think about passwords… long or complex or?

A2) We did a study maybe 8 years ago on mnemonic passwords… And found that “My name is Inigo Montoya, you killed my father, prepare to die” was by far the most common. The issue isn’t length… It’s entropy. I think we need to think server side about how many other users have used the same password (based on the encrypted version), and you need something that fewer than 3 people use…
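
That server-side idea can be sketched in a few lines. This is a toy illustration only: a real system would use a slow, salted KDF such as bcrypt or Argon2 rather than bare SHA-256, and the threshold and `site-wide-secret` pepper value here are invented.

```python
import hashlib
from collections import Counter

# Toy sketch: count how many accounts share each password (by comparing
# hashes) and reject any password that is already too popular.
MAX_USERS_PER_PASSWORD = 3
counts = Counter()

def accept_password(password, pepper=b"site-wide-secret"):
    digest = hashlib.sha256(pepper + password.encode()).hexdigest()
    if counts[digest] >= MAX_USERS_PER_PASSWORD:
        return False  # too many users already chose this one: reject it
    counts[digest] += 1
    return True

# Four different users all try the same weak password:
results = [accept_password("password123") for _ in range(4)]
print(results)  # [True, True, True, False]
```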

Q2) So more about inability to remember it…

A2) And it depends on the threat type… If someone knows you, your dog, etc., then it’s easier… If I can keep a password for a long time I might invest in it – but if you force people to change passwords, they have to remember the new one. There was a study showing that people who use passwords a lot use affirmations, such as “I love God”… And again, it’s hard to know how you protect against that.

Q3) What about magic semantic email links instead of passwords…

A3) There is some lovely work on just how much data is in your email… That’s a poor man’s version of the OAuth idea of getting an identity provider to authenticate the user. It’s good for the user, but then that’s one bigger-stakes login… And we see SMS also being a mixed bag and subject to attack… Ask a user, though… “there’s nothing important in my email”.

Q4) How do you deal with people saying “I don’t have anything to hide”?

A4) Well, I start with it not being about hiding… It’s more: why do you want to know? When I went to buy a car I didn’t dress like a professor, I dressed down… I wanted a good price… If I have a lot of time I will refer them to Daniel Solove’s Nothing to Hide.

Talk by Nicola Osborne (EDINA) covering Digital Footprints and how you can take control of your online self

And that will be me… So keep an eye out for tweets from others on the event hashtag: #UoEInfoSec.


Reflecting on my Summer Blockbusters and Forthcoming Attractions (including #codi17)

As we reach the end of the academic year, and I begin gearing up for the delightful chaos of the Edinburgh Fringe and my show, Is Your Online Reputation Hurting You?, I thought this would be a good time to look back on a busy recent few months of talks and projects (inspired partly by Lorna Campbell’s post along the same lines!).

This year the Managing Your Digital Footprint work has been continuing apace…

We began the year with funding from the Principal’s Teaching Award Scheme for a new project, led by Prof. Sian Bayne: “A Live Pulse”: Yik Yak for Teaching, Learning and Research at Edinburgh. Sian, Louise Connelly (PI for the original Digital Footprint research), and I have been working with the School of Informatics and a small team of fantastic undergraduate student research associates to look at Yik Yak and anonymity online. Yik Yak closed down this spring, which has made this even more interesting as a cutting edge research project. You can find out more on the project blog – including my recent post on addressing ethics of research in anonymous social media spaces; student RA Lilinaz’s excellent post giving her take on the project; and Sian’s fantastic keynote from #CALRG2017, giving an overview of the challenges and emerging findings from this work. Expect more presentations and publications to follow over the coming months.

Over the last year or so Louise Connelly and I have been busy developing a Digital Footprint MOOC, building on our previous research, training and best practice work to share this with the world. We designed a three-week MOOC (Massive Open Online Course) that runs on a rolling basis on Coursera – a new session kicks off every month. The course launched this April and we were delighted to see it get some fantastic participant feedback and some great press coverage (including a really positive experience of being interviewed by The Sun).

The MOOC has been going well and building interest in the consultancy and training work around our Digital Footprint research. Last year I received ISG Innovation Fund support to pilot this service and the last few months have included great opportunities to share research-informed expertise and best practices through commissioned and invited presentations and sessions including those for Abertay University, University of Stirling/Peer Review Project Academic Publishing Routes to Success event, Edinburgh Napier University, Asthma UK’s Patient Involvement Fair, CILIPS Annual Conference, CIGS Web 2.0 & Metadata seminar, and ReCon 2017. You can find more details of all of these, and other presentations and workshops on the Presentations & Publications page.

In June an unexpected short notice invitation came my way to do a mini version of my Digital Footprint Cabaret of Dangerous Ideas show as part of the Edinburgh International Film Festival. I’ve always attended EIFF films, and also spent years reviewing films there, so it was lovely to perform as part of the official programme, working with our brilliant CODI compère Susan Morrison and my fellow mini-CODI performer, mental health specialist Professor Steven Lawrie. We had a really engaged audience with loads of questions – an excellent way to try out ideas ahead of this August’s show.

Also in June, Louise and I were absolutely delighted to find out that our article (in Vol. 11, No. 1, October 2015) for ALISS Quarterly, the journal of the Association of Librarians and Information Professionals in the Social Sciences, had been awarded Best Article of the Year. Huge thanks to the lovely folks at ALISS – this was wonderful recognition for our article, which you can read in full in the ALISS Quarterly archive.

In July I attended the European Conference on Social Media (#ecsm17) in Vilnius, Lithuania. In addition to co-chairing the Education Mini Track with the lovely Stefania Manca (Italian National Research Council), I was also there to present Louise’s and my Digital Footprint paper, “Exploring Risk, Privacy and the Impact of Social Media Usage with Undergraduates“, and to present a case study of the EDINA Digital Footprint consultancy and training service for the Social Media in Practice Excellence Awards 2017. I am delighted to say that our service was awarded 2nd place in those awards!


My Social Media in Practice Excellence Award 2017 2nd place certificate (still awaiting a frame).

You can read more about the awards – and my fab fellow finalists Adam and Lisa – in this EDINA news piece.

On my way back from Lithuania I had another exciting stop to make at the Palace of Westminster. The lovely folk at the Parliamentary Digital Service invited me to give a talk, “If I Googled you, what would I find? Managing your digital footprint” for their Cyber Security Week which is open to members, peers, and parliamentary staff. I’ll have a longer post on that presentation coming very soon here. For now I’d like to thank Salim and the PDS team for the invitation and an excellent experience.


The digital flyer for my CODI 2017 show (click to view a larger version) – huge thanks to the CODI interns for creating this.

The final big Digital Footprint project of the year is my forthcoming Edinburgh Fringe show, Is Your Online Reputation Hurting You? (book tickets here!). This year the Cabaret of Dangerous Ideas has a new venue – the New Town Theatre – and two strands of events: afternoon shows; and “Cabaret of Dangerous Ideas by Candlelight”. It’s a fantastic programme across the Fringe and I’m delighted to be part of the latter strand with a thrilling but challengingly competitive Friday night slot during peak fringe! However, that evening slot also means we can address some edgier questions so I will be talking about how an online reputation can contribute to fun, scary, weird, interesting experiences, risks, and opportunities – and what you can do about it.


Help spread the word about my CODI show by tweeting with #codi17 or sharing the associated Facebook event.

To promote the show I will be doing a live Q&A on YouTube on Saturday 5th August 2017, 10am. Please do add your questions via Twitter (#codi17digifoot) or via this anonymous survey and/or tune in on Saturday (the video below will be available on the day and after the event).

So, that’s been the Digital Footprint work this spring/summer… What else is there to share?

Well, throughout this year I’ve been working on a number of EDINA’s ISG Innovation Fund projects…

The Reference Rot in Theses: a HiberActive Pilot project has been looking at how to develop the fantastic prior work undertaken during the Andrew W. Mellon-funded Hiberlink project (a collaboration between EDINA, Los Alamos National Laboratory, and the University of Edinburgh School of Informatics), which investigated “reference rot” (where URLs cease to work) and “content drift” (where URLs work but the content changes over time) in scientific scholarly publishing.

For our follow up work the focus has shifted to web citations – websites, reports, etc. – something which has become a far more visible challenge for many web users since January. I’ve been managing this project, working with developer, design and user experience colleagues to develop a practical solution around the needs of PhD students, shaped by advice from Library and University Collections colleagues.

If you are familiar with the Memento standard, and/or follow Herbert Van de Sompel and Martin Klein’s work, you’ll be well aware of how widespread the challenge of web citations changing over time can be, and the seriousness of the implications. The Internet Archive might be preserving all the (non-R-rated) gifs from Geocities but, without preserving government reports, ephemeral content, social media etc., we would be missing a great deal of the cultural record and, in terms of where our project comes in, crucial resources and artefacts in many modern scholarly works. If you are new to the issue of web archiving I would recommend a browse of my notes from the IIPC Web Archiving Week 2017 and papers from the co-located RESAW 2017 conference.

A huge part of the HiberActive project has been working with five postgraduate student interns to undertake interviews and usability work with PhD students across the University. My personal and huge thanks to Clarissa, Juliet, Irene, Luke and Shiva!


A preview of the HiberActive gif featuring Library Cat.

You can see the results of this work at our demo site, http://hiberactive.edina.ac.uk/, and we would love your feedback on what we’ve done. You’ll find an introductory page on the project as well as three tools for archiving websites and obtaining the appropriate information to cite – hence adopting the name one of our interviewees suggested, Site2Cite. We are particularly excited to have a tool which enables you to upload a Word or PDF document, have all URLs detected, and which then returns a list of URLs and the archived citable versions (as a csv file).
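For the curious, that upload-and-detect workflow can be sketched roughly as follows. To be clear, this is not the Site2Cite code itself – just an illustrative Python sketch with my own function names: it pulls URLs out of a document’s extracted text with a regular expression, asks the Internet Archive’s Wayback Machine “availability” API (a real, public endpoint) for an archived snapshot of each, and writes the pairs out as a CSV.

```python
import csv
import json
import re
import urllib.parse
import urllib.request

# Rough URL matcher - good enough for a sketch, not a full RFC 3986 parser.
URL_PATTERN = re.compile(r"""https?://[^\s)\]>"']+""")

def extract_urls(text):
    """Return the unique URLs found in a document's extracted text, in order."""
    seen, urls = set(), []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # strip trailing prose punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

def archived_snapshot(url):
    """Ask the Wayback Machine availability API for a citable archived copy."""
    query = ("https://archive.org/wayback/available?url="
             + urllib.parse.quote(url, safe=""))
    with urllib.request.urlopen(query) as response:
        data = json.load(response)
    closest = data.get("archived_snapshots", {}).get("closest", {})
    return closest.get("url")  # None if nothing has been archived

def write_citation_csv(rows, path):
    """Write (original URL, archived URL) pairs to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "archived_url"])
        writer.writerows(rows)
```

A real service would also need to extract text from .docx/.pdf files in the first place, and trigger archiving for URLs that have no snapshot yet.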

Now that the project is complete, we are looking at what the next steps may be, so if you’d find these tools useful for your own publications or teaching materials, we’d love to hear from you. I’ll also be presenting this work at Repository Fringe 2017 later this week so, if you are there, I’ll see you in the 10×10 session on Thursday!

To bring the HiberActive project to life our students suggested something fun, and my colleague Jackie created a fun and informative gif featuring Library Cat, Edinburgh’s world famous sociable on-campus feline. Library Cat has also popped up in another EDINA ISG Innovation-funded project, Pixel This, which my colleagues James Reid and Tom Armitage have been working on. This project has been exploring how Pixel Sticks could be used around the University. To try them out properly I joined the team for a fun photography night in George Square with a Pixel Stick loaded with images of notable University of Edinburgh figures. One of my photos from that night, featuring the ghostly image of the much missed Library Cat (1.0), went a wee bit viral over on Facebook:

James Reid and I have also been experimenting with Tango-capable phone handsets in the (admittedly daftly named) Strictly Come Tango project. Tango creates impressive 3D scans of rooms and objects and we have been keen to find out what one might do with that data, how it could be used in buildings and georeferenced spaces. This was a small exploratory project but you can see a wee video on what we’ve been up to here.

In addition to these projects I’ve also been busy with continuing involvement in the Edinburgh Cityscope project, which I sit on the steering group for. Cityscope provided one of our busiest events for this spring’s excellent Data Fest – read more about EDINA’s participation in this exciting new event around big data, data analytics and data driven innovation here.

I have also been working on two rather awesome Edinburgh-centric projects. Curious Edinburgh officially launched for Android, and released an updated iOS app, for this year’s Edinburgh International Science Festival in April. The app includes History of Science, Medicine, Geosciences and Physics tours, plus a brand new Biotechnology tour that lets you explore Edinburgh’s fantastic scientific legacy. The current PTAS-funded project is led by Dr Niki Vermeulen (Science, Technology & Innovation Studies), with tours written by Dr Bill Jenkins, and will see the app used in teaching around 600 undergraduate students this autumn. If you are curious about the app (pun entirely intended!), visiting Edinburgh – or just want to take a long distance virtual tour – do download the app, rate and review it, and let us know what you think!


A preview of the new Curious Edinburgh History of Biotechnology and Genetics Tour.

The other Edinburgh project which has been progressing at a pace this year is LitLong: Word on the Street, an AHRC-funded project which builds on the prior LitLong project to develop new ways to engage with Edinburgh’s rich literary heritage. Edinburgh was the first city in the world to be awarded UNESCO City of Literature status (in 2004) and there are huge resources to draw upon. Prof. James Loxley (English Literature) is leading this project, which will be showcased in some fun and interesting ways at the Edinburgh International Book Festival this August. Keep an eye on litlong.org for updates or follow @litlong.

And finally… Regular readers here will be aware that I’m Convener for eLearning@ed (though my term is up and I’ll be passing the role onto a successor later this year – nominations welcomed!), a community of learning technologists and academic and support staff working with technologies in teaching and learning contexts. We held our big annual conference, eLearning@ed 2017: Playful Learning this June and I was invited to write about it on the ALTC Blog. You can explore a preview and click through to my full article below.

Playful Learning: the eLearning@ed Conference 2017

Phew! So, it has been a rather busy few months for me, which is why you may have seen slightly fewer blog posts and tweets from me of late…

In terms of the months ahead there are some exciting things brewing… But I’d also love to hear any ideas you may have for possible collaborations as my EDINA colleagues and I are always interested to work on new projects, develop joint proposals, and work in new innovative areas. Do get in touch!

And in the meantime, remember to book those tickets for my CODI 2017 show if you can make it along on 11th August!


European Conference on Social Media (#ecsm17) – Day Two Liveblog

Today I am at the Mykolo Romerio Universitetas in Vilnius, Lithuania, for the European Conference on Social Media 2017. As usual this is a liveblog so additions, corrections etc. all welcome… 

Keynote presentation: Daiva Lialytė, Integrity PR, Lithuania: Practical point of view: push or pull strategy works on social media 

I attended your presentations yesterday, and you are going so far into detail in social media. I am a practitioner and we can’t go into that same sort of depth because things are changing so fast. I have to confess that a colleague, a few years ago, suggested using social media and I thought “Oh, it’s all just cats” and I wasn’t sure. But it was a big success, we have six people working in this area now. And I’m now addicted to social media. In fact, how many times do you check your phone per day? (various guesses)…

Well, we are checking our smartphones 100-150 times per day. And some people would rather give up sex than smartphones! And we have this constant flood of updates and information – notifications that pop up all over the place… And there are a lot of people, organisations, brands, NGOs, etc. all want our attention on social media.

So, today, I want to introduce three main ideas here as a practitioner and marketer…

#1 Right Mindset

Brands want to control everything, absolutely everything… The colour, the font, the images, etc. But now social media says that you have to share your brand in other spaces, to lose some control. And I want to draw on Paul Holmes, a PR expert (see www.holmesreport.com) and he says when he fell in love with social media, there were four key aspects:

  • Brands (in)dependency
  • Possibilities of (non)control
  • Dialogue vs monologue
  • Dynamic 24×7

And I am going to give some examples here. So Gap, the US fashion brand, they looked at updating their brand. They spent a great deal of money to do this – not just the logo but all the paperwork, branded items, etc. They launched it, it went to the media… And it was a disaster. The Gap thought for a few days. They said “Thank you brand lover, we appreciate that you love our brand and we are going to stick with the old one”. And this raises the question of to whom a brand belongs… Shareholders or customers? Perhaps now we must think about customers as owning the brand.

Yesterday I saw a presentation from Syracuse on University traditions – and some of the restrictions of maintaining brand – but in social media that isn’t always possible. So, another example… Lagerhaus (like a smaller scale Ikea). They were launching a new online store, and wanted to build community (see videos), so they targeted six interior design blogs and created “pop up online stores” – bloggers could select products from the store’s selection, and promote them as they liked. That gained media attention, and gained Facebook likes for the store’s Facebook page. And there was then an online store launch, with invitees approached by bloggers, and their pop up stores continue. So this is a great example of giving control to others, and building authentic interest in your brand.

In terms of dialogue vs monologue I’d quote from Michael Dell here, on the importance of engaging in honest, direct conversations with customers and stakeholders. This is all great… But the reality is that of those who talk about this, many are never ever doing it… Indeed some just shut down spaces when they can’t engage properly. However, Dell has set up a social media listening and command centre. 22k+ posts are monitored daily, engaging 1000+ customers per week. This is tightly integrated with the @dellcares Twitter/Facebook team. And they have managed to convert “ranters” to “ravers” in 30% of cases, and seen a decrease in negative commentary since engaging in this space. Posts need quick responses: a few minutes, or hours, are great; any longer and it becomes less and less useful…

Similarly we’ve seen Scandinavian countries and banks engaging, even when they have been afraid of negative comments. And this is part of the thing about being part of social media – the ability to engage in dialogue, to be part of and react to the conversations.

Social media is really dynamic, 24×7. You have to move fast to take advantage. So, Lidl… They heard about a scandal in Lithuania about the army paying a fortune for spoons – some were €40 each. So Lidl ran a promotion around being able to get everything, including spoons, cheaper there. It was funny, clever, creative and worked well.

Similarly Starbucks vowing to hire 10,000 refugees in the US (and now in EU) following Trump’s travel ban, that was also being dynamic, responding quickly.

#2 Bold Actions

When we first started doing social media… we faced challenges… Because the future is uncertain… So I want to talk about several social media apps here…

Google+ launched claiming to be bigger than Facebook, to do it all better. Meanwhile WhatsApp… Did great… But disappearing as a brand, at least in Lithuania. SnapChat has posts disappearing quickly… Young people love it. The owner has said that it won’t be sold to Facebook. Meanwhile Facebook is trying desperately to copy functionality. We have clients using SnapChat, fun but challenging to do well… Instagram has been a big success story… And it is starting to be bigger than Facebook in some demographics.

A little history here… If you look at a world map of social networks from December 2009, we see quite a lot of countries having their own social networks which are much more popular. By 2013, it’s much more Facebook, but there are still some national social media networks in Lithuania or Latvia. And then by 2017 we see in Africa uptake of Twitter and Instagram. Still a lot of Facebook. My point here is that things move really quickly. For instance young people love SnapChat, so we professionally need to be there too. You can learn new spaces quickly… But it doesn’t matter as you don’t have to retain that for long, everything changes fast. For instance in the US I have read that Facebook is banning posts by celebrities where they promote items… That is good, that means they are not sharing other content…


I want to go in depth on Facebook and Twitter. Of course the most eminent social media platform is Facebook. They are too big to be ignored. 2 billion monthly active Facebook users (June 2017). 1.28 billion people log onto Facebook daily. 83 million fake profiles. Those aged 25 to 34, at 29.7% of users, are the biggest age group. For many people Facebook is the first thing they check in the morning when they wake up. And 42% of marketers report that Facebook is very important to their business. And we now have brands approaching us to set up a Facebook presence no matter what their area of work.

What Facebook does well is the most precise targeting – the more precise the more you pay, but that’s ok. That’s based on geolocation, demographic characteristics, social status, interests, even real time location. That works well, but remember that there are 83 million fake profiles too.

So that’s push, what about pull? Well there are the posts, clicks, etc. And there is Canvas – which works for mobile users: story driven ads (mini landing pages), creative stories, which generate better results and click through rates. (We are watching a Nespresso mobile canvas demo.) Another key tool is Livestream – free of charge, with notifications for your followers, and live discussion. But you need to be well prepared and tell a compelling story to make proper use of this. And you can do it from anywhere in the world. For instance I once saw a livestream of Barack Obama’s farewell – that only had 15k viewers though, so it’s free but you have to work to get engagement.

No matter which tool, “content is the king!” (Bill Gates, 1996). Clients want us to create good stories here but it is hard to do… So what makes the difference? The Content Marketing Institute (US), 2015 suggest:

  1. Content
  2. Photos
  3. Newsletters
  4. Video
  5. Article
  6. Blogs
  7. Events
  8. Infographics
  9. Mobile applications
  10. Conferences and Livestreams

So, I will give some examples here… I’ll show you the recent winner of Cannes Lions 2017 for social media and digital category. This is “Project Graham” – a public driver safety campaign about how humans are not designed to survive a crash… Here is how we’d look if we were – this was promoted heavily in social media.

Facebook also helps with push – the algorithms prioritise content that does well. And auctions to reach your audience mean that it is cheaper to run good content that really works for your audience.

And LinkedIn meanwhile is having a renaissance. It was quite dull, but they changed their interface significantly a few months back, and now we see influencers (in Lithuania) using LinkedIn, sharing content there. For instance lawyers have adopted the space. Some were predicting LinkedIn would die, but I am not so sure… It is the biggest professional social network – 467 million users in 200 countries. And it is the biggest network of professionals – a third have a LinkedIn profile. Users spend 17 minutes per day on it, 40% use it every day, and 28% of all internet users use LinkedIn. And it really functions as a public CV, for recruitment, and for ambassadorship – you can share richer information here.

I wanted to give a recent example – it is not a sexy looking case study – but it worked very well. This was work with Ruptela, a high tech company that provides fleet management based on GPS tracking and real-time vehicle monitoring and control. They needed to rapidly hire 15 new sales representatives via social media. That’s a challenge as young people, especially in the IT sector, are leaving Lithuania or working in Lithuania-based expertise centres for UK, Danish, etc. brands.

So we ran a campaign, on a tiny budget (incomparable with headhunters for instance), around “get a job in 2 days” and successfully recruited 20 sales representatives. LinkedIn marketing is expensive, but very targeted and much cheaper than you’d otherwise pay.

#3 Right Skills

In terms of the skills for these spaces:

  • copywriter (for good storytelling)
  • visualist (graphics, photo, video)
  • community manager (to maintain appropriate contact) – the skills for that cannot be underestimated.
  • And… Something that I missed… 

You have to be like a one man band – good at everything. But then we have young people coming in with lots of those skills, and can develop them further…

So, I wanted to end on a nice story/campaign… An ad for Budweiser about not drinking and driving.


Q1) Authenticity is the big thing right now… But do you think all that “authentic” advertising content may get old and less effective over time?

A1) People want to hear from their friends, from people like them, in their own words. Big brands want that authenticity… But they also want total control which doesn’t fit with that. The reality is probably that something between those two levels is what we need but that change will only happen as it becomes clear to big brands that their controlled content isn’t working anymore.

Q2) With that social media map… What age group was that? I didn’t see SnapChat there.

A2) I’m not sure, it was a map of dominant social media spaces…

Q3) I wanted to talk about the hierarchy of content… Written posts, visual content etc… What seemed to do best was sponsored video content that was subtitled.

A3) Facebook itself, they prioritise video content – it is cheaper to use this in your marketing. If you do video yes, you have to have subtitles so that you can see rather than listen to the videos… And with videos, especially “authentic video” that will be heavily prioritised by Facebook. So we are doing a lot of video work.

Introduction to ECSM 2018 Niall Corcoran, Limerick Institute of Technology, Ireland

I wanted to start by thanking our hosts – Vilnius has been excellent this year. Next year we’ll be a bit earlier in the year – late June – and we’ll be at the Limerick Institute of Technology, Ireland. We have campuses around the region with 7000 students and 650 staff, teaching from levels 6 to 10. The nearest airport is Shannon, but we are an easy distance from Cork or Dublin airports.

In terms of social media we do research on the Social Media Interactive Learning Environment, the Limerick Interactive Storytelling Network, social media for teaching and research, and social media for cancer recovery.

In terms of Limerick itself, 80-90% of Europe’s contact lenses are manufactured there! There is a lot of manufacturing in Limerick, with many companies having their European headquarters there. So, I’ve got a short video made by one of our students to give you a sense of the town.

Social Media Competition Update

The top three placed entries are: Developing Social Paleontology – Lisa Ludgran; EDINA Digital Footprint Consulting and Training Service – Nicola Osborne (yay!); Traditions Mobile App – Adam Peruta.

Stream A: Mini track on Ethical use of social media data – Chair: Dragana Calic

The Benefits and Complications of Facebook Memorials – White Michelle, University of Hawai’i at Manoa, USA

Online Privacy: Present Need or Relic From the Past? – Aguirre-Jaramillo Lina Maria, Universidad Pontificia Bolivariana, Colombia

Constructing Malleable Truth: Memes from the 2016 U.S. Presidential Campaign – Wiggins Bradley, Webster University, Vienna, Austria, Austria

Stream B: Mini track on Enterprise Social Media – Chair: Paul Alpar

The Role of Social Media in Crowdfunding – Makina Daniel, University of South Africa, Pretoria, South Africa

Using Enterprise Social Networks to Support Staff Knowledge Sharing in Higher Education – Corcoran Niall, Limerick Institute of Technology, Ireland and Aidan Duane, Waterford Institute of Technology, Ireland


ReCon 2017 – Liveblog

Today I’m at ReCon 2017, giving a presentation later today (flying the flag for the unconference sessions!) but also looking forward to a day full of interesting presentations on publishing for early career researchers.

I’ll be liveblogging (except for my session) and, as usual, comments, additions, corrections, etc. are welcomed. 

Jo Young, Director of the Scientific Editing Company, is introducing the day and thanking the various ReCon sponsors. She notes: ReCon started about five years ago (with a slightly different name). We’ve had really successful events – and you can explore them all online. We have had a really stellar list of speakers over the years! And on that note…

Graham Steel: We wanted to cover publishing at all stages, from preparing for publication, submission, journals, open journals, metrics, alt metrics, etc. So our first speakers are really from the mid point in that process.

SESSION ONE: Publishing’s future: Disruption and Evolution within the Industry

100% Open Access by 2020 or disrupting the present scholarly comms landscape: you can’t have both? A mid-way update – Pablo De Castro, Open Access Advocacy Librarian, University of Strathclyde

It is an honour to be at this well attended event today. Thank you for the invitation. It’s a long title but I will be talking about how things are progressing towards this goal of full open access by 2020, and to what extent institutions, funders, etc. are able to introduce disruption into the industry…

So, a quick introduction to me. I am currently at the University of Strathclyde library, having joined in January. It’s quite an old university (founded 1796) and a medium sized university. Prior to that I was working in The Hague on the EC FP7 Post-Grant Open Access Pilot (OpenAIRE), providing funding to cover OA publishing fees for publications arising from completed FP7 projects. Maybe not the most popular topic in the UK right now but… The main point of explaining my context is that this EU work gave me more of a funder’s perspective, and now I’m able to compare that to more of an institutional perspective. As a result of this pilot there was a report commissioned by a British consultant: “Towards a competitive and sustainable open access publishing market in Europe”.

One key element in this open access EU pilot was the OA policy guidelines which acted as key drivers, and made eligibility criteria very clear. Notable here: publications to hybrid journals would not be funded, only fully open access; and a cap of no more than €2000 for research articles, €6000 for monographs. That was an attempt to shape the costs and ensure accessibility of research publications.

So, now I’m back at the institutional open access coalface. Lots had changed in two years. And it’s great to be back in this space. It is allowing me to explore ways to better align institutional and funder positions on open access.

So, why open access? Well in part this is about more exposure for your work, higher citation rates, and compliance with grant rules. But also it’s about use and reuse: by researchers in developing countries, practitioners who can apply your work, policy makers, and the public and tax payers who can access your work. In terms of the wider open access picture in Europe, there was a meeting in Brussels last May where European leaders called for immediate open access to all scientific papers by 2020. It’s not easy to achieve that but it does provide a major driver… However, across these countries we have EU member states with different levels of open access. The UK, Netherlands, Sweden and others prefer “gold” access, whilst Belgium, Cyprus, Denmark, Greece, etc. prefer “green” access, partly because the cost of gold open access is prohibitive.

Funders’ policies are a really significant driver towards open access. Funders include Arthritis Research UK, Bloodwise, Cancer Research UK, Breast Cancer Now, the British Heart Foundation, Parkinson’s UK, the Wellcome Trust, Research Councils UK, HEFCE, the European Commission, etc. Most support green and gold, and will pay APCs (Article Processing Charges), but it’s fair to say that early career researchers are not always at the front of the queue for getting those paid. HEFCE in particular have a green open access policy, requiring research outputs from any part of the university to be made open access; if they are not, they will not be eligible for the REF (Research Excellence Framework) and, as a result, compliance levels are high – probably top of Europe at the moment. The European Commission supports green and gold open access, but typically green as this is more affordable.

So, there is a need for quick progress at the same time as ongoing pressure on library budgets – we pay both for subscriptions and for APCs. Offsetting agreements, which discount subscriptions by APC charges, could be a good solution. There are pros and cons here. In principle it will allow quicker progress towards OA goals, but it will disproportionately benefit legacy publishers. It brings publishers into APC reporting – right now APCs are sometimes invisible to the library as they are paid by researchers, so this is a shift and a challenge. It’s supposed to be a temporary stage towards full open access. And it’s a very expensive intermediate stage: not every country can or will afford it.

So how can disruption happen? Well one way to deal with this would be through policies – for example not funding hybrid journals (as done in OpenAIRE). And disruption is happening (legal or otherwise), as we can see in Sci-Hub usage, which comes from all around the world, not just developing countries. Legal routes are possible in licensing negotiations. In Germany there is Projekt DEAL being negotiated. And this follows similar negotiations by openaccess.nl. At the moment Elsevier is the only publisher not willing to include open access journals.

In terms of tools… The EU has just announced plans to launch its own platform for funded research to be published. And the Wellcome Trust already has a space like this.

So, some conclusions… Open access is unstoppable now, but still needs to generate sustainable and competitive implementation mechanisms. It is getting more complex and difficult to communicate to researchers – that’s a serious risk. Open access will happen via a combination of strategies and routes – internal fights just aren’t useful (e.g. green vs gold). The temporary stage towards full open access needs to benefit library budgets sooner rather than later. And the power here really lies with researchers, whom OA advocates aren’t always able to keep informed. It is important that you know which are open and which are hybrid journals, and why that matters. And we need to ask whether informing authors on where it would make economic sense to publish is beyond the remit of institutional libraries.

To finish, some recommended reading:

  • “Early Career Researchers: the Harbingers of Change” – Final report from Ciber, August 2016
  • “My Top 9 Reasons to Publish Open Access” – a great set of slides.


Q1) It was interesting to hear about offsetting. Are those agreements one-off? continuous? renewed?

A1) At the moment they are one-off and intended to be a temporary measure. But they will probably mostly get renewed… National governments and consortia want to understand how useful they are, how they work.

Q2) Can you explain green open access and gold open access and the difference?

A2) In Gold Open Access, the author pays to make the paper open on the journal website. If that’s a hybrid – i.e. subscription – journal you essentially pay twice: once to subscribe, once to make it open. Green Open Access means that your article goes into your repository (after any embargo), into the worldwide repository landscape (see: https://www.jisc.ac.uk/guides/an-introduction-to-open-access).

Q3) As much as I agree that choices of where to publish are for researchers, but there are other factors. The REF pressures you to publish in particular ways. Where can you find more on the relationships between different types of open access and impact? I think that can help?

A3) Quite a number of studies. For instance is APC related to Impact factor – several studies there. In terms of REF, funders like Wellcome are desperate to move away from the impact factor. It is hard but evolving.

Inputs, Outputs and emergent properties: The new Scientometrics – Phill Jones, Director of Publishing Innovation, Digital Science

Scientometrics is essentially the study of science metrics and evaluation of these. As Graham mentioned in his introduction, there is a whole complicated lifecycle and process of publishing. And what I will talk about spans that whole process.

But, to start, a bit about me and Digital Science. We were founded in 2011 and we are wholly owned by the Holtzbrinck Publishing Group, who also own the Nature group. Being privately funded we are able to invest in innovation by researchers, for researchers, trying to create change from the ground up. Things like Labguru – a lab notebook (like RSpace); Altmetric; Figshare; ReadCube; Peerwith; Transcriptic – an IoT company; etc.

So, I’m going to introduce a concept: the Evaluation Gap. This is the difference between the metrics and indicators currently or traditionally available, and the information that those evaluating your research might actually want to know. Who are those evaluators? Funders. Tenure, hiring and promotion panels. Universities – your institution, your office of research management. Government, funders, policy organisations – all want to achieve something with your research…

So, how do we close the evaluation gap? Introducing altmetrics. These add to academic impact other types of societal impact – policy documents, grey literature, mentions in blogs, peer review mentions, social media, etc. What else can you look at? Well you can look at grants being awarded… When you see a grant awarded for a new idea, then a publication… someone else picks it up and publishes… That can take a long time, so grants can tell us things before publications do. You can also look at patents – a measure of commercialisation and potential economic impact further down the line.

So you see an idea germinate in one place, work with collaborators at the institution, spreading out to researchers at other institutions, and gradually out into the big wide world… As that idea travels outward it gathers more metadata, more impact, more associated materials, ideas, etc.

And at Digital Science we have innovators working across that landscape, along that scholarly lifecycle… But there is no point having that much data if you can’t understand and analyse it. You have to classify that data first to do that… Historically that was done by subject area, but increasingly research is interdisciplinary, it crosses different fields. So single tags/subjects are not useful, you need a proper taxonomy to apply here. And there are various ways to do that. You need keywords and semantic modelling and you can choose to:

  1. Use an existing one if available, e.g. MeSH (Medical Subject Headings).
  2. Consult with subject matter experts (the traditional way to do this, could be editors, researchers, faculty, librarians who you’d just ask “what are the keywords that describe computational social science”).
  3. Text mining abstracts or full text articles (using the content to create a list from your corpus with bag of words/frequency of words approaches, for instance, to help you cluster and find the ideas, with a taxonomy emerging).
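That third, text-mining route can be illustrated with a toy bag-of-words sketch in plain Python (my own illustrative function names, not Digital Science’s actual pipeline): count content-word frequencies per abstract, then compare abstracts by the cosine similarity of their word-count vectors – the raw signal a clustering step would build an emergent taxonomy from.

```python
import math
import re
from collections import Counter

# A tiny stopword list for illustration; real pipelines use much larger ones.
STOPWORDS = {"the", "of", "a", "and", "in", "to", "for", "on", "with"}

def bag_of_words(text):
    """Lower-case, tokenise and count the content words of one abstract."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine_similarity(bag_a, bag_b):
    """Cosine similarity of two word-count vectors (0 = unrelated, 1 = identical)."""
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[w] * bag_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Abstracts on similar topics score higher than unrelated ones -
# exactly the structure a clustering step would surface as a taxonomy.
docs = [
    "social media networks and online behaviour",
    "online social networks and media use",
    "river catchment management and water quality",
]
bags = [bag_of_words(d) for d in docs]
```

A production system would add tf-idf weighting and a proper clustering algorithm over the full corpus, but the underlying frequency-and-similarity idea is the same.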

Now, we are starting to take that text mining approach. But to be of use that data needs to be cleaned and curated. So we hand curated a list of institutions to go into GRID: the Global Research Identifier Database, to understand organisations and their relationships. Once you have that all mapped you can look at ISNI, CrossRef databases etc. And when you have that organisational information you can include georeferences to visualise where organisations are…

An example that we built for HEFCE was the Digital Science BrainScan. The UK has a dual funding model where there is both direct funding and block funding, with the latter awarded by HEFCE and it is distributed according to the most impactful research as understood by the REF. So, our BrainScan, we mapped research areas, connectors, etc. to visualise subject areas, their impact, and clusters of strong collaboration, to see where there are good opportunities for funding…

Similarly we visualised text mined impact statements across the whole corpus. Each impact statement is captured as a coloured dot. Clusters show similarity… Where things are far apart, there is less similarity. And that can highlight where there is a lot of work on, for instance, management of rivers and waterways… And these clusters weren’t obvious before, as the work cuts across disciplines…
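One common way to compute the similarity underpinning such a clustering (a sketch of the general technique, not necessarily the exact method used here) is cosine similarity between bag-of-words vectors:

```python
import math
import re
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts:
    1.0 means identical vocabulary distribution, 0.0 means no shared words."""
    a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Impact statements with high pairwise similarity end up plotted close together; low similarity pushes them apart, which is how the river-management cluster could surface across disciplines.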


Q1) Who do you think benefits the most from this kind of information?

A1) In the consultancy we have clients across the spectrum. In the past we have mainly worked for funders and policy makers to track effectiveness. Increasingly we are talking to institutions wanting to understand strengths, to predict trends… And by publishers wanting to understand if journals should be split, consolidated, are there opportunities we are missing… Each can benefit enormously. And it makes the whole system more efficient.

Against capital – Stuart Lawson, Birkbeck University of London

So, my talk will be a bit different. The arguments I will be making are not in opposition to any of the other speakers here, but are about critically addressing the ways we currently work, and how publishing works. I have chosen to speak on this topic today as I think it is important to make visible the political positions that underlie our assumptions and the systems we have in place today. There are calls to become more efficient but I disagree… Ownership and governance matter at least as much as the outcome.

I am an advocate for open access and I am currently undertaking a PhD looking at open access and how our discourse around it has been co-opted by neoliberal capitalism. And I believe these issues aren’t technical but social, and reflect inequalities in our society; any organisation claiming to benefit society whilst operating as a commercial company should raise questions for us.

Neoliberalism is a political project to reshape all social relations to conform to the logic of capital (this is the only slide; apparently a written and referenced copy will be posted on Stuart’s blog). This system turns us all into capital, entrepreneurs of ourselves – quantification, metricification, whether through tuition fees that put a price on education and turn students into consumers selecting based on rational indicators of future income, or through pitting universities against each other rather than having them collaborate. It isn’t just overtly commercial, but about applying ideas of the market to all elements of our work – high impact factor journals, metrics, etc. in the service of proving our worth. If we do need metrics, they should be open and nuanced; but if we only do metrics for people’s own careers, and perform for careers and promotion, then these play into neoliberal ideas of control. I fully understand the pressure – it is hard to live and do research without engaging and playing the game. It is easier to choose not to do this if you are in a position of privilege, and that reflects and maintains inequalities in our organisations.

Since power relations are often about labour and worth, this is inevitably part of work, and the value of labour. When we hear about disruption in the context of Uber, it is about disrupting the rights of workers and labour unions; it ignores the needs of the people who do the work – it is a neoliberal idea. I would recommend seeing Audrey Watters’ recent presentation for University of Edinburgh on the “Uberisation of Education”.

The power of capital in scholarly publishing, and neoliberal values in our scholarly processes… When disruptors align with the political forces that need to be dismantled, I don’t see that as useful or properly disruptive. Open Access is a good thing in terms of access. But there are two main strands of policy… Research Councils have spent over £80m paying APCs for researchers. Publishing open access does not require payment of fees – there are OA journals funded in other ways. But if you want the high-end visible journals, they are often hybrid journals, and 80% of that RCUK money has gone on hybrid journals. So work is being made open access, but right now this money flows from public funds to a small group of publishers – who take a 30–40% profit – and that system was set up to continue benefitting publishers. You can also share or publish to repositories, which are free to deposit in and use. The concern with OA policy is its connection to the REF: it constrains where you can publish and what that means, and work must always be measured within this restricted structure. It can be seen as compliance rather than a progressive movement toward social justice. But open access is having a really positive impact on the accessibility of research.

If you are angry at Elsevier, then you should also be angry at Oxford University and Cambridge University, and others for their relationships to the power elite. Harvard made a loud statement about journal pricing… It sounded good, and they have a progressive open access policy… But it is also bullshit – they have huge amounts of money… There are huge inequalities here in academia and in relationship to publishing.

And I would recommend strongly reading some history on the inequalities, and the racism and capitalism that was inherent to the founding of higher education so that we can critically reflect on what type of system we really want to discover and share scholarly work. Things have evolved over time – somewhat inevitably – but we need to be more deliberative so that universities are more accountable in their work.

To end on a more positive note, technology is enabling all sorts of new and inexpensive ways to publish and share. But we don’t need to depend on venture capital. Collective and cooperative running of organisations in these spaces – such as cooperative centres for research – exists in small scale examples that show the principles, and that this can work. Writing, reviewing and editing is already being done by the academic community; let’s build governance and process models to continue that, to make it work, to ensure work is rewarded but that the driver isn’t commercial.


Comment) That was awesome. A lot of us are here to learn how to play the game. But the game sucks. I am a professor; I get to do a lot of fun things now because I played the game… We need a way to have people able to do their work without that game. But we need something more specific than socialism… Libraries used to publish academic data… Lots of these metrics are there and useful… And I work with them… But I am conscious that we will be fucked by them. We need a way to react to that.

Redesigning Science for the Internet Generation – Gemma Milne, Co-Founder, Science Disrupt

Science Disrupt run regular podcasts, events, and a Slack channel for scientists, start ups, VCs, etc. Check out our website. We talk about five focus areas of science. Today I wanted to talk about redesigning science for the internet age. My day job is in journalism and I think a lot about start ups, about how we can influence academia, and about how success manifests itself in the internet age.

So, what am I talking about? Things like Pavegen – power generating paving stones. They are all over the news! The press love them! BUT the science does not work, the physics does not work…

I don’t know if you heard about Theranos which promised all sorts of medical testing from one drop of blood, millions of investments, and it all fell apart. But she too had tons of coverage…

I really like science start ups, I like talking about science in a different way… But how can I convince the press, the wider audience, what is good stuff, and what is just hype, not real… One of the problems we face is that if you are not engaged in research you either can’t access the science, or can’t read it even if you can access it… This problem is really big and it influences where money goes and what sort of stuff gets done!

So, how can we change this? There are amazing tools to help (Authorea, Overleaf, protocols.io, Figshare, Publons, LabWorm) and this is great and exciting. But I feel it is very short term… Trying to change something that doesn’t work anyway… Doing collaborative lab notes a bit better, publishing a bit faster… OK… But is it good for sharing science? Thinking about journalists and corporates, they don’t care about academic publishing; it’s not where they go for scientific information. How do we rethink that… What if we were to rethink how we share science?

AirBnB and Amazon are on my slide here to make the point of the difference between incremental change vs. real change. AirBnB addressed issues with hotels, issues of hotels being samey… They didn’t build a hotel, instead they thought about what people want when they traveled, what mattered for them… Similarly Amazon didn’t try to incrementally improve supermarkets.. They did something different. They dug to the bottom of why something exists and rethought it…

Imagine science was “invented” today (ignore all the realities of why that’s impossible). But imagine we think of this thing, we have to design it… How do we start? How will I ask questions, find others who ask questions…

So, a bit of a thought experiment here… Maybe I’d post a question on reddit, set up my own sub-reddit. I’d ask questions, ask why they are interested… Create a big thread. And if I have a lot of people, maybe I’ll have a Slack with various channels about all the facets around a question, invite people in… Use the group to project manage this project… OK, I have a team… Maybe I create a Meet Up Group for that same question… Get people to join… Maybe 200 people are now gathered and interested… You gather all these folk into one place. Now we want to analyse ideas. Maybe I share my question and initial code on GitHub, find collaborators… And share the code, make it open… Maybe it can be reused… It has been collaborative at every stage of the journey… Then maybe I want to build a microscope or something… I’d find the right people, I’d ask them to join my Autodesk 360 to collaboratively build engineering drawings for fabrication… So maybe we’ve answered our initial question… So maybe I blog that, and then I tweet that…

The point I’m trying to make is, there are so many tools out there for collaboration, for sharing… Why aren’t more researchers using these tools that are already there? Rather than designing new tools… These are all ways to engage and share what you do, rather than just publishing those articles in those journals…

So, maybe publishing isn’t the way at all? I get the “game” but I am frustrated about how we properly engage, and really get your work out there, getting industry to understand what is going on. There are lots of people inventing new ways… You can use stuff in papers that isn’t being picked up… But see what else you can do!

So, what now? I know people are starved for time… But if you want to really make the impact that you think is important… I understand there is a concern around scooping… But there are ways to manage that… And if you want to know about all these tools, do come talk to me!


Q1) I think you are spot on with vision. We want faster more collaborative production. But what is missing from those tools is that they are not designed for researchers, they are not designed for publishing. Those systems are ephemeral… They don’t have DOIs and they aren’t persistent. For me it’s a bench to web pipeline…

A1) Then why not create a persistent archived URI – a webpage where all of a project’s content is shared. 50% of all academic papers are only read by the person that published them… These stumbling blocks in the way of sharing… It is crazy… We shouldn’t just stop and not share.

Q2) Thank you, that has given me a lot of food for thought. The issue of work not being read, I’ve been told that by funders so very relevant to me. So, how do we influence the professors… As a PhD student I haven’t heard about many of those online things…

A2) My co-founder of Science Disrupt is a computational biologist and PhD student… My response would be about not asking, just doing… Find networks, find people doing what you want. Benefit from collaboration. Sign an NDA if needed. Find the opportunity, then come back…

Q3) I had a comment and a question. Code repositories like GitHub are persistent and you can find a great list of code repositories and meta-articles around those on the Journal of Open Research Software. My question was about AirBnB and Amazon… Those have made huge changes but I think the narrative they use now is different from where they started – and they started more as incremental change… And they stumbled on bigger things, which looks a lot like research… So… How do you make that case for the potential long term impact of your work in a really engaging way?

A3) It is the golden question. Need to find case studies, to find interesting examples… a way to showcase similar examples… and how that led to things… Forget big pictures, jump the hurdles… Show that bigger picture that’s there but reduce the friction of those hurdles. Sure those companies were somewhat incremental but I think there is genuinely a really different mindset there that matters.

And we now move to lunch. Coming up…


This will be me, so don’t expect an update for the moment…

SESSION TWO: The Early Career Researcher Perspective: Publishing & Research Communication

Getting recognition for all your research outputs – Michael Markie

Make an impact, know your impact, show your impact – Anna Ritchie

How to share science with hard to reach groups and why you should bother – Becky Douglas

What helps or hinders science communication by early career researchers? – Lewis MacKenzie



SESSION THREE: Raising your research profile: online engagement & metrics

Green, Gold, and Getting out there: How your choice of publisher services can affect your research profile and engagement – Laura Henderson

What are all these dots and what can linking them tell me? – Rachel Lammey

The wonderful world of altmetrics: why researchers’ voices matter – Jean Liu

How to help more people find and understand your work – Charlie Rapple




IIPC WAC / RESAW Conference 2017 – Day Two (Technical Strand) Liveblog

I am again at the IIPC WAC / RESAW Conference 2017 and, for today, I am in the Technical Strand.

Tools for web archives analysis & record extraction (chair Nicholas Taylor)

Digging documents out of the archived web – Andrew Jackson

This is the technical counterpoint to the presentation I gave yesterday… So I talked yesterday about the physical workflow of catalogue items… We found that the Digital ePrints team had started processing eprints the same way…

  • staff looked in an outlook calendar for reminders
  • looked for new updates since last check
  • download each to local folder and open
  • check catalogue to avoid re-submitting
  • upload to internal submission portal
  • add essential metadata
  • submit for ingest
  • clean up local files
  • update stats sheet
  • Then ingest, usually automated (but can require intervention)
  • Updates catalogue once complete
  • New catalogue records processed or enhanced as necessary.

It was very manual, and very inefficient… So we have created a harvester:

  • Setup: specify “watched targets” then…
  • Harvest (harvester crawls targets as usual) –> Ingested… but also…
  • Document extraction:
    • spot documents in the crawl
    • find landing page
    • extract machine-readable metadata
    • submit to W3ACT (curation tool) for review
  • Acquisition:
    • check document harvester for new publications
    • edit essential metadata
    • submit to catalogue
  • Cataloguing
    • cataloguing records processed as necessary
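The document extraction step above might be sketched roughly like this (the crawl record format is invented for illustration; the real harvester works over WARC data and submits to W3ACT):

```python
# A minimal sketch of the document-extraction flow: spot likely publications
# in a crawl, then guess each document's landing page from link structure.

def spot_documents(crawl_records):
    """Spot likely publications in a crawl (here: anything served as PDF)."""
    return [r for r in crawl_records if r["mime"] == "application/pdf"]

def find_landing_page(doc, crawl_records):
    """Guess the landing page: a crawled HTML page that links to the document."""
    for r in crawl_records:
        if r["mime"] == "text/html" and doc["url"] in r.get("links", []):
            return r["url"]
    return None

crawl = [
    {"url": "https://example.gov.uk/report", "mime": "text/html",
     "links": ["https://example.gov.uk/report.pdf"]},
    {"url": "https://example.gov.uk/report.pdf", "mime": "application/pdf"},
]
docs = spot_documents(crawl)
landing = find_landing_page(docs[0], crawl)
```

The landing page then becomes the source for machine-readable metadata extraction before anything is submitted for curatorial review.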

This is better but there are challenges. Firstly, what is a “publication”? With the eprints team there was a one-to-one print and digital relationship. But now, no more one-to-one. For example, gov.uk publications… An original report will have an ISBN… But that landing page is a representation of the publication; that’s where the assets are… When stuff is catalogued – and this can frustrate technical folk – you take data and text from the page, honouring what is there rather than normalising it… We can dishonour intent in how we capture the pages… It is challenging…

MARC is initially alarming… For a developer used to current data formats, it’s quite weird to get used to. But really it is just encoding… There is how we say we use MARC, how we do use MARC, and where we want to be now…

One of the intentions of the metadata extraction work was to provide an initial guess at the catalogue data – hoping to save cataloguers and curators time. But you probably won’t be surprised that the authors’ names etc. in the document metadata are rarely correct. We start with the worst extractor and layer better ones on top, so we have the best shot. What works best is extracting from the HTML. Gov.uk is a big and consistent publishing space so it’s worth us working on extracting that.
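That layering strategy can be sketched as a chain of extractors applied worst-first, so better sources overwrite weaker ones (the extractor functions and fields here are illustrative stand-ins, not the actual pipeline):

```python
# Layering extractors from least to most trusted: each later (better) source
# overwrites fields from earlier ones, so the best available value wins.

def extract_from_pdf_metadata(doc):
    return {"title": doc.get("pdf_title")}      # embedded PDF metadata, often wrong

def extract_from_html(doc):
    return {"title": doc.get("html_title"), "date": doc.get("html_date")}

def layered_extract(doc, extractors):
    guess = {}
    for extract in extractors:                  # worst first, best last
        guess.update({k: v for k, v in extract(doc).items() if v})
    return guess

doc = {"pdf_title": "untitled.indd", "html_title": "Annual Report 2016",
       "html_date": "2016-07-01"}
best = layered_extract(doc, [extract_from_pdf_metadata, extract_from_html])
```

The weak PDF guess survives only where no better source offers a value, which is the point of layering rather than picking a single extractor.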

What works even better is the gov.uk API data – it’s in JSON, it’s easy to parse, it’s worth coding as it is a bigger publisher for us.
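A toy example of why the JSON route is so much nicer (this sample payload and its field names are simplified illustrations, not the actual gov.uk API schema):

```python
import json

# Structured JSON from a publisher API: no scraping heuristics needed,
# just direct field access and light normalisation.
sample = json.loads("""{
    "title": "Annual Report 2016",
    "first_published_at": "2016-07-01T09:00:00Z",
    "document_type": "official_statistics"
}""")

record = {
    "title": sample["title"],
    "date": sample["first_published_at"][:10],   # keep the date part only
    "type": sample["document_type"],
}
```

Compare that with layered HTML extraction: one publisher-maintained API can replace a stack of fallible heuristics, which is why it is worth coding against for a big publisher.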

But now we have to resolve references… Multiple use cases for “records about this record”:

  • publisher metadata
  • third party data sources (e.g. Wikipedia)
  • Our own annotations and catalogues
  • Revisit records

We can’t ignore the revisit records… Have to do a great big join at some point… To get best possible quality data for every single thing….
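That join might look something like this in miniature (the source names and trust ordering are my own assumptions for illustration):

```python
# Sketch of the "great big join": gather every record that talks about the
# same URL (publisher metadata, annotations, revisit records) and merge
# them into one best-quality view, with more trusted sources winning.

PRIORITY = ["revisit", "annotation", "publisher"]   # low to high trust

def join_records(records):
    by_url = {}
    for rec in sorted(records, key=lambda r: PRIORITY.index(r["source"])):
        merged = by_url.setdefault(rec["url"], {})
        merged.update(rec["fields"])                # higher trust overwrites
    return by_url

records = [
    {"url": "u1", "source": "publisher", "fields": {"title": "Good Title"}},
    {"url": "u1", "source": "revisit", "fields": {"title": "u1", "seen": 3}},
]
joined = join_records(records)
```

The revisit record still contributes what only it knows (how often the page was seen) while its weak title guess is overridden by publisher metadata.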

And this is where the layers of transformation come in… Lots of opportunities to try again and build up… But… When I retry document extraction I can accidentally run up another chain each time… If we do our Solr searches correctly it should be easy, so we will be correcting this…

We do need to do more future experimentation… Multiple workflows bring synchronisation problems. We need to ensure documents are accessible when discoverable. And we need to be able to re-run automated extraction.

We want to iteratively improve automated metadata extraction:

  • improve HTML data extraction rules, e.g. Zotero translators (and I think LOCKSS are working on this).
  • Bring together different sources
  • Smarter extractors – Stanford NER, GROBID (built for sophisticated extraction from ejournals)

And we still have that tension between what a publication is… A tension between established practice and publisher output. We need to trial different approaches with catalogues and users… Close that whole loop.


Q1) Is the PDF you extract going into another repository… You probably have a different preservation goal for those PDFs and the archive…

A1) Currently the same copy for archive and access. Format migration probably will be an issue in the future.

Q2) This is quite similar to issues we’ve faced in LOCKSS… I’ve written a paper with Herbert Van de Sompel and Michael Nelson about this thing of describing a document…

A2) That’s great. I’ve been working with the Government Digital Service and they are keen to do this consistently….

Q2) Geoffrey Bilder also working on this…

A2) And that’s the ideal… To improve the standards more broadly…

Q3) Are these all PDF files?

A3) At the moment, yes. We deliberately kept scope tight… We don’t get a lot of ePub or open formats… We’ll need to… Now publishers are moving to HTML – which is good for the archive – but that’s more complex in other ways…

Q4) What does the user see at the end of this… Is it a PDF?

A4) This work ends up in our search service, and that metadata helps them find what they are looking for…

Q4) Do they know its from the website, or don’t they care?

A4) Officially, the way the library thinks about monographs and serials, would be that the user doesn’t care… But I’d like to speak to more users… The library does a lot of downstream processing here too..

Q4) For me as an archivist all that data on where the document is from, what issues in accessing it they were, etc. would extremely useful…

Q5) You spoke yesterday about engaging with machine learning… Can you say more?

A5) This is where I’d like to do more user work. The library is keen on subject headings – that’s a big high level challenge, so it’s quite amenable to machine learning. We have a massive golden data set… There’s at least a masters thesis in there, right! And if we built something, then ran it over the 3 million-ish items with little metadata, it could be incredibly useful. In my opinion this is what big organisations will need to do more and more of: making best use of human time to tailor and tune machine learning to do much of the work…

Comment) That thing of everything ending up as a PDF is on the way out, by the way… You should look at Distill.pub – a new journal from Google and Y Combinator – and that’s the future of these sorts of formats: it’s JavaScript and GitHub. Can you collect it? Yes, you can. You can visit the page, switch off the network, and it still works… And it’s there and will update…

A6) As things are more dynamic the re-collecting issue gets more and more important. That’s hard for the organisation to adjust to.

Nick Ruest & Ian Milligan: Learning to WALK (Web Archives for Longitudinal Knowledge): building a national web archiving collaborative platform

Ian: Before I start, thank you to my wider colleagues and funders as this is a collaborative project.

So, we have fantastic web archival collections in Canada… They collect political parties, activist groups, major events, etc. But, whilst these are amazing collections, they aren’t accessed or used much. I think this is mainly down to two issues: people don’t know they are there; and the access mechanisms don’t fit well with their practices. Maybe when the Archive-It API is live that will fix it all… Right now though it’s hard to find the right thing, and the Canadian archive is quite siloed. There are about 25 organisations collecting, most of which use the Archive-It service. But, if you are a researcher… to use web archives you really have to be interested and engaged; you need to be an expert.

So, building this portal is about making this easier to use… We want web archives to be used on page 150 of some random book. And that’s what the WALK project is trying to do. Our goal is to break down the silos, take down walls between collections, between institutions. We are starting out slow… We signed Memoranda of Understanding with Toronto, Alberta, Victoria, Winnipeg, Dalhousie, and Simon Fraser University – that represents about half of the archive in Canada.

We work on workflow… We run workshops… We separated out the collections so that postdocs can look at them…

We are using Warcbase (warcbase.org) and command line tools; we transferred data from the Internet Archive and generate checksums; we generate scholarly derivatives – plain text, hypertext graph, etc. In the front end you enter basic information, describe the collection, and make sure that the user can engage directly themselves… And those visualisations are really useful… Looking at a visualisation of the Canadian political parties and political interest group web crawls lets you track changes, although that may include crawler issues.
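The checksum step is straightforward but worth doing carefully for multi-gigabyte WARC files; a minimal sketch, reading in chunks so memory use stays flat regardless of file size:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, computed incrementally in 1 MB chunks, so even
    very large WARC transfers can be verified without loading them whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest computed before transfer against the one computed after arrival confirms nothing was corrupted in transit.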

Then, with all that generated, we create landing pages, including tagging, data information, visualizations, etc.

Nick: So, on a technical level… I’ve spent the last ten years in open source digital repository communities… This community is small and tightknit, and I like how we build and share and develop on each others work. Last year we presented webarchives.ca. We’ve indexed 10 TB of warcs since then, representing 200+ M Solr docs. We have grown from one collection and we have needed additional facets: institution; collection name; collection ID, etc.

Then we have also dealt with scaling issues… From a 30–40 GB index to a 1 TB sized index. You probably think that’s kinda cute… But we do have more scaling to do… So we are learning from others in the community about how to manage this… We have Solr running on OpenStack… Right now it isn’t at production scale, but it is getting there. We are looking at SolrCloud and potentially using a shard per collection.
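For a flavour of what faceting by those fields looks like in a Solr request (the field names mirror the ones mentioned above but are assumptions about the actual schema, as is the core name `walk`):

```python
from urllib.parse import urlencode

# Build a Solr select query faceting on the fields a multi-institution
# index needs: institution, collection name, collection ID.
params = urlencode([
    ("q", "climate change"),
    ("facet", "true"),
    ("facet.field", "institution"),
    ("facet.field", "collection_name"),
    ("facet.field", "collection_id"),
    ("rows", "10"),
])
query_url = "http://localhost:8983/solr/walk/select?" + params
```

Each added facet field is what lets one shared index serve many institutions without losing the ability to drill back down to a single collection.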

Last year we had a Solr index using the Shine front end… It’s great but… it doesn’t have an active open source community… We love the UK Web Archive but… Meanwhile there is Blacklight, which is in wide use in libraries. There is a bigger community, better APIs, bug fixes, etc… So we have set up a prototype called WARCLight. It does almost all that Shine does, except the tree structure and the advanced searching…

Ian spoke about derivative datasets… For each collection, via Blacklight or ScholarsPortal, we want domain/URL counts, full text, and graphs. Rather than them having to do the work, users can just engage with particular datasets or collections.
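The domain/URL counts derivative is the simplest of these; a minimal sketch of how it falls out of a list of captured URLs:

```python
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """One of the derivative datasets mentioned above: how many captured
    URLs each domain contributes to a collection."""
    return Counter(urlparse(u).netloc for u in urls)

counts = domain_counts([
    "http://example.ca/a", "http://example.ca/b", "http://other.ca/",
])
```

At collection scale the same counting runs over millions of WARC records, but the derivative handed to researchers is just this table of domains and frequencies.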

So, that goal Ian talked about: one central hub for archived data and derivatives…


Q1) Do you plan to make the graphs interactive, by using Kibana rather than Gephi?

A1 – Ian) We tried some stuff out… One colleague tried R in the browser… That was great but didn’t look great in the browser. But it would be great if the casual user could look at drag and drop R type visualisations. We haven’t quite found the best option for interactive network diagrams in the browser…

A1 – Nick) Generally the data is so big it will bring down the browser. I’ve started looking at Kibana for stuff so in due course we may bring that in…

Q2) Interesting as we are doing similar things at the BnF. We did use Shine, looked at Blacklight, but built our own thing…. But we are looking at what we can do… We are interested in that web archive discovery collections approaches, useful in other contexts too…

A2 – Nick) I kinda did this the ugly way… There is a more elegant way to do it but haven’t done that yet..

Q2) We tried to give people WARC and WARC files… Our actual users didn’t want that, they want full text…

A2 – Ian) My students are quite biased… Right now if you search it will flake out… But by fall it should be available, I suspect that full text will be of most interest… Sociologists etc. think that network diagram view will be interesting but it’s hard to know what will happen when you give them that. People are quickly put off by raw data without visualisation though so we think it will be useful…

Q3) Do you think in a few years’ time…

A3) Right now that doesn’t scale… We want this more cloud-based – that’s our next 3 years and next wave of funded work… We do have capacity to write new scripts right now as needed, but when we scale that will be harder…

Q4) What are some of the organisational, admin and social challenges of building this?

A4 – Nick) Going out and connecting with the archives is a big part of this… Having time to do this can be challenging…. “is an institution going to devote a person to this?”

A4 – Ian) This is about making this more accessible… People are more used to Blacklight than Shine. People respond poorly to WARC. But they can deal with PDFs and CSVs; those are familiar formats…

A4 – Nick) And when I get back I’m going to be doing some work and sharing to enable an actual community to work on this..



Behind the scenes at the Digital Footprint MOOC

Last Monday we launched the new Digital Footprint MOOC, a free three week online course (running on Coursera) led by myself and Louise Connelly (Royal (Dick) School of Veterinary Studies). The course builds upon our work on the Managing Your Digital Footprints research project, campaign and also draws on some of the work I’ve been doing in piloting a Digital Footprint training and consultancy service at EDINA.

It has been a really interesting and demanding process working with the University of Edinburgh MOOCs team to create this course, particularly focusing in on the most essential parts of our Digital Footprints work. Our intention for this MOOC is to provide an introduction to the issues and equip participants with appropriate skills and understanding to manage their own digital tracks and traces. Most of all we wanted to provide a space for reflection and for participants to think deeply about what their digital footprint means to them and how they want to manage it in the future. We don’t have a prescriptive stance – Louise and I manage our own digital footprints quite differently but both of us see huge value in public online presence – but we do think that understanding and considering your online presence and the meaning of the traces you leave behind online is an essential modern life skill and want to contribute something to that wider understanding and debate.

MOOCs – Massive Open Online Courses – are courses which people tend to take in their own time, for pleasure and interest, but also as part of their CPD and personal development, so the format seemed like a good fit for digital footprint skills and reflection, along with some of the theory and emerging trends from our research work. We also think the course has potential to be used in supporting digital literacy programmes and activities, for those looking for skills for transitioning into and out of education, and for those developing their careers. On that note we were delighted to see the All Aboard: Digital Skills in Higher Education 2017 event programme running last week – their website, created to support digital skills in Ireland, is a great complementary resource to our course, which we made a (small) contribution to during their development phase.

Over the last week it has been wonderful to see our participants engaging with the Digital Footprint course, sharing their reflections on the #DFMOOC hashtag, and really starting to think about what their digital footprint means for them. From the discussion so far the concept of the “Uncontainable Self” (Barbour & Marshall 2012) seems to have struck a particular chord for many of our participants, which is perhaps not surprising given the degree to which our digital tracks and traces can propagate through others’ posts, tags, listings, etc. whether or not we are sharing content ourselves.

When we were building the MOOC we were keen to reflect the fact that our own work sits in a context of, and benefits from, the work of many researchers and social media experts both in our own local context and the wider field. We were delighted to be able to include guest contributors including Karen Gregory (University of Edinburgh), Rachel Buchanan (University of Newcastle, Australia), Lilian Edwards (Strathclyde University), Ben Marder (University of Edinburgh), and David Brake (author of Sharing Our Lives Online).

The usefulness of making these connections across disciplines and across the wider debate on digital identity seems particularly pertinent given recent developments that emphasise how fast things are changing around us, and how our own agency in managing our digital footprints and digital identities is being challenged by policy, commercial and social factors. Those notable recent developments include…

On 28th March the US Government voted to remove restrictions on the sale of data by ISPs (Internet Service Providers), potentially allowing them to sell an incredibly rich picture of browsing, search, behavioural and intimate details without further consultation (you can read the full measure here). This came as the UK Government mooted the banning of encryption technologies – essential for private messaging, financial transactions, access management and authentication – claiming that terror threats justified such a wide ranging loss of privacy. Whilst that does not seem likely to come to fruition given the economic and practical implications of such a measure, we do already have the  Investigatory Powers Act 2016 in place which requires web and communications companies to retain full records of activity for 12 months and allows police and security forces significant powers to access and collect personal communications data and records in bulk.

On 30th March, a group of influential privacy researchers, including danah boyd and Kate Crawford, published Ten simple rules for responsible big data research in PLoSOne. The article/manifesto is an accessible and well argued guide to the core issues in responsible big data research. In many ways it summarises the core issues highlighted in the excellent (but much more academic and comprehensive) AoIR ethics guidance. The PLoSOne article is notably directed to academia as well as industry and government, since big data research is at least as much a part of commercial activity (particularly social media and data-driven start-ups, see e.g. Uber’s recent attention for profiling and manipulating drivers) as traditional academic research contexts. Whilst academic research does usually build ethical approval processes (albeit conducted with varying degrees of digital savvy) and peer review into research processes, industry is not typically structured in that way and is often not held to the same standards, particularly around privacy and boundary crossing (see, e.g., Michael Zimmer’s work on both academic and commercial use of Facebook data).

The Ten simple rules… are also particularly timely given the current discussion of Cambridge Analytica and its role in the 2016 US Election and the UK’s EU Referendum. An article published in Das Magazin in December 2016, and a subsequent English language version published on Vice’s Motherboard, have been widely circulated on social media over recent weeks. These articles suggest that the company’s large scale psychometric analysis of social media data essentially handed victory to Trump and the Leave/Brexit campaigns, which naturally raises personal data and privacy concerns as well as influence, regulation and governance issues. There remains some skepticism about just how influential this work was… I tend to agree with Aleks Krotoski (social psychologist and host of BBC’s The Digital Human) who – speaking with Pat Kane at an Edinburgh Science Festival event last night on digital identity and authenticity – commented that she thought the Cambridge Analytica work was probably a mix of significant hyperbole but also some genuine impact.

These developments focus attention on access, use and reuse of personal data and personal tracks and traces, and that is something we hope our MOOC participants will have the opportunity to pause and reflect on as they think about what they leave behind online when they share, tag, delete, and particularly when they consider terms and conditions, privacy settings and how they curate what is available and to whom.

So, the Digital Footprint course is launched and open to anyone in the world to join for free (although Coursera will also prompt you with the – very optional – possibility of paying a small fee for a certificate), and we are just starting to get a sense of how our videos and content are being received. We’ll be sharing more highlights from the course, retweeting interesting comments, etc. throughout this run (which began on Monday 3rd April), but also future runs since this is an “on demand” MOOC which will run regularly every four weeks. If you do decide to take a look then I would love to hear your comments and feedback – join the conversation on #DFMOOC, or leave a comment here or email me.

And if you’d like to find out more about our digital footprint consultancy, or would be interested in working with the digital footprints research team on future work, do also get in touch. Although I’ve been working in this space for a while this whole area of privacy, identity and our social spaces seems to continue to grow in interest, relevance, and importance in our day to day (digital) lives.



Jisc Digifest 2017 Day One – LiveBlog

Liam Earney is introducing us to the day, with the hope that we all take something away from the event – some inspiration, an idea, the potential to do new things. Over the past three Digifest events we’ve taken a broad view. This year we focus on how technology is expanding and enabling learning and teaching.

LE: So we will be talking about questions we asked through Twitter and through our conference app with our panel:

  • Sarah Davies, head of change implementation support – education/student, Jisc
  • Liam Earney, director of Jisc Collections
  • Andy McGregor, deputy chief innovation officer, Jisc
  • Paul McKean, head of further education and skills, Jisc

Q1: Do you think that greater use of data and analytics will improve teaching, learning and the student experience?

  • Yes 72%
  • No 10%
  • Don’t Know 18%

AM: I’m relieved at that result as we think it will be important too. But that is backed up by evidence emerging in the US and Australia around data analytics use in retention and attainment. There is a much bigger debate around AI and robots, and around Learning Analytics there is that debate about human and data, and human and machine can work together. We have several sessions in that space.

SD: Learning Analytics has already been around its own hype cycle… We had huge headlines about the potential about a year ago, but now we are seeing much more in-depth discussion, discussion around making sure that our decisions are data informed. There is concern around the role of the human here, but the tutors, the staff, are the people who access this data and work with students, so it is about human and data together, and that’s why adoption is taking a while as they work out how best to do that.

Q2: How important is organisational culture in the successful adoption of education technology?

  • Total make or break 55%
  • Can significantly speed it up or slow it down 45%
  • It can help but not essential 0%
  • Not important 0%

PM: Where we see education technology adopted we do often see that organisational culture can drive technology adoption. An open culture – for instance Reading College’s open door policy around technology – can really produce innovation and creative adoption, as people share experience and ideas.

SD: It can also be about what is recognised and rewarded. About making sure that technology is more than what the innovators do – it’s something for the whole organisation. It’s not something that you can do in small pockets. It’s often about small actions – sharing across disciplines, across role groups, about how technology can make a real difference for staff and for students.

Q3: How important is good quality content in delivering an effective blended learning experience?

  • Very important 75%
  • It matters 24%
  • Neither 1%
  • It doesn’t really matter 0%
  • It is not an issue at all 0%

LE: That’s reassuring, but I guess we have to talk about what good quality content is…

SD: I think materials – good quality primary materials – make a huge difference, there are so many materials we simply wouldn’t have had (any) access to 20 years ago. But also about good online texts and how they can change things.

LE: My colleague Karen Colbon and I have been doing some work on making more effective use of technologies… Paul you have been involved in FELTAG…

PM: With FELTAG I was pleased when that came out 3 years ago, but I think only now have we moved on from the myth of 10% online being blended learning… And moved towards a proper debate about what blended learning is, what is relevant, not just what is described. And the need for good quality support to enable that.

LE: What’s the role for Jisc there?

PM: I think it’s about bringing the community together, about focusing on the learner and their experience, rather than the content, to ensure that overall the learner gets what they need.

SD: It’s also about supporting people to design effective curricula too. There are sessions here, talking through interesting things people are doing.

AM: There is a lot of room for innovation around the content. If you are walking around the stands there is a group of students from UCL who are finding innovative ways to visualise research, and we’ll be hearing pitches later with some fantastic ideas.

Q4: Billions of dollars are being invested in edtech startups. What impact do you think this will have on teaching and learning in universities and colleges?

  • No impact at all 1%
  • It may result in a few tools we can use 69%
  • We will come to rely on these companies in our learning and teaching 21%
  • It will completely transform learning and teaching 9%

AM: I am towards the 9% here, there are risks but there is huge reason for optimism here. There are some great companies coming out and working with them increases the chance that this investment will benefit the sector. Startups are keen to work with universities, to collaborate. They are really keen to work with us.

LE: It is difficult for universities to take that punt, to take that risk on new ideas. Procurement, governance, are all essential to facilitating that engagement.

AM: I think so. But I think if we don’t engage then we do risk these companies coming in and building businesses that don’t take account of our needs.

LE: Now that’s a big spend taking place for that small potential change that many who answered this question perceive…

PM: I think there are savings that will come out of those changes potentially…

AM: And in fact that potentially means saving money on tools we currently use by adopting new ones, and investing that into staff…

Q5: Where do you think the biggest benefits of technology are felt in education?

  • Enabling or enhancing learning and teaching activities 55%
  • In the broader student experience 30%
  • In administrative efficiencies 9%
  • It’s hard to identify clear benefits 6%

SD: I think many of the big benefits we’ve seen over the last 8 years have been around things like online timetables – the wider student experience and administrative spaces. But we are also seeing that, when used effectively, technology can really enhance the learning experience. We have a few sessions here around that. Key here is the digital capabilities of staff and students: awareness, confidence, understanding, fit with disciplinary practice. Lots here at Digifest around digital skills. [sidenote: see also our new Digital Footprint MOOC which is now live for registrations]

I’m quite surprised that 6% thought it was hard to identify clear benefits… There are still lots of questions there, and we have a session on evidence based practice tomorrow, and how evidence feeds into institutional decision making.

PM: There is something here around the Apprenticeship Levy which is about to come into place. A surprisingly high percentage of employers aren’t aware that they will be paying that, actually! Technology has a really important role here for teaching, learning and assessment, but also tracking and monitoring around apprenticeships.

LE: So, with that, I encourage you to look around, chat to our exhibitors, craft the programme that is right for you. And to kick that off here is some of the brilliant work you have been up to. [we are watching a video – this should be shared on today’s hashtag #digifest17]


TEDxYouth@Manchester video live: What do your digital footprints say about you?

This is a very wee blog post/aside to share the video of my TEDxYouth@Manchester talk, “What do your digital footprints say about you?”:

You can read more on the whole experience of being part of this event in my blog post from late November.

It would appear that my first TEDx, much like my first Bright Club, was rather short and sweet (safely within my potential 14 minutes). I hope you enjoy it and I would recommend catching up with my fellow speakers’ talks:

Kat Arney

Click here to view the embedded video.

Ben Smith

Click here to view the embedded video.

VV Brown

Click here to view the embedded video.

Ben Garrod

Click here to view the embedded video.

I gather that the videos of the incredible teenage speakers and performers will follow soon.



Last chance to submit for the “Social Media in Education” Mini Track for the 4th European Conference on Social Media (ECSM) 2017

This summer I will be co-chairing, with Stefania Manca (from the Institute of Educational Technology of the National Research Council of Italy), “Social Media in Education”, a Mini Track of the European Conference on Social Media (#ECSM17) in Vilnius, Lithuania. As the call for papers has been out for a while (deadline for abstracts: 12th December 2016) I wanted to remind and encourage you to consider submitting to the conference and, particularly, for our Mini Track, which we hope will highlight exciting social media and education research.

You can download the Mini Track Call for Papers on Social Media in Education here. And, from the website, here is the summary of what we are looking for:

An expanding amount of social media content is generated every day, yet organisations are facing increasing difficulties in both collecting and analysing the content related to their operations. This mini track on Big Social Data Analytics aims to explore the models, methods and tools that help organisations in gaining actionable insight from social media content and turning that to business or other value. The mini track also welcomes papers addressing the Big Social Data Analytics challenges, such as, security, privacy and ethical issues related to social media content. The mini track is an important part of ECSM 2017 dealing with all aspects of social media and big data analytics.

Topics of the mini track include but are not limited to:

  • Reflective and conceptual studies of social media for teaching and scholarly purposes in higher education.
  • Innovative experience or research around social media and the future university.
  • Issues of social media identity and engagement in higher education, e.g: digital footprints of staff, students or organisations; professional and scholarly communications; and engagement with academia and wider audiences.
  • Social media as a facilitator of changing relationships between formal and informal learning in higher education.
  • The role of hidden media and backchannels (e.g. SnapChat and YikYak) in teaching and learning.
  • Social media and the student experience.

The conference, the 4th European Conference on Social Media (ECSM), will be taking place at the Business and Media School of the Mykolas Romeris University (MRU) in Vilnius, Lithuania on 3-4 July 2017. Having seen the presentation on the city and venue at this year’s event I feel confident it will be a lovely setting and should be a really good conference. (I also hear Vilnius has exceptional internet connectivity, which is always useful.)

I would also encourage anyone working in social media to consider applying for the Social Media in Practice Excellence Awards, which ECSM is hosting this year. The competition will be showcasing innovative social media applications in business and the public sector, and they are particularly looking for ways in which academia has been working with business around social media. You can read more – and apply to the competition (deadline for entries: 17th January 2017) – here.

This is a really interdisciplinary conference with a real range of speakers and topics so a great place to showcase interesting applications of and research into social media. The papers presented at the conference are published in the conference proceedings, widely indexed, and will also be considered for publication in: Online Information Review (Emerald Insight, ISSN: 1468-4527); International Journal of Social Media and Interactive Learning Environments (Inderscience, ISSN 2050-3962); International Journal of Web-Based Communities (Inderscience); Journal of Information, Communication and Ethics in Society (Emerald Insight, ISSN 1477-996X).

So, get applying to the conference and/or to the competition! If you have any questions or comments about the Social Media in Education track, do let me know.