LiveBlog: Closing Keynote

Peter Burnhill, Director of EDINA, is introducing our closing keynote, something of a Repository Fringe frequent flyer. But he is also announcing that this year is the 30th birthday of the University of Edinburgh Data Library. There was a need for social scientists to store data and work with it. That has come a long way since. And we now face questions like curation, access, etc. Back to my first duty here… I had an email from Robin Rice in 2011 saying “we like FigShare” and wrote to the organising list “FigShare: could be the new data sharing killer app!”, a bit of an understatement there. So, let’s find out what’s happened in the last two days. So, over to Mark!

Mark Hahnel – FigShare

So I am doing this PK-style as it’s Friday afternoon and we have people on stilts going past! Here we have people from institutions, from libraries. I’m not. We have different ideas so I want your ideas and feedback!

So I’m going to talk about open and closed… We’ll see where we get.

So FigShare lets you upload your research. You can manage your research in the cloud. This has evolved since 2011. We can’t ignore why not all data can be open… So we have a private side now. Our core goal is still being Discoverable, Sharable (social media), Citable (DOI). Discoverable is tricky!

We are hosted on Amazon Web Services, we are an ORCID launch partner (the only one with non-article data I think), we are on COPE (the Committee on Publication Ethics), we are getting DOIs from DataCite AND we are backed up in LOCKSS.

We wanted dissemination of content on the internet – it’s a solved issue. Instead of going backwards… Let’s see how we go forward by copying this stuff. What services like Flickr, SoundCloud etc. have in common is that they visualise content in the browser – you don’t have to download to use.

So live demo number 1. So we have a poster here. Content on the left. Author there. Simple metadata, DOI, and social media shares. We’ve just added embedding – upload content to FigShare and use on your own site. So datasets are custom built in the browser – you want to see your 2GB file before you download. You shouldn’t even be downloading, it should all be on the web and will be. And we have author profiles. With stats including sharing stats. That is motivating. That rewards sharing. Think about who is involved in research. We try to do the other side of incentives action here too! Metrics are good. So is doing something cool with it. So for instance here is a blogpost with a CSV and a graph. So we have a PNG of the data… You can’t interact. But the CSV lets you create new interactive charts. And we also added in ways to filter data.

We are also looking at incentivising people to give back – doing research like an instant t-test. Moving towards the idea of interactive research. This is something that allows you to make research more interactive.

Q – Pat McSweeney) is this live or forthcoming?

It’s live but manually done. A use case for groups that use FigShare the most, that need special interaction for journals.

We are a commercial company but you can upload data for free. We work with publishers. We visualise content really well. So this is additional materials for PLoS, these are all just here on FigShare – there’s a video? Play it! It’s how the internet works! Don’t download! We do this for publishers. Another thing we created for a publisher is that when you click open a graph, you get a dataset. A researcher asked for it, we built it!

So, back off the internet…

So discoverable. What does that mean? Google finds us but… Well is it hearsay? So DataCite started tracking our DOIs. For three months we were 8 out of the top ten, then 7 out of the top ten, then 9 out of the top ten for traffic. So hey, we are discoverable!

But the future of repositories… Who cares?

So who takes ownership of this problem now – funders, stakeholders, or academics? I think it’s institutions and more specifically librarians. Librarians are badass. They have taken ownership. They lead change, they try new things.

But the funders? Funders are really reacting to the fact that they want their data – it may be about what researchers want to reuse but really it’s about the impact of their spending. But they are owning that problem. NSF requires sharing with other researchers, similarly for the humanities. The EU are also talking about this – but not owning the problem, just declaring it really.

So looking across funders… Some have policies… Some stipulations… Wellcome Trust withhold 10% of cash if you do not share data. That will make a difference. But what do you do with that data?

What about academics? Well they share data! I generated 9GB a year – probably in middle of the curve in terms of scale – in my PhD. So globally 3PB/year ish. But how much of my PhD is available? A few KB of data. My PhD is under embargo until later in the year, but it will be there.
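A quick back-of-envelope check on those two figures (this is a sketch added here, not something shown in the talk): if every researcher generated the same 9GB a year, how many researchers would it take to reach 3PB a year? The function name `implied_researchers` is made up for illustration, and decimal units (1GB = 10^9 bytes, 1PB = 10^15 bytes) are an assumption, since the talk did not say which units were meant.

```python
GB = 10**9   # decimal gigabyte, an assumption; the talk does not say which unit
PB = 10**15  # decimal petabyte, same assumption


def implied_researchers(per_person_gb: float, global_pb: float) -> float:
    """Head count implied if everyone produced per_person_gb a year."""
    return (global_pb * PB) / (per_person_gb * GB)


if __name__ == "__main__":
    # 9 GB/year and 3 PB/year are the figures quoted in the talk.
    print(f"{implied_researchers(9, 3):,.0f} researchers")
```

That works out to roughly a third of a million people producing data at that rate, so the global figure reads as a rough order of magnitude rather than a measured total.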

I felt there were moral and ethical obligations. Sharing detailed research data is associated with increased citation. Simplicity matters, visualisation is cool. I thought it was about an ego trip, academics have to disambiguate themselves…

Now two years after leaving I was asked to come back in and print Excel files for my data for a publication… I generated this without a research data plan. Two years after I left my boss thinks I still work for her. She will hand them to the next guy working for her… What does he do, copy them back in?

So there is so much more here. It is not just open or closed, it is about control. It’s the Cory Doctorow thing, the further you are from a problem, the more data you’ll give up, the Facebook issue. You do want control, it matters.

So what motivates academics? Being easy, being useful, and what funders want – we will jump through hoops for them.

So back to the web… My profile has new different stuff but you’ll see sharing folders – group projects and discussions, ways to reshare that data. Nudge your sharing. But you need the file uploaded now to share two years later. You can share otherwise closed things with colleagues, regardless of institution.

Btw on this slide we have our designer’s idea of an institutional library – looks a lot like a prison.

So back to those libraries. How much data does an institution generate? Very few know this. How do you assess it? Right now we are doing stuff for PLoS: we let them browse all their stuff. They can see what they produced. And this aggregation is great for SEO too. It makes it easy to Google, then find the research article from there. So from this aggregation we can filter by most viewed, or down to particular titles. Essentially this is a repository of research outputs, we take all formats. You can imagine that this could be there for any institution. And this has an API.

Institutions also want stats. See where traffic is from. Not just location but institutional IP ranges. So we can show where that item has impact, where viewers come from. But, at the same time populating repositories is hard. But we have data from Nature from PLoS. We can hand that data back to your repositories. We can find the association with the institution.

So it’s about control. It’s Research Data Management as well as Research Output Dissemination all in one.

So we have launched FigShare for institutions. We have heard concerns about metadata standards and how much metadata we have, so Henry Winlaker used our API to build a way to add more metadata to fit institutional needs. So if you share responsibility… Well what’s the point of the institutional repository? I would say that I think IRs are about to move fast. They have to, it was idealistic but now it’s mandated! Next year repositories will look very different. RDM plans say they have to. Funders say they have to.

This community is amazing! ResourceSync is great, I want to use it! PMR’s Dev challenge idea is great. We are commercial but we can work together!

Do we need to go back further? People use Dropbox, drag files in. We have a desktop app too. But maybe whenever you save a file you need to upload it then. So at projects.ac there is a project. A filesystem that nudges you to add metadata and do things as you are rewarded to do them. You can star things, it does version control. Digital Science created this. It’s kind of like it can do so much more. So we are releasing it to see what’s needed. What’s really cool… You can download this now… If you press save now it saves it to FigShare. That sync would be ideal. Trying it out now. I work in the same office but there is no reason why these shouldn’t all be connected up to IRs, to FigShare, to all of these things…

And this is a slide specially for Peter Murray-Rust…

I know that openness is brilliant! But it’s also great to work with publishers. More files were made available for free, for academics, that’s great. Everything publicly available will ONLY be by CC0 and CC-BY. SHARE ALL THE DATA.

Q&A

Q1 – Paul) what is the business model?

A1) For PLoS it’s about visualisations and data. They pay us to do that. They have a business model for that. And FigShare for Institutions is coming, that’s also part of the model.

Q2 – Peter MR) I trust you completely but I do not trust Elsevier or Google… etc. So you have to build organisational DNA to prevent you becoming evil. If you left or died what would happen to FigShare, you see the point?

A2) I see that. But this is aimed at… this costs us money. We sell to institutions but there are economies of scale. Two institutions have built their own data repositories and they cost £1 million and £2 million. That’s a lot of money.

Q2) Mendeley have a copy of all the published scientific data these days. FigShare will have a massive value of data in it, huge worth. Institutions may want to know what staff are doing, to spy on them. You have something of vast power, vast potential value. The time is now to create a governance structure to address that.

Peter Burnhill) there are some fundamental trust issues

Mark Hahnel) you can trust the internet to an extent. Make stuff available and it proliferates but you can reuse, you can sell it on etc.

Peter Burnhill) next year we need a discussion of ethics

Q3 – Kevin Ashley) FigShare for institutions. Can you say anything about the background consultation around that? A contract is very different to free stuff

A3) Sure, legally we have a lot of responsibility. We’ve been working with universities, individual ones, to see what the needs are. We spoke to lots of people. Mainly in London, but to see we didn’t tread on toes, we didn’t risk their research leaking out. We spoke to institutions more globally. Digital Science is a good thing, this is where they come in.

Peter Burnhill) I am a member of the CLOCKSS board. There is a contract between all publishers that CLOCKSS ingests everything they make available, and it says that if a failure to deliver happens – for whatever reason – then CLOCKSS have the right to make that data available via platforms (one here at EDINA, one at Stanford). So in terms of assurance that what comes in goes out, joining CLOCKSS does that. The agreement is supra government. You give up that right there so that it will remain available.

Mark: absolutely. And all data is available via the API if you want to.

Final Wrap Up – Kevin Ashley

Thank you to Mark for a great final session. So, at an event like this we come here to share ideas, we come to share experience, we look for answers, we come to meet people and to make new connections. We come to learn. We may come with one or many objectives. We at the DCC certainly have been able to. Many of you are new here.

I have learnt lots of stuff. A few things stuck. A whole room of experts can’t put an object into an EPrints repository, there’s a lesson there somewhere about interfaces. And the other interesting idea I picked up from Les Carr. Maintaining open access and having a business plan for what we do. So the DCC how to set up RDM licenses are free but limited edition leather bound copies to come – great idea Les!

I hope all of you did one or several of those things. Then share, tell us, this is an unconference! We want to keep making this event better every year. We see the event as being about you, about facilitating you to meet and connect.

There will be a Repository Fringe next year. One reason for that is that we have fantastic sponsors. All of whom put into this event. And hopefully we can extend that further next year. But thank you also to session chairs, the speakers, and to the organising committee here. I know how much work goes into this. And a great deal happens and happens smoothly because of that work.

Two people to thank specifically. Florance Kennedy of the DCC and our chair Nicola Osborne!
