Repository Fringe 2015 – Day One LiveBlog

Welcome to Repository Fringe 2015! We are live for two packed days of all things repository related. We will be sharing talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Welcome to Edinburgh – Jeremy Upton, Director, Library and University Collections, University of Edinburgh

It gives me great pleasure to welcome you to the University of Edinburgh for this great event, organised jointly by staff from the Digital Curation Centre, EDINA, and the University of Edinburgh.

If you have come from outside Edinburgh then it really is a beautiful city and I encourage you to explore it if you have time. And of course it’s the Edinburgh festival and I’m sure you’ve already had a sense of that coming in today. It is an event with a huge impact on the city, and on the University – we get involved in hosting and running events and I have to give a plug for our current exhibition, Towards Dolly, featuring Dolly the Sheep at the University of Edinburgh library.

So, as the new Library Director I really am pleased that Repository Fringe is running again here. In my time thus far two issues have been a major priority: Open Access and Open Data, and I'm pleased to see both reflected in your programme. Amongst academics these issues can trigger a fair degree of panic and concern, but there is a really positive opportunity here – the academic community is looking to our community to provide creative solutions.

I’m also delighted to be here because throughout my career I have been a fan of collaboration and shared working, and the areas of open access and open data are areas particularly ripe for collaborative and shared working, to share knowledge and share some of our pain, as we all meet these shared requirements.

We also find ourselves in an increasingly uncertain world, and that makes the role of innovation and ideas so important – which is why events like this matter, giving us space to…

Edinburgh was an early adopter of OA, starting in 2003. We have strong support for open access, with champions within departments. We received over £1.1M of RCUK funding for Gold OA this year. Our staff look at the landscape not only from our own institutional perspective, but also across the much wider sector. We work in collaboration with colleagues in EDINA and with the DCC, talking regularly and sharing knowledge and expertise.

We are one of the partners in the new Alan Turing Institute, working with large data sets, including large open data sets, and looking at the opportunities for new and innovative research in areas such as healthcare. I heard our Vice Chancellor talking about the use of data in, for instance, treating diabetes in new ways.

Now, finally, I have a few practical items to mention. You all have stickers for voting on your favourite posters. Also Repository Fringe is very deliberately a fringe event – we want this event to have a looser structure than traditional conferences. The organisers want me to emphasise informality, please do dip in and out of sessions, move between them, your presenters will expect that so move as you wish. And if you want to create your own break out sessions there are rooms available – just ask at the Fringe registration desk. And the more that you put into this event, the more you will get out of it.

We would like to thank our sponsors this year: Arkivum, EPrints Repository Services, and the University of London Computer Centre.

So, please do enjoy the next few days and take the opportunity to see some of Edinburgh. And hopefully you will have a fruitful event finding new solutions to the challenges we all face.

Now it is my pleasure to introduce your opening keynote speaker. David Prosser came from the “dark side” of medical publishing, moving on to undertake his doctorate and then to his work with Research Libraries UK…
Fulfilling their potential: is it time for institutional repositories to take centre stage? – David Prosser, Executive Director, RLUK

As someone who has been involved in Open Access for the last 12 years I want to look back a bit at our successes and failures, and use that to set the scene with where we might go forward.

I wanted to start by asking “What are repositories for?”. When we first set up repositories they were very much about the distribution of research, for those beyond the institution, and for those without the funding to access all of the journals being published. There was, and continues to be, debate about the high profits made by commercial publishers… There was a move towards non-commercial publishers, and something of a drive to remove “dirty profits” from the world of scholarly communications, which helped push open access forward.

We have also seen a move from simple journals and books towards something much richer, which repositories enable. We were going to revolutionise scholarly communications, but we haven't done that. We have failed to engage researchers adequately, and I think that the busyness of academics is an insufficient reason to explain that. So why have we failed to engage academics, to get them to see this as something they should do on a daily basis? I remember Les Carr, years back, talking about the School's timetabling… He was saying how hard that was to do, but that he knew he had to do it because it was part of what he needed to do to achieve what he wanted. We have become fixated on making things easy in open access, even though people will take the time to do things they feel it is important to do.

There has also been confusion about standards and about publication status – what authors can share as pre- or post-prints, and what they are allowed to do – which took a long time to resolve. It didn't help that publishers were confused too. There is an interesting pub conversation to be had about whether the confusion is a deliberate tactic from publishers… In charitable moments you can see statements coming out that suggest genuine confusion, that publishers don't understand the issues.

But there are places where those issues have been overcome. arXiv is so well established in high energy physics that no publisher restricts authors from depositing there. Another subject-based repository, PubMed Central, has also seen success, but that is in part because of requirements on authors and publishers, and because that space has worked closely with publishers. We have also seen FigShare and Mendeley achieve great success – what is it about them that is attractive, that we can learn from and borrow for repositories?

Over the last 12 years there has been a real tension between pragmatism and idealism. When repositories first emerged we were happy to take in content without checking the quality so carefully – for instance suitable and clear rights information. As a user the rights information is not always there, or not always clear. Not all the metadata we have, especially for older material, is necessarily fit for purpose, for our needs. We have, to some extent, a read-only corpus because of that. But is that enough? Or are there more interesting things we would want to do? It is difficult to look back and see those pragmatic decisions as the wrong ones: we wanted to demonstrate the value, to show authors the potential for dissemination of their research… It is hard to say that was wrong, but going forward we really need to make a conscious decision about what it is that we want.

So, on open access as a force for revolutionising scholarly communication… You can see scholarly communication as being made up of three functions: registration, archiving and dissemination. All of these can be fulfilled by repositories, but we still seem to be using journals for all of them; we haven't moved to using repositories for those functions first. Early on there was an idea that you would deposit work locally, then get it accepted, kitemarked, etc. by a journal or process after that. We can see that the journal has retained its position. Libraries in the SCOAP3 project, which looked at journals whose content was entirely available in arXiv, spent 10 years persuading and working with publishers to get those journals to be open access, so that post-prints were as open access as all previous and parallel versions of the same paper. But that was about protecting journals. Libraries seem to be so keen on journals that they are desperate to protect them, sometimes in the face of huge opposition from publishers!

So we have a very conservative system. You have to see journals not just as a form of scholarly communication; they are about reward mechanisms. If you are rewarded for being in one of those high energy physics journals, it does make sense that you should be so invested in supporting its existence. The current reward structures are the issue, but what is the solution there? One of the government's key advisers, Dr Mike Walker, raised this issue without suggesting solutions. And in research institutions and libraries we are so far away, in terms of our sphere of influence, from those reward mechanisms that all we can do is nudge and inform…

We can, however, see open access advocacy as a success. In the last government we saw some openness to talking about open access… We can debate whether the impact of that has been totally helpful, but there has been impact. Something like 80 institutions now have open access policies – they vary in effectiveness, but even having those in place is remarkable. And where they work well they make a real difference – Wellcome Trust, the University of Liège (?), RLUK – and the HEFCE policy is really the game changer.

It has been interesting, over the last few weeks, to see a change in the HEFCE policy. It is interesting to see how ready institutions are for it – there are many that are not ready yet, and that could mean pressure to change the policy, but we see HEFCE stating that policy mistakes will be treated leniently, which is helpful. Authors usually know if their paper has been accepted, but it can be harder to know when it has been published, which is an important trigger. And it seems that the stick of the HEFCE policy is very strong. Universities don't trust academics and researchers to deposit regularly, and they recognise the risks that that brings in terms of the REF and their funding in the future. This is why a lot of Russell Group universities in particular have lobbied for acceptance rather than publication date…

It says a lot about scholarly communications that authors and institutions do not always know when a paper has been accepted or published. The idea of the notification of acceptance being a private transaction between the author and the publisher, that raises some concerns for research libraries.

Now, I wanted to make a small diversion here to talk a bit about RCUK. The comprehensive spending review is coming up in UK Government, with sabre-rattling about 40% cuts in research budgets. I think funders, RCUK in particular, will look at what they are spending and ask if they are getting value for money. And I think researchers will also question, if their budgets are cut, why RCUK are paying so much money to Reed Elsevier. So there will be pressure to stop paying for open access. There is a transition period in which longer embargoes are allowed for open access – this has led to grotesque growth in publishers' decision trees! It could be that the end of that transition period, and a cut in funding for Gold OA, may put the focus back on repositories. That is an important scenario that we should be thinking seriously about. And on the issue of embargoes, I need to say that there is still no harm in shorter embargoes. Any embargo is a concession to the publisher. It's a concession that potentially slows down the communication and sharing of research.

There is also an important embargo issue where publishers fail to respond to enquiries about Gold OA, such that those crucial first few weeks of interest may be lost. Now I think that's another incompetence issue rather than something more sinister.

We are also failing to engage authors in the deposit process. We are making APC payments easier – we just ask authors to tell us where they are publishing and we pay for them. I am concerned about separating the author from the process in general, but particularly from APCs. The author doesn't know or care about the costs involved. If they do engage with that, if they do look, then they need to make a choice about whether the price charged is worth it for the relevance, impact or importance of that journal. Separating the author from the process puts us in danger of creating an APC crisis in the same way that we had a serials crisis.

Traditionally, universities have shown a shocking indifference to their scholarly output – the research papers, publications, etc. It was very hard to understand what was published, what was created. There was very little responsibility on scholars to capture their own published outputs – an assumption that the library would purchase them, but that assumption was not always correct. Some of that is being addressed by the REF, but also by universities becoming much more aware of their intellectual output. Capturing and reflecting on that output is no longer seen as weird or alien, and that is good for our work, for our arguments about the value of open access, of repositories, etc. But universities do also care about cost-benefit analysis for this work. And for data in particular there can be really high costs associated with making data available for reuse. We need better stories to explain how the benefits outweigh the costs.

We have had issues over the last 15 years around the vision of open access that we originally had… In the UK we could talk about…

Danny Kingsley, Cambridge University, talked at LIBER about the idea that, in a sense, the compliance-engine aspects of repositories can devalue their potential, what they could be for open access in the academic community. But if open access is “just” a side effect of repositories, it is an amazing side effect! Making work available under open access is a real achievement; even if the route is rather tortuous, and has involved pain in negotiating the confusion and issues with publishers, we have made a real difference. And there is nothing wrong with being flexible over open access, or with jumping onto bandwagons. Compliance is a useful bandwagon right now, so we should use it! We should stop worrying about whether people do the right thing for the wrong reason, and just be glad that the right things are taking place.

But over the next few days we should be thinking about how we can use what is in our repositories: how clear do our rights statements make the licensing, and how easily can we look across a topic spanning multiple repositories? And, in terms of preservation, how concerned are we, and should we be, about that? Are repositories more about dissemination? If we are going to get an explosion of material over the next few years, do we have the capacity to handle and interpret that material?

So we have had a messy, tortuous route here, but open access is really happening, and we have several days to develop our vision for what we should be doing with this. David Willetts has talked a lot about open access – I think he rather overplayed his hand based on what is happening in the US – but there is so much more that we could do with open access.

Q&A

Q1 – Grant, University of Leicester) How many repositories should there be? There tends to be one per university. There are some joint ones between institutions…

A1) If you started with a blank slate today, would you set up 100-120 institutional repositories? I tend to think no, you wouldn't… You would want something more centralised. There are a variety of institutions with very varied expertise: Edinburgh is very skilled and engaged and would want its own repository, but there are many institutions that are really concerned about what they can set up to meet HEFCE requirements, and there is an opportunity there for someone to bring them together so that they can all meet those requirements in a centralised way. I think there should be more centralisation…

Comment – Paul Walk) There is a shadow issue there about not the number of repositories, but who controls it. A collaborative set up where control is retained seems the important thing…

A1) I think White Rose seems like a great example – a shared repository but it looks like each institution has their own space in terms of how that is presented on the web.

One of the big ideas in fashion is the library as publisher, of each institution publishing. I think the lesson is that infrastructure for university presses should be shared, and content is where each institution should focus. The idea of all institutions running their own publishing platforms, with different set-ups, appearing to be but not quite interoperable, doesn't seem like the way to go.

Q2 – Kevin Ashley, DCC) I remember Andrew Prestwick talking about institutional repositories in Wales where he commented that for smaller institutions the issue of control, of their own system, was really important to them.

A2) We live in a strange world where authors are hugely keen to give away all of their Intellectual Property to commercial publishers but can be odd about making it open access.

Comment – Rachel Bruce, Jisc) I remember the conversations Kevin talked about, and we set up a shared repository, the Depot, but that was not a success. The institutional repository structure seemed more effective at that time.

A2) I think that may have been an issue of that being too early. The Depot has been more a repository for lost souls, for authors without institutions… But there wasn’t really an attempt to engage institutions.

Comment – RB again) It was a repository of last resort… And we would engage differently around that if we were doing that now.

Q3 – Les Carr, Southampton) In terms of where things should be put, should there be departmental repositories? As someone with a national view, looking over a national research ecology, how would you reshape the research landscape 15 years ago? We seem to have gotten stuck in commerciality, compliance, quality of journals, quality of research, and not questioning the system. How would you have shaken up the system in 2000 to change that?

A3) It is really hard. Many of the decisions of the last 15 years were made with good intent. The whole of scholarly communications is about the reward structure. It makes people write papers that are not really intended for communicating results, but for getting rewards. You see Peter Murray-Rust talking about this a lot… You have a huge range of data and outputs that you have to reduce to five pages of write-up, with results that are not easy to reuse. We do that because of pay and reward… Here we have the bizarre situation that HEFCE says that Impact Factor and where you publish aren't the issue in the REF, but everything academics and researchers believe tells them that stuff matters. So much of what we are doing is hampered by the idea that journals are how we decide funding, how people develop their careers. But if Dr Mike Walker can't say what the alternative would be, I don't think I can.

Repositories for Open Access, Research Data Management and beyond – Rory McNicholl, Timothy Miles-Board, University of London Computer Centre

I am going to start with a short potted history of the University of London Computer Centre… In 1966 the Flowers Report assessed the probable computer needs, over the next five years, of users in universities and civil research. The great and the good of the University of London met to discuss this, and in 1968 they commissioned a glamorous building for computing. By the 1980s we had a new machine with a fantastic amount of computing power, which could be used by researchers around the region.

After the 1980s there was deemed to be less need for a single computer centre in quite the same way, but there was still a real need for computing for the HE and public sectors. So, what are we doing at Repository Fringe? Well, back in 1997 Kevin Ashley and colleagues recognised the need to preserve at-risk digital objects, and work was undertaken to address that through a project, NDAD, that ran to 2009. Following that we have been working on new activity, from 2006, including a Digital Preservation Training Programme, and what we are now calling the Research Technologies Service.

The Research Technologies Service provides various things, including open access repositories; research data repositories; eJournals – in which there has been growing interest; archival storage; and bespoke asset presentation – a way to have a front end customised for specific organisations.

To achieve this we are using EPrints, alongside OJS for our eJournals, and Arkivum (A-Stor in the ULCC data centre), as well as Python, Django and elasticsearch. And we do that for various institutions, which means we need to be interoperable with third-party systems. So we are interoperable with institutional HR systems, harvesters, etc., and with Crossref, FundRef, CERIF, IRUS-UK, Altmetric, the BL, OpenAIRE, ORCID, DataCite and SHERPA. But there are so many more – too many to detail in full.

How do we do what we do? We are flexible – a small team, very well supported with infrastructure expertise and a service desk. We are community driven, part of the HE community and responsive to that community. We are also fluid, platform agnostic, and ready to listen to our customers and embrace change.

That brings us on to the community platform, how we realise those things. Those funder mandates (HEFCE and SFC) tend to drive what we do… That's what keeps the community up at night, thinking about what they can achieve and how. We engage in a way that takes best advantage of shared code and initiatives around open source software. So developers write code and share it on GitHub, and the people we host can then access that shared expertise and development via EPrints and the EPrints Bazaar – bypassing commercial coders and quickly ensuring they are able to address RDM, open access, etc.

And we have a community platform for Open Access – oa_compliance, OpenAccess; rioxx2 – we've made that something we can put into a repository so it describes what needs to be described; datesdatesdates – a way to understand which dates count; reviewed_queue – to manage the process and workflow around the publication process; ref2014; and… more? The Open Access Button is of interest… EPrints has had a “request copy” button for years and years… Maybe we need a “request open access copy” button added?

Over the last couple of years there has been a huge push towards using repositories for research data, and for RDM. We have been working with the University of Essex and the University of Southampton to look at the ReCollect profile – keeping research data and describing it effectively – and we have put that into the community platform. We then undertook work with the University of East London and the London School of Hygiene and Tropical Medicine; we have worked on the DataciteDOI plugin, developing it further with the University of Southampton and DataCite; and Arkivum has been a big part of the OA mandate work… It became clear that there was a need for infrastructure, and working with Arkivum has helped us top up access to the archive network. Another thing came out of the EdShare world, UEL, and LSHTM: describing project data sets and collections. And more? We are working with Jisc, the University for the Creative Arts, and CREST on the next phase of the Research Data Spring project, to improve the way that data moves from the researcher to the repository – making that quicker and more efficient – and the presentation of that data.

And beyond… Lots of other things have happened… RepoLink – for linking research papers together; UEL have adopted this for linking research objects; pdf_publicationslist; soundcloud; iiif-manifest – coming out of work on presenting digitised objects, which is feeding back into how we do presentation; bootstrap – our colleague at UEL did some fantastic work around Bootstrap to make repositories work well on mobile, and that's available to use and explore now; crosswalks_sgul – this has been around Symplectic tools; we work with St George's a lot on this and they have been happy to publish it back to the community.

So, all of the work we’ve done can be found on the community platform, ePrints bazaar (bazaar.eprints.org), but the source code around that isn’t always obvious so you can also find this work on GitHub (github.com/eprintsug).

So, what's next? Well, the Public Knowledge Project and the idea of university presses seem timely – there are opportunities for more community platforms. There are exciting things coming from our siblings at the School of Advanced Study and Senate House Library, who have made an interesting appointment in the area of digital, so exciting things should come out of that… We are also looking at Preservation as a Service – working with Arkivum and Artefactual, or maybe something more simple. And we are creeping backwards through the research object lifecycle… And of course more collaborations – so we have ORCID in place, but can we help institutions get more impact from it, for instance?

Lastly… We have a job ad out – we're hiring – so come join the team! Contact me: rory.mcnicholl@ulcc.ac.uk. Thank you!

Q&A

Q1 – Rachel Bruce, Jisc) Who is using OJS?

A1) We have several universities using OJS. We've worked on a plugin for integrating it with repositories, and on a system for another university to encourage university staff and students to set up their own journals. We have three universities using OJS in those ways so far, but there is lots of interest in this area at the moment.

Q2 – Dominic Tate, UoE) Is there one area of service you are particularly looking to support?

A2) I think CREST has been an interesting example… providing archiving for organisations that can't justify doing that separately. We do tend to focus on the technology…

Poster Session – why should we look at your poster? – Martin Donnelly orchestrating the minute madness!

Sebastian Palucha: I am talking about Hydra; I can tell you about Hydra… As you may know, we moved from EPrints to Hydra, and we can also talk about how we integrate DataCite.

Gareth Knight from London School of Hygiene: We developed a plugin to add geospatial data to items in ePrints. Come and ask us about it!

Alan Hyndman from FigShare: My poster is on how FigShare can interoperate with institutional repositories, and also some of the other interoperabilities we are already doing…

Robin Burgess, from GSA: Apologies, no guitar this year! This is on exploring research data management in the digital arts and creative communities. And this is my last chance to present here – I'm moving on to Sydney as their Repository and Digitisation Manager, so I wanted to go out with a bang!

Adam Carter, from the EPCC: I'm here for the Pericles project, an EU FP7 project on digital preservation. We are not building a digital repository; we are about the various different aspects of managing change around a digital repository. We argue that getting data in is easy – how do you deal with technological change in terms of accessing and using data, and with change in who uses your data and how? So it ties into repositories in many ways. The poster includes some work we are doing modelling the preservation ecosystem, and also sheer curation – preservation from when the data object is created, not when you deposit it.

Rory Macneil, from RSpace: on integrating electronic lab notebooks with RDM and linking in DataStore at University of Edinburgh, we’ll be doing a demo at lunchtime on this too! RSpace supports export of documents, folders and associated metadata in XML files, and that work leads to an integrated RDM workflow for researchers and the institution, so that the data is collected, structured, and archived and shared. That’s possible by working with researchers, RDM professionals and IT managers.

Pablo de Castro, from LIBER: my poster is on the EU FP7 Post-Grant Open Access Pilot. This is an experiment which OpenAIRE has been managing in order to implement fair Gold OA. One specific constraint for this project is that publication in hybrid journals will not be funded. And we are working on what we call the APC alternative funding project. This is a €4M pilot that began in May, with significant help from the University of Glasgow. Given those constraints we are keen to engage institutions to make this a success – this idea for an alternative way to implement Gold OA. We have some idea of the main places requests are coming from, etc., but it should grow quite a lot in the forthcoming months.

Martin: In the spirit of the Fringe please do make use of the blank poster boards! Add your own literature, arrows, etc!

Hardy Schwamm: DMA Online is an online dashboard, funded under Jisc Research Data Spring, which provides a view of how many data sets are funded and created in your institution, how many have an RDM plan, and how much data they plan to use. It takes data from various places, hopefully including DMPonline, and any information that is held in spreadsheets. You can see our poster and our demo. Do come and tell us what you would like to see from the dashboard…

Dominic Tate: I’ve been asked by my colleague Pauline Ward, there are some noticeboards up in the forum for comments on tomorrow’s workshop – do come and ask me if you have any questions about that.

Lunch – which includes a demo: RSpace – Rory Macneil, Research Space

And we are back…

Open Access Workshop – Valerie McCutcheon, University of Glasgow

We are using the EPrints repository at Glasgow in this session, but this is just one example of an open access set-up – you may have your own set-up or perspective. We'll talk for about 40 minutes, then you can choose what you want to talk about in more detail.

So, we are going to have a live demo of the journey an open access article goes through when it goes into our repository, Enlighten.

So, we would select the type of item, we upload that file, and then we add details about that paper – the title, abstract, etc. [to fill these in Valerie is taking audience suggestions – not all of the journal titles being suggested sound quite authentic!]. Then our next screen adds the source of funding for the publication. That’s so far, so traditional… But wouldn’t it be nice to get some of that data from the journals?

So, without further ado, I'm handing over to Steve Byford from Jisc to talk about the Jisc Publications Router.

Wouldn't it be lovely if publications data could automatically go into your institutional repository in a timely and REF-compliant sort of way? Now, to manage expectations a bit, this won't fix all the possible problems, but the Router will prompt at two key stages of the publications process. The Router gathers details of research articles from publishers etc. It then directs articles to appropriate institutions, alerting them to the outputs and helping capture of the content into a repository or CRIS.

This has been funded as a project based at EDINA. That project reached its conclusion on Friday. The aim of the project was to demonstrate a viable prototype, which it did: it processed real publications information and that worked well. Now that the project has finished, a successor system is being developed, with existing participants migrating between August and September 2015. Then we will be recruiting new participants, aiming for rapid expansion of the content captured, with the intention of moving to a full service by August 2016. So, if you want to hear more about that, then choose that for your breakout session following this one.

Back to Valerie

Now, once I’ve uploaded my article, and the information, I might want to look at the access status of that article. And on that note I’d like to introduce Bill Hubbard to give you an update on SHERPA services.

I’m here with my colleague who actually manages the SHERPA services… If you select us for a breakout session we’ll be doing a double act! So, we have five minutes to tell you what’s new… Hopefully you have already heard of us, and use the site. If you do then I hope you find us useful. We support open access processes around publishers rights, open access statuses, etc. We are about making your job easier, and so part of what I want to find out from you today is how we can do that, what we can do to help make your life easier.

When we started RoMEO over 10 years ago the world was a lot simpler; the policies and rights picture has only become more complex. JULIET is a registry of funder policies on Open Access, and that is a more straightforward proposition. OpenDOAR is the world's authoritative and quality-assured directory of open access repositories. We also run FACT, and SHERPA REF – advice to UK authors on compliance with HEFCE's OA policy – which will launch soon!

In RoMEO we have rights data on over 19,000 journals, in JULIET we have over 155 funders, and in OpenDOAR we have around 2,937 repository listings.

Futures… I'm asking you not to tweet pictures of this – it is work in progress… We have a new interface and improved functionality coming in OpenDOAR and FACT; for RoMEO we are working on improved user feedback, improved international collaboration, and maybe even improved policies – we are working with publishers on the quality of expression. And REF is a new service, of course. And what else? Well, we are moving towards an improved range of shared services with Jisc… Come to our session to find out more…

And now back to Valerie

So we are going to look at some of these services just now… I will look up our article on SHERPA/RoMEO… We have integrated more open access information in our repository – we have a whole screen for this now. On this screen we see the estimated cost – let's assume we've gone for the Gold option – and I can later update with actual costs to reflect any changes in currency/price etc. We can select the status of the paper – including Green, Gold, but also “No OA Option”, Pending, etc. And we can add the article reference, date of compliant deposit, funder acknowledgement, etc. Then we have an RCUK screen for completion… And finally a deposit screen.

And now over to Balviar Notay to talk about how Jisc are working on RCUK Compliance.

I will be talking about the RIOXX metadata application profile and guidelines for research papers, developed with RCUK and HEFCE. It was developed by Paul Walk (EDINA) and Sheridan Brown (Key Perspectives). You probably collect this data already, but this is about standardising it. RIOXX doesn't cover all REF requirements, but it will cover many of the key areas. It has been a long time getting to this point, but we are now at a place where, if we do this, we can really track research papers across systems in a coherent, consistent way.

RCUK will be releasing some communication in the coming weeks, strongly recommending that all institutional repositories at research organisations in receipt of RCUK funding use RIOXX. We have developed plugins and patches to support implementation, including a RIOXX plugin for EPrints; DSpace and CRIS user groups have also started to engage with RIOXX, but we need more engagement here.

Now, onto the REF plugin. This has been developed with HEFCE, and will build on the original plugin developed for the 2014 REF. Institutions wishing to use the REF plugin must also install the RIOXX plugin. We are looking for expressions of interest to trial the EPrints plugin, and we are also looking at developing a DSpace plugin. The development team are Tim Miles-Board and Sheridan Brown (Key Perspectives).

Back to Valerie

The next screen after RCUK, moves us to a page where you can capture OpenAIRE compliance…

Balviar again: CORE is a system to aggregate open access content, providing access to content through a set of services. There are about 74 million items from around 666 repositories, 10k journals, and 60 countries. In terms of services there is a search service, an API, and a data dump, and you can access and data-mine that content. They are also developing a dashboard for institutions to track their data in CORE – how the content has been harvested, etc. – and looking at how funder compliance can be supported through that dashboard as well. We are looking at that work through a project called Jisc Monitor…

Back to Valerie

We now have a choice of breakout sessions… We have a sheet…

Comment: There has been a lot of discussion on the UKCoRR list about peer-to-peer repositories, so that may be a useful session to add to the list.

Paul Walk: This has been about repositories alerting each other about non-corresponding authors. So discussion around peer to peer repositories. There is a Google Doc on this that can be accessed, I’ll add that under a working title of COAX.

We are now hearing a quick recap of the breakout groups looking at some themes in more detail:

  • SHERPA Services, Azhar Hussain, Jisc
  • Open Access Metadata, Valerie McCutcheon, William Nixon, University of Glasgow
  • Publications Router, Steve Byford, Jisc
  • Profiles for Reporting to Funders (e.g. RCUK, REF, EU), Balviar Notay, Jisc
  • Aggregation Services, Balviar Notay, Jisc, and Lucas Anastasiou, The Open University
  • What advice do we want about open access? – Helen Blanchett, Jisc Customer Services
  • COAX / Co-operative Open Access eXchange (peer-to-peer repository information sharing) – Paul Walk, EDINA

We’ll try and take the blog to some good sessions… 

In fact we have loitered in the main room where Valerie and her colleague William Nixon are talking about the drivers for the various add ons and customisations they have made to EPrints. They were doing a lot of this work via spreadsheets and this system represents a major saving of time and improves accuracy.

Valerie and William, in response to questions, are also going through some of the features of their item records in more detail – for instance there is the potential to add multiple transactions for one article, as sometimes applies. Valerie: we would love a better solution for capturing some of the finance information on open access, and we welcome your comments and feedback – or if you have a solution to the issue…

We are now moving on to roam the other Repository Fringe breakouts taking pictures and tweeting. Best place to catch summaries/highlights from the breakouts is over on the hashtag #rfringe15. And if you are in or leading a session that you’d like to write up, just leave a comment here and we’d welcome a follow up blog post from you!

Tea and coffee, with demos:

  • DMPonline – Jonathan Rans, Digital Curation Centre
  • DMAOnline – Hardy Schwamm, Lancaster University

The blog will be in the EPrints session in the main room at Repository Fringe 2015 but there are three sessions taking place for the next half hour, keep an eye on Twitter for more on all of these:

Parallel sessions:

  • EPrints update, Les Carr, University of Southampton
  • DSpace update, Sarah Molloy, QMUL
  • PURE update, Dominic Tate, University of Edinburgh, Appleton Tower, Lecture Theatre 3

EPrints update, Les Carr, University of Southampton

Adam Fields, the Community Manager for EPrints is presenting this session via live video link from South Korea! It is around midnight there so he’s also presenting from the future!

We are starting (after a few snapshots to set the scene) with a chart of the EPrints services team, to give a sense of how many are in the team.

EPrints Services exists to effectively serve the community with expertise and support – initially for the open access agenda, but it is becoming a lot more than that with what is happening with RDM in the sector. We are a not-for-profit service: we serve the OA community through commercial services (hosting, support, etc.), we lead on the EPrints roadmap and releases, and we provide funding for the development of the EPrints software.

What I'm going to talk about is the software side of things, and the general trend of where we are going. I want to start with the past. The main feature released in EPrints 3.3.14 was a change to the EPrints Bazaar. It had been a bucket of packages, with little by way of tagging and properties. So we have added an accolades section, to tag packages as having particular properties, etc. You can filter based on these accolades… These are alphabetically listed, with “EPrints Services Recommended” appearing at the top to indicate those packages that we have tested and recommend – anyone can contribute to the Bazaar.

EPrints 4 / 3.4.0 has the key philosophy of a “Base” – EPrints storing and handling generic data and objects – and “Layers” to handle specific metadata schema import/export, rendering, search, etc. for specific domains. The concept is that the database and everything else are two separate aspects. So there would be a layer for publications, but another layer for a data repository. The reason for doing this is to develop more sustainably against the increasingly complex requirements of the sector. The 3.4 releases will be collections of metadata schemas, renderers, etc. to support this. So here are diagrams illustrating the difference between EPrints for Publications, for Open Data, and for Dataset Showcases… These releases are in a sense all the same in the abstract… For a Dataset Showcase it's about showcasing or visualising a dataset, so you need a bespoke metadata schema for your data sets. But in the abstract the set-up is the same – the metadata schema and the tools you need to describe your data set. Similarly we have an EPrints for Social Media Data, for importing tweets, which has a similar abstract shape but with specific functionality to reflect the large scale of the dataset.

Q1) Is this one instance or multiple instance?

A1 – Les) You can have multiple repositories running on one EPrints installation. So the idea is of having a range of repositories, but that could all be on one installation. And you can connect up repositories and mash things up, of course. But the idea is to trim down how many metadata fields and how much data you collect – so that you only need to gather the relevant information for the relevant type of item.

A1 – Adam) This approach is about having a repository with key features… We categorise particular components, and these are examples of what that might look like. But anyone can customise EPrints for the combination of features and functionality that they need.

Adam: Now I want to talk a bit about my role. I am here to engage with the community, understanding what it is you need and want from EPrints. I hang out a lot in the various community spaces, but I have also been working on supported developments – where individual organisations require something specific but need help to achieve it, for instance a thesis deposit tool. I am creating training videos for those supporting and administrating EPrints repositories. We also have community members discussing improvements to the wiki, and I'm expecting lots of progress there over the next 6 to 8 months. And I'm always encouraging everyone to share documentation, or write documentation – everyone in the room here will have knowledge and expertise to share with other EPrints community members. Are there other things we can do to help? [No, not at the moment, based on audience response.]

One of the side effects of creating videos is that I get feedback and statistics on who is viewing those videos. So, for instance, I put up a video on installing an EPrints repository. It has been viewed several times a week since it went live. But the intriguing thing is the countries it has been viewed from – the top two countries have been Indonesia and India. It has been viewed around 40 times in the UK, but 120 times in Indonesia, and 20 times in Iraq and Guatemala. That suggests a truly global community, but also means we need to think about how we could bring this community together.

Finally, a quick plug for the EPrints UK User Group meeting, which is on 11th September in Southampton. If you would like to present please do post to the EPrints UK User Group Google Group, or contact Adam directly (af05v@ecs.soton.ac.uk).

Q2) Is there more information on when these versions will be released?

A2 – Les) The release version 3.4 is coming soon and that will take us to the modular stage. But at the moment we are waiting for a developer to get us there. But in terms of getting to EPrints 4.0… Much of what we needed will have been delivered in 3.4. But the whole point of 3.4 is that it will contain the same underlying system but moves us to that modular layer idea.

And on that note we leave Adam to get some much needed rest in Korea, as we turn to our final session of the day… 

Panel Session: Building data networks: exploring trust and interoperability between authors, repositories and journals with:

  • Varsha Khodiyar (VK), Scientific Data (Chair);
  • Neil Chue Hong (NCH), Journal of Open Research Software;
  • Rachael Kotarski (RK), DataCite;
  • Reza Salek (RS), European Bioinformatics Institute;
  • Peter McQuilton (PM), Biosharing.

Varsha is introducing this session for us: I work for Nature Publishing Group, one of the “evil publishers” and I work as a data curator at Scientific Data, part of the NPG group. An example of the sort of repository we work on is PhysioNet, a very specialist space in which data is shared.

We have a number of requirements for data journals, and our criteria include: (1) recognition within the scientific community; (2) long-term preservation of datasets; (3) implementation of relevant reporting standards; (4) confidential review of submitted datasets; (5) stable identifiers for submitted datasets; (6) public access to data without unnecessary restrictions. And we have a questionnaire online to help assess data repositories against these.

Neil Chue Hong – I am Director of the Software Sustainability Institute, but for the purpose of this presentation I am also Editor-in-Chief of the Journal of Open Research Software. So what does a metapaper in this journal look like? Well, it describes the software, the licence, the potential for reuse, etc. A paper as a whole tends to include an introduction, how it came to be, screenshots, implementation, quality control, metadata, reuse, and references. In some ways it is a proxy for…

For the panel: we are trying to do the same things in software as in research data in some ways, but we have concerns about preserving code – Google Code is shutting down; how do we preserve that?

Rachael Kotarski: We are looking at assigning DOIs to data, theses and software, among other things, so that DOIs remain stable even as data develops and changes over time. We are working with 52 organisations across the UK. I also have a role at the British Library around providing collections as data – enabling researchers to use large-scale collections of data. And the Alan Turing Institute is to be physically hosted at the British Library – we aren't a partner in that project, but we are hosting it.

Reza Salek: I am at the European Bioinformatics Institute, the largest freely available collection of life sciences data, available for reuse on completely open access terms. The repository at EMBL houses a couple of experiments; we were the first to provide a repository for sharing data in this way. Historically this community was not as happy to share their data. We learned quite a lot – and hope to learn a bit more.

Peter McQuilton: I work at Biosharing.org and we are a web-based, curated and searchable portal where biological standards and databases are registered, linked and discoverable. We have a database registry, a standards registry, and a policies registry. You can also make a collection of your own from a sub set of these collections of materials.

Our mission is to help people make the right choice – for researchers, developers and curators who lack support and guidance on which format or checklist standards to use. We are a small team with collaborators that include NPG, EMBOPress, BioMedCentral, Jisc and others.

Varsha: I have some questions for our panel, but do just jump in…

Q1 – Varsha) How did your community embed your repository?

A1 – Rachael) For us the persistent identifiers are key to enable reuse over time. Specifically for DataCite we have very few requirements: we have five fields that enable you to cite the DOI. There are more fields one would want to actually use the data, but because it is cross subject and format we can’t specify exactly what that should be. The other thing we require is a landing page – a target for the DOI, so that the object can be found and used. You could make a DOI link to, say, your Excel spreadsheet, but it is preferable to have a landing page with more information on how that object can be used. We also expect longevity, but we leave it to community to decide what longevity means for them.

Comment – Paul Walk) I absolutely agree about the necessity for that, but from a machine to machine process that isn’t as much of a priority…

A1 – Rachael) We recognise the importance of M2M interfaces, but we argue that shouldn’t be the default. So from that page you might then have the information there on how to access in an M2M way.

A1 – Varsha) Actually for privacy and sensitive data…

Q2) For some of our researchers longevity might mean 50 or 100 or 200 years and longevity can really be about preservation in the long term…

A2 – Peter M) And that is about format of course, having the technology to read that data is its own challenge.

A2 – Paul Walk) I was at something at the British Library talking about longevity in terms of generations, and that seemed like a useful approach.

Comment – Rachel Bruce) That came from the National Science Foundation work.

Q2) It is also about funding around preservation.

A2 – Reza S) The scale of data and data sets is also changing really quickly. But even recent data sets are effectively archived. And it is so hard to know what the technology will be, what science will evolve into… Is there a solution or approach that works here?

Comment) An astronomical image from 10 years back versus now allows you to see what has changed; an archaeological site you probably dig up once… You can't re-create that data… But then we can't keep everything!

A2 – Neil CH) I think sometimes the data can be recaptured; sometimes we only have one shot. But in many cases it is worth asking why we preserve the data. Is it for reuse and sharing? Or is it for checking and comparison? Those two approaches have very different timelines and requirements associated with them. It is not always the data itself that needs preserving.

Comment) Surely the whole point is that we cannot predict what others might want to do with our data…?

A2 – Varsha) Sure, the historical ships logs being used in climate change are a great example.

A2 – Neil CH) Interestingly, those ships' logs can be used because our means of expression haven't changed that much. But in software we are used to moving on… And that is much harder to go back to. If we forgot how to read a PDF file, that would be a disaster… But we have a lot of examples. We have to be careful not to support niche standards if we are talking about long-term preservation.

Comment) Do we know of data that has been well preserved but the means to read them has been lost?

Comment) I have WordPerfect files on my computer!

Varsha) A fellow researcher had a similar issue around the use of floppy discs – nowhere in his university was there any way to read them…

Comment – Kevin Ashley) The issue is also about what is worth doing… You can read a floppy disc but you have to want it enough to be worth a high level of expense.

Varsha) Do you have researchers depositing data? What are the issues there about deposit and reuse?

Les) A lot of my research is about social media and existing data, and there the existence and readability isn't the issue; but in making some sort of collection of it, away from the wild west of the web, curating a selection within the University, we create all sorts of ethical and legal problems. That is the issue when we are gathering data from lots of interacting people. The deposit mechanism isn't the issue; it's convincing people that it is the right thing to do, the processes around thinking that through, the data for access, for anonymity…

Neil CH) In my experience as a part-time researcher, we have been creating data sets. Because I give talks on licences and data policy for RCUK, I feel I should be able to do all the right things with my own data… So for me, asking this room: why is it so difficult to do the right thing here? I put my data in a data repository, add my colleagues' names, also my asset register… But I can't just give it my DOI so that all of those details get imported. This is where trust breaks down. If I can't be bothered to add all authors, and it's just me, then I've broken the compliance. I use PURE, and if I have a copy in RoMEO that can be one click, and I love that. But everything else should be easy too.

Comment) Rather than have a go at publishers, let's have a go at museums! I am a palaeontologist. I have a great 3D scan of bones I am researching… I'd love to share that with the world, but if I did I would be in trouble, as the museum believes all images created there are their property. It is a political issue though. If the collections management and commercial arms of the museum can be talked to, you are fine… unless there is deemed to be a potential commercial application/use of that scan.

Rachael K) There is one library allowing photographs of their material, where it is in an appropriate copyright state, and those are shared on Flickr. But the people taking pictures have to understand what they can take pictures of… Digitisation is expensive. Phone cameras in a reading room aren't great, but…

Comment) My feeling is that the copyright on a 110 million year old bone should have expired!

Neil CH) We are looking to work on a project at the Natural History Museum where some of the same issues arise – about who owns copyright of derivative products in that way, for educational use. It may be that educational use may be a way to do that in future, but still too early days yet.

Comment) In Germany they have the view that if they make the scan of their own materials, they hold data, but others scanning it can do as they wish.

Neil CH) I think in Australia they have also had some quite forward thinking examples there.

Varsha) We have drifted from repositories a little… But in our last few minutes what are the best ways to support our communities around repositories? How can we say that a repository is trustworthy?

Comment) I think for me that issue of trust being part of how easy it is to deposit, is important. The issue I find is that it is also hard to find and discover data…

Peter M) That is changing though… In biology that is improving, as it is known to be important that data is discoverable.

Comment) Perhaps rather than prescribed repositories or journals, there is a peer review process. When you say that it is peer reviewed, does that include the data?

Varsha) Yes, that includes the data and that it is shared in the right repository. We make sure that we can access the data, download files, etc. before we will publish. We only publish if that is appropriate.

Neil CH) We do similar. We have a list of repositories and documentation that helps ensure that data is accessible, that there are identifiers, and some sort of plan for managing that software. I actually kicked off a debate, inadvertently, that this is an expensive checking process and it is at the wrong end of the cycle… So there is an argument that you should pre-register before you generate data, and that that should be signed off at the end. An interesting idea: having peer review at the outset, not after generation of data.

Reza S) These are good questions. It can take a long time to go through that process. Repositories are usually at the end of the process, and there are issues there… It takes time… But that is culture change. For every year working on data you should expect maybe three days of curation work before depositing, in my experience.

Varsha) And on that note, thank you to all of our panel and for all your excellent questions.

Dominic Tate is announcing our drinks reception, remember whilst you are out there to vote on your favourite poster!

And, with that, the blog is done for the day. Remember to pass on comments, corrections, etc. and we will be back tomorrow for Day Two of Repository Fringe!