Repository Fringe 2017 (#rfringe17) – Day One Liveblog

Welcome – Janet Roberts, Director of EDINA

My colleagues were explaining to me that this event came from an idea from Les Carr that there should be not just one repository conference, but also a fringe – and here we are at the 10th Repository Fringe on the cusp of the Edinburgh Fringe.

So, this week we celebrate ten years of Repository Fringe, and the progress we have made over the last 10 years to share content beyond borders. It is a space for debating future trends and challenges.

At EDINA we established the OpenDepot to provide a space for those without a repository… That has now migrated to Zenodo… and the challenges are changing, around the size of data, how we store and access that data, and what those next generation repositories will look like.

Over the next few days we have some excellent speakers as well as some fringe events, including the Wiki Datathon – so I hope you have all brought your laptops!

Thank you to our organising team from EDINA, DCC and the University of Edinburgh. Thank you also to our sponsors: Atmire; FigShare; Arkivum; ePrints; and Jisc!

Opening Keynote – Kathleen Shearer, Executive Director, COAR
Raising our game – repositioning repositories as the foundation for sustainable scholarly communication

Theo Andrew: I am delighted to introduce Kathleen, who has been working in digital libraries and repositories for years. COAR is an international organisation of repositories, and I’m pleased to say that Edinburgh has been a member for some time.

Kathleen: Thank you so much for inviting me. It’s actually my first time speaking in the UK and it’s a little bit intimidating as I know that you folks are really ahead here.

COAR is now about 120 members. Our activities fall into four areas: presenting an international voice so that repositories are part of a global community with diverse perspectives. We are being more active in training for repository managers, something which is especially important in developing countries. And the other area is value added services, which is where today’s talk on the repository of the future comes in. The vision here is about…

But first, a rant… The international publishing system is broken! And it is broken for a number of reasons – there is access, and the cost of access. The cost of scholarly journals goes up far beyond the rate of inflation. That touches us in Canada – where I am based, in Germany, in the UK… But much more so in the developing world. And then we have the “Big Deal”. A study of University of Montreal libraries by Stephanie Gagnon found that of 50k subscribed-to journals, really there were only 5,893 unique essential titles. But institutions often don’t opt out of those deals, because buying the key core journals separately costs the same as the big deal.

We also have a participation problem… Juan Pablo Alperin’s map of authors published in Web of Science shows a huge bias towards the US and the UK, and seriously reduced participation in Africa and parts of Asia. Why does that happen? The journals are operated from the global North, and don’t represent the kinds of research problems in the developing world. And one Nobel Prize winner notes that the pressure to publish in “luxury” journals encourages researchers to cut corners and pursue trendy fields rather than areas where there are those research gaps. That was the case with Zika virus – you could hardly get research published on that until a major outbreak brought it to the attention of the dominant publishing cultures, then there was huge appetite to publish there.

Timothy Gowers talks about “perverse incentives” which are supporting the really high costs of journals. It’s not just a problem for researchers and how they publish, it’s also a problem of how we incentivise researchers to publish. So, this is my goats in trees slide… It doesn’t feel like goats should be in trees… Moroccan tree goats are taught to climb the trees when there isn’t food on the ground… I think of the researchers able to publish in these high end journals as being the lucky goats in the tree here…

In order to incentivise participation in high end journals we have created a lucrative publishing industry. I’m sure you’ve seen the recent Guardian article: “Is the staggeringly profitable business of scientific publishing bad for science?”. Yes. For those reasons of access and participation. We see very few publishers publishing the majority of titles, and there is a real…

My colleague Leslie Chan, funded by the International Development Council, talked about openness not just being about gaining access to knowledge but also about having access to participate in the system.

On the positive side… Open access has arrived. A recent study (Piwowar et al 2017) found that about 45% of articles published in 2015 were open access. And that is increasing every year. And you have probably seen the May 27th 2016 statement from the EU that all research they fund must be open by 2020.

It hasn’t been a totally smooth transition… APCs (Article Processing Charges) are very much in the mix and part of the picture… Some publishers are trying to slow the growth of access, but they can see that it’s coming and want to retain their profit margins. And they want to move to all APCs. There is discussion here… There is a project called OA2020 which wants to flip from subscription based to open access publishing. It has some traction but there are concerns here, particularly about sustainability of scholarly comms in the long term. And we are not sure that publishers will go for it… Particularly one of them (Elsevier) which exited talks in The Netherlands and Germany. In Germany the tap was turned off for a while for Elsevier – and there wasn’t a big uproar from the community! But the tap has been turned back on…

So, what will the future be around open access? If you look across APCs and the average value… If you think about the relative value of journals, especially the value of high end journals… I don’t think we’ll see lesser increases in APCs in the future.

At COAR we have a different vision…

Lorcan Dempsey talked about the idea of the “inside out” library. Similarly a new MIT Future of Libraries Report – published by a broad stakeholder group that had spent 6 months working on a vision – came up with the need for libraries to be an open, trusted, durable, interdisciplinary, interoperable content platform. So, like the inside out library, it’s about collecting the output of your organisation and making it available to the world…

So, for me, if we collect articles… We just perpetuate the system and we are not in a position to change the system. So how do we move forward at the same time as being kind of reliant on that system?

Eloy Rodrigues, at Open Repositories earlier this year, asked whether repositories are a success story. They are ubiquitous, they are adopted and networked… But then they are also using old, pre-web technologies; mostly passive recipients; limited interoperability making value added systems hard; and not really embedded in researcher workflows. These are the kinds of challenges we need to address in the next generation of repositories…

So we started a working group on Next Generation Repositories to define new technologies for repositories. We want to position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication. And on top of that we want to be able to add layers of value added services. Our principles include distributed control to guard against failure, change, etc. We want this to be inclusive, and to reflect the needs of the research communities in the global south. We want intelligent openness – we know not everything can be open.

We also have some design assumptions, with a focus on the resources themselves, not just associated metadata. We want to be pragmatic, and make use of technologies we have…

To date we have identified major use cases and user stories, and shared those. We determined functionality and behaviours, and a conceptual model. At the moment we are defining specific technologies and architectures. We will publish recommendations in September 2017. We then need to promote it widely and encourage adoption and implementation, as well as the upgrade of repositories around the world (a big challenge).

You can view our user stories online. But I’d like to talk about a few of these… We would like to enable peer review on top of repositories… To slowly incrementally replace what researchers do. That’s not building peer review in repositories, but as a layer on top. We also want some social functionalities like recommendations. And we’d like standard usage metrics across the world to understand what is used and how… We are looking to the UK and the IRUS project there as that has already been looked at here. We also need to address discovery… Right now we use metadata, rather than indexing full text content… So content can be hard to get to unless the metadata is obvious. We also need data syncing so that hubs, indexing systems, etc. reflect changes in the repositories. And we also want to address preservation – that’s a really important role that we should do well, and it’s something that can set us apart from the publishers – preservation is not part of their business model.
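The discovery point above (metadata search versus full text indexing) can be illustrated with a toy sketch. This is not anything COAR presented: the `Record` type, the two search functions and the sample records are all hypothetical, made up purely to show how a relevant item can be invisible to a metadata-only search.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical repository item: descriptive metadata plus extracted text."""
    title: str
    keywords: list[str]
    full_text: str


def matches_metadata(record: Record, term: str) -> bool:
    """True if the term appears in the title or in any keyword."""
    term = term.lower()
    return term in record.title.lower() or any(
        term in keyword.lower() for keyword in record.keywords
    )


def search_metadata(records: list[Record], term: str) -> list[str]:
    """Titles of records whose metadata mentions the term."""
    return [r.title for r in records if matches_metadata(r, term)]


def search_full_text(records: list[Record], term: str) -> list[str]:
    """Titles of records whose metadata or full text mentions the term."""
    term_lower = term.lower()
    return [
        r.title
        for r in records
        if matches_metadata(r, term) or term_lower in r.full_text.lower()
    ]


# Invented sample data, for illustration only.
RECORDS = [
    Record(
        title="Mosquito-borne disease surveillance",
        keywords=["epidemiology"],
        full_text="We report early Zika virus cases in the region.",
    ),
    Record(
        title="Open access policy review",
        keywords=["policy"],
        full_text="Funder mandates and compliance.",
    ),
]

metadata_hits = search_metadata(RECORDS, "zika")
full_text_hits = search_full_text(RECORDS, "zika")
print("metadata only:", metadata_hits)
print("with full text:", full_text_hits)
```

In this made-up data the first record never mentions the search term in its title or keywords, so only the full text search finds it.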

So, this is a slide from Petr Knoth at CORE – a repository aggregator – who talks about expanding the repository, and the potential to layer all of these additional services on top.

To make this happen we need to improve the functionality of repositories: to be of and not just on the web. But we also need to step out of the article paradigm… The whole system is set up around the article, but we need to think beyond that, deposit other content, and ensure those research outputs are appropriately recognised.

So, we have our (draft) conceptual model… It isn’t around siloed individual repositories, but around a whole network. And some of our draft recommendations for technologies for next generation repositories. These are a really early view… These are things like: ResourceSync; Signposting; Messaging protocols; Message queue; IIIF presentation API; OAuth; Webmention; and more…

Critical to the widespread adoption of this process is the widespread adoption of the behaviours and functionalities for next generation repositories. It won’t be a success if only one software or approach takes these on. So I’d like to quote a Scottish industrialist, Andrew Carnegie: “strength is derived from unity…”. So we need to coalesce around a common vision.

And it isn’t just about a common vision, science is global and networked and our approach has to reflect and connect with that. Repositories need to balance a dual mission to (1) showcase and provide access to institutional research and (2) be nodes in a global research network.

To support better networking in repositories, in Venice in May we signed an International Accord for Repository Networks, with networks from Australasia, Canada, China, Europe, Japan, Latin America, South Africa, United States. For us there is a question about how best we work with the UK internationally. We work with OpenAIRE but maybe we need something else as well. The networks across those areas are advancing at different paces, but have committed to move forward.

There are three areas of that international accord:

  1. Strategic coordination – to have a shared vision and a stronger voice for the repository community
  2. Interoperability and common “behaviours” for repositories – supporting the development of value added services
  3. Data exchange and cross regional harvesting – to ensure redundancy and preservation. This has started but there is a lot to do here still, especially as we move to harvesting full text, not just metadata. And there is interest in redundancy for preservation reasons.

So we need to develop the case for a distributed community-managed infrastructure, that will better support the needs of diverse regions, disciplines and languages. Redundancy will safeguard against failure. With less risk of commercial buy out. Places the library at the centre… But… I appreciate it is much harder to sell a distributed system… We need branding that really attracts researchers to take part and engage in the system…

And one of the things we want to avoid… Yesterday it was announced that Elsevier has acquired bepress. bepress is mainly used in the US and there will be much thinking about the implications for their repositories. So not only should institutional repositories be distributed, but they should be different platforms, and different open source platforms…

Concluding thoughts here… Repositories are a technology and technologies change. What it’s really promoting is a vision in which institutions, universities and their libraries are the foundational nodes in a global scholarly communication system. This is really the future of libraries in the scholarly communication community. This is what libraries should be doing. This is what our values represent.

And this is urgent. We see Elsevier consolidating, buying platforms, trying to control publishers and the research cycle, so we really have to move forward and move quickly. I hope the UK will remain engaged with this. And I look forward to your participation in our ongoing dialogue.


Q1 – Les Carr) I was very struck by that comment about the need to balance the local and the global. I think that’s a really major opportunity for my university. Everyone is obsessed about their place in the global university ranking, their representation as a global university. This could be a real opportunity, led by our libraries and knowledge assets, and I’m really excited about that!

A1) I think the challenge around that is trying to support common values… If you are competing with other institutions it’s not always an incentive to adopt systems with common technologies, measures, approaches. So there needs to be a benefit for institutions in joining this network. It is a huge opportunity, but we have to show the value of joining that network. It’s maybe easier in the UK, Europe, Canada. In the US they don’t see that value as much… They are not used to collaborating in this way and have been one of the hardest regions to bring onboard.

Q2 – Adam ?) Correct me if I’m wrong… You are talking about a Commons… In some way the benefits are watered down as part of the Commons, so how do we pay for this system, how do we make this benefit the organisation?

A2) That’s where I see that challenge of the benefit. There has to be value… That’s where value added systems come in… So a recommender system is much more valuable if it crosses all of the repositories… That is a benefit and allows you to access more material and for more people to access yours. I know CORE at the OU are already building a recommender system in their own aggregated platform.

Q3 – Anna?) At the sharp end this is not a problem for libraries, but a problem for academia… If we are seen as librarians doing things to or for academics that won’t have as much traction… How do we engage academia…

A3) There are researchers keen to move to open access… But it’s hard to represent what we want to do at a global level when many researchers are focused on that one journal or area and making that open access… I’m not sure what the elevator pitch should be here. I think if we can get to that usage statistics data there, that will help… If we can build an alternative system that even research administrators can use in place of impact factor or Web of Science, that might move us forward in terms of showing this approach has value. Administrators are still stuck in having to evaluate the quality of research based on journals and impact factors. This stuff won’t happen in a day. But having standardised measures across repositories will help.

So, one thing we’ve done in Canada with the U15 (top 15 universities in Canada)… They are at the top of what they can do in terms of the cost of scholarly journals so they asked us to produce a paper for them on how to address that… I think that issue of cost could be an opportunity…

Q4) I’m an academic and we are looking for services that make our life better… Here at Edinburgh we can see that libraries are naturally the consistent point of connection with the repository. Does that translate globally?

A4) It varies globally. Libraries are fairly well recognised in Western countries. In the developing world there are funding and capacity challenges that make that harder… There is also a question of whether we need repositories for every library… Can we do more consortia repositories or similar?

Q5 – Chris) You talked about repository supporting all kinds of materials… And how they can “wag the dog” of the article

A5) I think with research data there is so much momentum there around making data available… But I don’t know how well we are set up with research data management to ensure data can be found and reused. We need to improve the technology in repositories. And we need more resources too…

Q6) Can we do more to encourage academics, researchers, students to reuse data and content as part of their practice?

A6) I think the more content we have at Commons level, the more it can be reused. We have to improve discoverability, and improve the functionality to help that content to be reused… There is huge machine reuse of content – I was speaking with Petr Knoth about this – but that isn’t easy to do with repositories…

Theo) It would be really useful to see Open Access buttons more visible, using repositories for document delivery, etc.

Chris Banks, Director of Library Services, Imperial College
Focusing upstream: supporting scholarly communication by academics

10×10 presentations (Chair: Ianthe Sutherland, University Library & Collections)

  1. v2.juliet – A Model For SHERPA’s Mid-Term Infrastructure. Adam Field, Jisc
  2. CORE Recommender: a plug in suggesting open access content. Nancy Pontika, CORE
  3. Enhancing Two workflows with RSpace & Figshare: Active Data to Archival Data and Research to Publication. Rory Macneil, Research Space and Megan Hardeman of Figshare
  4. Thesis digitisation project. Gavin Willshaw, University of Edinburgh
  5. ‘Weather Cloudy & Cool Harvest Begun’: St Andrews output usage beyond the repository. Michael Bryce, University of St Andrews

Impact and the REF panel session

Brief for this session: How are institutions preparing for the next round of the Research Excellence Framework #REF2021, and how do repositories feature in this? What lessons can we learn from the last REF and what changes to impact might we expect in 2021? How can we improve our repositories and associated services to support researchers to achieve and measure impact with a view to the REF? In anticipation of the forthcoming announcement by HEFCE later this year of the details of how #REF2021 will work, and how impact will be measured, our panel will discuss all these issues and answer questions from RepoFringers.

Pauline Jones, REF Manager and Head of Strategic Performance and Research Policy, University of Edinburgh

Anne-Sofie Laegran, Knowledge Exchange Manager, College of Arts, Humanities and Social Sciences, University of Edinburgh

Catriona Firth, REF Deputy Manager, HEFCE

Chair: Keith McDonald, Assistant Director, Research and Innovation Directorate, Scottish Funding Council

10×10 presentations

  1. National Open Data and Open Science Policies in Europe. Martin Donnelly, DCC
  2. IIIF: you can keep your head while all around are losing theirs! Scott Renton, University of Edinburgh
  3. Reference Rot in theses: a HiberActive pilot. Nicola Osborne, EDINA
  4. Lifting the lid on global research impact: implementation and analysis of a Request a Copy service. Dimity Flanagan, London School of Economics and Political Science
  5. What RADAR did next: developing a peer review process for research plans. Nicola Siminson, Glasgow School of Art
  6. Edinburgh DataVault: Local implementation of Jisc DataVault: the value of testing. Pauline Ward, EDINA
  7. Data Management & Preservation using PURE and Archivematica at Strathclyde. Alan Morrisson, University of Strathclyde
  8. Open Access… From Oblivion… To the Spotlight? Dawn Hibbert, University of Northampton
  9. Automated metadata collection from the researcher CV Lattes Platform to aid IR ingest. Chloe Furnival, Universidade Federal de São Carlos
  10. The Changing Face of Goldsmiths Research Online. Jeremiah Spillane, Goldsmiths, University of London

Chair: Ianthe Sutherland, University Library & Collections


A Mini Adventure to Repository Fringe 2016

After 6 years of being Repository Fringe’s resident live blogger this was the first year that I haven’t been part of the organisation or amplification in any official capacity. From what I’ve seen though my colleagues from EDINA, University of Edinburgh Library, and the DCC did an awesome job of putting together a really interesting programme for the 2016 edition of RepoFringe, attracting a big and diverse audience.

Whilst I was mainly participating through reading the tweets to #rfringe16, I couldn’t quite keep away!

Pauline Ward at Repository Fringe 2016


This year’s chair, Pauline Ward, asked me to be part of the Unleashing Data session on Tuesday 2nd August. The session was a “World Cafe” format and I was asked to help facilitate discussion around the question: “How can the repository community use crowd-sourcing (e.g. Citizen Science) to engage the public in reuse of data?” – so I was along wearing my COBWEB: Citizen Observatory Web and social media hats. My session also benefited from what I gather was an excellent talk on “The Social Life of Data” earlier in the event from Erinma Ochu (who, although I missed her this time, is always involved in really interesting projects including several fab citizen science initiatives).


I won’t attempt to reflect on all of the discussions during the Unleashing Data Session here – I know that Pauline will be reporting back from the session to Repository Fringe 2016 participants shortly – but I thought I would share a few pictures of our notes, capturing some of the ideas and discussions that came out of the various groups visiting this question throughout the session. Click the image to view a larger version. Questions or clarifications are welcome – just leave me a comment here on the blog.

Notes from the Unleashing Data session at Repository Fringe 2016



If you are interested in finding out more about crowd sourcing and citizen science in general then there are a couple of resources that may be helpful (plus many more resources and articles if you leave a comment/drop me an email with your particular interests).

This June I chaired the “Crowd-Sourcing Data and Citizen Science” breakout session for the Flooding and Coastal Erosion Risk Management Network (FCERM.NET) Annual Assembly in Newcastle. The short slide set created for that workshop gives a brief overview of some of the challenges and considerations in setting up and running citizen science projects:

Last October the CSCS Network interviewed me on developing and running Citizen Science projects for their website – the interview brings together some general thoughts as well as specific comment on the COBWEB experience:

After the Unleashing Data session I was also able to stick around for Stuart Lewis’ closing keynote. Stuart has been working at Edinburgh University since 2012 but is moving on soon to the National Library of Scotland so this was a lovely chance to get some of his reflections and predictions as he prepares to make that move. And to include quite a lot of fun references to The Secret Diary of Adrian Mole aged 13 ¾. (Before his talk Stuart had also snuck some boxes of sweets under some of the tables around the room – a popularity tactic I’m noting for future talks!)

So, my liveblog notes from Stuart’s talk (slightly tidied up but corrections are, of course, welcomed) follow. Because old Repofringe live blogging habits are hard to kick!

The Secret Diary of a Repository aged 13 ¾ – Stuart Lewis

I’m going to talk about our bread and butter – the institutional repository… Now my inspiration is Adrian Mole… Why? Well we have a bunch of teenage repositories… EPrints is 15 ½; Fedora is 13 ½; DSpace is 13 ¾.

Now Adrian Mole is a teenager – you can read about him on Wikipedia [note to fellow Wikipedia contributors: this, and most of the other Adrian Mole-related pages could use some major work!]. You see him quoted in two conferences to my amazement! And there are also some Scotland and Edinburgh entries in there too… Brought a haggis… Goes to Glasgow at 11am… and says he encounters 27 drunks in one hour…

Stuart Lewis at Repository Fringe 2016

Stuart Lewis illustrates the teenage birth dates of three of the major repository softwares as captured in (perhaps less well-aged) pop hits of the day.

So, I have four points to make about how repositories are like/unlike teenagers…

The thing about teenagers… People complain about them… They can be expensive, they can be awkward, they aren’t always self aware… Eventually though they usually become useful members of society. So, is that true of repositories? Well ERA, one of our repositories, has gotten bigger and bigger – over 18k items… and over 10k paper theses currently being digitized…

Now teenagers also start to look around… Pandora!

I’m going to call Pandora the CRIS… And we’ve all kind of overlooked their commercial background because we are in love with them…!


Stuart Lewis captures the eternal optimism – both around Mole’s love of Pandora, and our love of the (commercial) CRIS.

Now, we have PURE at Edinburgh which also powers Edinburgh Research Explorer. When you looked at repositories a few years ago, it was a bit like Freshers Week… The three questions were: where are you from; what repository platform do you use; how many items do you have? But that’s moved on. We now have around 80% of our outputs in the repository within the REF compliance window (3 months of acceptance)… And that’s a huge change – volumes of materials are open access very promptly.


1. We need to celebrate our success

But are our successes as positive as they could be?

Repositories continue to develop. We’ve heard good things about new developments. But how do repositories demonstrate value – and how do we compare to other areas of librarianship.

Other library domains use different numbers. We can use these to give comparative figures. How do we compare to publishers for cost? What’s our CPU (Cost Per Use)? And what is a good CPU? £10, £5, £0.46… But how easy is it to calculate – are repositories expensive? That’s a “to do” – to take the cost to run/IRUS cost. I would expect it to be lower than publishers, but I’d like to do that calculation.
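As a rough illustration of that “to do”, here is a minimal sketch of a Cost Per Use calculation. It assumes CPU is simply the annual running cost divided by the annual number of uses (for example downloads as counted by a usage statistics service). The function name and the figures are hypothetical, not numbers from the talk.

```python
def cost_per_use(annual_running_cost: float, annual_uses: int) -> float:
    """Cost Per Use: total running cost divided by the number of uses."""
    if annual_uses <= 0:
        raise ValueError("annual_uses must be a positive number")
    return annual_running_cost / annual_uses


# Hypothetical figures for illustration only, not real repository data.
example_cost = 50_000.0   # assumed annual cost to run the repository, in pounds
example_uses = 250_000    # assumed annual downloads

example_cpu = cost_per_use(example_cost, example_uses)
print(f"Cost per use: £{example_cpu:.2f}")
```

The same function could be run on the price and usage of a journal subscription to get the comparative figure the talk asks about.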

The other side of this is to become more self-aware… Can we gather new numbers? We only tend to look at deposit and use from our own repositories… What about our own local consumption of OA (the reverse)?

Working within new e-resource infrastructure – – lets us see where open versions are available. And we can integrate with OpenURL resolvers to see how much of our usage can be fulfilled.

2. Our repositories must continue to grow up

Do we have double standards?

Hopefully you are all aware of the UK Text and Data Mining Copyright Exception that came into force on 1st June 2014. We have massive massive access to electronic resources as universities, and can text and data mine those.

Some do a good job here – Gale Cengage Historic British Newspapers: additional payment to buy all the data (images + XML text) on hard drives for local use. Working with local informatics LTG staff to (geo)parse the data.

Some are not so good – basic APIs allow only simple searches… But not complex queries (e.g. could use a search term, but not e.g. sentiment).

And many publishers do nothing at all….

So we are working with publishers to encourage and highlight the potential.

But what about our content? Our repositories are open, with extracted full-text, data can be harvested… Sufficient but is it ideal? Why not do bulk download from one click… You can – for example – download all of Wikipedia (if you want to).  We should be able to do that with our repositories.

3. We need to get our house in order for Text and Data Mining

When will we be finished though? Depends on what we do with open access? What should we be doing with OA? Where do we want to get to? Right now we have mandates so it’s easy – green and gold. With gold there is Pure or Hybrid… Mixed views on Hybrid. Can also publish locally for free. Then for green there are local or disciplinary repositories… For Gold – Pure, Hybrid, Local – we pay APCs (some local options are free)… In Hybrid we can do offsetting, discounted subscriptions, voucher schemes too. And for green we have the UK Scholarly Communications License (Harvard)…

But which of these forms of OA are best?! Is choice always a great thing?

We still have outstanding OA issues. Is a mixed-modal approach OK, or should we choose a single route? Which one? What role will repositories play? What is the ultimate aim of Open Access? Is it “just” access?

How and where do we have these conversations? We need academics, repository managers, librarians, publishers to all come together to do this.

4. Do we know what a grown-up repository looks like? What part does it play?

Please remember to celebrate your repositories – we are in a fantastic place, making a real difference. But they need to continue to grow up. There is work to do with text and data mining… And we have more to do… To be a grown up, to be in the right sort of environment, etc.



Q1) I can remember giving my first talk on repositories in 2010… When it comes to OA I think we need to think about what is cost effective, what is sustainable, why are we doing it and what’s the cost?

A1) I think in some ways that’s about what repositories are versus publishers… Right now we are essentially replicating them… And maybe that isn’t the way to approach this.

And with that Repository Fringe 2016 drew to a close. I am sure others will have already blogged their experiences and comments on the event. Do have a look at the Repository Fringe website and at #rfringe16 for more comments, shared blog posts, and resources from the sessions.