Repository Fringe 2014 LiveBlog – Day Two

We are back for day two of Repository Fringe 2014!

We start the day with two parallel sessions. In Appleton Tower (M2ABC) we have Muriel Mewissen of EDINA, speaking about the Jisc Publications Router – Delivering Open Access Content to Institutions. In Informatics – and here on the blog – we have:

Unwrapping Digital Preservation, Steph Taylor, ULCC, University of London, Informatics Forum, G.07

Firstly a bit about why I am here, and why I’m talking about preservation. At ULCC I work with my colleague Ed providing training on digital preservation, but before that I was working with repositories.

The first time I heard of repositories as an archive was in the early days of RSP. A lot of repositories used the word “archive” or similar terms, but even then, as an archivist, I was concerned about the use of that word. “Repository” can be a scary word, but these were not spaces doing preservation. As time moved on there was a perception that depositing material would, inevitably, mean it was being preserved. As funders really started backing deposit of papers, the job of actually planning and conducting preservation increasingly fell on repository managers. I had quite a lot of phone calls as those changes came in, and some of those issues of what preservation really means are what I want to talk about.

A repository isn’t a digital archive. It may be, but it isn’t necessarily that. But why do we have this idea of those terms being somehow synonymous? Well, the definition of a repository is about depositing things especially “for storage or safe keeping” BUT there is no mention of the long term. That is the difference from an archive, from the practice of preservation. There are some things you might not have. You might not have a preservation plan – you need to define the long term and plan for what is needed to ensure the item is preserved. You need some selected preservation strategies – even with paper archives you need to check materials are safe, and conserve some materials. And we need preservation strategies for digital materials – we need to ensure that formats are still readable, that files are intact; there are lots of interventions that may be required. At a previous library I worked at, when theses were deposited, we committed to preserve them. We had some theses already sitting on unreadable large-format floppy discs – they became obsolete very quickly and we didn’t have the strategy to ensure that content was preserved. So strategies could be emulation, switching format, etc. But you need strategies.

You may also not have an archival-quality digital object: the highest quality file – often a much bigger one – a version designed for the long term. That’s something repositories do not really do. In an archive you might preserve a TIFF file, but also produce a JPEG to enable distribution and access by others. And that can provide real flexibility. University of York have been able to preserve TIFFs for a gallery and, with their permission, make those available to view online.

For preservation you need preservation metadata – every time you access, change or view the item you need to record that. And you need technical metadata constructed with preservation in mind – everything you can gather in order to enable changes in format, ways to open the file, and to understand software and hardware shifts.
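
To make the idea of preservation metadata concrete, here is a minimal sketch of a preservation event record, loosely modelled on PREMIS event semantics (eventType, eventDateTime, linking agent, outcome). The helper function and dict structure are illustrative only, not part of any particular repository system.

```python
# Illustrative sketch: a preservation event record, loosely modelled on
# PREMIS event semantics. The dict structure and helper are made up for
# illustration; they are not part of any repository API.
from datetime import datetime, timezone

def record_event(object_id, event_type, agent, outcome, detail=""):
    """Build a PREMIS-style event record for an action on a stored object."""
    return {
        "objectIdentifier": object_id,
        "eventType": event_type,              # e.g. "fixity check", "migration"
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "linkingAgentIdentifier": agent,      # who or what performed the action
        "eventOutcome": outcome,              # e.g. "success", "failure"
        "eventOutcomeDetail": detail,
    }

event = record_event("thesis-1234", "format migration",
                     "repo-admin", "success",
                     "Migrated word-processor source to PDF/A")
print(event)
```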

And I want repository managers to think about access and rights management for the long term. Typically embargoes are the longest timescales we think about. But if we are preserving digital content we need to think 10, 20 or more years into the future, to future people in our roles. And we always need the permission of the rights owner to preserve an item and to make copies for preservation. The law has shifted a little but you still need written permission. And copies for preservation are very different to rights to share publicly. But all of these are different to what we are used to doing with our repositories.

You need to do regular, planned checks to see if content is still accessible and can “play” ok. Most digital preservation centres check regularly – every 12 months say – but also at any trigger points like changes in hardware, or changes or discontinuations of software. Those checks are essential. Formats that are easy to use and deposit aren’t enough for digital preservation. And on the idea of content being accessible over time… we need procedures in place to handle problems of content becoming inaccessible over time. And we have to understand embargoes around data protection, sections that require removal for privacy or security reasons, etc. These are very different to journals, publishers, and copyright issues.

There is a lot here. But…

A repository CAN expand to become a digital archive as well. Or maybe you have another system that you pass items on to. But the system can be expanded, can be reshaped, can be turned into a digital archive. And if you do want to do that, start by thinking about what you want to keep, for how long, and why. There is more information at DPC, Open Planets, DCC – all of whom provide huge amounts of information and support around digital preservation.

The other big difference between a repository and a digital archive is around selection. You will have some sort of selection process for your repository – it may be about who someone is, what they are doing, which project – there may be many triggers. But for an archive do you need to keep everything? What is your further selection process? So, for example, if you have many iterations of the same article over time, do you want to keep all of them or just the published (or another) version? Digital preservation can be expensive in terms of kit, in terms of people… it is about the long term, and that really means the long term. So you may want another selection process for preservation. And if you do want to create a digital preservation policy, or use your repository as a preservation archive, I’d recommend talking to colleagues already working in preservation as they will have policies and experience to draw upon.

And if you do want to get into this do look at training courses – we have an interactive one in London (I teach on it so may be biased!), but also you’ll find training courses and information from those organisations like DCC or DPC.

Discussion/Q&A

Comment 1: I work on OpenDOAR – I sit at the computer looking at everyone’s repositories to see if they are good enough to go on OpenDOAR, and looking for policies. Many have metadata etc. policies, but very few have a preservation policy. It is crucial. If you don’t have one already DO create one, do make it available, and contact us/add it to your OpenDOAR record.

Steph: Do people have policies in their repositories on preservation?

Comment 2: I’m in the lucky position of being repository manager and university archivist. My initial intention was to set up the repository as a digital archive. For the repository we have a checklist of things to agree to… But in terms of the policy… I have it, but how do I get it out there, into the university? Things like every time someone has a new image on their computer, a new piece of equipment… letting me know so I understand what is needed.

Steph: You are not the first person I’ve seen with that joint role. But yes, it’s hard to get people to tell you this stuff. If you have senior buy in it is easier to get policies out, to mandate that information. Does anyone have a repository they’d like to use as an archive but are not doing that yet?

Quite a lot of hands up

Steph: We are increasingly seeing repository managers on our courses. And if your organisation isn’t yet engaging in digital preservation, repositories are a good place to start – there is a body of work there, it is a great place to get started. I also wanted to ask whether you accept file formats knowing that you can make archival copies and maintain them? My own experience was that getting full text in was hard enough… let alone worrying about how it was being sent.

Comment: For the repository that we had in the past you could send anything in. But for the RDM policy there is a defined set of file formats that are encouraged, another set that are acceptable, and another set of formats that are not acceptable.

Steph: That’s really good. It’s tempting to be as flexible as possible, and to take anything in. I’d recommend looking at file formats and seeing what’s good in the area you work in, and then making some choices. A very prominent organisation working in preservation didn’t do this themselves… they took a large amount of data in formats that are hard to copy and maintain over time, and they are a bit stuck with it. It’s well worth thinking about what you do and don’t take. Discuss with users, make it workable, write it down, and make it policy. Send back formats that are not preservable – ask depositors to convert them, or change the format yourself with their agreement.

Comment: The Library of Congress released a list of file formats ranked by how preservable they are. A really good resource to look at.

Comment: And the way to sell to academics is that if you submit in a more common file format, their work will also be readable and accessible to many more people and that’s really important to them.

Steph: Absolutely. More people accessing their work, and for longer, is a huge motivator. But also do explain why something is a preservation challenge, why a format isn’t workable. Don’t take stuff you can’t manage. Talking of which, the final thing I wanted you to think about is whether you know what you are going to keep for the long term? Whatever the long term means to you – some funders specify how long they want materials kept for, some are vague. And do you want to keep everything, forever? I started with a quite domestic idea, to clutch everything tight… but you have to be much more selective to make preservation work. Do you have a selection policy for preservation yet? Or policies on how long things stay in your repository?

Comment: I don’t have policies for that. But much of what we store in a repository is material that is certainly conceptually, if not legally, the property of the academics. I want to engage with them to select materials. They may want to keep everything, but there needs to be a mature conversation about that.

Steph: Everyone’s kneejerk reaction is everything for ever, you want your hard work saved… but it may depend on the work they do. Research data may not be useful beyond a certain point for instance. It’s important to engage with users, and to get the institutional view on what should be saved.

Comment: We have an unofficial policy that, as long as people can support it, we will preserve what’s in the repository. But for RDM we keep items at least 10 years from deposit and at least 10 years from the last point of access. That may be an EPSRC thing but it’s certainly a Lancaster thing. But it’s hard to sell to different people. Researchers love it. The information services and infrastructure people see it as a huge cost; they aren’t happy about it. And the library staff don’t have time to maintain everything there. So we need a selection policy… but there is a big disconnect: senior decision makers think that since we have a repository, we keep everything for ever.

Comment, Kevin Ashley: For those struggling with selection policy… the DCC, working with colleagues at the Australian National Data Service, has released guidance on that, and we are working on policies. We know that our guidelines have been used to write selection policies across the world. But ironically, as we understand this stuff more and more, we will need to throw away more and more stuff. And we will increasingly make wrong decisions – there is so much data – and you just have to live with it. Nothing is that bad.

Steph: Absolutely. Material will sometimes become inaccessible over time, but making your best efforts to preserve at least means we don’t automatically lose these items.

Comment: To what extent should every repository expand to be a digital preservation space? Or to what extent should digital preservation be collaborative, something service based?

Steph: On a practical and technical side, yes, everyone could use their repository as a preservation space. But there may be reasons not to do that. Organisations pooling resources for preservation can be much more sustainable than individual approaches. It’s a good idea. It’s the kind of thing that lends itself to collaboration. It may involve lots of talking with users, with other organisations… but it makes sense financially, and in terms of the number of copies required. Collaboration can be hard, it can be challenging to get the right people at every level engaged… but there are a lot of benefits in it. That would be a good way forward.

Comment: As a follow up… collaborations can come about in different ways… sometimes through funding stimulus, sometimes by locality, sometimes by chance… something that would be useful would be to find ways to initiate collaborations, to get things off the ground. Many think collaboration is a good idea, but they don’t always know how to go about it.

Steph: There is a great set of communities around digital preservation. We don’t have a central body in that way although there is a great deal of work from DCC, many members at DPC. DPC actually now offers a service to facilitate consultancy from one member to another. Sometimes collaborations are geographical, thematic, many criteria… It would be great to have a sort of dating service for digital preservation – to find places to engage well with.

Comment, Steph’s colleague Tim: One of your key messages was that the repository is not a digital archive. But at ULCC we have been integrating EPrints with Archivematica. We have some information about that work over on the registration desk.

Comment: Our repository, DataShare, is a DSpace system. Our system does a report and checksum every day. You can check every day automatically that files aren’t corrupting in the database.

Steph: Absolutely. I didn’t want to go into too much detail but yes, if you go away with one thing, think about checksums. Really easy to automate checks that everything is ok.
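
For the curious, a checksum-based fixity check is only a few lines of code. The sketch below recomputes a SHA-256 digest for each file and compares it with the value recorded at deposit; the manifest format and file paths are made up for illustration and are not DSpace’s actual checksum checker.

```python
# Minimal fixity-check sketch: recompute each file's checksum and compare
# with the value recorded at deposit. Manifest format and paths are
# illustrative, not any particular repository's implementation.
import hashlib, json, pathlib

def sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def check_fixity(manifest_file):
    """Report any file whose current checksum differs from the stored one."""
    manifest = json.loads(pathlib.Path(manifest_file).read_text())
    for path, expected in manifest.items():   # {"files/item1.pdf": "ab12..."}
        status = "OK" if sha256(path) == expected else "CORRUPTED"
        print(f"{status}\t{path}")

# Run daily from cron, e.g.: check_fixity("checksum_manifest.json")
```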

Comment: Do you have any advice on media? I know about archive quality CDs – you can’t get them everywhere and I know they have a limited life.

Steph: If my colleague Ed was here, he’d jump up and down at the notion of archive quality CDs…

Comment, Kevin Ashley: If you look at lifetime costs of those formats it can be huge, especially if they are more expensive to start with. Buying cheap, switching formats regularly, and taking advantage of cheap technology is so much better than buying into expensive current tech and maintaining it. In exceptional cases there are times you need to do that, but it is rare. In the extreme case of LOCKSS/CLOCKSS, they use inexpensive systems but with mass replication that allows for some or many instances to fail. That’s a very inexpensive way to do digital preservation.

Comment: Is it better to use in-house repository systems (already in place locally) with checksums etc., or to outsource hosting and checking of archives?

Steph: There are pros and cons. With external companies, really check out the company – make sure it has good standing and a good reputation, that you trust it, and that there is some sort of guarantee of what happens if things go wrong.

Following a coffee break (with Tunnocks, though none of them were dancing!), we now have a choice of sessions. In Appleton Tower (room 2ABC) there are two sessions from University of Edinburgh staff: at 10.45 we have Ianthe Sutherland on Collections.ed – Launching the University Collections Online; at 11.30 we have Angela Laurins and Dominic Tate running an Open Journal System Workshop. Here in Informatics we will be hearing a longer session on the Jisc Monitor Pilot Project.

Jisc Monitor Pilot Project: an exploration of how a Jisc managed shared service might support institutions in meeting the post-2014 REF Open Access policy, Brian Mitchell & Owen Stephens, Jisc, Informatics Forum, G.07

Brian: The origins are in the Jisc APC project, which identified key challenges in the management of OA (see case studies online). So we wanted to build upon the research outputs of Jisc APC and on work on UK policies relating to Open Access – HEFCE for instance. And we wanted to explore development of services to help universities monitor policies, including funder policies. Institutions had expressed a need for support, and a role for Jisc, in monitoring ALL publications – not just Gold and Green; in complying with funder mandates – such as licensing or embargoes; in monitoring spend; and in guiding and sharing best practice.

So our outputs will be functioning prototypes mapped to 4 use cases and released as free and open source software by May 2015, robust user feedback, and a number of other components. At the heart of the project we have the Jisc Monitor use cases around:

  • Monitoring all publication activity to ensure compliance with funder mandates
  • Monitoring all publications activity to ensure a clear understanding of what has been published
  • Standards development to enable efficient data exchange
  • Monitoring spend on all items – also looking at invoicing and payment details and whether they can be standardised to interact with other systems such as finance systems.

The project benefits from a really collaborative range of participants and team members spanning a wide range of experience. Collaboration is key to this project. We are taking a user-centred approach to development – everything is shaped around the institutions. We are taking an Agile approach to allow us to be flexible as things change. And the work is open source.

  • Use Case 1: Monitoring all publications – for funders and institutions.
  • Use Case 2: Compliance – this is about bringing clarity to funder requirements and understanding what compliance means.
  • Use Case 3 is around standards development and interoperability.
  • Use Case 4 is about monitoring spend. Ideally we would provide invoice and payment details in a standardised and consistent way, for accurate capture and recording of information. And some transparency is required about standard OA charges.

This work has synergies with the Jisc Pathfinder projects, with RIOXX, the Publications Router, SHERPA, etc. You can find out much more on the Monitor blog, and there is a timeline on the Jisc Collections website. And we have a number of webinars and workshops coming up that you would be very welcome to be part of.

The next steps for us are around user consultation and requirements gathering, with follow-on webinars focused on Publications Activity and Funder Compliance. Prototypes from the use cases will be available in September 2014. Systems interoperability and systems workshops will follow later in the year. Now over to Owen…

Owen: Now, you’ve signed up for a mammoth session here. I’m going to tell you about what we’ve done so far, how you can get involved, and then we want to get you discussing stuff in groups.

So, Brian has already described four strands of activities here. Tracking of publications; assurance of compliance; clarity of charging; interoperability across this information ecosystem. So, what’s our approach to this? We are working in 6 week periods producing working demonstrator software in that time. It’s a very rapid and intensive working period where we try to create working and meaningful stuff. As much as possible we are showing that stuff to you, to the community for feedback. So we are trying to get feedback from the community every two weeks based on wireframes or based on working software.

In the first 6-week period – which we are in now – we are looking at these two aspects: tracking of publication, and assurance of compliance. The interoperability work underpins and overarches this. The APC-type work will come in the second 6-week period.

We spent May–June 2014 planning our approach, having initial discussions about data models – what the data looks like and how it might fit together. And we have been synthesising existing input from the community into a Requirements Catalogue. You can find that list of user stories online; it comes from a lot of existing work in this area. We are aware that we are building on work that has been happening over many years, and we sit alongside many other projects. So we bring together stories from APC, from workshops, from Jisc, from Mimas, from a report on cost of ownership, from a publishers’ workshop in January this year, and also work we’d done when we tendered initially. That can be found at: http://bit.ly/monitor-user-stories/. There are 135 stories, each rated either red, amber or green. Green are things we think Monitor could or should address. Amber could be approached but may not be solved. Red are things that are out of scope – we can’t or shouldn’t do anything about them. And we put our own explanations against each rating.

In July we ran a face-to-face workshop (8th July) reviewing that draft set of requirements – looking for gaps, looking at whether the initial ranking was right. We had great feedback there. We hadn’t prioritised tracking Green OA, as opposed to Gold OA, and we hadn’t prioritised non-STEM material. And the feedback was that non-STEM and Green OA were also high priority. We then held our first online update on 23rd July, as part of an ongoing process. There will be 3 more in the next month and a half.

Some of what we reported at that online update included this diagram on the sources of data we have, and how we can work with it. These include howopenisit.org (looks for textual descriptions of journal licenses from publisher websites); DOAJ and a database of (enhanced) journal/article data from that; and we are also working with PubMed via a lookup of articles by journal ISSN. So we are connecting up that combination of article, journal, journal license… We can dream up elegant technical solutions for which there is no data to support the process – we have to be careful of that. And we have to be careful about what data sources we look at. So we are also thinking about the JournalTOCs service (originally from the Tic Tocs project), which would be a non-STEM source of data. We might also look to Web of Science and Scopus – but let us know what other data sources we should be using if you see gaps there.
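
As an illustration of the PubMed piece, articles can be looked up by journal ISSN through NCBI’s public E-utilities API. This sketch uses the esearch endpoint with its [is] ISSN field tag; paging (retstart) and error handling are omitted for brevity, and this is our illustration rather than Jisc Monitor’s actual code.

```python
# Sketch of "look up articles by journal ISSN" against NCBI E-utilities.
# esearch and the [is] ISSN field tag are real E-utilities features;
# paging and error handling are omitted for brevity.
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def pubmed_ids_for_issn(issn, max_results=20):
    """Return PubMed IDs of articles in the journal with this ISSN."""
    params = urlencode({
        "db": "pubmed",
        "term": f"{issn}[is]",      # [is] = ISSN search field
        "retmode": "json",
        "retmax": max_results,
    })
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
    with urlopen(url) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

print(pubmed_ids_for_issn("1932-6203"))  # PLOS ONE's ISSN, as an example
```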

So, we’ve connected this stuff up, we’ve created a basic Jisc Monitor UI. It is a basic browse and search facility. We are also working on the PubMed data… We have meetings with SHERPA/FACT planned in August, we have meetings with CrossRef, we have been engaging with publishers and looking at the best way to do this across the board. We have spoken to PLoS, and also the JACS group that includes publishers like Ubiquity, some of the new publishers.

Looking at our very basic UI, we can search for articles; we can filter by country of publication, by license, by subject classification, by language code of content, by provider, by deposit policy. Of course the Jisc Publications Router is work we will build on in terms of affiliation of publication. So this is a baseline for published material – material that is published, with declared licenses. But can material in press be included? Can we get data from publishers at the point of submission? That would be useful but there are sensitivities there… not all academics want to advertise where and when they are submitting, and when their work is rejected. But if we could capture a metadata record/identifier very early on that would be very useful. So that’s a challenge for us.

So, how can you tell us what you think of what we are doing? We have online sessions on Wednesdays from 10am to 11am on 6 August, 20 August, 3 September. If you wish to contact us, email Frank Manista (frank.manista@manchester.ac.uk – though that email address may change, contact Brian if that one doesn’t work). We also have a face to face workshop in London on 19 September, and that will look forward to future work on charging, APCs, etc. (http://bit.ly/jisc-monitor-workshop-2).

But you don’t have to wait until then… You can participate now! We’d like you to think about several questions today:

  • What are the key local systems for us to interact with? – what systems are you using, which are relevant, and what is your local set-up?
  • What data do these systems store?
  • What data formats/data models are used?

So we will ask you to break into groups, take a sheet of paper each, and in that group come up with an area of data you think you have – could be publications data, charging data, whatever – and then write down everything involved in terms of systems, data formats or data models… whatever is relevant to you.

We also want to ask you:

  • Monitoring of institutional mandates etc – should we be doing this? And what would we need to do?
  • What kinds of institutional “mandates” are in place? (How do these differ from other mandates?)
  • What kind of compliance and monitoring already takes place? Is this for Jisc Monitor to do?

And do email me: owen@ostephens.com with any thoughts on:

  • Data sources we should consider for article data – what do you use already? What do you need?
  • Data sources we should consider for licensing/terms data
  • What data is relevant as regarding licensing/terms?

Q&A

Q1: Is there a presumption that local institutions will make their systems and data available to the Jisc Monitor?

A1: Not necessarily. It may be about institutions pushing data out rather than us pulling it out. But it may also be about making recommendations about how you would connect payment data to finance systems – rather than a requirement or expectation of that being made. As well as discussions with publishers about how they could support that. It’s not all about direct engagement with data etc. It’s about getting that picture as far as we can at the moment, so we can act on it.

Owen: OK, so now the difficult bit… I am going to suggest that you work in groups of six, go towards the back of the room, to discuss those three first questions on your local systems, data used by those systems, and data formats and data models that they use.

Feedback from groups

Chris, Hull, from the large group: We talked about making a link between the funder and the publication. Not an easy thing to capture. People felt that they have both funder and publication data, but separately. Capturing that data and its relationship in a workflow would be useful. But some data may sit with the publisher… ResearchFish has that information, useful to institutions, but the issue is that it covers many but not all funders right now.

There was a desire to… recognise this as a problem. That we have to get academics to make that link as they are the best placed to do that.

Owen: And is there variety of what is done there?

Chris: Not sure we got that far, but it seems to be ad hoc.

Rapporteur, group at left back: Publications data is all over the place. In repositories – DSpace and Hydra; in CRIS systems – PURE and in-house data; we have institutional web pages that sometimes come from repositories, sometimes from schools or individuals. Some use Primo as a library catalogue – which picks up publications like theses. In some repositories we have articles, exhibitions, etheses, OERs, RDM metadata, and research data sets in some places. In the CRIS we have grant agreements, funder information, funder terms (sometimes patchy). Website data sits in several formats and locations; sometimes rogue individuals maintain their own papers. And we had formats including CERIF.

Owen: Did any of you use publications tools like Symplectic?

Rapporteur: No.

Owen: OK, next group…

George, Strathclyde: We were looking at local systems. We have a really wide variety of systems and data formats. Some use the Jisc APC service, some register APC payments on a CRIS – my own institution uses 4 systems. Some have bespoke systems. And other stakeholders – e.g. the research office – duplicate some work and systems. But we really talked about the APC process, and the really ad hoc approaches taken. The Jisc APC process is simple in theory… but we aren’t sure of the status of that service. A while back there weren’t enough publishers to make it a useful service for institutions – that’s why my institution uses so many systems. And there were concerns about compliance. And a need for greater buy-in from publishers for standard systems.

Pablo, final group: We were looking at funder information as well, trying to answer the question – how to get funder information at an earlier point in the publication process. Our group had PURE, EPrints, several content management systems – some being upgraded at the moment. A highly fractured picture in terms of systems and processes. All people in the group used spreadsheets – either spreadsheets or Google Docs. And these are not standardised. There were also grant-holding problems: institutional grant codes are not always the same as those provided by funders. And also a lack of communication between research officers, repository managers and librarians. And we then moved on to motivation. What is the incentive for researchers here? It seems to be publishers, the area the information will finally come from. But we wondered about information managers, submitting data at publication time. And there is a role for funders to offer training and best practice information on how to encode information. In terms of article submission time… there are too many submission systems… but a common feature across different manuscript submission systems is ORCID… that might be a way in there…

Owen: That’s great. We have to conclude now. I would say that issue of capturing information at the point of submission keeps arising. Thank you so much for your participation today. Do follow the project – we will be blogging, and we encourage you to be part of our webinars and workshops.

And now, with just a brief pause, it’s on to our next session:

IRUS: Does anyone actually use what’s in your repository?, Jo Alcock, Birmingham City University, Informatics Forum, G.07

I work at Evidence Base and was previously a subject librarian. I’m going to talk about the IRUS-UK project. This was funded by Jisc and the role of Evidence Base in this project was in user engagement and evaluation. This project was an outcome of the PIRUS2 project, which looked at whether it was possible to combine publisher and repository usage statistics. But at the time publishers were reluctant to do that. So there was a desire for repositories to work together to combine usage statistics, using the COUNTER standard, in a way that would later be combinable with publisher statistics when they were more ready to come onboard.

We sit in the wider Jisc landscape, under usage statistics, and are doing so in a way interoperable with those other Jisc projects. For IRUS-UK our aims and objectives were to collect raw usage data from UK IRs for all item types within repositories – not just articles, and about downloads not record views. To process that data into COUNTER-compliant statistics. And to return those statistics back to the repository owners.

We considered two scenarios for gathering data: a push “tracker” code, or a pull technique using OAI-PMH harvesting. We decided to go for the push method as it would be easier and would minimise the data pushed. And we created patches for DSpace and plugins for EPrints, and last month we welcomed our first Fedora user. The ingest process is thoroughly documented on the IRUS website. The key aspect is that we applied a code of practice to filter out robots and double clicks; we’ve also tried to remove noise including user agents, overactive IPs etc. But that’s a minimum level. We commissioned Information Power to investigate and report on that process. They analysed raw data going back to July 2012. They found that suspicious behaviour can’t necessarily be judged on the basis of one day’s usage, and that it can be almost impossible to distinguish non-genuine activity. So we have amended our processes to improve ingest as a result of these comments.
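
To show what double-click filtering means in practice, here is a toy sketch: repeat requests for the same item from the same client within a short window are collapsed into one count. The 30-second window and the (IP, user agent, item) key are illustrative assumptions; the real COUNTER rules and the IRUS-UK ingest (robot lists, further exclusions) are more involved.

```python
# Toy sketch of COUNTER-style double-click filtering: repeat requests for
# the same item by the same client within a short window count once.
# The 30-second window and (ip, user_agent, item) key are illustrative.
WINDOW_SECONDS = 30

def filter_double_clicks(events):
    """events: iterable of (timestamp, ip, user_agent, item_id), time-ordered."""
    last_seen = {}
    counted = []
    for ts, ip, ua, item in events:
        key = (ip, ua, item)
        if key in last_seen and ts - last_seen[key] <= WINDOW_SECONDS:
            last_seen[key] = ts          # still a double-click run; don't count
            continue
        last_seen[key] = ts
        counted.append((ts, ip, ua, item))
    return counted

raw = [(0, "10.0.0.1", "Firefox", "item42"),
       (10, "10.0.0.1", "Firefox", "item42"),   # double-click: filtered out
       (120, "10.0.0.1", "Firefox", "item42")]  # outside window: counted again
print(len(filter_double_clicks(raw)))  # -> 2
```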

So, here is the interface. You access this through Shibboleth or OpenAthens. We can look at totals to date, and we can drill down into the participating repositories’ data. The total numbers have to be understood as reflecting length of time in IRUS, not just total number of downloads. We can also drill down by repository ItemType – we map repository types to IRUS ItemTypes. But we retain the original metadata, so if the mapping changes we can re-map in the future, retrospectively.

Another statistic we can look at is DOI summary statistics – how many of the items that have been downloaded, by item type, have a DOI. We can also look at article DOIs by repository – a great report for repository managers to look at, to see the coverage of DOIs for repository items. You can also do a search of IRUS-UK by title, author or keyword. This allows you to see which repository or repositories an article is in, and the number of downloads. This could be particularly useful for researchers to track usage, say after a talk. Because so many people are interested in the ingest and filtering process, we also include ingest summary statistics. We record the repository, the RawDataIn count, the COUNTER robot exclusions, and the IRUS-UK exclusions, which measure what else is removed. And then you can see DoubleClicks and FilteredOut totals for each repository.

Now on to reports specific to a particular repository – the ones a repository manager is likely to be most interested in. We are working with our community advisory group, as we want to be very user led, and we are reviewing our reports. Right now they are called IRUS Report 1, 2 or 3, but those names may change. The functionality will stay though. So, the IR1 report can be used for SCONUL reporting and looks at article downloads. The IR2 report is about item types, the performance of different types. And we have the ETD1 report – we did quite a lot of work to capture usage of dissertations and theses. You can also look at item type. We are currently working on EThOS integration; this will combine EThOS and repository data, to allow tracking of usage and downloads. We are working on this with the British Library at the moment, but this is the same system/approach we could use with publishers.

Repository Report 1 gives you month by month downloads by repository, to enable benchmarking. Similarly you can do this by article type (Repository Report 2) or by Jisc Band (Repository Report 3), or by country/region (Repository Report 4). We also have a proof of concept, CAR1 Report, for consolidated article report – where repository and publisher data can be combined. At the demonstration stage (only) at the moment.

So that’s the reporting within IRUS-UK. But this project is all about community. There are a growing number of repositories participating in IRUS-UK – currently 64 repositories. We communicate through the IRUS-UK mailing list; the @IRUSNEWS Twitter account; the IRUS-UK newsletter – you can subscribe or read online; and IRUS-UK webinars. We gather feedback from participating repositories via surveys and conversations. And we have a Community Advisory Group who provide feedback to the IRUS-UK project; they try new reports, and we meet with them (virtually) regularly.

We conducted an IRUS-UK user survey in 2014. 68% reported that IRUS-UK has improved statistical reporting; 66% reported that IRUS-UK saves time collecting statistics; even those who used other statistics felt that IRUS-UK had enabled new or improved reporting. 86% reported using it for benchmarking, the rest weren’t sure if they would – no one said they did not intend to use it in benchmarking. The respondents liked that it is COUNTER compliant, that it is easy to set up and use.

So, if you are not currently a member, please do get in touch and we will let you know if we currently support your software, and what the next stage would be. You can contact us via irus@mimas.ac.uk or take a look at our website: http://irus.mimas.ac.uk/

Q&A

Q1: We have a CRIS system and web portal. Can you add this system to those tools?

A1: We allow it, but we don’t currently have plugins for CRIS systems. Some institutions have mirrored repositories; every set-up is different really. If your CRIS system integrates with EPrints, DSpace or Fedora then you can use it, but if not we can’t yet support you. But we are investigating options.

Q2: You mentioned Google Analytics. I cache a lot of information overnight… I want to understand the user journey around the repository, the website, and back and forth. Has anyone yet looked at using IRUS-UK stats to improve the user experience?

A2: Not yet, but that would be really interesting.

Q3: Something that came up in Claire’s DSpace session yesterday is that we need to get the IRUS-UK patch into the DSpace codebase – it’s a patch at present but it’s buggy. So is this sustainable?

A3: We do try to work with DSpace and EPrints to make sure we are keeping up with new versions, but there is a bit of delay there sometimes as we do work independently.

Q4: We run a data repository… I think you measure one item per download… but for some items we have a number of different files.

A4: Right now we register each download, so we would record multiple downloads for that item. We also have a similar issue with versions… if there are pre- or post-publication versions… they can be in separate records, or one can be a replacement of the file… there are so many approaches… So we are looking at how best to deal with that, and at what people would want in their statistics.

Q4: I think that means it would take a while for us to use for benchmarking…

A4: Yes, well right now we are focused on institutional repositories and publication outputs. Teaching and learning repositories see very different usage, and data repositories also have very different usage patterns. They are out of scope for IRUS-UK at present because of that issue with usage styles and, therefore, benchmarking.

And now, a break for lunch… 

The Open to Open Access (O2OA) project, Miggie Pickton, Research Support Librarian, University of Northampton, Informatics Forum, G.07

I’m going to be presenting on another of the Open Access Good Practice Pathfinder projects. You will see some similarities between our project and others that presented yesterday. We’ve also been up and running for only two months now, so much of this will be about what we plan to do.

The O2OA project aims to establish shared institutional processes for facilitating, promoting and managing open access, very much driven by HEFCE and RCUK requirements. We are a consortium of three post-92 “modern” universities – Coventry, De Montfort, and the University of Northampton. We are all at different stages in implementing OA policy and services. And we need to embed OA in our existing workflows, without additional resources.

So, Coventry are at an early stage in implementing an OA culture across the institution. They host the repository CURVE, using EQUELLA, and they have specialist expertise in research impact management – they ran a previous Jisc project – so they are taking the lead on research impact management. De Montfort University, Leicester have an existing repository and CRIS, and are leading on CRIS. And at Northampton we have a digital repository, NECTAR, using EPrints. We have focused on research data management recently and are currently developing policy and process for OA publishing, so we will take the project lead on OA publications. We have an internal project partner – the Institute of Health and Wellbeing – because I wanted them to have a stake in the work, for there to be genuine researcher voices in the project. So, our team overall brings together many perspectives, which should be really interesting.

There are some unique features to O2OA: we are all Midlands based and all post-92; we include a business development link to the OA agenda and applied research; we have a real focus on impact; and this is very much a non-technical project. In terms of benefits for the sector, we will be providing a consolidated review of the OA needs of academics, information managers, research support staff, corporate leads and external funders. We will provide an understanding of perceived and actual relationships between OA publications, OA data and impact, a translation of OA needs into associated workflows, and we should inform on methods to adapt repository systems and address interoperability issues. We will be creating case studies and really engaging with what is needed for behaviour change – one of our participants has a psychology background and experience in behaviour change best practice that will be part of this.

So, to date we had a presentation to the Jisc Open Access Good Practice Workshop (17th June). We have a project plan. We have a project meeting at Coventry. We are working on focus groups, a survey is under discussion and a review of OA guidelines. And that’s where we are right now. And thank you to our funders: Jisc, SCONUL, UK ARMA, RLUK.

Q&A

Q1: A question about your psychologist: is she an academic or was this about using domain expertise of library staff?

A1: She is an academic, by training and by bent. So she is bringing her theoretical and academic knowledge here. She has blogged about her work, she talks about planned behaviour as an approach for instance. Understanding people’s intentions and actions will be really valuable for us.

HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practice, Chris Awre, University of Hull, Informatics Forum, G.07

Imagine yourself on a Hawaiian island…

We are a project which, like Miggie’s project, has a regional dimension. But there is a common thing that brought us together: a desire to find out how open access could support research development, in terms of research as an organisational strategy. Each of our institutions does research, at different levels, with different approaches. But the dissemination of research seems to have little connection back to organisational research strategy. That’s partly because dissemination is often left down to the individual. But funders and policies talk about dissemination and impact having a role in research strategy. So we are looking at this from a research strategy perspective. The aim is to engage with researchers in our institutions, and to engage with the research policy process, to embed OA where it isn’t already embedded. There is recognition of the value and necessity of open access in institutional policy but… it feels like someone told us to do it, or “the library was very insistent and made its case well”, and it is less clear that there is an understanding of the strategic benefits.

So we need to establish a baseline, and see what changes, what impact our project can have. There is another project in the programme, at Northampton, doing similar work, so we are keen to share experience and present a more combined message in terms of dissemination out of the (pathfinder) projects.

We have 6 objectives:

  1. To establish a baseline starting point
  2. To communicate the policy landscape internally and understand local research strategy/policy – not just throwing policies at our researchers, but helping them understand why those policies are there and matter.
  3. In that context, to review and define options for open access service development
  4. To enhance local systems to serve OA needs and embed external services – and part of that Jisc Monitor discussion seems useful.
  5. To monitor the relationship between OA and research developments within the institutions
  6. To report and reflect work to the community

We are being quite technical in our approach. Two of our participants use EPrints – one hosted, one locally hosted – whilst we at Hull use Hydra with Fedora. We need to make sure all are fit to work with those elements that are relevant, including Jisc Monitor. A significant part of this project is about developing relationships and better understanding our different research environments, getting to know our open access environments better, and building links. And we will be working with Jisc Monitor, IRUS-UK and the British Library on licence rights. The BL has its own issues around theses – many institutions do not express the licence for theses, and the BL is keen to find a standard way to get them to express that, so we will be looking at how that could be facilitated (while recognising existing Jisc Monitor work there too).

Those objectives are effectively the work packages. Hull leads 1 and 5, Lincoln leads 2, Huddersfield leads 3, and combinations of us lead the other work packages. We have a website and Twitter account, so do engage with us and track the project there. From now until September we are undertaking our baseline and planning work; from now until December we will be investigating and collaborating with services outside universities with relevance to open access, and looking at the OA services that best meet research and policy needs. So the idea is that we get up and running in the first year. Then, in April 2015, we want to run an event reflecting our work back to you, the community, and to see what else would be useful, what other areas we should be looking at in the second year of the project. The project also brings together specific innovation projects at each institution. We will be monitoring and communicating policy as it evolves – although a pause in that evolution would give us useful time to reflect.

So we hope to demonstrate the link between OA and research strategy and development, and the development of tools and services that facilitate these aspects.

Q&A

Q1: Can you share some of your ideas about OA and research and impact, what indicators will you be looking for?

A1: Quite often research planning takes place. Research strategy – and institutional KPIs – really focus on increasing research income. So you take that statement and you turn it into staffing, resource etc. We have an office geared up to supporting that KPI. In many ways it strikes me that that in itself has the potential to increase the research development of the place, as a place to do research. You can fund as much research as you like, but without understanding and exploiting the impact of what comes out of that research in your strategy, it’s not necessarily useful. If we can show the link between research outputs and dissemination, with OA as part of it, and how that can feed into research planning, that will be really useful. It’s about seeing the institution as a place to do research, not necessarily as a place to attract research income.

Preparing for the UK Research Data Registry and Discovery Service, Alex Ball, University of Bath, Informatics Forum, G.07

I am part of the team developing the UK Research Data Registry and Discovery Service, here at the DCC, on behalf of Jisc. It’s not developed yet but I’ll talk about what we hope to achieve, what has been done so far, and what you, as repositories, can do to prepare for it. Most of us on the team are from the DCC, but we also have participation from the UK Data Archive.

So our service is a bit of a Ronseal – it does what it says on the tin! So, it’s about UK research data. Not about e-prints or OER, or universities’ administrative data. It’s about collections of evidence that underpin written scholarly outputs. If we are interested in those other aspects, it’s only in relation to that central form of data. For some institutions there will be data archives; for some we will need to extract the research data aspect from a larger repository.

It is worth noting that it is a Discovery Service for UK research data – we won’t hold all of that data, we will be about discovering it. We can’t expect researchers to inspect all of the data repositories individually, so the service will remove that barrier. And our hope is for data sharing and data reuse – with appropriate credit for researchers, better impact and value for the funder, and a better record and evidence basis for that research.

Now, we are not the first people to think of this. But none of the existing services have the same focus we do. We think we will slot in and complement the existing landscape. We will be collecting data from repositories, unifying and aggregating it, and eventually we’d like to make those records available in the other places people may look, including search engines, international aggregators, and other UK registries – including the RCUK Research Gateway.

We have now completed phase 1, in which we built a registry based on the ORCA software – used in Research Data Australia. We knew it was working well there and we had experience of making that software more portable. We think it works well as it’s search-engine friendly – it works well with Google for instance – and it provides citation and rights information up front, promoting the idea of research data as a first-class scholarly output. For this pilot phase we needed volunteers to help us understand their requirements. We had 9 universities engaged, alongside the UKDA, the Archaeology Data Service, and 7 NERC Data Centres (via their catalogue service).

The ORCA software is based around the Australian RIF-CS format, so we created crosswalks for standards already in use by those repositories: DataCite, DDI Codebook, EPrints (native and ReCollect), MODS, OAI-PMH Dublin Core, and UK Gemini 2 (used by the NERC data catalogue service and data.gov.uk, and compatible with the ISO standard used internationally under the EU INSPIRE directive).
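
At its simplest, a crosswalk is just a field mapping. This sketch maps a harvested Dublin Core record onto RIF-CS-style collection fields; the field choices are illustrative assumptions, and the project’s real crosswalks are XSLT transforms against the full schemas.

```python
# Simplified crosswalk sketch: map an OAI-PMH Dublin Core record onto
# RIF-CS-style collection fields. Illustrative only; the real crosswalks
# are XSLT against the full schemas.
def dc_to_rifcs(dc):
    """dc: dict of Dublin Core elements, each a list of values."""
    first = lambda key: (dc.get(key) or [""])[0]
    return {
        "type": "collection/dataset",
        "name": first("title"),
        "description": first("description"),
        "identifier": first("identifier"),      # ideally a DOI or other PID
        "originatingSource": first("publisher"),
        "subjects": dc.get("subject", []),
        "relatedParties": dc.get("creator", []),
        "rights": first("rights"),
        "dates": dc.get("date", []),
    }

record = {"title": ["River flow measurements 2010-2013"],
          "creator": ["Smith, J."],
          "identifier": ["doi:10.1234/example"],
          "subject": ["hydrology"]}
print(dc_to_rifcs(record))
```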

So, we created a pilot. It sits at http://rdrds.cloudapp.net/. Now, before you go there, be aware that you won’t see much… records automatically sit in a holding zone until they are made public – and I’m not sure any of the pilot data has been made public yet. But if we look at an example we see the metadata, a sample citation, identifiers, and additional metadata. The metadata schema here doesn’t include everything, so the link to further metadata in the source record is essential for researchers needing more information on e.g. use. Access rights are shown, records are connected to people, and you can look at subject terms – and navigate to similar content through those. You can also see suggested links – both within the registry and outside (in DataCite). Spatial coverage and tags are also shown. So it’s a start…

But we need to clarify use cases and workflows. We had a good number of use cases, and we’ve had a recent workshop we are still looking back at. We also want to compare different possible platforms for the service and assess their suitability – we want the best possible system. We want to establish a working instance of the system, involving all UK data centres and university data repositories if we can. We also want to establish a simple workflow for adding new data sources, and to adapt to changes in the existing data sources to avoid duplication – whether by merging or by ranking/preferring one record over another. We also need to test usability – for end users but also for local administrators. Finally, some important documentation aspects – recommendations for quality and standardisation of metadata records. And we need to evaluate the costs and benefits of the system.

So we have a long way to go, and two years to get there. We are already being asked what this means for repositories. We think it comes down to metadata, syndication and participation – please do get involved in this participation phase.

In terms of metadata, our pilot work suggests we need these fields:

  • title
  • description/abstract
  • dataset identifier – for de-duplication management, so it needs to be persistent, and global if it can be
  • subject – for that click-through functionality and recommendations; we found many repositories using the RCUK classification
  • URL of landing page – this is a discovery service, so you need access to additional metadata and the data itself, but this should be derivable from the dataset identifier
  • creator (+ID) – some issues in the pilot phase around consistency of format, so an ID would be particularly useful too, especially for avoiding duplication or fracturing of records
  • release date
  • rights information
  • spatial coverage – particularly important, and needs to be in a nicely structured format
  • temporal coverage
  • publisher

Between them these fields cover all the elements required by DataCite – so it’s a way to kill two birds with one stone.

In terms of syndication, we want to do this through OAI-PMH, CSW, Atom/RSS – useful for SWORD perhaps – or other XML export over HTTP. Whatever we do, we need to be able to separate datasets from the other material you hold. If you have a separate data repository that will be easy. If not, then we need the type “dataset”, or we can draw on specific sets or collections (which is actually what we do with DataCite). And the more metadata detail, the better.
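
A repository-side sketch of that syndication route: harvest Dublin Core records over OAI-PMH and keep only those typed as datasets. The endpoint URL is a placeholder, and resumption-token paging is omitted for brevity.

```python
# Sketch of harvesting Dublin Core records over OAI-PMH and keeping only
# those with dc:type "dataset". Resumption tokens (paging) are omitted.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_datasets(base_url):
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    tree = ET.parse(urlopen(url))
    for record in tree.iter(OAI + "record"):
        types = [t.text or "" for t in record.iter(DC + "type")]
        if any(t.strip().lower() == "dataset" for t in types):
            title = record.find(".//" + DC + "title")
            yield title.text if title is not None else "(untitled)"

# Placeholder endpoint: substitute your repository's OAI-PMH base URL.
for title in harvest_datasets("https://repository.example.ac.uk/oai"):
    print(title)
```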

There are several ways to get involved. You can let us know about issues that concern you – in phase one a contributor was keen to stress the importance of author names in the citation, so we have ensured that we encode that information in the XML, to preserve that level of detail. If you have data records, do let us know so that we can have a look and try to include them. But we’d like you to do more – test the evolving system for us by setting up and updating an account on the system; harvest your metadata into the system and check it; see if we handle duplicates (and non-duplicates) correctly; see how your records look on the system; see how easy they are to find; and measure the visibility of your datasets before and after inclusion.

Do get involved, and do track project progress at: http://www.dcc.ac.uk/projects/research-data-registry-pilot/.

Q&A

Q1: Is there a way to see the software used to create the metadata? And I was also wondering about the spatial data issue and shared equipment.

A1: That hasn’t occurred so far, but I can see shared equipment data raising that use case.

Q2: Are you looking to expand the metadata that you have?

A2: We are right at the beginning so still deciding some aspects. ORCA has a set number of fields. We could ask for additional fields to be added. We can make relationships to generalised objects. The metadata scheme you mentioned, we did look at. And in another system we could adopt the DataCite schema. There are lots of possibilities.

Comment, Kevin: It’s worth emphasising that one of our main guiding principles is to require as little additional work as possible from participants, unless it delivers some use for them. So whilst we can think of lots of metadata that might help, right now we don’t have evidence of how that metadata might benefit discovery. We want best value from what metadata exists, and only then will we think of what more could be added.

Q3: On the issue of metadata, the first thing for a service is to see what metadata people need now, rather than clutter it up with more. But at the same time I wonder whether you have thought about including the funder behind the work as a metadata field, as well as the owner of the data – and reuse around that. Because being able to search across data repositories as a funder would be very useful.

A3: On the point of owners, a point of contact for the data might be less controversial, and fits the ISO standard we are working with. We did consider funder in phase 1. We had an issue with the RIF-CS schema around identifiers: if you have a funder, you need a project identifier. The data we ingested sometimes had both elements, but not all of it had a project identifier. There are ways to overcome that though, as they have in Australia.

And now, coffee… 

ArchivesSpace, Scott Renton, University of Edinburgh, Informatics Forum, G.07

I’d like to talk about ArchivesSpace, our new archive management tool. And first, a bit about archives at Edinburgh. They sit within the Centre for Research Collections, categorised under Special Collections and Archives. We have a wide and varied selection of archives: the Laing Archive – a weird and vast collection!; the ECA archive – a highlight being the contract for, and a life portrait of, a young Sean Connery; the Carmichael Watson archive; the Roslin Archive; the NHS Lothian Archive; and the Godfrey Thomson Psychology Archive.

Current management uses the ISAD(G) standard and the EAD (XML) schema, as laid down by the GASHE and NAHSTE projects. As happens so often, commercial systems don’t always do what you want, so you end up with bespoke options – most recently a system called CMSyst, built by Grant Butters, an archivist developers hate – great on both the archives and the technology! The system is robust in terms of access and authority control, but built with MySQL/PHP. And we also have data feeds to the Archives Hub, ARCHON, and the Carmichael Watson site (EDINA).

We had a new archivist coming in, so we wanted to pick a system. DSpace was an option, but not a great fit. Vernon was great for clearly defined objects, but there was no way it could cope with boxes of stuff, collections within collections. We looked at two commercial systems, Calm and Adlib – now merged – but they were not good enough to justify the cost. Then we looked at Archivists’ Toolkit – brilliant… but no front end! ICA-AtoM looked good, but lacked much of the functionality we needed. And so that took us to ArchivesSpace – which has the right mix for us.

Now, I was being a bit dismissive about Archivists’ Toolkit. It was the predecessor to ArchivesSpace and was trialled extensively: structured to work with the EAD and ISAD(G) standards, a MySQL database, good authority control. But no front end – which meant we needed to wait, or build one. Hence ArchivesSpace.

ArchivesSpace has all the archivist functionality in one place, it’s web delivered, it’s open source, and it’s built by the community, with Lyrasis as its organisational home. It is a MySQL database running under a web server (often Jetty; we use Tomcat); the code is developed in JRuby and available through GitHub; and there are four web apps for search, management, etc. We had all the data from CMSyst and exported it in EAD. There were a few issues, but we are more or less there now. We also have the functionality in ArchivesSpace to link out to illustrative digital objects. We already have the Luna system for a number of more random items, often separate from collections, so we are and will be making that link.
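As a flavour of what that architecture gives you programmatically, the sketch below authenticates against the ArchivesSpace backend REST API and lists repositories. Host, port and credentials are placeholders, and the endpoints follow the standard backend conventions as I understand them – check the documentation for the version you run:

```python
# Sketch: query the ArchivesSpace backend REST API (host and credentials
# are placeholders; endpoints follow the standard backend conventions).
import requests

BACKEND = "http://localhost:8089"  # default backend port

# Log in to obtain a session token...
login = requests.post(f"{BACKEND}/users/admin/login",
                      params={"password": "admin"})
login.raise_for_status()
token = login.json()["session"]

# ...and send it with every subsequent request.
headers = {"X-ArchivesSpace-Session": token}
for repo in requests.get(f"{BACKEND}/repositories", headers=headers).json():
    print(repo["repo_code"], "-", repo["name"])
```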

The system is really popular in America, and we are the first European institution to take it. There are Charter Members (54) – they are the executive – and there are 100+ General Members who can contribute, now including Auckland and Hong Kong. So significant and large are the archives using the system that we are almost second tier in terms of collection size; it’s a really significant uptake.

My colleague Ianthe was talking about collections.ed earlier. ArchivesSpace will effectively be the expression of archives within it. It needs to be branded seamlessly through the CSS – once that is all set up some things may work better in ArchivesSpace itself – and we need to surface the archives as collection-level descriptions (CLDs). We have had to set up crosswalks and deal with duplications. There are thousands of collections, but we think in the next few months we’ll see it all working together.

This has been a really collaborative project with curators, archivists, projects and innovations staff and digital developers all involved. Good training delivered to archivists, and there is a manual created collaboratively by the teams. And good collaboration with other institutions, and interns here.

So, next steps… colleagues will be at conferences in Madrid and Newcastle. And there is a sister application for museums called CollectionSpace, which is a possibility for our museums. But we wanted to mention ArchivesSpace today because we are really happy with it. Do tell any archivists you know who are looking for a system to get in touch with us.

You will find more via our collections portal: http://collections.ed.ac.uk/

Q&A

Q1: Is this intended to run as a catalogue or, as you begin to archive digital artefacts, will it also store those – or will they be elsewhere, like Luna for images?

A1: It is a flexible module but at the moment it’s focused on the catalogue/management aspects. But I know Grant Butters is thinking about digital artefacts.

Q2: Statements?

A2: Everything is effectively publicly available.

Q3: Has there been much consideration of making metadata available through other discovery services?

A3: The Archives Hub has been talked about so far. That’s the main place for archives across Britain at the moment.

Q3: I’ve seen metadata from Archives feeding into institutional searches.

A3: I think we are trying to do that with CLDs, but I’m not sure it will get down to that level. Still a few things to conquer here really.

Developer Challenge Presentation – Ianthe Sutherland

Ianthe: We have two awesome entries to the developer challenge. We will hear from “Are We There Yet” and “Repository Linter”. Then we shall cheer and see who takes what prizes!

Are We There Yetttt? – Miggie Picton, Marta Ribeiro, Adam Field

Miggie: This was designed to solve a problem found in multiple repositories, arising from HEFCE’s requirement for authors to log a paper at submission, before it is accepted for publication. So I asked for a tool to alert us when a paper sent in for publication is actually published. That was what I asked for!

Magda: The user is presented with a form for title, author, and an email. And when they submit… the system goes searching…

Adam: We decided that once something has a DOI, we can be confident that it has been published. So we take the metadata, send it to CrossRef, search for suitable DOIs, and seek out an exact match for title and creator. If we see that, we grab the date first seen and alert the user that something was found at this DOI. It is then the librarian’s responsibility to check that. Obviously this system would benefit from more metadata, but we’ve kept it simple…

Magda: So when results are in you can click through to view the record. You can then mark that DOI as correct, or as incorrect. If it is correct the date is logged. You can have more than one correct DOI.

Adam: The other feature is that, whilst you are waiting to find the paper you get the message “we’ll get there when we get there!!!”

Magda: So we’d develop this further with a cron job so the user doesn’t have to sit and wait for the result.
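As a rough sketch of the matching step Adam describes – using the public CrossRef REST API, with a deliberately simplified stand-in for the team’s own matching logic (the query below is a placeholder):

```python
# Sketch of the "has it been published yet?" check: query CrossRef by
# title and author, then look for an exact title match (a simplified
# stand-in for the team's matching logic).
import requests

def find_doi(title, author):
    resp = requests.get("https://api.crossref.org/works", params={
        "query.title": title,
        "query.author": author,
        "rows": 5,
    })
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        candidate = (item.get("title") or [""])[0]
        if candidate.strip().lower() == title.strip().lower():
            return item["DOI"]  # a DOI exists, so we treat it as published
    return None  # "we'll get there when we get there!"

print(find_doi("An example article title", "Smith"))  # placeholder query
```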

Repository Linter – Richard Wincewicz, Paul Mucur and Rory McNicholl

Richard: We took a while establishing our idea. We came up with a tool to check the completeness of your repository records and help you fill in the gaps, through a web service and a plugin.

Paul: Part of this is a web service for entering a record for the repository… it will spot gaps and suggest replacements from services like SHERPA and CrossRef. To demonstrate this John? has created a demonstrator integration for EPrints.

John?: So you take a record… it sends the data to Paul’s service, which looks at where the gaps in the metadata are and suggests replacements to fill those gaps. As you confirm details it filters down the suggestions so that data is gradually filled in. So, imagine you’ve been handed only a small piece of paper… no other metadata. Hmm… but there is a DOI that seems to match… let’s have a look… who are the authors, who are the funders… So we are almost there… And, in theory, if we try to add another author, new projects will be suggested based on that author. But that’s just a tiny part of what you could do for your repository – there are more powerful options there; imagine the possibilities!
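A much-reduced, hypothetical sketch of that gap-filling idea – not the team’s code – might look like this: given a partial record that at least has a DOI, pull the CrossRef record and propose values for whatever is missing:

```python
# Hypothetical, much-reduced sketch of the Repository Linter idea:
# fetch the CrossRef record for a DOI and suggest values for missing fields.
import requests

def suggest_gaps(record):
    work = requests.get(
        f"https://api.crossref.org/works/{record['doi']}"
    ).json()["message"]
    suggestions = {}
    if not record.get("title"):
        suggestions["title"] = (work.get("title") or [None])[0]
    if not record.get("authors"):
        suggestions["authors"] = [
            f"{a.get('given', '')} {a.get('family', '')}".strip()
            for a in work.get("author", [])
        ]
    if not record.get("funders"):
        suggestions["funders"] = [f["name"] for f in work.get("funder", [])]
    return suggestions  # the user confirms or rejects each suggestion

print(suggest_gaps({"doi": "10.1000/xyz123"}))  # placeholder DOI
```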

Prize Bit

Ianthe: We have prizes thanks to our lovely sponsor the Software Sustainability Institute, and Neil Chue Hong is going to quickly tell us a bit about what they do…

Neil: The Software Sustainability Institute wants better code to be available for researchers. We engage with events like Repository Fringe because we think code expands beyond software engineering. A couple of reasons: we want to support and develop people like these developers. But also because we need software to be preserved – particularly when it is linked to papers. We want to arrange a hack event for one-click solutions to archive software to repositories like DSpace and EPrints.

Cue many cheers for both! The first prize goes to Repository Linter, but there are prizes for both teams from our lovely sponsor, the Software Sustainability Institute.

CLOSING REMARKS, Peter Burnhill, EDINA, Informatics Forum, G.07

EDINA are one of a triumvirate of Information Services organisations – with DCC and the University of Edinburgh – behind this event. And I’d like to start by thanking all of you for coming along, for speaking, and I’m sure we’d like to thank all of those who made today’s event possible. We are about innovation and creativity, as in the presentations we’ve just had.

So, looking over the last few days… yesterday was rather dominated by open access – that is part of the threat of the REF, I suspect. But almost 50% of you are new to the Repository Fringe; there is change, and our group is broadening out beyond the early adopters of a few years back. When colleagues across the world talk about the repository, they are not as obsessed by OA as we are; they think beyond Gold, Green, the Finch Report… those are important for adoption but they are not everything. So to really realise the potential of repositories there is a real need to understand the importance of research information management – and I think that penny has dropped – and bounced a few times too!

We now have connections between four separate areas: the publishers; the library and related areas of institutions; research awards and funders; and the academics and authors. I think both of the Challenge entries we saw this afternoon illustrated many of the challenges still here – the issue of incomplete, partial metadata, and the need to get the best that you can and select the best available data. And the first presentation was so important for illustrating what I call the “celebration of the purchase order”. So, now we have Gold, we are supposed to buy into a service – we should issue a purchase order for that service. But you don’t pay before the job is done. That first Challenge entry was about logging that purchase order in a way, with the DOI returned letting you pay your bill (look out for that bestseller novel: “The PI Worrier and the Bag of Gold”).

Day two has been about preservation – a flood of tweets there – on research data, and the idea of special collections: these other aspects of repositories and concerns for the long term. If we look back to OR2014 you can see that these ideas are definitely knocking about more widely.

Anyway, what we do is about ease and continuity of access, and repositories support that endeavour. Both elements are important. And it’s not just the PDF of the research article… The aim is to ensure that RepoFringe continues to celebrate and support the developer, but always with an eye on policy and purpose.

Big thanks to all who came from near and far, to all the organisers for today.

Martin Donnelly, DCC: So that’s it for this year. We want this to be a great event, and to make next year a great event we need your feedback, your comments, your ideas on how to make Repository Fringe 2015 even better! So email us, comment on the blog, tweet us! And finally I’d like to particularly thank Dominic Tate (UoE), Laura ? (UoE), and Lorna Brown (DCC) for making the last two days happen.

Thank you to all of you who came along, and who have been following on the blog and on Twitter. If you have any feedback for us do comment here, email us, or tweet us… but there is also an official survey that you can complete. And keep an eye on the blog for follow-up posts, links to your posts about the event, and we’ll try to add our pictures into these liveblogs too!

Repository Fringe 2014 LiveBlog – Day One

It’s finally here! Repository Fringe 2014 is live, with two busy days of talks, discussions and our Developer Challenge. We will be tweeting – follow #rfringe14 for all updates – as well as liveblogging the event. We are also taking images and videoing some of our talks, and these will be uploaded and made available here on the website shortly after the event.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Welcome to Edinburgh, Kevin Ashley, DCC, Informatics Forum, G.07

Welcome to Edinburgh! I’m Kevin Ashley, Director of the Digital Curation Centre, one of three organisations – along with EDINA and the University of Edinburgh – that fund and make Repository Fringe possible. I know this is our biggest Repository Fringe ever! This is something of an unconference event, we try to adapt the event to the needs of the participants – and I’ll say a bit more about that shortly.

Coming back to the origins of the event. Back when I first got involved in repositories, back when I was at the University of London, we heard about this project at MIT called DSpace… as it happens our university moved away from DSpace and into EPrints. But for a long time we spent a lot of time looking at managing papers and collections of papers. And whilst that is part of what we focus on at Repository Fringe, we also like to think about what else we need to store – collections of tweets, multimedia… and we think about what changes need to happen to move things forward, and what barriers need to be overcome to do that.

One way to do that is the Developer Challenge. You have until lunchtime today to register for that. You’ll have 24 hours to solve a particular repository problem. Even if you are not a developer you can contribute by feeding in ideas, also by lunchtime.

Of course we know there is also the Edinburgh Fringe taking place – and just like the Fringe, Repository Fringe takes place across several venues this year! We are mainly here in Informatics, in Appleton Tower, and some events are in the Main Library just across George Square.

One further thing about the Repository Fringe – we really want to open up the discussion about what the Repository Fringe should look like in the future. Do come and chat with us, let us know your feedback and ideas.

And now, to introduce our opening keynote speaker Yvonne Budden, who has worked in libraries since she graduated in 2002. She has worked on the WRAPed project and currently works at the University of Warwick spanning repositories, scholarly communications, data metrics around repositories, etc. She is also chair of UKCoRR and sits on the Jisc Scholarly Communications Advisory Group. So, over to Yvonne.

The revolution has been cancelled: the current state of UK open access, Yvonne Budden, University of Warwick, Informatics Forum, G.07

When I was asked to speak I was told to be controversial, to get things off to a rousing start. Now, I may not be as rousing as my picture of Lenin here but I do hope to trigger a revolution by the end of my talk! By which I mean a radical change.

I was reading an article by Richard Poynder (2014) the other day, answering questions on where we are with repositories and what still needs to be done. He said:

“For their part, publishers tend to assume that OA advocates are freeloaders or – as ACS’s Rudy Baum appeared to imply in 2004 – dangerous socialists”

Now, when I was reading that I was chasing and paying invoices to publishers, awaiting a new HEFCE REF, but nonetheless I liked the idea of being a dangerous socialist! Are we stalling in the way we are thinking? So I got to thinking about revolutions, about movements. I looked up some definitions and have picked three…

1. A wide-reaching change in affairs

So, this year we had another significant anniversary: the 25th anniversary of the invention of the web. The web has been a huge change for people, but it has also become synonymous with the idea of freedom of knowledge – also associated with Gutenberg’s press and the history of publication. Communication is speedy and international, information is readily available – good and bad. And people’s expectations have begun to change in this environment. People expect to be able to get this information. People expect to be able to Google something and find an answer. So are we, in scholarly communications, making the most of that opportunity, those advantages, and those expectations? I think this will be vital for researchers who are time poor, but these tools will also help us to do our job and support researchers better, to automate part of our work.

We had Stevan Harnad’s call to arms in 1994. We saw huge amounts of action and advocacy around open access. But it has been about fixing our problems – about overcoming access issues. Is it really a people’s revolution? When you get a collective group of people talking about something, you can do anything. People power is important. But an issue we have in the open access world is: is what I think of as open access the same thing that you think of as open access? What is green open access? What is gold open access? I heard “diamond” open access the other day! If we don’t understand, how can we expect our researchers to? These are complex and difficult definitions. It is an international movement, and yet everyone looks at the issue from their own environment, their own local context. That makes sense – we can’t change things beyond our immediate area – but we are also activists, part of bigger movements. We need to start thinking with one voice, in one specific direction, to make things that little bit clearer. There has never been a single international body that has pushed for open access. Why haven’t we got an IFLA or a LIBER for open access… international movements? Are we giving the bodies that do exist, like SPARC, enough power and support? Are some forms of open access more equal than others?

2. Revolution is a cyclical recurrence

The oldest definition here. We have an interesting survey from Taylor and Francis – attitudes are up across the board, except on the speed of open access; I think that’s to do with people engaged in hybrid open access. Almost half of T&F authors plan to choose green open access in the future, and a third plan to choose gold open access. 66% of those asked found articles in repositories useful to their research. And 45% of these international authors found the repository version just as useful as the “version of record”. Nothing in this report is likely to be a surprise, but there are more people talking about open access. Responses feel more informed all the time. Open access is no longer the preserve of “that crazy repository manager”. We no longer hear people asking what open access is. Now it is “I’ve heard of it, but how do I do it?”.

I also wanted to contrast that T&F survey with some research we did with our own Warwick researchers. We did two surveys – one in 2011 and one in 2014 – both based on Huddersfield’s Repository Support Project survey. We haven’t seen a big change in attitude (in favour) to OA. But we have seen increasing numbers making work open access (and happily for us 67.1% used WRAP). Interestingly, 70.4% believe copyright of articles should remain with the authors – that tells us the status quo is kind of broken. And 93.5% keep a copy of the author’s manuscript – although I can tell you now that not all of them are sending them to me. And finally 48.6% are happy with CC-BY licenses for their work.

Just a quote from one respondent, who notes that under this model the university pays to make work OA, and also pays for licenses to journals – paying twice, when the money would be better spent on research. And another respondent raised the issue of the bounds of plagiarism – an issue we will all increasingly need to grapple with.

In terms of attitudes to the RCUK policy from Warwick researchers, a good 20% are strongly in favour, and others are also (less strongly) in favour. Contrast that with attitudes to the HEFCE policy. They are completely different approaches, with very different priorities – the key areas and mechanisms are very different – but the reactions of researchers are very similar: those in favour of the RCUK policy also seemed to be in favour of the HEFCE policy. It will be interesting to see how those strongly not in favour get on.

I think it was at my very first Repositories Support Project meeting that I heard the phrase “carrot and stick” – compliance is important for all of us. But I think one of the things we have to be very careful about, as we engage with publishers on an article-by-article basis for gold OA, is that the carrot for the researcher in funding does not become the stick for the library in terms of work and admin. Anyone working with publishers will be aware of just how much work it takes to get OA working. Some of us still aren’t getting what we paid for – the licenses, the speed of publishing. And then there is the irritating practice of having an article under a different license when “in press” than when published. Shouldn’t in-press articles get that same CC-BY license in that crucial four-month window when the research is new and live?

And the issue of ownership – a brief quote from Leo Tolstoy on the taking of labour by force… I have had the experience of explaining how publishing works to someone totally outside of that world. You explain the process: the author handing over copyright, then reviewing for free, then the library buying articles back for the researcher to borrow. I have a friend in banking who I explained this to, and who couldn’t understand this model. Our whole model is based around publishers certifying the research – except it isn’t really the publisher that is certifying that research. Isn’t it time that we do it ourselves? The SPARC agenda – a license to publish rather than a license to make open access – is becoming really useful and interesting here. When I undertook my librarianship degree the phrase “knowledge economy” was being bandied about, but I think we are only just seeing universities and researchers thinking about how best to exploit their knowledge, to be part of this knowledge economy.

At this point we are 16 months into the RCUK five-year policy. We are having issues with publishers, with getting what we’ve paid for. And there is that perception of OA taking money away from research. The RCUK policy is well intentioned: it makes researchers think about journal selection, about where to publish. Or at least it should. We don’t quite see that… instead they submit, they get the article accepted, and only then is it a hurdle they have to cross… they want money to publish in a journal, and we have to point out when that is not a compliant journal, and not an article we can pay for. There is still much to do to ensure this checking, this OA process, begins much earlier – at the point of selection and submission.

Now, many of you will be familiar with Swan and Houghton’s 2012 work on the cost of implementing OA. It’s very clear from that work that OA will cost this country money. And how do we go forward in a mixed ecology where some countries support OA and some do not? Are we creating two classes of research and researchers? Are all articles, and is all OA, equal?

One of the existing tenets of scholarly publishing has been the idea of free markets. But if you can only buy a journal from a single publisher, that isn’t a functional market; it doesn’t encourage prices to even out. It’s the situation we’ve had with journal subscriptions, and it’s an issue we will see with APCs – we will create another monopoly within the publishing market. We are already seeing the costs of subscriptions and APCs continuing to rise. Are we creating a market by paying for OA? Going forward we need to make sure that we don’t do with research what we did with journals. With journals we did a classic library thing… we made life for researchers easy and seamless… but the costs and work in that were hidden. We’ve now had to switch off IP access to electronic resources because of the number of conferences etc. held at Warwick, and this helped raise awareness with researchers that the library is paying for this stuff. They don’t get it automatically; the library puts a lot of budget in. With APCs we need to do similarly. We should develop seamless processes all through the environment so that those who want to pay for OA can do so easily, BUT we have to make the cost, and who is paying that cost, clear to our researchers. That’s essential for us going forward – that sense of researchers understanding that there is a cost here. And that’s the only way to focus attention on the cost of journals.

3. Overthrowing of an established government or social order by those previously subject to it

Now this is something we need to be part of. We need to inform researchers so they can change their own environment and process. But change won’t come quickly. One area where we are subject to change: there has been a certain amount of vocal government support for OA approaches. I don’t know if the new universities minister, Greg Clark, will do anything. I don’t think things are looking hugely promising for the current government – and that may not even apply to colleagues in Scotland after September! – but what will things look like for the UK in May 2015? Will there be another shift? Can we do anything about it to push forward a wholesale change in the environment, in scholarly communications?

Now, when we talk to a researcher about OA, which hat is he wearing? How much of our message gets across depends on whether he’s wearing a research, a dissemination, a supervisor, or a teaching hat. And we need to be aware of all those hats – and the needs they have under each of the many roles they combine. Can we think in a more joined-up fashion and get people to engage in all parts of that process? This gets back to the issue of research data management. It’s been interesting for me to take on the role of Academic Support Manager at Warwick, to engage researchers at other points in the process – not just when they have this product, this neat journal article. If we can engage researchers when they wear all of their different hats – and if we can change our repository manager hats to suit – then that will make a real difference. At Warwick I’m known as the “open access person”; that’s fine, but I don’t think we can afford to be just that. Open access doesn’t happen in isolation – it’s part of the full research life cycle.

I’ve spoken a lot about open access in the research community. So what is the future of scholarly communications? Are electronic publishing and OA just about replicating print materials? There has been a revolution in OA journals. Can we think differently? Can we have a conversation about what “dissemination of research” really means? What do researchers want it to mean? Could a researcher fulfil open access, public engagement, widening access, and REF requirements all at once, and differently? When preparing Warwick’s REF materials I was sad to see an awful lot of interesting and innovative types of outputs disappear from the outputs section as everything became a homogeneous series of books and journal articles. Some of that material went into case studies, but for many researchers REF meant that type of traditional output. It didn’t matter what REF said – it had wide requirements – our researchers were totally focused on their traditional outputs. But there is so much more taking place, so many other forms of output. Is there not something to do there? In that T&F survey 11% of researchers said that current scholarly articles will no longer be relevant in 10 years’ time. That’s depressing. We need to think outside of the box now. We need to encourage researchers to think beyond packaging research as journal articles. I think it’s time to start having these conversations, particularly as we move publishing activity into the institution, where we can make a real difference. So if a researcher wants to set up a new journal, can we start talking about research outputs that enable interaction, that are conversations, that get more people involved? We can think of researchers in a different way, outputs in a different way, and scholarly communications in a different way – changing researchers’ attitudes in a way that moves the whole forward.

Q&A

Q1: You mentioned switching off IP access. How did you do that?

A1: There are a few journals where we have had to leave IP access on because there wasn’t another way to do it – we had backlash to the change otherwise, all from one department. We have enabled the switching off of IP access using a proxy that forces login.

Q2: Given how things have played out over the last 10-15 years… there have been lots of missteps around whether institutional or subject repositories are better, open vs libre access… Should you change the whole scholarly communications structure? If you had been in charge, if you had been in power, what do you think should have been done differently?

A2: I think in a lot of cases the biggest issue is that open access, as a movement, has been split, and there have been skirmishes around which type of open access is better. I would have bashed some heads together, to seek a way to agree on open access, to enable all routes to open access no matter what sort of repository. As it is, when we speak to researchers the library talks about one sort of open access and the publisher talks to them about a different sort. Open access means 30 or 40 different things and you can’t get consensus in that environment.

Q3: I wanted to clarify – who do you consider to be the revolutionaries? To me it’s no longer the libraries’ problem, it’s an academic problem. Doesn’t that shift emphasis from repositories towards solutions for academics?

A3: Absolutely. But I think we still have a role here. It is an academic problem; our role is to find solutions. There is a lot of work in the OA community – options, ways to get systems to talk to each other. But we may no longer be at the forefront anymore. To be honest, I’m not sure we ever should have been.

Q4: I was really interested in your last point – about broadening what we do in terms of research dissemination. There is no rule against that; publishers don’t necessarily stop it, and open access articles are not the only way of doing it. Would it be a good idea to encourage researchers to put data in data repositories, to disseminate their research in parallel to scholarly articles – to break out of the loop?

A4: Yes, this is the time to start thinking about this. We need to think of data as a product – not just the by-product of an article. The point I was trying to make was that when researchers think about what they need for promotion or for the REF, they tend to fall back on the traditional. It would be nice to recognise that not having that article in that top-flight journal isn’t necessarily bad if you are disseminating elsewhere – the data, the research methods, etc.

Q4: I think we are never going to be at a place where repositories or institutional websites have the same impact factor that researchers need for the REF and for promotion. I just think we need both in parallel: making data available for reuse, for text mining, etc.

A4: Yes, but personally I would prefer not to always need that traditional model alongside.

Q5: I was really pleased to see you talking about rights, about licensing. I came into repositories having done work selling writing, and I was sure that model wouldn’t survive… but it has. This has been a real brick wall. We now have the means to publish… yet I can’t seem to get over the hurdles, to make researchers aware that THEY are the value of the journal – their reviews, their input. It’s not the publisher. This has really inspired me though – do you have any tips?

A5: I have helped researchers retain their own copyright, but often that’s researchers coming to me wanting to find out how to avoid signing a publishing contract, having seen colleagues retain copyright of their work. Oddly, I think we will have more traction with established academics – young researchers tend to be more conservative. But the new impact factors from Thomson Reuters, and the discussion about those, have been encouraging. Thomson Reuters took away impact factors from 39 journals, the most ever, because of academic misconduct. That idea that the current landscape cannot (always) be trusted can create some real opportunities.

Meeting Research Funder Requirements: RIOXX Application Profile and Guidelines, Paul Walk, EDINA & Balviar Notay, Jisc, Informatics Forum, G.07

Balviar: I wanted to start by giving you an update on RIOXX 2.0. We were hoping it would be released by June but we are awaiting NISO approval of the free_to_read and license_ref elements. We now expect their recommendations towards the end of August or the beginning of September. We don’t expect big changes but we want to wait until their recommendations are received. However we don’t want to wait to develop our plugins, so we will be working on those – consulting with the DSpace and EPrints communities – and, after that approval, we will be testing with early adopters in October/November 2014. Just get in touch, or respond to my post on the Jisc Scholarly Communications blog, if you would like to be in that early adopter test group.

Then, from December 2014 through to April 2015, we will be implementing RIOXX, and there will also be support around that implementation process. That will allow data collection with the appropriate RIOXX fields to effectively support RCUK evidence-based reviews of the effectiveness and impact of their open access policy. Jisc Monitor will also benefit from this consistency of data – unless you have consistency, it’s hard to build services on top. And this will also support institutions in early planning for REF OA compliance starting 1st April 2016.

If you Google “Jisc Scholarly Comms” you’ll find more on this and on the timeline for this work. And with that it’s over to Paul.

Paul: So I came in late to Yvonne’s talk, just in time to hear that it’s not just about compliance! Anyway…

As most of you have already heard something of RIOXX I’ll just summarise… it’s a metadata application profile intended to allow repositories to report on open access publications in a way which satisfies reporting requirements from RCUK and HEFCE, together with a set of guidelines for repository implementation. It was developed by UKOLN and is now with myself and colleagues at EDINA, and this work has been funded by Jisc.

The original concerns of RIOXX were how to represent the funder for research and how to represent the project or grant, but also how to represent the persistent identifier of the item described, the provision of identifiers pointing to related dataset(s), and how to represent the rights of use of the item. Those concerns haven’t really changed.

We also had some original principles for RIOXX: that it be purpose driven – focused on satisfying RCUK reporting requirements; simple – relatively easy to implement; generic in scope – so not just about traditional publications; transient – a year or two ago there was a lot of interest in CERIF etc.; interoperable – respecting other related standards that apply; and developed openly – not just on closed mailing lists, but developed as much as possible with public consultation. So, what’s changed? Well, it’s still purpose driven – for RCUK, and it also serves HEFCE requirements without deviation from the original remit; still simple – slightly more complex than version 1 but still pretty simple; generic in scope – now gone, it needed to be specific, though that doesn’t mean we can’t conceive of parallel or more generic versions; transient – kind of gone, as we talk about preparations for REF 2020; interoperable – still working with OpenAIRE, currently mapping the profiles to enable a crosswalk (in one direction); developed openly – yes, you can look at the RIOXX website, my own blog, and the public discussions.

The other major change from version 1 to version 2 of RIOXX has been implementing recommendations from the V4OA process. We released a beta for public consultation in June 2014 – we had a great response, including from many of you in the room, so thank you for that! Now version 2.0 RC 1 has been compiled, we are writing the accompanying guidelines, and an XSD schema has been developed.

So I just wanted to go through some of the elements of RIOXX. Firstly

dc:identifier

One of the things I’ve had to explain is that, when the profile is expressed in an XML record, the identifier within that record identifies the OA item being described by the RIOXX metadata record (regardless of where it is). It is recommended that this points to the item itself – not the splash page. Whatever it is, it must be an HTTP URI (a URL).

dc:relation and rioxxterms:version_of_record

rioxxterms:version_of_record is an HTTP URI which is a persistent identifier for the published version. Meanwhile dc:relation is a way to capture related items, to enable discovery of those.

dcterms:dateAccepted

This must be provided – it is one of the more clearly defined dates and it acts as a trigger in a workflow. It’s much more precise than the date “published”. This is the most important date in RIOXX – we almost took out publication date (it’s still there), but the date of acceptance is the important one.

rioxxterms:author and rioxxterms:contributors

Both of these accept an optional id attribute, which must be an HTTP URI; use of ORCID is strongly recommended. All authors should be represented as individual rioxxterms:author properties. The first-named author can be indicated with another optional attribute: “first-named-author”. In a perfect world the author order would work like a citation – which is possible – but XML records can be rearranged in transfer, hence the explicit attribute.

rioxxterms:contributors is for all other contributors to the item.

rioxxterms:project

This now joins funder and project_id in one, slightly more complex, property. The use of funder IDs (DOIs in their HTTP URI form) from FundRef is recommended, but other options are possible. For institutionally funded work you can use your institution’s ID – this is a free-text field, so use the ID you have been given, or select something sensible. For multiple funders you simply repeat this element as many times as needed.

license_ref

This comes from the NISO “Open Access Metadata and Indicators” work. It is a simple element taking an HTTP URI and a start date. The URI should identify a license – there is work underway to create a “white list” of acceptable licenses, which will be a simple way to make data consistent (discussion is going on with SHERPA, Jisc and HEFCE at the moment). The start date enables the expression of embargoes, capturing the date on which the license takes effect. All the business of interpreting licenses is out of scope – you just capture the information here.
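To make the walkthrough concrete, here is a hypothetical sketch of a record assembling the elements above, built in Python purely for illustration. All values are invented, and the namespace URIs and exact serialisation are assumptions to check against the final v2.0 guidelines and XSD:

```python
# Hypothetical RIOXX-style record assembling the elements above.
# All values are invented; namespace URIs and serialisation details are
# assumptions to check against the final v2.0 guidelines.
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
DCT = "{http://purl.org/dc/terms/}"
RX = "{http://www.rioxx.net/schema/v2.0/rioxxterms/}"  # assumed namespace
ET.register_namespace("dc", DC[1:-1])
ET.register_namespace("dcterms", DCT[1:-1])
ET.register_namespace("rioxxterms", RX[1:-1])

rec = ET.Element("rioxx")

# The item itself, not the splash page - must be an HTTP URI.
ET.SubElement(rec, DC + "identifier").text = \
    "https://repository.example.ac.uk/1234/1/paper.pdf"

# Persistent identifier for the published version.
ET.SubElement(rec, RX + "version_of_record").text = "https://doi.org/10.1000/xyz123"

# The key date: acceptance, not publication.
ET.SubElement(rec, DCT + "dateAccepted").text = "2014-06-30"

# One element per author; ORCID strongly recommended for the id attribute.
author = ET.SubElement(rec, RX + "author", {
    "id": "https://orcid.org/0000-0002-1825-0097",
    "first-named-author": "true",
})
author.text = "Example, A."

# Funder (e.g. a FundRef DOI) plus project/grant ID; repeat for multiple funders.
project = ET.SubElement(rec, RX + "project",
                        {"funder_id": "https://doi.org/10.13039/501100000266"})
project.text = "EP/X012345/1"

# License URI with a start date - embargoes are expressed via the start date.
lic = ET.SubElement(rec, "license_ref", {"start_date": "2015-06-30"})
lic.text = "https://creativecommons.org/licenses/by/4.0/"

print(ET.tostring(rec, encoding="unicode"))
```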

So, in summary, we are close to release. Detailed guidelines are being drafted right now. The white list is being worked on. Jisc is funding work to develop plugins. And RIOXX has been taken up by HEFCE and RCUK so we have their support and engagement here.

Q&A

Q1: A quick question on implementation of RIOXX for CRIS systems. You mentioned DSpace and EPrints, but what about CRIS systems?

A1: I’ll leave the plugin question for Balviar. I did some work with the technical lead for CERIF, and she has written up that discussion of how RIOXX could work for CERIF. The technical description of RIOXX allows you to express it however you want: my colleague Ian has created a schema for repositories, but there is no reason it could not be used or developed for CRIS systems.

Balviar: We haven’t yet talked to CRIS vendors. But we know we need to consider CRIS systems, to engage with the community. We are starting to have those conversations and looking at what needs to happen.

Q2: I did have one of those conversations with Balviar about RIOXX for CRIS systems – we’ve started the process with PURE here at Edinburgh and you can already harvest some data. But I had two questions, the first about the corresponding author versus the first named author.

A2: My understanding is that it’s usually the same.

Q2: No, not always.

Comment: In biology it’s normally the last named author.

Comment: If you mean “lead author” then call it that.

A: It was RCUK’s preference that it be “first named author” but the synonyms will be in the guidance.

Q2: So, my second question…why are items other than publications now out of scope?

A2: It’s to reflect RCUK needs, it has harmonised vocabularies. HEFCE are aware of that work…

Comment: We decided to focus on publications as we wanted this to be useful for the REF and many other types of items wouldn’t make sense in that RIOXX profile.

Q3: We have trouble tracking accepted or published dates sometimes so how can we go back and create RIOXX records for our publications?

A3: Another important aspect of RIOXX is that it isn’t in any sense retrospective – it’s not designed to do that. It’s designed to facilitate reporting of open access outputs for RCUK. It’s deliberately not a generic bibliographic record for open access publications.

Q3: We can’t convert it?

A3: It’s open, so you’d be welcome to! But that wouldn’t be RIOXX, it would be something else.

Comment: I think this is about taking RIOXX in the way it has been intended.

Paul: Yes, that’s a good point. RIOXX isn’t intended to be a standalone record – your repositories will have much richer metadata available, and it doesn’t make sense to subvert RIOXX to do that. By all means use the prior work in your own context. It is only really useful if you are reporting back to RCUK or HEFCE – if that retrospective aspect becomes important then we need to consult again. But right now the mandates don’t apply retrospectively.

Q4: Have you got the capability for more than one author ID?

A4: Not really. It’s not our problem to solve in this profile. We need a unique identifier for the author; if you have to map between multiple IDs, that’s a different problem.

Q4: So pick one based on funder requirements and prioritise?

A4: Yes.

Q5: I wonder about the issue of identifying publications, and what motivates people and influences change. Impact and impact case studies have been important. How far do you have to focus your future spectacles to see that we need a way to indicate impact in RIOXX? Is that in the future? Is it something RCUK have started to ask you about?

A5: I’ll give an answer, but RCUK might have their own response. We have had these conversations, but there has been no focused piece of work on that in the record.

Comment: Right now our biggest problem is monitoring compliance with our open access policy, RIOXX will enable that in a significant way, but we also have other ways to monitor impact of research too.

Comment: That impact problem is being solved elsewhere. Taking a linked data view of that multiple ID issue, that idea of “sameAs” is doable elsewhere. So I think that we let that be solved elsewhere.

Paul: Yes, we ask for one ID, we let someone else make those decisions about which one.

Comment: Institutions are now minting ORCID IDs… if you are associated with more than one institution then you may have multiple ORCIDs and I guess that means it is up to the individual to resolve that – although it perhaps has all the makings of a potential car crash!

Comment: A similar issue to FigShare minting DOIs – you can end up with two DOIs for the same publication.

Q6: You said that you would be interoperable with OpenAIRE, so if they adopt RIOXX will they be able to comply with OpenAIRE?

A6: Yes, that’s the intention. At the beginning of RIOXX we looked at OpenAIRE as an option for meeting funder requirements. But where funding is from the European Commission, the use of OpenAIRE matters – it’s important. What we worked out is that if you produce a RIOXX record for an OA publication funded by EU funding, then you can create an OpenAIRE record from your RIOXX record in a lossless way. I hesitate to say that you can forget about OpenAIRE, BUT you can use RIOXX as a route to create OpenAIRE records. Jisc are talking to the Open University about doing this. And we are being careful not to put anything in RIOXX that might prevent us from doing it. But it won’t work the other way round – you can’t create a RIOXX record from an OpenAIRE record.

And now we return, after a lovely lunch, with two parallel sessions. In the library we have “Optimising resources to develop a strategic approach to Open Access (an OA Good Practice Pathfinder Project), Ellen Cole, Northumbria University”, whilst in the Informatics Forum – and on the blog – we have:

SHERPA Services Update (RoMEO, JULIET, FACT, Open DOAR), Bill Hubbard, University of Nottingham, Informatics Forum, G.07

My name is Bill Hubbard and I am based at the Centre for Research Communications at the University of Nottingham, where we have been running these services for some 10 years now. I will be talking about the SHERPA services. Looking at the attendee list it’s great to see that a lot of you are practitioners with a really practical focus, which will be great as we want to hear your views, your opinions.

To start things off, SHERPA services are currently majority funded by Jisc although we also have funding from others. And I’m going to start by showing some slides illustrating how what we do fits into the wider Jisc Repositories Shared Services Project – they are part of an integrated service within the UK, a cohesive support structure for the sector from Jisc.

So I will be talking a bit about what we do at the moment, the value proposition and benefits that we offer, and an example use case. We have three fundamental services and a service on top using our APIs – and others may be able to use those, I’ll suggest, for other innovative services.

RoMEO is the global service on authors’ rights for using repositories, giving details journal by journal. It’s used by other services, and by some institutional repositories to embed permissions checking into the deposit process.

JULIET is a registry of policies on Open Access from research funders worldwide. Funders are actually trying to make policies clear and concise. JULIET lets you see the differences, to compare policies.

So if we combine RoMEO and JULIET we perhaps have a third thing…

OpenDOAR is the world’s authoritative and quality-assured directory of open access repositories. It is manually checked and thoroughly quality assured, and it is used by Jisc, and at EDINA for the Jisc Publications Router. It’s useful for anyone trying to establish which repositories are where.

FACT is an end-user service – advice to UK authors on compliance with funders’ policies in their journal of choice. It has been put together for academics. We inherited the green and yellow colours to indicate publisher policies… Les asked our opening keynote what she would have done differently… if I’d known then, I would never have gone for green and gold! Until FACT, I wouldn’t have pointed an author at our other services… but FACT is designed for end users, a simple system to use.

So, across these, we see our value proposition as being that these are services – services with truly global reach. And I would thank you all for the feedback, suggestions and contributions that you have made. So we are global… but our funding is not global – a bit of tension there. For each of the three core services we simplify information on repositories. We standardise, putting information into a standard set of descriptors – a functional standard that we created and use – and we also summarise, to make it useful to you. We see that as having particular value. And each of our services has an API, so that information is available on an open access basis, to be used in your institutional service or in other services.

So, we offer efficiency gains at sector level: we provide single points of information, quick access to policies and information, and quick comparison of policies and information, and we take some of the pressure off the OA community – because these questions would be there anyway. The feedback we get seems to back that up. But if you feel differently, we’d love to hear your views.

At an operational level we maintain central datasets. We can compare between records, extract statistics, and talk to publishers, funders and the OA community – so we can see some of the places where tensions arise, or where challenges or problems may be occurring. At a strategic level we have a role in developing future OA support services, with SHERPA services data used to prime new systems.

We used the data from RoMEO and JULIET APIs to create the new SHERPA FACT in response to the RCUK requirements for open access. Those APIs made swift development of that possible. And you can also use our APIs to build things with our tools and data. Do let us know – we like to hear how they are being used – but they are there for you to use.
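For instance, a minimal sketch of a RoMEO lookup by ISSN – against the v2.9 API endpoint as it stands at the time of writing; check the current API documentation and terms of use before building on it, and note the ISSN below is a placeholder:

```python
# Minimal sketch: look up a journal's archiving policy in SHERPA RoMEO
# by ISSN (v2.9 endpoint as it stands at the time; placeholder ISSN).
import requests
import xml.etree.ElementTree as ET

resp = requests.get("http://www.sherpa.ac.uk/romeo/api29.php",
                    params={"issn": "1234-5678"})  # placeholder ISSN
resp.raise_for_status()

root = ET.fromstring(resp.content)
for pub in root.iter("publisher"):
    name = pub.findtext("name")
    colour = pub.findtext("romeocolour")
    print(name, "-", colour or "unknown")
```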

So these four services have four distinct groups of users. FACT is for end users to check for compliance. RoMEO and JULIET can be used by repository managers. And we see OpenDOAR being used to understand the wider landscape. But that’s what we think… we know in the real world the users may be different, or the use cases may be different. For instance, some RoMEO users use results as a starting point – heading off to look at the publisher’s own website. That’s a valid but different use case to what we initially envisioned.

But what about the future? The world is changing. We have real momentum and it’s getting faster. To pick up on Yvonne’s point about revolution, I already see the tug of real change, a real people’s revolution. And what I truly do believe is that this stuff has to be embedded in academic culture. It takes time, but it also takes official acceptance. Those revolutions that don’t get embedded, that don’t become part of policy – through fudging and compromise – don’t actually take effect; you end up with ideologically messy situations. I’m really pleased with progress to date. I’m pleased with where we are and delighted with the RCUK and HEFCE policies, as that’s real change and embedding. But we’ve always wanted more people to engage. In some ways you could say that we’ve tried to engage with academics around OA on our terms; arguably we now have to engage with academics on OA on their terms. Funders are engaged, institutions are engaged, publishers are engaging… and that means changing working practices and processes.

So, how do our services respond?

It’s down to you. You have to use them and feed back on them in order for them to stay useful and relevant, and to prove their worth. We’ve had lots of surveys, focus groups, a technical review, and our day-to-day feedback. But in this ever-changing world we want to ask you whether what we do is what you want.

So, over to you. For instance, likes and dislikes… take RoMEO… and don’t be afraid to be critical! We’ll happily take criticism on board.

Comment 1: I find it very useful – along with all of the SHERPA services. The value is in the colours but also in the granularity… if an embargo is between 12 and 18 months then I might need to go to the publisher’s website. And with SHERPA FACT you can see what kind of OA is possible, but not whether you need to pay for publication or not – there’s no way of seeing if it’s a green option.

A1: So it tells you that you can archive it, but not whether you would have to pay for it. Perhaps you could send that comment to us too, as my colleague has an encyclopedic knowledge.

Comment 1: In general they are great though – not having to go to the publisher’s website is great.

A1: Yes, the difficulty of finding information on publishers’ websites comes up a lot, and we have raised it with them.

Comment 2: I encourage my researchers to use RoMEO, and I use RoMEO all the time. But the colours are really confusing – people associate green RoMEO with green open access. I don’t look at the colours, I read the terms, but the colours cause so much confusion.

Comment 3: With my funder’s hat on… if a journal is not compliant, is there usefulness in seeing what similar journals are compliant? Would that be good? Would that be an issue in terms of competition? Would that be too much?

A3: Really interesting. Let’s see by a show of hands… would that be useful? That looks like about a third of us think it would be; many fewer think it would not be.

Comment 4: As a librarian I’d find that useful. Academics probably do know the alternatives… it may just be a frustration that their preferred journal isn’t compliant.

A4: What if alternatives were ranked by impact factors?

Comment 4: That depends on the credence you place in impact factors…! I also had a question about how information is gathered for RoMEO… that sort of crowdsourcing model… do you think that works for maintaining accuracy and information? And as a corollary, given the risk of combining information from different sources, are publishers likely to be interested – can they be interested – in providing more information directly to RoMEO?

A4: Are publishers interested in making their terms and conditions clear? I’m not sure… BUT they would be if there were commercial incentives. There is QA on RoMEO but it can be tough to get information back from publishers. And on your other question… we like that it’s a sort of quality-controlled crowdsourcing. We do a lot of the work. If suggestions come in with a pointer to the right information, that’s fine, but sometimes a lot of work is involved. I don’t see full QA and central updating as being realistic and scalable – another 3 FTEs’ worth of work, I suspect – so I don’t see that being workable.

Comment 5: I don’t use your services directly, so take this with a pinch of salt, but picking up on the compliance cycle… presumably the issue of related journals – the problem of understanding which journals are in which space – would be useful to solve. That could almost be a key way in, especially into the publications process, with the other data as an add-on. And the other side of that: what happens if I publish in a non-compliant journal? That might be helpful to think about.

Comment 6: Speaking as a researcher… I’m aware that what I, and people like me, want from you is omniscient god-like-ness to make a judgement: “yes” or “no”. But the real world is many levels of judgement, and confusion… not helped by multiple conflicting statements from publishers and journals, and my local team making judgements… that may conflict with my judgement. And you said that magic word “crowdsourcing”… I might want to interpret a statement one way – that I can do X – and my librarian may disagree. There is a role for the community to filter up their decisions and their practice – that crowdsourcing element – rather than imposing one interpretation. Does that have any value?

A6: Yes, we can think of different ways of doing crowdsourcing, and about its accuracy. RoMEO is about consistent interpretation… so we’d have to shift that to a different service so people understood where the information was coming from… and it might take work away from us, which would be great. But it has to be a crowdsourced solution that the crowd thinks is right.

Another way to come at this is to ask “why use RoMEO?” I think it’s because we don’t want to do something against copyright. If we could be assured that no one would take action against us, if we knew publishers wouldn’t sue us… would we need RoMEO? It seems important… but is it a defensive manoeuvre? Are we driven by fear? By uncertainty?

Now we’ve talked about what you like, what you do. But how does it fit with what you need? Would you prefer all four services in one interface, say?

Comment 7: I tend to have a list in the repository in one window, the service in another, and copy and paste between them. So if there was a way to give you a list and say “check this”, that would be great. But yes, we do use RoMEO to check compliance. If we could do this all automatically in the repository, that would be great…

A7: That may be an opportunity for using the API.

Comment 8: Picking up on the idea of a list of compliant journals, the publication lifecycle and workflow, and persuading academics to think about OA at the point of publication… that list is something I am asked for by academics. Often I have to get a list of relevant journals… and then I look them up. So a real-time, subject-based system for decision making – that list of compliant journals – would be really useful. I did attend a SHERPA workshop and there was some support for that sort of idea there.

A8: We have thought about it, but it’s a question of implementation. We will consider it.

Thank you all for coming along. You can contact us with your feedback – we are on email and on Twitter – and we are at a crucial moment: we have support from key research players, and we need to ensure we continue to meet your requirements.

And now we break into three strands: DSpace 4.0 Update from Claire Knowles, University of Edinburgh, in Appleton Tower M2B; Latest developments in Hydra-land from Chris Awre, University of Hull, in Appleton Tower M2C; and here in the Informatics Forum:

EPrints Update, Les Carr, University of Southampton

Now, I wanted to start with this slide of how the web is changing things… an image of people in Somalia trying to get a phone signal. And here we see people viewing the World Cup on a Mac, in the middle of the desert. That is a huge impact on life. And I think we get frustrated that that impact doesn’t magically happen in academia… the systems we have built in academia over the last 600 years have survived so solidly, through so many changes of government and culture… so changing them is really hard work. And the innovation we see elsewhere therefore isn’t so easy!

So, EPrints… we have been going for over ten years… we will be hitting the difficult teenage years soon! We used to talk about EPrints at Southampton, the Jisc Innovation projects, and EPrints Services. But everything is changing. Fewer Jisc projects. We are less connected to IT support. So now the focus is on EPrints Services, on providing services and hosting to repositories across the UK and across the world. And we’ve seen all sorts of mandates, policies and reporting coming in. And we see big data coming in, CRIS systems, innovation as key business. So how do we manage those as “insurmountable opportunities”?

So, if we look at a timeline here, from that call to action in 1994, we move on to open access, the projects, all the things we have been involved with at EPrints. So, in case you are wondering, here is the office where EPrints is based. We exist within the context of Southampton University. I’m academic lead for EPrints, part of the Web and Internet Science research group. And it falls within lots of areas of the university – it was one of the impact case studies in our REF return in fact.

Over the years we’ve had a lot of changes in the team. At the moment we are myself, John Darlington, Sheridan Brown, Adam Field – the business and community relations manager, Kelly Terrell, Justin Bradley, Jiadi Yao, Will Fyson, Nawar Halabi. Lots of change. Our technical lead Seb is about to join Tate Digital as their chief information architect.

So, what are we doing? We host 70 repositories. We support open access repositories for about 100 institutions. We are not for profit and much more about the impact of our work, about our belief in open access.

In terms of recent and future developments… IRStats 2 v1.0 is released. We have EPrints 4 developments. And we are trying to think about what the repository will be in 2016. And, basically, we’ve heard you. We are looking at scalability issues (browse pages in particular); search is not always accurate; UI interactions are limited; EPrints 3 is built from many blocks – so it’s time to refactor and improve it, and really improve the engineering; and there are lots of new domains.

So we are trying to understand the business requirements behind actual repositories – whether to report on open access, or to improve marketing and knowledge presence on the web. So how do we take what we have, and make sure it’s what people want? So, what is EPrints for? Well, it’s for research content – publications, educational resources, scientific data; research business – research management, research processes, research impact; and the research activity – data collection and data analysis. It is NOT a graveyard for the end of projects, it is part of living activity. It’s there to support researchers, research data, research outputs, research outcomes and impact, and your librarians.

So, what is EPrints 4?

Well, it’s for open access, publications repositories, research reporting. There is a bigger agenda for librarians, for information management and for curation. It’s not about creating a fantastic shiny new repository. We believe there is a lot in EPrints. We need to tune it for speed, efficiency, scale and flexibility. We had EPrints analysed… and printed on a 3D printer… which enables you to see where the software is not optimised, where it’s too blocky. We had lots of help from visiting colleagues.

So, over the last 9 months we’ve written some 11,000 lines of code. Search is essential, as is reporting. We’ve improved integration with Xapian, DB transactions, memcached, and fine-grained ACLs. So we’ve been doing lots of improvement work here. You can use EPrints without a UI, but we think it’s much better with one.
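A blog aside on the Xapian point: EPrints itself is written in Perl, so this is not the project’s code, just a small Python sketch (using the standard Xapian bindings) of what a Xapian-backed index gives repository search – stemming and relevance-ranked results. All names and paths here are illustrative.

```python
# Illustrative Xapian index-and-search sketch; not EPrints' actual integration.
import xapian

db = xapian.WritableDatabase("repo_index", xapian.DB_CREATE_OR_OPEN)

# Index one record: stemmed text for matching, display string as stored data
term_gen = xapian.TermGenerator()
term_gen.set_stemmer(xapian.Stem("en"))
doc = xapian.Document()
doc.set_data("eprint:1234 | Open Access in Practice")
term_gen.set_document(doc)
term_gen.index_text("Open Access in Practice: repositories and preservation")
db.add_document(doc)

# Query with the same stemmer; results come back ranked by relevance,
# so "repository" matches "repositories"
parser = xapian.QueryParser()
parser.set_stemmer(xapian.Stem("en"))
parser.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
enquire = xapian.Enquire(db)
enquire.set_query(parser.parse_query("repository preservation"))
for match in enquire.get_mset(0, 10):
    print(match.rank, match.document.get_data())
```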

Thinking towards Repository ’16: all the issues of trying to report to funders, to government, and to shared services, SHERPA etc. This is all really important to us. We’ve been working with a number of institutions and with Jisc – so we want to ensure that these (e.g. Gateway to Research) are in place for everyone on EPrints in the UK. IRUS-UK is on the Bazaar, the Publications Router is native to EPrints (via SWORD) with importers on the Bazaar, there is an ORCID proof of concept on the Bazaar, etc. And we’ve been breaking down some of that division between repositories and CRIS systems.
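To make the SWORD point concrete: a minimal sketch of a SWORD v2 deposit using the Python sword2 client. The endpoint URLs, credentials and packaging choice are placeholder assumptions, not the Router’s actual configuration.

```python
# Sketch of a SWORD v2 deposit into a repository; all values are placeholders.
from sword2 import Connection

conn = Connection("http://repository.example.ac.uk/sword-app/servicedocument",
                  user_name="router-user", user_pass="secret")
conn.get_service_document()  # discover the repository's deposit collections

with open("article-package.zip", "rb") as pkg:
    receipt = conn.create(
        col_iri="http://repository.example.ac.uk/id/contents",  # assumed collection IRI
        payload=pkg.read(),
        mimetype="application/zip",
        filename="article-package.zip",
        packaging="http://purl.org/net/sword/package/SimpleZip",
    )

print(receipt.location)  # where the newly created record lives
```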

As a piece of technology repositories are not a thing, but a performance… it’s librarians and developers and researchers hand-in-hand changing the environment in which they work. And becoming more sophisticated. We hope the environment has improved in the last 10-12 years. It’s not just about enforcing historic norms but also stimulating new practices to emerge – around copyright and openness and intellectual property, and privacy, and creativity, and Science 2.0. Helping academics adapt to a changing world around them.

So, where do we see ourselves in that… we want to be in the big bright shiny future. We want to support you in all the things that your institution is trying to do.

Q&A

Q1: An easy question: what is the easiest way to tell EPrints about questions or concerns?

A1: We’ve been trying to answer that question… part of our answer is Adam’s new role. But we also try to talk to people, to ask them what they think and want whenever we are out and about. What would you want to ask me?

Q1: Nothing right now but historically really… is there somewhere central to log this stuff?

A1: That is where Adam’s role sits. But we need to know what businesses, what institutions need.

Comment 2: Surely user groups are the way to do that?

A2: There is a UK EPrints user group and the last meeting was in Leeds. Maybe ULCC next. We will support that and come along whenever we can. We wanted that to be community led, but we want to come along, taking notes etc.

Comment 2: But Adam will be there for that.

Comment 3: But user groups, or user support in particular, are really important for reporting issues and requesting developments. I’d like a case number and details when I submit a question. Everyone is lovely but I want more organisation and strategic support there.

A3: In the university this stuff just grew up. Kelly has come from the NHS and she will be doing this project management stuff, ensuring we are being more organised and structured.

Comment 4: EPrints is on GitHub; you can track issues there.

A4: Great for developers… but only that part of the community. We need other routes for other parts of the community.

Comment 5: You need a helpdesk – a way for ordinary stuff to be resolved quickly, and to track queries.

A5: Adam’s role is going to be about this. We haven’t had an organised way to do that before. That’s part of reorganising our team at the moment.

Comment 6: Can you offer training for someone like me? I’ve worked on 5 repositories before but there is still basic stuff I can’t do in EPrints.

A6: We do have regular training activities. I’d have to check what’s happening and get back to you but you’d be welcome to attend those.

Comment 6: It’s tiny things. I feel silly emailing about them, I’d like to learn to do it myself.

A6: I’ll take your details and we can work on that.

Q7: When will EPrints 4 be coming out?

A7: We don’t have a date, but probably about a year away.

And, after a lovely coffee and cake break…

Implementing Open Access at University of Bournemouth and University College London, Jean Harris, Informatics Forum, G.07

I work at both the University of Bournemouth and University College London – hence the two hats! So I’m going to talk to you about my experience at these two hugely different universities. UCL is a Russell Group university; it has 28,000+ students, about half UG, half PG, and total research funding of £871m (2011/12). Meanwhile Bournemouth has 17,000+ students, many more UG than PG, with teaching as central as research.

Both universities undertake publications management. Both use Symplectic Elements to manage publications and EPrints for the IR. At UCL the library manages OA funding and publications through the Research Publications Service (RPS) and Discovery (EPrints). At Bournemouth the Research and Knowledge Exchange Office manages OA funding and BRIAN (Bournemouth Research Information And Networking), while BURO (EPrints) is managed by the library. UCL Discovery (EPrints) is supported in-house (which means we are not always at the head of the queue) vs. BURO, which is a hosted service with support from EPrints Services.

For those without publications management, it’s sold as a one-stop shop – connecting up the repository and public-facing profile pages. Bournemouth is a full-text-only service. At UCL it’s a mixed picture with very mixed quality metadata. Both have very patchy metadata on individuals and their profiles.

UCL holds both metadata-only and full text outputs. There are over 317,000 total outputs. In terms of open access there are over 16,000 full text (green and gold) items. Live gold items without full text number around 6,327. That includes theses – 2,814 live full text plus 230 embargoed, out of 5,111. Bournemouth is full text but not all open access – some items are stored there for the REF only.

At UCL there is a virtual open access team – gold has a manager and three staff; green has a manager plus 3.27 staff (I’m the 0.27!), who mainly work on theses; and we have 1 member of UCL Press staff – they have just taken the imprint back and aim to make everything available through UCL Discovery and publishers for eBooks. At BU there is OA funding – RKE has 1 manager and admin support for processing. We have no full-time repository staff. There is a rota of 3 editorial staff working one week in three on outputs, a 0.2 manager, and me (0.2).

Funding for OA is also quite different. UCL are funded by RCUK, UCL and Wellcome, and UCL’s funded OA compliance for RCUK is 115%. UCL have pre-payment agreements with a great number of journals – an agreement with Elsevier has saved a huge amount of time and money. As for BU’s funded OA statistics – the budget is what researchers bring in, plus £100,000 from BU. There are far fewer articles here. But both BU and UCL have had some challenges spending their money – UCL has done some retro-conversion, for instance.

In terms of challenges for engagement… UCL Discovery includes metadata-only outputs – and only full texts get checked by staff. Academic engagement has been a challenge – many staff were used to having things done for them, so it took a lot of training and coaching to get them through, even with the large stick of the REF. There can be difficulty sending large files – they have to use Dropbox and/or break up files. They are furious about how the h-index is calculated in RPS. And they don’t understand searching and filtering in RPS – a single common keyword search of PubMed with a common name crashed the whole system, for instance. At BU, BURO moved in 2013 to full text only. We have some mapping data issues. Incorrect publications no longer appear in staff profiles, and there is some confusion about where the repository is located – because it has moved.

In terms of challenges for OA: a new statement has reiterated UCL’s commitment to OA, but it will be hard to move academics to this place. We need them to understand versioning, and to understand that they have academic freedom in terms of where to publish – but that they still have to comply with OA requirements. And we have to get them to deal with multiple entries. And there is an issue of sheer volume – UCL staff can be overwhelmed as there is such a drive there. For BU it is about advocacy. Academic freedom is less of an issue… plus academics are more used to manual entry of metadata, but there remain concerns about being scooped! And publishing timelines can be an issue.

For both organisations there are challenges of deposit on acceptance, the range of OA options, establishing new workflows, moving goalposts, flexible support – workshops aren’t enough so you really need one-to-one support sessions – and encouraging champions in faculties, to get them to engage with the repository. And we want to use REF 2020 as a stick, but also as a carrot for their research. UCL is one of the new Jisc Pathfinder projects too.

But what happens when the money runs out?

At UCL the university as a whole supports Green OA but assists academics to meet their requirements through the gold route.

Q1: How do you comply 115%?

A1: By publishing more articles than they are required to. They are very proud of it.

Q2: In terms of dealing with Symplectic – how is deduplication of different data from different platforms dealt with?

A2: They don’t yet – we’ve called for it, through suggesting corrections etc. Sometimes there can be four matching/correct entries. You can join these in Symplectic, but most people don’t notice or care about that duplication. So that’s a user workflow thing. The feed goes into the pending file… you get a message from IT services… and you then go and approve (or not) those items. It’s not really engaging them. You get a choice of options; a manual entry from library staff takes priority in the repository itself. You want to bring it all together, and validate it, giving the best record. Symplectic gives you the best choice, but that’s not ideal. A new version is coming out which will help with deduplication – though we’ve yet to see how that is decided. But it will help us – we’ll be able to see the full text attached.
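For the curious, the matching problem under discussion reduces to choosing a deduplication key. A minimal sketch – not Symplectic’s actual algorithm – might collapse records on DOI where one exists, fall back to a normalised title, and keep the most complete record as the survivor:

```python
# Crude duplicate detection across harvested feeds; illustrative only.
import re

def dedup_key(record):
    """Prefer DOI as the match key; fall back to a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)

def deduplicate(records):
    merged = {}
    for rec in records:
        key = dedup_key(rec)
        # naive "best record" rule: keep the entry with the most fields filled
        if key not in merged or len(rec) > len(merged[key]):
            merged[key] = rec
    return list(merged.values())

# Two feeds describe the same article; only one entry survives
records = [
    {"doi": "10.1000/xyz123", "title": "Open Access in Practice"},
    {"doi": "10.1000/XYZ123", "title": "Open access in practice", "source": "PubMed"},
]
print(deduplicate(records))
```

Real CRIS matching is fuzzier than this (author lists, years, venue strings), which is exactly why “four matching/correct entries” can survive side by side.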

Glasgow Led Jisc Open Access Project – End-to-End Open Access Process Review and Improvements, Valerie McCutcheon, University of Glasgow, Informatics Forum, G.07

I’ve worked at Glasgow University for the last 12 years, and for the last few years I’ve been working on research and open access. Two years ago we linked the repository to the research system so funders could see the outputs. I had been working in research outcomes, so it was really interesting to work with the repositories. But I’m here to talk about the Glasgow-led Jisc Open Access Project, which is a collaborative project with a number of partners including EPrints. We are a really varied group: from big organisations (with lots of publications) – Glasgow and Southampton – to wee ones like Lancaster or Kent. Some are in the north, some in the south. Some have EPrints out of the box, some have complex set-ups with loads of customisations.

So, who else is involved? Well the Association for Research Managers and Administrators – we try to engage and join up with them. We don’t want to reinvent the wheel. We work with the EPrints User Group as well as other appropriate user groups. We speak to SCONUL, RLUK, and UK Council of Research Repositories. Really everyone. We want you to be involved, to engage, to collaborate and share!

So, what are we doing that is “special”? Well, we are creating a new standard open access metadata profile – with an EPrints case study (it’s generic, it just happens to be EPrints) and a Hydra case study; EPrints OA reporting functionality; and improved EPrints award linkage functionality. Some organisations have awards in their systems, others are doing it manually or looking at how to do it soon. So we have proposed a number of generic workshops. An early-stage workshop on issue identification and solution sharing – of use to our project but also to other Pathfinder projects. We also want to do a workshop, probably in November or December, on embedding future REF requirements – and we’ll collaborate and reuse (not repeat) with other Pathfinder workshops. We may also run a workshop on advocacy, and as we reach the end of the project we will run a workshop to disseminate and share.

Are we duplicating standards? Well, there is the Consortia Advancing Standards in Research Information Management (CASRAI-UK), bringing standards and ideas together and sharing them. We will use output from RIOXX to feed into what we do, to implement it. We will use the V4OA work. And there is also NISO, which is also looking at standards, and we will take some of the appropriate standards from there. So we will be using one big spec that enables everything institutions need for research information management.

Right now we have a big spreadsheet of metadata – we are looking for consensus on what’s required for that basic specification for the EPrints software community. What I should mention is that we are very much focusing on what we want, not how it is delivered – that’s for the developers to think about. So that spreadsheet includes comments from the community, and questions for us to answer: things to be clarified from RIOXX, from HEFCE, etc. So, for instance, we have entries on the date of article acceptance, on information on EU funding, etc.
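To give a flavour of that spreadsheet – as an illustrative assumption, not the project’s agreed specification, and with RIOXX itself still being finalised at the time – one output row might carry fields along these lines, here sketched as a Python dict with RIOXX-style names:

```python
# Purely illustrative record using RIOXX-style element names; the field
# choices are assumptions, not the project's agreed profile.
record = {
    "dc:title": "An Example Article",
    "dc:identifier": "http://dx.doi.org/10.1000/example",  # placeholder DOI
    "dcterms:dateAccepted": "2014-07-01",                  # date of article acceptance
    "rioxxterms:author": ["Smith, J.", "Jones, A."],
    "rioxxterms:project": {                                # funding information, e.g. EU
        "funder_name": "European Commission",
        "project_id": "FP7-123456",                        # hypothetical grant reference
    },
    "rioxxterms:version": "AM",                            # accepted manuscript
    "ali:license_ref": "http://creativecommons.org/licenses/by/4.0/",
}
```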

We also have the APC finance workflow here – we are looking across workflows, to see what happens, why, to understand how our work can fit into those workflows and processes.

So, we have a workshop on 4th September on current initiatives in Open Access, and particularly on metadata. Come along!

And we are part of these Pathfinder projects – there are a number of us and, to borrow a statement in a non-political way, we are “better together” – we need to work together, to collaborate, to share experience.

Q&A

Q1: You mentioned V4OA and CASRAI – do you have a sense of how they will work together?

A1: My impression is that we are synthesising these with RIOXX, to make sense of them all…

Comment from Balvier: In terms of V4OA, there are specific elements of vocabularies we all work on… APCs, embargoes, etc. are those types of terms. They were intended to be part of other information systems, including RIOXX. In terms of CASRAI, it was about agreeing particular vocabularies – e.g. “organisational identifiers”. There isn’t overlap; they are distinct areas.

A1: So it can be synthesised and made coherent.

Lessons in Open Access Compliance in Higher Education (LOCH), Dominic Tate, University of Edinburgh, Informatics Forum, G.07

I’m aware that we are on the last leg of the day, so I will move through this quite swiftly. This project, LOCH, is another Pathfinder project. It is led by the University of Edinburgh – a large research-led Russell Group university – with Heriot-Watt University (research-led but business and industry focused), and with St Andrews University – the oldest university in Scotland, and much smaller. We all work together through the Scottish Digital Library Consortium (SDLC) and use PURE and some other common systems.

So, we are looking at managing open access payments – including a review of current reporting methods and creation of sharable spreadsheet templates for reporting to funders. The second strand is about using PURE as a tool to manage Open Access compliance, verification and reporting – we are all very active members of the PURE UK user group and are feeding in new requirements. Finally we will be adapting institutional workflows to pre-empt Open Access requirements and make compliance as seamless as possible for academics.
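As a hedged illustration of that first strand: a shareable funder report template might be generated along these lines. The column set here is an assumption modelled on typical RCUK/Wellcome APC reporting, not the templates LOCH will actually publish.

```python
# Generate a shareable APC report template as CSV; columns and the example
# row are illustrative assumptions.
import csv

COLUMNS = ["DOI", "Journal", "Publisher", "Funder", "Grant number",
           "APC paid (GBP)", "Licence", "Date of payment"]

payments = [  # hypothetical example row
    {"DOI": "10.1000/example", "Journal": "Example Journal",
     "Publisher": "Example Press", "Funder": "RCUK",
     "Grant number": "EP/X00000X/1", "APC paid (GBP)": "1450",
     "Licence": "CC BY", "Date of payment": "2014-06-15"},
]

with open("apc_report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(payments)
```

CSV keeps the template tool-neutral, which matters when the same report has to circulate between finance offices, libraries and funders.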

We are already working on a functional specification for PURE, and we will then be looking at publishing workflows. We expect 6 case studies, three webinars, three workshop events, a minimum of 24 blog posts, and pilot services and workflows. And we will be disseminating through conferences, journal articles, etc.

Q1: Will you be working to current or forthcoming policies? They are putting RDM etc. in for REF 2016… is that realistic?

A1: We are concentrating on existing policies and requirements. The “extra credit” bit isn’t known yet, so it’s not appropriate to focus on that yet. We have someone working with the project, from Elsevier, who really understands those upcoming changes. In terms of those additional requirements, it will happen, and the user group is an important part of that. If you look at the last REF, Elsevier did develop stuff to meet the requirements.

The Mechanical Curator, Ben O’Steen, British Library, Informatics Forum, G.07

World Premiere: Open Access in Scotland

Dominic: Unfortunately Ben O’Steen is ill and cannot be here; however, any questions about his project, The Mechanical Curator, can be asked of Steph Taylor, who is here. Anyway, like a rubbish substitute teacher, I’m going to show you a video. This is a world premiere. It was put together by my colleagues and some outside filmmakers, and has been funded by Jisc. We wanted a video for use in advocacy. We went to universities across the UK, spoke to repository managers, and maybe didn’t get the views we expected. We didn’t want this to be some sort of North Korean propaganda movie, so there is a mixture of views.

And now we are watching Open Access in Scotland (which was super).

An A-Z of RDM (Song), Robin Burgess, Glasgow School of Art, Informatics Forum Atrium

Robin has based this song on two projects – one focused on capturing research data, one working with early career researchers on research data management. And I dedicate this to my manager Nicola Simonson.

And with that – a lovely a cappella rendition of Robin’s A-Z of RDM – we conclude for the day with drinks, networking, and, if participants have read Edinburgh’s weather report for the next few days, a rare opportunity to enjoy a beautiful sunny evening! We will be back on the live blog tomorrow morning – see you then!