Repository Fringe Day Two LiveBlog

Welcome back to day two of Repository Fringe 2015! For a second day we will be sharing most of the talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Integration – at the heart of  – Claire Knowles, University of Edinburgh; Steve Mackey, Arkivum

Steve is leading this session, which has been billed as “storage” but is really about integration.

We are a company that came out of the University of Southampton, and our flagship Arkivum100 service has a 100% data integrity guarantee. We sign long-term contracts, for 25 years – most cloud services sign yearly contracts. But we also have a data escrow exit – a copy on tape that enables you to retrieve your data after you have left. It uses open source encryption, which means it can be decrypted as long as you have the key.

So why use a service like Arkivum for keeping data alive for 25+ years? Well, things change all the time. We add media all the time, more or less continually… We do monthly checks and maintenance updates but also annual data retrieval and integrity checks. There are companies, Sky would be an example, that have a continual technology process in place – three parallel systems – for their media storage in order to keep up with technology. There is a 3-5 year obsolescence cycle for services, operating systems and software, so we will be refreshing hardware and carrying out software and hardware migrations.

The Arkivum appliance is a CIFS/NFS presentation, which means it integrates easily with local file systems. There is also a robust REST API. There is simple administration of users, permissions, storage allocations etc. We have a GUI for file ingest status but also recovery pre-staging and security. There is also an ingest process triggered by timeout, checksum, change or manifest – we are keen that if anything changes you are triggered to check and archive the data before you potentially lose or remove your local copy.

So the service starts with original datasets and files. We take a copy for ingest, via the Arkivum Gateway on the appliance, and we encrypt and also decrypt to check the process. We do checksums at all stages. Once everything is checked it is validated and sent to our archive on the Janet Network, and it is also archived to a second archive and to the escrow copy on tape.
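[Liveblogger aside: the "checksums at all stages" described here is a standard fixity check. Arkivum's actual implementation isn't public, so the sketch below is only an illustration of the idea, with made-up function names:]

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(original_path, archived_path):
    """A copy passes the fixity check only if both files hash identically."""
    return sha256_of(original_path) == sha256_of(archived_path)
```

Running the same comparison at ingest, after the encrypt/decrypt round trip, and at each archive location is what gives the "checked at all stages" guarantee the speaker describes.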


We sit as the data vault, the storage layer within the bigger system which includes data repository, data asset register, and CRIS. Robin Taylor will be talking more about that bigger ecosystem.

We tend to think of data as existing in two overlapping cycles – live data and archive data. We tend to focus much more on the archive side of things, which relates to funder expectations. But there is often less focus on live data being generated by researchers – and those may be just as valuable and in need of securing as that archive data.

In a recent Jisc Research Data Spring report the concept of RDM Workflows is discussed. See “A consortial approach to building an integrated RDM system – “small and specialist””, which specifically talks about examples of workflows, including researcher-centric workflows that lay out the process for the researcher to follow in their research data management. Examples in the report include those created for Loughborough and Southampton.

Loughborough have a CRIS, they have DSpace, and they use FigShare for data dissemination. You can see the interactions in terms of data flow are very complex [slides will be shared but until then I can confirm this is a very complex picture] and the intention of the workflow and process of integration is to make that process simpler and more transparent for the researcher.

So, why integrate? Well, we want those processes to be simpler and easier, to encourage adoption and also to lower the cost of institutional support to the research base. It’s one thing to have a tick box, it’s another to get researchers to actually use it. We also, having been involved multiple times, have experience in the process of rolling RDM out – our work with ULCC on CHEST particularly helped us explore and develop approaches to this. So, we are checking quality and consistency in RDM across the research base. We are deploying RDM as a community-driven shared service so that smaller institutions can “join forces” to benefit from having access to common RDM infrastructure.

So, in terms of integrations we work with customers with DSpace and EPrints, with customers using FigShare, and, moving a little away from the repository and towards live research data, we are also doing work around ShareGate (based on archiving SharePoint), iRODS and QStar; and with Ex Libris Rosetta and Archivematica. We have yet to see real preservation work being done with research data management but it’s coming, and Archivematica is an established tool for preservation in the cultural heritage and museums sector.

Q1) Do you have any metrics on the files you are storing?

A1) Yes, you can generate reports from the API, or can access them via the GUI. The QStar tool, an HSM tool, allows you to do a full survey of the environment: it will crawl your system and let you know about file age, storage etc. And you can do a simulation of what will happen.

Q2) Can I ask about the integration with EPrints?

A2) We are currently developing a new plugin which is being driven by new requirements for much larger datasets going into EPrints and linking through. But the work we have previously done with ULCC is open source. The plugins for EPrints are open source. Some patches were designed by @mire, so that was a different process, but once those have been funded they are willing for them to be open source.

Q3) When repositories were set up there was a real drive for the biggest repository possible, being sure that everyone would want the most storage possible… But that is also expensive… And it can take a long time to see uptake. Is there anything you can say that is helpful for planning and practical advice about getting a service in place to start with? To achieve something practical at a more modest scale.

A3) If you use managed services you can use as little as you want. If you build your own you tend to be into a fixed capital sum… and a level of staffing that requires a certain scale. We start at a few terabytes…

Comment – Frank, ULCC) We have a few customers who go for the smallest possible set up for trial and error type approach. Most customers go for the complete solution, then reassess after 6 months or a year… Good deal in terms of price point.

A3) The work with Jisc has been about looking at what those requirements are. From CHEST it is clear that not all organizations want to set up at the same scale.

Unfortunately our next speaker, Pauline Ward from EDINA,  is unwell. In place of her presentation, Are your files too big? (for upload / download) we will be hearing from Robin Taylor.

Data Vault – Robin Taylor 

This is a collaborative project with University of Manchester, funded by JISC Research Data Spring.

Some time back we purchased a lot of kit and space for researchers, giving each their own allocation. But for the researcher the workflow is that data is generated and goes into a repository, yet they are not sure what data to keep and make available, what might be useful again, and what they may be mandated to retain. So we wanted a way of storing that data, and that needed to have some sort of interface to enable it.

Edinburgh and Manchester had common scenarios, common usages. We are both dealing with big volumes of data: hundreds of thousands or even millions of files. It is impossible to use web interface upload mechanisms at that scale. So we need that archiving to happen in the background.

So, our solution has been to establish the Data Vault User Interface, which interacts with a Data Vault Broker/Policy Engine that interacts with the Data Archive, and the Broker then interacts with the active storage. But we didn’t want to build something so bespoke that it didn’t integrate with other systems – the RSpace lab notebooks for instance. And it may be that the archive might be Arkivum, or might be Amazon, or might be tape… So our mechanism abstracts that away, to create a simple way for researchers to archive their data in a standard BagIt-type way.
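[For readers unfamiliar with BagIt: it is a simple packaging convention – payload files under data/, a short declaration file, and a checksum manifest. The sketch below is illustrative only, not the project's actual code nor a complete implementation of the BagIt specification:]

```python
import hashlib
import os

def make_bag(payload_files, bag_dir):
    """Lay out a minimal BagIt-style bag: payload under data/, a bagit.txt
    declaration, and a SHA-256 checksum manifest (deliberately simplified)."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for path in payload_files:
        name = os.path.basename(path)
        with open(path, "rb") as src:
            content = src.read()
        with open(os.path.join(data_dir, name), "wb") as out:
            out.write(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
        f.write("\n".join(manifest_lines) + "\n")
```

The appeal for a service like Data Vault is that any backend – Arkivum, Amazon, tape – only ever needs to store and return an opaque, self-verifying directory.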

But we should say that preservation is not something we have been looking at. Format migration isn’t realistic at this point. The scale at which we are receiving data doesn’t make that work practical. But, more importantly, we don’t know what the suitable time is, so instead we are storing that data for the required period of time and then seeing what happens.

Q1) You mentioned you are not looking at preservation at the moment?

A1) It’s not on our to-do list at the moment. The obvious comparison would be with Archivematica, which enables data to be stored in more sustainable formats, because we don’t know what those formats will be… That tool has a specific list of formats it can deal with… That’s not everything. But that’s not to say that down the line that won’t be something we want to look at, it’s just not what we are looking at at the moment. We are addressing researchers’ need to store data on a long-term basis.

Q1) I ask because what accessibility means is important here.

A1) On the live data, which is published, there is more onus on the institution to ensure that it is available. This holds back wider data collections.

Q2) Has anyone ever asked for their files back?

A2) There is discussion ongoing about what a backup is, what an archive is etc. Some use this as backup but that is not what this should be. This is about data that needs to be more secure but maybe doesn’t need to have instant access. We have people asking about archiving, but we haven’t had requests for data back. The other question is do we delete stuff that we have archived – the researchers are best placed to do that, so we will find out in due course how that works.

Q3) Is there a limit on the live versus the archive storage?

A3) Yes, every researcher has a limited quantity of active storage, but a research group can also buy extra storage if needed.  But the more you work with quotas, the more complex this gets.

Comment) I would imagine that if you charge for something, the use might be more thoughtful.

A3) There is a school of thought that charging means that usage won’t be endless, it will be more thoughtful.

Repositories Unleashing Data! Who else could be using your data? – Graham Steel, ContentMine/Open Knowledge Scotland

Graham is wearing his “ContentMine: the right to read is the right to mine!” T-shirt for his talk…

As Graham said (via social media) in advance of his  talk, his slides are online.

I am going to briefly talk about Open Data… And I thought I would start with a wee review of when I was at Repository Fringe 2011 and I learned that not all content in repositories was open access, which I was shocked by! Things have gotten better apparently, but we are still talking about Gold versus Green even in Open Access.

In terms of sharing data and information generally many of you will know about PubMed.

A few years ago a blogger, diabetes researcher and regular tweeter, Jo Brodie, asked why PubMed didn’t have social media sharing buttons. I crowd-sourced opinion on the issue and sent the results off to David Lipman (who I know well) who is in overall charge of NCBI/PubMed. And David said “what’s social media?”.

It took about a year and three follow ups but PubMed Central added a Twitter button and by July 2014, sharing buttons were in place…

Information wants to be out there, but we have various ways in which we stop that – geographical and license restricted streaming of video for instance.

The late great Jean-Claude Bradley saw science as heading towards being led by machines… This slide is about 7 years old now but I sense matters have progressed since then!


But at times, is not that easy to access or mine data still – some publishers charge £35 to mine each “free” article – a ridiculous cost for what should be a core function.

The Open Knowledge Foundation has been working with open data since 2004… [cue many daft pictures of Data from Star Trek: The Next Generation!].


We also have many open data repositories, figshare (which now has just under 2 million uploads), etc. Two weeks back I didn’t even realize many universities have data repositories but we also want Repositories Unleashing Data Everywhere [RUDE!] and we also have the new initiative, the Radical Librarians Collective…


Les Carr, University of Southampton


The Budapest Open Access Initiative kind of kicked us off about ten years ago. Down in Southampton, we’ve been very involved in Open Government Data and those have many common areas of concern about transparency, sharing value, etc.

And we now have a national portal which enables the sharing of data that has been collected by government. And at Southampton we have also been involved recently in understanding the data, the infrastructure, activities and equipment of academia by setting up a national aggregator that collects information from open data on every institution… So, if you need data on, e.g., DNA and associated equipment, it tells you who to contact to use it etc.

This is made possible as institutions are trying to put together data on their own assets, made available as institutional open data in standard ways that can be automatically scraped. We make building info available openly, for instance, about energy uses, services available, cafes, etc. Why? Who will use this? Well this is the whole thing of other people knowing better than you what you should do with your data. So, students in our computer science department for instance, looked at building recommended route apps, e.g. between lectures. Also the cross with catering facilities – e.g. “nearest caffeine” apps! It sounds ridiculous but students really value that. And we can cross city bus data with timetables with UK Food Hygiene levels – so you can find where to get which bus to which pub to an event etc. And campus maps too!

Now we have a world of Open Platforms – we have the internet, the web, etc. But not Google – definitely not open. So… Why are closed systems bad? Well we need to move from Knowledge, to Comprehension, to Application, to Analysis, to Synthesis and to Evaluation. We have repositories at the bottom – that’s knowledge, but we are all running about worrying about REF2020 but that is about evaluation – who knows what, where is that thing, what difference does that make…

So to finish I thought I’d go to the Fringe website and this year it’s looking great – and quite like a repository! This year they include the tweets, the discussion, etc. all in one place. Repositories can learn from the Fringe. Loads of small companies desperate for attention, and a few small companies who aren’t bothered at all, they know they will find their audience.

Jisc on Repositories unleashing data – Daniela Duca, Jisc


I work in the research team at Jisc and we are trying to support universities in their core business and help make the research process more productive. And I will talk about two projects in this area: the UK Research Data Discovery Service and, secondly, Research Data Usage and Metrics.

The Research Data Discovery Service (RDDS) is about making data more discoverable. This is a project which is halfway through and is with UK Data Archive and the DCC. We want to move from a pilot to a service that makes research data more discoverable.

In Phase 1 we had the pilot to evaluate Research Data Australia, developed by ANDS, with contributions from the UK Data Archive, the Archaeology Data Service, and NERC data centres. In Phase 2 Jisc, with support from the DCC and UKDA, is funding 9 more institutions to trial this service.

The second project, Research Data Usage and Metrics comes out of an interest in the spread of academic work, and in the effectiveness of data management systems and processes. We are trying to assess use and demand for metrics and we will develop a proof of concept tool using IRUS. We will be contributing to and drawing upon a wide range of international standards.

And, with that, we are dispersing into 5 super fast 17 minute breakout groups which we hope will add their comments/notes here – keep an eye on those tweets (#rfringe15) as well!

We will be back on the blog at 11.15 am after the breakouts, then coffee and a demo of DMAOnline – Hardy Schwamm, Lancaster University.

And we are back, with William Nixon (University of Glasgow) chairing, and he is updating our schedule for this afternoon which sees our afternoon coffee break shortened to 15 minutes.

Neil and I will be talking about some work we have been doing on linking research outputs. I am based at the British Library working as part of a team working on research outputs.

Linking Data – Neil Chue Hong, Software Sustainability Institute; Rachael Kotarski, Project THOR

Rachael: Research is represented by many outputs. Articles are some of the easier to recognise outputs but what about samples, data, objects emerging from research – they could be 100s of things… Data citation enables reproducibility – if you don’t have the right citation, and the right information, you can’t reproduce that work.

Citation also enables acknowledgement, for instance of historical data sets and longitudinal research over many years which is proving useful in unexpected ways.

Data citation does also raise authorship issues though. A one line citation with a link is not necessarily enough. So some of the work at DataCite and the British Library has been around linking data and research objects to authors and people, with use of ORCID alongside DOIs, URLs, etc. Linking a wide range of people and objects together.
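[By way of illustration: in DataCite's metadata schema a dataset record can carry the creator's ORCID iD and a link to a related output, which is the kind of people-to-objects linking described here. The record below is a simplified sketch with placeholder values, not a real deposit:]

```python
# Sketch of a DataCite-style metadata record linking a dataset DOI to the
# creator's ORCID iD and to a related article. Field names follow the DataCite
# schema; all identifiers are placeholders (10.5072 is a test prefix).
record = {
    "doi": "10.5072/example-dataset",
    "creators": [
        {
            "name": "Example, Researcher",
            "nameIdentifiers": [
                {
                    "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
                    "nameIdentifierScheme": "ORCID",
                }
            ],
        }
    ],
    "relatedIdentifiers": [
        {
            # ties the dataset to the article it underpins
            "relatedIdentifier": "10.5072/example-article",
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }
    ],
}
```

A one-line citation with a bare link carries none of this; it is the structured identifiers that let machines keep the person-object links up to date.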

THOR is a project, which started in June, on Technical and Human Infrastructure for Open Research. This is more work on research objects, subject areas, funders, organisations… really broadening the scope of what should be combined and linked together here.

So the first area here is in research – and understanding the gaps there, and how those can be addressed. And bringing funders into that. And we are also looking at integration of services etc. So one thing we did in ODIN and are bringing into THOR is about connecting your ISNI identifier to ORCID, so that there is a relationship there, so that data stays up to date. And the next part of the work is on Outreach – work on bootcamps, webinars, etc. to enable you to feed into the research work as well. And, finally, we will be looking at Sustainability, looking at how what we are doing can be self-funded beyond the end of the project, through memberships of partner organisations: CERN, DataCite, ORCID, DRYAD, EMBL-EBI, ANDS, PLoS, Elsevier Labs, Panomia(?). This is an EU funded project but it has international scope and an international infrastructure.

So we want to hear about what the issues are for you. Talk to us, let us know.

Linking Software: citations, roles, references and more – Neil Chue Hong

Rachael gave you the overview, I’m going into some of the detail for software, my area. So we know that software is part of the research lifecycle. That lifecycle relies on the ability to attribute and credit things, and that can go a bit wrong for software. That’s because our process is a little odd… We start research, we write software, we use software, we produce results, and we publish research papers. Now if we are good we may mention the software. We might release data or software after publication… rather than before…

A better process might be to start the research, identify existing software, adapt or extend that software, release the software (maybe even publish a software paper), use the software, produce results, perhaps release data and publish a data paper, and then publish the research paper. That’s great but also more complex. Right now we use software and data papers as proxies for sharing our process.

But software is not that simple, the boundaries can be blurry… Is it the workflow, the software that runs the workflow, the software that references the workflow, the software that supports the software that references the workflow, etc? What’s the useful part? Where should the DOI be, for instance? It is currently at programme level but is that the right granularity? Should it be at algorithm level? At library level? Software has the concept of versioning – I’d love our research to be versioned rather than “final” but that’s a whole other talk! But the versioning concept indicates a change, it allows change… but again how do we decide on when that version occurs?

And software also has the problem of authorship – which authors have had what impact on each version of the software? Who has the largest contribution to the scientific results in a paper? So for a project I might make the most edits to a code repository – all about updating the license – so the biggest contribution but would the research community agree? Perhaps not. Now I used to give this talk and say “this is why software is nothing like data” but now I say “this is why software is exactly like data”!

So, the different things happening now to link together these bits and pieces. GitHub, Zenodo, FigShare and institutional repositories have looked at “package level” one-click deposit, with a citable DOI. There has been work around SWORD deposit, which Stuart Lewis has been looking at too. So you can now archive software easily – but that’s the easy part – it’s the social side that needs dealing with. So, there is a brand new working group led by Force11 on Software Citation – do get involved.

And there are projects for making the roles of authors/contributors clearer: Project CRediT is looking at Gold/Silver/Bronze levels. But Contributor Badges is looking at more granular recognition. And we also have work on code as a research object, and a project, CodeMeta, that is looking at defining minimal metadata.
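[As an illustration of what "minimal metadata" for software can look like: CodeMeta describes software using schema.org-style terms. The fragment below uses made-up values and is only a sketch of the general shape, not an authoritative example:]

```python
import json

# Illustrative CodeMeta-style description of a piece of research software.
# Term names follow the CodeMeta/schema.org convention; all values here are
# placeholders for a hypothetical project.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",
    "version": "1.2.0",
    "codeRepository": "https://github.com/example/example-analysis-tool",
    "license": "https://spdx.org/licenses/MIT",
    "author": [
        {"@type": "Person", "givenName": "Example", "familyName": "Researcher"}
    ],
}

# Serialised, this is roughly what a codemeta.json file in a repository holds.
print(json.dumps(codemeta, indent=2))
```

Because the version and author list are explicit fields, the authorship-per-version problem Neil raises at least becomes recordable, even if who deserves credit remains a social question.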

So that brings us to the role of repositories and the Repository Fringe community. Imperial College for instance is looking at those standards and how to include them in repositories. And that leads me to my question to you: how does the repository community support this sort of linkage?

Q1 – William Nixon) Looking at principles of citation… how do you come up with those principles?

A1 – Neil) Force11 has come up with those citation principles and those are being shared with the community… But all communities are different. So it is easy to get high-level agreement, but it is hard to get agreement on implementation details. So authorship changes over time, and changes version to version. So when we create principles for citation do we credit all contributors collectively and equally, or do we go the complex route of acknowledging specific individual contributions for a particular version? This causes huge debate and controversy in the open source community about who gets the appropriate credit etc. For me, the question is what do we need to deposit? Some information might be useful later in the reward lifecycle… But if I’m lead author will that be a priority here?

Q2 – Paul Walk) My internal hippie says that altruism and public good comes into open source software and I wonder if we are at risk of messing with that sordid research system…

A2 – Neil) I would rebut that: most open source contribution and development is not altruistic. It is people being rewarded in some way – because doing things open source gives them more back than working alone. I wouldn’t say altruism is the driving force, or at least it hasn’t been for some time… It’s already part of that research-type system.

Comment) For me this is such a high level of where we are, you are talking about how we recognise contribution, citation etc. but just getting things deposited is the issue for me right now… I’d love to find out more about this but just convincing management to pay for ORCID IDs for all is an issue even…

A2 – Rachael) We do need to get word out about this; showing how researchers have done this, and the value of it, will help. It may not just be through institutions but through academic societies etc. as well…

A2 – Neil) And this is back to the social dimension and thinking about what will motivate people to deposit. And they may take notice of editors… Sharing software can positively impact citations and that will help. Releasing software in the image processing community for instance also shows citations increase – and that can be really motivating. And then there is the economic impact for universities – is there a way we can create studies to show positive reputation and economic impacts on the institution that will prove the benefit for them.

Q3) A simple question – there are many potential solutions for software data… but will we see any benefits from them until we see REF changing to value software and data to the same extent as other outputs.

A3 – Neil) I think we are seeing a change coming. It won’t be about software being valued as much as papers. It will be about credit going to the right person so that they are valued. What I have seen in research council meetings is that they recognise that other outputs are important. But in a research project credit tends to go to the original writer of a new algorithm perhaps, not the developer who has made substantial changes. So where credit goes matters – the user, implementer, contributor, originator, etc? If I don’t think I will get suitable credit then where is the motivation for me to deposit my software?

EC Open Data Pilot, EUDAT, OpenAIRE, FOSTER and PASTEUR4OA – Martin Donnelly, Digital Curation Centre

I was challenged yesterday by Rachael and by Daniela at Jisc to do my presentation in the form of a poem…

There once was a man from Glasgee
Who studied data policy
In a project called FOSTER
Many long hours lost were
And now he’ll show some slides to ye!

So I will be talking about four European funded projects on research data management and open access that are all part of Horizon 2020. Many of you will be part of Horizon 2020 consortia, or will be supporting researchers who are. It is useful to remind ourselves of the context by which these came about…

Open Science is situated within a context of ever greater transparency, accessibility and accountability. It is both a bottom-up issue – the OA concept was coined about 10 years back in Budapest and was led by the high energy physics community, who wanted to be more open in sharing their work, and to do so more quickly – and it has also been driven from the top through government/funder support, increasing public and commercial engagement in research, to ensure better take-up and use of research that has been invested in.

Policy-wise, in the UK the RCUK has seven Common Principles on Data Policy, and six of the RCUK funders require data management plans. That fits into wider international policy moves. Indeed, if you thought the four-year EPSRC embargo timeline was tight, South Africa just introduced a no-more-than-12-month requirement.

Open Access was a pilot in FP7; this ran from August 2008 until the end of FP7 in 2013. It covered parts of FP7, but it covers all of FP8/Horizon 2020, although that is a pilot process intended to be mainstreamed by FP9 or whatever it comes to be known by. The EC sees real economic benefit to OA in supporting SMEs and NGOs that can’t afford subscriptions to the latest research. Alma Swan and colleagues have written on the opportunity costs, which provides useful context to the difference Open Access can make.

Any project with H2020 funding has to make any peer-reviewed journal article they publish openly available and free to access, free of charge, via a repository – regardless of where they publish and whether green or gold OA.

H2020 also features an Open Research Data pilot – likely to be a requirement by FP9. It applies to data and metadata needed to validate scientific results, which should be deposited in a dedicated data repository. Interestingly, whilst data management plans need to be created 6 months into the project, and towards the end, the EU doesn’t require them to be filed at the outset.

So, lastly, I want to talk about four projects funded by the EU.

PASTEUR4OA aims to align OA mandates across the EU – so that funders don’t have conflicting policies. That means it is a complex technical and diplomatic process.

OpenAIRE aims to promote use and reuse of outputs from EU funded research

EUDAT offers common data services through geographically distributed resilient network of 35 European organisations. Jisc and DCC are both working on this, integrating the DCC’s DMP Online tool into those services.

The FOSTER project is supporting different stakeholders, especially younger researchers, in adopting open access in the context of the European Research Area and making them aware of what H2020 requires of them – with a big carrot and a small stick, in a way. We want researchers to integrate open access principles and practice in their current research workflow – rather than asking them to change their way of working entirely. We are doing train-the-trainer type activities in this area and also facilitating adoption and reinforcement of OA policies within and beyond the EC. FOSTER is doing this work through various methods, including identifying existing content that can be reused, repackaged, etc.

Jisc Workshop on Research Data Management and Research at Risk Activities, and Shared Services – Rachel Bruce, Daniela Duca, Linda Naughton, Jisc

Rachel is leading this session…

This is really a discussion session but I will start by giving you a very quick overview of some of the work in Research at Risk as well. But this is a fluid session – we are happy to accommodate other topics that you might want to talk about. While we give you a quick overview, do think about an RDM challenge topic you might want to take the chance to talk about.

So, in terms of Research at Risk this is a co-design challenge. This is a process we take forward in Jisc for research and development, or just development end of the spectrum, but to address sector challenges. The challenges facing the sector here is about the fragmented approach to research data and infrastructure. Because of that we are probably not reaching all the goals we would wish to. Some of that relates quite closely to some of what David Prosser was saying yesterday about open access and the benefits of scale and shared services. So, we have been asked to address those issues in Research at Risk.

Within Research at Risk we have a range of activities, one of the biggest is about shared services, including in the preservation and curation gap. You have already heard about discovery and research data usage, also the Research Data Spring.

So, the challenges we want to discuss with you are:

  1. The Shared services for RDM – yesterday there was discussion around the SHERPA services for instance. (Rachel will lead this discussion)
  2. Journal research data policy registry (Linda will lead this session)
  3. Business case and funding for RDM – articulating the role of RDM (Daniela will lead this session)
  4. But also anything else you may want to discuss… (Varsha will lead this group discussion)

So, Shared Services… This is an architecture diagram we have put together to depict all of the key services needed to support a complete data management service, also linking to national and international services. And I should credit Stuart Lewis at UoE and John Lewis (Sheffield?) who had done much of this mapping already. We have also undertaken a survey of repositories around the potential needs of HEIs. Some responses suggested a possible national data repository; others called for Jisc to work with funders on data storage requirements, so that they provide a suitable discipline-specific data storage mandate.

Linda: I will talk a bit about the Journal Research Data Policies Registry – you can find out more on our blog and website. We want to create a registry that allows us to turn back time to see what we can learn from OA practices. The aim is to develop best practice on journal policies between publishers and other stakeholders. We want to know what might make your life easier in terms of policies, and navigating research data policies. And that input into this early stage work would be very valuable.

Daniela: The business case and costings for RDM is at a very early stage but we are looking at an agreed set of guidance for the case for RDM and for costing information to support the business case in HEIs for research data management. This reflects the fact that currently approaches to funding RDM services and infrastructure vary hugely, and uncertainty remains… And I would like to talk to you about this.

Rachel: we thought we would have these discussions in groups and we will take notes on the discussions as they take place, and we will share this on our blog. We also want you to write down – on those big post it notes – the one main challenge that you think needs to be addressed which we will also take away.

So, the blog will be going quiet again for a while but we’ll try and tweet highlights from groups, and grab some images of these discussions. As Rachel has said there will also be notes going up on the Jisc Research at Risk blog after today capturing discussions… 

Cue a short pause for lunch, where there was also a demo taking place from: DMPonline – Mary Donaldson and Mick Eadie, University of Glasgow.

Our first talk of this afternoon, introduced by William Nixon, is:

Unlocking Thesis Data – Stephen Grace, University of East London

This project is for several different audiences. For students it is about bridging to the norms of being a career researcher, visibility and citations – helping them to understand the scholarly communication norms that are becoming the reality of the world. But this also benefits funders, researchers, etc.

We undertook a survey and found several institutions already assigning DOIs to theses, but others looking to do more in this area. We also undertook case studies in six institutions, to help us better understand what the processes actually are. So our case studies were for University of East London; University of Southampton; LSE; UAL; University of Bristol; and University of Leicester. Really interesting to see the systems in place.

We undertook test creation of thesis DOIs with University of East London and University of Glasgow, and University of Southampton undertook this via an XML upload so a slightly more complex process. In theory all of that was quite straightforward. We were grateful for the Jisc funding for that three month project; it didn’t get continuation funding but we are keen to understand how this can happen in more institutions and to explore other questions: for instance how does research data relate to the thesis, what is its role, is it part of the thesis, a related object etc?
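As a rough illustration of what minting a thesis DOI involves – this is a hedged sketch, not the project’s actual code; the prefix, suffix, names and URLs below are placeholders (10.5072 is a conventional DataCite test prefix) – a DataCite-style registration boils down to assembling a small metadata payload and posting it to the registration API:

```python
# Sketch of the metadata payload for registering a thesis DOI in a
# DataCite-style service. All identifiers and URLs here are invented
# placeholders for illustration only.

def build_doi_payload(prefix, suffix, title, creator, year, url):
    """Assemble the JSON body a DataCite-style registration API expects."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": f"{prefix}/{suffix}",
                "titles": [{"title": title}],
                "creators": [{"name": creator}],
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Text"},  # a thesis is textual
                "url": url,  # landing page in the repository
                "event": "publish",
            },
        }
    }

payload = build_doi_payload(
    "10.5072", "uel.1234",  # 10.5072 is DataCite's traditional test prefix
    "A Multimedia Thesis on Documentary Film",
    "Example, Student",
    2015,
    "https://repository.example.ac.uk/id/eprint/1234",
)
# A real client would now POST this payload to the registration API
# with the institution's DataCite credentials; the Southampton XML
# upload mentioned above is a bulk variant of the same idea.
```

The datasets deposited alongside a thesis would get their own payloads (and so their own DOIs), which is what makes the one-record-or-more question below a live design choice.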

So questions we have are: What systems would you use and can they create/use persistent identifiers? Guidance on what could/should/must be deposited? One record or more? Opportunities for efficiencies?

On the issue of one record or more, a thesis we deposited at UEL was a multimedia thesis, about film making and relating to making two documentary films – they were deposited under their own DOIs. Is that a good thing or a bad thing? Is that flexibility good?

Efficiencies could be possible around cataloguing theses – that can be a repeated process for the repository copy and for the library’s copy and those seem like they should be joined up processes.

We would love your questions and comments and you can find all project outputs.

Q1) What is the funder requirement on data being deposited with theses?

A1) If students are funded by research councils, they will have expectations regardless of whether the thesis is completed.

Q2) Have you had any feedback from the (completed) students whose work has been deposited on how they have found this?

A2) I have had feedback from the student who had deposited that work on documentary films. She said as a documentary film maker there are fewer and fewer ways to exhibit those documentary films. As a non commercial filmmaker seeing her work out there and available is important and this acts as an archive and as a measure of feedback that she appreciates.

Q3) On assigning ORCID IDs to students – I struggle to think of why that would be an issue?

A3) Theoretically there is no issue, we should be encouraging it.

Comment: Sometimes where there is a need to apply an embargo to a thesis because it contains content in which a publisher has copyright – it may be useful to have a DOI for the thesis and separate DOIs for the data, so that the data can be released prior to the thesis being released from embargo. [Many thanks to Philippa Stirlini for providing this edit via the comments (below)].

IRUS UK – Jo Alcock, IRUS UK

We are a national aggregation service for any UK Institutional Repositories which collects usage statistics. That includes raw download data from UK IRs for all item types within repositories. And it processes raw data into COUNTER compliant statistics. And that aggregation – of 87 IRs – enables you to get a different picture than just looking at your own repository.

IRUS-UK is funded by Jisc. Jisc project and service manage IRUS-UK and host it. Cranfield University undertake development and Evidence Base at Birmingham City University undertake user engagement and evaluation.

Behind the scenes IRUS-UK is a small piece of code that can be added to repository software and which employs the “Tracker Protocol”. We have patches for DSpace, plug-ins for EPrints, and implementation guidelines for Fedora. It gathers basic data for each download and sends it to the IRUS-UK server. The reports are Report 1 and Report 4 COUNTER compliant. We also have an API and SUSHI-like service.
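Broadly, the tracker approach means the repository fires a small pingback to the aggregator for every download. The sketch below is illustrative only – the endpoint and parameter names are assumptions modelled on OpenURL-style trackers, not the exact Tracker Protocol specification:

```python
# Illustrative sketch of a tracker-style download ping, of the kind a
# repository patch or plug-in might assemble. The endpoint and field
# names are assumptions for illustration, not the IRUS-UK spec.
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlencode

def build_tracker_ping(base_url, item_oai_id, file_url, client_ip, user_agent):
    """Build the query string reported to the aggregator for one download."""
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL-style version tag (assumption)
        "req_id": hashlib.md5(client_ip.encode()).hexdigest(),  # anonymised client
        "req_dat": user_agent,
        "rft.artnum": item_oai_id,  # OAI identifier of the downloaded item
        "svc_dat": file_url,        # URL of the file actually served
        "rfr_dat": datetime.now(timezone.utc).isoformat(),  # event timestamp
    }
    return f"{base_url}?{urlencode(params)}"

ping = build_tracker_ping(
    "https://irus.example.org/counter",  # placeholder aggregator endpoint
    "oai:repository.example.ac.uk:1234",
    "https://repository.example.ac.uk/1234/1/thesis.pdf",
    "192.0.2.1",
    "Mozilla/5.0",
)
```

The aggregator’s side of the bargain is then the COUNTER processing: filtering robots and double-clicks out of these raw events before they appear in the reports described below.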

At present we have around 400k items covered by IRUS-UK. There are a number of different reports – and lots of ways to filter the data. One thing we have changed this year is that we have combined some of these related reports, but we have added a screen that enables you to filter the information. Repository Report 1 enables you to look across all repositories by month – you can view or export as Excel or CSV.

As repositories you are probably more concerned with the Item Report 1 which enables you to see the number of successful item download requests by Month and Repository Identifier. You can look at Item Statistics both in tabular and graphical form. You can see, for instance, spikes in traffic that may warrant further investigation – a citation, a news article etc. Again you can export this data.

You can also access IRUS-UK Item Statistics which enable you to get a (very colourful) view of how that work is being referenced – blogged, tweeted, cited, etc.

We also have a Journal Report 1 – that allows you to see anything downloaded from that journals within the IRUS-UK community. You can view the articles, and see all of the repositories that article is in. So you can compare performance between repositories for instance.

We have also spent quite a lot of time looking at how people use IRUS-UK. We undertook a number of use cases around the provision of standards based, reliable repository statistics; reporting to institutional managers; reporting to researchers; benchmarking; and also for supporting advocacy. We have a number of people using IRUS-UK as a way to promote the repository, but also some encouraging competition through newsletters etc. And you can find out more about all of these use cases from a recent webinar that is available on our website.

So, what are the future priorities for IRUS-UK? We want to increase the number of participating repositories. We want to implement the IRUS tracker for other repository and CRIS software. We want to expand views of data and reports in response to user requirements – for instance potentially altmetrics. We also want to include supplementary data and to pursue more international engagement.

If you want to contact us you can do so via our website, by email, or by tweeting @IRUSNEWS.

Q1) Are the IRUS-UK statistics open?

A1) They are all available via a UK Federation login. There is no reason they could not technically be shared… We have a community advisory group that have recently raised this so it is under discussion.

Q2) How do data repositories fit in, especially for text mining and data dumps?

A2) We have already got one data repository in IRUS-UK but we will likely need different reporting to reflect the very different ways those are used.

Q3) If a data set has more than one file, is that multiple downloads?

A3) Yes.

Q3) Could that be fixed?

A3) Yes, we are looking at separate reporting for data repositories for just this sort of reason.

Sadly Yvonne Howard, University of Southampton, is unable to join us today due to unforeseen circumstances so her session, Educational Resources, will not be going ahead. Also the Developer Challenge has not been active so we will not have the Developer Challenge Feedback session that Paul Walk was to lead. On which note we continue our rejigged schedule…

Recording impact of research on your repository (not impact factors but impact in REF sense!) – Mick Eadie & Rose-Marie Barbeau, University of Glasgow; 

Rose-Marie: Impact is my baby. I joined Glasgow specifically to address impact and the case studies. The main thing you need to know about the impact agenda is that all of our researchers are really stressed about it. Our operating landscape has changed, and all we have heard is that it will be worth even more in future REFs. So, we don’t “do” impact, but we are about ensuring our researchers are engaging with users and measuring and recording impact. So we are doing a lot of bridging work, around that breadcrumb trail that explains how your research made it into, e.g. a policy document…

So we have a picture on our wall that outlines that sort of impact path… showing the complexity and pathways around impact. And yet even this [complex] picture appears very simple; reality is far more complicated… When I talk to academics they find that path difficult: they know what they do, they know what they have to show… so I have to help them understand that they may have multiple impacts, and that impact might come by quite a circuitous route. So for instance a piece of archaeological work impacted policy, made Time Team, impacted the local community… Huge impact, extensive international news coverage… But this is the form for REF processes…

But my big message to researchers is that everything has changed: we need them to engage for impact and we take that work seriously. It’s easy to say you spoke to schools, to be part of the science festival. We want to capture what these academics are doing here professionally, things they may not think to show. And we want that visible on their public profile for example. And we want to know where to target support, where impact might emerge for the next REF.

So, I looked at other examples of how to capture evidence. Post REF a multitude of companies were offering solutions to universities struggling to adapt to the impact agenda. And the Jisc/Coventry-led project establishing some key principles for academic buy in – that it needed to be simple and very flexible – was very useful.

And so… Over to the library…

Mick: So Rose-Marie was looking for our help to capture some of this stuff. We thought EPrints might be useful to capture this stuff. It was already being used and our research admin staff were also quite familiar with the system, as are some of our academics. We also had experience of customising EPrints. And we have therefore added a workflow for Knowledge Exchange and Impact. We wanted this to be pretty simple – you can either share “activity” or “evidence”. There are a few other required fields, one of which is whether this should be a public record or not.

So, when an activity/evidence record is added the lead academics can be included, as can any collaborating staff. The activity details follow the REF vocabulary. We include potential impact areas for instance… And we’d like for that record to be linked to other university systems. But we are still testing this with research admin staff.

We still have a few things to do… A Summary page; some reporting searching and browsing functionality – which should be quite easy; link to other university systems (staff profiles etc); and we would like to share this with the EPrints community.

Q1) What about copyright?

A1 – Rose-Marie) Some people do already upload articles etc. as they appear. The evidence repository is hidden away – to make life easier in preparing for the next REF – but the activity is shared more publicly. Evidence is…

Q2 – Les) It’s great to hear someone talking about impact in a passionate and enthusiastic way! There is something really interesting in what you are doing and the intersection with preservation… In the last REF there was evidence lost that had been on the web. If you just have names and URLs, that won’t help you at the end of the day.

A2 – Rose-Marie) Yes, lack of institutional memory was the biggest issue in the last REF. I speak a lot to individuals and they are very concerned about that sort of data loss. So if we could persuade them to note things down it would jog memories and get them in that habit. If they note disappearing URLs that could be an issue, but also I will scan everything uploaded because I want to know what is going up there, to understand the pitfalls. And that lets me build on experience in the last REF. It’s a learning process. We also need to understand the size of storage we need – if everyone uploads every policy document, video etc. it will get big fast. But we do have a news service and our media team are aware of what we are doing, and we are trying to work with them. Chronological press listings from that media team aren’t the data structure we would hope for so we are working on this.

William) I think it is exciting! As well we don’t think it’s perfect – we just need to get started and then refine and develop that! Impact did much better than expected in the last REF, and if you can do that enthusiastically and engagingly that is really helpful.

A2 – Rose Marie) And if I can get this all onto one screen that would be brilliant. If anyone has any questions, we’d love to hear them!

Impact and Kolola – Will Fyson, University of Southampton

I work for EPrints Services but I also work for Kolola, a company I established with fellow PhD students – and very much a company coming out of that last REF.

The original thinking was for a bottom up project thinking about 50 or 60 PhDs who needed to capture the work they were doing. We wanted to break down the gap between day to day research practice and the repository. The idea was to allow administrators to have a way to monitor and plan, but also to ensure that marketing and comms teams were aware of developments as well.

So, our front page presents a sort of wall of activity, and personal icons which show those involved in the activity. These can include an image and clicking on a record takes you through to more information. And these records are generated by a form with “yes” or “no” statements to make it less confusing to capture what you have done. These aren’t too complex to answer and allow you to capture most things.

We also allow evidence to be collected, for instance outreach to a school. You can also capture how many people you have reached in this activity. We allow our community to define what sort of data should be collected for which sort of activity. And analytics allow you to view across an individual, or a group. That can be particularly useful for a large research group. You can also build a case study from this work – useful for the REF as it allows you to build up that case study as you go.

In terms of depositing papers we can specify in the form that an EPrints deposit is required when certain types of impact activities are recorded – and highlight if that deposit has been missed. We can also export a Kolola activity to EPrints, providing a link to the Kolola activity and any associated collections – allowing you to explore related works to a particular paper – which can be very useful.

We’ve tried to distribute a research infrastructure that is quite flexible and allow you to have different instances in an organisation that may be tailored to different needs of different departments or disciplines. But all backed up by the institutional repository.

Q1) Do you have any evidence of researchers gathering evidence as they go along?

A1) We have a few of these running along… And we do see people adding stuff, but occasionally researchers need prompting (or threatening!), for instance for foreign travel you have to be up to date logging activity in order to go! But we also saw an example of researchers getting an entry in a raffle for every activity recorded – and that meant a lot of information was captured very quickly!

(Graham Steel @McDawg taking over from Nicola Osborne for the remainder of the day)

Demo: RSpace – Richard Adams, Research Space


RSpace ELN presentation and demo. Getting data online as early as possible is a great idea. RSpace at the centre of user data management. Now time for a live demo (in a bit).

Lab notebooks can get lost for a number of reasons. Much better is an electronic lab book. All data is timestamped. Who made what changes etc. is logged. Let’s make it easy for them to use. Here’s the entry screen when you first log in. You can search for anything and it’s very easy to use. It’s easy to create a new entry. We have a basic document into which you can write content with any text editor. You can drag and drop content in very simply. Once documents have been added they appear in the gallery. Work is saved continuously and timestamped.

We also have file stores for large images and sequencing files.


It’s very easy to configure. Each lab has its own file server. Going back to workspace, we’re keen to make it really easy to find stuff. Nothing is ever lost or forgotten in workspace. You can look at revision history. You can review what changes have been made. Now looking at a lab’s group page. You can look at but not edit other user generated content. You can invite people to join your group and collaborate with other groups. You can set permissions for individual users. One question that comes up often is about how to get data out of the system. Items are tagged and contain metadata making them easier to find. To share stuff, there are three formats for exporting content (ZIP, XML and PDF).

The community edition is free and uses Amazon web services. We’re trying to simplify RSpace as much as possible to make it really easy to use. We are just getting round to the formal launch of the product but have a number of customers already. It’s easy to link content from the likes of DropBox. You can share content with people that are not registered with an RSpace account. Thanks for your attention.

Q1) I do lots of work from a number of computers.

A1) We’re developing an API to integrate such content. Not available just yet.

Closing Remarks and presentation to winner of poster competition – Kevin Ashley, Digital Curation Centre

I’m Kevin Ashley from the Digital Curation Centre here in Edinburgh. Paul Walk mentioned that we’ve done RFringe events for 7 years. In the end, we abandoned the developer challenge due to a lack of uptake this year. Do people still care about it? Kevin said there is a sense of disappointment. Do we move on or change the way we do it? Les says I’ve had a great time, it’s been one of the best events I’ve been to for quite some time. “This has been fantastic”. Thanks Paul for your input there said Kevin.

David Prosser’s opening keynote was a great opening for the event. There were some negative and worrying thoughts in his talk. We are good at identifying problems but not solutions. We have the attention of government departments in terms of open access and open data. We should maximize this opportunity before it disappears.

Things that we talked about as experiments a few years ago have now become a reality. We’re making a lot of progress generally. Machine learning will be key, there is huge potential.

I see progress and change when I come to these events. Most in the audience had not been to RFringe before.

Prizes for the poster competition. The voting was quite tight. In third place, LSHTM (Rory). In second place, Lancaster. And in first place, Robin Burgess and colleagues.

Thanks to all for organizing the event. Thanks for coming along. Thanks to Valerie McCutcheon for her contribution (gift handed over). Thanks to Lorna Brown for her help too. Go out and enjoy Edinburgh! (“and Glasgow” quipped William Nixon).


Repository Fringe 2015 – Day One LiveBlog

Welcome to Repository Fringe 2015! We are live for two packed days of all things repository related. We will be sharing talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Welcome to Edinburgh – Jeremy Upton, Director, Library and University Collections, University of Edinburgh

It gives me great pleasure to welcome you to the University of Edinburgh for this great event, organised jointly by staff from the Digital Curation Centre, EDINA, and the University of Edinburgh.

If you have come from outside Edinburgh then it really is a beautiful city and I encourage you to explore it if you have time. And of course it’s the Edinburgh festival and I’m sure you’ve already had a sense of that coming in today. It is an event with a huge impact on the city, and on the University – we get involved in hosting and running events and I have to give a plug for our current exhibition, Towards Dolly, featuring Dolly the Sheep at the University of Edinburgh library.

So, as the new Library Director I really am pleased that Repository Fringe is running again here. And in my time thus far two issues have really been a major priority: Open Access and Open Data, and I’m pleased to see both reflected in your programme. Amongst academics these issues can trigger quite a fair degree of panic and concern. But you can take a really positive opportunity here – the academic community is looking to our community to provide creative solutions.

I’m also delighted to be here because throughout my career I have been a fan of collaboration and shared working, and the areas of open access and open data are areas particularly ripe for collaborative and shared working, to share knowledge and share some of our pain, as we all meet these shared requirements.

We also find ourselves in an increasingly uncertain world and that makes the role of innovation and ideas so important – and that is why events like this are so important, giving us space to…

Edinburgh was an early adopter of OA, starting in 2003. We have strong support for open access with champions within departments. We received over £1.1M of RCUK funding for Gold OA this year. Our staff look at the landscape not only from our own institutional perspective, but also looking to the much wider sector. We work in collaboration with colleagues in EDINA and with DCC, talking regularly and sharing knowledge and expertise.

We are one of the partners in the new Alan Turing Institute, working with large data sets including large open data sets. And that is looking at the opportunities for new and innovative research in areas such as healthcare. And I heard our Vice Chancellor talking about the use of data in, for instance, treating diabetes in new ways.

Now, finally, I have a few practical items to mention. You all have stickers for voting on your favourite posters. Also Repository Fringe is very deliberately a fringe event – we want this event to have a looser structure than traditional conferences. The organisers want me to emphasise informality, please do dip in and out of sessions, move between them, your presenters will expect that so move as you wish. And if you want to create your own break out sessions there are rooms available – just ask at the Fringe registration desk. And the more that you put into this event, the more you will get out of it.

We would like to thank our sponsors this year: Arkivum, EPrints Repository Services, and the University of London Computing Centre.

So, please do enjoy the next few days and take the opportunity to see some of Edinburgh. And hopefully you will have a fruitful event finding new solutions to the challenges we all face.

Now it is my pleasure to introduce your opening keynote speaker. David Prosser came from the “dark side” of medical publishing, then moved on to undertake his doctorate and on to his work with Research Libraries UK…
Fulfilling their potential: is it time for institutional repositories to take centre stage? – David Prosser, Executive Director, RLUK

As someone who has been involved in Open Access for the last 12 years I want to look back a bit at our successes and failures, and use that to set the scene with where we might go forward.

I wanted to start by asking “What are repositories for?”. When we first set up repositories they were very much about the distribution of research, for those beyond the institution, and for those without the funding to access all of the journals being published in. There was, and continues to be, debate about the high profits made by commercial publishers… There was a move towards non commercial publishers. And there was something of a move to remove “dirty profits” from the world of scholarly communications that helped drive the push to open access.

We have also seen a move from simpler journals and books, towards something much richer which repositories enable. We were going to revolutionise scholarly communications but we haven’t done that. We have failed to engage researchers adequately and I think that the busyness of academics is an insufficient reason to explain that. So why have we failed to engage and to get academics to see this as something they should do on a daily basis? I warned Les Carr years back when he was talking about the School’s timetabling… He was saying how hard that is to do but that he knew he had to do it because that was part of what he needed to do to achieve what he wanted. We have gotten fixated on making things easy in open access, even though people will take the time to do things they feel it’s important to do.

And there has also been confusion about standards and about publication status – what authors could share as pre/post-print and what they were allowed to do – which took a long time to resolve. It didn’t help that publishers were confused too. There is an interesting pub conversation to be had about whether the confusion is a deliberate tactic from publishers… In charitable moments you can see statements coming out that suggest confusion, that publishers don’t understand the issues.

But there are places where those issues have been overcome. Arxiv is so well established in high energy physics that no publisher restricts authors in depositing there. Another subject based repository, PubMed Central, has also seen success but that is in part because of requirements on authors and publishers, and that space has seen success because of working closely with publishers. We have also seen FigShare and Mendeley that have seen great success – what is it about them that is attractive that we can learn from and borrow from for repositories?

Over the last 12 years there has been a real tension between pragmatism and idealism. When repositories first emerged we were happy to take in content without checking the quality so carefully – for instance suitable and clear rights information. As a user the rights information is not always there or clear. Not all metadata we have, especially for older material, is necessarily fit for purpose, for our needs. We have to some extent a read only corpus because of that. But is that enough? Or are there more interesting things we would want to do? It is difficult to look back and see those pragmatic decisions as the wrong ones: we wanted to demonstrate the value; to show authors the potential for dissemination of their research… It is hard to say that was wrong but going forward we really need to make a conscious decision about what it is that we want.

So one of the things about open access as a force for revolutionising scholarly communication… You can see scholarly communication as being made of three functions: registration, archiving and dissemination. Now all of these can be fulfilled by repositories but we still seem to be using journals for all of these things; we haven’t moved to using repositories for those functions first. Early on there was an idea that you would deposit work locally then get it accepted, kitemarked, etc. by a journal or process after that. We can see the journal has retained its position. Libraries in the SCOAP3 project, which looked at journals whose content was entirely available in Arxiv, spent 10 years persuading and working with publishers to get those journals to be open access so that post prints were as open access as all previous and parallel versions of the same paper. But that was about protecting journals. Libraries seem to be so keen on journals that they are desperate to protect them, sometimes in the face of huge opposition from publishers!

So we have a very conservative system. You have to see journals not just as a form of scholarly communication; it is about reward mechanisms. If you are rewarded for being in one of those high energy physics journals it does make sense that you should be so invested in supporting their existence. The current reward structures are the issue, but what is the solution there? One of the government’s key advisers, Dr Mike Walker, raised this issue without suggesting solutions. And in research institutions and libraries we are so far away in terms of our sphere of influence from those reward mechanisms, which means all we can do is nudge and inform…

We can, however, see open access advocacy as a success. In the last government we saw some openness to talking about open access… We can talk about whether the impact of that has been totally helpful but there has been impact. Something like 80 institutions now have open access policies – they vary in effectiveness but those even being in place are remarkable. And where they work well they make a real difference, with the Wellcome Trust, the University of the Age (?) and RLUK policies; and the HEFCE policy is really the game changer.

It has been interesting, over the last few weeks, to see a change in the HEFCE policy. It is interesting to see how ready institutions are for it – there are many that are not ready yet and that could mean pressure to change that policy, but we see them stating that policy mistakes will be treated leniently, which is helpful. Authors usually know if their paper has been accepted but it can be harder to know when it has been published, which is an important trigger. But it seems that the stick of the HEFCE policy is too strong. Universities don’t trust academics and researchers to deposit regularly, and they recognise the risks that that brings in terms of the REF and their funding in the future. This is why a lot of Russell Group universities in particular have lobbied for acceptance rather than publication date…

It says a lot about scholarly communications that authors and institutions do not always know when a paper has been accepted or published. The idea of the notification of acceptance being a private transaction between the author and the publisher raises some concerns for research libraries.

Now, I wanted to make a small diversion here to talk a bit about RCUK. With the comprehensive spending review coming up in UK Government, there is sabre-rattling about 40% cuts in research budgets. I think funders, RCUK in particular, will look at what they are spending and ask if they are getting value for money. And I think researchers will also question, if their budgets are cut, why RCUK are paying so much money to Reed Elsevier. So there will be pressure to stop paying for open access. There is also a transition period during which longer embargoes are allowed for open access – this has led to grotesque growth in publishers’ decision trees! It could be that the end of that transition period, and a cut in funding for gold OA, may put the focus back on repositories. That is an important scenario that we should be thinking seriously about. And on the issue of embargoes I need to say that there is still no harm in shorter embargoes. Any embargo is a concession to the publisher. It’s a concession that potentially slows down the communication and sharing of research.

There is also an important issue where publishers fail to respond to enquiries about gold OA, such that those crucial first few weeks of interest may be lost to them. Now I think that’s another incompetence issue rather than something more sinister.

We also risk failing to engage authors in the deposit process. We are making APC payments easier – we just ask authors to tell us where they are publishing and we pay for them. I am concerned about separating the author from the process in general, but particularly from APCs. The author doesn’t know or care about the costs involved. If they do engage with that, if they do look, then they need to make a choice about whether the price charged is worth it for the relevance, impact or importance of that journal. Separating the author from the process puts us in danger of creating an APC crisis in the same way that we had a serials crisis.

Traditionally universities have shown a shocking indifference to their scholarly output – the research papers, publications, etc. It was very hard to understand what was published, what was created. Very little responsibility fell on scholars to capture their own published outputs – there was an assumption that the library would purchase them, but that assumption was not always correct. Some of that is being addressed by the REF, but also by universities becoming much more aware of their intellectual output. Capturing and reflecting on that output is no longer seen as weird or alien, and that is good for our work, for our arguments about the value of open access, of repositories, etc. But universities do also care about cost-benefit analysis for this work. And for data in particular there can be really high costs associated with making data available for reuse. We need better stories to explain how the benefits outweigh the costs.

We have had issues over the last 15 years around the vision of open access that we originally had… In the UK we could talk about…

Danny Kingsley, Cambridge University, talked at LIBER about the idea that, in a sense, the compliance-engine aspects of repositories can devalue their potential, of what they could be for open access in the academic community. But if open access is “just” a side effect of repositories, it is an amazing side effect! Making work available under open access is a real achievement; even if the route is rather tortuous, and has involved pain in negotiating the confusion and issues with publishers, we have made a real difference. And there is nothing wrong with being flexible over open access, and with jumping onto bandwagons. Compliance is a useful bandwagon right now, so we should use it! We should stop worrying about whether people do the right thing for the wrong reason, and just be glad that the right things are taking place.

But over the next few days we should be thinking about how we can use what is in our repositories, how easy rights statements make licensing, and how we can look across a topic easily across multiple repositories. And, in terms of preservation, how concerned are we, and should we be, about that? Or are repositories more about dissemination? If we are going to get an explosion of material over the next few years, do we have the capacity to handle and interpret that material?

So we have had a messy, tortuous route here, but open access is really happening, and we have several days to develop our vision for what we should be doing with it. David Willetts has talked a lot about open access – I think he’s rather overplayed his hand based on what is happening in the US – but there is so much more that we could do with open access.


Q1, Grant, University of Leicester) How many repositories should there be? There tends to be 1 per university. There are some joint ones between institutions…

A1) If you started with a blank slate today would you set up 100–120 institutional repositories? I tend to think no, you wouldn’t… You would want something more centralised. There is a variety of institutions with very varied expertise: Edinburgh is very skilled and engaged and would want its own repository, but there are many institutions that are really concerned about what they can set up to meet HEFCE requirements, and there is an opportunity there for someone to bring them together so that they can all meet those requirements in a centralised way. I think there should be more centralisation…

Comment – Paul Walk) There is a shadow issue there about not the number of repositories, but who controls it. A collaborative set up where control is retained seems the important thing…

A1) I think White Rose seems like a great example – a shared repository but it looks like each institution has their own space in terms of how that is presented on the web.

One of the big areas in fashion is the idea of library as publisher, of each institution publishing. I think the lesson is that the infrastructure for university presses should be shared, while content is where each institution should focus. The idea of all institutions using their own publishing platforms, with different set-ups, appearing to be but not quite being interoperable, doesn’t seem like the way to go.

Q2 – Kevin Ashley, DCC) I remember Andrew Prestwick talking about institutional repositories in Wales where he commented that for smaller institutions the issue of control, of their own system, was really important to them.

A2) We live in a strange world where authors are hugely keen to give away all of their Intellectual Property to commercial publishers but can be odd about making it open access.

Comment – Rachel Bruce, Jisc) I remember the conversations Kevin talked about, and we set up a shared repository, the Depot, but that was not a success. The institutional repository structure seemed more effective at that time.

A2) I think that may have been an issue of it being too early. The Depot has been more a repository for lost souls, for authors without institutions… But there wasn’t really an attempt to engage institutions.

Comment – RB again) It was a repository of last resort… And we would engage differently around that if we were doing that now.

Q3 – Les Carr, Southampton) In terms of where things should be put, should there be departmental repositories? As someone with a national view, looking over a national research ecology, how would you have reshaped the research landscape 15 years ago? We seem to have gotten stuck in commerciality, compliance, quality of journals, quality of research, and not questioning the system. How would you have shaken up the system in 2000 to change that?

A3) It is really hard. Many of the decisions of the last 15 years were made with good intent. The whole of scholarly communication is about the reward structure. It makes people write papers that are not really intended for communicating results, but for getting rewards. You see Peter Murray-Rust talking about this a lot… You have a huge range of data and outputs that you have to reduce to 5 pages of write-up, with results that are not easy to reuse. We do that because of pay and reward… Here we have the bizarre situation that HEFCE says that Impact Factor and where you publish isn’t the issue in the REF, but everything academics and researchers believe tells them that that stuff matters. So much of what we are doing is hampered by the idea that journals are how we decide funding, how people develop their careers. But if Dr Mike Walker can’t say what the alternative would be, I don’t think I can.

Repositories for Open Access, Research Data Management and beyond – Rory McNicholl, Timothy Miles-Board, University of London Computer Centre

I am going to start with a short potted history of the University of London Computer Centre… In 1966 the Flowers Report assessed the probable computer needs during the next five years of users in universities and civil research. The great and the good of the University of London met to discuss this and commissioned a glamorous building for computing in 1968. By the 1980s we had a new machine with a fantastic amount of computing power which could be used by researchers around the region.

After the 1980s there was deemed to be less need for a single computer centre in quite the same way, but there was still a real need for computing for the HE and public sectors. So, what are we doing at Repository Fringe? Well, back in 1997 Kevin Ashley and colleagues recognised the need to preserve at-risk digital objects, and work was undertaken to address that through a project, NDAD, that ran to 2009. Following that we have been working on new things from 2006, including a Digital Preservation Training Programme, and what we are now calling the Research Technologies Service.

The Research Technologies Service provides various things including open access repositories; research data repositories; eJournals – in which there has been growing interest; archival storage; and bespoke asset presentation – a way to have a front end customised for specific organisations.

To achieve this we are using EPrints, alongside OJS for our eJournals, and Arkivum (A-Stor in the ULCC data centre), as well as Python, Django and Elasticsearch. And we do that for various institutions, which means we need to be interoperable with third-party systems. So we are interoperable with institutional HR systems, harvesters, etc… with Crossref, FundRef, CERIF, IRUS-UK, Altmetric, the BL, OpenAIRE, ORCID, DataCite, SHERPA. But there are so many more – too many to detail in full.

How do we do what we do? We are flexible: a small team which is very well supported with infrastructure expertise and a service desk. We are community driven – part of the HE community and responsive to that community. We are also fluid, platform agnostic, and ready to listen to our customers and embrace change.

That brings us on to the community platform, how we realise those things. Those funder (HEFCE and SFC) mandates tend to drive what we do… That’s what keeps the community up at night, thinking about what they can achieve and how. We engage in a way that takes best advantage of the shared code and initiatives around open source software. So developers write code and share it on GitHub, and the people we host can then access that shared expertise and development via EPrints and the EPrints Bazaar – bypassing commercial coders and quickly ensuring they are able to address RDM, open access, etc. issues.

And we have a community platform for open access – oa_compliance, OpenAccess, rioxx2 – we’ve made that something we can put into a repository so it describes what needs to be described; datesdatesdates – a way to understand which dates count; reviewed_queue – to manage the process and workflow to track the publication process; ref2014; and… more? The Open Access Button is of interest… EPrints has had a “request copy” button for years and years… Maybe we need a “request open access copy” added?

Over the last couple of years there has been a huge push towards using repositories for research data and RDM. We have been working with the University of Essex and the University of Southampton to look at the ReCollect profile – keeping research data and describing it effectively – and we have put that into the community platform. Then we undertook work with the University of East London and the London School of Hygiene and Tropical Medicine; we have worked on the DataciteDOI plugin, developing that on a bit with the University of Southampton and DataCite; and Arkivum has been a big part of the OA mandate work… Seeing that, it became clear that there was a need for infrastructure, and work with Arkivum has helped us top up access to the archive network. Another thing came out of the EdShare world, UEL, and LSHTM, which was about describing project data sets and collections. And more? We are working with Jisc, the University for the Creative Arts, and CREST on the next phase of the Research Data Spring project to improve the way that data moves from the researcher to the repository, to make that quicker and more efficient, and to improve the presentation of that data.

And beyond… Lots of other things have happened… RepoLink – for linking research papers together; UEL have gone for this for linking research objects; pdf_publicationslist; soundcloud; iiif-manifest – coming out of work on presenting digitised objects, which is feeding back into how we do presentation; bootstrap – our colleague at UEL did some fantastic work around Bootstrap to make repositories work well on mobile, and that’s available to use and explore now; crosswalks_sgul – this has been around Symplectic tools; we work with St George’s a lot on this and they have been happy to publish it back into the community.

So, all of the work we’ve done can be found on the community platform, the EPrints Bazaar, but the source code around that isn’t always obvious, so you can also find this work on GitHub.

So, what’s next? Well, the Public Knowledge Project and the idea of university presses seem timely – there are opportunities for more community platforms. There are exciting things coming from our siblings at the School of Advanced Study and Senate House Library, who have made an interesting appointment in the area of digital, so exciting things should come out of that… We are also looking at Preservation as a Service… working with Arkivum and Artefactual… or maybe something more simple. And we are creeping backwards through the research object lifecycle… And of course more collaborations – so we have ORCID in place, but can we help institutions get more impact from it, for instance?

Lastly… We have a job ad out – we’re hiring – so come and join the team! Contact me for details. Thank you!


Q1 – Rachel Bruce, Jisc) Who is using OJS?

A1) We have several universities using OJS. We’ve worked on a plugin for integrating it with repositories, and on a system for another university to encourage university staff and students to set up their own journals. We have three universities using OJS in those ways so far, but there is lots of interest in this area at the moment.

Q2 – Dominic Tate, UoE) Is there one area of service you are particularly looking to support?

A2) I think CREST has been an interesting example… providing archiving for organisations that can’t justify doing that separately. We do tend to focus on the technology…

Poster Session – why should we look at your poster? – Martin Donnelly orchestrating the minute madness!

Sebastian Palucha: I am talking about Hydra, I can tell you about Hydra… if you want to know how we moved from EPrints to Hydra. We can also talk about how we integrate DataCite.

Gareth Knight from London School of Hygiene: We developed a plugin to add geospatial data to items in ePrints. Come and ask us about it!

Alan Hyndman from FigShare: My poster is on how FigShare can interoperate with institutional repositories, and also some of the other interoperabilities we are already doing…

Robin Burgess, from GSA: Apologies, no guitar this year! This is on exploring research data management in the digital arts and creative fields. And this is my last chance to present here – I’m moving on to Sydney as their Repository and Digitisation Manager, so I wanted to go out with a bang!

Adam Carter, from the EPCC: I’m here for the Pericles project, an EU FP7 project on digital preservation. We are not building a digital repository; we are about the various different aspects of managing change around a digital repository. We argue that getting data in is easy – how do you deal with technological change in terms of accessing and using data, and with change in who uses your data and how? So it ties into repositories in many ways. The poster includes some work we are doing on modelling the preservation ecosystem, and also on sheer curation – preservation that starts when the data object is created, not when you deposit it.

Rory Macneil, from RSpace: mine is on integrating electronic lab notebooks with RDM, linking in DataStore at the University of Edinburgh – we’ll be doing a demo at lunchtime on this too! RSpace supports export of documents, folders and associated metadata in XML files, and that work leads to an integrated RDM workflow for researchers and the institution, so that the data is collected, structured, archived and shared. That’s made possible by working with researchers, RDM professionals and IT managers.

Pablo de Castro, from LIBER: my poster is on the EU FP7 post-grant open access project, an experiment which OpenAIRE has been managing in order to implement fair gold OA. One specific constraint of this project is that publication in hybrid journals will not be funded. We are working on what we call the APC alternative funding project: a €4M pilot that began in May, with significant help from the University of Glasgow. Given those constraints we are keen to engage institutions to make this a success – this idea of an alternative way to implement gold OA. We have some idea of the main places requests are coming from, etc., but it should grow quite a lot in the forthcoming months.

Martin: In the spirit of the Fringe please do make use of the blank poster boards! Add your own literature, arrows, etc!

Hardy Schwamm: DMA Online is an online dashboard, funded under Jisc Research Data Spring, which provides a view of how many data sets are funded and created in your institution, how many have an RDM plan, and how much data they plan to use. It takes data from various places – hopefully DMPonline, and any information that is held in spreadsheets. You can see our poster and our demo. Do come and tell us what you would like to see from the dashboard…

Dominic Tate: I’ve been asked by my colleague Pauline Ward to mention that there are some noticeboards up in the forum for comments on tomorrow’s workshop – do come and ask me if you have any questions about that.

Lunch, which includes Demo: RSpace – Rory Macneil, Research Space

And we are back…

Open Access Workshop – Valerie McCutcheon, University of Glasgow

We are using the EPrints repository at Glasgow in this session, but this is just one example of open access. But you may have your own set up or perspective. And we’ll talk for about 40 minutes, then you can choose what you want to talk about in more details.

So, we are going to have a live demo of the journey an open access article goes through when it goes into our repository, Enlighten.

So, we would select the type of item, upload the file, and then add details about the paper – the title, abstract, etc. [to fill these in Valerie is taking audience suggestions – not all of the journal titles being suggested sound quite authentic!]. Our next screen adds the source of funding for the publication. That’s so far, so traditional… But wouldn’t it be nice to get some of that data from the journals?

So, without further ado, I’m handing over to Steve Byford from the Jisc to talk about the Jisc Publications Router

Wouldn’t it be lovely if publications data could automatically go into your institutional repository in a timely and REF-compliant sort of way? Now, to manage expectations a bit, this won’t fix all the possible problems, but the Router will help at two key stages of the publications process. The Router gathers details of research articles from publishers etc. Then it directs articles to appropriate institutions, alerting them to the outputs and helping capture of the content into a repository or CRIS.
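
For the curious, the routing idea is roughly this shape in code – a purely hypothetical sketch, not the Router’s actual implementation (the institution names, endpoints and data structure here are all illustrative):

```python
# Hypothetical sketch of the Router's core idea: match an incoming article's
# author affiliations against known institutions, and return the repository
# endpoints that should be alerted. Real matching would be far more robust.
REPOSITORIES = {
    "university of glasgow": "https://eprints.gla.ac.uk",
    "university of edinburgh": "https://www.era.lib.ed.ac.uk",
}

def route_article(article):
    """Return repository endpoints whose institution name appears among
    the article's author affiliations (case-insensitive substring match)."""
    targets = set()
    for affiliation in article.get("affiliations", []):
        for institution, endpoint in REPOSITORIES.items():
            if institution in affiliation.lower():
                targets.add(endpoint)
    return sorted(targets)

article = {"title": "An example paper",
           "affiliations": ["School of Physics, University of Edinburgh"]}
print(route_article(article))  # the Edinburgh endpoint only
```

The interesting engineering is of course in the matching (messy affiliation strings, name variants, identifiers like ISNI/Ringgold), which this sketch waves away.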

This has been funded as a project based at EDINA. That project reached its conclusion on Friday. The aim of the project was to demonstrate a viable prototype, which it did: it processed real publications information and that worked well. Now that the project has finished, a successor system is being developed, with existing participants migrating in August to September 2015. Then we will be recruiting new participants, aiming for rapid expansion of the content captured, with the intention of moving to a full service by August 2016. So, if you want to hear more about that, then choose that for your breakout session following this one.

Back to Valerie

Now, once I’ve uploaded my article, and the information, I might want to look at the access status of that article. And on that note I’d like to introduce Bill Hubbard to give you an update on SHERPA services.

I’m here with my colleague who actually manages the SHERPA services… If you select us for a breakout session we’ll be doing a double act! So, we have five minutes to tell you what’s new… Hopefully you have already heard of us, and use the site; if you do then I hope you find us useful. We support open access processes around publishers’ rights, open access statuses, etc. We are about making your job easier, and part of what I want to find out from you today is how we can do that – what we can do to help make your life easier.

When we started RoMEO over 10 years ago the world was a lot simpler, but the policies and rights picture has only become more complex. JULIET is a registry of funders’ policies on open access, and that is a more straightforward process. OpenDOAR is the world’s authoritative and quality-assured directory of open access repositories. We also run FACT, and also REF – advice to UK authors on compliance with HEFCE’s OA policy – which will launch soon!

In RoMEO we have rights data on over 19,000 journals, in JULIET we have over 155 funders, and in OpenDOAR we have around 2,937 repository listings.
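
As an aside on what RoMEO looks like to a machine: its XML API returns, per publisher, a “colour” and archiving permissions, which repository systems parse along these lines. The sample response below is a trimmed, hypothetical one – the field names reflect the legacy v2.9 API as I recall it and should be checked against the live SHERPA documentation:

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical RoMEO-style XML response; real responses
# carry many more fields (conditions, paid-access options, etc.).
SAMPLE = """<romeoapi>
  <publishers>
    <publisher>
      <name>Example Press</name>
      <romeocolour>green</romeocolour>
      <preprints><prearchiving>can</prearchiving></preprints>
      <postprints><postarchiving>can</postarchiving></postprints>
    </publisher>
  </publishers>
</romeoapi>"""

def archiving_policy(xml_text):
    """Pull the RoMEO colour and archiving permissions out of a response."""
    root = ET.fromstring(xml_text)
    pub = root.find(".//publisher")
    return {
        "publisher": pub.findtext("name"),
        "colour": pub.findtext("romeocolour"),
        "preprint": pub.findtext("preprints/prearchiving"),
        "postprint": pub.findtext("postprints/postarchiving"),
    }

print(archiving_policy(SAMPLE))
```

This is the kind of lookup that repository deposit screens (like the Enlighten one shown later) wire in behind the scenes.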

Futures… I’m asking you not to tweet pictures – this is work in progress… We have a new interface and improved functionality coming in OpenDOAR and FACT; for RoMEO we are working on improved user feedback, improved international collaboration, and maybe even improved policies – we are working with publishers on the quality of policy expression. And REF is a new service of course. What else? Well, we are moving towards an improved range of shared services with Jisc… Come to our session to find out more…

And now back to Valerie

So we are going to look at some of these services just now… I will look up our article on SHERPA/RoMEO… We have integrated more open access information in our repository – we have a whole screen for this now. On this screen we see the estimated cost – let’s assume we’ve gone for the gold option – and I can later update that with actual costs to reflect any changes in currency/price etc. We can select the status of the paper – including Green, Gold, but also “No OA Option”, Pending, etc. And we can add the article reference, date of compliant deposit, funder acknowledgement, etc. Then we have an RCUK screen for completion… And finally a deposit screen.

And now over to Balviar Notay to talk about how Jisc are working on RCUK Compliance.

I will be talking about the RIOXX metadata application profile and guidelines for research papers, developed with RCUK and HEFCE by Paul Walk (EDINA) and Sheridan Brown (Key Perspectives). You probably collect this data already, but this is about standardising it. RIOXX doesn’t cover all REF requirements but will cover many of the key areas. It has been a long time getting to this place, but we are now at a place where, if we do this, we can really see consistent tracking of research papers across systems in a really coherent way.
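
To make the “standardising” point concrete: a RIOXX record is just XML drawing on a handful of namespaces. The sketch below assembles a tiny illustrative subset – the element names and namespace URIs follow the RIOXX v2 profile as I understand it, and should be checked against the official guidelines before use:

```python
import xml.etree.ElementTree as ET

# Namespace URIs as per my reading of the RIOXX v2 profile (check the
# official guidelines; the rioxxterms URI in particular is an assumption).
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rioxxterms": "http://www.rioxx.net/schema/v2.0/rioxxterms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def make_rioxx_record(title, identifier, date_accepted, funder, project_id):
    """Assemble a minimal RIOXX-style record (illustrative subset only)."""
    root = ET.Element("rioxx")
    ET.SubElement(root, f"{{{NS['dc']}}}title").text = title
    ET.SubElement(root, f"{{{NS['dc']}}}identifier").text = identifier
    ET.SubElement(root, f"{{{NS['dcterms']}}}dateAccepted").text = date_accepted
    project = ET.SubElement(root, f"{{{NS['rioxxterms']}}}project",
                            funder_name=funder)
    project.text = project_id
    return ET.tostring(root, encoding="unicode")

record = make_rioxx_record(
    "Sample article", "https://doi.org/10.9999/example",
    "2015-08-03", "EPSRC", "EP/X012345/1")
print(record)
```

Note how the acceptance date gets its own element – exactly the field the HEFCE policy discussion earlier turns on.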

RCUK will be releasing some communication in the coming weeks, strongly recommending that all institutional repositories at research organisations in receipt of RCUK funding use RIOXX. We have developed plugins and patches to support implementation, including the EPrints RIOXX plugin; DSpace and CRIS user groups have also started to engage with RIOXX, but we need more engagement here.

Now, onto the REF plugin. This has been developed with HEFCE and will build on the original plugin developed for the 2014 REF. Institutions wishing to use the REF plugin must also install the RIOXX plugin, and we are looking for expressions of interest to trial the EPrints plugin. We are also looking at developing a DSpace plugin. The development team are Tim Miles-Board and Sheridan Brown (Key Perspectives).

Back to Valerie

The next screen after RCUK, moves us to a page where you can capture OpenAIRE compliance…

Balviar again: CORE is a system to aggregate open access content, providing access to that content through a set of services. There are about 74 million items from around 666 repositories, 10k journals, and 60 countries. In terms of services there is a search service, an API and a data dump, and you can access and data-mine that content. They are also developing a dashboard for institutions to track data in CORE – how the content has been harvested, etc. – and looking at how we support funder compliance through that dashboard as well. We are looking at that work through a project called Jisc Monitor…
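
For anyone wanting to try the CORE API mentioned above, a search call is essentially a URL of the following shape. This sketch only builds the URL (no request is made); the endpoint path and parameter names reflect the v2 API as I recall it and may have changed, and you would substitute your own registered key:

```python
from urllib.parse import quote

# Assumed v2 base URL and endpoint shape - verify against current CORE docs.
CORE_API = "https://core.ac.uk/api-v2"

def core_search_url(query, page=1, page_size=10, api_key="YOUR_KEY"):
    """Build a CORE article-search URL (no network call is made here)."""
    return (f"{CORE_API}/articles/search/{quote(query)}"
            f"?page={page}&pageSize={page_size}&apiKey={api_key}")

url = core_search_url("open access repositories")
print(url)
```

Fetching that URL with any HTTP client returns JSON you can page through – which is what makes the text-mining and dashboard work described above possible.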

Back to Valerie

We now have a choice of breakout sessions… We have a sheet…

Comment: There has been a lot of discussion on UK CORE about peer to peer repositories, so that may be useful session to add to the list.

Paul Walk: This has been about repositories alerting each other about non-corresponding authors, so discussion around peer-to-peer repositories. There is a Google Doc on this that can be accessed; I’ll add that under a working title of COAX.

We are now hearing a quick recap of the breakout groups looking at some themes in more detail:

  • SHERPA Services, Azhar Hussain, Jisc
  • Open Access Metadata, Valerie McCutcheon, William Nixon, University of Glasgow
  • Publications Router, Steve Byford, Jisc
  • Profiles for Reporting to Funders (e.g. RCUK, REF, EU), Balviar Notay, Jisc
  • Aggregation Services, Balviar Notay, Jisc, Lucas Anastasiou, The Open University
  • What advice do we want about open access – Helen Blanchett, Jisc Customer Services
  • COAX / Co-operative Open Access eXchange (Peer to Peer Repository information sharing) – Paul Walk, EDINA

We’ll try and take the blog to some good sessions… 

In fact we have loitered in the main room, where Valerie and her colleague William Nixon are talking about the drivers for the various add-ons and customisations they have made to EPrints. They were doing a lot of this work via spreadsheets, and this system represents a major saving of time and improves accuracy.

Valerie and William, in response to questions, are also going through some of the features of their item records in more detail – for instance there is the potential to add multiple transactions for one article, as sometimes applies. Valerie: we would love a better solution for capturing some of the finance information on open access and welcome your comments and feedback – or if you have a solution to the issue…

We are now moving on to roam the other Repository Fringe breakouts taking pictures and tweeting. Best place to catch summaries/highlights from the breakouts is over on the hashtag #rfringe15. And if you are in or leading a session that you’d like to write up, just leave a comment here and we’d welcome a follow up blog post from you!

Tea and coffee, which includes Demo: – Jonathan Rans, Digital Curation Centre

Demo: DMAOnline – Hardy Schwamm, Lancaster University

The blog will be in the EPrints session in the main room at Repository Fringe 2015, but there are three sessions taking place for the next half hour – keep an eye on Twitter for more on all of these:

Parallel sessions:

  • EPrints update, Les Carr, University of Southampton
  • Dspace update, Sarah Molloy, QMUL
  • PURE update, Dominic Tate, University of Edinburgh, Appleton Tower, Lecture Theatre 3

EPrints update, Les Carr, University of Southampton

Adam Fields, the Community Manager for EPrints is presenting this session via live video link from South Korea! It is around midnight there so he’s also presenting from the future!

We are starting (after a few snapshots to set the scene) with a chart of the EPrints services team, to give a sense of how many are in the team.

EPrints Services exists to serve the community with expertise and support – initially for the open access agenda, but it is becoming a lot more than that with what is happening with RDM in the sector. We are a not-for-profit service, and we serve the OA community through commercial services (hosting, support, etc.); we lead on the EPrints roadmap and releases, and provide funding for the development of the EPrints software.

I’m going to talk about the software side of things and the general trend of where we are going. I want to start with the past. The main feature released in EPrints 3.3.14 was a change to the EPrints Bazaar. It had been a bucket of packages, with little by way of tagging and properties. So we have added an accolades section, to tag stuff as having particular properties etc. You can filter based on these accolades… These are alphabetically listed, with “EPrints Services Recommended” appearing at the top to indicate those packages that we have tested and recommend – anyone can contribute to the Bazaar.

EPrints 4 / 3.4.0 has the key philosophy that the “base” handles EPrints’ storing and handling of generic data and objects, with “layers” to handle specific metadata schemas, import/export, rendering, search, etc. for specific domains. So the concept is that the database and everything else are two separate aspects: there would be a layer for publications, but another layer for a data repository. The reason for doing this is to develop more sustainably against the increasingly complex requirements of the sector. So the 3.4 releases will be collections of metadata schemas, renderers, etc. to support this. Here are two diagrams illustrating the difference between EPrints for Publications, for Open Data, and for Dataset Showcases… These releases are sort of all the same in the abstract… For a Dataset Showcase it’s about showcasing or visualising a dataset, so you need a bespoke metadata schema for your data sets. But in the abstract the set-up is the same – the metadata schema and the tools you need to describe your data set. Similarly there is an EPrints for Social Media Data, for importing tweets, which has a similar abstract shape but with specific functionality to reflect the large scale of the dataset.
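
The base-and-layers idea can be caricatured in a few lines of code. To be clear, this is not EPrints internals – just an illustrative sketch of the architectural concept of a generic object store with domain-specific schema layers plugged in on top:

```python
# A generic "base" that stores records with arbitrary metadata, plus
# pluggable "layers" that each define the metadata schema for one domain.
class Base:
    def __init__(self):
        self.objects = []

    def store(self, record):
        self.objects.append(record)
        return record

class Layer:
    fields = ()  # each domain layer declares its own metadata schema

    def deposit(self, base, **metadata):
        # Keep only the fields this domain's schema knows about.
        record = {k: v for k, v in metadata.items() if k in self.fields}
        return base.store(record)

class PublicationsLayer(Layer):
    fields = ("title", "authors", "journal", "date_accepted")

class DatasetLayer(Layer):
    fields = ("title", "creator", "format", "size_bytes")

base = Base()
pubs = PublicationsLayer()
record = pubs.deposit(base, title="A paper", journal="J. Ex.", page_count=5)
print(record)  # page_count is dropped: not in the publications schema
```

The same base could host a DatasetLayer deposit alongside the publications one, which is the trim-the-fields-per-domain point made in the Q&A below.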

Q1) Is this one instance or multiple instance?

A1 – Les) You can have multiple repositories running on one EPrints installation. So the idea is of having a range of repositories, but that could be on one installation. And you can connect up repositories and mash things up of course. But the idea is to trim down how many metadata fields and how much data you collect – so that you only need to gather the relevant information for the relevant type of item.

A1 – Adam) This approach is about having a repository with key features… We categorise particular components, and these are examples of what that might look like. But anyone can customise EPrints for the combination of features and functionality that they need.

Adam: Now I want to talk a bit about my role. I am here to engage with the community, understanding what it is you need and want from EPrints. I hang out a lot in the various community spaces, but I have also been working on supported developments – where individual organisations require something specific but need help to build it, for instance a thesis deposit tool. I am creating training videos for those supporting and administrating EPrints repositories. We also have community members discussing improvements to the wiki, and I’m expecting lots of progress there over the next 6 to 8 months. And I’m always encouraging everyone to share or write documentation – everyone in this room will have knowledge and expertise to share with other EPrints community members. Are there other things we can do to help? [no, not at the moment based on audience response].

One of the side effects of creating videos is that I get feedback and statistics on who is viewing them. So, for instance, I put up a video on installing an EPrints repository. It has been viewed several times a week since it went live. But the intriguing thing is the countries it has been viewed from – the top two have been Indonesia and India. It has been viewed around 40 times in the UK, but 120 times in Indonesia, and 20 times in Iraq and Guatemala. That suggests a truly global community, but it also means we need to think about how we could bring this community together.

Finally, a quick plug for the EPrints UK User Group meeting, which is on 11th September in Southampton. If you would like to present please do post to the EPrints UK User Group Google Group, or contact Adam directly.

Q2) Is there more information on when these versions will be released?

A2 – Les) The release of version 3.4 is coming soon and that will take us to the modular stage, though at the moment we are waiting for a developer to get us there. In terms of getting to EPrints 4.0, much of what we need will have been delivered in 3.4. The whole point of 3.4 is that it contains the same underlying system but moves us to that modular, layered idea.

And on that note we leave Adam to get some much needed rest in Korea, as we turn to our final session of the day… 

Panel Session: Building data networks: exploring trust and interoperability between authors, repositories and journals with:

  • Varsha Khodiyar (VK), Scientific Data (Chair);
  • Neil Chue Hong (NCH), Journal of Open Research Software;
  • Rachael Kotarski (RK), DataCite;
  • Reza Salek (RS), European Bioinformatics Institute;
  • Peter McQuilton (PM), Biosharing.

Varsha is introducing this session for us: I work for Nature Publishing Group, one of the “evil publishers” and I work as a data curator at Scientific Data, part of the NPG group. An example of the sort of repository we work on is PhysioNet, a very specialist space in which data is shared.

We have a number of requirements for data repositories, and our criteria include that they must (1) be recognised within their scientific community (2) provide long term preservation of datasets (3) implement relevant reporting standards (4) allow confidential review of submitted datasets (5) provide stable identifiers for submitted datasets and (6) allow public access to data without unnecessary restrictions. And we have a questionnaire online to help assess data repositories against these.

Neil Chue Hong – I am Director of the Software Sustainability Institute, but for the purpose of this presentation I am also Editor in Chief of the Journal of Open Research Software. So what does a metapaper in this journal look like? Well, it describes the software, the licence, the potential for reuse, etc. And so a paper as a whole tends to include an introduction, how the software came to be, screenshots, implementation, quality control, metadata, reuse, references. In some ways the paper is a proxy for the software itself…

For the panel: we are trying to do the same things with software as with research data in some ways, but we have concerns about preserving code – Google Code is shutting down; how do we preserve that?

Rachael Kotarski: We are looking at assigning DOIs to data, theses and software, among other things, so that DOIs remain stable even as data develops and changes over time. We are working with 52 organisations across the UK. I also have a role at the British Library around providing collections as data – enabling researchers to use large scale collections of data. And the Alan Turing Institute is to be physically hosted at the British Library – we aren’t a partner in that project but we are hosting it.

Reza Salek: I am at the European Bioinformatics Institute, the largest freely available collection of life sciences data, available for reuse on completely open access terms. The repository at EMBL houses a couple of experiments, and we were the first to provide a repository for sharing data in this way. Historically this community was not as happy to share their data. We learned quite a lot – and hope to learn a bit more.

Peter McQuilton: I work at Biosharing, and we are a web-based, curated and searchable portal where biological standards and databases are registered, linked and made discoverable. We have a database registry, a standards registry, and a policies registry. You can also make your own collection from a subset of these materials.

Our mission is to help people make the right choice – for researchers, developers and curators who lack support and guidance on which format or checklist standards to use. We are a small team with collaborators that include NPG, EMBOPress, BioMedCentral, Jisc and others.

Varsha: I have some questions for our panel, but do just jump in…

Q1 – Varsha) How did your community embed your repository?

A1 – Rachael) For us the persistent identifiers are key to enabling reuse over time. Specifically for DataCite we have very few requirements: we have five fields that enable you to cite the DOI. There are more fields one would need in order to actually use the data, but because DataCite is cross-subject and cross-format we can’t specify exactly what those should be. The other thing we require is a landing page – a target for the DOI, so that the object can be found and used. You could make a DOI link to, say, your Excel spreadsheet, but it is preferable to have a landing page with more information on how that object can be used. We also expect longevity, but we leave it to each community to decide what longevity means for them.
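As a rough illustration of how minimal that requirement is, here is a sketch of a check against the five mandatory DataCite citation properties – identifier, creator, title, publisher and publication year. The helper function and record are hypothetical (this is not DataCite’s client library), but the five fields are the schema’s mandatory set.

```python
# Hypothetical validation helper (not DataCite's actual API): check that a
# record carries the five mandatory DataCite citation fields before minting.

MANDATORY_FIELDS = ["identifier", "creator", "title", "publisher", "publicationYear"]

def missing_mandatory(record):
    """Return the mandatory DataCite fields absent from a metadata record."""
    return [f for f in MANDATORY_FIELDS if not record.get(f)]

record = {
    "identifier": "10.5072/example-dataset",  # 10.5072 is the DOI test prefix
    "creator": "Example Researcher",
    "title": "An Example Dataset",
    "publisher": "Example University",
    # publicationYear deliberately omitted, to show the check failing
}

print(missing_mandatory(record))  # -> ['publicationYear']
```

In practice DataCite validates deposits against its metadata schema, but the principle is as Rachael describes: only these citation-level fields are required, and the richer descriptive metadata is left to each community.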

Comment – Paul Walk) I absolutely agree about the necessity for that, but from a machine to machine process that isn’t as much of a priority…

A1 – Rachael) We recognise the importance of M2M interfaces, but we argue that they shouldn’t be the default. From the landing page you might then have the information on how to access the data in an M2M way.

A1 – Varsha) Actually for privacy and sensitive data…

Q2) For some of our researchers longevity might mean 50 or 100 or 200 years and longevity can really be about preservation in the long term…

A2 – Peter M) And that is about format of course, having the technology to read that data is its own challenge.

A2 – Paul Walk) I was at an event at the British Library talking about longevity in terms of generations, and that seemed like a useful approach.

Comment – Rachel Bruce) That came from the National Science Foundation work.

Q2) It is also about funding around preservation.

A2 – Reza S) The scale of data and data sets is also changing really quickly. But even recent data sets are effectively archived. But it is so hard to know what the technology will be, what science will evolve into…Is there a solution or approach that works here?

Comment) An astronomical image from 10 years back versus now allows you to see what has changed; an archaeological site you probably dig up only once… You can’t re-create that data… But then we can’t keep everything!

A2 – Neil CH) I think sometimes the data can be recaptured; sometimes we only have one shot. But in many cases it is worth asking why we preserve the data. Is it for reuse and sharing? Or is it for checking and comparison? Those two approaches have very different timelines and requirements associated with them. It is not always the data itself that needs preserving.

Comment) Surely the whole point is that we cannot predict what others might want to do with our data…?

A2 – Varsha) Sure, the historical ships logs being used in climate change are a great example.

A2 – Neil CH) Interestingly those ships’ logs can still be used because our means of written expression haven’t changed that much. But in software we are used to moving on… and that is much harder to go back to. If we forgot how to read a PDF file, that would be a disaster… but we have a lot of examples like this. We have to be careful not to support niche standards if we are talking about long term preservation.

Comment) Do we know of data that has been well preserved but the means to read them has been lost?

Comment) I have word perfect files on my computer!

Varsha) A fellow researcher had a similar issue with floppy discs: nowhere in his university was there any way to read them…

Comment – Kevin Ashley) The issue is also about what is worth doing… You can read a floppy disc but you have to want it enough to be worth a high level of expense.

Varsha) Do you have researchers depositing data? What are the issues there about deposit and reuse?

Les) A lot of my research is about social media and existing data, and there the existence and readability isn’t the issue. But in making some sort of collection of it, away from the wild west of the web, and curating it as a selection within the University, we create all sorts of ethical and legal problems. That is the issue when we are gathering data from lots of interacting people. The deposit mechanism isn’t the issue; it’s convincing people that deposit is the right thing to do, and the processes around thinking that through – the data for access, for anonymity…

Neil CH) In my experience as a part-time researcher we have been creating datasets. Because I give talks on licensing and data policy for RCUK I feel I should be able to do all the right things with my own data… So I am asking this room: why is it so difficult to do the right thing here? I put my data in a data repository, I share my colleagues’ names, I have my asset register… But I can’t just give that my DOI so that all of those details get imported. This is where trust breaks down. If I can’t be bothered to add all the authors, and it’s just me, then I’ve broken compliance. I use PURE, and if I have a copy in RoMEO that can be one click, and I love that. But everything else should be easy too.

Comment) Rather than have a go at publishers, lets have a go at museums! I am a palaeontologist. I have a great 3D scan of bones I am researching… I’d love to share that with the world but if I did I would be in trouble as the museum believes all images created there are their property. It is a political issue though. If collections management and commercial arms of the museum can be talked to, you are fine… Unless there is deemed to be a potential commercial application/use of that scan.

Rachael K) There is one library allowing photographs of their material, where it is in an appropriate copyright state, and those are shared on Flickr. But the people taking pictures have to understand what they can take pictures of… Digitisation is expensive, and phone cameras in a reading room aren’t great, but…

Comment) My feeling is that the copyright on a 110 million year old bone should have expired!

Neil CH) We are looking to work on a project at the Natural History Museum where some of the same issues arise – about who owns copyright of derivative products in that way, for educational use. Educational use may be a way to open that up in future, but it is still early days yet.

Comment) In Germany they have the view that if they make the scan of their own materials, they hold data, but others scanning it can do as they wish.

Neil CH) I think in Australia they have also had some quite forward thinking examples there.

Varsha) We have drifted from repositories a little… But in our last few minutes what are the best ways to support our communities around repositories? How can we say that a repository is trustworthy?

Comment) I think for me that issue of trust being part of how easy it is to deposit, is important. The issue I find is that it is also hard to find and discover data…

Peter M) That is changing though… In biology that is improving, as it is known that it is important that data is discoverable.

Comment) Perhaps rather than prescribed repositories or journals, there is a peer review process. When you say that it is peer reviewed, does that include the data?

Varsha) Yes, that includes the data and that it is shared in the right repository. We make sure that we can access the data, download files, etc. before we will publish. We only publish if that is appropriate.

Neil CH) We do similar. We have a list of repositories and documentation that helps ensure that data is accessible, that there are identifiers, and that there is some sort of plan for managing that software. I actually kicked off a debate, inadvertently, that this is an expensive checking process at the wrong end of the cycle… So there is an argument that you should pre-register before you generate data, and have that signed off at the end. An interesting idea: peer review at the outset, not after generation of data.

Reza S) These are good questions. It can take a long time to go through that process. Repositories are usually at the end of the process, and there are issues there… It takes time… But that is culture change. For every year working on data you should expect maybe 3 days of curation work before depositing, in my experience.

Varsha) And on that note, thank you to all of our panel and for all your excellent questions.

Dominic Tate is announcing our drinks reception, remember whilst you are out there to vote on your favourite poster!

And, with that, the blog is done for the day. Remember to pass on comments, corrections, etc. and we will be back tomorrow for Day Two of Repository Fringe! 

MOOCs in Cultural Heritage Education

This afternoon I will be liveblogging the MOOCs in Cultural Heritage Education event, being held at the Scottish National Gallery of Modern Art in Edinburgh.

As this is a liveblog please excuse any typos and do let me know if you spot any errors or if there are links or additional information that should be included. 

Our programme for today is:

Welcome and Intro – Christopher Ganley (ARTIST ROOMS, National Galleries of Scotland and Tate)

Image of Christopher Ganley (National Galleries of Scotland) Christopher is the learning and digital manager for the National Galleries of Scotland and Tate. In case people here don’t know about Artist Rooms, this is a collection that came to Tate and NGS in 2008. Around 1100 items of art from Anthony d’Offay with the National Heritage Memorial Fund, the Art Fund, the British and Scottish Governments. The remit was to be shared across the UK to engage new audiences, particularly young people. The collection has grown to around 1500 items now – Louise Bourgeois is one of the latest additions. The Artist Rooms Research Partnership is a collaboration between the universities of Glasgow, Edinburgh and Newcastle with Tate and NGS led by the University of Edinburgh. And today’s event is funded by the Royal Society of Edinburgh and has been arranged by the University of Edinburgh School of Education as part of the outreach strand of their research.

Year of the MOOC?: what do Massive Open Online Courses have to offer the cultural heritage sector? – Sian Bayne, Jen Ross (University of Edinburgh)

Sian is beginning. Jen and I are going to situate the programme today. Jen and I are part of the School of Education working in Digital Education, and we are ourselves MOOC survivors!

Image of Sian Bayne (University of Edinburgh). We are going to talk about MOOCs in a higher education context, and our research there, and then talk about what that might mean for museums and the cultural heritage context. Jen will talk about the eLearning and Digital Cultures MOOC and expand that out into discussing the cultural heritage context.

So, what do we know about MOOCs? It’s a bit of a primer here:

  • Massive: numbers. Largest we ran at Edinburgh had 100k students enrolled
  • Open: no “entrance” requirements.
  • Online: completely.
  • Course: structured, cohort-based. And we don’t talk about that so much but they have a pedagogy, they have a structure, and that distinguishes them from other open education tools.

In terms of where MOOCs run: there is EdX – they have no cultural heritage partners yet. There is Coursera, which does have cultural heritage partners, including MoMA. And there is FutureLearn, which has cultural heritage partners too (though none are running courses yet).

The upsides of MOOCs are that they have massive reach, a really open field, high profile, massive energy, and new partnerships. On the downside there are high risks and unproven teaching methods – the pedagogy is still developing for this one-teacher, 20k-students kind of model – and there is a bit of a MOOC “backlash” as the offer begins to settle into the mainstream after a lot of hype.

In terms of cultural heritage there isn’t a lot out there, and only on Coursera: the American Museum of Natural History, MoMA, California Institute of the Arts and the new Artist Rooms MOOCs. Some interesting courses, but it’s still early days – not many cultural heritage MOOCs out there.

So in terms of the UK, Jen and I have just completed some research for the HEA on MOOC adoption. One aspect was which disciplines are represented in UK MOOCs. We are seeing a number of humanities and education MOOCs. FutureLearn has the most of these, then Coursera, and then there are cMOOCs in various locations. In terms of the University of Edinburgh, we launched our first MOOCs – 6 of them across 3 colleges – last January, and were the first UK university to do so. This year we have 7 more in development, we have 600k enrolments across all of our MOOCs, and sign-ups for the Warhol MOOC are well past 10k already.

So why did we get involved? Well, we have a strong and growing culture of digital education. It was an obvious step for us, there was a good strategic fit for our university, and we felt it was something we should be doing, engaging in this exciting new pedagogical space. Certainly money wasn’t the motivator here.

MOOCs have been around for a while, and there are still things to learn in terms of who takes them, who finishes them, etc. And we’ve done some research on our courses. The Philosophy MOOC saw over 98k students, but even our smallest MOOC – equine nutrition – saw a comparable number of registrations to our total on-campus student body (of approx 30k). Of the 309k who enrolled, about 29% of initially active learners “completed”, with a range of 7–59% across the six courses. We think that’s pretty good considering that only about a third of those who signed up actually accessed the course – of course it’s easy to sign up for these and hard to find time to do them, so we aren’t worried about that. The range of completion is interesting though. We had 200 countries represented in the MOOC sign-ups. Age-wise the demographic was dominated by 25–39 year olds. And we found most people who took the MOOCs, at least in the first round, already had a postgraduate degree. They were the people interested in taking the MOOCs. And now over to Jen…

Image of Jen Ross (University of Edinburgh). Jen: I want to tell you about the experience that lecturers and tutors had on the eLearning and Digital Cultures MOOC that took place last January. Firstly I want to distinguish the xMOOC and the cMOOC. xMOOCs are the highly structured, quite linear, institutional MOOCs – the Coursera or FutureLearn model – with some peer interaction, but as a side benefit, with content as the main thing. Teacher presence in these sorts of MOOCs tends to be very high profile – the rock star tutor concept. You won’t meet them but you’ll see them on video. A lot. The other sort is the cMOOC, the connectivist MOOC. These were thought up by Canadians before the big MOOC platforms were built, around the theory of connected environments: participants create the course together, very loosely structured, very collaborative, very focused on participant contributions. Not about the rock star professors. This difference has been quite a big press thing, and xMOOCs have had a bashing, with people suggesting they are “elearning from 1998 minus the login button”. But actually what Sian and I have been finding is that in ANY MOOC we see much more than these two different forms. Our own MOOC is really neither an xMOOC nor a cMOOC but had a lot of other content.

So our MOOC, #EDCMOOC, was based upon a module of the MSc in Digital Education that generally has about 12–16 participants, instead trying out these ideas about the self in online environments in a MOOC format, at huge scale. So rather than doing a week-by-week, lecture-heavy format, we did something different. We did a “film festival” – clips for participants to watch and talk about – then some readings on the theory of digital education, and questions to discuss. We asked students to create public-facing blogs which we linked to, and we also used the built-in discussion spaces. And instead of weekly tests etc. we had a single peer-assessed “digital artefact” as the final assignment.

We gathered all the blogs, which participants had registered with us, in one place – so you could see any post tagged with #EDCMOOC. And we had a live hangout (via Google+ / YouTube) every few weeks – where we would pick up on discussions and questions that were coming up, and coming in live. The students themselves (42k of them) created a Facebook group and a G+ group as well as using the hashtag, and these additional groups meant there was so much material being produced, so much discussion and activity, beyond a scale anyone could keep up with. A hugely hectic space for five weeks, with everyone trying as best they could to keep an eye on their corner of the web.

Bonnie Stewart described our MOOC as “subverting its own conditions of existence”. And it was a chance to rethink that xMOOC/cMOOC divide, but also what the teacher is in a MOOC, and what it means pedagogically to be in a MOOC. There are interesting generative questions that have come out of this experience.

So, I want to show you some examples of materials participants made on the MOOC. Students shared these on Padlet walls. We also had an image competition halfway through the MOOC, e.g. “All Lines are Open” by Mullu Lumbreras – the Tokyo underground map re-imagined with many “You are here” markers – emphasising the noisiness of the MOOC! There were many reflective and reflexive posts about students trying to get to grips with the MOOC itself, as well as the content. There was such a variety of artefacts submitted! There were images, videos, all sorts of assignments, including super-critical artefacts such as Chris Jobling’s “In a MOOC no-one hears you leave” – although interestingly we did. There was also a chatbot assignment – allowing you to talk to “an EDCMOOC participant” – which used comments from chats and from the course to give its responses: a really interesting comment on the nature of the MOOC and the online environment. We also had a science fiction story created entirely in Second Life, which must have taken such a lot of time. We have found this on the MSc in Digital Education as well: when you give people the opportunity to create non-textual assignments and contributions they are very creative and take such a lot of care over their multimodal work.

We also had – a nod to Artist Rooms colleagues – a Ruschagram tool as an assignment. And indeed people used their own experience or expertise to bring their own take to the MOOC: artists created art, scientists drew on their own backgrounds. Amy Burvall – an artist who does lots of these online videos – made one that was all about the EDCMOOC.

Image of Jen Ross and Sian Bayne. So I’d like to finish with some ideas and questions for discussion… Elizabeth Merritt from the Center for the Future of Museums asks about MOOCs in terms of impact. Rolin Moe talks about MOOCs as public engagement on a different scale. Erin Branham asks about reach – why wouldn’t you run a MOOC even if only 20k people finish? We have comments on that actually… David Greenfield emphasises the innovation aspect: they are still new, we are still learning, and there is no one single way that MOOCs are being used. There is still a lot of space for innovation and new ideas.


Q1) I work at the Tate in visual arts; the idea of assessment by multiple choice is very appealing, so I wanted to ask about peer assessment. How did that work? Did there need to be moderation?

A1 – Jen) It is quite controversial, partly because the MOOC platforms don’t handle peer assessment too well. We didn’t get asked too often to re-mark assignments. Peer assessment can work extremely well if the group know each other or share a common understanding.

A1 – Sian) It was strange how assessment focused many people were for a non credit bearing course though, they wanted to know how to pass the MOOC.

Q2) I wanted to ask about the drop out which looked absolutely huge…

A2 – Sian) You mean people who didn’t begin to engage with the MOOC? It is problematic… there has been a lot of criticism around drop-outs. But we have been looking at them from a traditional education point of view. MOOCs are free: people come in, they sample, they leave. It’s about shifting our understanding of what MOOCs are for.

Q2) What did you learn from that…?

A2) I think it would be too hasty to draw too many conclusions from that drop-off, because of what it means to be in a MOOC.

A2 – Jen) There is some interesting research on intentions at sign-up: around 60% of people signing up do not intend to complete the MOOC. I don’t think we will ever get 90% retention like we do on our online MSc, but Sian’s point here holds. Different demographics are interested for different reasons. Retention on the smaller equine science MOOC was much more about participant interest than the content or pedagogy. The course with the 7% retention rate was the one with the more innovative assessment project.

Q3) We would love to have that data on drop outs. We aren’t allowed to fail at that rate in public. I work in the National Library of Scotland and we know that there is “library anxiety”.  I would hate to think this is a group with inflated library anxiety!

A3) Absolutely, and I know there will be more on this later on. But it’s about expectation setting within the organisation.

Q3) Just getting that data though – especially the research on those who don’t want to complete – would be so valuable for managing and understanding that completion in open contexts.

Q4) Perhaps the count should be from the first session, not from those who sign up. It’s not the original email we are concerned with but the regular drop out which would be more concerning. We get people doing this with on site free experiences. This is more about engaging with the higher up decision makers and marketing about how we could use MOOCs in cultural heritage.

A4 – Sian) It was unfortunate that many of the MOOCs really marketed sign up rates, and inflated expectations from that, as a way to promote the MOOCs early on. Very unhelpful to have messages like “we want this one to hit a million sign ups!”

Q5) These aren’t credit bearing but are there MOOCs which are, how do they work?

A5 – Jen) Quite new territory. Some allow you to get some sort of credit at the end of the MOOC on payment of a fee. And some – including the University of Central Lancashire – are trialling MOOC credit counting towards something. There is work at European level there too. But no one has found the magic bullet.

A5 – Sian) Two offering credit so far – one at Oxford Brookes, one at Edge Hill.

Q5) Maybe credit will appeal to those currently absent from the demographic profile – moving to those with few or no higher level qualifications

A5 – Sian) We did ask people why they did the MOOC: many for fun, some for professional reasons, none for credit.

Q6) what are the indirect benefits of the programme?

A6 – Sian) We have had five or six people enrolling on the MSc as a direct result of the MOOC. We also got great publicity for being at the forefront of digital education, which is great for the University. That indirect benefit won’t last of course as MOOCs get more mainstream, but…

A7 – Sian) 40 days of academic staff time to develop, 40 days to deliver it. And that doesn’t include the Information Services staff time to set up the technology. In terms of participants’ time, I’m not sure we have that data.

A7 – Jen) We kind of have it, but it’s taking a long time to analyse. You get a lot of data from the MOOCs – there is a whole field of learning analytics. We have the data from both runs of the MOOC but it’s hard to find the best way to use it.

Q7) Interesting, for people reflecting on their own time investment

A7) We gave a guide time of 5–6 hours per week for basic involvement, but actually many people spent a lot more time on it. And there was a lot of content, so it took that long just to read and engage with it for many participants.

Q8) How do you assess 40k people?

A8 – Sian) Well that’s why we spent a lot of time trying to make the assessment criteria clear for people marking each other.

Q9) Can you say a bit more about xMOOCs and cMOOCs. A lot seem to be xMOOCs?

A9) There is a lot of discussion around how to go beyond the bounds of the xMOOC.

A9 – Sian) Our MOOC was seen as quite innovative as we were a bit of a hybrid, but a lot of that was about participants using social media and just having a hashtag made a difference.

Q9) So are there people trying to move out of the platform…

A9 – Jen) For the credit and micro-credit courses you try to bring students into the MOOC platform, as that is easier to measure. And that’s an area that is really becoming more prominent…

A9 – Sian) It would be sad if the move towards learning analytics took away the social media interactions in MOOCs.

A9 – Jen) We do see AI MOOCs where there is some opportunity to tailor content which is interesting…

Comment) Can see these working well for CPD.

:: Update: Jen and Sian’s Prezi can be viewed online here ::

The changing landscape of teaching online: a MoMA perspective – Deborah Howes (Museum of Modern Art)

It is a pleasure for me to tell you just a little bit about what has been going on at MoMA, especially having spoken to a few of you already – I realise you are a very savvy digital education and cultural heritage audience.

I like to start with this slide when I talk about online learning at MoMA – of MoMA education broadcasts in the 1950s. We have always been interested in technology; it is part of our mission statement to educate (the world) about the art of our time. This image is from the 1950s, when MoMA had an advanced idea of how to teach art and creativity – and they invited TV crews in from the Rockefeller Center to record some of what was going on in terms of that education.

So online learning for MoMA can be something as simple as a Google Hangout with seniors who go on a field trip once a month without having to leave their apartments – they have a museum visit and discuss the art. Some have mobility issues, some have learning disabilities. But they have these amazing opportunities to visit and engage, all the time, for free. We use Google Hangouts a lot and this is an example that really hits home.

Image of Deb Howes (MoMA)

This example, like much of what I’ll talk about today, isn’t strictly a MOOC, but it comes from that same open online concept – and the MOOC is changing. However we have, at MoMA, been running online courses since 2010. These are NOT MOOCs as we charge for them. You can take them in two ways. You can be self-led – there is no teacher responding to you and there are no fellow students, but you go at your own pace whenever you want. Or you can do the teacher-led version, with a teacher, with fellow students, with responses to your comments. We developed these courses with Faith Harris, who now works at Khan Academy and was teaching online at the New York Museum of Fashion. She had a clear idea of what the format was – a structured course led by an educator. We did a studio course – how to paint – to see if that would work. That seemed such an unusual idea at the time, but these courses are really popular, especially as an instructor-led experience. Students like to see and share progression and to get feedback on it, just like a real studio experience. With the “how to” videos, one of the things we tried to replicate online was the feel of exclusivity you have in an on-site course: if you enrol in person you get to paint in our studio and you get access to the galleries when no-one else is around. So here we have Corey Dogstein, who is also an artist – the students love him – and you can see this video of how to paint like Jackson Pollock and really get into that free-form, jazz-playing vibe.

In a previous role I was at a gallery where I had no idea who was taking my tour, or what they were getting from it; then I was in an academic setting where I knew who everyone was, how they were progressing, and could assess them. So in this role the online teaching experience has been really interesting. In particular, by taking out the temporality and those barriers to speaking up, you open up accessibility to a much, much wider audience. The range of learning difficulties that students come in with and still feel able to participate online – where they wouldn’t feel able to participate as fully in person – is striking.

We use a course management system called Haiku. No matter what you do it looks like a bad high school newspaper. It organises content top to bottom: welcome messages, etc. 60% of students on MoMA online courses have never taken an online course before. They tell us they’d rather try it with us! We have a lot of first-timers, so we have to provide a lot of help and support. We try to make the courses engaging and lively. The upside of the highly controlled space is that the teachers themselves are making these courses, so it’s easy for them to change things.

We try to think thematically about content, rather than thinking academically along a timeline, say. So colour as a way to explore modern art came to mind; it also broadens the base beyond painting and sculpture – to design and architecture, for instance. This way we can interview the curator of design, Paola Antonelli, on colour in design. [We are watching a clip of this.] Talk about exclusivity! Even on my 11 o’clock tour I couldn’t get you time with Paola. The students really respond to this. And we also created videos of the preservation techniques around colour.

This course, “Catalysts: Artists creating with sound, video and time”, brings all those ideas together, and is a hybrid xMOOC and cMOOC – although I only just realised this! We got the author Randall Packer to put this history together using artefacts and resources from MoMA. It’s so hard to do this history – why read a book on the history of video artworks?! As an educator, how many museums have the space to show a whole range of video art? Even at the new Tate underground galleries you have a rotating collection. It is rare to have an ongoing historical way to explore these works. One of the reasons MoMA was able to jump into online courses feet first is that Volkswagen are a corporate sponsor of the galleries and were keenly supportive. And as part of teaching the Catalyst course Randall, who is also a practising artist, thought it would be great if we could get students to make and share work – a WordPress blog they could use to share pieces and comment on each other’s. And my colleague Jonathan Epstein suggested digital badges – they get a MoMA badge on their blog and badges for LinkedIn profiles etc.

So, over three and a half years we’ve registered about 2500 students. Small compared to MOOCs, but huge for us. Around 30% of enrolees are not from the US, and that 30% represents over 60 countries. For us it was about engaging in a sustained way with people who couldn’t come to MoMA, or couldn’t come often, and we really think we’ve proved that. This is one of those pause moments for us… so, any questions…


Q1) That quote on your slide – “the combination of compelling lectures with the online gallery tours and the interaction with the other students from around the world was really enlightening and provocative” – what do you learn from these participants?

A1) We do find students who set up ongoing Facebook groups, for instance, and they are really active for a long time – they will go on a trip and write to their peers about what they’ve seen. We learn whilst they take the course, but also over time. What is so hard for museums to learn is the long-term impact of a museum visit… there is no way to know what happens months or years later, or when visitors are at another gallery… But you get a sense of that in the Facebook groups.

Image of Deb Howes (MoMA)

Q2) At the moment it’s $25 to come into MoMA. How much are the courses?

A2) It is. But it’s a sliding scale of prices. For self-led courses, 5 weeks is $99 if you are a member (of the museum) or $150 for a non-member, up to a 10 week course. For instructor-led it’s $150 to $350 per course depending on length etc. They may fluctuate, probably go down. I like the idea of a cost recovery model. Free is hard for me as an instructor. But there is a lot of free stuff, especially in the MOOC world, and people are comparing what’s available and what the brand is worth, which is worth doing.

Q3) Member?

A3) Of the museum. Typically at the museum you get lots of discounts, free entry etc. as part of that. I think it’s about $75 for an individual membership right now and that’s part of a wider financial ecosystem I don’t get into too much.

So… we have all these courses… We got contacted by Coursera who said “oh sorry, we can’t take your courses as you don’t award degrees, but here is a sandbox for K-12 for you”. In fact MoMA does a huge amount for teachers. We had just done a huge new site called MoMA Learning with resources for all sorts of classes. So we thought, well, this will be our textbook essentially. If we leave it there we don’t need to renegotiate all the content again. So we decided to do a four week “art and inquiry” MOOC. There is a huge focus in the core curriculum on discussions around primary source materials; we do a lot of training of teachers, but we can’t fit enough of them in our building. We have taught a class for teachers around the country, perhaps beyond, who come for a week in the summer and talk about inquiry-based learning. It just so happened that when this came together we were the first MOOC in the primary and secondary education sandbox – I think that has everything to do with why we had 17k-ish participants. We had a “huge” engagement ratio according to Coursera – they told us we were off the charts; people were watching the videos “all the way to the end!”. Huge validation for us, but if you think carefully about all the ways people learn that satisfy them, people look for something to engage with – and museum educators are great at this, great at finding different ways to explain the same thing.

At the end of the course we had a survey. 60% were teachers. The rest were taking the course for different reasons – doctors wanting to talk about x-ray results better with patients, for example. 90% of all those who answered the survey had not been to MoMA or had an online MoMA experience, but they did visit the website or site afterwards. We had more friends, we had people following and engaging with our social media. It was a wonderful way to have people access and engage with MoMA who might not have thought to before.

So I have a diagram of MOOC students. It is kind of yin-yang. The paid-for courses tend to attract people my age or older, highly educated, who have been to many international galleries. On Coursera they are 20-30 year olds, it’s about their career, and they take lots of Coursera courses. And what struck us was that, putting our content beyond the virtual museum walls, people really want to engage with it. In the museum we want people coming to us, to speak to us, but here they don’t visit us at all and they still want to engage.

We had 1500 students get a certificate of completion. At MoMA we have 3 million admissions per year, and I have no idea how many take that information with them. For me as a museum professional, 17k people made an effort to learn something about MoMA, word is out, and I taught 1500 teachers in the way I would like to, in an academic way – more than I could teach over three years, in one single summer. And the success of that means we have followed up with another MOOC – Art and Activity: Interactive Strategies for Engaging with Art. The first one runs again soon; this new course runs from July.

There are a few other things we do online… MoMA Teens Online Course Pilot. This was a free 5 week course in art appreciation at MoMA. These were teens that had probably taken all our teen courses as part of after-school programmes. They brought back to us this Real World MoMA episode. [Very, very funny and full of art in-jokes.]

You get the idea, right? I should just let the teens do all the videos! We have a new group of teens coming in doing a completely different thing. This is their medium; they understand it. They combine the popular with the collection in an unforgettable way – the kids will never forget the five artists they focused on.

I just want to go through some pedagogical background here. There is a huge body of really interesting research on how the brain works and what makes memories… One of the things I always try to think about is what makes your brain remember, and why a museum is such a great way to learn. One thing is that you learn when something new comes in – a new sight, a new sound, a new smell… Museums are like that. They are new experiences. For children, they may never have been to a museum or even to the city before. I try to make the online courses take that into consideration. How can we do that, and make the brain hold on to what is being learnt?

I don’t know if Howard Gardner is familiar to you? His idea is that different brains work differently, and that we need to present material in different ways for different people. We have hands-on aspects. We have scientist experts, we have critics… we try to present a range of ways into the material.

So here also is some student feedback – the idea that there is more in the course than can be absorbed, but that that is a good thing. We also try to ensure there are peer-to-peer aspects – to enable sharing and discussion. So here we have the learning communities from that studio course, where participants share their art… Incredible learning experiences and incredible learning communities can exist beyond the museum and beyond the university, but it is great to be there to support those communities – to answer questions, share a link etc.

I wrote a post you might like: search for “how to make online courses for museums”

Moving forward we have a couple of hundred videos on YouTube but we were asked if we would put these into Khan Academy. We filtered the best down, gave them embed codes, and they have created a structure around that. As a museum you don’t have to do everything here, but reusing is powerful.

And moving forward we are doing some collaborations with the University of Melbourne.

And my forecast for Museum-University Partnerships? Sunny with a chance of rain! There are real challenges around contracts, ownership etc. but we can get to a place of all sunny all the time.

Q1) We would be developing online learning as a new thing. When you decided to go down the online route did you stop anything else? Did you restructure time? How does that fit with curator duties?

A1) We didn’t drop anything. The Volkswagen sponsorship allowed us to build the team from myself and an intern to include another individual. But it’s a huge time commitment. Curators don’t have the time to teach, but they are happy to talk to camera and are generally very good at it. I was at Johns Hopkins, and before that at the Metropolitan Museum… I was used to having media equipment to hand. There wasn’t that at MoMA, but we created a small studio which makes it easy for curators to pop in and contribute.

Q2) Could you say a bit about the difference between practical and appreciation-type classes?
A2) For practical classes the key is *really* good videos. Being able to replay those videos, if shot well, is really helpful and clears up questions. It lets students feel comfortable without asking the teacher over and over again. If you’ve ever been in a group critique, that can be really intimidating… It turns out that with the distance of photographing your work, posting it online, and discussing it online, students feel much better about the critique. There is a distance they can take. They can throw things at the wall at home as they get critiqued! It is popular, and now online you find a lot of low-price and free how-to courses. But for our students who return, it’s about the visits to the gallery, the history of the gallery, connecting the thinking and the artwork to the technique.

Q2) So unspoken assumptions of supplies available?

A2) No, we give them a supply list. We tell them how to set up a studio in their own bedroom etc. We don’t make assumptions there.

Beyond the Object: MOOCs and Art History – Glyn Davis (University of Edinburgh)

Our final speaker is one of the “rock star lecturers” Jen mentioned!

So, in comparison to the other speakers here, the course I have been preparing has not yet run. We have just under 12,000 signed up so far, and we anticipate around the 20k mark. I am an academic and I teach film studies, particularly experimental cinema. A lot of the films I talk about can be hugely hard for people to get hold of. That presents massive difficulties for me as a researcher, as a writer, but also for these sorts of learning experiences.

Where I want to start is to talk about Andy Warhol, and a book, Warhol in Ten Takes, edited by myself and Gary Needham at Nottingham Trent University. We start with an introduction about seeing a piece called “Does Warhol Make You Cry?” at MoMA – and he did, at the time. So many rights to negotiate. That book is solely about Andy Warhol’s cinematic work, focusing on 10 films in detail – those newly available from the archive, those where there was something new to be said. He only made films for five years, making 650 movies in that time. A lot, even in comparison to Roger Corman (5 a year or so). Some are a few minutes long, some many hours. The enormous challenge was that in 1972 Warhol took all of his films out of circulation – he wanted to focus on painting, and he was getting sued a lot by collaborators who wanted money from the films. And they remained that way. Just before his death he said “my movies are more interesting to talk about than they are to watch”. He may have been joking, but that sense has hung around studies of his work. Take a film like “Empire” (1964) – it’s a conceptual piece, 8 hours in which, in terms of content, time passes and it gets dark – it has been little shown. Very few of his films are in circulation. MoMA has around 40 circulation copies available, but that’s one of the rare places you can see them – you can see screenings in the Celeste Bartos screening rooms. The only other place to see them is at the Warhol museum in Pittsburgh on VHS, or failing that on 16mm. You can’t pause or rewatch. It’s cold. It’s really hard to do Warhol research… so many pirate copies also out there…


So are his films worth seeing, or are they just conceptual pieces? Since the films have started to come out of the archives, films like Empire have been shown in their entirety… people then discuss the experience of sitting through all of them. Indeed in his PhD thesis (Motion(less) Pictures: The Cinema of Stasis), Justin Remeselnik suggests they are “furniture films” – you can admire and engage with them, but they are not to be paid attention to for an incredibly long time… And yet Pamela Lee, in her book Chronophobia, talks about seeing Empire the whole way through; as a phenomenological record of pain it’s fairly incredible. She’s not alone here… another writer, Mark Leach, asked an audience to live tweet during a screening of Empire, and then compiled the tweets into the book #Empirefilm.

This is a long diversion but… Gary Needham and I tried to think hard about the experience of the Factory and the working environment there – what was it like to see Warhol’s films in the context of other experimental filmmakers in the 1960s? In trying to put together a MOOC these ideas sat with me, as the rights negotiations for the book took place over 18 months. We had 30 new images created by the Warhol museum – we had to apply for grants to get these made, rather than reproduced. We had materials from the BFI. We were able to use publicity materials as well. And we had to get agreements from so many people. The Whitney Museum has a Warhol Film Project and acted as our fact checker. It’s a 500k word book, so that took some time. One of Warhol’s assistants, Gerard Malanga, allowed us to use his diary entries in the book. I came to Warhol knowing the rights access issues. And I came to the MOOC knowing those issues, knowing the possible time lag…

Chris provided a great introduction to Artist Rooms earlier. I head up the Art and its Histories strand. Sian and Jen head up the education strand, but I work with art historians and theorists doing research projects around the materials. So making a MOOC was an idea we thought about as a way to bring Warhol to a wider audience, and to highlight the Artist Rooms content. I had a lot of questions though, and I knew we could not use moving images at all. Could we talk about Warhol’s work without images or clips? What does that mean? Can we assume that people taking the course might source or be able to watch those things? I’ve been teaching Warhol for 15-20 years. I can show all manner of images and clips to students for teaching, which are fine to use in that context but which would be impossible to use online for copyright and provenance reasons.

So, there are roughly 250 Warhol pieces in the Artist Rooms collections, with particular strengths. There are a great number of posters – as Anthony d’Offay said to me, these give a great overview of events during his lifetime. There are also stitched photographs – another strength – from the end of Warhol’s career. There are not many, so to have a number to compare with each other is great. There are also early illustrations and commercial works. And there are self portraits from the early to mid 80s. So how do I put together a course on Andy Warhol based on this collection? His most famous work is all from about 1962 to 1966 – silk screens of Monroe, electric chairs, guns, Campbell’s soup cans. They are hugely expensive and not in the collection. But are these so familiar that I can assume those taking the course will know them? The other partners in Artist Rooms – the National Galleries of Scotland and the Tate – did have some of this famous 1960s material, to sex up the course a bit!

So this let the course take shape. It will be a five week course. Each week there will be a video lecture from me (sex, death, celebrity, money, time) and then a video interview with someone who has worked with Warhol’s work in one way or another – curators, academics, conservators etc. – who could give a fresh perspective on Warhol and what he means to them. I’ll come back to these shortly.

I’ve talked about Warhol’s ubiquity and that’s been an issue as we finalised materials, looked at editing videos. Warhol is one of the most well known artists in the world. His images circulate so widely on such a range of objects (maybe only exceeded by the Mona Lisa) that familiarity with them is high. You can buy just about everything – from mugs to skateboards… the Warhol story is extraordinary. What’s really interesting for anyone teaching art history or theory is that he provides a really interesting test case with regards to reproduction and distribution.

For instance, the Marilyn Diptych (Andy Warhol, 1962). This was based on a publicity still for the 1953 film Niagara, which he cropped to his liking. He started to make these works just after her suicide in 1962, and they have been described as works of mourning. They are important examples of pop art, collapsing the worlds of art and pop culture, but also commenting on the mass media reproduction of imagery. The uneven application across this piece suggests the blurring of images in newspapers, and the important differences between similar reproductions. Thomas Crow (in his essay for Art in America (May 1987), “Saturday Disasters: Trace and Reference in Early Warhol”) writes that Marilyn disappears quickly when you look at this work; what becomes clearer is the blurrings, the variations in paint level. But I have been using this image to teach Walter Benjamin’s essay on mass reproduction in relation to the work of art. His essential argument is that endless reproduction, owning of facsimiles etc. changes our relation to the original. It could seem less valuable… or more valuable… as we have seen with Warhol’s work. And Warhol’s own work is a reproduction itself, of course. And his painting is the valuable thing… not the press still…

Being able to talk about this work and reproduction through the MOOC and the digital format adds another layer. MOOCs raise the question of what the use of gallery visits may be. What’s the difference between talking about a work and engaging with the original piece? The process of art or art history has always involved travel to galleries, biennials, festivals. Writing about a work means seeing it; there are financial angles there, there are green angles there. For example, I am going to Newcastle for three days to see “Crude Oil” (Wang Bing, 2008). It is a 14 hour movie and you can only see it in installation. I intend to move in… my husband thinks I’m mad!

And what about the experience of engaging with the stuff itself? I spent three days at the Warhol Museum in Pittsburgh preparing for the MOOC: watching VHS, speaking to staff, and also looking at Warhol’s “time capsules” – receipts, ephemera; e.g. a box from 1978 is just “Concorde stuff”. I was accompanied by a curator who opened boxes for me… some smelled bad due to mouldy contents, exploded soup cans, a still-inflated silly birthday cake which was a present from Yoko Ono. They are treated as art works, and they are still cataloguing these things. So I spoke to the curators about how they are making the time capsules educationally engaging. They have video of celebrities going through them – for instance John Waters gives a great critique of one of the time capsules. They did a live opening, streamed to the ICA, of one of the time capsules. I mention these because they were really interesting examples of opening this type of content and artist up to others.

Let me just say a bit about how we have made the videos for the MOOC. My colleague Lucy Kendra, who had filmed other MOOC content, saw this filming experience as unusually immediate and intimate in form. We spoke to curators and conservators at the galleries, Gary at Nottingham, and Anthony d’Offay himself. We were also given access behind the scenes at the Tate Store – they took out 10 pieces as a backdrop, which was so valuable. We had interviews of an hour, an hour and a half; we have so much material. For the Warhol class there will be a required 10 minute version of each video, but we will then give longer, possibly unexpurgated, versions for those that want to see them the whole way through. These are fantastic and extraordinary videos. I think they are fantastic representations of these institutions, and they may open the doors to careers in some of these roles. We hope they may open doors in ways other art education courses do not.

These interviews I could not have foreseen, but they have become the bedrock of the course, the USP, the main draw – first-time perspectives on the artist and his career, on why Warhol is still of interest, and on the personal interests of the interviewees themselves. We started by thinking the issue would be about content and rights, but the interviews have gone beyond the object there.

Image of Glyn Davis (University of Edinburgh)

Q&A

Q1) Will there be assessment at the end? Will they be assessed by peers?

A1) Yes, I think there has to be for Coursera. I have PhD student Teaching Assistants and I have left some of those decisions to them. They have suggested allowing practical responses to the materials – to get a sense of Warhol’s materials and present day materials, a contemporary approach. Or a short written text, a 200-300 word response to a work of their choosing – perhaps from Artist Rooms or perhaps another. These are great TAs, with ideas like building a map of the nearest Andy Warhol to each participant, opening up discussion of access. Peers will assess the work, and this is where drawing on the expertise of colleagues who have run MOOCs before is so valuable.

Q2) When we did our MOOC we had an easier time with rights, but we really wanted to use films for which it was hard to find legal clips… we avoided anything we knew was of dubious origin. But we found students sharing those clips and images anyway! What do you plan to do about that?

A2) As far as I know the Warhol Museum in Pittsburgh are well aware that material leaks out… if our participants link to those things we can’t help that. We just create that distance and leave that in the students hands.

Comment – Debs) I feel your pain entirely! In addition to the academic excellence issue, at MoMA part of our job is about preserving the identity of the work, of the artists in our collections. We can’t distribute unofficial copies of works by artists in our collection; it wouldn’t look good. And yet… we were one of the first museums to go to Electronic Arts Intermix about using video online. They’d never really been approached to digitise their works in that sort of context. The first person I spoke to was extremely pessimistic about sharing these once-cutting-edge, technology-using artists’ works online. We were able to say that in the environment of this course – a limited course, not a MOOC, where we have a lot of details on participants – it is very comparable to the classroom. We stream it, and although you probably could capture the content, most won’t. They were OK with this. We got Bill Viola, Yoko Ono, etc. allowing us to stream the content. It was costly… but I hope as we push these boundaries more, the artists and rights holders will go with that. Otherwise we will have a loss to art history and to accessing this hard-to-reach art. That argument – that the most famous work is the most visible already – is one I’ve used before; I hope that rings true.

Q3) Do you have specific goals – educational or a specific combination of enrolees – for this MOOC?

A3) There are two or three key goals. Part was a partnership between the university, the Tate and National Galleries. And part of that was about trying a MOOC as a way to do that. It might be that the Tate or National Galleries want to use one of those interviews somewhere else too. For me it is also about trying a new tool, and what is possible with that. I am interested in testing the boundaries of what Coursera will do.

Q4) With the MOOCs which you have completed… with hindsight, is there a lot that you would do differently?

A4 – Deb) Not a lot but… there are things about the videos I wish we had done differently. I wish we had done them straight, without “last week you did X”, or interviews with curators etc. I wish I had had the insight to bring in the right people, or to make them more useful in the long term.

A4 – Sian) For our second run we did make changes. We refused to make videos the first time; we were being hard line. But the dominant comments online were “where are the professors” and “where are the videos”, so we made introductory videos for each week. That was the most significant change.

And with that a really interesting afternoon is complete with thanks to organiser Claire Wright, and to the Royal Society of Edinburgh for providing funding for the event.

Find out more


My Summary of Thursday 1st August

Today we bring you a rather belated live blog (entirely your Repository Fringe blog editor’s fault) from guest blogger Valerie McCutcheon, Research Information Manager at the University. She is part of a team that provides support for managing a wide range of activity including datasets, publications, and other research outputs. Valerie blogs regularly at Cerif 4 Datasets.

Here is my brief summary of Repository Fringe Thursday 1st August.

Met some old friends and some great new ones and look forward to exploring several areas further including:

  • Open data – Are there good case studies out there from UK higher education institute based researchers that might illustrate to our researchers some potential benefits to investing time in making data openly available?  Had a good chat with Jacqui Taylor over lunch and hope we can follow this up.
  • Open Journal Systems, OpenAIRE compliance, and Repository Junction Broker – all sound like things we ought to progress and are on our list – need to see if we can find more time to investigate
  • Concerns over quality of data put out there e.g. by Gateway to Research – I will follow up with Chris
  • I wondered if the ResourceSync might be a useful option or at least concept to emulate to address synchronisation of data with RCUK outputs systems – I will follow this up with Stuart Lewis

Overall the day exceeded my expectations and I got a lot out of it – thank you!



LiveBlog: Closing Keynote

Peter Burnhill, Director of EDINA, is introducing our closing keynote, something of a Repository Fringe frequent flyer. But he is also announcing that this year is the 30th birthday of the University of Edinburgh Data Library. There was a need for social scientists to store data and work with it, and that has come a long way since; we now face questions like curation, access, etc. Back to my first duty here… I had an email from Robin Rice in 2011 saying “we like FigShare”, and I wrote to the organising list “FigShare: could be the new data sharing killer app!” – a bit of an understatement there. So, let’s find out what’s happened in the last two days. So, over to Mark!

Mark Hahnel – FigShare

So I am doing this Pecha Kucha-style as it’s Friday afternoon and we have people on stilts going past! We have people here from institutions, from libraries. I’m not. We have different ideas, so I want your ideas and feedback!

So I’m going to talk about open and closed… We’ll see where we get.

So FigShare lets you upload your research and manage it in the cloud. This has evolved since 2011. We can’t ignore why not all data can be open… so we have a private side now. Our core goal is still for research to be Discoverable, Sharable (social media) and Citable (DOI). Discoverable is tricky!

We are hosted on Amazon Web Services, we are an ORCID launch partner (the only one with non-article data, I think), we are a member of COPE (the Committee on Publication Ethics), we get DOIs from DataCite AND we are backed up in LOCKSS.
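The citable side rests on those DataCite DOIs: any item’s DOI resolves through the standard doi.org resolver, and the same resolver can return machine-readable citation metadata via DOI content negotiation (a mechanism DataCite supports). A minimal sketch of both pieces – note the DOI shown is hypothetical, used only for illustration:

```python
# Sketch: turning a DataCite-minted DOI into a resolvable, citable reference.
# The DOI below is hypothetical, not a real FigShare item.

def doi_url(doi: str) -> str:
    """Return the canonical resolver URL for a DOI."""
    return "https://doi.org/" + doi

def csl_json_headers() -> dict:
    """Request headers asking the DOI resolver for CSL JSON citation
    metadata via standard DOI content negotiation."""
    return {"Accept": "application/vnd.citationstyles.csl+json"}

print(doi_url("10.6084/m9.figshare.0000000"))
```

A client would issue a GET against that URL with `csl_json_headers()` to retrieve structured citation metadata rather than the human-readable landing page.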

We wanted dissemination of content on the internet – it’s a solved issue. Instead of going backwards… let’s see how we go forward by copying this stuff. What services like Flickr, SoundCloud, etc. have in common is that they visualise content in the browser – you don’t have to download to use it.

So, live demo number one. We have a poster here. Content on the left. Author there. Simple metadata, DOI, and social media shares. We’ve just added embedding – upload content to FigShare and use it on your own site. Datasets are custom built in the browser – you want to see your 2GB file before you download it. You shouldn’t even be downloading; it should all be on the web and it will be. And we have author profiles. With stats, including sharing stats. That is motivating. That rewards sharing. Think about who is involved in research. We try to do the other side of incentivisation here too! Metrics are good. So is doing something cool with them. For instance, here is a blogpost with a CSV and a graph. So we have a PNG of the data… you can’t interact. But the CSV lets you create new interactive charts. And we also added in ways to filter data.

We are also looking at incentivising giving back – doing research like an instant t-test. Moving towards the idea of interactive research. This is something that allows you to make research more interactive.

Q – Pat McSweeney) Is this live or forthcoming?

It’s live but manually done. A use case for groups that use FigShare the most, that need special interaction for journals.

We are a commercial company but you can upload data for free. We work with publishers. We visualise content really well. So this is additional materials for PLoS; these are all just here on FigShare – there’s a video? Play it! It’s how the internet works! Don’t download! We do this for publishers. Another thing we created for a publisher: click open a graph and you get the dataset. A researcher asked for it, we built it!

So, back off the internet…

So, discoverable. What does that mean? Google finds us but… well, is it hearsay? So DataCite started tracking our DOIs. For three months we were 8 out of the top ten, then 7 out of the top ten, then 9 out of the top ten for traffic. So hey, we are discoverable!

But the future of repositories… Who cares?

So who takes ownership of this problem now – funders, stakeholders, or academics? I think it’s institutions and, more specifically, librarians. Librarians are badass. They have taken ownership. They lead change, they try new things.

But the funders? Funders are really reacting to the fact that they want their data – it may be about what researchers want to reuse but really it’s about the impact of their spending. But they are owning that problem. The NSF requires sharing with other researchers, and similarly in the humanities. The EU are also talking about this – but not owning the problem, just declaring it really.

So looking across funders… some have policies… some stipulations… The Wellcome Trust withholds 10% of the grant if you do not share data. That will make a difference. But what do you do with that data?

What about academics? Well they share data! I generated 9GB a year – probably in middle of the curve in terms of scale – in my PhD. So globally 3PB/year ish. But how much of my PhD is available? A few KB of data. My PhD is under embargo until later in the year, but it will be there.

I felt there were moral and ethical obligations. Sharing detailed research data is associated with increased citation. Simplicity matters, visualisation is cool. I thought it was about an ego trip, academics have to disambiguate themselves…

Now, two years after leaving, I was asked to come back in and print Excel files of my data for a publication… I generated this without a research data plan. Two years after I left, my boss thinks I still work for her. She will hand the files to the next person working for her… what do they do, copy them back in?

So there is so much more here. It is not just open or closed, it is about control. It’s the Cory Doctorow thing: the further you are from a problem, the more data you’ll give up – the Facebook issue. You do want control; it matters.

So what motivates academics? Being easy, being useful, and what funders want – we will jump through hoops for them.

So back to the web… My profile has new different stuff but you’ll see sharing folders – group projects and discussions, ways to reshare that data. Nudge your sharing. But you need the file uploaded now to share two years later. You can share otherwise closed things with colleagues, regardless of institution.

By the way, on this slide we have our designer’s idea of an institutional library – it looks a lot like a prison.

So back to those libraries. How much data does an institution generate? Very few know this; how do you assess it? Right now, for PLoS, we let them browse all their content. They can see what they produced. And this aggregation is great for SEO too. It makes it easy to Google and then find the research article from there. From this aggregation we can filter to the most viewed, to particular titles. Essentially this is a repository of research outputs; we take all formats. You can imagine that this could be there for any institution. And this has an API.

Institutions also want stats. See where traffic is from. Not just location but institutional IP ranges. So we can show where that item has impact, where viewers come from. But, at the same time, populating repositories is hard. We have data from Nature and from PLoS. We can hand that data back to your repositories. We can find the association with the institution.

So it’s about control. It’s Research Data Management as well as Research Output Dissemination all in one.

So we have launched FigShare for Institutions. We have heard concerns about metadata standards and how much metadata we have, so Henry Winlaker used our API to build a way to add more metadata to fit institutional needs. So if you share responsibility… well, what’s the point of the institutional repository? I would say that IRs are about to move fast. They have to; it was idealistic but now it’s mandated! Next year repositories will look very different. RDM plans say they have to. Funders say they have to.

This community is amazing! ResourceSync is great, I want to use it! PMR’s Dev Challenge idea is great. We are commercial but we can work together!

Do we need to go back further? People use Dropbox, drag files in. We have a desktop app too. But maybe whenever you save a file you need to upload it then. So there is a project: a filesystem that nudges you to add metadata and do things as you are required to do them. You can star things; it does version control. Digital Science created this. It can do so much more, so we are releasing it to see what’s needed. What’s really cool… you can download this now… if you press save now it saves to FigShare. That sync would be ideal. We are trying it out now. I work in the same office, but there is no reason why these shouldn’t all be connected up – to IRs, to FigShare, to all of these things…

And this is a slide specially for Peter Murray-Rust…

I know that openness is brilliant! But it’s also great to work with publishers. More files were made available for free, for academics – that’s great. Everything publicly available will ONLY be under CC0 and CC-BY. SHARE ALL THE DATA.


Q1 – Paul) What is the business model?

A1) For PLoS it’s about visualisations and data. They pay us to do that. They have a business model for that. And FigShare for Institutions is coming; that’s also part of the model.

Q2 – Peter MR) I trust you completely but I do not trust Elsevier or Google… etc. So you have to build organisational DNA to prevent you becoming evil. If you left or died, what would happen to FigShare – you see the point?

A2) I see that. But all of this costs us money. We sell to institutions but there are economies of scale. Two institutions have built their own data repositories and they cost £1 million and £2 million. That’s a lot of money.

Q2) Mendeley have a copy of all the published scientific data these days. FigShare will have a massive amount of data in it, huge worth; institutions may want to know what staff are doing, to spy on them. You have something of vast power, vast potential value. The time is now to create a governance structure to address that.

Peter Burnhill) There are some fundamental trust issues.

Mark Hahnel) You can trust the internet to an extent. Make stuff available and it proliferates, but you can reuse it, you can sell it on, etc.

Peter Burnhill) Next year we need a discussion of ethics.

Q3 – Kevin Ashley) FigShare for Institutions – can you say anything about the background consultation around that? A contract is very different to free stuff.

A3) Sure, legally we have a lot of responsibility. We’ve been working with universities, individual ones, to see what the needs are. We spoke to lots of people, mainly in London, to make sure we didn’t tread on toes and didn’t risk their research leaking out. We also spoke to institutions more globally. Digital Science is a good thing; this is where they come in.

Peter Burnhill) I am a member of the CLOCKSS board. There is a contract between all publishers that CLOCKSS ingests everything they make available, and it says that if a failure to deliver happens – for whatever reason – then CLOCKSS have the right to make that data available via platforms (one here at EDINA, one at Stanford). So in terms of assurance that what comes in goes out, joining CLOCKSS does that. The agreement is supra-governmental. You give up the right there so that it will remain available.

Mark: Absolutely. And all data is available via the API if you want it.

Final Wrap Up – Kevin Ashley

Thank you to Mark for a great final session. So, at an event like this we come to share ideas, we come to share experience, we look for answers, we come to meet people and to make new connections. We come to learn. We may come with one or many objectives. We at the DCC certainly have. Many of you are new here.

I have learnt lots of stuff. A few things stuck. A whole room of experts can’t put an object into an EPrints repository; there’s a lesson there somewhere about interfaces. The other interesting idea I picked up was from Les Carr: maintaining open access whilst having a business plan for what we do. So the DCC’s how-to guides for setting up RDM are free, but limited edition leather-bound copies are to come – great idea, Les!

I hope all of you did one or several of those things – then share, tell us, this is an unconference! We want to keep making this event better every year. We see the event as being about you, about facilitating you to meet and connect.

There will be a Repository Fringe next year. One reason for that is that we have fantastic sponsors, all of whom have put a great deal into this event, and hopefully we can extend that further next year. But thank you also to the session chairs, the speakers, and to the organising committee here. I know how much work goes into this. And a great deal happens, and happens smoothly, because of that work.

Two people to thank specifically: Florence Kennedy of the DCC and our chair Nicola Osborne!


How to LiveBlog Part 2: My Top Ten Tips

In How to LiveBlog Part 1 I discussed why you should LiveBlog your event. But once you’ve decided that you will be LiveBlogging how do you actually go about it?  Well…

1. Be Prepared

To borrow a catchy phrase from the boy scouts (and Tom Lehrer) you should always be prepared!

For liveblogging there are several essential bits of preparation which will make your life much much easier:

  • Decide what you will be LiveBlogging – if you are one of the event organisers then talk with your colleagues about what will be useful to capture, what might not be appropriate to cover. Usually you can assume that talks and presentations will be fine to LiveBlog. It can be tempting to decide to cover the main content rather than any question and answer sessions but I would always recommend capturing question sessions – they are the easiest way to add value to an event write up as they are the least easy to capture part of the event (and may be absent from recordings, others’ notes, and obviously are not covered by slides), and they tend to add the most value to a session – surfacing all the issues, awkward questions and surprises that are often absent in a main presentation.
  • Be realistic in your planning – you cannot be in two places at once so don’t over commit your schedule. Full on LiveBlogging is tiring enough without adding running between rooms or buildings so make sure you can deliver the LiveBlogging you plan to.
  • Create draft posts for the sessions you want to cover – this is a simple and really effective time saver. It will force you to decide if you wish to blog as part of one long post or a series of shorter LiveBlog posts. If you are organising a major event I would recommend setting up one post per session or (depending on presentation lengths) per presentation. This will help each talk stand out on your blog, be findable by search engines, and encourage your delegates to engage. If you are blogging an event you are attending then I would instead create a single blog post, as you don’t want to jam your blog – and your RSS feed – with loads of posts on one event, emphasizing how frequently (or infrequently) you update your blog the rest of the time.
  • Prepopulate those draft posts – whilst speakers, titles and all kinds of details can change on the day, it is incredibly useful to have somewhere to start your blog post. When I’m preparing for LiveBlogging a major event I will set up a draft post with a paragraph explaining the name of the event, a link to the event page and/or programme, and a sentence explaining that “this is a liveblog so please be patient and let me know about any errors, typos etc.“. I will also add the speaker name, role, affiliation and talk title. This means all I have to do when their talk begins is to correct any key details (often the title!), add any important framing information (e.g. “well we’re just back from coffee…“) and start typing my record of the presentation/talk/discussion.

2. Work with your Limitations

When you are planning your LiveBlog you need to be aware of and work out how you will deal with any potential limitations, they might include:

  • Typing Speed – I am one of nature’s touch typists thanks to a misspent youth hanging around chat rooms. We can be a slightly smug bunch when it comes to liveblogging but what we gain in verisimilitude, we can lack in quality. Sometimes the very best liveblog summarises down to key nuggets. The popularity of visual notes (such as Francis Rowland’s excellent sketches shared on Flickr) is a super illustration of why summarising can be powerful. I may be able to grab almost every comment in real time (albeit with occasional typos) but slower typists can make for great and still very thorough LiveBlogs.
  • Acronyms – I do a lot of LiveBlogging of acronym-heavy events. If you know you are about to encounter a lot of these I’d recommend making a handy cheat sheet or keeping Google open in another tab or application for swift checking – it can mean the difference between an embarrassing typo and a hugely valuable link through to a website/wikipedia page that enlightens others.
  • Come to Terms with Your Spelling and Autocorrect Demons – No LiveBlogger has 100% spelling or accuracy hit rate. The nature of the medium means errors will creep in. You either have to live with that or find a way to fix errors fast. Other attendees will often be happy to comment on your post and correct any facts, name spellings etc. so do keep an eye on your comments and approve those (if you don’t already authorise each comment before it is published you should be, there are too many spammers out there not to). But embarrassing typos can creep in often through autocorrect functions in word processing packages (again a reason to stick to the blogging software or a plain text editor) or, worst of all, tablets and phones. I will usually LiveBlog on my laptop but sometimes I run out of power or decide to LiveBlog something at the last minute and find myself trying to take notes on the iPad. Because they are not designed for long form typing the autocorrect function is particularly awful. I suggest switching it off entirely or applying a wee bit more proofreading than normal as mobile devices seem hugely imaginative and bizarre in their autocorrect suggestions.
  • Connectivity – if you are organising an event you should know if wifi/wired internet access will be available and may be able to ensure it is. If you are attending an event it can be a hit and miss affair. You can and should ask the organisers or venue about connectivity ahead of time – it will help raise their awareness of the importance of wifi for their attendees and they might be able to do something about it – if I am speaking at an event or if I have been asked to LiveBlog an event for others I will always ask whether wifi will be available and find out about logins/connection set up either ahead of or at the very beginning of the event. However it may also be worth having a backup plan. If you can take some sort of device that ensures you have a connection then do – I usually carry a pay-as-you-go 3G dongle with me to events and it has been hugely helpful many times. If a dongle is not an option or the issue is an intermittent wifi connection then most blogging programmes will save your work as you go – but you can always do a swift CTRL-A, CTRL-C to copy everything in the post before hitting “Publish” or “Post” so that you don’t lose any work if the connection falls over. If you know that the wifi connection resets every hour, or cannot handle the load of a whole conference of twitterers, or is just very slow, then you may want to draft your work in another application to ensure it’s safe even if the internet connection goes down. Given how badly formatting transfers between programmes I would recommend a really basic plain text or rich text editor if you are using this method – it will be much easier to format basic text than to fix formatting conflicts between, say, Word and WordPress.
  • Power – depending on how difficult it is to find power you may need to preserve your battery life in creative ways – closing down background programmes, turning down the brightness etc. It can be the difference between a saved/posted blog post and a wasted afternoon.  This is where having a second device – if only a phone – to hand to email yourself any final notes can be useful. As a last resort I have also been known to switch down to paper notes (but if your handwriting is like mine that really will be a very last resort)!
  • Guidelines – these really shouldn’t be a limitation but… you may need to ensure you are going to be able to stick with any organisational social media guidelines (like the EDINA guidelines we have here) as you blog. Typing quickly and constantly can really push your adrenalin up and you need to always have a little conscious reminder to employ good judgement before you publish that blogpost.
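The "draft your work in another application" habit from the Connectivity tip can even be automated. The sketch below is purely my own illustration (the function name, folder name and example text are invented, not part of any blogging tool): a tiny Python helper that drops a timestamped plain-text backup of your draft each time you call it, so a dead connection never costs you the whole post:

```python
import time
from pathlib import Path

def backup_draft(text: str, backup_dir: str = "liveblog-backups") -> Path:
    """Write the current draft to a timestamped plain-text file.

    Plain text deliberately sidesteps the formatting conflicts that
    arise when pasting between, say, Word and WordPress.
    """
    folder = Path(backup_dir)
    folder.mkdir(exist_ok=True)          # create the backup folder on first use
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = folder / f"draft-{stamp}.txt"
    path.write_text(text, encoding="utf-8")
    return path

# Call this just before each attempt to hit "Publish":
saved = backup_draft("Session 2: notes so far…")
```

Each call leaves a new file behind rather than overwriting the last one, which doubles as a crude version history of the post as the day goes on.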

3. Advertise your Blog – with Realistic Expectations

I find that readership of my blog sees massive spikes when I’m LiveBlogging – that’s a reflection of the fact that I will make it known that I am LiveBlogging, usually through Twitter and using the event hashtag. If I am at an event all day I will tweet at the very beginning of – or even en route to – that event to let people know that I will be liveblogging and where they will find the post.

If I’m attending an event I might post a link to my skeletal draft saying something like “I will be blogging x in this post: <URL> today…“. If I am covering a multi-day event or am organising an event I will usually post something brief explaining forthcoming LiveBlog activities. I try to explain where I will be, where more information can be found, and what should be expected: am I just LiveBlogging or am I also planning to tweet? Will I be taking pictures of the event? Is any of the event being videoed or streamed somewhere? You don’t have to promise the world, you just need to advertise what will be blogged, where, and how. Set realistic expectations and make sure you can deliver on them.

4. Know your Kit Bag

On the day it’s important to know you have everything you need to hand. That means that what you pack is important but also how you pack it – you need to know where you can quickly find your power lead, your pen, your schedule for the event, etc. Typically I will have the following items packed in my own eccentric combinations of bag pockets/sections:

  • Laptop. This will be fully charged the night before the event but I will try to use mains power throughout to ensure I don’t have to think about checking battery level.
  • Laptop Power Cord. This will always be very near the laptop in the bag, usually in a bag full of cables.
  • Extension Cord(s). I work in academia and the kind of buildings events are held in can be a real lottery in terms of power access. In the last year I’ve LiveBlogged in venues including a medieval chapel with two power sockets, a railway museum with numerous sockets but only at the edges of the room, an education room with 4 power sockets in the corners of the room and with a film crew using half of them, and a seminar room with multiple sockets on every desk. There are no guarantees. So I usually carry either a 10m surge protection 6-way extension lead (essential if you are carrying a large number of devices) or a 20m 2-way extension lead. As a result of barcamps past I have my name, email address, mobile number and twitter handle permanently marked on both of these as it’s easy to lose your cables out there! If I’m staying overnight with a bigger bag I’ll take both. A side benefit of multiway extension cords is that it’s a great way to make new friends at events as  there are always a raft of laptop users looking for power!
  • Tablet, SmartPhone or similar second screen. If you are organising an event you may want several of these but there are two reasons you should always have at least one extra screen: (1) To have a spare device to take notes on and (2) to keep an eye on conference/event tweets in parallel to notetaking. It’s often easier to grab your phone and do a quick check of the discussion whilst you are saving a post than to switch tabs, wait for a page reload, etc.
  • Camera. Pictures add value to blog posts so I try to take some form of camera with me to every event. My iPhone does the job fine but if I can find space for it a DSLR does better. If I’m running an event I use both, with the DSLR on a tripod with remote and the iPhone for quick complementary snaps.
  • Chargers and cables, various. To keep LiveBlogging you need to know your kit will all be fine. I keep a cable bag stocked with iPhone cable, VGA converter cable for my laptop, mini USB cable (for camera), spare headphones, memory sticks, and the all important 3G dongle. That little bag comes in handy as a LiveBlogger or presenter and I’ll top it up with camera remote, micro USB cable, iPad charger, etc. depending on what else I’m carrying.
  • Printed programme. Not all event organisers think to provide you with the details you want to hand for liveblogging. Often you want to be able to glance at the schedule and remind yourself of names, topics, etc. to complement those pre-populated posts. I tend to print my own programme and keep it in my laptop case so it’s always to hand.
  • Business Cards with Blog URL. If you do this LiveBlogging lark a lot it’s helpful to have your blog on your business card – then if colleagues ask where they will find your post you can quickly reply without them needing to note down a full URL. My cards have a QR code for my blog on them which is even easier!
  • Paper, Pen. Sometimes tech lets you down. A trusty old pen and paper are essential for those quick notes, reminders, emergency note taking etc.
  • Water. Because almost no event has an endless supply of water and sitting with a warm computer on your lap in an air conditioned room can be dehydrating. If there are only short breaks having your own stash of water also enables you to finish a post rather than join slow moving refreshment queues.
  • Emergency Snacks. A flapjack, a banana, some chocolate, some wasabi peas… it doesn’t matter what type of snack you pick (as long as you like it) but some sort of energising snack (bonus points for those that make no noise) will help you cope with unrealistically short coffee breaks or just very tiring long sessions. LiveBlogging may look like sitting still for the day but typing for that long is a bit like running a marathon. If you have friendly colleagues on hand to pass you refreshments that’s great but my experience of big conferences is that having a snack to hand will save time and queuing and keep you at the energy level you need to keep up with the action. For similar reasons you should never begin a LiveBlogging day without a proper breakfast and, for me at least, a coffee.
Cake, an excellent emergency snack...


There is other kit I’ll take to events I am full-on social media amplifying – video camera, MP3 recorder, etc. – but the list above is what you’ll find in my everyday kit bag for attending events.

5. Add Value

Capturing Q&A sessions, as already mentioned, can add a lot of value. Adding links, explaining acronyms or pointing to related projects or websites is also really valuable for remote readers and those in the room. You also need to get a flavour of the room and to put across the mood without being too judgemental about the event or providing too biased an account (assuming you are there to record not critique – which is better done after the event anyway).

Do capture the detail others may not: lunch and coffee breaks make readers feel involved but, most importantly, they also explain gaps in streaming, liveblog update speed, a quietening down of tweets, etc. But remember that you do need to respect your fellow participants – if someone asks you not to record a question or comment or service name then make sure you respect that wish. If someone falls over a step there’s no need to blog that. But if a fire alarm starts going off LiveBlogging that moment may help explain any tweets or recording issues – you are the eyes and ears of the remote audience so reflect the character and mood of the room but don’t feel like you must be on surveillance duty.

Speed is the other big value-add that you can offer. I try to hit “Publish” or “Update” often as that keeps the version being read as near to current as possible. If you are using a plugin to help with shorter, more frequent updates then this can be easier to manage but the general thing to note is that the faster you share, the more useful that is to your readers. The more often you share, the harder it is to fall behind or lose data.

6. Images matter

I don’t use a huge number of images in my LiveBlog posts but I do usually take them and they can make a big difference – if you can include them you should (attributing correctly of course) even if those images are added back in after the event. Images are even more important now that Pinterest and Tumblr are so popular – the sharing of posts and websites via particularly interesting images is becoming a mainstream method of discovery, so a good picture may not say a thousand words but it could garner you several hundred more clicks.

The OR2012 Pinterest page showing how images are collated and used.


As an event organiser images are essential – even if these are shared elsewhere they will help others write up the event. At OR2012 we created a Flickr group and allowed any delegate to add their images here. Use of this group and the taking of hundreds of photos by the OR2012 team, all shared under liberal CC licences, meant anyone else reporting on the event could find details from liveblogs and add their own value by pulling out their own highlights and illustrating their reports with photos.

7. Don’t be Afraid to Ask for Help

I’ve already said that you shouldn’t plan to be in two places at once… but if you let people know what you are LiveBlogging you may be able to get some friendly fellow bloggers out covering that second room, that other round table, etc. I’ve also already said that other attendees and presenters will often be more than happy to help with corrections or clarifications. If you ask for help you’ll hear about others’ blog posts that complement your own, you’ll see those reports of your events, and you’ll make sure you correct that speaker’s surname before the autocorrect error becomes too big of an issue.

8. Link, Connect, Be a Good Blogger… 

Links to related websites, slides, etc. add real value and can be done on the day or afterwards. If people leave comments make sure you engage with them. Connect to speakers’ websites or blogs, point to related resources. Basically make sure that you add value without being too cynical – it’s not about SEO-type linking anyway, it’s about adding value for yourself, your readers, and your fellow bloggers, writers, participants.

9. Shout About It

This is the best way to ensure that YOU get the best value out of your post. Do make sure you let people know about your LiveBlog – tweet when you update it or when the event is completely blogged, let the organisers know your post is there and make sure you link back to their website. Don’t get obsessed but make sure that those that want to see the post know where to find it. If you are running the event I would recommend including links to blogs – and a note that LiveBlogging will be taking place – in any printed materials (if you use trackable links you have the bonus feature of being able to track the most effective route to accessing your post(s)).

A rather modest recent example of a tweet shouting out about a LiveBlog.


Finally and most importantly make sure that you shout about your post to your colleagues, your peers, etc. It can be really easy to only think about those in the know about the event, your fellow delegates, and that big wide world of people on the web but the most value in your post might be the person at the next desk. Shouting out to the web is easy, summarising the relevance and advertising your posts to colleagues can be harder but is at least as important in most cases.

10. Keep the Momentum Going

Make sure you build on your LiveBlog. If you have been attending an event you might just make sure you link back to that post where appropriate – in your weekly round up of activity perhaps, by highlighting it next time you blog about the same project, event in a series, etc. Again this adds value for you and for your readers.

If you are running an event your LiveBlog should be the start of the conversation. Others will be blogging and reporting on your event and your LiveBlog will be linked to. Do keep an eye on those other posts and help to highlight them through tweets, through highlights posts on your own blog, etc. This helps reward your fellow bloggers for their participation, it recognises their own efforts, and it reinforces the value in LiveBlogging an event as it evidences interest in that event and, through links, in those specific LiveBlog posts.

So, those are my 10 rather extended top tips… what are yours? Leave a comment or any questions below!



How to LiveBlog Part 1: Why LiveBlog?

After working on amplification of big events this year, the most notable being Open Repositories 2012,  I thought it would be a good time to share some of my tips for liveblogging and why that should be part of a plan for social media amplification of a variety of events. As I’ve also just been asked for advice on LiveBlogging I thought that would be a really useful topic to talk about. In this post, part one of  two, I’ll be telling you why I think LiveBlogging is so useful. Tomorrow, in part two, I’ll share my top ten practical tips for LiveBlogging.

What is LiveBlogging?

Well it’s blogging in real time, “live”, around some sort of event or key moment. However, different people have different definitions…

Sometimes liveblogging means writing posts throughout an event that are shared at the end of talks, at the end of sessions, or later the same day. It’s faster than traditional “blogging” and typically includes a record of what has been said with only minimal reflection on content, compared with other bloggers who might write up an event a week later as a summary with commentary. That style of liveblogging can work for any blog set up or choice of software and for any level of blogging experience. It’s a good way to get started but it’s more “as live” than “live”, I think.

UKSG is a great example of a high quality "as live" blog with multiple contributors.


Others see LiveBlogging as short instant updates to a page – the model the Guardian uses, which works well for the moment-critical sports (e.g. the Olympics Closing Ceremony) and media journalism (e.g. the X-Factor Season 8 Finale) they use liveblogging for. That style of liveblogging requires a slightly more specialist set up for your blog – use of the liveblogging WordPress plugin or similar – or an awful lot more draft blog posts at the ready. It’s a good approach if minute-by-minute updates are needed, but you could achieve a similar style through tweets, or through embedding a Storify or CoverItLive and using tweets and brief notes instead of a blog format.

Guardian Olympic Closing Ceremony LiveBlog - this screenshot shows the mini update format.


My preferred format of liveblogging uses a standard blog – preferably one that already has a specific audience interested in the event or topic – and posting semi-finished blog posts throughout an event. I begin with skeletal blog posts that lay out what will be blogged that day/session. I will tweet links to these out to the event hashtag (assuming there is one) and then edit and update the post, hitting “publish” or “update” whenever there is a suitable pause. That might be at the end of each presentation, or at the end of a session, but usually I will update roughly every 20 minutes or so, whenever a short pause – the playing of a video, a particularly irrelevant tangent, etc. – arises. If something important, such as a major interruption, occurs then I will update the post more frequently. No matter how many times I’ve updated a post, I will then tweet that the session/morning/speaker is blogged during proper breaks in the schedule (coffee, lunch, etc.).

ScreenShot of the OR2012 LiveBlog showing the introductory paragraph and my LiveBlog style.


This style of liveblogging is about making the fullest record available in the quickest time. I am a touch typist so the record tends to be verbatim or near-to. However, the same approach works with more edited/summarised/digested blog posts as well. This form of liveblogging is about capturing a lot of detail, as that is what those unable to attend, those reading the blog, and those awaiting the post as a record on which to base their own write-up want quick access to. There is not the same urgency for reflection, commentary or criticism of an event.

Why Should You LiveBlog?

A LiveBlog is the fastest way to get meaningful information out to those who cannot attend an event, but it can also be an indispensable record of the event for those attending in person. Once your audience/delegates/participants know that the key talks and questions are being recorded, they are empowered to choose what they want to record or note… taking full notes of a session is not the best way to engage, so if your audience know they don’t need to do that they are, to no small extent, freed up to listen, to engage, and perhaps to tweet a key highlight. They know that they can go back to their colleagues with some record of the event, something to base a report on and to share. There is not the same urgency for commentary, analysis, reflection, etc., all of which are useful but often benefit from slower drafting processes.

If you are organising an event, LiveBlogging also offers a bridge between the live in-person experience and the types of artefacts you might be producing afterwards – the reports, the videos, the articles. It can be hugely expensive to livestream events (particularly as you may need to pre-empt demand and the temptation is to over cater) for very little benefit – often a stream will be viewed by very few people in real time and will be a one-way experience offering very little advantage over the recorded version. Twitter is a great medium for participating in discussion, or finding out about an event, but it can be very hard to quickly get a sense of who is on stage and what the chat is referring to without some sort of note of what has come before, what the topic is, etc. If you see a tweet halfway through a day, paging through previous tweets often won’t fill in those gaps, but LiveBlogs can be that almost-instant record that provides a reference point of what is taking place, and which provides an essential hub for finding richer artefacts as they are published.

For audiences outside of a room the LiveBlog may be the only way to access the event and they can do it in real time or near real time. More importantly that record is easily searched for, can be used as a connecting point for any video captured, slides shared, and it will be less ephemeral than tweets…

And if you are good at LiveBlogging you become an asset to an event organiser – a person to encourage along in the knowledge that you will help share that event experience with your readers, followers, fellow delegates etc. I have been encouraged to LiveBlog or invited to attend events purely to LiveBlog in the past. I feel privileged to be able to add something extra to what are usually excellent events whilst the organiser knows that someone experienced is on hand capturing the key event content.

That value of sharing, explaining, changing the virtual footprint of an event is such that some conferences do offer discounted rates, free places, or perks to bloggers (not just “live” ones) so if you are planning to LiveBlog something on your event list for the year do make sure you let organisers know!

Why Shouldn’t You LiveBlog?

LiveBlogging isn’t an easy add-on to an event. I’ve probably been liveblogging at least 20 events each year for the last five years and have established my own ways of organising, preparing and managing that process during an event but it can take a while to get used to the process. The main thing to bear in mind is that, whilst a good LiveBlog will get great readership and kudos from your readers and possibly fellow delegates, it is also a task which takes you away from the event you are engaging in.

If you are attending an event to network, to meet new contacts, to establish yourself, then LiveBlogging may not be the best option. You will be more occupied by your computer than your peers, which can make LiveBlogging a comforting barrier to making new connections. It can also position you as an organiser, administrator, or otherwise less visible person. If you are already known to many of those at the event this gets a lot easier – if it’s known that you’ll be LiveBlogging, people will check in with you, catch up and perhaps even bring you a coffee; they will come to you. That still means you are likely to meet fewer new people, but it can be OK and that chat can have real usefulness.

Sometimes missing out on chat isn’t really an issue. I’ve been LiveBlogging webinars lately and that purely adds value to the experience as it forces you to pay attention – often remarkably hard to do in a busy office – and is still so unusual that other attendees and organisers tend to be particularly delighted to have a searchable record of the event. Video and recorded webinars are brilliant but it’s even better if you can find out about that recorded session by Googling a name captured in a LiveBlog or can use that LiveBlog to skip to the crucial 15 minutes you want to see.

LiveBlogging requires a fair amount of kit – as you’ll see in my next blog post – so you really have to feel it’s worthwhile before you start lugging kit around the country. And that is assuming you have access to a suitable laptop etc. in the first place. I haven’t weighed my one-day liveblogging kit but would be surprised if it was under 10kg once laptop, extension cord and a bottle of water are all accounted for. If I’m at a conference that I’m providing additional amplification for, I have a fairly chunky rolling case that tends to be packed with about 70% tech kit. You can travel lighter of course, and even if you don’t it’s not a bad way to build up your shoulder strength… but the odds are that you will be the one with a disproportionately heavy bag on the train home…

The most basic of my LiveBlogging set ups...


LiveBlogging is tiring and, no matter how efficient your typing is, you will find yourself absolutely exhausted by the end of a full day. You may also have posts to tidy up, images to add, and comments to reply to before you can be finished for the day. That can be OK for a single day, but for two, or three, or five days it becomes an intense experience. There can be more fun ways to enjoy an event, so as you work out what you might be blogging, bear in mind what else you want to do as part of your attendance or organisation of an event and ensure you have breaks, rests, and space to stretch your legs and look away from a screen.

The other reason you might not want to liveblog is that the event just may not suit it. Meetings aren’t usually a thing you would LiveBlog – although project kick off meetings can benefit from being LiveBlogged (or blogged “as live” but edited for discretion later). Sometimes events such as round table discussions or workshops may only be effective and honest if there are shared expectations of privacy. You should only be LiveBlogging where there are reasonable expectations about the public nature of the event. If in doubt you can always apply a little judgement and choose not to attribute – or even record – a controversial comment. Generally this isn’t an issue but people can get nervous if you are typing what they say word for word and it’s worth being aware of that when you are thinking about when it is and isn’t a good idea to liveblog.

So, should you be LiveBlogging?

Well, I’m clearly going to say that you should. But only when and where it is useful, valuable, and has benefits for you as well as others. Personally, I began LiveBlogging because I was taking near-verbatim notes for my own reference and started to think it was a real waste not to share them with others. It’s fine to report on a meeting to colleagues, but it can add a lot of value to LiveBlog and then add commentary as your report, to get feedback on your notes, and to get clarification from the speakers and corrections in near real time.

I’ve definitely benefited greatly from LiveBlogging events whether I’ve been along as an organiser, a speaker or just there to be in the audience. We find EDINA projects, events, and conferences all benefit from LiveBlogging – but it’s not something we do every day, for every event, or on every blog. But, when used, it is a hugely effective way to increase the impact of an event, to reach out to and encourage other bloggers to join in and add to our perceptions of the event, and to engage with our rather wonderful audiences and communities.

Feeling inspired? Read my next post on LiveBlogging tomorrow!

Disagree? Have I missed something? Add a comment below, I’d love to hear your thoughts on this!



Closing Session by Peter Burnhill

Today we are liveblogging from the OR2012 conference at George Square Lecture Theatre (GSLT), George Square, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Kevin: I am delighted to introduce my colleague Peter Burnhill, Director of EDINA and Head of the Edinburgh University Data Library, who will be giving the conference summing up.
Peter: When I was asked to do this I realised I was doing the Clifford Lynch slot here! So… I am going to show you a Wordle. Our theme for this year’s conference was Local In for Global Out… I’m not sure if we did that but here is the summing up of all of the tweets from the event. Happily we see data, open, repositories and challenge are all prominent here. But data is the big arrival. Data is now mainstream. If we look back on previous events we’ve heard about services around repositories… we got a bit obsessed with research articles, in the UK because of the REF, but data is important and it is great to see it being prominent. And we see jiscmrd here, so Simon will be pleased he did come on his crutches [he has broken his leg].
I have to confess that I haven’t been part of the organising committee but my colleagues have. We had over 460 registered from over 40 different nations, so do all go to PEI. Edinburgh is a beautiful city but when you got here it was rather damp – it’s nicer now, so go see those things. Edinburgh is a bit of a repository itself – we have David Hume, Peter Higgs and Harry Potter to boast – and that fits with local in for global out, as I’m sure you’ve heard of two of them. And I’d like to thank John Howard, chair of the OR Steering Committee, and our Host Organising Committee.
Our opening keynote Cameron Neylon talked about repositories beyond academic walls and the idea of using them for turning good research outputs into good research outcomes. We are motivated to make sure we have secure access to content… as part of a more general rumbling with workshops before the formal start there was this notion of disruption. Not only the Digital Economy but also a sense of not being passive about that. We need to take command of the scholarly communication area that is our job – that cry to action from Cameron and we should heed that.
And there was talk of citation… LinkedIn, etc. is all about linking back from research to data. And that means having reliable identifiers. And trust is a key part of that. Publishers have trust; if repositories are to step up to that trust level you have to be sure that when you access a repository you get what it says it is. As a researcher you don’t use data without knowing what it is and where it came from. The repository world needs to think about that notion of assurance, not quality assurance exactly. And also that object may be interrogatable, to say what it is and really help you reproduce that object.
Preservation and provenance are also crucial.
Disaster recovery is also important. When you fail, and you will, you need to know how you will cope – really interesting to see this picked up in a number of sessions too.
I won’t summarise everything but there were some themes…
We are beginning to deal with the idea of registries and how those can be leveraged for linking resources and identifiers. I don’t think solutions were found exactly, but the conversations were very valuable. And we need to think about connectivity, as flagged by Cameron. And those places like Twitter and Facebook… we don’t own them but we need to be in them, to make sure that citations come back to us from there. And finally, we have been running a thing called Repository Fringe for the last four years, and then we won the big one. But we had a little trepidation, as there are a lot of you! And we had an unconference strand. And I can say that UoE intends to do Repository Fringe in 2013.

We hope you enjoyed that unconference strand – an addition to complement the open repositories, not to take away from it but to add an extra flavour. We hope that the PEI folk will keep a bit of that flavour at OR, and we will be running the Fringe a wee bit later in the year, nearer the Edinburgh Fringe.

As I finish up I wanted to mention an organisation, IASSIST; librarians used to be about the demand side of services but things have shifted over time. We would encourage those of us here to link up to groups like IASSIST (and we will suggest the same to them) so we can find ways to connect up, to commune together at PEI and to share experience. And so finally, I think this is about the notion of connectivity. We have the technology; we have the opportunity to connect up more to our colleagues!

And with that I shall finish up!

Begin with an apology….

We seem to have the builders in. We have a small event coming up… the biggest festival in the world… but we didn’t realise that the builders would move in about the same week as you… What you haven’t seen yet is our 60x40ft upside down purple cow… If you are here a bit longer you may see it! We hope you enjoyed your time nonetheless.

It’s a worrying thing hosting a conference like this… like hosting a party, you worry if anyone will show up. But the feedback seems to have been good and I have many thank yous. Firstly to all of those who reviewed papers. To our sponsors. To the staff here – catering, Edinburgh First, the tech staff. But particularly to my colleagues on the local Host Organising Committee: Stuart Macdonald, William Nixon, James Toon, Andrew Bevan – our most persuasive committee member, getting our sponsors on board – Sally Macgregor, Nicola Osborne, who has led our social media activity, and Florence Kennedy, who has been using her experience of wrangling 1000 developers at FLoC a few years ago.

The measure of success for any event like this is the quality of conversation, of collaboration, of idea sharing, and that seems to have worked well – we’ve really enjoyed having you here. The conference doesn’t end now, of course, but changes shape… and so we move onto the user groups!

Developer’s Challenge, Pecha Kucha Winners and Invitation to OR2013 LiveBlog

Today we are liveblogging from the OR2012 conference at George Square Lecture Theatre (GSLT), George Square, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Kevin Ashley is introducing us to this final session…

How many of you managed to get along to a Pecha Kucha session? It looks like pretty much all of you – that’s fantastic! So you will have had a chance to see these fun, super-short presentations. Now, as very few will have seen all of them, we are awarding a winner for each session. And I understand that the prizes are on their way to us but may not be at the podium when you come up. So… for the first session, RF1, and in the spirit of the ceilidh, I believe it has gone to a pair: Theo Andrew and Peter Burnhill! For the second stream, strand RF2, it’s Peter Sefton – and Anna! For RF3 it’s Peter Van de Joss! And for RF4 it’s Norman Grey!

And now over to Mahendra Mahey for the Developer Challenge winners…

The Developer Challenge has been run by my project, DevCSI: Developer Community Supporting Innovation, and we are funded by JISC, which is funded by UK Government. The project aims to highlight the potential, value and impact of the work developers do in UK universities in the area of technical innovation – through sharing experience and training each other, often on a volunteer basis. It’s about using technology in new ways, breaking out of silos. And running challenges… so onto the winners of the Developer Challenge at DevCSI this year.

The challenge this year was “to show us something new and cool in the use of repositories”. First of all I’d like to thank Alex Wade of Microsoft Research for sponsoring the Developer Challenge; he’ll be up presenting their special prize later. This year we really encouraged non-developers to get involved too, and to chat and discuss those ideas with developers. We had 28 ideas, from splinter apps to repositories that blow bubbles to SWORD buttons… and a mini challenge appeared – Rob Sanderson from Los Alamos put out a mini idea! That’s still open for you to work on!

And so.. the final decisions… We will award the prizes and redo the winning pitches! I’d like to also thank our judges (full list on DevCSI site) and our audience who voted!

First of all honourable mentions:

Mark McGillivray and Richard Jones – getting academics close to repositories or Getting Researchers SWORDable.

Ben O’Steen and Cameron Neylon – Is this research readable

And now the Microsoft Research Prize and also the runners up for the main prize as they are the same team.

Alex: What we really loved was that you guys came here with an idea, you shared it, you changed it, and you worked collaboratively on it…

Keith Gilmerton and Linda Newman for their mobile audio idea.

Alex: they win a .Net Gadgeteer rapid prototyping kit with motherboard, joystick, monitor, and if you take to Julie Allison she’ll tell you how to make it blow bubbles!

Peter Sefton will award the main prize…

Peter: Patrick’s visualisation engine won, as we’re sick of him entering the developer challenge!

The winners and runners up will share £1000 of Amazon vouchers, and the winning entry – a team of one – will be funded to develop the idea, with two days of development time. Patrick: I’m looking for collaborators and also an institution that may want to test it – get in touch!

Linda and Keith first

Linda: In Ohio we have a network of DSpace repositories including the Digital Archive of Literacy Narratives – all written in real people’s voices and using audio files; a better way to handle these would be a boon! We also have an Elliston Poetry Curator – he collects audio on analogue devices; digital would be better. And in the field we are increasingly using mobile technologies, and the ability to upload audio or video at the point of creation, with a transcript, would greatly increase the volume of contributions.

MATS – Mobile AudioVisual Transcription Service

Our idea is to create an app to deposit and transcribe audio – and also video – and we used SWORDShare, an idea from last year’s conference, as we weren’t hugely experienced in mobile development. We’ve done some mock-ups here. You record, transcribe and submit all from your phone. But based on what we saw in last year’s app, you should be able to record in any app as an alternative too. Transcription is hugely important as it makes your file indexable. And it provides access for those with hearing disabilities, and those who want to preview/read the file when listening isn’t an option. So when you have uploaded your file you request your transcription. You have two options. The default is Microsoft MAVIS – mechanical transcription. But you can also pick Amazon Mechanical Turk – human transcription – which you might want if the audio quality is very poor or the recording is not in English.

MAVIS allows some additional functionality – subtitling, the ability to jump to a specific place in the file from a transcript, etc. And a company called GreenButton offers a web services API to MAVIS. We think that even if your transcription isn’t finished you can still submit to the repository, as the new version of SWORD supports updating. That’s our idea! We were pitching this idea but now we really want to build it! We want your ideas, feedback, tech skills, input!
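The deposit-then-update flow the team describes maps naturally onto SWORD v2: an initial binary deposit flagged as still in progress, with the transcript added later once MAVIS or Mechanical Turk returns it. A minimal sketch of the headers such a deposit would carry – the collection URI and filename here are illustrative assumptions, not details from the MATS prototype:

```python
# Sketch of a SWORD v2 binary deposit as an app like MATS might perform it.
# The collection URI below is hypothetical -- a real client would read it
# from the repository's SWORD v2 service document.
COL_IRI = "https://repo.example.edu/sword2/collection/audio"

def deposit_headers(filename, mimetype, in_progress=True):
    """Build the HTTP headers for a SWORD v2 binary deposit.

    "In-Progress: true" tells the server that more content (the finished
    transcript) will be added to this item later, before the deposit is
    completed.
    """
    return {
        "Content-Type": mimetype,
        "Content-Disposition": "attachment; filename=" + filename,
        "Packaging": "http://purl.org/net/sword/package/Binary",
        "In-Progress": "true" if in_progress else "false",
    }

headers = deposit_headers("interview-001.m4a", "audio/mp4")
# The actual deposit would be an authenticated HTTP POST of the audio bytes
# to COL_IRI with these headers; the transcript would later be added via the
# Edit-IRI returned in the deposit receipt, completing the item.
```

The key design point is `In-Progress`: it lets the audio reach the repository immediately while the (slower) transcription step catches up, exactly the "submit now, transcribe later" behaviour the pitch describes.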

And now Patrick McSweeney and DataEngine.

My friend Dave generated 1TB of data in every data run and the university wouldn’t host that. We found a way to get that data down to 10GB for visualisation. It was backed up on a home machine – not a good preservation strategy. You should educate and inform people and build solutions that work for them!

See: State of the Onion. It’s a problem you see all the time… most science is long tail, and support is very poor in that long tail. You have MATLAB and Excel and that’s about it. Dave had all this stuff and he had trouble managing his data and graphs. So the idea is to import data straight from Dave’s kit to the repository. For Dave the files were CSV. Many tools will export to it; it’s the super-basic unit of data sharing – not exciting, but it’s simple and scientists understand it.

So, at ingest you give your data provenance and you share your URIs, and you can share the tools you use. And then you have tools for merging and manipulation: the file is pushed into a storage form where you can run SQL processing. I implemented this in an EPrints repository – with six visualisations, but you could add any number. You can go from source data, replay the experiment, and get to visualisations. Although rerunning experiments might be boring, you can also reuse the workflow with new, similar data. You can create a visualisation of that new data and compare it with your original visualisation, knowing that the process has been exactly the same.
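The CSV-to-SQL step at the heart of this idea can be sketched with nothing but the standard library – the table and column names below are illustrative, not from Patrick's actual EPrints implementation: CSV rows land in a SQL store, and plain SQL then does the merging and manipulation that feeds the visualisations.

```python
import csv
import io
import sqlite3

# Stand-in for an uploaded CSV file from a researcher's instrument.
raw = io.StringIO(
    "run,temperature,pressure\n"
    "1,20.5,101.2\n"
    "1,21.0,101.0\n"
    "2,19.8,100.7\n"
)

# Ingest: push the CSV into a SQL-queryable store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (run INTEGER, temperature REAL, pressure REAL)")
rows = [(int(r["run"]), float(r["temperature"]), float(r["pressure"]))
        for r in csv.DictReader(raw)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Manipulation: aggregate per experimental run, ready to hand to a plotter.
summary = conn.execute(
    "SELECT run, AVG(temperature), AVG(pressure) FROM readings GROUP BY run"
).fetchall()
print(summary)  # one (run, avg_temperature, avg_pressure) tuple per run
```

Because the whole pipeline is just "CSV in, SQL over it", replaying the workflow on a new but similarly shaped CSV reuses the identical query – which is exactly the reproducibility property the talk emphasises.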

It’s been a hectic two days. It’s a picture (of two bikers on a mountain) but it’s also a metaphor. There are mountains to climb. This idea is a transitional idea. There are semantic solutions, there are LHC type ideas that will appear eventually but there are scientists at the long tail that want support now!

And finally… thank you everyone! I meant what I said last night, all who presented yesterday I will buy a drink! Find me!

I think 28 ideas is brilliant! The environment was huge fun, and the developers’ lounge was a lovely space to work in.

And finally a plug… I’ve got a session at 4pm in the EPrints track, and that’s a real demonstration of why the Developer Challenge works, as the EPrints Bazaar – now live, busy, and changing how we (or at least I) think about repositories – started out at one of these Developer Challenges!

At the dinner someone noted that there are very few women here! Half our user base are women but hardly any women presented at the challenge. Ladies, please represent!

And also… Dave Mills exists. It is not a joke! He reckons he has generated 78GB of data – not a lot, you could probably get it on a memory stick! Please let your researchers have that space centrally! I drink with researchers and you should too!

And Ben – Ben O’Steen had tech problems yesterday but he’s always here and is brilliant. His site is live right now; rate a DOI for whether it’s working.

And that’s all I have to say.

And now over to Prince Edward Island – Proud Host of OR 2013

I’m John Eade, CEO of DiscoveryGarden, and this is Mark Leggott. So, the first question I get is: where are you? Well, we are in Canada! We are tiny but we are there. Other common questions…

Can I walk from one end of the island to the other? Not in a day! And you wouldn’t enjoy it if you did.

How many people live there? 145,000 – much more than it was.

Do Jellyfish sting? We have some of the warmest waters so bring your swimsuit to OR2013!

Can you fly there? Yes! Direct from Toronto, Montreal, Halifax and Ottawa (via Air Canada and WestJet) and from New York City (via Delta). Book your flights early! And Air Canada will add flights if necessary!

We will work diligently to get things online as early as possible to make sure you can book travel as soon as possible.

Alternatively you can drive – you won’t be landlocked – we are connected to the mainland. Canada is connected to us. We have an 8-mile-long bridge that took two and a half years to build, and it’s 64 metres high – it’s the highest point in PEI and also the official rollercoaster!

We are a big tourism destination – agriculture, fishing, farming, software, aerospace, bioresources. We get 1 million tourists per year. That means we have way more things to do there than a place our size should – championship-quality golf courses, great restaurants and a culinary institute. We have live theatre and we are the home of Anne of Green Gables, that plucky redhead!

We may not have castles… but we have our own charms…!

Cue a short video…

Mark: free registration if you can tell me what the guy was doing?

Audience member: gathering oysters?

Mark: yes! See me later!

So come join us in Prince Edward Island. Drop by our booth in the Appleton Tower concourse for another chance to win free registration to next year’s event. We’ve had lots of support locally and this should be a great event!

P6B: Digital Preservation LiveBlog

Today we are liveblogging from the OR2012 conference at Lecture Theatre 5 (LT5), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Topic: Digital Preservation Network, Saving the Scholarly Record Together
Speaker(s): Michele Kimpton, Robin Ruggaber

Michele is CEO of DuraSpace. Michele: Myself and Robin are going to be talking about a new initiative in the US. This initiative wasn’t born out of grant funding but by university librarians and CIOs who wanted to think about ensuring persistent access to scholarly materials and knew that something needed to be done at scale, and now. Many of you will be well aware that libraries are being asked to preserve digital and born-digital materials and there are no good solutions to do that at scale. Many of us have repositories in place. Typically there is an online or regular backup, but these aren’t preservation-scale solutions.

So about a year ago a group of us met to talk about how we might approach this problem. And from this the Digital Preservation Network – DPN – was born. DPN is not just a technical architecture. It’s an approach that requires replication of the complete scholarly record across nodes with diverse architectures and without single points of failure. It’s a federation. And it is a community, allowing this to work at mass scale.

At the core of DPN are a number of replicated nodes. There is a minimum of three, and up to five. The role of the nodes is to hold complete copies of content – full replications of each contributing node. This is a full content object store, not just a metadata node. And this model can work with multiple contributing nodes in different institutions – those nodes replicate across architectures, geographic locations and institutions.

DPN Principle 1: Owned by the community

DPN Principle 2: Geographical diversity of nodes

DPN Principle 3: Diverse organisations – U of Michigan, Stanford, San Diego, Academic Preservation Trust, University of Virginia.

DPN Principle 4: Diverse software architectures – including iRODS, HathiTrust, Fedora Commons, Stanford Digital Library.

DPN Principle 5: Diverse political environments – we’ve started in the US but the hope is to expand out to a more diverse global set of locations.

So DPN will preserve scholarship for future generations, fund replicating nodes to ensure functional independence, audit and verify content, and provide a legal framework for holding succession rights – so if a node goes down, the content will not be lost. And we have a diverse governance group taking responsibility for specific areas.
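The "audit and verify content" duty of a replicating node can be illustrated with a simple fixity check – a sketch only, not DPN's actual audit protocol: each node periodically recomputes checksums over its local copy and compares them with a manifest captured at ingest, so silent corruption at any one node is caught while intact copies still exist at the others.

```python
import hashlib
import os

def sha256_of(path, chunk=1 << 16):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def audit(manifest, root):
    """Return the files whose current checksum no longer matches the manifest.

    manifest: {relative_path: expected_sha256} recorded at ingest.
    root: this node's local copy of the replicated content.
    A non-empty result would trigger repair from one of the other nodes.
    """
    failed = []
    for rel, expected in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path) or sha256_of(path) != expected:
            failed.append(rel)
    return failed
```

With three to five independent nodes running such audits on diverse hardware and software stacks, a mismatch at one node is recoverable from the others – which is the point of the "no single point of failure" principle above.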

To date we have 54 partners and growing, and about $1.5 million in funding – not grant funding – and we now have a project manager in place.

Over to Robin…

Robin: Many of the partners in the APTrust have also been looking at DPN. APTrust is a consortium committed to the creation and management of an aggregated preservation repository and, now that DPN is underway, to being a replicating node. APTrust was formed for reasons of community-building; economies of scale – things we could do together that we could not do alone; aggregated content; long term preservation; and disaster recovery – particularly relevant given recent east coast storms.

The APTrust has several arms: business and marketing strategy; governance policy and legal framework; preservation and collection framework; repository implementation plan – the technical side of APTrust and being a DPN node. So we had to bring together university librarians, technology liaisons, and ingest/preservation staff. The APTrust services are the aggregation repository, the separate replicating node for DPN, and the access service – initially for administration but also thinking about more services for the future.

There’s been a lot of confusion as APTrust and DPN started emerging at about the same time. And we are doing work with DPN. So we tend to think of the explanation here as being about winnowing of content: researchers’ repositories of files at the top, then local institutional repositories, then APTrust – preservation for our institutions that provides robustness for our content – and DPN is then for long term preservation. APTrust is preservation and access. DPN is about preservation only.

So the objectives of the initial phase of the APTrust are engaging partners, defining a sustainable business model, hiring a project director, building the aggregation repository and setting up our DPN node. We have an advisory group for the project looking at governance. The service implementation is a phased approach building on experience, leveraging open source – cloud storage, compute nodes, DuraCloud all come into play – economies of scale, and TRAC, which we are using as a guideline for the architecture. APTrust will sit at the end of legacy workflows for ingest: it will take that data in, ingest to DuraCloud services, sync to the Fedora aggregation repository, and anything for long term preservation will also move to the APTrust DPN node with DuraCloud via CloudSync.

In terms of the interfaces there will be a single administrative interface which gives access to admin of DuraCloud, CloudSync and Fedora. This will allow audit reports, functionality in each individual area etc., and it uses the API for each of those services. We will have a proof of that architecture at the end of Q4 2012. Partners will feed back on that and we expect to deploy in 2013. Then we will be looking at disaster recovery access services, end-user access, format migration services – considered a difficult issue so very interesting – best practices for content types etc., coordinated collection development across services, and hosted repository services. Find out more at and


Q1) In Denmark we are building our national repository which is quite like DPN. Something in your presentation: it seems that everything is fully replicated to all nodes. In our organisation, services that want to preserve something can enter a contract with another service, and that’s an economic way to do things, but it seems that this model is everything for everyone.

A1 – Michelle) Right now the principle is everyone gets a copy of everything. We may eventually have specialist centres for video, or for books etc. Those will probably be primarily access services. We do have a diverse ecosystem – back-ups across organisations in different ways. You can’t choose to put stuff in one node or another.

Q2) This looks a lot like LOCKSS – what is the main difference between DPN and a private LOCKSS network?

A2) LOCKSS is a technology for preservation but it’s a single architecture. It is great at what it does so it will probably be part of the nodes here – probably Stanford will use this. But part of the point is to have multiple architectural systems so that if there is an attack on one architecture just one component of the whole goes down.

Q3) I understand the goal is replication but what about format obsolescence – will there be format audit and conversion etc?

A3 – Michelle) I think who does this stuff, format emulation, translation etc. has yet to be decided. That may be at node level not network level.

Topic: ISO 16363: Trustworthy Digital Repository Certification in Practice

Speaker(s): Matthew Kroll, David Minor, Bernie Reilly, Michael Witt

This is a panel session chaired by Michael Witt of Purdue University. This is about ISO 16363 and TRAC, the Trustworthy Repository Audit Checklist – how can a user trust that data is being stored correctly and securely, and that it is what it says it is?

Matthew: I am a graduate research assistant working with Michael Witt at Purdue. I’ve been preparing the Purdue Research Repository (PURR) for TRAC. We are a progressive repository, from online workspace and data sharing platform, to user archiving and access, to the preservation needs of Purdue University graduates, researchers and staff. So for today I will introduce you to ISO 16363 – this is the user’s guide that we are using to prepare ourselves – and I’ll give an example of trustworthiness. So a necessary and valid question to ask ourselves is “what is ‘trustworthiness’ in this context?” – it’s a very vague concept and one that needs to grow as the digital preservation community and environment grows.

I’d like to offer 3 key qualities of trustworthiness: (1) integrity, (2) sustainability, (3) support. And I think it’s important to map these across your organisation and across the three sections of ISO 16363. So, for example, integrity might be that the organisation has sufficient staff and funding to work effectively. Or for the repository it might be that you do fixity checks, with procedures and practices to ensure successful migration or translation; similarly, integrity in infrastructure may just be offsite backup. Likewise, sustainability might be about staff training being adequate to meet changing demands. These are open to interpretation but useful to think about.
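An aside from the liveblogger: the fixity checks mentioned here are conceptually simple. Below is a minimal sketch, not PURR’s actual tooling (the function names are invented): record a checksum for every file at ingest, then periodically re-hash and report anything that has drifted.

```python
import hashlib
import pathlib


def fixity_manifest(root: str) -> dict:
    """Record a SHA-256 checksum for every file under `root`."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in pathlib.Path(root).rglob("*")
        if p.is_file()
    }


def audit(root: str, manifest: dict) -> list:
    """Return paths whose current checksum no longer matches the manifest."""
    current = fixity_manifest(root)
    return [path for path, digest in manifest.items()
            if current.get(path) != digest]
```

Running `audit` on a schedule, and acting on a non-empty result, is one small piece of evidence an auditor might look for under the integrity heading.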

In ISO 16363 there are 3 sections of criteria (109 criteria in all): (3) Organizational Infrastructure; (4) Digital Object Management; (5) Infrastructure and Security Risk Management. There isn’t a one-to-one relationship with documentation here: one criterion might have multiple documents, and a document might support multiple criteria.

Michael and I created a PURR Gap Analysis Tool – we graded ourselves, brought in experts from the organisation in the appropriate areas and gave them a pop quiz. And we had an outsider read these things. This had great benefit – being prepared means you don’t overrate yourself. And secondly, doing it this way – as PURR was developing and deploying our process – we gained a real understanding of the digital environment.
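An aside from the liveblogger: the mechanics of a gap analysis like this can be very lightweight. As a hedged sketch (not the actual PURR tool; criteria labels and the scoring scale are invented), each criterion gets a self-assessed score and anything below a threshold is flagged for documentation or process work:

```python
# Self-assessment scores per ISO 16363 criterion,
# on an invented 0 (absent) to 4 (fully documented and implemented) scale.
scores = {
    "3.1.1 mission statement": 4,
    "4.2.5 fixity checks": 2,
    "5.1.2 offsite backup": 1,
}


def gaps(scores: dict, threshold: int = 3) -> list:
    """Criteria scoring below the threshold need remediation before audit."""
    return sorted(c for c, s in scores.items() if s < threshold)
```

Having outside readers re-score the same table, as Matthew describes, is what keeps the self-assessment honest.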

David Minor, Chronopolis Program Manager, UC San Diego Libraries and San Diego Supercomputer Center: We completed the TRAC process this April. We did it through CRL. We wanted to give you an overview of what we did and what we learnt. So a bit about us first. Chronopolis is a digital preservation network based on geographic replication – UCSD/SDSC, NCAR, UMIACS. We were initially funded via the Library of Congress NDIIPP program. We spun out into a different type of organisation recently, a fee-for-service operation. Our management and finances are via UCSD. All nodes are independent entities here – interesting questions arise from this for auditors.

So, why do TRAC? Well, we wanted validation of our work – this was a last step in our NDIIPP process and an important follow-on for development. We wanted to learn about gaps, things we could do better. We wanted to hear what others in the community had to say – not just those we had worked for and served, but others. And finally, it sounds cynical but it was to bring in more business – to let us get out there and show what we could do, particularly as we moved into fee-for-service mode.

The process logistics were that we began in Summer 2010 and finished Winter 2011. We were a slightly different model: a self-audit that then went to auditors to follow up, ask questions and speak to customers. The auditors were three people who did a site visit. It’s a closed process except for that visit though. We had management, finances, metadata librarians, and data centre managers – security, system admin etc. – all involved, the equivalent of 3 FTE. We had discussions with users and customers. In the end we had hundreds of pages of documentation – some written by us, some log files etc.

Comments and issues raised by auditors were that we were strong on technology (we expected this as we’d been funded for that purpose) and spent time commenting on connections with participant data centres. They found we were less strong on business plan – we had good data on costs and plans but needed better projections for future adoption. And we had discussion of preservation actions – auditors asked if we were even doing preservation and what that might mean.

Our next steps and future plans based on this experience have been to implement recommendations, working to better identify new users and communities and to improve working with other networks. How do changes impact the audit? We will “re-audit” in 18–24 months – what if we change technologies? What if management changes? And finally, we definitely have had people getting in touch specifically because of knowing we have been through TRAC. All of our audit and self-audit materials are on the web too, so do take a look.

Bernie from the Center for Research Libraries Global Resources Network: We do audits and certification of key repositories. We are one of the publishers of the TRAC checklist. We are a publisher not an author, so I can say that it is a brilliant document! We also participated in the development of the recent ISO standard 16363. So, where do we get the standing to do audits, certification and involvement in standards? Well, we are a specialist centre in

We started with the University of Chicago, Northwestern etc., established in 1949. We are a group of 167 universities in the US, Canada and Hong Kong and we are about preserving key research information for the humanities and social sciences. Almost all of our funding comes from the research community – which is also where our stakeholders and governance sit. And the CRL Certification program has the goal of supporting advanced research. We do audits of repositories and we do analysis and evaluations. We take part in information sharing and best practice. We aim to do landscape studies – we have recently been working on digital protest and documentation

Portico and Chronopolis have been audited, and we are currently looking at PURR and PTAB test audits. The process is much as described by my colleagues: the repository self-audits, then we request documentation, then a site visit, then the report is shared via the web. In the future we will be doing TRAC certification alongside ISO 16363 and we will really focus on humanities and social science data. We continue to have the same mission as when we were founded in 1949: to enable the resilience and durability of research information.


Q1 – Askar, State University of Denmark) The finance and sustainability for organisations in TRAC… it seemed to be predicated on a single repository and that being the only mission. But national archives are more “too big to fail”. Questioning long term funding is almost insulting to managers…

A1) Certification is not just pass/fail. It’s about identifying potential weaknesses, flaws and points of failure for a repository. So a national library is perhaps too big to fail, but the structure and support for the organisation may impact the future of the repository – cost volatility, decisions made over management and the scope of content preserved. So for a national institution we look at the finance for that – is it a line item in the national budget? And that comes out in the audit – the factors governing future developments and sustainability.

Topic: Stewardship and Long Term Preservation of Earth Science Data by the ESIP Federation
Speaker(s): Nancy J. Hoebelheinrich

I am principal of knowledge management at Knowledge Motifs in California. And I want to talk to you about the preservation of earth science data by ESIP – Earth Science Information Partners. My background is in repositories and metadata and I am relatively new to earth science data, and there are interesting similarities. We are also keen to build synergies with others, so I thought it would be interesting to talk about this today.

The ESIP Federation is a knowledge network for science data and technology practitioners – people who are building components for a science data infrastructure. It is distributed geographically and in terms of topic and interest. It’s about a community effort, free-flowing ideas in a collaborative environment. It’s a membership organisation but you do not have to be a member to participate. It was started by NASA to support Earth Observation data work – the idea was to not just rely on NASA for environmental research data. They are interested in research, applications in education etc. The areas of interest include climate, ecology, hydrometry, carbon management, etc. Members are of four types: Type 4 are large organisations and sponsors including NOAA and NASA; Type 1 are data centres – similar to libraries but considered separate; Type 2 are researchers; and Type 3 are application developers. There is a real cross-sectoral grouping so really interesting discussion arises.

The type of things the group is working on are often in data informatics and data science. I’ll talk in more detail in a second, but it’s important to note that organisations are cross-functional as well – different stakeholders/focuses in each. We coordinate the community via in-person meetings, the ESIP Commons, telecons/WebEx, clusters, working groups and committees, and these all feed into making us interoperable. We are particularly focused on professional development, outreach and collaboration. We have a number of active groups, committees and clusters.

Our data and informatics area is about collaborative activities in data preservation and stewardship, the semantic web, etc. Data preservation and stewardship is very much about stewardship principles, citation guidelines, provenance context and content standards, and linked data principles. Our Data Stewardship Principles are for data creators, intermediaries and data users. So this is about data management plans, open exchange of data, metadata and progress etc. Our data citation guidelines were accepted by the ESIP Membership Assembly in January 2012. These are based on existing best practice from the International Polar Year citation guidelines. And this ties into geospatial data standards, and these will be used by tools like Thomson Reuters’ new Data Citation Index.

Our Provenance, Context and Content Standard is about thinking about the data you need about a dataset to make it preservable into the long term – what you would want to collect and how you would collect it. It was initially based on content from NASA and NOAA and discussions associated with them, and was developed and shared via the ESIP wiki. The initial version was in March 2011; the latest version is June 2011, but this will be updated regularly. The categories are focused mostly on satellite remote sensing – preflight/pre-operational instrument descriptions etc. And these are based on use cases – based on NASA work from 1998. What has happened as a result of that work is that NASA has come up with a specification for their data for earth sciences. They make a distinction between documentation and metadata, a bit differently from some others. The categories are in 8 areas – many technical but also rationale. And these categories help set a baseline etc.

Another project we are working on is identifiers for data objects. There was an abstract research project on use cases – unique identification, unique location, citable location, scientifically unique identification. They came up with categories and characteristics and questions to ask of each ID scheme. The recommended IDs ended up being DOI for a citable locator and UUID for a unique identifier, but we wanted to test this. We are in the process of looking at this at the moment, and questions and results will be compared again.
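An aside from the liveblogger: the DOI-plus-UUID recommendation splits two jobs – a resolvable, citable locator versus a globally unique object ID. A hedged sketch (not ESIP code; the function, the DOI shoulder and the suffix scheme are invented, and 10.5072 is the reserved DOI test prefix):

```python
import uuid


def mint_identifiers(shoulder: str = "10.5072/FK2"):
    """Return a (unique_id, citable_locator) pair for a new data object.

    unique_id: a UUID, globally unique with no registration needed.
    citable_locator: a DOI-style string, which in practice would be
    registered with a DOI agency so it resolves to a landing page.
    """
    unique_id = str(uuid.uuid4())
    citable_locator = f"doi:{shoulder}.{unique_id[:8]}"
    return unique_id, citable_locator
```

The design point is that UUIDs cost nothing and never collide, while DOIs carry the resolution and citation infrastructure – so the two schemes complement rather than compete.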

And one more thing the group is doing is Semantic Web Cluster activities – they are creating different ontologies for specific areas, such as SWEET, an ontology for environmental data. And there are services built on top of those ontologies (the Data Quality Screening Service on weather and climate data from space (AIRS), for instance) – both are available online. Lots of applications for this sort of data.

And finally we do education and outreach – data management training short courses. Given that it’s important that researchers know how to manage their data, we have come up with a short training course based on the Khan Academy model. That is being authored and developed by volunteers at the moment.

And we have associated activities and organisations – DataOne, DataConservancy, NSF’s Earth Cube. If you are interested to work with ESIP please get in touch. If you want to join our meeting in Madison in 2 weeks time there’s still time/room!


Q1 – Tom Kramer) It seems like ESIP is an e-research community really – is there a move towards mega-nodes or repositories, or is it still the Wild West?

A1) It’s still a bit like the Wild West! Lots going on but less focus on distribution and preservation; the focus is much more about making sure data is ingested and made available – where the repositories community was a few years ago. ESIP is interested in the data being open but not all scientists agree about that, so again maybe at the same point as this community a few years ago.

Q2 – Tom) So how do we get more ESIP folk without a background in libraries to OR2012?

A2) Well, I’ll share my slides, and we probably all know people in this area. I know there are organisations like EDINA here, etc.

Q3) [didn’t hear]

A3) There is an EarthCube area to talk about making data available. A lot of those issues are being discussed. They are working on common standards – OGC, ISO, sharing ontologies – but not necessarily on the preservation behind repositories. It’s sort of data centre by data centre.

Topic: Preservation in the Cloud: Three Ways (A DuraSpace Moderated Panel)
Speaker(s): Richard Rodgers, Mark Leggott, Simon Waddington, Michele Kimpton, Carissa Smith

Michelle: DuraCloud was developed in the last few years. It’s software, but also offered as a SaaS (Software as a Service). So we are going to talk about different usages etc.

Richard Rodgers, MIT: We at MIT Libraries participated in several pilot processes in which DuraCloud was defined and refined. The use case here was to establish geo-distributed replication of the repository. We had content in our IR that was very heterogeneous in type. Standard system administration practices only address hardware or admin failures – other errors go uncaught. The service should be automatic yet visible. We developed a set of DSpace tools geared towards collection and administration. DuraCloud provided a single convenient point of service interoperation: basically it gives you an abstraction over multiple backend services. That’s great as it decouples applications and protects against lock-in. There are tools and APIs for DSpace integration, high-bandwidth access for developers, a platform for preservation systems, and institution-friendly service terms.

Challenges and solutions here… It’s not clear how the repository system should create and manage the replicated files. Do all aspects need to have correlated archival units? We decided to use AIPs – units of replication which package items together and gather loose files. There is repository manager involvement – admin UI, integration, batch tools. There is an issue of scale – big data files really don’t suit interactivity in the cloud, so replication can be slow, queued rather than synchronous. And we had to design the system so that any local error wouldn’t be replicated (e.g. a deletion locally isn’t repeated in the replicated version). However, deletion is forever – you can remove content. The code we did for the pilot has been refined somewhat and is available for DSpace as an add-on – we think it’s fairly widely used in the DSpace community.
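An aside from the liveblogger: the AIP idea Richard describes – gathering an item’s loose files into a single unit of replication – can be sketched very simply. This is not the DSpace add-on’s code; it is a hypothetical illustration using a gzipped tarball as the archival package:

```python
import io
import tarfile


def make_aip(item_id: str, files: dict) -> bytes:
    """Package an item's loose files into one archival unit (a .tar.gz).

    `files` maps a filename to its bytes; the result is a single blob
    that can be replicated to cloud storage as one object.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(f"{item_id}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Replicating one packaged blob per item avoids per-file round trips to the cloud store and keeps bitstreams and their metadata together, at the cost of re-uploading the whole package when any part changes.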

Mark Leggott, University of PEI/DiscoveryGarden: I would echo the complicated issues you need to consider here. We had the same experience in terms of a very responsive process with the DuraSpace team. Just a quick bit of info on Islandora: it is a Drupal + Fedora framework from UPEI, with a flexible UI and apps etc. We think of DuraCloud as a natural extension of what we do. The approach we have is to leverage DuraCloud and CloudSync. The idea is to maintain the context of individual objects and/or complete collections, to enable a single-button restore of damaged objects, and to integrate with a standard or private DuraCloud. We have an initial release coming. There is a new component in the Manage tab in the Admin panel called “Vault”. It provides full access to DuraCloud and CloudSync services. It’s accessible through the Islandora Admin Panel – you can manage settings and integrate it with your DuraCloud-enabled service. Or you can do this via DiscoveryGarden, where we manage DuraCloud on the client’s behalf. And in managing your materials you can access or restore at an item or collection level. You can sync to DuraCloud or restore from the cloud etc. You get reports on syncing, and reports on matches or mismatches so that you can restore data from the cloud as needed. And you can then manually check the object.

Our next steps are to provide tighter integration and more UI functions, to move to automated recovery, to enable full Fedora/collection restore, and to include support for private DuraCloud instances.

Simon: I will be talking about the Kindura project, funded by JISC, which was a KCL, STFC and ? initiative. The problem is that storage of research outputs (data, documents) is quite ad hoc, but it’s a changing landscape and UK funders can now require data to be kept for 10 years+, so it’s important. We were looking at hybrid cloud solutions – commercial cloud is very elastic, with rapid deployment and transparent costs, but risky in terms of data sensitivity, data protection law, and service availability and loss. In-house storage plus cloud storage seems like the best way to gain the benefits but mitigate the risks.

So Kindura was a proof-of-concept repository for research data combining commercial cloud and internal storage (iRODS), based on Fedora Commons. DuraCloud provides a common storage interface and we deployed from source code – we found Windows was best for this and have created some guidelines on this sort of set-up. And we developed a storage management framework based on policies, legal and technical constraints as well as cost (including the cost of transmitting data in/out of storage). We tried to implement something as flexible as possible. We wanted automated decisions for storage and migration, content replication across storage providers for resilience, and storage providers transparent to users.
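An aside from the liveblogger: the policy-driven storage decisions Simon describes might look something like the toy rule set below. This is not Kindura’s actual rule base – the backend names, thresholds and attributes are all invented – but it shows the shape of automating placement from cost, legal and access constraints:

```python
def choose_store(size_gb: float, sensitive: bool, access_per_month: float) -> str:
    """Pick a storage backend from cost, legal and access constraints.

    All rules and backend names are illustrative, not Kindura's.
    """
    if sensitive:
        return "irods-internal"   # data protection law: keep in-house
    if access_per_month < 1:
        return "castor-tape"      # rarely read: cheap long-term tape
    if size_gb > 500:
        return "irods-internal"   # avoid cloud egress costs on big data
    return "amazon-s3"            # elastic commercial cloud for the rest
```

A real rules engine externalises these conditions so administrators can change policy without redeploying code, which is presumably why the project evaluated engines rather than hard-coding decisions like this.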

The Kindura system is based on our Fedora repository feeding Azure, iRODS and Castor (another use case is for researchers to migrate to cheaper tape storage) as well as AWS and Rackspace; it also feeds DuraCloud. The repository is populated via web browser, depositing into the management server and down into the Fedora repository AND DuraCloud.


Q1) For Richard – you were talking about deletion and how to deal with it?
A1 – Richard) There are a couple of ways to handle logically deleted items. You can automate based on a policy for garbage collection – e.g. anything deleted and not restored within a year. But you can also manually delete (you have to do it twice, but you can’t mitigate against that).
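An aside from the liveblogger: the deletion policy Richard outlines is essentially a two-phase delete with a grace period. A minimal sketch (not the DSpace add-on’s code; the class and method names are invented): local deletion only tombstones the replica, and garbage collection purges tombstones older than the policy window.

```python
import time


class ReplicaStore:
    """Cloud replica where deletes are two-step: mark, then purge later."""

    def __init__(self):
        self.objects = {}      # object_id -> bytes
        self.deleted_at = {}   # object_id -> tombstone timestamp

    def put(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def mark_deleted(self, object_id: str) -> None:
        # First step: tombstone only; the content stays recoverable.
        self.deleted_at[object_id] = time.time()

    def restore(self, object_id: str) -> None:
        # Undo a local deletion within the grace period.
        self.deleted_at.pop(object_id, None)

    def purge_older_than(self, grace_secs: float) -> None:
        # Second step: garbage-collect tombstones past the grace period.
        cutoff = time.time() - grace_secs
        for oid, t in list(self.deleted_at.items()):
            if t < cutoff:
                self.objects.pop(oid, None)
                del self.deleted_at[oid]
```

With a one-year `grace_secs`, an accidental local deletion never reaches the replica immediately, matching the "deleted and not restored within a year" policy described above.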

Q2) Simon, I had a question. You integrated a rules engine and that’s quite interesting. It seems that a rules engine probably adds some significant flexibility.

A2 – Simon) We actually evaluated several different sorts of rules engines. Jules is easy and open source, and for this set-up it seemed quite logical to use it. It sits totally separate from the DuraCloud set-up at the moment, but it seemed like a logical extension.