New blog for the Jisc Publications Router

The latest phase of the projects documented in this blog has moved to a new blog.


Our new blog will be used to outline the developments and benefits of the Jisc Publications Router service. It begins with an introductory post that includes links to the service page and information on interacting with the Router.

The Publications Router is a free-to-use, standalone middleware tool that automates the delivery of research publications from data suppliers to institutional repositories. The Router extracts authors’ affiliations from the metadata provided to determine appropriate target repositories, then transfers publications to repositories registered with the service. The Router offers a solution to the duplication of effort that recording a single research output presents in the increasingly collaborative world of research publications. It is intended to minimise effort on behalf of potential depositors while maximising the distribution and exposure of research outputs.

The Router has its origins in the Open Access Repository Junction project. A brief recap of the various stages of evolution can be found in a post on the history of the project.

If you wish to find out more about the service the Router offers please see the about page.

EPrints importers made publicly available

We are really pleased to say that RJ Broker now has importers available for both EPrints 3.2 and 3.3.

The EPrints 3.2 code is essentially unchanged; however, it is now publicly available on GitHub and has been proven to work in a number of installations.

The EPrints 3.3 plugin is also available on GitHub, and additionally as a direct plugin from the central EPrints Bazaar. This code has been tested in EPrints 3.3.5 and 3.3.12 (and I believe the patch required to correct the returned atom:id will be applied in EPrints 3.3.13).

The two code-bases are almost identical, and we would be delighted to receive feedback/improvements on either.

Setting up redundancy in a live service

At EDINA, we strive for resilience in the services we run. We may not be competing with the massive international server-farms run by the likes of Google, Facebook, Amazon, and eBay… however we try to have systems that cope when things become unavailable due to network outages, power cuts, or even hardware failures: the services continue, and the user may not even notice!

For read-only services, such as ORI, this is relatively easy to do: the datasets are effectively static (any updates are done in the background, without any interaction from the user), so two completely separate services can be run in two completely separate data centres, and the systems can switch between them without anyone noticing. Repository Junction Broker, however, presented a whole different challenge.

The basic premise is that there are two instances of the service running – one at data centre A, the other at data centre B – and the user should be able to use either, and not spot any difference.

This first part is easy: the user connects to the public name for the service. The public name is answered by something called a load-balancer which directs the user to whichever service it perceives as being least loaded.

In a read-only system, we can stop here: we can have two entirely disconnected systems, let them respond as they need to, and use basic admin commands to update the services as needed.

For RJ Broker, this is not the end of the story: RJB has to take in data from the user and store that information in a database. Again, a relatively simple solution is available: run the database through the load-balancer, so both services are talking to the same database, and use background admin tools to copy database A to database B in the other data centre.

Again, RJB needs more: as well as storing information in the database, the service also writes files to disk. Here at EDINA, we make use of Storage Area Network (SAN) systems – which means we mount a chunk of disk space into the file-system, across the network.

The initial solution was to get both services to mount disk space from SAN A (at data centre A) into /home/user/san_A, and then get the application to store the files into this branch of the file-system – meaning the files are instantly visible to both services.

This, however, still has a “single point of failure”: SAN A. What is needed is to replicate the data in SAN A to SAN B, and make SAN B easily available to both installations.

The first part of this is simple enough: mount disk space from SAN B (in data centre B) into /home/user/san_B. We then use an intermediate symbolic link (/home/user/filestore) to point to /home/user/san_A, and get the application to store data in /home/user/filestore/... Now, if there is an outage at data centre A, we simply need to swap the symbolic link /home/user/filestore to point at /home/user/san_B, and the application is none the wiser.

The only thing that remains is the magic to ensure that all the data written to SAN A is duplicated in SAN B (and database A is duplicated into database B).
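The whole arrangement can be sketched in a few lines of Python – a minimal, self-contained illustration using throwaway directories in place of the real SAN mounts (all paths here are illustrative, not the live EDINA layout):

```python
# A sketch of the symlink-swap failover described above, using throwaway
# directories in place of the real SAN mounts (paths are illustrative).
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
san_a = os.path.join(base, "san_A")
san_b = os.path.join(base, "san_B")
filestore = os.path.join(base, "filestore")
os.makedirs(san_a)
os.makedirs(san_b)

# The application only ever writes via the indirection point
os.symlink(san_a, filestore)
with open(os.path.join(filestore, "record.txt"), "w") as f:
    f.write("deposit-123")

# Background "magic": replication keeps SAN B in step with SAN A
shutil.copytree(san_a, san_b, dirs_exist_ok=True)

# Outage at data centre A: build a new link, then rename it over the old
# one -- os.replace() is atomic, so the application never sees a gap
tmp_link = filestore + ".tmp"
os.symlink(san_b, tmp_link)
os.replace(tmp_link, filestore)

with open(os.path.join(filestore, "record.txt")) as f:
    print(f.read())   # deposit-123 -- now served from SAN B
```

The rename-over-the-top step matters: deleting and recreating the link would leave a window where the path does not exist, whereas an atomic replace never does.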

RSP Webinar on RJ Broker: Automating Delivery of Research Output to Repositories

On Wednesday 29th May 2013, Muriel Mewissen presented a Webinar on the Repository Junction Broker (RJ Broker) for the Repository Support Project (RSP).

This presentation discusses:

  • the need for a broker to automate the delivery of research output to Institutional Repositories
  • the development of the middleware tool for RJ Broker
  • the data deposit trials involving a publisher (Nature Publishing Group) and a subject repository (Europe PubMed Central) which have recently taken place
  • what the future holds for the RJ Broker.

A recording of the webinar and the presentation slides are available on the RSP website.

Embargoes in real metadata, take 2

Following on from the earlier discussion, we have ruled out the first option (where we add an attribute to METS):

<div ID="sword-mets-div-2" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-581-0"/>
</div>

as the METS schema doesn’t allow additional attributes to be added (and the investigation into writing a validating schema with additional attributes was fun in its own right) – so this leaves us with the xmlData-within-amdSec solution.

To recap, the amdSec will read something like:

<amdSec ID="sword-mets-adm-1" LABEL="administrative" TYPE="LOGICAL">
  <rightsMD ID="sword-mets-amdRights-1">
    <epdcx:descriptionSet xmlns:epdcx="">
      <epdcx:description epdcx:resourceId="sword-mets-div-3">
        <epdcx:statement epdcx:propertyURI="">
          <epdcx:valueString epdcx:sesURI="">2013-05-29</epdcx:valueString>
        </epdcx:statement>
      </epdcx:description>
      <epdcx:description epdcx:resourceId="sword-mets-div-2">
        <epdcx:statement epdcx:propertyURI="">
          <epdcx:valueString epdcx:sesURI="">2013-05-29</epdcx:valueString>
        </epdcx:statement>
      </epdcx:description>
    </epdcx:descriptionSet>
  </rightsMD>
</amdSec>

One of the questions I have been asked a few times is “why don’t you put the actual file URL with the embargo date”, and I refer you to the explanation in the original article:

  • A document may actually be composed of multiple files (consider a web page – the .html file is the primary file, however there are additional image files, stylesheet files, and possibly various other files that combine to present the whole document)

In other words, whilst 99% of cases will be a single file for a single document, it’s not always that simple, and I don’t believe the metadata should lead you into a false understanding of what is there – that way, things don’t break when the simple assumption goes wrong.

SWORD 1.3 vs SWORD 2

What’s the difference, and how do they compare?

In summary

SWORD 1.3 is a one-off package-deposit system: the record is wrapped up in some agreed format, and dropped into the repository. SWORD 1.3 uses an HTTP header to define what that package format is, and the individual repositories use that header to determine how to unpack the record. Every deposit is a new record.

SWORD 2.0 is a CRUD-based (Create, Read, Update, Delete) system, where the emphasis is on being able to manage existing records, as well as creating new ones. SWORD 2 uses the URL to identify the record being manipulated, and the mime-type of the object being presented to know what to do with it.

In detail (EPrints-specific)

This is, perforce, EPrints-specific, as I am an EPrints user with no experience coding in DSpace/Fedora/etc.


With a SWORD 1.3 system, one defines a mapping between the X-Packaging header URI and the importer package to handle it:

  $c->{sword}->{supported_packages}->{""} = {
    name   => "Open Access Repository Junction Broker",
    plugin => "Sword::Import::Broker_OARJ",
    qvalue => "0.6",
  };

The importer routine then has some internal logic to ensure it only tries to manage records of the right type (XML files, Word Documents, Spreadsheets, Zip files, etc).

In the case of compressed files, it is customary to also indicate the routine to un-compress the file. For example, the same importer could manage .zip, .tar, and .tgz files – all variations on a compressed collection of files – which cover the mime-types application/zip, application/tar, application/x-gtar, and application/x-gtar-compressed.

Therefore our importer would have code like this:

   our %SUPPORTED_MIME_TYPES = ( "application/zip"               => 1,
                                 "application/tar"               => 1,
                                 "application/x-gtar"            => 1,
                                 "application/x-gtar-compressed" => 1, );

   our %UNPACK_MIME_TYPES = ( "application/zip"               => "Sword::Unpack::MyNewZip",
                              "application/tar"               => "Sword::Unpack::MyNewTar",
                              "application/x-gtar"            => "Sword::Unpack::MyNewTar",
                              "application/x-gtar-compressed" => "Sword::Unpack::MyNewTar");

So, a basic SWORD 1.3 deposit is a simple POST request to a defined URL, with a set of headers to manage the deposit, and the record as the body of the request:

  curl -X POST \
       -i \
       -u username:password \
       --data-binary "" \
       -H 'X-Packaging:' \
       -H 'Content-Type: application/zip' \


This will deposit the binary file into the collection point in the repository, using the importer identified by the X-Packaging header.


SWORD 2 is much vaguer for me, as I’ve not really got a good working example of a SWORD 2 sequence available (the Broker doesn’t do CRUD).

With SWORD 2, the idea is to be able to update existing records, piecemeal:

  • Create a blank record
  • Add some basic metadata (title, authors, etc)
  • Add the rough-draft file
  • Add the post-review article
  • Delete the rough-draft file
  • Add the abstract
  • Add the publication metadata (journal, issue, pages, etc)

With SWORD 2, which routines are used to process the request is determined by the mime-type given in the headers.

Within each importer, there is a new function:

sub new {
 my ( $class, %params ) = @_;
 my $self = $class->SUPER::new(%params);
 $self->{name} = "Import RJBroker SWORD (2.0) deposits";

 $self->{visible}   = "all";
 $self->{advertise} = 1;
 $self->{produce}   = [qw( list/eprint dataobj/eprint )];
 $self->{accept}    = [qw( application/ )];
 $self->{actions}   = [qw( unpack )];
 return $self;
}

So, to create a new record, one posts a file with no record id:

  curl -X POST -i \
       -u username:password \
       --data-binary "@MyData.xml" \
       -H 'Content-Type: application/' \


This will find the importer that claims to understand ‘application/’, and use it to create a new record. The server response will include URLs for updating the record.
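That routing step can be sketched as a simple lookup table from mime-type to importer. This is a toy illustration, with hypothetical importer names and mime-types rather than real EPrints plugin IDs:

```python
# A toy illustration of the SWORD 2 routing described above: the server
# picks an importer purely from the request's Content-Type header.
# Importer names and mime-types here are hypothetical.

IMPORTERS = {
    "application/atom+xml": "Import::AtomMetadata",
    "application/zip":      "Import::ZipPackage",
}
DEFAULT_IMPORTER = "Import::OctetStream"   # binary fallback for plain files

def route(content_type):
    # Strip any ";charset=..." parameters before the lookup
    mime = content_type.split(";")[0].strip().lower()
    return IMPORTERS.get(mime, DEFAULT_IMPORTER)

print(route("application/zip"))   # claimed by the zip importer
print(route("application/pdf"))   # no claimant, so the default is used
```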

To add a file to a known record:

  curl -X POST -i \
       -u username:password \
       --data-binary "@MyOtherFile.pdf" \

This will use the default application/octet-stream importer, and add the file MyOtherFile.pdf to the record with the id 123.

To add more metadata:

  curl -X POST -i \
       -u username:password \
       --data-binary "@MyData.xml" \
       -H 'Content-Type: application/' \


This will find the importer that claims to understand ‘application/’, and use that code to add the metadata to the record with the id 123.

Note: there is a difference between PUT and POST:

  • POST adds content to any existing data. Where a field already exists, the action is determined by the importer.
  • PUT deletes the existing data and adds the new information – it replaces the whole record.
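That distinction can be illustrated with plain dictionaries standing in for repository records – a sketch of the semantics, not EPrints code:

```python
# Plain dicts standing in for repository records -- a sketch of the
# POST-vs-PUT semantics described above.

def post(record, new_fields):
    """POST: merge new content into the existing record.
    (How a clash on an existing field is resolved is importer-specific;
    here we simply let the new value win.)"""
    merged = dict(record)
    merged.update(new_fields)
    return merged

def put(record, new_fields):
    """PUT: throw away the old record and replace it wholesale."""
    return dict(new_fields)

record = {"title": "Draft title", "authors": ["A. Author"]}
print(post(record, {"title": "Final title"}))   # authors survive
print(put(record, {"title": "Final title"}))    # authors are gone
```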


The key differences, then:

  • SWORD 1.3 uses the X-Packaging header to determine which importer routine to use, and the importer uses the mime-type to confirm suitability.
  • SWORD 2 uses the mime-type to determine which importer routine to use.
  • The URLs for making deposits are different.

RJ Broker: a Research Output Delivery Service

Back in August 2009, the idea for a system to deposit research output directly into Institutional Repositories (IRs) was formulated (like many other great ideas) on the back of a napkin, and presented in this ‘Basic Premise’ post. Development work on the Open Access Repository Junction finished in March 2011 and was followed a year later by the current project on the Repository Junction Broker (RJ Broker).

Development work has progressed over these last three years, and a prototype RJ Broker has been designed, but many questions were raised along the way. We decided to take advantage of the attendance of many RJ Broker stakeholders at the 7th International Conference on Open Repositories (OR2012) in Edinburgh in July 2012 to refine our vision using direct input from key stakeholders. An evening workshop was organised on the 9th July, and representatives of our stakeholders – IR managers, funders, publishers, and IR software and service developers from the UK, Europe, US and Australia – were invited to take part. A summary of this brainstorming is presented here.

RJ Broker: A delivery service for research output

The RJ Broker Team first set the scene with a short presentation on the current and intended RJ Broker functionality.

The RJ Broker is in effect a delivery service for research output. It accepts deposits from data providers (institutional and subject repositories, funder and publisher systems). For each deposit, it uses the metadata to identify organisations, and any associated repositories, that are suitable for receiving the deposit. It then transfers the deposit to those repositories that have registered with the RJ Broker service. The metadata acts as the address card for the deposit, which is the parcel.

In order to receive deposits from the RJ Broker, IRs have to register their SWORD credentials with the service to give the RJ Broker an access point to their systems for data input, like the letter box that lets mail into the house.
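The “address card” matching can be sketched like this – the organisation names and SWORD endpoint URLs below are purely illustrative, not real registrations:

```python
# A toy sketch of the matching described above: author affiliations from
# the deposit metadata are matched against the register of repositories
# that have signed up with the Broker. All names and URLs are invented.

REGISTERED = {
    "University of Edinburgh": "https://repository.example.ac.uk/sword",
    "University of Warwick":   "https://wrap.example.ac.uk/sword",
}

def targets(affiliations):
    """Return the registered endpoints for the affiliations we recognise."""
    return {org: REGISTERED[org] for org in affiliations if org in REGISTERED}

# Only registered IRs get a delivery; unknown organisations are skipped
print(targets(["University of Edinburgh", "Unknown College"]))
```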

At this stage, the focus for the type of deposit (or content of the parcel) is research publications. However, the way the RJ Broker works is independent of the deposit type. The RJ Broker will transfer parcels of any type, big or small. For example, a deposit can be a publication with several supporting data files, just an article, or just the data files. The parcel can be empty or even taped shut. Indeed, in the case of an article published under Gold Open Access, the address card is the only information needed to provide a notification of the availability of that article in the publisher’s system. When an article is subject to an embargo period, a sealed parcel is required so that the delivery can take place straight away even if it can only be opened later, like presents sent by faraway relatives, placed under the tree to be opened on Christmas day!

The mind map below was used to inform the discussion of all the questions we were seeking to answer.

Metadata: The all important label

Like the address card on the top of a parcel, the metadata is available for all to see. Indeed, the metadata is always fully open access, regardless of the embargo period imposed on the data itself. The metadata is owned by the person who creates it (author, publisher or IR manager), but there is no copyright on it. The metadata can even be considered to act as an advertisement flyer for the data itself, which benefits its owner (whether author, publisher or IR manager) and therefore explains why owners support open access for metadata.

Standards are generally a good thing, improving quality and facilitating exchange. For example, the use of a funder’s code field in the metadata would significantly ease reporting on return on investment for the funding agencies. Several metadata standards are currently being developed, for example CERIF, RIOXX, COUNTER and the OpenAIRE Guidelines. The RJ Broker will support these standards, but it is not its duty to ensure these standards are adhered to, or that, for example, all required fields have been entered. In the same way that one expects an address card to provide enough space for the required information to be supplied in order for the parcel to be delivered, one does not expect the postman to fill in a missing house number or any other missing information.

Deposit: What is in that parcel?

The RJ Broker has to assume some responsibility for the object it is trusted with transferring to IRs, mainly the correct identification of appropriate IRs and the subsequent delivery to these IRs. The RJ Broker is also responsible for the safe keeping of the deposit while it is in transit.

Once it has been successfully transferred to the registered IRs, the responsibility of the RJ Broker ends. It may seem tempting to extend the functionality of the RJ Broker to store a copy of every deposit, in order to allow later downloads by newly registered IRs or simply to provide a safety backup. However, this is not the purpose of a delivery service, and it would also turn the RJ Broker into a repository that could grow to a massive size. Therefore the RJ Broker will only keep recently transferred deposits for a limited period of time, to allow IRs time to accept and process these deposits. Similarly, the postman is not required to scan each postcard he delivers for future safe keeping, but undelivered items will be returned to the sorting office and held for a while to allow collection.

If none of the identified IRs have registered with the RJ Broker then no delivery is possible. This still constitutes successful processing of a deposit for the RJ Broker. Future developments could consider transferring the deposit to open repositories, or sending a notification to IRs advising them to register with the RJ Broker should they wish to receive direct deliveries of research output.

The RJ Broker will transfer every deposit it receives. It does not provide an inspection or validation service; therefore it will not flag an empty, duplicate, incomplete or badly formatted deposit.

Dealing with embargo

The RJ Broker aims to support Open Access (OA) by enabling the dissemination of research output across the UK and beyond. It does not matter for the delivery process whether this OA is gold or green. However, it is important that any embargo period is dealt with appropriately.

A legal agreement between the RJ Broker and each data provider, requiring the respect of embargo periods, will be signed before any data from that provider is transferred by the RJ Broker. Each IR will in turn have to accept a similar agreement before it can receive data, through the RJ Broker, from providers enforcing an embargo. Data providers have to ensure that embargo periods are correctly noted in the metadata; IRs have to respect any embargo specified in the metadata. The RJ Broker acts as a trusted, enabling technology between both parties, not as a control point; it does not have any responsibility regarding the enforcement of embargoes. Legal agreements are currently being put in place for the early adopters of the RJ Broker. The hope is that a set of standard agreements can be derived from these to promote take-up and ease the administration process.

Besides the legal agreement, the RJ Broker will not perform additional checks or require further certification or accreditation from IRs. The aim of the RJ Broker is to disseminate research output widely. It is not its purpose to rate IRs for trust or reliability, which is best left to the appropriate authorities.

Tracking a Deposit

The RJ Broker assigns a tracking ID to each deposit, which enables data suppliers to check on the onward progress of the deposit after it has been successfully delivered to an IR’s SWORD endpoint.

As mentioned previously, the responsibility of the RJ Broker ends once the deposit has been successfully transferred to the registered IRs. Institutions follow different procedures, workflows and timetables when it comes to processing deposits left for inclusion in their repositories. Therefore, asserting that a deposit has been successfully ingested by an IR is a complex task which is not part of the RJ Broker’s remit as a delivery service. However, the RJ Broker will provide data suppliers with a send receipt, as proof that the deposit has been processed by the RJ Broker, which includes a tracking ID. The data supplier can later use this ID to check on the status of the deposit with the IRs to which it has been transferred, i.e. received, queued for processing, accepted, live or rejected.
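Those status values can be thought of as a small state machine. The status names come from the list above; the allowed transitions below are purely our illustration:

```python
# A toy model of the deposit statuses listed above as a state machine.
# The status names are from the post; the transitions are illustrative.

ALLOWED = {
    "received":              {"queued for processing"},
    "queued for processing": {"accepted", "rejected"},
    "accepted":              {"live"},
    "live":                  set(),   # terminal
    "rejected":              set(),   # terminal
}

def advance(status, new_status):
    """Move a deposit to a new status, refusing illegal jumps."""
    if new_status not in ALLOWED[status]:
        raise ValueError(f"cannot go from {status!r} to {new_status!r}")
    return new_status

s = "received"
s = advance(s, "queued for processing")
s = advance(s, "accepted")
print(advance(s, "live"))   # live
```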

Keep it simple!

The discussion was very productive, all topics set in our mind map were covered and answers to all questions regarding the functionality of the RJ Broker were agreed. The unanimous conclusion was to keep it simple!


The RJ Broker should aim to be a delivery service only. It will follow a “push only” model. Deposits will be pushed to the RJ Broker by data suppliers and the RJ Broker will push the deposits to the IRs. This enables the RJ Broker to have a streamlined workflow.

Specifically, the RJ Broker will NOT:

  • provide any reporting or statistics
  • filter incoming data
  • improve data or metadata
  • enforce standard compliance
  • be a repository
  • collect (“pull”) data from suppliers

I would like to thank everyone who took part in the workshop and helped us shape the functionality of the future RJ Broker service! Development and trials are ongoing, with a first version of the RJ Broker due for release to UK RepositoryNet+ in Spring 2013. Watch this space!

List of Attendees

Tim Brody (University of Southampton, UK), Yvonne Budden (University of Warwick, UK), Thom Bunting (UKOLN, UK), Peter Burnhill (UK RepositoryNet+, UK), Pablo de Castro Martin (UK RepositoryNet+, UK), Andrew Dorward (UK RepositoryNet+, UK), Kathi Fletcher (Shuttleworth Foundation, USA), Robert Hilliker (Columbia University, USA), Richard Jones (Cottage Labs, UK), Stuart Lewis (The University of Edinburgh, UK), John McCaffery (University of Dundee, UK), Paolo Manghi (OpenAIRE, Italy), Muriel Mewissen (RJ Broker, UK), Balviar Notay (JISC, UK), Tara Packer (Nature Publishing Group, USA), Marvin Reimer (Shuttleworth Foundation, USA), Anna Shadbolt (University of Melbourne, Australia), Terry Sloan (UK RepositoryNet+, UK), Elin Strangeland (University of Cambridge, UK), Ian Stuart (RJ Broker, UK), James Toon (The University of Edinburgh, UK), Jin Ying (Rice University, USA)

Embargoes in real metadata

One very important function the RJ Broker needs to support is that of embargoes: publishers and other data suppliers are going to be much more willing to be involved in a dissemination programme if they believe their property is not being abused. To be blunt about it: most journals make money from selling copies – if they’re giving articles away for free, who would buy them?

So – the RJ Broker needs to ensure that embargo periods are defined, and clearly defined, in the data that’s passed on… and that’s why recipients of data from the RJ Broker need to sign an agreement: to assert that they will actually honour any embargo periods for the records they receive.

We know from previous conversations that one cannot embargo metadata, so embargo only applies to the binary objects attached to the metadata record.

The first question is “Is there a blanket embargo for all files, or can different files have different embargoes?”, and the second question is “Is there a difference between ‘document’ and ‘file’?”

Actually, thinking about it, a blanket embargo can be mimicked by having the same embargo for all files; however, variable embargoes cannot be (sensibly) implemented using a single field. The distinction between “files” and “documents” comes from the EPrints platform: it has the concept that a document may actually be composed of multiple files (consider a web page: the .html file is the primary file, but there are additional image files, stylesheet files, and other files that combine to present the whole document).

The third question is how to encode this embargo information.

The Broker has already defined its basic metadata file as being a METS file, with the record metadata encoded in epdcx (see SWAP and epdcx).

Looking round the net, there are several formal structures for defining administrative metadata, archival metadata, preservation metadata, etc. – but none seemed to actually define a nice, simple embargo date.

In the end, I have loaded up two options, and we’ll investigate which one makes more sense as things get used.

The easier option, but the one that breaks the METS standard, is to add an attribute to each structure element in the METS file:

<structMap ID="sword-mets-struct-1" LABEL="structure" TYPE="LOGICAL">
  <div ID="sword-mets-div-1" DMDID="sword-mets-dmd-eprint-191" TYPE="SWORD Object">
    <div ID="sword-mets-div-2" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-581-0"/>
    </div>
    <div ID="sword-mets-div-3" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-582-0"/>
    </div>
  </div>
</structMap>

This has the beauty that the embargo metadata is directly linked to the document it belongs to, and that information is immediately available to any import routines.

The second is to write a more convoluted, but formally correct, structure within the METS Administrative section:

<amdSec ID="sword-mets-adm-1" LABEL="administrative" TYPE="LOGICAL">
  <rightsMD ID="sword-mets-amdRights-1">
    <epdcx:descriptionSet xmlns:epdcx="">
      <epdcx:description epdcx:resourceId="sword-mets-div-3">
        <epdcx:statement epdcx:propertyURI="">
          <epdcx:valueString epdcx:sesURI="">2013-05-29</epdcx:valueString>
        </epdcx:statement>
      </epdcx:description>
      <epdcx:description epdcx:resourceId="sword-mets-div-2">
        <epdcx:statement epdcx:propertyURI="">
          <epdcx:valueString epdcx:sesURI="">2013-05-29</epdcx:valueString>
        </epdcx:statement>
      </epdcx:description>
    </epdcx:descriptionSet>
  </rightsMD>
</amdSec>

As you can see, this follows the rules for the epdcx structure. This was a deliberate choice, as epdcx is already used for the primary metadata, so the importers will already have routines for following the structure.

What will be interesting is which is more usable when it comes to writing importers.
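As a sketch of what such an importer routine might look like, the snippet below pulls the embargo dates back out of an amdSec block, mapping each div ID to its date. It assumes the standard epdcx namespace URI, and simply ignores the propertyURI/sesURI values (which are elided in the examples above):

```python
# A sketch of an importer routine reading embargo dates from the amdSec
# block, mapping each structMap div ID to its date. Assumes the standard
# epdcx namespace URI.
import xml.etree.ElementTree as ET

EPDCX = "http://purl.org/eprint/epdcx/2006-11-16/"

SAMPLE = f"""
<amdSec ID="sword-mets-adm-1">
  <rightsMD ID="sword-mets-amdRights-1">
    <epdcx:descriptionSet xmlns:epdcx="{EPDCX}">
      <epdcx:description epdcx:resourceId="sword-mets-div-3">
        <epdcx:statement>
          <epdcx:valueString>2013-05-29</epdcx:valueString>
        </epdcx:statement>
      </epdcx:description>
    </epdcx:descriptionSet>
  </rightsMD>
</amdSec>
"""

def embargoes(xml_text):
    """Map each referenced div ID to its embargo date."""
    root = ET.fromstring(xml_text)
    dates = {}
    for desc in root.iter(f"{{{EPDCX}}}description"):
        div_id = desc.get(f"{{{EPDCX}}}resourceId")
        value = desc.find(f".//{{{EPDCX}}}valueString")
        if div_id and value is not None:
            dates[div_id] = value.text
    return dates

print(embargoes(SAMPLE))   # {'sword-mets-div-3': '2013-05-29'}
```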

OA-RJ is dead. Long live ORI and RJ Broker.

JISC is funding EDINA to participate in the UK Repository Net+ infrastructure project. UK Repository Net+, or RepNet for short, is a socio-technical infrastructure supporting deposit, curation and exposure of Open Access research literature. EDINA’s contribution to RepNet is based on the outcomes of the Open Access Repository Junction (OA-RJ) project. Two independent tools will be developed, as two separate projects, from the ‘discovery’ and ‘delivery’ functionality of OA-RJ.

The Organisation and Repositories Identification (ORI) project will design a standalone middleware tool for identifying academic organisations and their associated repositories. The tool will harvest data from several authoritative sources in order to provide information on over 23,000 organisations and 3,000 repositories worldwide. APIs will be provided to query the content of the ORI tool.

The Repository Junction Broker (RJ Broker) project will deliver a standalone middleware tool for handling the deposit of research articles to multiple repositories. This will offer an effective solution for authors and publishers wishing to deposit open access publications in relevant subject and institutional repositories. The RJ Broker will parse the metadata of an article to determine the appropriate target repositories and transfer the publication to the registered repositories. It is intended to minimise effort on behalf of potential depositors, and thereby maximise the distribution and exposure of research outputs.

The RJ Broker project has been funded for a year as part of the wave 1 development of UK Repository Net+, while the ORI project was awarded six months’ funding.