Embargoes in real metadata

One very important function the RJ Broker needs to support is that of embargoes: Publishers and other data suppliers are going to be much more willing to be involved in a dissemination programme if they believe their property is not being abused. To be blunt about it: most journals make money from selling copies – if they’re giving articles away for free, who would buy them… ?

So – the RJ Broker needs to ensure that embargo periods are defined, and clearly defined in the data that’s passed on….. and that’s why recipients of the data from the RJ Broker need to sign an agreement: to assert they will actually honour any embargo periods for records they receive.

We know from previous conversations that one cannot embargo metadata, so embargo only applies to the binary objects attached to the metadata record.

The first question is “Is there a blanket embargo for all files, or can different files have different embargoes?”, and the second question is “Is there a difference between ‘document’ and ‘file’?”

Actually, thinking about it, a blanket embargo can be mimic’d by having the same embargo for all files, however variable embargoes cannot be (sensibly) implemented using a single field. Thinking about the difference between “files” and “Documents” comes from the EPrints platform: they have this concept that a document may actually be composed of multiple files (consider a web page: the .html file is the primary file, however there are additional image files, stylesheet files, and other files that combine to present the whole document).

The third question is how to encode this embargo information.

The Broker has already defined it’s basic metadata file as being a METs file, with the record metadata encoded in epdcx (see SWAP and epcdx).

Looking round the net, there are several formal structures for defining administrative metadata; archival metadata; preservation metadata; etc…. but none seemed to actually define a nice, simple, embargo date.

In the end, I have loaded up two options, and we’ll investigate which one makes more sense as things get used.

The easier one, but the one that breaks the METS standard is to add an attribute to each Structure element in the METS file:

<structMap ID="sword-mets-struct-1" LABEL="structure" TYPE="LOGICAL">
  <div ID="sword-mets-div-1" DMDID="sword-mets-dmd-eprint-191" TYPE="SWORD Object">
    <div ID="sword-mets-div-2" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-581-0"/>
    </div>
    <div ID="sword-mets-div-3" oarj_embargo="2013-05-29">
      <fptr FILEID="eprint-191-document-582-0"/>
    </div>
  </div>
</structMap>

This has the beauty that the embargo metadata is directly linked to the document it belongs to, and that information is immediately available to any import routines.

The second is to write a more convoluted, but formally correct, structure within the METS Administrative section:

<amdSec ID="sword-mets-adm-1" LABEL="administrative" TYPE="LOGICAL">
  <rightsMD ID="sword-mets-amdRights-1">
    <mdWrap MDTYPE="OTHER" OTHERMDTYPE="RJ-BROKER">
      <xmlData>
        <epdcx:descriptionSet xmlns:epdcx="http://purl.org/eprint/epdcx/2006-11-16/"
                              xsi:schemaLocation="http://purl.org/eprint/epdcx/2006-11-16/
                                                  http://purl.org/eprint/epdcx/xsd/2006-11-16/epdcx.xsd ">
          <epdcx:description epdcx:resourceId="sword-mets-div-3" 
                             epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/available"
                             epdcx:valueRef="http://purl.org/eprint/accessRights/ClosedAccess">
              <epdcx:valueString epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">2013-05-29</epdcx:valueString>
            </epdcx:statement>
          </epdcx:description>
          <epdcx:description epdcx:resourceId="sword-mets-div-2"
                             epdcx:resourceURI="http://devel.edina.ac.uk:1203/191/">
            <epdcx:statement epdcx:propertyURI="http://purl.org/dc/terms/available"
                             epdcx:valueRef="http://purl.org/eprint/accessRights/ClosedAccess">
              <epdcx:valueString epdcx:sesURI="http://purl.org/dc/terms/W3CDTF">2013-05-29</epdcx:valueString>
            </epdcx:statement>
          </epdcx:description>
        </epdcx:descriptionSet>
      </xmlData>
    </mdWrap>
  </rightsMD>
</amdSec>

As you can see, this follows the rules for the epcdx structure. This was a deliberate choice as it is already used for the primary metadata, so the importers will already have routines for following the structure.

What will be interesting is which is more usable when it comes to writing importers.

Comments are closed.