It’s been a long time coming (OK, I’ve been distracted by other things too), but the new APIs, which use a new dataset, are nearly ready.
The new calls return far more data, and in a consistent way!
The new dataset is a better merging of OpenDOAR and ROAR (and it updates from those “Authoritative” sources weekly), and adds in records from the UK Access Management Federation (harvested daily) and the webometrics list of 12,000 universities (http://www.webometrics.info/ – harvested on an ad-hoc basis).
The OA Organisation Identification Service (as we are now starting to call it) is now predominantly a list of [academic] organisations, with details of networks and repositories associated with them…. it is no longer a list of repositories and their organisations (as ROAR & OpenDOAR are)
How big is it?
How does 14,000 Organisations, 2,700 repositories, and 6,700 networks grab you? There are 17,00 URLs and 33,000 names for these objects…. it’s big! …. and growing bigger all the time!
If you can find me more good sources of Repositories or Academic Organisations, I’ll see about including them too!
What data is returned?
When you get data on an organisation, you get:
org_id | The ID for the org (can be used in other API calls) |
lat | The Latitude held for the organisation |
long | The Longitude held for the organisation |
identites | A list of names (and URLs) for the organisation (see below for details) |
Data is also pulled in from the identities data: … the following are taken from the first identity record:
…. and these are taken from the first matching (else non-matching) URL for the first identity:
When you get data on a repository, you get:
repo_id | The ID for the repository (can be used in other API calls) |
lat | The Latitude held for the repository |
long | The Longitude held for the repository |
postaddress | The address the repository is located at |
countrycode | The country the repository is in |
oaibaseurl | The URL for OAI harvesting |
softwarename | What software it uses (EPrints, DSpace, flubber, etc) |
softwareversion | What version of the software |
description | The main description for the repository |
comment | A list of additional comments for the repository |
types | A list of repository types the repository is (institutional, data, etc) |
content | A list of content types the repository accepts (Pre-prints, data, etc) |
external_ids | A list of external ids [OpenDOAR_123, etc] |
language | A list of languages used in the repository interface |
sword | A list of servicedocument locations for the repository |
identites | A list of names (and URLs) for the repository (see below for details) |
… the following are taken from the first identity record:
…. and these are taken from the first matching (else non-matching) URL for the first identity:
When you get data on a network, you get:
net_id | The ID for the network (can be used in other API calls) |
inetnum | The IP range for the network (123.234.0.0-123.234.63.255) |
dec_lower | The first IP number of the range (123.234.0.0, from above) |
dec_upper | The last IP number of the range (123.234.63.255, from above) |
identites | A list of name(s) for the network (see below for details) – there are no URLs, obviously |
… the following are taken from the first identity record:
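To make the relationship between inetnum and the two bounds concrete, here is a minimal Python sketch that splits a range string into its lower and upper addresses and tests whether an IP falls inside it. It assumes the range is always two dotted IPv4 addresses joined by a hyphen, as in the example above. Treating the bounds as plain integers is also an assumption (the dec_ prefix hints at decimal values, but I haven't stated the exact format here), and the function names are made up for illustration.

```python
import ipaddress


def range_bounds(inetnum: str) -> tuple[int, int]:
    """Split 'a.b.c.d-e.f.g.h' into integer (lower, upper) bounds.

    Assumption: inetnum is two IPv4 addresses separated by a hyphen.
    """
    lower_text, upper_text = (part.strip() for part in inetnum.split("-"))
    lower = int(ipaddress.IPv4Address(lower_text))
    upper = int(ipaddress.IPv4Address(upper_text))
    if lower > upper:
        raise ValueError(f"range is reversed: {inetnum}")
    return lower, upper


def in_range(ip: str, inetnum: str) -> bool:
    """Return True if ip lies within the inclusive range described by inetnum."""
    lower, upper = range_bounds(inetnum)
    return lower <= int(ipaddress.IPv4Address(ip)) <= upper


if __name__ == "__main__":
    example = "123.234.0.0-123.234.63.255"
    print(range_bounds(example))
    print(in_range("123.234.10.1", example))
    print(in_range("123.234.64.0", example))
```

Comparing integers rather than strings is the point: once both ends of the range are numbers, "is this visitor on that network?" becomes a simple pair of comparisons.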
identities
Each entry in the array is a name for the object, with whichever name is defined as “Primary” at the start of the list.
Each identity object contains the following keys (if they exist in the database):
name | The name of the object (‘Poppleton University’, ‘Plink-Plonk Repository’, etc) |
acronym | Any acronym the object may be known as (‘PU’, ‘PPR’, etc) |
npref | A true/false flag that indicates which is the preferred term. (Absent means true, not false…. or “There is no statement that the name is not the preferred term” ) |
pri | A true/false flag that indicates if the name is marked as Primary. Again, this flag is not always defined, as there may be only one option, or there may be no definite name that is the primary name. |
iri | The Open Linked-Data iri to get the linked-data record |
nid | The database ID for the name |
urls | A sub-element containing URL data for the object, as associated with the particular name. |
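Because npref and pri may be missing, client code needs to be careful with defaults. The following Python sketch shows one way to handle that, assuming the response has already been decoded into ordinary dicts and lists and that the flags arrive as real booleans (both are assumptions about the wire format, and the helper names are hypothetical).

```python
from typing import Optional


def is_preferred(identity: dict) -> bool:
    """An absent npref is treated as true, as described in the table above."""
    return bool(identity.get("npref", True))


def display_name(identities: list) -> Optional[str]:
    """Pick a name to show for an object.

    Uses the entry flagged pri if there is one; otherwise falls back to the
    first entry, since a Primary name is placed at the start of the list.
    """
    if not identities:
        return None
    for identity in identities:
        if identity.get("pri"):
            return identity.get("name")
    return identities[0].get("name")


if __name__ == "__main__":
    sample = [
        {"name": "Poppleton University", "acronym": "PU"},
        {"name": "University of Poppleton", "npref": False},
    ]
    print(display_name(sample))
    print([is_preferred(i) for i in sample])
```

The fallback to the first entry matters in practice: many objects have only one name, and that name will not necessarily carry a pri flag at all.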
urls
In the database, there is an association between names and URLs. This is to enable objects to have multi-lingual names, and appropriate URLs for each language (eg: Ukrainian, Russian, and English)
The urls element contains two keys: “matching” and “non-matching”, both of which are lists of url objects:
'urls' => { 'matching' => [ {....}, {....} ], 'non_matching' => [ {....}, {....} ] }
If a URL is flagged as Primary, it is placed at the front of the appropriate list
Within each url object, the following data is returned:
url | The actual URL |
pri | Whether the URL is marked as a primary one |
live | A true/false flag to indicate if the URL returns [a non-error] web page |
date | The date that the URL was last checked. Note that no history is kept of the alive/not-alive checking. Hosts that are alive are re-checked weekly; hosts that are not flagged as alive are checked daily |
uid | The database ID for the URL |
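The "first matching (else non-matching) URL" rule mentioned earlier can be sketched in a few lines of Python. This assumes an identity has been decoded into a dict shaped like the structure shown above, with the keys spelled 'matching' and 'non_matching' as in that sample; the function name is hypothetical.

```python
from typing import Optional


def first_url(identity: dict) -> Optional[dict]:
    """Return the first matching url object, else the first non-matching one.

    Primary URLs are placed at the front of each list, so taking element 0
    picks the Primary URL when one is flagged.
    """
    urls = identity.get("urls") or {}
    for key in ("matching", "non_matching"):
        candidates = urls.get(key) or []
        if candidates:
            return candidates[0]
    return None


if __name__ == "__main__":
    identity = {
        "name": "Plink-Plonk Repository",
        "urls": {
            "matching": [],
            "non_matching": [{"url": "http://example.org/", "live": True}],
        },
    }
    chosen = first_url(identity)
    print(chosen["url"] if chosen else "no URL held")
```

A caller that only wants working links could additionally check the live flag on the returned object before using it.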
Comprehensive enough? Want more? Speak to me….