OK, so there’s some interesting data to get – but how do you get it?
There are three general APIs, or 10… depending on how you count them.
Data returns
All APIs return data in the same ways:
- You can specify the format either with the Accept header in the HTTP request, or with the format parameter. The options are ‘json’, ‘xml’, or ‘text’, with ‘json’ being the default if nothing is specified.
- If there’s a callback parameter, and the format is json, then a cross-domain (JSONP-style) package is returned… very useful!
- All return the data as a nested object, with three top-level elements:
{
'message' => {},
'status' => 'ok',
'to' => 'http://.....'
}
status is “ok” or “fail”, to is the URL that was queried, and message contains the actual data being returned… which is dependent on the query!
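As a rough Python sketch of the conventions above: the first helper builds a query URL with the format (and optional callback) parameter, the second unpacks the three-element envelope. The names build_url and unpack are illustrative only (they are not part of the API), and the sample reply is hand-written rather than captured output.

```python
import json
from urllib.parse import urlencode

# Base URL taken from this post; the helper names below are illustrative only.
BASE = "http://devel.edina.ac.uk:1201/cgi/list5"


def build_url(name, fmt="json", callback=None):
    """Build a list5 query URL using the format and callback parameters."""
    params = {"format": fmt}
    if callback is not None:
        params["callback"] = callback
    return f"{BASE}/{name}?{urlencode(params)}"


def unpack(body):
    """Parse a JSON reply and return its message, raising if status is not ok."""
    reply = json.loads(body)
    if reply.get("status") != "ok":
        raise RuntimeError(f"query to {reply.get('to')} failed")
    return reply["message"]


# Hand-written sample reply in the shape described above (not real output).
sample = '{"status": "ok", "to": "http://example.invalid/type", "message": {"type": []}}'
url = build_url("type")
message = unpack(sample)
```

Checking status before touching message keeps a “fail” reply from being mistaken for an empty result.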
The queries
Let’s start with the suite that lists things (cf the AJAXie get_xxx functions and the main api)… currently at http://devel.edina.ac.uk:1201/cgi/list5/xxx, this is a suite of six APIs that each pull out a list of things:
- type
- content
- country
- lang
- org
- net
type
This lists the type (or classification) of repository.
'message' => {
'type' => [
{
'code' => 1,
'text' => 'Subject (Research Cross-Institutional)'
},
{
'code' => 2,
'text' => 'Other'
},
......
]
},
code | text
1 | Undetermined – Repositories whose type has not yet been assessed
2 | Institutional (Institutional or departmental repositories)
3 | Disciplinary (Cross-institutional subject repositories)
4 | Aggregating (Archives aggregating data from several subsidiary repositories)
5 | Governmental (Repositories for governmental data)
6 | Subject (Research Cross-Institutional)
7 | Journal (e-Journal/Publication)
8 | Thesis
9 | Database (Database/A&I Index)
10 | Learning (Learning and Teaching Objects)
11 | Other
12 | Demonstration
When a repository type is needed by /api, it is the code number you need.
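Since /api wants the number rather than the label, a small lookup is handy. The sketch below assumes the entries arrive under message, type as in the sample; code_for is a made-up helper name, and the two entries are copied from the table above rather than from a live reply.

```python
# Sample entries copied from the table above; a real reply would supply them
# under message -> type.
message = {
    "type": [
        {"code": 2, "text": "Institutional (Institutional or departmental repositories)"},
        {"code": 8, "text": "Thesis"},
    ]
}


def code_for(message, label):
    """Return the code whose text starts with label (case-insensitive)."""
    wanted = label.lower()
    for entry in message["type"]:
        if entry["text"].lower().startswith(wanted):
            return entry["code"]
    raise KeyError(label)


thesis_code = code_for(message, "thesis")
```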
Adding the parameter full=1 will cause the query to return all the repositories that are of that type, listed under a repos element. Note that repositories are not exclusively one type or another, and may appear under multiple types.
The repos sub-elements are indexed by repo_id. There is also a count element which will tell you how many repositories are in the set.
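Because a repository can sit under several types, summing the count values over-counts. The sketch below shows one way to get the number of distinct repositories. The exact layout of a full=1 reply is an assumption here (repos and count placed inside each type entry), and the repo_id values are invented.

```python
# ASSUMED shape of a full=1 reply: each entry carries a "repos" mapping keyed
# by repo_id plus a "count". The ids and (empty) repo records are invented.
message = {
    "type": [
        {"code": 2, "text": "Institutional", "count": 2,
         "repos": {"101": {}, "102": {}}},
        {"code": 8, "text": "Thesis", "count": 2,
         "repos": {"102": {}, "103": {}}},
    ]
}

# Per-type totals as reported by the count element.
reported = sum(entry["count"] for entry in message["type"])

# Distinct repositories: a repo listed under several types is counted once.
distinct = set()
for entry in message["type"]:
    distinct.update(entry["repos"])  # updating from a dict adds its keys

overlap = reported - len(distinct)
```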
content
This lists the types of content that repositories accept.
<message>
<content>
<code>1</code>
<text>Research papers (pre- and postprints)</text>
</content>
<content>
<code>2</code>
<text>Research papers (preprints only)</text>
</content>
.....
</message>
code | text
1 | Research papers (pre- and postprints)
2 | Research papers (preprints only)
3 | Research papers (postprints only)
4 | Bibliographic references
5 | Conference and workshop papers
6 | Theses and dissertations
7 | Unpublished reports and working papers
8 | Books & chapters and sections
9 | Datasets
10 | Learning Objects
11 | Multimedia and audio-visual materials
12 | Software
13 | Patents
14 | Other special item types
When a content type is to be defined in /api, it is the code number you need.
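If you ask for format=xml, the same lookup can be built with Python’s standard XML parser. This is only a sketch: it parses the message fragment from the sample (with the elided entries left out) and assumes the element names are exactly as shown there.

```python
import xml.etree.ElementTree as ET

# The <message> fragment from the sample, with the elided entries left out.
xml_text = """
<message>
  <content>
    <code>1</code>
    <text>Research papers (pre- and postprints)</text>
  </content>
  <content>
    <code>2</code>
    <text>Research papers (preprints only)</text>
  </content>
</message>
"""

root = ET.fromstring(xml_text.strip())

# Map each numeric code to its label.
content_types = {
    int(item.findtext("code")): item.findtext("text")
    for item in root.findall("content")
}
```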
Adding the parameter full=1 will cause the query to return all the repositories that accept the content-type, listed under a repos element. Note that repositories usually accept multiple content-types, so will appear under multiple entries.
The repos sub-elements are indexed by repo_id. There is also a count element which will tell you how many repositories are in the set.
lang
This lists all the languages the dataset knows about (in essence, the ISO 639 codes).
(We are limited to ISO 639-2, as ISO 639-3 and later are not Open Access lists, and there is a clause which states “the product, system, or device does not provide a means to redistribute the code set.”)
{
"to" : "http://devel.edina.ac.uk:1201/cgi/list5/lang",
"status" : "ok",
"message" : {
"lang" : [
{
"text" : "Abkhazian",
"iso3_b" : "abk",
"code" : "ab"
},
{
"text" : "Achinese",
"iso3_b" : "ace"
},
]
}
}
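Note that not every entry in the sample carries a two-letter code (Achinese has only iso3_b), so code that reads the list should not assume the key is there. A minimal sketch, using the two sample entries and a made-up helper name two_letter:

```python
# Entries copied from the sample reply above.
langs = [
    {"text": "Abkhazian", "iso3_b": "abk", "code": "ab"},
    {"text": "Achinese", "iso3_b": "ace"},
]

# Index by the three-letter code, which both sample entries have.
by_iso3 = {entry["iso3_b"]: entry for entry in langs}


def two_letter(iso3):
    """Return the two-letter code for a language, or None if it has none."""
    return by_iso3[iso3].get("code")
```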
Adding the parameter full=1 will cause the query to return all the repositories that assert they use that language in their interface, listed in a repos element. Many non-English interfaces are multi-lingual, and those repositories will appear in multiple lists.
The repos sub-elements are indexed by repo_id. There is also a count element which will tell you how many repositories are in the set.
country
This lists all the countries the dataset knows about (in essence, the ISO 3166-1 codes).
{
"to" : "http://devel.edina.ac.uk:1201/cgi/list5/country",
"status" : "ok",
"message" : {
"country" : [
{
"text" : "Andorra",
"code" : "ad"
},
{
"text" : "United Arab Emirates",
"code" : "ae"
},
]
}
}
Adding the parameter full=1 will cause the query to include all the repositories, under a repos element, that are listed [in OpenDOAR] as being from that country. OpenDOAR does not have a concept of multiple countries for a repository.
The repos sub-elements are indexed by repo_id. There is also a count element which will tell you how many repositories are in the set.
org
This lists all the organisations in the dataset. This script will take over 15 minutes to complete… there is a LOT of data to return!
{
"to" : "http://devel.edina.ac.uk:1201/cgi/list5/org",
"status" : "ok",
"message" : {
"org" : {
"1" : {
<as per org listing>
},
"4": {
<as per org listing>
},
}
}
}
Adding the parameter full=1 will cause the query to return all the repositories belonging to each organisation, listed under a repos element. Running the query with the full flag can take twenty minutes!
The repos sub-elements are, in this situation, listed as described in this post.
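With run times like these, a client’s default timeout may well give up before the reply arrives. The sketch below raises it using Python’s standard library. The 25-minute figure is a guess that leaves some headroom over the times quoted above, fetch_orgs is an illustrative name, and the function is defined but not called here.

```python
import json
from urllib.request import urlopen

ORG_URL = "http://devel.edina.ac.uk:1201/cgi/list5/org"

# The post quotes 15 to 20 minutes; allow some headroom (the value is a guess).
TIMEOUT_SECONDS = 25 * 60


def fetch_orgs(url=ORG_URL, timeout=TIMEOUT_SECONDS):
    """Fetch the org listing, letting each blocking read wait up to `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        reply = json.load(response)
    if reply["status"] != "ok":
        raise RuntimeError("org query failed")
    return reply["message"]["org"]
```

Caching the result locally is probably worth it, so the long query only has to run once.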
net
Adding the parameter full=1 will cause the query to return all the repositories that are of that type, listed under a repos element. Note that repositories are not exclusively one type or another, and may appear under multiple types.
The repos sub-elements are indexed by repo_id. There is also a count element which will tell you how many repositories are in the set.