P5A: Deposit, Discovery and Re-use LiveBlog

Today we are liveblogging from the OR2012 conference at Lecture Theatre 4 (LT4), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Topic: Repositories and Microsoft Academic Search
Speaker(s): Alex D. Wade, Lee Dirks

MSResearch seeks out innovators from the worldwide academic community. Everything they produce is freely available, non-profit.

They produce research accelerators in the form of Layerscape (visualization, storytelling, sharing), DataUp (used to be called DataCuration for Excel), and Academic Search.

Layerscape provides desktop tools for geospatial data visualization. It’s an Excel add-in that creates live-updating earth-model visuals. It provides the tooling to create a tour/fly-through of the data a researcher is discussing. Finally, it allows people to share their tours online – they can be browsed, watched, commented on like movies. If you want to interact with the data you can download the tour with data and play with it.

DataUp aids scientific discovery by ensuring funding agency data management compliance and repository compliance of Excel data. It lets people go from spreadsheet data to repositories easily. This can be done through an add-in or via cloud service. The glue that sticks these applications together is repository agnostic, with minimum requirements for ease of connection. It’s all open source, driven by DataONE and CDL. It is in closed beta now with a wide release later this summer.

Now, Academic Search. It started by bringing together several research projects in MSResearch. It’s a search engine for academic papers from the web, feeds, repositories. Part of the utility of it is a profile of information around each publication, possibly from several sources, coalesced together. As other full-text documents cite in, those can be shown in context. Keywords can be shown and linked to DOIs, and users can subscribe to them for change alerts. These data profiles are generated automatically, and that can build automatic author profiles as well: conferences and journals they’ve published in, associations, citation history, institution search.

The compare button lets users compare institutions by different publication topics – by the numbers, by keywords, and so on. Visualizations are also available to be played with. The Academic Map shows publications on a map.

Academic Search will also hopefully be used as more than a search engine. It is a rich source of information that ranks journals, conferences, academics, all sortable in a multitude of ways.

Authors also have domain-specific H-Index numbers associated with them.

Anyone can edit author pages, submit new content, clean things up. Anyone can also embed real-time pulls of data from the site onto their own site.

With the Public API, and an API key, you can fetch information with an even broader pull. Example: give me all authors associated with University of Edinburgh, and all data associated with them (citations, ID number, publications, other authors, etc). With a publication ID, a user could see all of the references included, or all of the documents that cite it.
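The kind of query described above can be pictured with a small in-memory model. This is only an illustrative sketch: the Author and Publication records, their field names, and the three helper functions are hypothetical and are not the Academic Search API, whose endpoints and response shapes were not covered in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    author_id: int
    name: str
    affiliation: str


@dataclass
class Publication:
    pub_id: int
    title: str
    author_ids: List[int]
    # IDs of the publications this one cites
    reference_ids: List[int] = field(default_factory=list)


def authors_at(authors, affiliation):
    """All authors whose affiliation matches exactly."""
    return [a for a in authors if a.affiliation == affiliation]


def references_of(pubs, pub_id):
    """Publications cited by the given publication."""
    by_id = {p.pub_id: p for p in pubs}
    return [by_id[r] for r in by_id[pub_id].reference_ids if r in by_id]


def cited_by(pubs, pub_id):
    """Publications that cite the given publication."""
    return [p for p in pubs if pub_id in p.reference_ids]


# Toy data, invented for illustration only.
AUTHORS = [
    Author(1, "A. Researcher", "University of Edinburgh"),
    Author(2, "B. Scholar", "Elsewhere University"),
]
PUBS = [
    Publication(10, "Foundations", [1]),
    Publication(11, "Follow-up", [1, 2], [10]),
    Publication(12, "Survey", [2], [10, 11]),
]

if __name__ == "__main__":
    print([a.name for a in authors_at(AUTHORS, "University of Edinburgh")])
    print([p.title for p in references_of(PUBS, 12)])
    print([p.title for p in cited_by(PUBS, 10)])
```

The two citation helpers are the same relation read in opposite directions, which is why a single publication ID is enough to get both the reference list and the citing documents.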

Q: What protocol is pushing information into the repositories?

A: SWORD was being looked at, but I’m uncertain about the merit protocol right now. SWORD is in the spec, so it will be that eventually.

Q: Does Academic Search harvest from repositories worldwide?

A: We want to, but first we’re looking at aggregations (OCLC OAIster). We want to provide a self-service registration mechanism, plus scraping via Bing. Right now, it’s a cursory attempt, but we’re getting better.

Q: How is the domain hierarchy generated?

A: The Domain hierarchy is generated manually with ISI categories. It’s an area of debate: we want an automated system, but the challenge is that more dynamic systems make rank lists and comparison over time more difficult. It’s a manual list of categories (200 total, at the journal level).

Q: Should we be using a certain type of metadata in repos? OAI-PMH?

A: We use OAI-PMH now, but we’re still working on analysis of all that. It’s a long term conversation about the best match.
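For readers unfamiliar with OAI-PMH, here is a minimal sketch of what a harvest step looks like: build a ListRecords request and read Dublin Core titles out of the response. The base URL, the sample record, and the function names are made up for illustration, and the response is an embedded sample rather than a live fetch. Nothing here describes how Academic Search itself harvests.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When continuing a harvest, resumptionToken is sent on its own,
    without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in rec.findall(".//dc:title", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Invented sample response; a real one would come from the repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:123</identifier>
        <datestamp>2012-07-11</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example deposited paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A harvester would loop, passing each returned token back into list_records_url until no token comes back.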

Topic: Enhancing and testing repository deposit interfaces
Speaker(s): Steve Hitchcock, David Tarrant, Les Carr

Institutional repositories are facing big challenges. How are they presenting a range of services to users? How is presentation of repositories being improved, made easier? The DepositMO project hopes to improve just that. It asks how we can reposition the deposit process in a workflow. SWORD, and V2 in particular, enables this.

So, IRs are under pressure. The Finch report suggests a transition with clear policy direction toward open access. This will make institutional open access repositories for publication obsolete, but not for research data. Repositories are taking a bigger view of that, though. Even if publications are open access, they can still be part of IR stores.

DepositMO has been in Edinburgh before. It induced spontaneous applause. It was also at OR before, in 2010.

This talk was a borderline accepted talk, perhaps because there is not a statement included: few studies of user action with repositories.

There are many ways that users interact with repositories, which ought to be analyzed. SWORD for Facebook, for Word.

SWORD gives a great scope of use between the user and repository, especially with V2. V2 is native in many repositories now, partially because of DepositMO.

With convenient tools built into already used software, like Word, work can be saved into repositories as it is developed. Users can set up Watch Folders for adding data, either as a new record or an update to an older version if changed locally. The latter example is quite a bit like Dropbox or SkyDrive, but repositories aren’t hard drives. They aren’t designed as storage devices. They are curation and presentation services. Depositing means presenting very soon. DepositMO is a bit of a hack to prevent presentation while iteratively adding to repository content. Save for later, effectively.
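The watch-folder behaviour described above (new file means new record, locally changed file means update) can be sketched as follows. This is an assumption-laden illustration, not DepositMO's code: the function names are invented, change detection by SHA-256 digest is a choice made for this sketch, and the actual deposit call is left out entirely.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, used here to detect local changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_deposits(folder, known):
    """Decide what to do with each file in a watched folder.

    `known` maps file name -> digest at the time of the last scan and is
    updated in place. Returns a list of (action, file_name) pairs where
    action is "create" for unseen files and "update" for changed ones.
    """
    actions = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        previous = known.get(path.name)
        if previous is None:
            actions.append(("create", path.name))
        elif previous != digest:
            actions.append(("update", path.name))
        known[path.name] = digest
    return actions
```

Calling plan_deposits repeatedly with the same `known` dictionary yields nothing for unchanged files, which is the property that makes the folder feel like a synced drive even though each action ends up as a repository deposit.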

Real user tests of DepositMO have been done – set up some laptops running created services and inviting users to test in pairs. This wasn’t about download, installation, and setup, but actual use in a workflow. Is it useful in the first place? Can it fit into the process? Task completion and success rates of repository user tasks were collected as users did these things.

On average, Word and watch folder deposit tools improved deposit time amongst other things. However, these entries aren’t necessarily as well documented as is typically necessary. The overall summary suggests that while there is a wow-factor in terms of repository interaction, the anxiety level of users increases as the amount of information they have to deposit increases. Users sometimes had to retrace steps, or else put things in the wrong places as they worked. They needed some trail or metadata to locate deposit items and fix deposit errors.

There are cases for not adding metadata during initial entry, though, so low metadata might not be the worst thing.

Now it’s time to do more research, exploring the uses with real repositories. That project is called DepositMOre. Watch Folder, EasyChair one-click submission, and to an extent the Word add-in will be analyzed statistically as people actually deposit into real repositories. It’s time to accommodate new workflows, to accommodate new needs, and face down challenges of publishers offering open access.

Q: Have you looked into motivations for user deposit into repositories?

A: No, it was primarily a study of test users through partners in the project. The how and what of usage and action, but not the why. There was a wonder whether more data about the users would be useful. If more data was obtainable, the most interesting thing would be understanding user experience with repositories. But mandate motivation, no, not looking into that.

Q: You’ve identified a problem users have with depositing many things and tracking deposits. Did you identify a solution?

A: It’s more about dissuading people from reverting to previous environments and tools. There are more explicit metadata tools, and we could do a better job of showing trails of submission, so that will need to filter back in. Unlike cloud drives, users lose control of an object once it is submitted to a repository. So, suddenly something else is doing something, and for the user it’s disconcerting.

Topic: OERPub API for Publishing Remixable Open Educational Resources (OER)
Speaker(s): Katherine Fletcher, Marvin Reimer

This talk is about a SWORD implementation and client. Most of this work has happened in the last year, very quick.

Remixable open education repositories target less academic and more multi-institution, open repos. Remixability lets users learn anywhere. It’s a ton of power. All these open resources can seed a developer community for authoring and creation, machine learning algorithms, and it all encourages lots of remixable creation.

Remixability can be hard to support, though. Connexions, and other organizations, had grand ambition but not a very large API. And you need an importer/editor that is easy to use. Something that can mash data up.

In looking at APIs needed for open education, discoverability is important, but making publishing easier is important, too. We need to close the loop so that we stop losing the remixed work externally. That’s where SWORD comes in. V2.

Why SWORD V2 for OER? It has support for workflow. The things being targeted are live edited objects, versioned. Those versions need to be permanent so that changes are nondestructive. Adapting, translating, deriving are great, but associating them with common objects helps tie it all together.
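One piece of the workflow support in SWORD V2 is the In-Progress header, which lets a client say that a deposit is not finished yet. The sketch below builds the headers and Atom entry body for such a metadata-only deposit without sending anything. The function name and the sample title are invented for illustration, and nothing here is taken from the OERPub client itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_deposit(title, in_progress=True):
    """Return (headers, body) for a SWORD V2 Atom entry deposit.

    The pair would be POSTed to a collection URL, which is not modelled here.
    """
    entry = ET.Element("{%s}entry" % ATOM_NS)
    title_el = ET.SubElement(entry, "{%s}title" % ATOM_NS)
    title_el.text = title
    body = ET.tostring(entry, encoding="utf-8")
    headers = {
        "Content-Type": "application/atom+xml;type=entry",
        # "true" tells the server the client intends to add more later.
        "In-Progress": "true" if in_progress else "false",
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_deposit("Draft chemistry module")
    print(headers)
    print(body.decode("utf-8"))
```

A final request with In-Progress set to false is how a client signals that the object is complete.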

OERPub extends SWORD V2. It clarifies and adds specificity to metadata. Specificity is required for showing the difference between versions and derivatives, specifically. And documentation is improved. Default values, repository controlled and auto-generated values are all documented. Precedents have been made clear, that’s it.

OERPub also merges semantics header for PUT. It simplifies what’s going on. Also added a section on Transforms under packaging. If a repository will transform content, it has a space to explain its actions. It provides error handling improvements, particularly elaboration on things like transform and deposit fails.
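To make the PUT point concrete, here is a toy comparison of replace and merge semantics for a metadata update. The talk did not spell out the header itself, so the boolean `merge` parameter below simply stands in for it, and the function name and sample fields are hypothetical.

```python
def apply_put(existing, incoming, merge):
    """Return the metadata that results from a PUT.

    merge=False: the incoming metadata replaces everything that was there.
    merge=True: incoming fields overwrite matching fields, others are kept.
    """
    if not merge:
        return dict(incoming)
    updated = dict(existing)
    updated.update(incoming)
    return updated


if __name__ == "__main__":
    stored = {"title": "Intro to Chemistry", "language": "en"}
    change = {"title": "Introduction to Chemistry"}
    print(apply_put(stored, change, merge=False))
    print(apply_put(stored, change, merge=True))
```

With replace semantics a client that sends only the changed title silently drops the language field, which is the kind of surprise an explicit merge option is meant to avoid.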

This is the first tool to submit to Connexions from outside of Connexions.

Lessons learned? Specification detail was great. Good to model on top of and save work. Bug fixes also led the project away from multiple metadata specifications – otherwise bugs will come up. Learned that you always need a deposit receipt, which is normally optional. Finally, auto-discovery – this takeaway suggests a protocol for accessing and editing public item URLs.

A client was built to work with this – a transform tool to remixable format in very clean HTML, fed into Connexions, and pushed to clients on various devices. A college chemistry textbook was already created using this client. And a developer sprint got three new developers fixing three bugs in a day – two hours to get started. This is really enabling people to get involved.

Many potential future uses are cropping up. And all this fits into curation and preservation – archival of academic outputs as an example.

Q: Instead of PUT, should you be using PATCH?

A: Clients aren’t likely to not know repositories, but it is potentially dangerous to ignore headers. Other solutions will be looked at.

Q: One lesson learned was to avoid multiple ways of specifying metadata. What ways?

A: DublinCore fields with attributes and added containers. That caused errors. XML was mixed in, but we had to eventually specify exactly which we wanted.
