Developer’s Challenge: Show and Tell LiveBlog

Today we are liveblogging from the OR2012 conference at Lecture Theatre 1 (LT1), Appleton Tower, part of the University of Edinburgh. Find out more by looking at the full program.

If you are following the event online please add your comment to this post or use the #or2012 hashtag.

This is a liveblog so there may be typos, spelling issues and errors. Please do let us know if you spot a correction and we will be happy to update the post.

Hi there, I’m Mahendra Mahey, I run the DevCSI project, my organisation is funded by JISC. This is the fifth Developer Challenge. This is the biggest to date! We had 28 ideas. We have 19 presentations, each gets 3 minutes to present! You all need a voting slip! At the end of all of the presentations we will bring up a table with all the entries. To vote write the number of your favourite pitch. If it’s a 6 or a 9 please underline to help us! We will take in the votes and collate them. The judges won’t see that. They will convene and pick their favourites and then we will see if they agree… there will then be a final judging process.

The overall prize and runner up shares £1000 in Amazon vouchers. The overall winner will be funded to develop the idea (depending on what’s logitically possible). And Microsoft research have a .Net gadgeteer prize for the best development featuring Microsoft technology. So we start with…

1 – Matt Taylor, University of Southampton – Splinter: Renegade Repositories on Demand

The idea is that you have a temporary offshoot of your repository, can be disposed or reabsorbed, ideal for conferences or workshops, reduces overhead, network of personal microrepositories – the idea is that you don’t have to make accounts for anyone temporarily using your repositoriy. It’s a network of personal microrepository, A lightweight standalone anotation system. Its independent of the main repository. Great for inexperienced users, particularly important if you are a high prestige university. And the idea is that it’s a pseudopersonal workspace – can be shared on the web but separate of your main repository. And it’s a simplified workflow – so if you make a splinter repository for an event you can use contextual information – conference date, location, etc. to populate metadata. Microrepository already in development and tech exists: RedFeather.ecs.soton.ac.uk. Demo at Bazaar workshop tomorrow. Reabsorption trivial using SWORD.

2 – Keith Gilmerton and Linda Newman – MATS: Mobile Audio Transcription and Submission

The idea is that you submit audio to repositories from phones. You set up once. You record audio. You select media for transcription, you add simple metadata You can review audio. Can pick from Microsoft Research’s MAVIS or Amazon’s Mechanical Turk. When submission back you get transcription and media to look at, can pick which of those two – either or both – you upload. And even if transcript not back its OK – new SWORD protocol does updates. And this is all possible using Android devices and code reused from one of last years challenges! Use cases – digital archive of literacy studies seek audio files, elliston poetry curator make analogue recordings , tablets in the field – Pompeii Archeaological Research Project would greatly increase submissions of data from the field.

3 – Joonas Kesaniemi and Kevin Van de Velde – Dusting off the mothballs introducing duster

The idea is to dust off time series here.  The only thing constant is change (Heraclitus 500BC). I want to get all the articles from AAlto university. It’s quite a new university but there used to be three universities that merged together. It would help to describe that the institution changed over time. Useful to have a temporal change model. Duster (aka Query expansion service) takes a data source that is a complex data model and then makes that available. Makes a simple Solr document for use via API. An example Kevin made – searching for one uni searches for all…

4 – Thomas Rosek, Jakub Jurkiewicz [sorry names too fast and not on screen] – Additional text for repository entries

In our repository we have keywords on the deposits – we can use intertext to explain keywords. Polish keywords you may not know them – but we can see that in English. And we can transliterate cyrillic. The idea is to build a system from blogs – connected like lego bricks. Build a blog for transliteration, for translating, for wikipedia, blog for geonames and mapping. And these would be connected to repository and all work together. And it would show how powerful

5 – Asger Askov Blekinge – SVN based repositories 

Many repositories have their own versioning systems but there are already well established versioning systems for software development that are better (SVN, GIT) so I propose we use SVN as the back end for Fedora.

Mass processing on the repository dowsn’t work well. Checkout the repo to a hadoop cluster, run the hadoop job, and commit the changed objects back. If we used standardised back end to access repository we could use Gource – software version control visualisation. I have developed a proof of concept that will be on Github in next few days to prove that you can do this, you can have a Fedora like interace on top of SVN repository.

6. Patrick McSweeney, University of Southampton – DataEngine

This is a problem we encountered, me and my friend Dabe Mills. For his PhD he had 1 GB of data, too much for the uni. Had to do his own workaround to visualise the data. Most of our science is in tier 3 where some data, but we need support! So the idea is that you put data into repository, allows you to show provenance, can manipulate data in the repository, merge into smaller CSV files, create a visualisation of your choice. You store intermediary files, data and the visualisations. You could do loads of visualisations. Important as first step on road to proper data science. Turns repository into tool that engages researchers from day one. And full data trail is there and is reproducable. And more interesting than that. You can take similar data, use same workflow and compare visualisation. And you can actually compare them. And I did loads in 2 days, imagine what I could do in another 2!

7. Petr Knoth from the Open University –  Cross-repository mobile application 

I would like to propose an application for searching across all repositories. You wouldn’t care about which repository it’s in, you would just get search it, get it, using these apps. And these would be provided for Apple and Google devices. Available now! How do you do this? You use APIs to aggregate – we can use applications like CORE, can use perhaps Microsoft Academic Search API. The idea of this mobile app is that it’s innovation – it’s a novel app. The vision is your papers are everywhere through syncing and sharing. It’s relevance to user problems: WYFIWYD: What you find is what you download. It’s cool. It’s usable. Its plausible for adoption/tech implementation.

8. Richard Jones and Mark MacGillivray, Cottage Labs – Sword it!

Mark: I am also a PhD student here at Edinburgh. From that perspective I know nothing of repositories… I don’t know… I don’t care… maybe I should… so how do we fix it. How do we make me be bothered?! How do we make it relevent.

Richard: We wrote Sword it code this week. It’s a jQuery plugin – one line of javascript in your header – to turn the page into a deposit button. Could go in repository, library website, your researchers page… If you made a GreaseMonkey script – we could but we haven’t – we could turn ANY page into a deposit! Same with Google results. Let us give you a quick example…

Mark: This example is running on a website. Couldn’t do on Informatics page as I forgot my login in true researcher style!

Richard: Pick a file. Scrapes metadata from file. Upload. And I can embed that on my webpage with same line of code and show off my publications!

9. Ben O Steen – isthisresearchreadable.org

Cameron Neylon came up to me yesterday saying that lots of researchers submit papers to repositories like PubMed but also to publishers… you get DOIs. But who can see your paper? How can you tell which libraries have access to your papers? I have built isthisresearchreadable.org. We can use CrossRef and a suitable size sample of DOIs to find out the bigger picture – I faked some sample numbers but CrossRef is down just now. Submit a DOI, see if it works, fill in links and submit. There you go.

10. Dave Tarrant – The Thing of Dreams: A time machine for linked data

This seemed less brave than kinect deposit! We typically publish data as triples… why aren’t people publishing this stuff when they could be… well because they are slightly lazy. Technology can solve problems I’ve created LDS3.org. It’s very Sword, very CRUD, very Amazon webs services… So in a browser… I can look at a standard Graphite RDF document. But that information is provided by this endpoint, gets annotated automatically. Adds date submitted and who submitted it. So, the cool stuff… well you can click view doc history… it’s just like Apple time machine that you can browse through time! And cooler yet you can restore it and browse through time. Techy but cool! But what else does this mean… we want to get to semantic web, final frontier.. how many countries have capital cities with an airport and a population over 2 million… on 6th June 2006. Can do it using Memento. Time travel for the web + time travel for data! The final frontier.

11. Les Carr – Boastr – marshalling evidence for reposting outcomes

I have found as a researcher I have to report on outcomes. There is technology missing. Last month a PhD student tweeted that he’d won a prize for a competition from the world bank – with link to World bank page and image of him winning prize, and competition page. We released press release, told EPSRC, they press released. Lots of dissemination, some of that should have been planned in advance. All published on the web. And it disappears super fast. It just dissapates… we need to capture that stuff for 2 years time when we report that stuff! It all gets lost! We want to capture imagination while it happens. We want to put stuff together. Path is a great app for stuff like Twitter has a great interface – who, what, where. Tie to sources of open data, maybe Microsoft Academic Live API. Capture and send to repositories! So that’s it: Boastr!

12. Juagr Adam Bakluha? – Fedora Object Locking

The idea is to allow multiple Fedora webapps working together to allow multiheaded fedora working we can do mass processing like: Fedora object store on a Hadoop File System, one fedora head, means bottlenecks, multiple heads mean multiple apps. Some shared stat between webapps. Add new rest methods – 3 lines in some jaxrs.xml. Add the decorator – 3 lines in Fedora.fcfg and you have Fedora Object locking

13. Graham Triggs – SHIELD

Before the proposal lets talk SWORD… its great, but just for deposit. With SWORD2 you can edit but you get edit iri and you need those, what if you lose them. What if you want to change content in the repository? So, SWORD could be more widely used if edit iris were discoverable. I want an ATOM feed. I want it to support authentication. Better replacement for OMI-PMH. But I want more. I want it to complete non archived items, non complete items, things you may have deposited before. Most importantly I want the edit iri! So I said I have a name…. I want a Simple Harvest Interface for Edit Link Discovery!

14. Jimmy Tang, DRI – Redundancy at the file and network level to protect data

I wanted to talk about redundancy at file and network level to protect data. One of the problems is that people with multi-terabyte archives like to protect it. Storage costs money. Replicating data is wasteful and expensive I think. LOCKSS/Replicating data can be wasteful. Replication means N times cost and money. My idea is to take an alternative approach… Possible solutions is using forward error correcting or erasure codes to a persistant layer – like setting up a RAID disc. You keep pieces of files and you can reconstruct it – move complexity from hardware to software world and save money with the efficiency. There are open source libraries to do this, most are mash ups. Should be possible!

15. Jose Martin – Machine and user-friendly policifying

I am proposing a way to embed data from SHERPA ROMEO webservices into records waiting to be reviewed in a repository. Last week I heard how SHERPA/ROMEO receives over 250K requests for data, he was looking for a script to make that efficient, a script to run on a daily or weekly basis. Besides this task is often fairly manual. Why not put machines to work instead… so we have an ePrints repository with 10 items to be reviewed. We download SHERPA/ROMEO information here. We have the colour code that give a hint about policy. Script would go over all items looking for ISSN matches and find colour code. and let us code those submissions – nice for repository manager and means the items are coded by policy ready to go. And updated policy info done in just one request for, say, 10 items. More efficient and happier! And retrieve journal title whilst at it.

16. Petr Knoth – Repository ANalytics

Idea to make repository managers lives very easy. They want to know what is being harvested and if everything is correct in their system. It’s good if someone can check from the outside. The idea is that analytics sit outside repository, lets them see metadata harvested, if it works OK and also provides stats on content – harvesting of full text PDF files. Very important. even though we have OMI-PMH there are huge discrepancies between the files. I am a repository manager I can see that everything is fine, that it has been carried out etc.  So we can see a problem with an end point. I propose we use this to automatically notify repository manager that something is wrong. Why do we count metadata not PDFs – latter are much more important. Want to produce other detailed full text stats, eg citation levels!

17. Steffan Godskesen – Current and complete CRIS with Metadata of excellent quality 

Researchers don’t want to do thinsg with metadata but librarians do care. In many cases metadata is already available from other sources and in your DI. So When we query the discovery iunterface cleverly we can extract metadata inject into CRIS, have librarians quality check it and obtain excellent CRIS. Can we do this? We have done this between our own DI (discovery system) and CRIS. And again when we changed CRIS, again when we changed DI. Why do again and again… to some extent we want help from DI and CRIS developers to help make these systems extract data more easily!

18. Julie Allison and Ben O’Steen – Visualising Repositories in the Real World

We want to use .Net Gadgeteer or Arduino to visualise repository activity, WHy? to demonstrate in the real world what happens in the repository world. Screens showing issues maybe. A physical guage for hits for hourse – great demo tool. A bell that ring when met deposits per day target. Or blowing bubbles for each deposit. Maybe 3D printing of deposited items? Maybe online Chronozoom, PivotViewer – explore content, JavaScript InfoVis – set of visualisation tools. Repository would be mine – York University. Using query interface to return creation date etc. Use APIs etc. So for example a JSPN animation of publications and networks and links between objects.

19. Ben O’Steen – Raid the repositories!

Lots of repositories with one managers, no developers. Raid them! VM that pulls them all in, pull in text mining, analysis, stats, enhancer etc. Data. Sell as a PR tool £20/month as a demo. Tools for reuse.

Applause meter in the room was split between Patrick MacSweeney  and Richard Jones & Mark MacGillivray’s presentation.

Comments are closed.