Today we bring you a guest post reflecting on the experience of being a first timer at Repository Fringe. Our blogger is Richard – and we’ll let him introduce himself…
My name is Richard Wincewicz and I work at EDINA as a software engineer. My background is synthetic chemistry but three years ago I got involved in the Islandora project (http://islandora.ca) based on the east coast of Canada. A year ago I moved back to the UK and started my current position at EDINA.
This was my first Repository Fringe and I was surprised at how comfortable it felt. I’ve not been in the repository field for that long but I’ve got to know some people and this was a great opportunity to catch up with them. Being relatively new also meant that there were plenty of people there that I’d not met before, so I spent a lot of time making new connections. The sessions were fairly informal and plenty of time was allowed between them to let people engage and share their thoughts and ideas. Even so, there were occasions where Nicola had to herd a number of stragglers (me included) into the next session because the impromptu discussions were so engrossing that we’d lost track of time.
When I signed up I’d indicated that I wanted to take part in the developer challenge. At first I looked at the topic of ‘preservation’ and thought, “That’s a broad topic, I’m sure I can come up with a useful idea over the next couple of weeks.” On my way home the evening before my entry had to be submitted, I finally came up with an idea that was potentially useful and also feasible, given that I only had a night to get it done (as well as sleep and eat).
Over the previous couple of days I had heard a few people mention the lack of metadata provided when given content to store, alongside the lack of willingness of providers to change. My idea was to create a web service that would take any file that you wanted to throw at it and provide as much metadata as it could glean from the file in a usable form. There are plenty of tools around that will extract the embedded metadata in a file, with the Apache Tika project (http://tika.apache.org/) being one of the more comprehensive ones. My application was basically a front end for this.
The added value that I provided was to return the metadata in Dublin Core. This meant that this web service could be integrated into a repository workflow with very little effort. My plan was to expand the number of metadata schemas available to make it easier for the repository to incorporate the output directly but I sadly ran out of time. One thing that became clear while testing my code was that often the quality of the embedded metadata was poor. After discussing my project with Chris Gutteridge I decided that mining the document for relevant information would give richer metadata but require a lot more time to produce anything even remotely functional.
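The Dublin Core step described above can be sketched roughly as follows. This is not the code from the challenge entry, just a minimal illustration of the idea. It assumes the extraction tool has already returned its results as a plain dictionary, and the field names in FIELD_MAP and the wrapping "metadata" element are hypothetical choices made for the example rather than the exact names any particular extractor produces.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from extractor field names to Dublin Core elements.
# Real extractors use their own (often inconsistent) field names, so this
# table is an assumption that would need adjusting for real output.
FIELD_MAP = {
    "title": "title",
    "Author": "creator",
    "creator": "creator",
    "Content-Type": "format",
    "created": "date",
    "language": "language",
    "description": "description",
    "subject": "subject",
}


def to_dublin_core(extracted):
    """Build a Dublin Core XML string from a dict of extracted fields.

    Fields with no known mapping, or with blank values, are skipped.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for key, value in extracted.items():
        element = FIELD_MAP.get(key)
        text = str(value).strip()
        if element is None or not text:
            continue
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
        child.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = {"title": "Annual Report", "Author": "A. N. Other", "Content-Type": "application/pdf"}
    print(to_dublin_core(sample))
```

Supporting further metadata schemas would mostly be a matter of adding more mapping tables like FIELD_MAP, although poor embedded metadata going in would still mean poor metadata coming out.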
In the end I spent around 3 hours on my entry but I was proud that I had something that not only worked but didn’t fail horribly when I demoed it live to a roomful of people.
I enjoyed my first Repository Fringe immensely. I got a huge amount out of it both in terms of learning and networking. I plan to attend next year and hopefully find a couple more hours to work on my developer challenge entry.
In December I took part in the Will’s World hack organised by EDINA. I was involved with the collection of the data and so had a pretty good idea of what the registry contained but it was still challenging to come up with a novel way of exploiting this information. My first thought was to process the marked-up play texts and add links to any relevant content in the registry. This way I’d be using the two main sources of information provided by the project. After the first check-in I decided that there were a lot of people using the marked-up plays and so I started thinking about different ways of visualising the data in the registry. It’s easy to visualise a play because it relates to the way we live our lives, but a store of millions of pieces of information is less familiar.
A Timeline Visualisation
After a quick search of the internet I had found a number of timelines but very few were properly interactive. In fact the only one that I found that was interactive was this one. This timeline contains all of the key points in Shakespeare’s life but it draws from a static list of events. I wanted to create something that brought in diverse information from numerous sources and displayed it in a way that people could relate to and use to explore more of the content.
Trying to display the entire contents of the registry on a timeline would result in a very cluttered display and a sluggish page load, so I decided to limit the results to three collections. The three collections I chose were Open Library, Culture Grid and National Library of Scotland. This still gave me around 500,000 results to work with, so I limited the results further and split them up into 50-year sections. This gave me 11 separate timelines, each of which had a sensible amount of content. The example shown below is of a record from the Open Library collection showing a publication of part of Shakespeare’s play Henry VI. Clicking on the image will take you to the specific entry on the Open Library’s site.
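Splitting the results into 50-year sections can be sketched along these lines. This is an illustration rather than the code behind the timelines: it assumes each record is a dictionary with a numeric "year" field, and the max_per_section cap is a hypothetical stand-in for however the results were actually limited.

```python
def split_into_sections(records, span=50, max_per_section=None):
    """Group records into sections of `span` years, keyed by start year.

    Records without a year are skipped. If max_per_section is given,
    any records beyond that count in a section are dropped.
    """
    sections = {}
    for record in records:
        year = record.get("year")
        if year is None:
            continue
        # Round down to the start of the section, e.g. 1599 -> 1550.
        start = (year // span) * span
        bucket = sections.get(start, [])
        if max_per_section is not None and len(bucket) >= max_per_section:
            continue
        sections.setdefault(start, []).append(record)
    # Return the sections in chronological order.
    return dict(sorted(sections.items()))


if __name__ == "__main__":
    sample = [
        {"year": 1623, "title": "First Folio"},
        {"year": 1599, "title": "An earlier record"},
        {"year": None, "title": "Undated record"},
    ]
    for start, items in split_into_sections(sample).items():
        print(start, start + 49, [item["title"] for item in items])
```

Each section can then be rendered as its own timeline, which keeps the amount of content on any one page manageable.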