Richard Wincewicz

Today we bring you a guest post reflecting on the experience of being a first timer at Repository Fringe. Our blogger is Richard – and we’ll let him introduce himself…

My name is Richard Wincewicz and I work at EDINA as a software engineer. My background is synthetic chemistry but three years ago I got involved in the Islandora project (http://islandora.ca) based on the east coast of Canada. A year ago I moved back to the UK and started my current position at EDINA.

First impressions

This was my first Repository Fringe and I was surprised at how comfortable it felt. I’ve not been in the repository field for that long but I’ve got to know some people and this was a great opportunity to catch up with them. Being relatively new also meant that there were plenty of people there that I’d not met before, so I spent a lot of time making new connections. The sessions were fairly informal and plenty of time was allowed between them to let people engage and share their thoughts and ideas. Even so, there were occasions where Nicola had to herd a number of stragglers (me included) into the next session because the impromptu discussions were so engrossing that we’d lost track of time.

Developer challenge

When I signed up I’d indicated that I wanted to take part in the developer challenge. At first I looked at the topic of ‘preservation’ and thought, “That’s a broad topic, I’m sure I can come up with a useful idea over the next couple of weeks.” On my way home on the evening before my entry had to be submitted, I finally came up with an idea that was potentially useful and also feasible, given that I only had a night to get it done (as well as sleep and eat).

Over the previous couple of days I had heard a few people mention the lack of metadata provided when given content to store, alongside the lack of willingness of providers to change. My idea was to create a web service that would take any file that you wanted to throw at it and provide as much metadata as it could glean from the file in a useable form. There are plenty of tools around that will extract the embedded metadata in a file, the Apache Tika project (http://tika.apache.org/) being one of the more comprehensive ones, and my application was basically a front end for this.

The added value that I provided was to return the metadata in Dublin Core. This meant that this web service could be integrated into a repository workflow with very little effort. My plan was to expand the number of metadata schemas available to make it easier for the repository to incorporate the output directly but I sadly ran out of time. One thing that became clear while testing my code was that often the quality of the embedded metadata was poor. After discussing my project with Chris Gutteridge I decided that mining the document for relevant information would give richer metadata but require a lot more time to produce anything even remotely functional.
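As a rough illustration of the Dublin Core step, here is a minimal Python sketch. It is not the code from the entry. It assumes the embedded metadata has already been extracted into a plain dictionary (for example by a tool such as Tika), and the key names in KEY_TO_DC are hypothetical, since real extractors report many variants.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from embedded-metadata keys to Dublin Core elements.
# The keys on the left are assumptions, not a list taken from any tool.
KEY_TO_DC = {
    "title": "title",
    "Author": "creator",
    "creator": "creator",
    "Creation-Date": "date",
    "Content-Type": "format",
    "language": "language",
    "subject": "subject",
}


def to_dublin_core(extracted):
    """Return a Dublin Core XML string for a dict of extracted metadata."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    seen = set()
    for key, value in extracted.items():
        element = KEY_TO_DC.get(key)
        text = str(value).strip()
        if element is None or not text:
            continue  # skip unmapped keys and empty values
        if (element, text) in seen:
            continue  # two source keys can map to the same element
        seen.add((element, text))
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
        child.text = text
    return ET.tostring(root, encoding="unicode")
```

Unmapped keys are dropped rather than guessed at, which keeps the output valid but also shows why poor embedded metadata leads to a sparse record.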

In the end I spent around 3 hours on my entry but I was proud that I had something that not only worked but didn’t fail horribly when I demoed it live to a roomful of people.

Summary

I enjoyed my first Repository Fringe immensely. I got a huge amount out of it both in terms of learning and networking. I plan to attend next year and hopefully find a couple more hours to work on my developer challenge entry.

In December I took part in the Will’s World hack organised by EDINA. I was involved with the collection of the data and so had a pretty good idea of what the registry contained but it was still challenging to come up with a novel way of exploiting this information. My first thought was to process the marked-up play texts and add links to any relevant content in the registry. This way I’d be using the two main sources of information provided by the project. After the first check-in I decided that there were a lot of people using the marked-up plays and so I started thinking about different ways of visualising the data in the registry. It’s easy to visualise a play because it relates to the way we live our lives, but a store of millions of pieces of information is less familiar.

A Timeline Visualisation

After a quick search of the internet I found a number of timelines but very few were properly interactive. In fact the only one that I found that was interactive was this one. This timeline contains all of the key points in Shakespeare’s life but draws from a static list of events. I wanted to create something that brought in diverse information from numerous sources and displayed it in a way that people could relate to and use to explore more of the content.

Trying to display the entire contents of the registry on a timeline would result in a very cluttered display and sluggish page load so I decided to limit the results to three collections. The three collections I chose were Open Library, Culture Grid and National Library of Scotland. This still gave me around 500,000 results to work with so I limited the results further and split them up into 50-year sections. This gave me 11 separate timelines, each of which had a sensible amount of content. The example shown below is of a record from the Open Library collection showing a publication of part of Shakespeare’s play Henry VI. Clicking on the image will take you to the specific entry on the Open Library’s site.
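The split into 50-year sections can be sketched in Python as follows. This is an illustration only: the record layout (a dictionary with an integer "year" field) and the function name are assumptions, and the section boundaries here simply fall on multiples of 50, which may not match the boundaries used in the hack.

```python
from collections import defaultdict


def split_into_sections(records, span=50):
    """Group records by the `span`-year section their 'year' falls in."""
    sections = defaultdict(list)
    for record in records:
        year = record.get("year")
        if not isinstance(year, int):
            continue  # records without a usable year cannot be placed
        start = (year // span) * span
        sections[(start, start + span - 1)].append(record)
    # Sort by section so the timelines come out in chronological order.
    return dict(sorted(sections.items()))
```

Each key is a (first year, last year) pair, so a record dated 1594 lands in the (1550, 1599) section, and each section can then be written out as its own timeline.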

To display the events I used the JavaScript applet TimeLineJS. This applet will take a number of different data formats but the simplest for my purposes was the JSON handler. The handler passes the data through a number of filters and, if an external URL is provided, determines the best way to handle the content. A number of filters are already provided that handle information from sites such as Twitter and Google Maps, and it is possible to add custom filters. The filters I added processed the information from the JSON input but also checked the relevant website for a thumbnail image that could be displayed. Without this, much of the content is shown with the same default thumbnail, which serves to mark the event but can become boring and repetitive. I wanted to give the user a better idea of what the event was about and having a thumbnail helped to do that. In some cases no thumbnails were available and so I had to fall back to a default image.
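The thumbnail fallback described above can be sketched in Python. This is not the JavaScript filter from the hack, and the field names in the returned dictionary are illustrative rather than checked against the TimeLineJS JSON schema. The record layout, the default image path and the find_thumbnail callback are all hypothetical.

```python
DEFAULT_THUMBNAIL = "images/default-thumbnail.png"  # hypothetical path


def build_event(record, find_thumbnail):
    """Turn one registry record into a timeline event dictionary.

    `find_thumbnail` is a caller-supplied function that takes the record's
    URL and returns a thumbnail URL, or None when the site offers none.
    """
    url = record["url"]
    try:
        thumbnail = find_thumbnail(url)
    except Exception:
        thumbnail = None  # a failed lookup should not lose the event
    return {
        "startDate": str(record["year"]),
        "headline": record["title"],
        "asset": {
            "media": url,
            "thumbnail": thumbnail or DEFAULT_THUMBNAIL,
        },
    }
```

Passing the lookup in as a function keeps the fallback logic separate from however a particular site exposes its images, so a new collection only needs a new lookup.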

All in all my experience of the hack was very positive. I’m a big fan of metadata in general, and aggregated metadata specifically, so getting the chance to play with data like this was great. It is always interesting to see what ideas other people have when presented with the same data. In this hack we had a range of levels, from those that dealt with the very nature of the data and how it can be arranged, up to projects like mine which concentrated on how the data can be presented in a useful and appealing way. It was also good experience to play around with a JavaScript applet. I’ve not had an opportunity to write much JavaScript and the way TimeLineJS is written made it pretty straightforward to extend the original code.

EDINA Blogs

A Blogs.edina.ac.uk weblog

Author Archives: Richard Wincewicz

My First Repository Fringe

Shakespeare Through The Ages
