Day Three of the Will’s World Hack

Will’s World Hack is in its third day now and we are hoping to hear more from potential hacks as the weekend approaches and people can leave work behind for some hack time fun!

We held two drop-in sessions today on Google+ Hangout and as usual, they are now available for viewing on YouTube, Google+ and linked in our daily update post at:

In the first session (1pm GMT), we highlighted the addition of rules on the Prize & Rules wiki page. There is nothing new or unusual in these but they do formalise what we have been discussing earlier on. In particular, we would like people to register their hack or hack idea, preferably by Sunday evening, and be ready to present live at the closing session or provide a video ahead of time. Let us know if you have any queries.

You can watch the first check in session here:


Our 5pm Check in saw Jeffrey Kerzner join us from the US.  Jeff has been looking at the Shakespeare data and how it might be accessed and analysed. He would love to hear from any academics about the real academic aims, the scholarly questions they want to ask of this data. Please do take a look at this video from the session and leave us a comment here, email us (willsworldhack@gmail.com) or contact Jeff directly via his details on the wiki if you’d like to get in touch with him.

In this check in our developer Neil also discussed various improvements and updates – including improvements to the Solr search that may require some to make small tweaks to already-developed code – that are being made to the Will’s World Registry tonight ready for your further hacking delight over the weekend.


In other news a tweet and email highlighted a great new Shakespeare resource from the Folger Library which provides the text for the plays rigorously encoded with every word, every punctuation mark, every space, within a sophisticated TEI-compliant XML structure, and you can download them!

Also released today were the new Google+ “Communities” – these are groups that allow you to share updates and discussion with other participants so do join ours and start chatting!

We have also had updates from various hackers so here’s what’s happening so far:

  • Jeffrey is looking for academic collaborators to work with – he’s got the coding skills so bring him the academic questions to answer!
  • Richard has been looking at URLs for Shakespeare characters (LCSH has been suggested) and has been considering the best approach for this – it will fit with the Linked Data texts he is working on and he would appreciate any advice or collaborators for this work.
  • Owen has been developing his ShakespearePress WordPress Plugin and has shared his code on Github: https://github.com/ostephens/shakespearepress. Owen would love to hear from potential collaborators so again do leave comments here, email us, or get directly in touch with Owen via the Participants Wiki page.

We will be supporting your hacks and working on a few of our own over the weekend so we shall see you online very soon and will provide another daily update here on the blog tomorrow.

The next check ins will be at 5pm on Saturday (8th December) and 1pm on Sunday (9th December) when the team will be available to answer your questions and you can continue to meet and work with each other.

And finally… a quick reminder of our last (but not least) prize category: Best set up for the hack. Send us a picture of the environment you are working in and we will showcase the best on our Pinterest page with a prize for the very best!


Day Two of Will’s World Online Hack

The Will’s World Online Hack has now been going for over 24 hours. While most traditional hacks would be over by now, Will’s World has only just started! And we are very pleased to see that the conversation is starting to flow and connections are being made.

We held our daily session on Google+ Hangout which was streamed live on YouTube and recorded. The  videos for all sessions are made available on our YouTube channel and Google+ page, so feel free to catch up on any session you may have missed.


Today’s session brought a few questions about the Registry:

  • How do you find the relevance of records in the Registry? It is not always obvious why some records are returned as results for a specific search.
  • There is currently no faceted search available for the Registry which would help with the exploration of the content and getting a feel for the data. This might help with understanding the relevance of some records.
  • The service directory is not yet on the Registry website.

The Registry is still under construction and we are looking into improving these. In addition, not all data available has been loaded in the Registry.  We are hoping to address this within the next day (or two).

Connecting

Although some participants have reported that it was not obvious how to join a Google+ Hangout, we are delighted to see that communication between participants, and between participants and the project team, has started to happen on Twitter @WillsWorldHack #willhack and IRC. These conversations yielded some useful questions about the data which have been added to FAQs on the wiki at: http://willsworld.edina.ac.uk/wiki/index.php/The_Data#Data_FAQs.

  • Track discussion around the Will’s World Hack with our #willhack Storify gathering tweets, videos, etc.

Two days, Two project ideas

We are delighted to have two project ideas already listed on the Current Hacks page and are hoping for more to be listed there in the coming days.

One of our hackers, Owen Stephens, has also blogged about his progress to date in this “To scrape or not to scrape” post about using ScraperWiki to create a queryable/API form of performance cast lists for what he has built.


Day One of the Will’s World Hack

In readiness for the beginning of the Will’s World Hack we were delighted to launch the Will’s World Registry this morning. The Registry, the development of which we have been charting on this blog, includes the metadata from our fantastic project partners, information on the schema and mappings, and related resources including XML versions of all of Shakespeare’s plays. During the hack we will be continuing to finesse the Registry and we’d really welcome your comments and feedback on the version we have launched today.

Then, this afternoon (at 1pm GMT) saw the launch of the Will’s World Hack (#willhack) via a live Google+ chat streaming to YouTube!

We introduced ourselves and the Will’s World project as well as saying a bit about what we hope may come out of the hack (Will’s World Hack Event Presentation).  It was brilliant to have four of our twenty or so registered hack participants joining us for this live portion of the event and we hope many more of you will be dropping into live sessions later in the week or able to catch up on YouTube. Today’s launch session can be viewed here:


We also had a check in session this afternoon and we got to hear about the first hack taking place: turning the Shakespeare plays we made available as XML for the hack into a Linked Data database for use in other hacks. Richard, who is working on this, would appreciate any pointers to existing ontologies for Shakespeare plays (or plays in general) so, if you have any suggestions, please leave a comment here or email the team. You can also find out more about that on our Current Hacks page on the Will’s World Hack Wiki – and you can add details of your own ideas, hacks and hack teams to the page while you’re there!

We plan to post a summary of the Hack each day so keep an eye here on the blog for updates all week of the Will’s World Hack. You can also join the conversation on Twitter with the #willhack hashtag.

And finally…


Register for the Will’s World Online Hack

The last week has been very exciting for the Will’s World team as we have been busy preparing for the Online Hack event taking place next week.  By the way, there are still a few days left to register if you fancy joining us!

Taking the stage

I made my YouTube stage debut in this video introducing the Will’s World project and the motivation behind the online format of the hackathon.


My colleague Neil got into the creative spirit of the hack for this short video presenting the metadata that will be available in the Shakespeare Registry.


Data sources

The Shakespeare Registry will include metadata from:

With additional sources of data listed and some hack participants bringing their own data to add to ours, this will truly ensure we have some rich data to work with!

Goodies

All the items for our goodie bags have now arrived and I’ve been enjoying putting the packs together. I will be sending them soon to the lucky 18 people already registered. We are hoping that these goodie bags will support your creativity and provide a little bit of fun. Here’s a sneaky peek at the smallest item!

Find out more about Will’s World Online Hack at http://willsworld.edina.ac.uk/wiki/.


Join Will’s World Online Hack 5-12 Dec 2012

Are you interested in Shakespeare? Are you tempted to take part in a hackathon or know someone else who might be? Do you have a great idea for a new app? Do you want to mash your own data with ours? Then get involved in the Will’s World Online Hack! We are pleased to announce that registration for this event is now open at http://willsworld.edina.ac.uk/wiki.

This hackathon aims first to promote innovative use of the Shakespeare metadata registry built by the Will’s World project to hold metadata describing online digital resources relating to Shakespeare, but also to explore an online format for hackathons.

How does an online hack work?

Well, like a traditional hackathon, technical and creative people with different expertise like software developers, graphic designers, domain experts and project managers, get together and collaborate to develop applications and explore concepts. But instead of getting physically together in one location, social media technologies are used to communicate and collaborate online. We used your very useful and positive feedback to our online survey to plan this event.

The event will take place over a week:

  • Opening session on Wednesday, 5th December, 1pm (GMT):
    This will be a live and interactive session to present the data, the goal of the event, prize categories, the set of social media tools and technologies to be used during the hack and the Will’s World project itself. Participants will be able to introduce themselves and put forward ideas.
  • Hack, 24 hours spread over 6 days:
    The participants will have six days to form teams and familiarise themselves with the data and code. Participants are free to organise when they spend their 24 hack hours over these six days. They will have the flexibility to work when it suits them. Teams can set their own schedule either for members to work concurrently or consecutively. The Will’s World project team will be on hand throughout to answer any questions and regular interactive drop-in sessions are planned.
  • Closing session on Wednesday, 12th December, 1pm (GMT):
    This will be a live and interactive session where each team will present their hack either live or as a pre-recorded video and prizes will be awarded.

We are hoping to capture as much as possible of the communication taking place. In particular, the opening and closing sessions will be videoed. All recordings will be shared on the event wiki or the blog.

The use of technology and social media is at the core of this online hack. We will be using a wiki to act as a hub to support communication before, during and after the event. Mailing lists, Google+ hangouts, YouTube, Skype, Twitter, IRC, Github and Dropbox will all help the communication and creativity flow. You will find more information about the event, the data, the technologies and how to take part here.

Register now and receive a goodie bag!

If you fancy taking part in this exciting event and being one of the first pioneering online hackers then please register on the Will’s World Online Hack wiki. Participation is free and the first 50 participants to register will receive a goodie bag!


Will’s World online hack survey results: Your Views!

Over the last three weeks we have been drumming up interest for our idea of an online hack event. This twist on the traditional “in person” format has exciting potential to be more flexible and make great use of social media. It seemed like a very attractive idea to us but, we wondered, what did you think?

We drew up a short survey (15 questions) to capture your views, feedback and any experiences that would help us plan a great online hack. We spread the word through this blog, Twitter, mailing lists and websites, and asked others to do the same.

To date (the survey is still open) we have received 30 replies to the survey and many direct emails with further input. So a BIG THANK YOU to all! We are delighted that you found time to make this hugely valued contribution and we thought that the least we can do is share here what you told us.

A Good idea

In answer to that core question we found that 84% of our survey respondents felt that the online hack was a good idea, of whom: 57% of respondents thought that an online hack was a good idea and would be interested in taking part; a further 27% of respondents felt it was a good idea but they were not sure how it would work.

  • 75% of respondents had attended hack events in the past, and interestingly 3 have already taken part in an online hack.
  • It is very encouraging to see that most people are supportive of the online format – only 10% would prefer an in-person event. Only one person doesn’t think it will work and another said they wouldn’t be interested in taking part.
  • Significantly, all three experienced online hackers think it’s a good idea with two of them definitely interested in participating in this hack – this is really encouraging!

Timing – it’s all relative…

Opinions are divided over what format might work best. This is not surprising since most of our respondents had not been to an online hack before so were being asked to speculate on what might work. However, close to half of those who responded favour a week-long drop-in format. Others were split between weekend and weekday days – we had lots of conflicting comments about availability here.

We didn’t ask you where you were based – although we would if we did this again – but from your experience and email addresses we know we have respondents from both sides of the Atlantic, which further encourages us that any possible timings and format need to support an international hack attendance as elegantly as possible.

Participation

We were really pleased to see that you weren’t just being lovely in sharing your views, you were also really up for participating!
  • 50% of respondents said they are definitely interested in taking part in this hack, with an additional 30% a “maybe”, and several others interested but unable to attend on the specific suggested dates in December.
  • A significant number of people (52%) indicated that they may be able to bring additional data to the hack. However, most note that it would depend on having enough time to prepare it and/or obtain approval for sharing the data.

Social Media Technologies

Social media tools are essential in supporting the communication required by an online hack. Many applications are popular and received support from the participants of the survey, as seen in the graph below:

Knowing what tools you already use means we now feel well informed to choose the right combination of social media and web technologies that will ensure you feel comfortable and familiar with the tools and work for the functionality we think we would need.

We need you… but what do you need?

We also asked you what you might need to be able to take part in an online hack. The main requests were for:

  • as much data as possible
  • information on the data available ahead of the event
  • easy access to the data
  • access to the APIs ahead of the event

We can definitely see from these responses and our word cloud for this question (below) that the data is crucial!

Team-building and help with communication tools ahead of the event were also highlighted. The importance of time, pizza, publicity, prizes and a greater technology know-how were also mentioned!

Participants

So who are you all?

  • 50% of respondents work in the Higher Education sector. A further 18% are freelance and 11% work in the private sector.
  • Participants were from highly varied backgrounds, with different expertise and interests: from experienced developers, to artists, designers, managers, engineers, teachers, students, librarians – the only common characteristics seemed to be a passion for hacking, for Shakespeare or for both!
  • 92% shared their email with us to be kept informed on the developments of our online hack event – thank you! We will be in touch with you soon!

If you’re not one of those who responded but would like to stay up to date on the hack event please either fill in the survey now or drop an email to edina@ed.ac.uk with “Wills World” in the subject line.

I Love Shakespeare

You have shared with us your wishes for playing with data, engaging with communication tools, supporting learning and producing creative material. You have encouraged us in our ambitious vision but warned us of the difficulties too.

Most of all, the word cloud for our additional comments section seems to indicate that you simply love Shakespeare!

Will’s World Online Hack is Coming Soon!

Following the positive responses we have received, we have looked further into the practicalities of organising an online hack event and are delighted to let you know that we will be going ahead with the event in early December! Further details and the official announcement will be out very soon… Watch this space!


Online Hack Event

Our Will’s World project is soon coming to an end. While we are busy populating the Shakespeare Registry with great data, we are keen to put this wonderful resource to good use. How can we achieve this in an unusual and fun way?

How about an online hack event!

The idea first came to us as we began planning a traditional hack event – something in person, overnight, possibly featuring pizza. As we started looking at how this would work best we realised there were lots of logistical issues to deal with, from finding a suitable venue, to catering, to the issue of how participants could find the time to travel and take part.

We also started to think about things that don’t always work so well in in-person hack days… sometimes the software you want to use is sitting on a machine you don’t have with you, sometimes you need to attend to caring responsibilities which just don’t fit into a 24 hour marathon lockdown, and sometimes you just can’t move that meeting or spare the travel time to fly miles away to make that fantastic looking hack day somewhere at the other end of the country.

And that’s when we realised that an online hack event might not only resolve our logistical issues but that the flexibility and potential benefits of an online hack event are also very exciting!

More people can participate:

  • Participants who wouldn’t be able to travel (whether because of the cost, time, distance, or scheduling conflicts) can easily join in.
  • The event can be scheduled to allow participants from further afield and different time zones to be included, turning a local event into a global hack event.
  • There is no restriction on the number of participants due to the venue size or cost.

Enable use of familiar tools:

  • Participants can use their own machine, familiar set up and well-loved applications in their own environment. There is no need to get used to a different technical environment or to first install the tools you can’t do without. That means more time to be creative, to hack, play and collaborate.
  • Wifi and cable internet connections are also a lot more likely to be fast and reliable – at least something you are used to managing – if they are not being shared intensely by a room full of coders!

“Prepared keyboard waiting to sprout” by Flickr user wetwebwork

Flexibility in participation:

  • Participants can choose when and how much effort they put into the hack.
  • Participants can fit their participation around their other commitments.

Promote use of social media technologies:

  • Many collaboration tools can be used to run the event and connect people: blog, Twitter, wiki, Skype, Google Hangout, videos, websites…
  • Participants or those who find out about the event later can still share in the event with records of the hacks more easily captured via video, wikis, text chats etc.

Obviously, the critical issue will be to ensure that communication takes place effectively during the hack between people scattered in various locations. Team formation will be interesting – but no weirder than grabbing a coke or a beer and introducing yourself around a room of fellow hackers. And we know that interruptions could be tricky for some participants since they will be occupied with the Will’s World hack from their normal office or home desk. We can see challenges here but we think the benefits could make this a great format for our hack event.

But we want to know what YOU think of the idea…

We have come up with a few scenarios on how this type of online hack could work. We would really appreciate your help in evaluating different formats and communication tools for this event. Please take a moment to tell us what you think and provide us with your feedback by taking this short survey:

https://www.survey.ed.ac.uk/willsworldhack/

Please do feel free to share that survey link with others you think might be interested in this event. We also welcome any comments here as well.


Image based on “Shakespeare’s Globe Theater, Southwark, London” by Flickr User nikoretro/Sheri

One last thing, we are aiming for the online hack to take place during the first week of December. Put it in your diary!


SPARQL – What’s up with that?

The title of this post is intended to convey a touch of bewilderment through use of a phrase from the Cliff Clavin school of observational comedy.

Linked data and SPARQL

In the linked data world, SPARQL (SPARQL Protocol and RDF Query Language) is touted as the preferred method for querying structured RDF data. In recent years several high profile institutions have worked very hard to structure and transform their data into appropriate formats for linked data discovery and sharing, and as part of this, many have produced RDF triple (or quadruple) stores accessible via SPARQL endpoints – usually a web interface where anyone can type and run a SPARQL query in order to retrieve some of that rich linked data goodness.

This is admirable, but I have to admit to having had little success getting something out of SPARQL endpoints that I would consider useful. Every time I try to use a SPARQL facility I find I do better by scraping data from search results in the main interface. I have also increasingly become aware that I am not the only one to find it difficult.

RDF stores are different to relational databases; they are not so amenable to performing a search over the values on a particular field. Nor are they as flexible as text search databases like Solr. Instead they record facts relating entities to other entities. So it is important that as consumers of the data we know what kind of questions make sense and how to ask them in a way that yields useful results for us and does not strain the SPARQL endpoint unduly. If these are not the kind of questions we want to ask then we might need to question the application of SPARQL as the de facto way of accessing RDF triple stores.
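The difference between the two kinds of question can be sketched in a few lines of Python. This is only a toy model: the triples, the `ex:` identifiers and the helper names `match` and `keyword_scan` are invented for illustration and are not taken from any real endpoint, and a real triple store indexes its data rather than holding a list in memory.

```python
# A toy in-memory "triple store": each fact is (subject, predicate, object).
# All identifiers below are made up for illustration.
TRIPLES = [
    ("ex:person/1", "rdfs:label", "William Shakespeare"),
    ("ex:work/1", "dc:creator", "ex:person/1"),
    ("ex:work/1", "dc:title", "Macbeth"),
    ("ex:work/2", "dc:creator", "ex:person/1"),
    ("ex:work/2", "dc:title", "Richard II"),
]


def match(subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def keyword_scan(text):
    """Find triples whose object contains the text, by checking every triple."""
    return [t for t in TRIPLES if text.lower() in t[2].lower()]


# A question the store is built for: "which works were created by ex:person/1?"
works = [s for (s, _, _) in match(predicate="dc:creator", obj="ex:person/1")]

# A question it is not built for: "which things mention 'shakespeare'?"
hits = keyword_scan("shakespeare")
```

The first question follows a known identifier along a known relationship. The second has to inspect every value in the store, which is roughly why keyword-style queries tend to be slow or to time out.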

I’d like to point out that my aim here is not to complain or to disparage SPARQL in general or anybody’s data in particular; I think it is fantastic so many institutions with large archives are making efforts to open up their data in ways that are considered best practice for the web, and for good reasons. However if SPARQL endpoints turn out to be flawed or inadequately realised, they will not get used and both the opportunity to use the data, and the work to produce it, will be wasted.

Problems with SPARQL endpoints

These are the problems I have commonly experienced:

  • No documentation of available vocabularies.
  • No example queries.
  • No access to unique identifiers so we can search for something specific.
  • Slowness and timeouts due to writing inefficient queries (usually without using unique ids or URIs).
  • Limits on the number of records which can be returned (due to performance limits).

Paraphrasing Juliette Culver’s list of SPARQL Stumbling Blocks on the Pelagios blog, here are some of the problems she experienced:

  • No query examples for the given endpoint.
  • No summary of the data or the ontologies used to represent it.
  • Limited results or query timeouts.
  • SPARQL endpoints are not optimised for full-text searching or keyword search.
  • No link from records in the main interface to the RDF/JSON for the record. (This is mentioned in relation to the British Museum, who provide a very useful search interface to their collection, but don’t appear to link it to the structured data formats available through their SPARQL endpoint.)

Clearly we have experienced similar issues. Note that some of these are due to the nature of RDF and SPARQL, and require a reconception of how to find information. Others are instances of unhelpful presentation; SPARQL endpoints are generally pretty opaque, but this can be alleviated by providing more documentation. With the amount of work it takes to prepare the data, I am surprised by how few providers accompany their endpoints with a clear list of the ontologies they use to represent their data, and at least a handful of example queries. This takes a few minutes but is invaluable to anybody attempting to use the service.

Nature provide the best example I have seen of a SPARQL endpoint, providing a rich set of meaningful example queries. Note also the use of AJAX for (minimal) feedback while the query is running, and to keep the query on the results page.

Confusion about Linked Data

A blog post by Andrew Beeken of the JISC CLOCK project reports dissatisfaction with SPARQL endpoints and linked data, and provoked responses from other users of linked data:

“What is simple in SQL is complex in SPARQL (or at least what I wanted to do was) … You see an announcement about Linked Data and don’t know whether to expect a SPARQL endpoint, or lots of embedded RDF.” Chris Keene

“SPARQL seems most useful for our use context as a tool to describe an entity rather than as a means of discovery.” Ed Chamberlain

Chris’ point gives another perspective on linked data in general – what does it mean to provide linked (or should that be linkable) data, and how do we use it? Embedded RDF (RDFa) is good in that it tends to provide structured data in context, enriching a term in a webpage in a way that is invisible by default but that people can consume if they choose to. Ed indicates a fact about RDF as a data storage medium: it is a method of representing facts about entities which are named in an esoteric way; it is not structured in a way that is ideal for the freer keyword searching or SQL-style queries that we are used to.

Owen Wilson suggests section 6.3 of the Linked Data Book, which describes approaches to consuming linked data through three architectural patterns. It looks worth a read to get one thinking about linked data in the right way.

Unique identifiers

“My native English, now I must forego” Richard II, Act 1, Scene 3

One of the tenets of linked data is that each object has a unique identifier. If we are looking for “William Shakespeare” we must use the URI or other identifier that represents him in the given scheme, rather than using the string “William Shakespeare”. It is thus also necessary that we have an easy way to access the unique identifiers that are used in the data, so that we can ask questions about a specific entity without forming a fuzzy, complex and resource-consuming query. The British Museum publicises its controlled terms, that is the limited vocabulary that they use in describing their collection, along with authority files which provide the canonical versions of terms and names, standardised in terms of spelling, capitalisation and so on, and thesauri which map synonymous terms to their canonical equivalents. These terms are used in describing object types, place names and so on, supporting consistency in the collections data. They are all available via the page British Museum controlled terms and the BM object names thesaurus. Armed with knowledge of what words are used in particular fields to categorise or describe entities in the data, and similarly with a list of ids or canonical names for things, we can then start to form structured queries that will yield results.
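As a rough illustration of how a thesaurus and an authority file work together, the following Python sketch resolves a free-text name to an identifier. The dictionaries, the `example.org` URI and the `resolve` helper are hypothetical and do not reflect the British Museum's actual files or identifiers.

```python
# Hypothetical thesaurus: variant forms mapped to a canonical term.
THESAURUS = {
    "shakespeare": "William Shakespeare",
    "w. shakespeare": "William Shakespeare",
    "william shakespeare": "William Shakespeare",
}

# Hypothetical authority file: canonical term mapped to an identifier.
AUTHORITY = {
    "William Shakespeare": "http://example.org/id/person/william-shakespeare",
}


def resolve(name):
    """Return the identifier for a name, or None if the name is unknown."""
    key = " ".join(name.lower().split())  # normalise case and whitespace
    canonical = THESAURUS.get(key)
    return AUTHORITY.get(canonical)
```

Once a name has been resolved in this way, a query can refer to the identifier directly rather than matching on the string.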

Shakespeare and British Museum

I have looked in particular at the British Museum’s SPARQL endpoint as an example, as BM is a project partner and because it has several items germane to Will’s World. To start with, the endpoint gives some context; a basic template query is included in the search box, which can be run immediately and which implicitly documents all the relevant ontologies by pulling them in to define namespaces. There is a Help link with some idea of how data is represented and can be accessed/referenced using URIs. All of this is good and I found it easy to get started with the endpoint.

However before long I came up against the problem I’ve had with other endpoints, namely that it is difficult to perform a keyword search, or at least perform a multi-stage search in order to (a) resolve a unique identifier for a keyword or named thing and then (b) retrieve information about or related to that thing. In this case I found a way to achieve what I needed by supplementing my usage of the SPARQL endpoint with keyword searches of the excellent Collection database search. With some help from the technical staff at BM to resolve a couple of mistakes in my queries, I can now harvest metadata about objects related to the person “William Shakespeare”.

It is reassuring to find out I am not alone in having difficulty retrieving and using SPARQL data. I followed Owen Stephens’ blog post about the British Museum’s endpoint with interest. Owen found the CIDOC CRM data model hard to query due to its (rich, but thereby counter-intuitive) multi-level structure. Additionally, he encountered the common issue that it is very difficult to perform a search for data containing or “related to” a particular entity which to start with is represented merely by a string literal such as “William Shakespeare”:

The difficulty of exploring the British Museum data from a simple textual string became a real frustration as I explored the data – it made me realise that while the Linked Data/RDF concept of using URIs and not literals is something I understand and agree with, as people all we know is textual strings that describe things, so to make the data more immediately usable, supporting textual searches (e.g. via a solr index over the literals in the data) might be a good idea.

Admittedly RDF representations and SPARQL are not really intended to provide a “search interface” in the sense to which most users are accustomed. But from the user’s perspective, there must be an easy way to start identifying objects about which we want to ask questions, and this tends to start with performing some kind of keyword search. It is then necessary to identify the ids representing the resulting objects or records which are of interest. With the BM data this involves mapping a database id, which can be retrieved from the object URL, to the internal id used in the collections.
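The two-stage pattern of (a) resolving an identifier from a keyword and (b) asking about that identifier might look something like the sketch below, which only builds the query text and does not contact any endpoint. It assumes an endpoint that supports the SPARQL 1.1 string functions `CONTAINS` and `LCASE` and that labels are stored as `rdfs:label`; neither assumption is confirmed for the British Museum endpoint, and the function names are hypothetical.

```python
PREFIXES = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"


def label_search_query(keyword, limit=10):
    """Stage (a): look for entities whose label contains the keyword.

    This forces the endpoint to test every label, so it may be slow.
    """
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + "SELECT ?entity ?label WHERE {\n"
        + "  ?entity rdfs:label ?label .\n"
        + f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        + "}\n"
        + f"LIMIT {int(limit)}"
    )


def describe_query(uri, limit=100):
    """Stage (b): list the facts stated directly about a known URI."""
    # The URI is assumed to be well formed; no validation is done here.
    return (
        "SELECT ?property ?value WHERE {\n"
        + f"  <{uri}> ?property ?value .\n"
        + "}\n"
        + f"LIMIT {int(limit)}"
    )
```

The second query is the cheap one, since it starts from a fixed subject. The `LIMIT` clauses are there to keep either query from straining the endpoint.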

So what are the right questions?

Structured data requires a structured query – fair enough. However, what sort of useful or meaningful query can we formulate when the data, the schema used to represent the data, and the identifiers used within the data are all specified internally? In order to construct an access point into the data, it is helpful to have not just a common language, but a common (or at least public) identifier scheme: canonical ways of referencing the entities in the data, such as “Shakespeare” or “the Rosetta Stone”. Without knowing the appropriate URI or the exact textual form (is it “Rosetta Stone”, “The Rosetta Stone”, “the Rosetta stone”? would we get more results for “Rosetta”?) it is nigh on impossible to ask a SPARQL endpoint questions about the entity.
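To make the textual-form problem concrete, here is a minimal sketch of how a case-insensitive substring match over labels might be expressed. It assumes the endpoint supports the SPARQL 1.1 functions CONTAINS, LCASE and STR, and that the things of interest carry an rdfs:label; neither is confirmed for the BM endpoint, and the function name label_search_query is made up for illustration.

```python
def label_search_query(keyword: str, limit: int = 20) -> str:
    """Build a SPARQL query that matches labels containing `keyword`, ignoring case.

    Assumptions (not confirmed for any particular endpoint): SPARQL 1.1
    string functions are available and entities are labelled with rdfs:label.
    """
    # Lower-case the keyword so "Rosetta Stone" and "the Rosetta stone" match alike.
    needle = keyword.strip().lower()
    # Escape backslashes and double quotes so the keyword is a valid SPARQL string literal.
    needle = needle.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER CONTAINS(LCASE(STR(?label)), "{needle}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(label_search_query("Rosetta"))
```

Without a full-text index, a filter like this generally has to examine every label in the store, which is one reason a separate keyword index over the literals is attractive.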

So how is one supposed to use a SPARQL endpoint? It is not a good medium for asking general questions or performing wide-ranging searches of the data. Instead it seems like a good way to link up records from different informational silos (BM, BL, NLS, RSC…) that share a common identifier scheme. If we know the canonical name of a work (“Macbeth”) or the ISBN of a particular edition, then we can start to link up these disparate sources of data on such entities.

But the variety of translations, the plurality of editions (which will only increase) and other degrees of freedom make it hard to perform an exhaustive analysis/usage of the data. In the case of the BM, who might have unique objects we don’t know we want to see, the way to find them is through keyword search. It seems that only by going first through a search interface or other secondary resource can we identify the items we want to know about and how to refer to them.

What we have in common between different sources is the language or ontologies used to describe the schema (foaf, dc, etc) – but this is syntax rather than semantics; structure rather than content. To echo Ed Chamberlain’s comment, we have access to how data is described, but not so much to the data itself.

 

British Museum data

The approach we will use to harvest British Museum metadata related to Shakespeare is outlined below. It is essentially the same approach that Owen Stephens found workable in his post on SPARQL, and involves reference to a secondary authority (the BM collection search interface) to establish identifiers.

  1. Conduct a search for “Shakespeare” in the collections interface.
  2. Extract an object id from each result. For example, the Rosetta Stone has the object id 117631.
  3. Find the corresponding collection id from SPARQL with this query:
    SELECT * WHERE { 
       ?s <http://www.w3.org/2002/07/owl#sameAs> 
          <http://collection.britishmuseum.org/id/codex/117631> 
    }
  4. The result should be a link to metadata describing the object, and the object’s collection id (in this case YCA62958) can be extracted for use in further searches.
    http://collection.britishmuseum.org/id/object/YCA62958
  5. If there is a result, retrieve metadata about the object from the URL: http://collection.britishmuseum.org/description/object/YCA62958.rdf (or .html, .json, .xml)
  6. If there is no result, scrape metadata from the object’s description page in the collections interface. There is plenty of metadata available, but it is far less structured than RDF, being distributed through HTML.
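Steps 3 to 5 can be sketched in a few lines of Python. This is only an outline under stated assumptions: the URI patterns are copied from the steps above, the function names are hypothetical, and sending the query to the endpoint and parsing the response are left out.

```python
from typing import Optional

# URI patterns taken from the steps above.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CODEX_BASE = "http://collection.britishmuseum.org/id/codex/"
OBJECT_BASE = "http://collection.britishmuseum.org/id/object/"
DESCRIPTION_BASE = "http://collection.britishmuseum.org/description/object/"


def same_as_query(object_id: int) -> str:
    """Step 3: build the query that maps a database object id to a collection URI."""
    return f"SELECT * WHERE {{ ?s <{SAME_AS}> <{CODEX_BASE}{int(object_id)}> }}"


def collection_id(subject_uri: str) -> Optional[str]:
    """Step 4: extract the collection id from the ?s URI, or None if it is not an object URI."""
    if subject_uri.startswith(OBJECT_BASE):
        return subject_uri[len(OBJECT_BASE):] or None
    return None


def description_url(coll_id: str, fmt: str = "rdf") -> str:
    """Step 5: build the metadata URL in one of the formats listed above."""
    if fmt not in ("rdf", "html", "json", "xml"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{DESCRIPTION_BASE}{coll_id}.{fmt}"


if __name__ == "__main__":
    print(same_as_query(117631))
    cid = collection_id("http://collection.britishmuseum.org/id/object/YCA62958")
    if cid is not None:
        print(description_url(cid))
```

When the query returns no binding for ?s there is no collection id to extract, and the harvester has to fall back to scraping the description page as in step 6.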

This last step looks like it will be quite common as many of the Shakespeare-related results are portraits or book frontispieces which have no collection id. I am not sure whether this is an omission, or because they are part of another object, in which case it will require further querying to resolve the source object (if that is what we want to describe).

Another difficulty is that although Owen found a person-institution URI for Mozart, I cannot find one for Shakespeare. There is a rudimentary biography but little else, so we do not have a “Shakespeare” identifier for use in SPARQL searches.

Conclusion

Ultimately I am still finding it non-trivial and a bit hacky to identify, and ask questions about, the real Shakespeare through a SPARQL endpoint.

In summary:

  • SPARQL endpoint providers could provide more documentation and examples.
  • RDF stores allow us to ask structural questions, but semantic questions are much harder without knowing some URIs.
  • It is often necessary to make use of a secondary resource or authority in order to identify the entities we wish to ask about.
