Internet Librarian International 2014 (London Olympia, 21st-22nd October 2014)

I attended the Internet Librarian International (ILI) 2014 Conference at Olympia a couple of weeks ago and found the opportunity to talk about our experience of transforming SUNCAT, to learn about the latest library trends and to meet fellow Librarians very useful.

My presentation is available on the SUNCAT website but unfortunately the other presentations from the conference are password protected.

After mistakenly trying to join some of the numerous other conferences taking place at Olympia at the same time, the highlights for me included:

The opening keynote from Michael Edson of the Smithsonian, who talked about the dark matter of the Internet – the huge amount of cultural activity on the Internet which is valuable but difficult to capture, and so is not well covered by our cultural institutions. He made particular reference to the Vlog Brothers, whom you can check out at http://www.youtube.com/user/vlogbrothers

A number of the speakers spoke extensively about, or at least touched upon, the new and changing roles available to Librarians with the advent of new trends and technologies. Developments in publishing, open access, open source, mobile apps and research data management were highlighted by Brian Kelly (CETIS) as key findings from the 2014 NMC Horizon Report for Libraries.

Suzanne Enright from the University of Westminster described how they used Agile Methodology to develop a Virtual Research Environment, while Mary Antonesa from Maynooth University Library presented on the development of a simple directional app to help users find locations and items.

Ben Showers, Head of Scholarly and Library Futures at Jisc, encouraged us to follow three principles when collecting and measuring metrics:

  • Principle 1: Measure what really matters, not just what you can get data for
  • Principle 2: Don’t collect or measure if you are not going to act on it.
  • Principle 3: Make as much of your data available as possible.

And finally, of great interest was the presentation on engaging users in the tender exercise for a new LMS and discovery tool at the Open University. This included setting up a user panel, running interview and observation sessions and creating wireframe prototypes to gather initial feedback. Södertörn University in Sweden also conducted similar exercises with users and discovered the importance of:

  • Relevance ranking
  • Terminology is vital – we should avoid using too much library lingo in the discovery system
  • Facets should be highlighted so that users don’t overlook them

All very helpful as we continue to develop the SUNCAT service…

SUNCAT features at Interlend 2014

I thoroughly enjoyed the recent Interlend 2014 conference at the Carlton Highland Hotel in Edinburgh. Interlend is the annual conference for the Forum for Interlending and Information Delivery (FIL), which is an organisation for those involved in interlending and document supply, enabling them to exchange ideas and views and also to raise the profile of this area of work nationally and internationally.

This year’s conference took place on EDINA’s home turf in Edinburgh and featured an excellent range of talks focusing on marketing interlending services, developments to systems supporting interlending and case studies of evolving interlending services in practice. My highlights would have to include:

Anthony Brewerton, Head of Academic Services at the University of Warwick, who kicked off the conference with a lively and engaging tour of the key concepts to be considered when marketing and branding a library service. This included the ladder of loyalty – developing relationships with your customers until they become advocates of, then champions of, and finally partners in developing your service.

Ann Lees and Stephen Winch from NHS Education for Scotland Knowledge Services Group (NES KSG) recounted the trials of dealing with a “no copying” policy across NHS Scotland (NHSS), following the Scottish Government’s decision several years ago not to renew the then existing CLA licence. To compensate, a service was set up to provide copyright fee paid copies of material via the British Library. To streamline this process, NES KSG used the British Library’s API to enable NHSS users to make requests via the Knowledge Network search platform. Users can run a search on the Knowledge Network and, if no full text is available to them, a link to log in to the new Document Delivery service is displayed. The user is asked to fill in details about the reason for the request and their preferred delivery option, and the order is then placed via the British Library DDS API. NHSS librarians also receive email copies of the requests and go into the system to approve them. The system went live earlier this year and usage is gradually taking off. However, since June this year a revised CLA licence has been signed, so restricted copying is now also available within NHSS.
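
The integration Ann and Stephen described is essentially a hand-off from the search platform to the document supply service once full text turns out to be unavailable: capture the request details from the user, then place the order over the API. A minimal sketch of that hand-off in Python – the endpoint URL, field names and authentication here are illustrative placeholders, not the actual British Library DDS API schema:

```python
import requests

# Placeholder endpoint and key -- the real BLDSS API has its own URL,
# authentication scheme and request schema, which are not reproduced here.
DDS_ENDPOINT = "https://example.org/document-delivery/orders"
API_KEY = "not-a-real-key"

def place_request(item, reason, delivery_option, requester_email):
    """Submit a copyright-fee-paid copy request and return the supplier's order reference."""
    payload = {
        "item": item,                  # e.g. title, journal, year, pages
        "reason": reason,              # captured from the user's request form
        "delivery": delivery_option,   # e.g. "electronic" or "post"
        "requester": requester_email,
    }
    response = requests.post(
        DDS_ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["order_id"]   # placeholder field name
```

In the live service the librarians’ approval step and the email notifications sit around a call like this rather than inside it.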

I feared that a presentation on copyright could be rather dry, but Emily Stannard, the Copyright & Compliance Officer from the University of Reading, gave an engaging and informative update on the current status of key copyright developments in the UK, particularly the copyright exceptions which came into force at the start of June 2014. These include:

Supplying single copies of published works to (non-profit) libraries and to library users. No contract or individual licence can override this exception, which could have implications for those libraries looking to fulfil ILL requests via copies of articles from e-journals. Potentially libraries would not need to check individual licences before supplying copies. Emily advised us to keep our eyes peeled for more information on this topic.

Other exceptions include:

  • Preservation copying covers all works
  • No requirement for paper copyright declarations – an online declaration with a checkbox or digital signature is now sufficient
  • Libraries can copy all types of work for persons doing non-commercial research/private study
  • Text and data mining for non-commercial purposes
  • Accessible copies for disabled people
  • Making works available on dedicated terminals (providing there is no contract saying you can’t)

Marjory Lobban’s (Document Delivery Supervisor at the University of Edinburgh) review of interlending at the University of Edinburgh was set against the backdrop of the changing environment in which the library operates within the University, with more online courses, more distance learners, more students overall and fewer library sites.

ILL requests followed a downward trend from the late 1990s to the early 2000s with the emergence of e-journals. Figures started to level out again when the University started using WorldShare in 2007, and began to increase in 2010 when the University started using Iliad, which brought more exposure to overseas libraries and was accompanied by a move to online requesting, streamlining the process for users and ILL staff. An increasing number of supplies to the University now come from overseas libraries, so ILL requests are often sent straight overseas rather than to the British Library or other UK libraries. Lending to overseas libraries is also increasing.

Future plans include looking at pay-per-view options where full text isn’t immediately available to the user, purchasing items when that is cheaper than the interlending option, and rebranding the ILL service.

I also gave a presentation focusing on the new SUNCAT service, including:

  • Background and context to the recent redevelopment
  • Highlighting the key features which can be found on the new service
  • Describing how SUNCAT can assist end users, library staff and in particular ILL staff
  • A live demo of the new service
  • An update on future plans for the service

Attending Interlend 2014 not only let me introduce the new SUNCAT interface to one of our valued user groups, but also gave me a better picture of what is happening, and of some key priorities, in the world of interlending – all very helpful as we consider how to continue to develop the SUNCAT service.

The presentations for all the sessions will soon be available on the FIL website.

Trading Consequences at the Geospatial in the Cultural Heritage Domain Event

Earlier this month Claire Grover, one of the Trading Consequences team based at the University of Edinburgh School of Informatics, gave a presentation on the project at the JISC GECO Geospatial in the Cultural Heritage Domain event in London.

The presentation gives a broad overview of the Trading Consequences project and the initial text mining work that is currently taking place. The slides are now up on SlideShare and the audio recording of Claire’s talk will also be available here shortly.

You can also read a liveblog of all of the talks, including Claire’s, over on the JISC GECO blog.


Chalice at WhereCamp

I was lucky enough to get to WhereCamp UK last Friday/Saturday, mainly because Jo couldn’t make it. I’ve never been to one of these unconferences before but was impressed by the friendly, anything-goes atmosphere, and emboldened to give an impromptu talk about CHALICE. I explained the project setup, its goals and some of the issues encountered, at least as I see them –

  • the URI minting question
  • the appropriateness (or lack of it) of only having points to represent regions instead of polygons
  • the scope for extending the nascent historical gazetteer we’re building and connecting it to others
  • how the results might be useful for future projects.

I was particularly looking for feedback on the last two points: ideas on how best to grow the historical gazetteer and who has good data or sources that should be included if and when we get funding for a wider project to carry on from CHALICE’s beginnings; and secondly, ideas about good use cases to show why it’s a good idea to do that.

We had a good discussion, with a supportive and interested audience. I didn’t manage to make very good notes, alas. Here’s a flavour of the discussion areas:

  • dealing with variant spellings in old texts – someone pointed out that the sound of a name tends to be preserved even though the spelling evolves, and maybe that can be exploited (see the sketch after this list);
  • using crowd-sourcing to correct errors from the automatic processes, plus to gather further info on variant names;
  • copyright and IPR, and the fact that being out of print copyright doesn’t mean there won’t be issues around digital copyright in the scanned page images;
  • whether or not it would be possible – in a later project – to do useful things with the field names from EPNS;
  • the idea of parsing out the etymological references from EPNS, to build a database of derivations and sources;
  • using the gazetteer to link back to the scanned EPNS pages, to assist an online search application.
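
On the variant spellings point, one simple way to exploit the observation that the sound of a name outlives its spelling is phonetic keying: collapse each name to a rough sound-based code and compare keys rather than raw strings. A toy Python sketch of a Soundex-style key – purely illustrative, not something the project has built:

```python
def soundex(name: str) -> str:
    """Very rough Soundex-style phonetic key: first letter plus up to three digit codes."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    key, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]

# An early-style spelling and the modern name share a key:
print(soundex("Cestrefeld"), soundex("Chesterfield"))   # C236 C236
```

Anything this crude would need combining with the crowd-sourced corrections mentioned above, but it gives a feel for how sound-based matching could help.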

Plenty of use cases were suggested, and here are some that I remember, plus ideas about related projects that it might be good to tie up with:

  • a good gazetteer would aid research into the location of places that no longer exist, e.g. from the Domesday period – if you can locate historical placenames mentioned in the same text you can start narrowing down the likely area for the mystery places;
  • the library world is likely to be very interested in good historical gazetteers, a case mentioned being the Alexandria Library project sponsored by the Library of Congress amongst others;
  • there are overlaps and ideas to share with similar historical placename projects like Pleiades, Hestia and GAP (Google Ancient Places).

I mentioned that, being based in Edinburgh, we’re particularly keen to include Scottish historical placenames. There are quite a few sources and people who have been working for ages in this area – that’s probably one of the next things to take forward, to see if we can tie up with some of the existing experts for mutual benefit.

There were loads of other interesting presentations and talk at WhereCamp… but this post is already too long.

Connecting archives with linked geodata – Part I

This is the first half of the talk I gave at FOSS4G 2010, covering the Chalice project and the Unlock services. Part II to follow shortly…

My starting talk title, written in a rush, was “Georeferencing archives with Linked Open Geodata” – too many geos; though perhaps they cancel one another out, and just leave *stuff*.

In one sense this talk is just about place-name text mining. Haven’t we seen all this before? Didn’t Schuyler talk about Gutenkarte (extracting place-names from classical texts and exploring them using a map) in like, 2005, at OSGIS before it was FOSS4G? Didn’t Metacarta build a multi-million business on this stuff and succeed in getting bought out by Nokia? Didn’t Yahoo! do good-enough gazetteer search and place-name text mining with Placemaker? Weren’t *you*, Jo, talking about Linked Data models of place-names and relations between them in 2003? If you’re still talking about this, why do you still expect anyone to listen?

What’s different now? One word: recursion. Another word: potentiality. Two more words: more people.

Before i get too distracted, i want to talk about a couple of specific projects that i’m organising.

One of them is called Chalice, which stands for Connecting Historical Authorities with Linked Data, Contexts, and Entities. Chalice is a text-mining project, using a pipeline of Natural Language Processing and data munging techniques to take some semi-structured text and turn the core of it into data that can be linked to other data.

The target is a beautiful production called the English Place Name Survey. This is a definitive-as-possible guide to place-names in England, their origins, the names by which things were known, going back through a thousand years of documentary evidence, reflecting at least 1500 years of the movement of people and things around the geography of England. There are 82 volumes of the English Place Name Survey, which started in 1925 and is still being written (and once it’s finished, new generations of editors will go back to the beginning and fill in more missing pieces).

Place-name scholars amaze me. Just by looking at words and thinking about breaking down their meanings, place-name scholars can tell you about drainage patterns, changes in the order of political society, why people were doing what they were doing, where. The evidence contained in place-names helps us cross the gap between the archaeological and the digital.

So we’re text mining EPNS and publishing the core (the place-name, the date of the source from which the name comes, a reference to the source, references to earlier and later names for “the same place”). But why? Partly because the subject matter, the *stuff*, is so very fascinating. Partly to make other, future historic text mining projects much more successful, to get a better yield of data from text, using the one to make more sense of the other. Partly just to make links to other *stuff*.
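
To make that “core” concrete, here is one way a single extracted attestation could be shaped as data – the field names are my own illustration, not the project’s actual schema or vocabulary:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Attestation:
    """One attested name for a place, as mined from an EPNS entry (illustrative fields only)."""
    name: str                               # the name as it appears in the source
    source_date: Optional[str] = None       # date (or range) of the documentary source
    source_reference: Optional[str] = None  # citation for that source
    major_name: Optional[str] = None        # the contemporary "major name" it belongs to
    earlier_forms: list[str] = field(default_factory=list)  # links to earlier names for "the same place"
    later_forms: list[str] = field(default_factory=list)    # and to later ones

# A made-up example record, just to show the shape:
example = Attestation(
    name="Stanford",
    source_date="c. 1150",
    source_reference="hypothetical charter reference",
    major_name="Stamford",
)
```

The real output is linked data rather than Python objects, but the core fields are the ones listed above.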

In newer volumes the “major names”, i.e. the contemporary names (or the last documented name for places that have become forgotten), have neat point-based grid references, so they come geocoded. The earliest works have no such helpful metadata. But we have the technology; we can infer it. Place-name text mining, as my collaborators at the Language Technology Group in the School of Informatics in Edinburgh would have it, is a two-phase process. The first phase is “geo-tagging”, the extraction of the place-names themselves, using techniques that are either rule-based (“glorified regular expressions”) or machine-learning based (“neural networks” for pattern recognition, like spam filters, which need a decent volume of training data).
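
A toy illustration of the rule-based end of that spectrum – nothing like the Language Technology Group’s actual pipeline, just a “glorified regular expression” that grabs capitalised words following locational prepositions:

```python
import re

# Naive rule: one or two capitalised words following a locational preposition.
# Real geo-tagging uses far richer rules or trained models; this is only a sketch.
PLACE_PATTERN = re.compile(r"\b(?:at|in|near|of)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")

text = "The manor lay near Bolton, in the parish of Wharram Percy."
print(PLACE_PATTERN.findall(text))   # ['Bolton', 'Wharram Percy']
```

The machine-learning flavour swaps the hand-written pattern for a trained sequence tagger, at the cost of needing that decent volume of training data.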

The second phase is “geo-resolution”: given a set of place-names and relations between them, figuring out where they are. The assumption is that places cluster together in space much as they do in the text, and on the whole that works out better than other assumptions. As far as i can see, the state of the research art in Geographic Information Retrieval is still fairly limited to point-based data, projections onto a Cartesian plane. This is partly about data availability, in the sense of access to data (lots of research projects use geonames data for its global coverage, open license, and linked data connectivity). It’s partly about data availability in the sense of access to thinking. Place-name gazetteers look point-based, because the place-name on a flat map begins at a point on a Cartesian plane. (So many place-name gazetteers are derived visually from the location of strings of text on maps; they are for searching maps, not for searching *stuff*.)
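
A crude sketch of the clustering intuition behind geo-resolution: for each extracted name take its candidate gazetteer points, then pick the combination of candidates with the smallest total pairwise distance. The candidate coordinates below are invented for illustration, the brute-force search only suits small sets, and treating latitude/longitude as a flat plane is exactly the Cartesian assumption just described:

```python
from itertools import product
from math import dist

# Hypothetical candidate (lat, lon) points per extracted name -- invented for illustration.
candidates = {
    "Ashton":      [(53.49, -2.10), (51.43, -2.67)],   # two plausible Ashtons
    "Dukinfield":  [(53.47, -2.08)],
    "Stalybridge": [(53.48, -2.06)],
}

def resolve(candidates):
    """Pick one candidate per name so the chosen points are as tightly clustered as possible."""
    names = list(candidates)
    best, best_cost = None, float("inf")
    for combo in product(*(candidates[n] for n in names)):
        # total pairwise (planar) distance between the chosen points
        cost = sum(dist(a, b) for i, a in enumerate(combo) for b in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(names, best))

print(resolve(candidates))
# {'Ashton': (53.49, -2.1), 'Dukinfield': (53.47, -2.08), 'Stalybridge': (53.48, -2.06)}
```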

So next steps seem to involve:

  • dissolving the difference between narrative, and data-driven, representations of the same thing
  • inferring things from mereological relations (containment-by, containment-of) rather than sequential or planar relations

On the former – data are documents, documents are data.

On the latter, this helps explain why i am still talking about this, because it’s still all about access to data. Amazing things that i barely expected to see so quickly have happened since i started along this path 8 years ago. We now have a significant amount of UK national mapping data available on properly open terms, enough to do 90% of things. OpenStreetMap is complete enough to base serious commercial activity on; MapQuest is investing itself in supporting and exploiting OSM. Ordnance Survey Open Data adds a lot of as yet hardly tapped potential…

Read more, if you like, in Connecting archives with linked geodata – Part II which covers the use of and plans for the Unlock service hosted at the EDINA data centre in Edinburgh.

Chalice poster from AHM 2010

Chalice had a poster presentation at the All Hands Meeting in Cardiff; the poster session was an evening over drinks in the National Museum of Wales, and all very pleasant.

Chalice poster

View the poster on Scribd and download it from there if you like, but be aware the full size version is rather large.

I’ve found the poster very useful; I projected it instead of presentation slides while I talked at FOSS4G and at the Place-Names workshop in Nottingham on September 3rd.

Posters and presentations

Happy to have had CHALICE accepted as a poster presentation for the e-Science All Hands Meeting in Cardiff this September. It will be good to have a glossy poster. Pleased to have been accepted at all, as the abstract was rather scrappy and last-minute. I had a chance to revise it, and have archived the PDF abstract.

I’m also doing a talk on CHALICE, related work and future dreams, at the FOSS4G 2010 conference in Barcelona a few days earlier. Going to be a good September, I hope.