Data Visualisation Talk by Martin Hawksey

Today EDINA is hosting a talk by Martin Hawksey on data visualisation. He has posted a whole blog post on this, which includes his slides, so I won’t be blogging verbatim but hoping to catch key aspects of his talk.

Martin will be talking about achievable and effective ways to visualise data. He’s starting with John Snow’s 1850s map of cholera deaths, which identified the epicentre of the outbreak by mapping where people died. And on an information literacy note, you do need to know how to find the story in the graphics. Visualisation takes data, takes stories, and turns them into something of a narrative, explaining the data and enabling others to explore it.

Robin Wilson georeferenced that original Snow data, then Simon Rogers (formerly of the Guardian, latterly of Twitter) put the data into CartoDB. This reinterpretation of the data really makes the infected pump jump out at you; the different ways of visualising that data make the story even clearer.

Not all visualisations work; you may need narration. Graphics may not be meaningful to all people in the same way, e.g. the location of the pumps on these two maps. So this is where we get into theory. Bertin, a French cartographer, came up with his own system of points, lines, symbols etc., but it was not based on research; it was his own cheat system. If you look at Gestalt psychology you get more research-based visualisations: laws of similarity, proximity, continuity. There is something natural about where the eye is drawn but there is theory behind that too.

John Snow’s map was about explaining and investigating the data. His maps were explanatory visualisations, and we have that same idea in Simon Rogers’s map, but it is also an exploratory visualisation: the reader/viewer can interact with it and interrogate it. But there are limitations to both approaches. Both maps are essentially heat maps, showing more of something (in this case deaths). And in visualisations you often get heat maps that actually map population rather than trends. Tony Hirst says “all charts are lies”. They are always an interpretation of the data from the creator’s point of view…

So going back to Simon Rogers’s map, we see that the radius of each dot is based on the number of deaths. Note from the crowd: “How to Lie with Statistics”. Yes, a real issue is that a lot of the work to get to that map is hidden, so there is lots of room for error and confusion.
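One place where that hidden work matters is how a count becomes a dot size. The talk did not say how the CartoDB map scales its dots, so the following is only a sketch of why the choice matters, using made-up counts and hypothetical function names. If the radius grows linearly with the count, the dot's area grows with the square of the count and large values look much bigger than they are; taking a square root keeps the area proportional to the count.

```python
import math


def radius_for_area(value, max_value, max_radius=20.0):
    """Radius chosen so that the circle's AREA is proportional to value."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * math.sqrt(value / max_value)


def radius_linear(value, max_value, max_radius=20.0):
    """Radius proportional to value, so the area grows with value squared."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * value / max_value


# Made-up death counts, purely for illustration.
deaths = [1, 4, 16]
for count in deaths:
    print(count,
          round(radius_for_area(count, max(deaths)), 2),
          round(radius_linear(count, max(deaths)), 2))
```

With these numbers a location with a quarter of the maximum count gets half the maximum radius under area scaling, but only a quarter of it under linear scaling, so the same data can look quite different.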

So having flagged up some examples and pitfalls I want to move on to the process of making data visualisations. Tools include Excel, CartoDB, Gephi, IBM Many Eyes, etc. but in addition to those tools and services you can also draw. Even now so many visualisations are made via drawing, if only for final tweaking. Sometimes a sketch of a visualisation is the way to prototype ideas too. There are also code options: D3.js, Sigma.js, R, ggplot, etc.

Some issues around data: data access can be an issue; it can be hard to find data and hard to identify source data etc. Tony Hirst really recommends digging around for feeds, for RSS, to find the stuff that feeds and powers pages. There are tools for reshaping feeds and data, places like Yahoo Pipes, which lets you do drag and drop programming with input data. And I’ve started touching upon data shapes. Data may be provided in certain ways or shapes, but that may not suit your use. So a core skill is transforming data to reshape it, with tools like Yahoo Pipes and OpenRefine, which also lets you clean up data. I’ve tried OpenRefine with public JISCMail lists, to normalise for those with multiple user names.
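The normalisation step mentioned for the mailing list data can be sketched in a few lines of Python. This is a minimal illustration only, not what OpenRefine does internally: it assumes that a cleaned-up email address identifies a person, and the names and addresses are made up.

```python
from collections import Counter, defaultdict

# Made-up (display name, email) pairs as they might appear in a list archive.
posts = [
    ("Jane Smith", "J.Smith@example.ac.uk"),
    ("J. Smith", "j.smith@example.ac.uk "),
    ("Sam Jones", "sam.jones@example.ac.uk"),
]


def normalise_key(email):
    """Collapse case and stray whitespace so variants map to one key."""
    return email.strip().lower()


def count_posts(posts):
    """Return posts per normalised address, plus the name variants seen."""
    counts = Counter()
    names = defaultdict(set)
    for name, email in posts:
        key = normalise_key(email)
        counts[key] += 1
        names[key].add(name)
    return counts, names


counts, names = count_posts(posts)
for key, total in counts.most_common():
    print(key, total, sorted(names[key]))
```

Keeping the set of name variants alongside the count makes it easy to check by eye that the merge has not joined two different people.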

So now the fun stuff…

For the Cultural Olympiad in Scotland last year we had #citizenrelay tracking the progress of the Olympic torch. So lots of data to play with. First up, a Twitter (Topsy) media timeline. It uses Timeline by Verite plus Topsy data. This was really easy to do. Data access was via Topsy, which pulls in data from Twitter to make its own archive. It has an API to give access to that data, which makes it easy to query for media against a hashtag. It can return data in XML but I grabbed it as JSON. The output was then created with TimelineJS. You can also use the Google Spreadsheet template from TimelineJS (filled in manually or automatically). I used a spreadsheet here, with Yahoo Pipes to manipulate the data. You can pull data in with Google Spreadsheets; once you’ve created the formula it will constantly refresh and update. So the timeline self-updates once published.
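The reshaping step in that chain, from a JSON list of media tweets to spreadsheet rows a timeline can read, might look roughly like this in Python. Everything specific here is an assumption: the JSON shape is made up for illustration and is not Topsy's actual response format, and the column names are modelled on the TimelineJS spreadsheet template and should be checked against the template you actually copy.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical JSON, standing in for whatever the search API returns.
SAMPLE = """[
  {"posted": 1339495200, "author": "example_user",
   "text": "Torch arriving now #citizenrelay",
   "media_url": "http://example.org/photo.jpg"}
]"""

# Assumed column names; verify against the timeline template in use.
COLUMNS = ["Start Date", "Headline", "Text", "Media"]


def to_timeline_rows(raw_json):
    """Turn each JSON item into one spreadsheet row (a dict per row)."""
    rows = []
    for item in json.loads(raw_json):
        when = datetime.fromtimestamp(item["posted"], tz=timezone.utc)
        rows.append({
            "Start Date": when.strftime("%Y-%m-%d %H:%M:%S"),
            "Headline": "@" + item["author"],
            "Text": item["text"],
            "Media": item["media_url"],
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text, ready to import into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_timeline_rows(SAMPLE)
print(to_csv(rows))
```

Writing the rows out as CSV also gives you a flat copy of the data that does not depend on the source API staying available.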

Originally Topsy allowed data access without an API key but now they require one. Google Apps Script, which is JavaScript based (see the big Stack Overflow community), has a curl-like function for fetching URLs and dumping the results back into a spreadsheet. I have also done this with Yahoo Pipes (use the Rate module for the API key aspect).

Next, as the relay went around the country they used AudioBoo. When you upload, AudioBoo geolocates your Boos. AudioBoo has an API (without a key) and you can filter for a tag. You can get the data out as XML, JSON or CSV, but they also produce KML. If you can access a public KML file and paste its URL into the Google Maps search box then it just gives you the map. You can then embed it, or share a link to that file. So a super easy visualisation there. Disappointingly it didn’t embed audio in the map pins, but that’s a Google Maps limitation. Google Earth does let you do that though…
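If a service only gives you JSON or CSV with coordinates, a KML file like the one described above is not hard to produce yourself. The sketch below uses Python's standard library to write one Placemark per item; the item fields (`title`, `lat`, `lon`) and the sample values are made up and are not AudioBoo's actual field names. Note that KML lists coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(items):
    """Return KML text with one Placemark per geotagged item."""
    # Serialise KML elements without a namespace prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for item in items:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = item["title"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude,latitude.
        coords = f'{item["lon"]},{item["lat"]}'
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Made-up sample item, for illustration only.
items = [{"title": "Relay update, Edinburgh", "lat": 55.95, "lon": -3.19}]
kml_text = build_kml(items)
print(kml_text)
```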

So using Google Earth we only have a bit of work to do. We need to work out the embed code. Google now provides a template that lets you bring in placemark data (placemark templates). You can easily make changes here, and you can choose how to format variables. You can fill it in manually but it can also be done automatically, so I use Google Apps Script here. It goes to the AudioBoo API, grabs the data as JSON, then parses it. Then each item is pushed to the spreadsheet. So for partial geodata these Google templates are really useful. Something else to mention: Google Spreadsheets are great, they sit in the cloud. But recently I was using Kasabi and it went down… and everything relying on it went down too. Sometimes it is useful to take a flat capture as a spreadsheet for backup.

So the next visualisation… used NodeXL (SNA). This is an open source plug-in for Excel. It has a number of data importers, including for Twitter, Facebook, MediaWiki, etc. (just from the menu). And it has lots of room for reformatting etc., then a grid view from that.

And this is where we start chaining tools together. So I had Twitter data, and I had NodeXL to identify community (who follows who, who is friends with who), so I used Gephi, which lets you start using network graphs. A great way to see how nodes relate to each other. It is often used for Social Network Analysis but people have also used it for cocktail recipes (there’s an academic paper on it). There is a recipe site that lets you reform recipes using the same approach. Gephi is another tool where you spend an hour playing… and then wonder about how to convey it to others, and you can end up with a flat graphic. So I created something called TAGSExplorer to let anyone interact, and there are others who have done similar.
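Chaining tools like this usually comes down to getting the data into a shape the next tool accepts. As a rough sketch, the Python below turns tweets into a weighted edge list with `Source`, `Target` and `Weight` columns, which is a CSV layout Gephi can import. It uses @mentions as a stand-in for the follower and friend relationships described above, and the tweets are made up.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

# Made-up (author, text) pairs, for illustration only.
tweets = [
    ("alice", "Great session with @bob and @carol #citizenrelay"),
    ("bob", "@alice thanks! cc @carol"),
    ("alice", "@bob slides are up"),
]


def mention_edges(tweets):
    """Count directed author -> mentioned-user edges, ignoring self-mentions."""
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:
                edges[(source, target)] += 1
    return edges


def edges_to_csv(edges):
    """Write the edges as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


edges = mention_edges(tweets)
print(edges_to_csv(edges))
```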

Another example here. A network of those using the #ukoer hashtag, looking for bridges in the community, the key people. This is an early visualisation I created. It was generated from Twitter connections and tag use with Gephi, but then combined and finished in a drawing package.
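One common way to put a number on "bridges" is betweenness centrality: how many shortest paths between other people pass through a given person. The talk did not say which measure was used for the #ukoer graph, so treat this as one possible approach rather than the method behind that visualisation. The sketch implements Brandes' algorithm for an unweighted, undirected graph on a made-up network of two tight groups joined by a single person, `x`.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


# Two triangles (a, b, c) and (d, e, f) connected only through x.
graph = build_graph([
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "x"), ("x", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
])
scores = betweenness(graph)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

In this toy network `x` scores highest because every path between the two groups has to go through it, which is the sort of person a bridge-finding visualisation is trying to surface.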

This is another example looking at different sources. A bubble chart for click-throughs of tweets. You can get a degree of that info from bit.ly. But if you use another service it’s hard to get click-throughs; however, you can see referrals in Google Analytics. Each Twitter URL is unique to each person who tweets it, so you can therefore see the click-through rate for an individual tweet. This is created in a Google Spreadsheet. You can explore it interactively and reshape it for your own exploration. So this spreadsheet goes and uses the Google Analytics API and the Twitter API, then combines the results with some reshaping. One thing to be aware of is that spreadsheets have a duality of values and formulae. So when you call on APIs etc. it can get confusing. So it is sometimes good to use two sheets, the second for manipulation. There’s a great blog post on this duality, “spreadsheet addiction”. If you are at IWMW next week I’m doing a whole session on Google Analytics data and reshaping.
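The combining step is essentially a join on the shortened URL. The Python below is a sketch of that idea only: the referral paths, visit counts and tweet records are made up, and it assumes the analytics export gives one row per referring short-URL path, which may not match what the Google Analytics API actually returns.

```python
from urllib.parse import urlparse

# Made-up analytics export: referring short-URL path -> visits.
referrals = {"/AbC123": 42, "/XyZ789": 7}

# Made-up tweets, each carrying its own unique shortened link.
tweets = [
    {"author": "alice", "short_url": "http://t.co/AbC123"},
    {"author": "bob", "short_url": "http://t.co/XyZ789"},
    {"author": "carol", "short_url": "http://t.co/NoHits0"},
]


def clicks_per_tweet(tweets, referrals):
    """Attach a click count to each tweet by matching on the URL path."""
    combined = []
    for tweet in tweets:
        path = urlparse(tweet["short_url"]).path
        # Tweets with no recorded referrals get zero clicks.
        combined.append({**tweet, "clicks": referrals.get(path, 0)})
    return combined


combined = clicks_per_tweet(tweets, referrals)
for row in sorted(combined, key=lambda r: -r["clicks"]):
    print(row["author"], row["clicks"])
```

Doing the join in a separate step, on a copy of the fetched values, mirrors the advice above about keeping a second sheet for manipulation.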

Q&A

Comment: study/working group on social network analysis, some of these techniques could be built onto our community of expertise here.

Comment: would have to slow way down for me but hopefully we can devise materials and workshops to make these step by step.

Martin: But there are some really easy wins, like that Google Maps one. And there is a good community of support around these tools. But for instance R, if I ask on Stack Overflow then I will get an answer back.

Q: Is there a risk that if you start trying to visualise data you might miss out on proper statistical processes and rigour?

Martin: Yes, that is a risk. People tend to be specialists in one area rather than all of them. Manchester Metropolitan use R as part of analysis of student surveys, recruitment etc. This was from an idea of Mark Stubbs, head of eLearning, raised by speaking to specialist in Teridon flight. R is widely used in the sciences and increasingly in big data analysis. So there it started with an expert who did know what he was doing.

Q: Have you done much with data mining or analysis, like Google Ngram?

Martin: not really. Done some work on sentiment analysis and social network data though.


GeoForum 2013 LiveBlog

GeoForum 2013 takes place at the Congress Centre in London from 10am until 4.15pm tomorrow. Throughout the day we will be liveblogging so, whether you are able to join us or not, we suggest you bookmark this post (link here) and take a look late tomorrow morning for notes from Shelley Mosco’s keynote. Keep an eye on the same post throughout the day as it will be updated after every session. We also welcome your comments (below) whether during or after the event.

You can also take part in GeoForum 2013 via our Twitter hashtag, #geoforum2013, where you are welcome to comment, contribute and engage with the Digimap team and our GeoForum attendees. We will also be tweeting key updates, images and notes from the event so if you don’t already follow @EDINA_Digimap, now’s the time to do it!

The schedule for the day, which will be reflected by our updates, is:

10:00 Registration with Refreshments
10:30 Welcome
10:45 Keynote Speaker – Shelley Mosco
11:30 Break
11:40 OpenSource Geo Resources
12:15 Lunch, including Service Demonstrations
14:00 Fieldtrip GB Excursion / EDINA Geoservices Review
14:50 Break with Refreshments
15:10 EDINA Geoservices Review / Fieldtrip GB Excursion
16:00 Closing Remarks
16:15 Close


SUNCAT updated

SUNCAT has been updated. Updates from the following libraries were loaded into the catalogue last week. The dates displayed indicate when the files were received by SUNCAT.

  • Aberystwyth University (15 May 13)
  • CONSER (05 Jun 13)
  • Dundee University (03 Jun 13)
  • Kent University (01 Jun 13)
  • London School of Economics and Political Science (07 May 13)
  • National Library of Scotland (03 Jun 13)
  • Nottingham University (03 Jun 13)
  • Oxford University (24 May 13)
  • Southampton University (02 Jun 13)
  • Swansea University (15 May 13)

To check on the currency of other libraries on SUNCAT please check the updates page for further details.

SUNCAT is the Serials Union Catalogue for the UK. Visit the service at http://www.suncat.ac.uk.

UK Survey of Academics: implications for SUNCAT

Whilst it is the norm for services such as SUNCAT to carry out an annual survey (reports are here) of users and their use of the service, it is also most important for those running such services to be aware of how users (actual and potential) approach resource discovery generally. The report entitled UK Survey of Academics 2012, funded by Jisc and RLUK and carried out by Ithaka S+R, is therefore of considerable interest. The report details the findings from a survey of a sample of UK academic staff, with just under 3,500 responses received.
One chapter, of particular significance from a SUNCAT perspective, is that entitled Providing materials to academics: formats and sources.  One most interesting finding is:

“In the case of journal collections, about half of all respondents–slightly more in the arts and humanities than in other fields–strongly agreed that they ‘often would like to use journal articles that are not in [their] library’s print or digital collections.’” (p.38)

Given that SUNCAT’s principal raison d’être is to provide information on the serials’ holdings (print and digital) of major research libraries (there are currently 90 Contributing Libraries) this makes welcome reading.  Of some worry, though, is the response that, when locating information at the outset of research: 

“Overall, the largest share of respondents–about 40%–indicated that they begin their research processes at a general purpose search engine on the internet or world wide web. A slightly smaller share–about one-third of respondents–indicated that they begin their research at a specific electronic research resource/computer database. A relatively smaller share–slightly less than 15% each–of respondents reported starting with an online library catalogue or a national or international catalogue or database”.  (p.21)

For a service such as SUNCAT it is vital for all potential users to know it exists and what facilities it provides.  SUNCAT is assisted in alerting users to the existence of the service by information provided on institutional websites and EDINA is very grateful to the many institutions who have promoted the service in this way.  To assist library staff, a leaflet outlining specific ways some organisations had promoted the service was distributed in 2011 and proved to be a useful source of information.  We will be looking anew at the information in the leaflet and updating it where appropriate.  We will also be looking at other ways of promoting the service and bringing it to the attention of potential users and would, of course, welcome any suggestions on ways we might consider. 
There was another response in the report of considerable interest to SUNCAT.

Roughly 3 out of 5 respondents indicated that they often or occasionally use library-provided inter-library loan or document delivery services to access journal articles and monographs. (p.39)

The importance of serving inter-library loan staff has long been recognised.  In the recent survey for the provision of feedback on the new interface  there were requests for the inclusion of British Library Codes and email addresses.  BL Codes will be made available in the initial release of the service and it is hoped to provide the email addresses in a future release of the software.

The report, therefore, is of much interest to SUNCAT.  Whilst it does reinforce some of the reasons for the establishment of the service in the first place, it also is a prompt to us to review our promotional activities to try and ensure that all who might have reason to use the service know about its existence.

SUNCAT updated

SUNCAT has been updated. Updates from the following libraries were loaded into the catalogue in the last two weeks. The dates displayed indicate when the files were received by SUNCAT.

  • Aberdeen University (01 Apr 13)
  • Bradford University (07 May 13)
  • CONSER (29 May 13)
  • Durham University (22 May 13)
  • Edinburgh University (21 May 13)
  • Essex University (03 May 13)
  • Hull University (24 Apr 13)
  • ISSN (16 May 13)
  • Leicester University (24 May 13)
  • Manchester University (21 May 13)
  • National Art Library (16 May 13)
  • Natural History Museum (25 May 13)
  • Queen Mary, University of London (15 May 13)
  • School of Oriental & African Studies, University of London (15 May 13)
  • Southampton University (26 May 13)
  • University College London (07 May 13)
  • Zoological Society of London (30 May 13)

To check on the currency of other libraries on SUNCAT please check the updates page for further details.

SUNCAT is the Serials Union Catalogue for the UK. Visit the service at http://www.suncat.ac.uk.

Summer weird and wonderful titles

Summer is here at last, though not today in Edinburgh! Take a look at some of the weird and wonderful titles we have in SUNCAT.
  • Ally Sloper’s summer number.
  • Annual report of summer tent and open air campaign.
  • Bus enthusiast summer special.
  • The ’Pink ’un’ summer annual, 1913, ed. by W.F. De Wend Fenton. More nuts in a nutshell.
  • Report of the Men of the Trees Summer School and Conference ... 1938 [etc.].
  • Spring into summer.
  • The summer hog outlook.
  • Summer grilling.
  • The summer sheep and wool outlook.
  • Worzel Gummidge summer special.
  • The Alps : (A magazine of light literature and useful information for the summer tourist).
  • Machine knitting news. Summer tops collection.
  • Summer salt :
  • Hello! : this is sunny Rhyl.

For more serials with a summer theme, and other weird and wonderful titles, search SUNCAT.

SUNCAT is the Serials Union Catalogue for the UK. Visit the service at http://www.suncat.ac.uk.
