Aerial Digimap data: The mapping service where it’s always sunny

The latest Digimap addition is aerial photography, covering the whole of Great Britain at 25 cm resolution. The University of Edinburgh has just subscribed to Aerial Digimap, so staff and students can now access these wonderful images, overlay them onto other map layers, and combine them with building height and topography data to make amazing and beautiful three-dimensional maps of the whole of Britain.

Map created using Aerial Digimap

I’ve used Aerial Digimap to label the entrance to Argyle House, home of EDINA. © GetMapping and University of Edinburgh. This map contains OS data.

Digimap is a visual interface that allows users to explore, annotate and download mapping data covering the whole of Great Britain.* Digimap’s historical map data go back as far as the 1840s, while geological, marine and environmental data have been available for some time.

It’s strikingly sunny in the images of Edinburgh. The Digimap team confirmed this is a UK-wide phenomenon: “Aerial Photography can only be captured on clear days, so it’s always sunny in Aerial Roam!”

You can watch a guided tour of Aerial Digimap’s features, and a demonstration of how to make the most of them, in a recently recorded webinar by EDINA’s Ian Holmes.


To get started with Aerial Digimap, log in with your EASE account at: http://digimap.edina.ac.uk/aerial

* For mapping data covering Northern Ireland, please see Ordnance Survey of Northern Ireland.

 

Pauline Ward is a Research Data Service Assistant based at EDINA, supporting staff and students at the University of Edinburgh.


DataShare upgraded to v2.3 – The embargo enhancement release

The latest upgrade of Edinburgh DataShare, from version 2.2 to 2.3, brings in several usability improvements.

  • Embargo expiry reminder
    If you want to deposit your data in DataShare but want to impose a delay before your files become freely downloadable, you can apply an embargo to your submission – see our “Checklist for deposit” for a fuller explanation of the embargo feature. As of v2.3, if you apply an embargo to your deposit, DataShare will send you an email reminder one week before the embargo is due to expire (a sketch of the kind of date check involved appears after this list). This gives you time to let us know if you need the embargo to be extended, or to send us the details of your paper once it has been published, so that we can add those to the metadata and help users understand your data.
  • DOI added to the citation field immediately
    When your DataShare deposit is approved by the curator, the system mints a new DOI for you. As of version 2.3, DataShare immediately appends the URL containing that DOI to the “Citation” field, which is visible at the top of the summary view page of your item. The “Citation” field makes it easy for others to cite your data, because it provides text they can copy and paste into any manuscript (or any other document where they want to cite the data). Previously you would have had to click on “Show full item record” and look for the DOI in the “Persistent identifier” field, or wait for an overnight script to append the DOI to the end of the “Citation” field.
  • Tombstone records
    We now have the ability to leave a ‘tombstone’ record in place for any DataShare item that is withdrawn. We only withdraw items in exceptional circumstances – for example where there is a substantive error or omission in the data, such that merely labelling the item as “Superseded” is not sufficient. Now, when we tombstone an item, the files become unavailable indefinitely, but the metadata remain visible at the DOI and handle URLs. Until now, every withdrawn item became completely invisible, so the original DOI and handle URLs produced a ‘not found’ error.
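For the curious, the embargo reminder logic is simple date arithmetic. Below is a minimal sketch in Python of the kind of nightly check involved – an illustration only, not DataShare’s actual code; the item attributes and the reminder flag are hypothetical:

```python
from datetime import date, timedelta

REMINDER_WINDOW = timedelta(weeks=1)  # remind one week before the embargo lifts

def embargo_reminders_due(items, today=None):
    """Return embargoed items whose embargo expires within the next week
    and which have not yet had a reminder sent.

    Each item is assumed to carry an `embargo_end` date (or None) and a
    `reminder_sent` flag (hypothetical attributes, for illustration).
    """
    today = today or date.today()
    return [
        item for item in items
        if item.embargo_end is not None
        and not item.reminder_sent
        and timedelta(0) <= (item.embargo_end - today) <= REMINDER_WINDOW
    ]

# A nightly job would email the depositor for each item returned, then set
# reminder_sent so no second reminder goes out.
```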
Screenshot of a DataShare item's citation field with the DOI

Cortical parcellation citation – now with DOI!

Enjoy!

Pauline Ward

Research Data Service


Twenty’s Plenty: DataShare v2.1 Upload Upgrade

We have upgraded DataShare (to v2.1) to enable HTML5 resumable upload. This means depositors can now use the user-friendly web deposit interface to upload numerous files at once via drag’n’drop, and to upload files up to 15 GB in size, regardless of network ‘blips’.

In fact we have reason to believe it may be possible to upload a 20 GB file this way. In testing, I gave it two hours until the progress bar said 100%; even though the browser then produced an error message instead of the green tick I was hoping for, when I retrieved the submission from the Submissions page I found I was able to resume, and the file had been added.

*** So our new advice to depositors is: our current Item size limit and file size limit are both 20 GB. Files larger than 15 GB may not upload through your browser. If you have files over 15 GB, or data totalling over 20 GB, which you’d like to share online, please contact the Data Library team to discuss your options. ***

See the screenshots below. Once the files have been selected and the upload commenced, the ‘Status’ column shows the percentage uploaded. A 10 GB file may take in the region of an hour to upload this way; 15 GB files have been uploaded through this interface with Chrome, Firefox and Internet Explorer.
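Under the hood, a resumable upload sends the file in pieces and asks the server where to pick up after an interruption. DataShare handles this in the browser via an HTML5/JavaScript uploader; the sketch below illustrates the same resume-from-offset idea in Python against a hypothetical endpoint – the URL, the Upload-Offset header and the Content-Range mechanism are illustrative assumptions, not DataShare’s actual API:

```python
import os
import requests  # third-party HTTP client: pip install requests

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB per request; real chunk sizes are server-defined

def resumable_upload(path, url):
    """Upload `path` in chunks, resuming from wherever the server left off.

    `url` is a hypothetical endpoint that reports the bytes it already holds
    via a HEAD request and accepts chunks as Content-Range PUTs.
    """
    size = os.path.getsize(path)
    # Ask the server how much of the file it already has (the resume point).
    offset = int(requests.head(url).headers.get("Upload-Offset", 0))
    with open(path, "rb") as f:
        f.seek(offset)
        while offset < size:
            chunk = f.read(CHUNK_SIZE)
            end = offset + len(chunk) - 1
            resp = requests.put(
                url,
                data=chunk,
                headers={"Content-Range": f"bytes {offset}-{end}/{size}"},
            )
            resp.raise_for_status()  # a network 'blip' raises here; rerun to resume
            offset += len(chunk)
```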

Until now, any file over 1 GB caused browsers difficulties, meaning many prospective depositors could not use the web deposit interface. Instead they had to email the curation team and arrange to transfer their files via DropBox, USB or the Windows network; the curator then had to transfer those same files to our server, collate the metadata into an XML file, log into the Linux system and run a batch import script – often with many hiccups along the way concerning permissions, virus checkers and memory. All very time-consuming.
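For context on what that batch route involved: DataShare is built on the DSpace repository platform, whose batch importer reads items laid out in its ‘Simple Archive Format’ – one directory per item, holding a dublin_core.xml metadata file, a ‘contents’ manifest, and the data files themselves. A rough sketch (the paths, collection handle and email address are placeholders):

```python
from pathlib import Path

def write_saf_item(base_dir, title, data_files):
    """Write one item in DSpace's Simple Archive Format: a directory holding
    a dublin_core.xml metadata file plus a 'contents' manifest naming each
    data file (the files themselves are copied alongside)."""
    item = Path(base_dir) / "item_000"
    item.mkdir(parents=True, exist_ok=True)
    (item / "dublin_core.xml").write_text(
        "<dublin_core>\n"
        f'  <dcvalue element="title" qualifier="none">{title}</dcvalue>\n'
        "</dublin_core>\n"
    )
    (item / "contents").write_text("".join(f"{name}\n" for name in data_files))

write_saf_item("saf_batch", "Example dataset", ["readings.csv", "README.txt"])

# The curator then ran DSpace's batch importer on the server, something like:
#   /dspace/bin/dspace import -a -e curator@ed.ac.uk \
#       -c <collection-handle> -s saf_batch -m mapfile
```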

Soon we will begin working on a download upgrade, to integrate a means for users to download much bigger files from DataShare outside of the limitations of HTTP (perhaps using FTP). The aim is to allow some of the datasets we have in the university which are in the region of 100 GB to be shared online in a way that makes it reasonably quick and easy for users to download them. We have depositors queueing up to use this feature. Watch this space.

Further technical detail about both the HTML5 upload feature and plans for an optimised large download release are available on the slides for the presentation I made at Open Repositories 2016 in Dublin this week: http://www.slideshare.net/paulineward/growing-open-data-making-the-sharing-of-xxlsized-research-data-files-online-a-reality-using-edinburgh-datashare .


A simple interface invites the depositor to select files to upload.

 

 


A 15 GB file uploaded via Firefox on Windows and included in a submitted Item.

 

 

A 20 GB file uploaded and included in an incomplete submission.

Pauline Ward, Data Library Assistant, University of Edinburgh


Publishing Data Workflows (Guest Post by Angus Whyte)

In the first week of March the 7th Plenary session of the Research Data Alliance got underway in Tokyo. Plenary sessions are the fulcrum of RDA activity, when its many Working Groups and Interest Groups try to get as much leverage as they can out of the previous 6 months of voluntary activity, which is more usually coordinated through crackly conference calls.

The Digital Curation Centre (DCC) and others in Edinburgh contribute to a few of these groups, one being the Working Group (WG) on Publishing Data Workflows. Like all such groups it has a fixed time span and agreed deliverables. This WG completes its run at the Tokyo plenary, so there’s no better time to reflect on why DCC has been involved in it, how we’ve worked with others in Edinburgh and what outcomes it’s had.

DCC takes an active part in groups where we see a direct mutual benefit, for example by finding content for our guidance publications. In this case we have a How-to guide planned on ‘workflows for data preservation and publication’. The Publishing Data Workflows WG has taken some initial steps towards a reference model for data publishing, so it has been a great opportunity to track the emerging consensus on best practice, not to mention examples we can use.

One of those examples was close to hand: DataShare’s workflow and checklist for deposit is identified in the report alongside workflows from other participating repositories and data centres. That report is now available on Zenodo [1].

In our mini-case studies, the WG found no hard and fast boundaries between ‘data publishing’ and what any repository does when making data publicly accessible. It’s rather a question of how much additional linking and contextualisation is in place to increase data visibility, assure the data quality, and facilitate its reuse. Here’s the working definition we settled on in that report:

Research data publishing is the release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way.

The ‘key components’ of data publishing are illustrated in this diagram produced by Claire C. Austin.

Data publishing components. Source: Claire C. Austin et al [1]

As the Figure implies, a variety of workflows are needed to build and join up the components. They include those ‘upstream’ around the data collection and analysis, ‘midstream’ workflows around data deposit, packaging and ingest to a repository, and ‘downstream’ to link to other systems. These downstream links could be to third-party preservation systems, publisher platforms, metadata harvesting and citation tracking systems.

The WG recently began some follow-up work to our report that looks ‘upstream’ to consider how the intent to publish data is changing research workflows. Links to third-party systems can also be relevant in these upstream workflows. It has long been an ambition of RDM to capture as much as possible of the metadata and context, as early and as easily as possible; that has been referred to variously as ‘sheer curation’ [2] and ‘publication at source’ [3]. So we gathered further examples, aiming to illustrate some of the ways that repositories are connecting with these upstream workflows.

Electronic lab notebooks (ELNs) can offer one route towards fly-on-the-wall recording of the research process, so the collaboration between Research Space and the University of Edinburgh is very relevant to the WG. As noted previously on these pages [4], [5], the RSpace ELN has been integrated with DataShare so that researchers can deposit directly into the repository. So we appreciated the contribution Rory Macneil (Research Space) and Pauline Ward (UoE Data Library) made to describe that workflow, one of around half a dozen gathered at the end of the year.

The examples the WG collected each show how one or more of the recommendations in our report can be implemented. The report makes five short, to-the-point recommendations:

  1. Start small, building modular, open source and shareable components
  2. Implement core components of the reference model according to the needs of the stakeholder
  3. Follow standards that facilitate interoperability and permit extensions
  4. Facilitate data citation, e.g. through use of digital object PIDs, data/article linkages, researcher PIDs
  5. Document roles, workflows and services

The RSpace-DataShare integration example illustrates how institutions can follow these recommendations by collaborating with partners. RSpace is not open source, but the collaboration does use open standards that facilitate interoperability, namely METS and SWORD, to package up lab books and deposit them for open data sharing. DataShare facilitates data citation, and the workflows for depositing from RSpace are documented, based on DataShare’s existing checklist for depositors. The workflow integrating RSpace with DataShare is shown below:

RSpace-DataShare Workflows
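To make those standards concrete: a SWORD deposit boils down to an HTTP POST of a packaged zip to a collection URI, with a Packaging header telling the repository how to unpack it. The minimal sketch below is illustrative only – the endpoint, credentials, filename and packaging identifier are assumptions standing in for the real RSpace–DataShare wiring:

```python
import requests  # third-party HTTP client: pip install requests

# Placeholder value for illustration; not DataShare's real endpoint.
COLLECTION_IRI = "https://datashare.example.ac.uk/sword2/collection/1234"

def deposit_mets_package(zip_path, user, password):
    """POST a METS-packaged zip to a repository collection over SWORD.

    The Packaging header tells the server how to unpack the zip; the
    METSDSpaceSIP identifier shown here is the one commonly used with
    DSpace, but the exact URI depends on the server's SWORD configuration.
    """
    with open(zip_path, "rb") as f:
        resp = requests.post(
            COLLECTION_IRI,
            data=f,
            auth=(user, password),
            headers={
                "Content-Type": "application/zip",
                "Content-Disposition": "attachment; filename=labbooks.zip",
                "Packaging": "http://purl.org/net/sword/package/METSDSpaceSIP",
            },
        )
    resp.raise_for_status()
    return resp  # the deposit receipt describes the newly created item
```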

For me one of the most interesting things about this example was learning about the delegation of trust to research groups that can result. If the DataShare curation team identify an expert user who is planning a large number of deposits over a period of time, they can train that user to apply DataShare’s curation standards themselves, give them administrative rights over the relevant Collection, and entrust the curation step for that Collection to them.

As more researchers take up the challenges of data sharing and reuse, institutional data repositories will need to make depositing as straightforward as they can. Delegating responsibilities and the tools to fulfil them has to be the way to go.

 

[1] Austin, C. et al. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Available at: http://dx.doi.org/10.5281/zenodo.34542

[2] ‘Sheer Curation’ Wikipedia entry. Available at: https://en.wikipedia.org/wiki/Digital_curation#.22Sheer_curation.22

[3] Frey, J. et al. (2015) Collection, Curation, Citation at Source: Publication@Source 10 Years On. International Journal of Digital Curation, Vol. 10, No. 2, pp. 1-11. Available at: https://doi.org/10.2218/ijdc.v10i2.377

[4] Macneil, R. (2014) Using an Electronic Lab Notebook to Deposit Data. Available at: http://datablog.is.ed.ac.uk/2014/04/15/using-an-electronic-lab-notebook-to-deposit-data/

[5] Macdonald, S. and Macneil, R. (2015) Service Integration to Enhance Research Data Management: RSpace Electronic Laboratory Notebook Case Study. International Journal of Digital Curation, Vol. 10, No. 1, pp. 163-172. Available at: https://doi.org/10.2218/ijdc.v10i1.354

Angus Whyte is a Senior Institutional Support Officer at the Digital Curation Centre.

 


open.ed report

Lorna M. Campbell, a Digital Education Manager with EDINA and the University of Edinburgh, writes about the ideas shared and discussed at the open.ed event this week.

 

Earlier this week I was invited by Ewan Klein and Melissa Highton to speak at Open.Ed, an event focused on Open Knowledge at the University of Edinburgh. A Storify of the event is available here: Open.Ed – Open Knowledge at the University of Edinburgh.

“Open Knowledge encompasses a range of concepts and activities, including open educational resources, open science, open access, open data, open design, open governance and open development.”

 – Ewan Klein

Ewan set the benchmark for the day by reminding us that open data is only open by virtue of having an open licence such as CC0, CC BY, CC SA. CC Non Commercial should not be regarded as an open licence as it restricts use.  Melissa expanded on this theme, suggesting that there must be an element of rigour around definitions of openness and the use of open licences. There is a reputational risk to the institution if we’re vague about copyright and not clear about what we mean by open. Melissa also reminded us not to forget open education in discussions about open knowledge, open data and open access. Edinburgh has a long tradition of openness, as evidenced by the Edinburgh Settlement, but we need a strong institutional vision for OER, backed up by developments such as the Scottish Open Education Declaration.


I followed Melissa, providing a very brief introduction to Open Scotland and the Scottish Open Education Declaration, before changing tack to talk about open access to cultural heritage data and its value to open education. This isn’t a topic I usually talk about, but with a background in archaeology and an active interest in digital humanities and historical research, it’s an area that’s very close to my heart. As a short case study I used the example of Edinburgh University’s excavations at Loch na Berie broch on the Isle of Lewis, which I worked on in the late 1980s. Although the site has been extensively published, it’s not immediately obvious how to access the excavation archive. I’m sure it’s preserved somewhere, possibly within the university, perhaps at RCAHMS, or maybe at the National Museum of Scotland. Wherever it is, it’s not openly available, which is a shame, because if I were teaching a course on the North Atlantic Iron Age there is some data from the excavation that I might want to share with students. This is no reflection on the directors of the fieldwork project; it’s just one small example of how greater access to cultural heritage data would benefit open education.

I also flagged up a rather frightening blog post, Dennis the Paywall Menace Stalks the Archives, by Andrew Prescott, which highlights what can happen if we do not openly licence archival and cultural heritage data – it becomes locked behind commercial paywalls. There are, however, some excellent examples of open practice in the cultural heritage sector, such as the National Portrait Gallery’s clearly licensed digital collections and the work of the British Library Labs. But openness comes at a cost, and we need to make greater efforts to explore new business and funding models to ensure that our digital cultural heritage is openly available to us all.

Ally Crockford, Wikimedian in Residence at the National Library of Scotland, spoke about the hugely successful Women, Science and Scottish History editathon recently held at the university. She noted, however, that as members of the university we are in a privileged position that enables us to use non-open resources (books, journal articles, databases, artefacts) to create open knowledge. Furthermore, with Wikipedia’s push to cite published references, there is a danger of replicating existing knowledge hierarchies. Ally reminded us that as part of the educated elite, we have a responsibility to open our mindsets to all modes of knowledge creation. Publishing in Wikipedia also provides an opportunity to reimagine feedback in teaching and learning. Feedback should be an open, participatory process, and what better way for students to learn this than by editing Wikipedia?

Robin Rice, EDINA, asked: what do Open Access and Open Data sharing look like? Open Access publications are increasingly becoming the norm, but we’re not quite there yet with open data. It’s not clear whether researchers will be cited if they make their data openly available, and career rewards are uncertain. However, there are huge benefits to opening access to data and to citizen science initiatives: public engagement, crowd funding, data gathering and cleaning, and an informed citizenry. In addition, social media can play an important role in working openly and transparently.

Robin Rice

Bert Remijsen, talking about computational neuroscience and the problem of reproducibility, picked up this theme, adding that accountability is a big attraction of open data sharing. Bert recommended using iPython Notebook for recording and sharing data and computational results, helping to make them reproducible. This prompted Anne-Marie Scott to comment on Twitter:

@ammienoot: “Imagine students creating iPython notebooks… and then sharing them as OER #openEd”

Very cool indeed.

James Stewart spoke about the benefits of crowdsourcing and citizen science. Despite the buzzwords, this is not a new idea; there’s a long tradition of citizens engaging in science, and Darwin regularly received reports and data from amateur scientists. Maintaining transparency and openness is currently a big problem for science, but openness and citizen science can help to build trust and quality. James also cited OpenStreetMap as a good example of building community around crowdsourced data and citizen science. Crowdsourcing initiatives create a deep sense of community – it’s not just about the science, it’s also about engagement.


After coffee (accompanied by Tunnock’s caramel wafers – I approve!) we had a series of presentations on the student experience and student engagement with open knowledge.

Paul Johnson and Greg Tyler, from the Web, Graphics and Interaction section of IS, spoke about the necessity of being more open and transparent with institutional data, and the importance of providing more open data to encourage students to innovate. Hayden Bell highlighted the importance of institutional open data directories and urged us to spend less time gathering data and more time making something useful from it. Students are the source of authentic experience about being a student – we should use this! Student data hacks are great, but participants often have to spend longer getting and parsing the data than doing interesting stuff with it. Steph Hay also spoke about the potential of opening up student data. VLEs inform the student experience; how can we open up this data and engage with students using their own data? Anonymised data from Learn was provided at Smart Data Hack 2015, but students chose not to use it, though it is not clear why. Finally, Hans Christian Gregersen brought the day to a close with a presentation of Book.ed, one of the winning entries of the Smart Data Hack. Book.ed is an app that uses open data to allow students to book rooms and facilities around the university.

What really struck me about Open.Ed was the breadth of vision and the wide range of open knowledge initiatives scattered across the university. The value of events like this is that they help to share this vision with colleagues, as that’s when the cross-fertilisation of ideas really starts to take place.

This report first appeared on Lorna M. Campbell’s blog, Open World: lornamcampbell.wordpress.com/2015/03/11/open-ed


Dancing with Data

I went to an interesting talk yesterday by Prof Chris Speed called “Dancing with Data”, on how our interactions and relationships with each other, with the objects in our lives, and with companies and charities are changing as a result of the data now being generated by those objects (particularly smartphones, but increasingly other objects too). New phenomena such as 3D printing, airbnb, foursquare and iZettle are giving us choices we never had before, but are also leading to things being done with our data which we might not have expected or known about. The relationships between individuals and our data are being re-defined as we speak. Prof Speed challenged us to think about the position of designers in this new world, where push-to-pull markets are being replaced by new models. He also told us about his research collaborations with Oxfam, looking at how technology might enhance the value of the second-hand objects they sell by allowing customers to hear their stories from their previous owners.

Logo for the Tales of Things project

All very thought-provoking, but what about the implications for academic research, aside from those working in the fields of Design, Economics or Sociology who must now develop new models to reflect this changing landscape? Well, the question arises: if all this data is being generated and collected by companies, are academics (and indeed the charity sector) falling behind the curve? Here at the University of Edinburgh, my colleagues in Informatics are doing Data Science research, looking into the infrastructure and the algorithms used to analyse the kind of commercial Big Data flowing out of the smartphones in our pockets, while Prof Speed and his colleagues are looking at how design itself is being affected. But perhaps academics in all disciplines need to be tuning their antennae to this wavelength, and thinking seriously about how their research can adapt to and be enhanced by the new ways we are all dancing with data.

For more about the University of Edinburgh’s Design Informatics research and forthcoming seminars see www.designinformatics.org. Prof Chris Speed tweets @ChrisSpeed.

Pauline Ward is a Data Library Assistant working at the University of Edinburgh and EDINA.


Data journals – an open data story

Here at the Data Library we have been thinking about how we can encourage our researchers who deposit their research data in DataShare to also submit those data for peer review.

Why? We hope the impact of the research can be enhanced by the recognised added value of peer review, regardless of whether there is a full-blown article to accompany the data.

We therefore decided recently to provide our depositors with a list of websites or organisations where they could do this.

I pulled a table together from colleagues’ suggestions, the PREPARDE project and the latest RDM textbook. Then, very much in the Open Data spirit, I threw the question open on Twitter:

“[..]does anyone have an up-to-date list of journals providin peer review of datasets (without articles), other than PREPARDE? #opendata”

…and published the draft list for others to check or make comments on. This turned out to be a good move. The response from the Research Data Management community on Twitter was very heartening, and colleagues from across the globe provided some excellent enhancements for the list.

That process has given us confidence to remove the word ‘Draft’ from the title. The list – a crowd-sourced resource – will need to be updated from time to time, but we are confident that we’ve achieved reasonable coverage of the things we were looking for.

Another result of this search was the realisation that what we had gathered was in fact quite clearly a list of Data Journals. My colleague Robin Rice has now added a definition of that term to the list, and we will be providing all our depositors with a link to it:

https://www.wiki.ed.ac.uk/display/datashare/Sources+of+dataset+peer+review
