Repository Fringe Day Two LiveBlog

Welcome back to day two of Repository Fringe 2015! For a second day we will be sharing most of the talks via our liveblog here, or you can join the conversation at #rfringe15. We are also taking images around the event and encourage you to share your own images, blog posts, etc. Just use the hashtag and/or let us know where to find them and we’ll make sure we link to your coverage, pictures and comments.

This is a liveblog and that means there will be a few spelling errors and may be a few corrections required. We welcome your comments and, if you do have any corrections or additional links, we encourage you to post them here. 

Integration – at the heart of  – Claire Knowles, University of Edinburgh; Steve Mackey, Arkivum

Steve is leading this session, which has been billed as “storage” but is really about integration.

We are a company which came out of the University of Southampton, and our flagship Arkivum100 service has a 100% data integrity guarantee. We sign long-term contracts – 25 years – where most cloud services sign yearly contracts. But we also have a data escrow exit – there is a copy on tape that enables you to retrieve your data after you have left. It uses open source encryption throughout, which means it can be decrypted as long as you have the key.

So why use a service like Arkivum for keeping data alive for 25+ years? Well, things change all the time. We add media more or less continually… We do monthly checks and maintenance updates but also annual data retrieval and integrity checks. There are companies – Sky would be an example – that have a continual technology refresh process in place, three parallel systems, for their media storage in order to keep up with technology. There is a 3-5 year obsolescence cycle for services, operating systems and software, so we will be refreshing hardware and handling software and hardware migrations.

The Arkivum appliance is a CIFS/NFS presentation, which means it integrates easily with local file systems. There is also a robust REST API. There is simple administration of user permissions, storage allocations etc. We have a GUI for file ingest status but also recovery pre-staging and security. There is also an ingest process triggered by timeout, checksum, change or manifest – we are keen that if anything changes you are triggered to check and archive the data before you potentially lose or remove your local copy.

So the service starts with original datasets and files; we take a copy for ingest via the Arkivum Gateway or Appliance, we encrypt and also decrypt to check the process. We do checksums at all stages. Once all is checked it is validated and sent to our archive on the Janet Network, and it is also archived to a second archive and to the escrow copy on tape.
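
[Liveblog note: the checksums-at-every-stage idea Steve describes is essentially a fixity check. As a rough illustration only – not Arkivum's actual code, and with an invented manifest format – a minimal sketch in Python might look like this:]

```python
# Minimal fixity-check sketch (illustrative only - not Arkivum's implementation).
# Assumes a local directory of files and a manifest of expected SHA-256 checksums.
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(dataset_dir, manifest):
    """Compare each file's checksum against the manifest; return any mismatches."""
    failures = []
    for relative_name, expected in manifest.items():
        actual = sha256_of(Path(dataset_dir) / relative_name)
        if actual != expected:
            failures.append((relative_name, expected, actual))
    return failures


# Example usage with a hypothetical dataset and manifest:
# manifest = {"survey_results.csv": "9f86d081...", "readme.txt": "2c26b46b..."}
# problems = verify_manifest("/data/ingest/dataset-001", manifest)
```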

 

We sit as the data vault, the storage layer within the bigger system which includes data repository, data asset register, and CRIS. Robin Taylor will be talking more about that bigger ecosystem.

We tend to think of data as existing in two overlapping cycles – live data and archive data. We tend to focus much more on the archive side of things, which relates to funder expectations. But there is often less focus on live data being generated by researchers – and those may be just as valuable and in need of securing as that archive data.

In a recent Jisc Research Data Spring report the concept of RDM workflows is discussed. See “A consortial approach to building an integrated RDM system – “small and specialist””, which specifically talks about examples of workflows, including researcher-centric workflows that lay out the process for the researcher to take in managing their research data. Examples in the report include those created for Loughborough and Southampton.

Loughborough have a CRIS, they have DSpace, and they use FigShare for data dissemination. You can see that the interactions in terms of data flow are very complex [slides will be shared but until then I can confirm this is a very complex picture] and the intention of the workflow and process of integration is to make that process simpler and more transparent for the researcher.

So, why integrate? Well, we want those processes to be simpler and easier, to encourage adoption and also lower the cost of institutional support to the research base. It’s one thing to have a tick box, it’s another to get researchers to actually use it. We also, having been involved multiple times, have experience in the process of rolling RDM out – our work with ULCC on CHEST particularly helped us explore and develop approaches to this. So, we are checking quality and consistency in RDM across the research base. We are deploying RDM as a community-driven shared service so that smaller institutions can “join forces” to benefit from having access to common RDM infrastructure.

So, in terms of integrations we work with customers with DSpace and EPrints, with customers using FigShare, and, moving a little away from the repository and towards live research data, we are also doing work around Sharegate (based on archiving SharePoint), iRODS and QStar; and with ExLibris Rosetta and Archivematica. We have yet to see real preservation work being done with research data management but it’s coming, and Archivematica is an established tool for preservation in the cultural heritage and museums sector.

Q1) Do you have any metrics on the files you are storing?

A1) Yes, you can generate reports from the API, or can access them via the GUI. The QStar tool, an HSM tool, allows you to do a full survey of the environment: it will crawl your system and let you know about file age and storage etc. And you can do a simulation of what will happen.
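
[Liveblog note: the kind of file-system survey described here is easy to picture with a small sketch – this is illustrative only, not the QStar tool, and the path is a placeholder:]

```python
# Rough sketch of the kind of file-system survey an HSM crawl produces.
# Illustrative only - not the QStar tool; the path is a placeholder.
import time
from pathlib import Path

root = Path("/data/research-share")  # placeholder path
now = time.time()
age_buckets = {"<1y": 0, "1-3y": 0, ">3y": 0}
total_bytes = 0

for path in root.rglob("*"):
    if not path.is_file():
        continue
    info = path.stat()
    total_bytes += info.st_size
    age_years = (now - info.st_mtime) / (365 * 24 * 3600)
    if age_years < 1:
        age_buckets["<1y"] += 1
    elif age_years < 3:
        age_buckets["1-3y"] += 1
    else:
        age_buckets[">3y"] += 1

print("Total size (GB):", round(total_bytes / 1e9, 1))
print("File count by last-modified age:", age_buckets)
```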

Q2) Can I ask about the integration with EPrints?

A2) We are currently developing a new plugin which is being driven by new requirements for much larger datasets going into EPrints and linking through. The work we have previously done with ULCC is open source, and the plugins for EPrints are open source. Some patches were designed by @mire, so that is a different process, but once those have been funded they are willing for them to be open source.

Q3) When repositories were set up there was a real drive for the biggest repository possible, being sure that everyone would want the most storage possible… But that is also expensive… And it can take a long time to see uptake. Is there anything you can say that is helpful for planning and practical advice about getting a service in place to start with? To achieve something practical at a more modest scale.

A3) If you use managed services you can use as little as you want. If you build your own you tend to be into a fixed capital sum… and that’s a level of staffing that requires a certain scale. We start at a few terabytes…

Comment – Frank, ULCC) We have a few customers who go for the smallest possible set up for a trial-and-error type approach. Most customers go for the complete solution, then reassess after 6 months or a year… A good deal in terms of price point.

A3) The work with Jisc has been about looking at what those requirements are. From CHEST it is clear that not all organizations want to set up at the same scale.

Unfortunately our next speaker, Pauline Ward from EDINA, is unwell. In place of her presentation, “Are your files too big? (for upload / download)”, we will be hearing from Robin Taylor.

Data Vault – Robin Taylor 

This is a collaborative project with University of Manchester, funded by JISC Research Data Spring.

Some time back we purchased a lot of kit and space for researchers, giving each their own allocation. For the researcher the workflow is that data is generated and goes into a repository, but they are not sure what data to keep and make available, what might be useful again, and what they may be mandated to retain. So we wanted a way of storing that data, and that needed to have some sort of interface to enable it.

Edinburgh and Manchester had common scenarios, common usages. We are both dealing with big volumes of data – hundreds of thousands or even millions of files. It is impractical to upload at that scale through web interfaces, so we need that archiving to happen in the background.

So, our solution has been to establish the Data Vault User Interface, which interacts with a Data Vault Broker/Policy Engine, which in turn interacts with the Data Archive; the Broker then interacts with the active storage. But we didn’t want to build something so bespoke that it didn’t integrate with other systems – the RSpace lab notebooks for instance. And it may be that the archive might be Arkivum, or might be Amazon, or might be tape… So our mechanism abstracts that away, to create a simple way for researchers to archive their data in a standard BagIt type way.
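
[Liveblog note: the “bag-it” packaging Robin mentions is the BagIt convention. A hedged sketch of the idea using the Library of Congress bagit-python library – the paths and metadata are invented and this is not the Data Vault code itself:]

```python
# Illustrative BagIt packaging sketch - not the Data Vault implementation.
# Requires the Library of Congress bagit library: pip install bagit
import bagit

# Turn a plain directory of research files into a bag in place: this adds a
# data/ payload directory, checksum manifests and bag-info.txt metadata.
bag = bagit.make_bag(
    "/storage/active/project-xyz",          # hypothetical path to the live data
    {
        "Contact-Name": "A. Researcher",    # optional bag-info metadata
        "External-Description": "Example deposit prepared for archiving",
    },
    checksums=["sha256"],
)

# Later, for example before handing the bag to archive storage, validate it:
bag.validate()  # raises bagit.BagValidationError if any file or checksum is wrong
```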

But we should say that preservation is not something we have been looking at. Format migration isn’t realistic at this point: at the scale we are receiving data, that work isn’t practical. But, more importantly, we don’t know what the suitable time is, so instead we are storing that data for the required period of time and then seeing what happens.

Q1) You mentioned you are not looking at preservation at the moment?

A1) It’s not on our to-do list at the moment. The obvious comparison would be with Archivematica, which enables data to be stored in more sustainable formats, because we don’t know what those formats will be… They have a specific list of formats they can deal with, and that’s not everything. But that’s not to say that down the line that won’t be something we want to look at, it’s just not what we are looking at at the moment. We are addressing researchers’ need to store data on a long-term basis.

Q1) I ask because what accessibility means is important here.

A1) On the live data, which is published, there is more onus on the institution to ensure that it is available. This holds back wider data collections.

Q2) Has anyone ever asked for their files back?

A2) There is discussion ongoing about what a backup is, what an archive is, etc. Some use this as backup but that is not what this should be. This is about data that needs to be more secure but maybe doesn’t need to have instant access – which it may not have. We have people asking about archiving, but we haven’t had requests for data back. The other question is whether we delete stuff that we have archived – the researchers are best placed to do that, so we will find out in due course how that works.

Q3) Is there a limit on the live versus the archive storage?

A3) Yes, every researcher has a limited quantity of active storage, but a research group can also buy extra storage if needed.  But the more you work with quotas, the more complex this gets.

Comment) I would imagine that if you charge for something, the use might be more thoughtful.

A3) There is a school of thought that charging means that usage won’t be endless, it will be more thoughtful.

Repositories Unleashing Data! Who else could be using your data? – Graham Steel, ContentMine/Open Knowledge Scotland

Graham is wearing his “ContentMine: the right to read is the right to mine!” T-shirt for his talk…

As Graham said (via social media) in advance of his  talk, his slides are online.

I am going to briefly talk about Open Data… And I thought I would start with a wee review of when I was at Repository Fringe 2011 and I learned that not all content in repositories was open access, which I was shocked by! Things have gotten better apparently, but we are still talking about Gold versus Green even in Open Access.

In terms of sharing data and information generally many of you will know about PubMed.

A few years ago a blogger, diabetes researcher and regular tweeter, Jo Brodie, asked why PubMed didn’t have social media sharing buttons. I crowd-sourced opinion on the issue and sent the results off to David Lipman (who I know well), who is in overall charge of NCBI/PubMed. And David said “what’s social media?”.

It took about a year and three follow ups but PubMed Central added a Twitter button and by July 2014, sharing buttons were in place…

Information wants to be out there, but we have various ways in which we stop that – geographical and license restricted streaming of video for instance.

The late great Jean-Claude Bradley saw science as heading towards being led by machines… This slide is about 7 years old now but I sense matters have progressed since then!


But at times it is not that easy to access or mine data still – some publishers charge £35 to mine each “free” article – a ridiculous cost for what should be a core function.

The Open Knowledge Foundation has been working with open data since 2004… [cue many daft pictures of Data from Star Trek: The Next Generation!].


We also have many open data repositories: figshare (which now has just under 2 million uploads), etc. Two weeks back I didn’t even realize many universities have data repositories, but we also want Repositories Unleashing Data Everywhere [RUDE!], and we also have the new initiative, the Radical Librarians Collective…


Les Carr, University of Southampton (data.ac.uk)

SLIDES

The Budapest Open Access Initiative kind of kicked us off about ten years ago. Down in Southampton, we’ve been very involved in Open Government Data and those have many common areas of concern about transparency, sharing value, etc.

And we now have data.gov.uk, which enables the sharing of data that has been collected by government. And at Southampton we have also been involved recently in understanding the data, the infrastructure, activities and equipment of academia by setting up data.ac.uk. That is a national aggregator that collects information from open data on every institution… So, if you need data on, e.g., DNA and associated equipment, you can find out who to contact to use it, etc.

This is made possible as institutions are trying to put together data on their own assets, made available as institutional open data in standard ways that can be automatically scraped. We make building info available openly, for instance about energy use, services available, cafes, etc. Why? Who will use this? Well, this is the whole thing of other people knowing better than you what you should do with your data. So, students in our computer science department, for instance, looked at building recommended route apps, e.g. between lectures. Also the cross with catering facilities – e.g. “nearest caffeine” apps! It sounds ridiculous but students really value that. And we can cross city bus data with timetables and UK Food Hygiene ratings – so you can find which bus to get to which pub to an event etc. And campus maps too!
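
[Liveblog note: to make the “other people know what to do with your data” point concrete, a consumer of this kind of institutional open data only needs a few lines of scraping. The URL and column names below are invented for illustration; real data.ac.uk feeds will differ:]

```python
# Hypothetical consumer of institutional open data.
# The URL and column names are invented for illustration; real feeds will differ.
import csv
import io
import urllib.request

FEED_URL = "https://example.ac.uk/opendata/buildings.csv"  # placeholder, not a real feed

with urllib.request.urlopen(FEED_URL) as response:
    text = response.read().decode("utf-8")

# Buildings with a cafe: the raw material for a "nearest caffeine" app.
reader = csv.DictReader(io.StringIO(text))
cafes = [row for row in reader if row.get("has_cafe", "").lower() == "yes"]
for building in cafes:
    print(building["name"], building["latitude"], building["longitude"])
```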

Now we have a world of Open Platforms – we have the internet, the web, etc. But not Google – definitely not open. So… Why are closed systems bad? Well we need to move from Knowledge, to Comprehension, to Application, to Analysis, to Synthesis and to Evaluation. We have repositories at the bottom – that’s knowledge, but we are all running about worrying about REF2020 but that is about evaluation – who knows what, where is that thing, what difference does that make…

So to finish I thought I’d go to the Fringe website and this year it’s looking great – and quite like a repository! This year they include the tweets, the discussion, etc. all in one place. Repositories can learn from the Fringe. Loads of small companies desperate for attention, and a few small companies who aren’t bothered at all, they know they will find their audience.

Jisc on Repositories unleashing data – Daniela Duca, Jisc

SLIDES

I work in the research team at Jisc and we are trying to support universities in their core business and help make the research process more productive. I will talk about two projects in this area: the UK Research Data Discovery Service and, second, Research Data Usage and Metrics.

The Research Data Discovery Service (RDDS) is about making data more discoverable. This is a project which is halfway through and is with UK Data Archive and the DCC. We want to move from a pilot to a service that makes research data more discoverable.

In Phase 1 we had the pilot to evaluate Research Data Australia, developed by ANDS, with contributions from the UK Data Archive, the Archaeology data centre, and NERC data centres. In Phase 2 Jisc, with support from DCC and UKDA, is funding 9 more institutions to trial this service.

The second project, Research Data Usage and Metrics comes out of an interest in the spread of academic work, and in the effectiveness of data management systems and processes. We are trying to assess use and demand for metrics and we will develop a proof of concept tool using IRUS. We will be contributing to and drawing upon a wide range of international standards.

And, with that, we are dispersing into 5 super fast 17 minute breakout groups which we hope will add their comments/notes here – keep an eye on those tweets (#rfringe15) as well!

We will be back on the blog at 11.15 am after the breakouts, then coffee and a demo of DMAOnline – Hardy Schwamm, Lancaster University.

And we are back, with William Nixon (University of Glasgow) chairing, and he is updating our schedule for this afternoon which sees our afternoon coffee break shortened to 15 minutes.

Neil and I will be talking about some work we have been doing on linking research outputs. I am based at the British Library, part of a team working on research outputs.

Linking Data – Neil Chue Hong, Software Sustainability Institute; Rachael Kotarski, Project THOR

Rachael: Research is represented by many outputs. Articles are some of the easier to recognise outputs but what about samples, data, objects emerging from research – they could be 100s of things… Data citation enables reproducibility – if you don’t have the right citation, and the right information, you can’t reproduce that work.

Citation also enables acknowledgement, for instance of historical data sets and longitudinal research over many years which is proving useful in unexpected ways.

Data citation does also raise authorship issues though. A one line citation with a link is not necessarily enough. So some of the work at DataCite and the British Library has been around linking data and research objects to authors and people, with use of ORCID alongside DOIs, URLs, etc. Linking a wide range of people and objects together.
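
[Liveblog note: as a rough example of the kind of linking DataCite makes machine-readable, a DOI’s metadata – including related identifiers and ORCID-identified creators – can be fetched over HTTP. A sketch assuming the current DataCite REST API, which postdates this talk, and using a placeholder DOI:]

```python
# Sketch of looking up a dataset DOI via the DataCite REST API (placeholder DOI).
import json
import urllib.request

doi = "10.5555/example-doi"  # placeholder, not a real dataset
url = f"https://api.datacite.org/dois/{doi}"

with urllib.request.urlopen(url) as response:
    record = json.load(response)

attributes = record["data"]["attributes"]
print("Title:   ", attributes["titles"][0]["title"])
print("Creators:", [creator.get("name") for creator in attributes["creators"]])

# relatedIdentifiers is where links to articles, software and other objects live.
for related in attributes.get("relatedIdentifiers", []):
    print(related["relationType"], related["relatedIdentifier"])
```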

THOR is a project, which started in June, on Technical and Human infrastructure for Open Research. This is more work on research objects, subject areas, funders, organisations… really broadening the scope of what should be combined and linked together here.

So the first area here is research – understanding the gaps there, how those can be addressed, and bringing funders into that. We are also looking at integration of services etc. So one thing we did in ODIN and are bringing into THOR is connecting your ISNI identifier to ORCID, so that there is a relationship there and the data stays up to date. The next part of the work is on outreach – bootcamps, webinars, etc. to enable you to feed into the research work as well. And, finally, we will be looking at sustainability: how what we are doing can be self-funded beyond the end of the project, through memberships of partner organisations: CERN, DataCite, ORCID, DRYAD, EMBL-EBI, ANDS, PLoS, Elsevier Labs, Panomia(?). This is an EU funded project but it has international scope and an international infrastructure.

So we want to hear about what the issues are for you. Talk to us, let us know.

Linking Software: citations, roles, references and more – Neil Chue Hong

Rachael gave you the overview; I’m going into some of the detail for software, my area. So we know that software is part of the research lifecycle. That lifecycle relies on the ability to attribute and credit things, and that can go a bit wrong for software. That’s because our process is a little odd… We start research, we write software, we use software, we produce results, and we publish research papers. Now if we are good we may mention the software. We might release data or software after publication… rather than before…

A better process might be to start the research, identify existing software, perhaps adapt or extend software, release software (maybe even publish a software paper), use software, produce results, perhaps release data and publish a data paper, and then publish the research paper. That’s great but also more complex. Right now we use software and data papers as proxies for sharing our process.

But software is not that simple, the boundaries can be blurry… Is it the workflow, the software that runs the workflow, the software that references the workflow, the software that supports the software that references the workflow, etc.? What’s the useful part? Where should the DOI be, for instance? It is currently at programme level but is that the right granularity? Should it be at algorithm level? At library level? Software has the concept of versioning – I’d love our research to be versioned rather than “final” but that’s a whole other talk! But the versioning concept indicates a change, it allows change… but again how do we decide on when that version occurs?

And software also has the problem of authorship – which authors have had what impact on each version of the software? Who has the largest contribution to the scientific results in a paper? So for a project I might make the most edits to a code repository – all about updating the license – so the biggest contribution but would the research community agree? Perhaps not. Now I used to give this talk and say “this is why software is nothing like data” but now I say “this is why software is exactly like data”!

So, there are different things happening now to link together these bits and pieces. GitHub, Zenodo, FigShare and institutional repositories have looked at “package level” one-click deposit, with a citable DOI. There has been work around SWORD deposit, which Stuart Lewis has been looking at too. So you can now archive software easily – but that’s the easy part – it’s the social side that needs dealing with. So, there is a brand new working group led by Force11 on Software Citation – do get involved.

And there are projects for making the roles of authors/contributors clearer: Project CRediT is looking at Gold/Silver/Bronze levels, while Contributor Badges is looking at more granular recognition. And we also have work on code as a research object, and a project, CodeMeta, that is looking at defining minimal metadata.
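
[Liveblog note: to give a flavour of what “minimal metadata” for software might look like, here is a hedged sketch of a CodeMeta-style record written out from Python. The field values are invented and the context URI shown is from the later CodeMeta 2.0 release:]

```python
# Hedged sketch of a minimal CodeMeta-style description of a piece of software.
# Field values are invented; CodeMeta builds on schema.org terms such as these.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",
    "version": "1.2.0",
    "codeRepository": "https://github.com/example-org/example-analysis-tool",
    "license": "https://spdx.org/licenses/MIT",
    "author": [
        {
            "@type": "Person",
            "givenName": "Ada",
            "familyName": "Example",
            "@id": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID iD
        }
    ],
}

with open("codemeta.json", "w") as fh:
    json.dump(codemeta, fh, indent=2)
```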

So that brings us to the role of repositories and the Repository Fringe community. Imperial College for instance is looking at those standards and how to include them in repositories. And that leads me to my question to you: how does the repository community support this sort of linkage?

Q1 – William Nixon) Looking at principles of citation… how do you come up with those principles?

A1 – Neil) Force11 has come up with those citation principles and those are being shared with the community… But all communities are different. So it is easy to get high level agreement, but it is hard to get agreement at implementation details. So authorship changes over time, and changes version to version. So when we create principles for citation do we create all collectively and equally, or do we go the complex route of acknowledging specific individual contributions for a particular version. This causes huge debate and controversy in the open source community about who has the appropriate credit etc. For me, what do we need to deposit? Some information might be useful later in reward lifecycle…. But if I’m lead author will that be a priority here?

Q2 – Paul Walk) My internal hippie says that altruism and public good comes into open source software and I wonder if we are at risk of messing with that sordid research system…

A2 – Neil) I would rebut that: most open source contribution and development is not altruistic. It is people being rewarded in some way – because doing things open source gives them more back than working alone. I wouldn’t say altruism is the driving force, or at least it hasn’t been for some time… It’s already part of that research-type system.

Comment) For me this is such a high level of where we are, you are talking about how we recognise contribution, citation etc. but just getting things deposited is the issue for me right now… I’d love to find out more about this but just convincing management to pay for ORCID IDs for all is an issue even…

A2 – Rachael) We do need to get word out about this; showing how researchers have done this, and the value of it, will help. It may not just be through institutions but through academic societies etc. as well…

A2 – Neil) And this is back to the social dimension and thinking about what will motivate people to deposit. And they may take notice of editors… Sharing software can positively impact citations and that will help. Releasing software in the image processing community for instance also shows citations increase – and that can be really motivating. And then there is the economic impact for universities – is there a way we can create studies to show positive reputation and economic impacts on the institution that will prove the benefit for them.

Q3) A simple question – there are many potential solutions for software and data… but will we see any benefits from them until we see REF changing to value software and data to the same extent as other outputs?

A3 – Neil) I think we are seeing a change coming. It won’t be about software being valued as much as papers. It will be about credit for the right person so that they are valued. What I have seen in research council meetings is that they recognise that other outputs are important. But in a research project credit tends to go to the original writer of a new algorithm, perhaps, not the developer who has made substantial changes. So where credit goes matters – the user, implementer, contributor, originator, etc.? If I don’t think I will get suitable credit then where is the motivation for me to deposit my software?

EC Open Data Pilot, EUDAT, OpenAIRE, FOSTER and PASTEUR4OA – Martin Donnelly, Digital Curation Centre

I was challenged yesterday by Rachael and by Daniela at Jisc to do my presentation in the form of a poem…

There once was a man from Glasgee
Who studied data policy
In a project called FOSTER
Many long hours lost were
And now he’ll show some slides to ye…

So I will be talking about four European funded projects on research data management and open access that are all part of Horizon 2020. Many of you will be part of Horizon 2020 consortia, or will be supporting researchers who are. It is useful to remind ourselves of the context by which these came about…

Open Science is situated within a context of ever greater transparency, accessibility and accountability. It is both a bottom-up issue – the OA concept was coined about 10 years back in Budapest and was led by the high energy physics community, who wanted to be more open in sharing their work, and to do so more quickly – and it has also been driven from the top through government/funder support and increasing public and commercial engagement in research, to ensure better take-up and use of research that has been invested in.

Policy-wise, in the UK RCUK has seven Common Principles on Data Policy, and most of the RCUK funders require data management plans. That fits into wider international policy moves. Indeed, if you thought the four-year EPSRC embargo timeline was tight, South Africa just introduced a no-more-than-12-months requirement.

Open Access was a pilot in FP7; this ran from August 2008 until the end of FP7 in 2013. It covered parts of FP7, but it covers all of FP8/Horizon 2020, although that is a pilot process intended to be mainstreamed by FP9 or whatever it comes to be known by. The EC sees real economic benefit to OA in supporting SMEs and NGOs that can’t afford subscriptions to the latest research. Alma Swan and colleagues have written on the opportunity costs, which provides useful context to the difference Open Access can make.

Any project with H2020 funding has to make any peer-reviewed journal article it publishes openly available and free to access, free of charge, via a repository – regardless of how they publish and whether green or gold OA.

H2020 also features an Open Research Data pilot – likely to be a requirement by FP9. It applies to data and metadata needed to validate scientific results, which should be deposited in a dedicated data repository. Interestingly, whilst data management plans need to be created 6 months into the project, and towards the end, they don’t have to be filed with the EU at the outset.

So, lastly, I want to talk about four projects funded by the EU.

PASTEUR4OA aims to simplify OA mandates across the EU – so that funders don’t have conflicting policy requirements. That makes it a complex technical and diplomatic process.

OpenAIRE aims to promote use and reuse of outputs from EU funded research

EUDAT offers common data services through geographically distributed resilient network of 35 European organisations. Jisc and DCC are both working on this, integrating the DCC’s DMP Online tool into those services.

The FOSTER project is supporting different stakeholders, especially younger researchers, in adopting open access in the context of the European Research Area and making them aware of H2020 requirements of them – with a big carrot and a small stick, in a way. We want researchers to integrate open access principles and practice into their current research workflow – rather than asking them to change their way of working entirely. We are doing train-the-trainer type activities in this area and also facilitating adoption and reinforcement of OA policies within and beyond the EC. FOSTER is doing this work through various methods, including identifying existing content that can be reused, repackaged, etc.

Jisc Workshop on Research Data Management and Research at Risk Activities, and Shared Services – Rachel Bruce, Daniela Duca, Linda Naughton, Jisc

Rachel is leading this session…

This is really a discussion session, but I will start by giving you a very quick overview of some of the work in Research at Risk as well. But this is a fluid session – we are happy to accommodate other topics that you might want to talk about. While we give you a quick overview, do think about an RDM challenge topic you might want to take the chance to talk about.

So, in terms of Research at Risk, this is a co-design challenge. This is a process we take forward in Jisc for research and development, or the just-development end of the spectrum, to address sector challenges. The challenge facing the sector here is the fragmented approach to research data and infrastructure. Because of that we are probably not reaching all the goals we would wish to. Some of that relates quite closely to what David Prosser was saying yesterday about open access and the benefits of scale and shared services. So, we have been asked to address those issues in Research at Risk.

Within Research at Risk we have a range of activities, one of the biggest is about shared services, including in the preservation and curation gap. You have already heard about discovery and research data usage, also the Research Data Spring.

So, the challenges we want to discuss with you are:

  1. The Shared services for RDM – yesterday there was discussion around the SHERPA services for instance. (Rachel will lead this discussion)
  2. Journal research data policy registry (Linda will lead this session)
  3. Business case and funding for RDM – articulating the role of RDM (Daniela will lead this session)
  4. But also anything else you may want to discuss… (Varsha will lead this group discussion)

So, Shared Services… This is an architecture diagram we have put together to depict all of the key services needed to support a complete data management service, but also linking to national and international services. And I should credit Stuart Lewis at UoE and John Lewis (Sheffield?), who had done much of this mapping already. We have also undertaken a survey of repositories around the potential needs of HEIs. Some responses were around a possible national data repository, and a call for Jisc to work with funders on data storage requirements so that they provide a suitable discipline-specific data storage mandate.

Linda: I will talk a bit about the Journal Research Data Policies Registry – you can find out more on our blog and website. We want to create a registry that allows us to turn back time to see what we can learn from OA practices. The aim is to develop best practice on journal policies between publishers and other stakeholders. We want to know what might make your life easier in terms of policies, and navigating research data policies. And that input into this early stage work would be very valuable.

Daniela: The business case and costings for RDM is at a very early stage but we are looking at an agreed set of guidance for the case for RDM and for costing information to support the business case in HEIs for research data management. This reflects the fact that currently approaches to funding RDM services and infrastructure vary hugely, and uncertainty remains… And I would like to talk to you about this.

Rachel: we thought we would have these discussions in groups and we will take notes on the discussions as they take place, and we will share this on our blog. We also want you to write down – on those big post it notes – the one main challenge that you think needs to be addressed which we will also take away.

So, the blog will be going quiet again for a while but we’ll try and tweet highlights from groups, and grab some images of these discussions. As Rachel has said there will also be notes going up on the Jisc Research at Risk blog after today capturing discussions… 

Cue a short pause for lunch, where there will also be a demo taking place: DMPonline – Mary Donaldson and Mick Eadie, University of Glasgow.

Our first talk of this afternoon, introduced by William Nixon, is:

Unlocking Thesis Data – Stephen Grace, University of East London

This project is for several different audiences. For students it is about bridging to the norms of being a career researcher – visibility and citations – and helping them to understand the scholarly communication norms that are becoming the reality of the world. But this also benefits funders, researchers, etc.

We undertook a survey (see: http://dx.doi.org/10.15123/PUB.4274) and we found several institutions already assigning DOIs to theses, but others looking to do more in this area. We also undertook case studies in six institutions, to help us better understand what the processes actually are. So our case studies were for University of East London; University of Southampton; LSE; UAL; University of Bristol; and University of Leicester. It was really interesting to see the systems in place.

We undertook test creation of thesis DOIs with University of East London and University of Glasgow, and University of Southampton undertook this via an XML upload, so a slightly more complex process. In theory all of that was quite straightforward. We were grateful for the Jisc funding for that three-month project; it didn’t get continuation funding but we are keen to understand how this can happen in more institutions and to explore other questions: for instance, how does research data relate to the thesis, what is its role, is it part of the thesis, a related object, etc.?

So questions we have are: What systems would you use and can they create/use persistent identifiers? Guidance on what could/should/must be deposited? One record or more? Opportunities for efficiencies?

On the issue of one record or more, a Thesis we deposited at UEL was a multimedia thesis, about film making and relating to making two documentary films – they were deposited under their own DOIs. Is that a good thing or a bad thing? Is that flexibility good?

Efficiencies could be possible around cataloguing theses – that can be a repeated process for the repository copy and for the library’s copy and those seem like they should be joined up processes.

We would love your questions and comments and you can find all project outputs.

Q1) What is the funder requirement on data being deposited with theses?

A1) If students are funded by research councils, they will have expectations regardless of whether the thesis is completed.

Q2) Have you had any feedback from the (completed) students whose work has been deposited on how they have found this?

A2) I have had feedback from the student who had deposited that work on documentary films. She said that as a documentary film maker there are fewer and fewer ways to exhibit documentary films. As a non-commercial filmmaker, seeing her work out there and available is important, and this acts as an archive and as a measure of feedback that she appreciates.

Q3) On assigning ORCID IDs to students – I struggle to think of why that would be an issue?

A3) Theoretically there is no issue, we should be encouraging it.

Comment: Sometimes where there is a need to apply an embargo to a thesis because it contains content in which a publisher has copyright – it may be useful to have a DOI for the thesis and separate DOIs for the data, so that the data can be released prior to the thesis being released from embargo. [Many thanks to Philippa Stirlini for providing this edit via the comments (below)].

IRUS UK – Jo Alcock, IRUS UK

We are a national aggregation service for UK institutional repositories which collects usage statistics. That includes raw download data from UK IRs for all item types within repositories, and it processes the raw data into COUNTER-compliant statistics. That aggregation – of 87 IRs – enables you to get a different picture than just looking at your own repository.

IRUS-UK is funded by Jisc. Jisc project- and service-manages IRUS-UK and hosts it. Cranfield University undertake development, and Evidence Base at Birmingham City University undertake user engagement and evaluation.

Behind the scenes IRUS-UK is a small piece of code that can be added to repository software and which employs the “Tracker Protocol”. We have patches for DSpace, plug-ins for EPrints, and implementation guidelines for Fedora. It gathers basic data for each download and sends it to the IRUS-UK server. The reports (Report 1 and Report 4) are COUNTER compliant. We also have an API and a SUSHI-like service.
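
[Liveblog note: the idea behind the tracker is that each successful download generates one event and the aggregator rolls events up into COUNTER-style monthly counts. A much-simplified sketch of that roll-up – not the IRUS-UK implementation, with invented event fields:]

```python
# Much-simplified roll-up of raw download events into monthly per-item counts.
# Not the IRUS-UK implementation; the event fields are invented for illustration.
from collections import Counter
from datetime import datetime

raw_events = [
    # (timestamp, repository_id, item_identifier)
    ("2015-07-01T09:12:44Z", "repo-a", "oai:repo-a:1234"),
    ("2015-07-01T09:13:02Z", "repo-a", "oai:repo-a:1234"),
    ("2015-07-15T16:40:10Z", "repo-b", "oai:repo-b:9876"),
]

monthly_counts = Counter()
for timestamp, repository, item in raw_events:
    month = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m")
    monthly_counts[(repository, item, month)] += 1

# A COUNTER-compliant report would also de-duplicate rapid repeat clicks and
# filter robot traffic before counting; that step is omitted here.
for (repository, item, month), count in sorted(monthly_counts.items()):
    print(month, repository, item, count)
```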

At present we have around 400k items covered by IRUS-UK. There are a number of different reports – and lots of ways to filter the data. One thing we have changed this year is that we have combined some of the related reports, but we have added a screen that enables you to filter the information. Repository Report 1 enables you to look across all repositories by month – you can view or export as Excel or CSV.

As repositories you are probably more concerned with the Item Report 1 which enables you to see the number of successful item download requests by Month and Repository Identifier. You can look at Item Statistics both in tabular and graphical form. You can see, for instance, spikes in traffic that may warrant further investigation – a citation, a news article etc. Again you can export this data.

You can also access IRUS-UK Item Statistics which enable you to get a (very colourful) view of how that work is being referenced – blogged, tweeted, cited, etc.

We also have a Journal Report 1 – that allows you to see anything downloaded from a given journal within the IRUS-UK community. You can view the articles, and see all of the repositories an article is in. So you can compare performance between repositories, for instance.

We have also spent quite a lot of time looking at how people use IRUS-UK. We undertook a number of use cases around the provision of standards based, reliable repository statistics; reporting to institutional managers; reporting to researchers; benchmarking; and also for supporting advocacy. We have a number of people using IRUS-UK as a way to promote the repository, but also some encouraging competition through newsletters etc. And you can find out more about all of these use cases from a recent webinar that is available on our website.

So, what are the future priorities for IRUS? We want to increase the number of participating repositories in IRUS-UK. We want to implement the IRUS tracker for other repository and CRIS software. We want to expand views of data and reports in response to user requirements – for instance potentially altmetrics etc. We also want to include supplementary data and engage in more international engagement.

If you want to contact us our website is http://irus.mimas.ac.uk; email irus@jisc.ac.uk; tweet @IRUSNEWS.

Q1) Are the IRUS-UK statistics open?

A1) They are all available via a UK Federation login. There is no reason they could not technically be shared… We have a community advisory group that have recently raised this so it is under discussion.

Q2) How do data repositories fit in, especially for text mining and data dumps?

A2) We have already got one data repository in IRUS-UK but we will likely need different reporting to reflect the very different ways those are used.

Q3) If a data set has more than one file, is that multiple downloads?

A3) Yes.

Q3) Could that be fixed?

A3) Yes, we are looking at separate reporting for data repositories for just this sort of reason.

Sadly Yvonne Howard, University of Southampton, is unable to join us today due to unforeseen circumstances so her session, Educational Resources, will not be going ahead. Also the Developer Challenge has not been active so we will not have the Developer Challenge Feedback session that Paul Walk was to lead. On which note we continue our rejigged schedule…

Recording impact of research on your repository (not impact factors but impact in REF sense!) – Mick Eadie & Rose-Marie Barbeau, University of Glasgow; 

Rose-Marie: Impact is my baby. I joined Glasgow specifically to address impact and the case studies. The main thing you need to know about the impact agenda is that all of our researchers are really stressed about it. Our operating landscape has changed, and all we have heard is that it will be worth even more in future REFs. So, we don’t “do” impact, but we are about ensuring our researchers are engaging with users and measuring and recording impact. So we are doing a lot of bridging work, around that breadcrumb trail that explains how your research made it into, e.g. a policy document…

So we have a picture on our wall that outlines that sort of impact path… showing the complexity and pathways around impact. And yet even this [complex] picture appears very simple; reality is far more complicated… When I talk to academics they find that path difficult: they know what they do, they know what they have to show… so I have to help them understand how they have multiple impacts, which may come by quite a circuitous route. So for instance a piece of archaeological work impacted policy, made Time Team, impacted the local community… Huge impact, extensive international news coverage… But this is the form for REF processes…

But my big message to researchers is that everything has changed: we need them to engage for impact and we take that work seriously. It’s easy to say you spoke to schools, to be part of the science festival. We want to capture what these academics are doing here professionally, things they may not think to show. And we want that visible on their public profile for example. And we want to know where to target support, where impact might emerge for the next REF.

So, I looked at other examples of how to capture evidence. Post REF a multitude of companies were offering solutions to universities struggling to adapt to the impact agenda. And the Jisc/Coventry-led project establishing some key principles for academic buy in – that it needed to be simple and very flexible – was very useful.

And so… Over to the library…

Mick: So Rose-Marie was looking for our help to capture some of this stuff. We thought EPrints might be useful to capture this stuff. It was already being used and our research admin staff were also quite familiar with the system, as are some of our academics. We also had experience of customising EPrints. And we have therefore added a workflow for Knowledge Exchange and Impact. We wanted this to be pretty simple – you can either share “activity” or “evidence”. There are a few other required fields, one of which is whether this should be a public record or not.

So, when an activity/evidence is added the lead academic can be included, as can any collaborating staff. The activity details follow the REF vocabulary. We include potential impact areas, for instance… And we’d like that record to be linked to other university systems. But we are still testing this with research admin staff.

We still have a few things to do… A Summary page; some reporting searching and browsing functionality – which should be quite easy; link to other university systems (staff profiles etc); and we would like to share this with the EPrints community.

Q1) What about copyright?

A1 – Rose-Marie) Some people do already upload articles etc. as they appear. The evidence repository is hidden away – to make life easier in preparing for the next REF – but the activity is shared more publicly. Evidence is

Q2 – Les) It’s great to hear someone talking about impact in a passionate and enthusiastic way! There is something really interesting in what you are doing and the intersection with preservation… In the last REF there was evidence lost that had been on the web. If you just have names and URLs, that won’t help you at the end of the day.

A2 – Rose-Marie) Yes, lack of institutional memory was the biggest issue in the last REF. I speak a lot to individuals and they are very concerned about that sort of data loss. So if we could persuade them to note things down it would jog memories and get them in that habit. If they note disappearing URLs that could be an issue, but also I will scan everything uploaded because I want to know what is going up there, to understand the pitfalls. And that lets me build on experience from the last REF. It’s a learning process. We also need to understand the size of storage we need – if everyone uploads every policy document, video etc. it will get big fast. But we do have a news service and our media team are aware of what we are doing, and we are trying to work with them. Chronological press listings from the media team aren’t the data structure we would hope for, so we are working on this.

William) I think it is exciting! And we don’t think it’s perfect – we just need to get started and then refine and develop it! Impact did much better than expected in the last REF, and if you can do that enthusiastically and engagingly that is really helpful.

A2 – Rose Marie) And if I can get this all onto one screen that would be brilliant. If anyone has any questions, we’d love to hear them!

Impact and Kolola – Will Fyson, University of Southampton

I work for EPrints Services but I also work for Kolola, a company I established with fellow PhD students – and very much a company coming out of that last REF.

The original thinking was for a bottom up project thinking about 50 or 60 PhDs who needed to capture the work they were doing. We wanted to break down the gap between day to day research practice and the repository. The idea was to allow administrators to have a way to monitor and plan, but also to ensure that marketing and comms teams were aware of developments as well.

So, our front page presents a sort of wall of activity, and personal icons which shows those involved in the activity. These can include an image and clicking on a record takes you through to more information. And these records are generated by a form with “yes” or “no” statements to make it less confusing to capture what you have done. These aren’t too complex to answer and allow you to capture most things.

We also allow evidence to be collected, for instance outreach to a school. You can also capture how many people you have reached in this activity. We allow our community to define what sort of data should be collected for which sort of activity. And analytics allow you to view across an individual, or a group. That can be particularly useful for a large research group. You can also build a case study from this  work – useful for the REF as it allows you to build up that case study as you go.

In terms of depositing papers we can specify in the form that an EPrints deposit is required when certain types of impact activities are recorded – and highlight if that deposit has been missed. We can also export a Kolola activity to EPrints, providing a link to the Kolola activity and any associated collections – so you can explore works related to a particular paper – which can be very useful.

We’ve tried to distribute a research infrastructure that is quite flexible and allows you to have different instances in an organisation that may be tailored to the different needs of different departments or disciplines – but all backed up by the institutional repository.

Q1) Do you have any evidence of researchers gathering evidence as they go along?

A1) We have a few of these running along… And we do see people adding stuff, but occasionally researchers need prompting (or threatening!) – for instance, for foreign travel you have to be up to date logging activity in order to go! But we also saw an example of researchers getting an entry in a raffle for every activity recorded – and that meant a lot of information was captured very quickly!

(Graham Steel @McDawg taking over from Nicola Osborne for the remainder of the day)

Demo: RSpace – Richard Adams, Research Space

 

RSpace ELN presentation and demo. Getting data online as early as possible is a great idea. RSpace at the centre of user data management. Now time for a live demo (in a bit).

Lab notebooks can get lost for a number of reasons. Much better is an electronic lab notebook. All data is timestamped, and who made what changes etc. is logged. Let’s make it easy for them to use. Here’s the entry screen when you first log in. You can search for anything and it’s very easy to use. It’s easy to create a new entry. We have a basic document into which you can write content with any text editor. You can drag and drop content in very simply. Once documents have been added they appear in the gallery. Work is saved continuously and timestamped.

We also have file stores for large images and sequencing files.

NOW A LIVE DEMO.

It’s very easy to configure. Each lab has its own file server. Going back to the workspace, we’re keen to make it really easy to find stuff. Nothing is ever lost or forgotten in the workspace. You can look at revision history and review what changes have been made. Now looking at a lab’s group page: you can look at, but not edit, other users’ content. You can invite people to join your group and collaborate with other groups. You can set permissions for individual users. One question that comes up often is about how to get data out of the system. Items are tagged and contain metadata, making them easier to find. To share stuff, there are 3 formats for exporting content (ZIP, XML and PDF).

The community edition is free and uses Amazon web services. We’re trying to simplify RSpace as much as possible to make it really easy to use. We are just getting round to the formal launch of the product but have a number of customers already. It’s easy to link content from the likes of DropBox. You can share content with people that are not registered with an RSpace account. Thanks for your attention.

Q1) I do lots of work from a number of computers.

A1) We’re developing an API to integrate such content. Not available just yet.

Closing Remarks and presentation to winner of poster competition – Kevin Ashley, Digital Curation Centre

I’m Kevin Ashley from the Digital Curation Centre here in Edinburgh. Paul Walk mentioned that we’ve done RFringe events for 7 years. In the end, we abandoned the developer challenge due to a lack of uptake this year. Do people still care about it? Kevin said there is a sense of disappointment. Do we move on or change the way we do it? Les says he has had a great time, that it’s been one of the best events he has been to for quite some time: “This has been fantastic”. Thanks Paul for your input there, said Kevin.

David Prosser’s opening keynote was a great opening for the event. There were some negative and worrying thoughts in his talk. We are good at identifying problems but not solutions. We have the attention of government departments in terms of open access and open data. We should maximize this opportunity before it disappears.

Things that we talked about as experiments a few years ago have now become a reality. We’re making a lot of progress generally. Machine learning will be key, there is huge potential.

I see progress and change when I come to these events. Most in the audience had not been to RFringe before.

Prizes for the poster competition: the voting was quite tight. In third place, LSHTM (Rory). In second place, Lancaster. In first place, Robin Burgess and colleagues.

Thanks to all for organizing the event. Thanks for coming along. Thanks to Valerie McCutcheon for her contribution (gift handed over). Thanks to Lorna Brown for her help too. Go out and enjoy Edinburgh! (“and Glasgow” quipped William Nixon).

 

EDINA Geo Services at GeoDATA London Showcase 2014

Early this month, EDINA Geodata Services held an exhibit at the GeoDATA Showcase 2014 event in London. This was our second time exhibiting at this event, which is aimed primarily at the commercial end of the GI industry, covering current data and technology topics. This follows on from other events in the series as described previously on the GoGeo Blog.

A summary of the talks can be found online.

We had a small stand, but the positive responses we got from visitors were very encouraging: from students who are currently using Digimap in their studies, to a lecturer at a university who said that Digimap was a great resource and essential to his teaching. Even more encouraging was the number of delegates and staff on other stands, with successful careers in the GI industry, who came up and said that they had used Digimap during their studies and it was vital to their degree. It’s good to know that future generations in the GI industry have the expectation that they will have easy access to high quality geospatial data, readily available from Digimap (at least while they are in education!).

We talked to delegates from a wide range of industries including environmental consultancies, government, data providers, local councils, defence and education, as well as visiting and talking to many of the other exhibitors. We got a lot of useful feedback on what we’re doing and ideas for what we could be doing in the future, including potential opportunities for collaboration. Of particular interest to delegates was the Fieldtrip GB app we were demonstrating, which is a mobile data collection platform – especially once the magic word ‘free’ was mentioned, and also that there is an open version available on GitHub.

Mince pies and mulled wine near the end were a welcome break from a long day – so busy that we didn’t actually get a chance to attend any of the talks, many of which looked very interesting. However, it was a very useful event to attend, and we look forward to next year’s event on 3rd December 2015.

Inaugural Scottish QGIS User Group


“Today we have a guest blog post from one of the Geo-developers at EDINA.  Mike works as part of the data team and is usually up to his oxters in databases ensuring that the data offered through Digimap is both up to date and in a useful format. Over to Mike.”

Following on from successful meetings in England and Wales, on 19th March I attended the inaugural “Scottish QGIS User Group” hosted at Stirling University. My first thought revolved around the level of interest that such a meeting would attract, but as it turned out, it was very popular. I was also surprised at the geographical spread of the attendees, with several folks coming from Brighton (Lutra Consulting) and Southampton (Ordnance Survey) as well as all over Scotland and northern England, although the attendees were dominated by public sector organisations.

Talks/Presentations:

A more detailed breakdown of the presentations can be found here: http://ukqgis.wordpress.com/2014/03/25/scottish-qgis-user-group-overview/

From my own perspective, the talks on developing QGIS and cartography in QGIS were of particular interest – demonstrating the ever growing potential of QGIS. Additionally, the improvements (particularly speed enhancements) that look to be coming soon (as highlighted in Martin Dobias’ presentation) are impressive.

As for the user group itself, it will be interesting to see where it goes from here and what direction it takes. How will future events be funded? How often should the group meet, and where? A recommendation from me would be to have general presentations and talks in the morning, then in the afternoon split into different streams for beginners / users / developers.

At the end of the meet-up (and a few geo-beers in the pub) there was definitely a sense that everybody got something out of the event and would like to attend more meetups in the future.

A special mention of thanks needs to go out to Ross McDonald – @mixedbredie (Angus Council) – for his efforts in organising the event, and additionally to thinkWhere (formerly Forth Valley GIS) for sponsoring the event.

Links and useful things

FOSS4G – a developers review – part4

The 4th and final EDINA developer’s eye view of FOSS4G 2013.  This one is from Tim Urwin who is the Digimap Service Manager.  Tim has been working at EDINA pretty much from the start of its internet mapping adventure and has seen software and toolkits come and go.

Who are you?
My name is Tim and I’m the senior GI Engineer at EDINA, in charge of the Data Team, and I’m the Operation Service Manager for Digimap. My interest in attending FOSS4G centred around three key components of the Digimap Service: WMS servers, WMTS options and databases, although I delegated most of the latter to Mike due to timetable clashes.

What did you hope to get out of the event?
My aim was to catch up on the latest state and future options of current software used by EDINA services and to find out more about the various open source WMTS options available.

Top 3 things? (Ed – no trains Tim!)

 

  • Chris Tucker’s MapStory keynote was inspirational and well-presented and it is certainly a site I’ll be tracking to see where it heads.
  • Ben Hennig’s keynote on thinking before you act in cartography was quite thought-provoking.
  • Paul Ramsey’s PostGIS Frenzy talk was as funny as it was informative, and I only caught the re-run. Lots of good information combined with useful tips. (Ed – Paul’s a star, there wasn’t enough room for everyone first time round so he kindly offered to repeat the talk)
  • Honourable mention must go to the Festival of the Spoken Nerd – very, very funny
What will you investigate further?
MapCache and MapProxy WMTS software to replace our existing tile caching option and catch up with all the presentations I couldn’t attend due to timetable clashes. (Ed – remember that all the talks (hopefully) will be available on the FOSS4G YouTube channel when we get them sorted and uploaded)
One closing thought: it was heartening to see that, despite all the professional headaches Digimap has caused me over the years, our approach to and delivery of the service has been validated, as several leading data supply agencies have very similar service architectures, built with open source software at the core, albeit with some proprietary components for certain tasks. The primary differences are in WMS and caching software options, although they’ll be closer aligned once we upgrade to a more modern tile caching platform. Now if only we could also have their hardware – they have a significantly larger number of servers :)

Oh and as I wasn’t allowed this in my Top 3 – seeing 45108 running again after 16 years of hard work by its custodian group.

FOSS4G – a developers review part 2

Photo – Addy Pope

This is the second part of EDINA’s developer review of FOSS4G 2013.  This time it is Mike Gale who will be providing his opinion on what was presented.

Who are you:

Michael Gale – GIS Engineer / member of EDINA’s Data Team. My job is essentially to deal with the vast quantities of GIS data we utilise at EDINA. I translate, modify, split and twist the data we receive into types and formats that our services, such as Digimap, can then offer to our users. I make heavy use of the Swiss-army-knife GIS command line tools GDAL/OGR, and additionally Safe FME, shell scripting, Python and PostGIS.

What did you hope to get out of the event?

To discover the latest and greatest ways to utilise the tools I already use. I was keen to evaluate what advances and benefits PostGIS 2.0 could offer – particularly with 3D data, LiDAR point clouds & pgRouting. Additionally I wanted to discover new ways of integrating Python into my workflows.
Top 3 things you saw at the event (not the food or beer….)

(1) Chris Tucker keynote – MapStory.org

MapStory.org is a new website that empowers a global user community to organise knowledge about the world spatially and temporally. It is essentially a social media platform where people can crowd source geospatial data and create “MapStories” with spatio-temporally enabled narratives. The best way to figure out what that all means is to check out the website!!

(2) Cartopy & Iris – Open Source Python Tools For Analysis and Visualisation – Dr Edward Campbell (Met Office)

Cartopy is a new python mapping library for the transformation and visualisation of geospatial vector and raster data. The library offers the ability for point, line, polygon and image transformations between projections and a way to visualise data with only a few snippets of python code. Iris is a python library that specifically deals with analysing and visualising meteorological and oceanographic datasets, particularly 3D and temporal data.
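
By way of illustration, a minimal Cartopy sketch might look like the following; the points and the choice of the OSGB projection are made up for this example, and the real library offers far more than this.

```python
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# A few illustrative lon/lat points (values invented for this example).
lons = [-3.19, -1.15, -0.13]
lats = [55.95, 52.95, 51.51]

# Plot on the OSGB (British National Grid) projection; Cartopy handles the
# transformation from geographic (PlateCarree) coordinates for us.
ax = plt.axes(projection=ccrs.OSGB())
ax.coastlines(resolution='10m')
ax.scatter(lons, lats, transform=ccrs.PlateCarree(), color='red', zorder=3)
plt.show()
```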

(3) LiDAR in PostgreSQL with Pointcloud – Paul Ramsey

PostGIS support for LiDAR data has been non-existent until now. Paul Ramsey has created a new spatial data type for PostGIS 2.0 that offers the ability to import huge amounts of point cloud data and to analyse it with several new PostGIS functions. Pretty impressive.
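
As a flavour of what this enables, here is a rough sketch of querying a patch table from Python, assuming the pointcloud extension is installed and a table of pcpatch data already exists; the database, table and column names are hypothetical, and the available functions depend on the extension version.

```python
import psycopg2

# Hypothetical connection, table and column names, for illustration only.
conn = psycopg2.connect("dbname=lidar user=postgres")
cur = conn.cursor()

# Count the points stored in each patch (PC_NumPoints comes from the
# pointcloud extension; 'patches' / 'pa' are assumed names).
cur.execute("SELECT id, PC_NumPoints(pa) FROM patches LIMIT 5;")
for patch_id, num_points in cur.fetchall():
    print(patch_id, num_points)

# Explode one patch back into individual points as readable text.
cur.execute("SELECT PC_AsText(PC_Explode(pa)) FROM patches WHERE id = 1;")
for (point_text,) in cur.fetchall():
    print(point_text)

cur.close()
conn.close()
```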

(4) I’ll throw a comedy one in as well: “Up all night to get Mapping”: http://www.youtube.com/watch?v=_EEVYHUQlkU

Editors note: view at your own (ears) risk.

1 thing that you are definitely going to investigate further

The IRIS and Cartopy Python libraries.

Thanks Mike.  I hope to add another couple of reviews next week.  My overview, with links to as many reviews as I could find, can be found HERE

 

FOSS4G – a developers review part 1

Panos – EDINA Developer

As well as being part of the Local organising committee, EDINA sent a number of developers to FOSS4G.  In the first of a series of guest posts we find out what the developers thought of the event and what they will be following up.

First up is Panos. Panos graduated with an MSc in GIS from Edinburgh University 3 years ago and has been working in the geo team at EDINA ever since.

Who am I and what am I interested in?

I am Panos and I work at EDINA as a software engineer. I maintain a service called UK Data Service Support and I am working on an EU FP7 project called COBWEB, which focuses on mobile GIS development and sensor data. As you can see from my background, I am mainly interested in mobile development, GEOSERVER and sensor data frameworks. I managed to attend most of the presentations that have to do with these topics.

What was I expecting?

I was expecting to see some more alternative mobile development solutions to the ones we use here at EDINA (OpenLayers, jQuery Mobile, PhoneGap) and some more applications on the sensor web. I am quite happy that I discovered some new software such as 52North, and that other people developed their mobile apps in a similar way to us. So, let’s take them one by one:

Mobile development:

  • Most of the projects focused around OpenLayers mobile/Leaflet/jQuery Mobile/Sencha Touch and PhoneGap.  EDINA have used a similar blend of technologies in our mobile app, Fieldtrip GB. There were many similarities in how they designed their apps, the feedback they received from users, the workflow they followed and the problems they had with touch events on different devices.
  • The outcome is that they would take a similar approach but they would perhaps try an alternative to phonegap.
  • One smart approach to visualising lots of vector data on a small screen was to use MapProxy to merge raster and vector data and deliver it as a WMS.  A touch event then searches for the closest feature and the app makes a corresponding WFS request, returning information for the correct feature (see the sketch after this list).
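
A minimal sketch of that second step might look like the following, assuming a GeoServer-style WFS endpoint and a layer name invented for the example; the real app would build the bounding box from the tapped screen coordinate and then pick the nearest of the returned features.

```python
import requests

# Hypothetical endpoint and layer name, for illustration only.
WFS_URL = "https://example.org/geoserver/wfs"
LAYER = "example:points_of_interest"

def features_near_tap(lon, lat, buffer_deg=0.001):
    """Ask the WFS for features in a small box around a tapped location."""
    bbox = (lon - buffer_deg, lat - buffer_deg,
            lon + buffer_deg, lat + buffer_deg)
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": LAYER,
        "outputFormat": "application/json",
        "bbox": ",".join(str(v) for v in bbox),
    }
    resp = requests.get(WFS_URL, params=params)
    resp.raise_for_status()
    return resp.json().get("features", [])

# The app would then pick whichever returned feature is closest to the tap.
```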

GEOSERVER:

  • GeoServer 2.4.0 has some interesting new features. The most interesting for me is a monitoring system for checking which users are using the service and what kind of data they are accessing. It’s a nice solution for monitoring the use of your GEOSERVER instance and there is even a GUI for it.  I plan to investigate how we might implement this in UK Data Service Support (a rough sketch of pulling out the request statistics is shown below).
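
By way of illustration, the monitoring extension also exposes its request log over GeoServer’s REST interface, so a first pass at pulling usage numbers into a script could look roughly like this; the host, credentials, path and field names are assumptions to check against the documentation for the GeoServer version in use.

```python
import csv
import io
import requests

# Hypothetical GeoServer instance and admin credentials.
GEOSERVER = "https://example.org/geoserver"
AUTH = ("admin", "geoserver")

# The monitoring extension records each request; fetch recent entries as CSV.
resp = requests.get(f"{GEOSERVER}/rest/monitor/requests.csv",
                    params={"count": 100}, auth=AUTH)
resp.raise_for_status()

# Tally which layers/resources are being hit most often.
counts = {}
for row in csv.DictReader(io.StringIO(resp.text)):
    layer = row.get("resources") or "unknown"
    counts[layer] = counts.get(layer, 0) + 1

for layer, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(layer, n)
```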

Sensor Web:

  • Unfortunately, the work that has taken place on this is quite limited. It’s mainly about hydrology.
  • 52North (https://github.com/52North/) seems like a promising framework that can adapt to many different sensor data scenarios. Some people have used it to decide whether someone should go hiking, by considering factors such as birch pollen, meteorology and air quality. This may be useful for COBWEB.

Following up:

I’ll definitely try to investigate the new GEOSERVER functionality and the 52North framework in order to see how I can benefit from them in my new projects. I’ll keep you posted with my progress. I would also like to say that these three presentations are not the only ones that I found interesting. There are more that are equally interesting, such as Leaflet, GeoNode, the ZOO project, CartoDB, the Iris project and Cartopy.  You should be able to watch these through ELOGeo in a couple of weeks.

FOSS4G – after the dust settles

Olympics of Geo?

FOSS4G 2013 has been and gone. What can I say, it seemed to go well. It is hard to tell when you are so involved in organising an event, as you notice all the little things that didn’t quite go as intended and you tend to be trying to do a hundred things at the same time. Archaeogeek has written an excellent post about the event from an organiser’s point of view so I won’t repeat that here. Highlights? There are so many to choose from: seeing 200 people make, and then wear, Robin Hood hats at the ice breaker, or seeing delegates sitting cross-legged on the floor when all the seats and stairs had already been filled. But here are my top 3:

  • OpenLayers 3 showcase – OpenLayers is awesome and version 3 looks like it will reinforce OpenLayers’ place as one of the best open source web mapping libraries out there.  New features include map rotation, with tilt features “in the pipeline”.
  • QGIS 2.0 Dufour – Quantum GIS is dead, long live QGIS.  The latest version is slicker and packs more features than before. Download it now and start exploring it. You can see some of the cool stuff in this slideshare.
  • Paul Ramsey – the man behind PostGIS did more talks than anyone else, re-running one that was so popular that we couldn’t squeeze everyone in.  His closing plenary was a call for us to become “open source citizens”.  Certainly one of the most inspirational presentations I have seen in a long time.
  • OK, so this makes it a top 4, but it is a worthy inclusion.  Arnulf Christl winning the Sol Katz award.  Long overdue and a true hero of the OSGeo world.

and the winner is…….

So what’s next?  Well, I hope to post a number of short “reviews” written by people who attended the event, each with their own top 3 lists.  We, the organisers, hope to make all the talks available through ELOGeo so that anyone can see what was presented at FOSS4G.  In the meantime, you can scroll through the 4500 tweets from the event if you have the stamina.

FOSS4G 2014 will be held in Portland. Looking forward to it already, just have to work out how to get there……

Other write-ups of the event:

A big thanks to everyone who made this possible, all the LOC team, you know who you are, the volunteers and the staff at the East Midlands Conference Centre. 

FOSS4G 2013 – 5 reasons you should attend

FOSS4G is the annual conference for anyone interested in Free and Open Source Software 4 Geospatial.  FOSS4G 2013 will be held in Nottingham between the 17th and 21st September. So what makes FOSS4G so important and why should you attend?

  1. Network – FOSS4G is the biggest gathering of developers and users of open geospatial software.  There will be over 700 people at the conference. This includes the lead developers on some of the larger open source projects such as OpenLayers and QGIS.
  2. Learn – You’ll learn a lot in a very short period of time.  Whatever your level of knowledge of open source geo, from beginner to expert coder/developer, you will learn something new at FOSS4G.  There are workshops for all levels that you can sign up to.
  3. Inspiration – You will be inspired by some of the major names in GIS and data analysis. The list of keynote speakers includes Paul Ramsey (co-founder of PostGIS), Kate Chapman (Acting Director of the humanitarian team at OpenStreetMap) and Ben Hennig (Worldmapper Project).  For a full list of keynote speakers, please refer to the FOSS4G keynote page.
  4. Double the fun – Visit AGI GeoCommunity’13 at the same time. Yes, that’s right FOSS4G and AGI GeoCommunity are happening in the same venue on the same week. This was no accident. GeoCommunity is a great event and the FOSS4G organisers wanted to bring the two audiences together. GeoCommunity’13 runs from the 16th to the 18th September.
  5. Can you afford to miss it?  – What does this mean?  Well, the conference package is quite reasonable given the number and diversity of talks on offer.  £165 for a day pass or £435 for the whole event (3 days and the FOSS4G Gala Night).  FOSS4G was last in Europe back in 2010 and it might not be back until 2017 as it moves between continents. So, if you are based in Europe attending FOSS4G might not be as easy for a number of years.

So, there are 5 pretty good reasons to attend.  I am sure there are many other reasons to come along.  To find out everything that will be going on at FOSS4G please look at the conference website and follow the event on twitter through the #FOSS4G hashtag.

FOSS4G 2013 takes place between the 17th – 21st September 2013 and will be held at the East Midlands Conference Centre, which is situated on The University of Nottingham campus. 

OSGIS 2012 – Day 2

The second day of OSGIS 2012 saw a full day of short paper presentations and a couple of workshops.  The day started with a keynote from Prof. David Martin, University of Southampton.  David is Director of the ESRC Census Programme and his talk looked at the data that will come out of the 2011 census. It also discussed the future of census programmes in the UK.  The take-away points from David’s talk included:

  • Lots of new fields such as “do you intend to remain in UK?”
  • 16th July 2012 – age/sex distribution LADs released
  • Nov 2012 – release to the OA level which will be of interest for Geographers
  • Spring 2013 – multivariate stats and some new material such as time-dependent location data, which will be interesting for disaster management/response and answering questions such as “who is where/when?”
  • Access to longitudinal data and data about individuals will still be restricted to secure labs

David made some interesting points, including crediting the CDU in Manchester for making the census data far easier to access and analyse.  The data is in Excel format and has the crucial area codes which we geographers love.

He showed some analysis of workplace zones, which modify the census units based on where people are during the day (their place of work), and which should make disaster planning more efficient. It was also noted, light-heartedly, that this could be used to determine where to locate your burger van during the week.

Next up was Ian James, Technical Architect for the Ordnance Survey. Ian’s presentation was on how the OS was embracing the open source opportunity.  The OS now use open source solutions for internal activity and client-facing interfaces.  It took a while to convince the whole organisation that open source solutions were more than capable of handling large and valuable datasets.  It is now clear that some open source solutions are in fact better than their proprietary counterparts.  However, Ian stressed that open source was not free.  There is always a cost associated with software; with open source solutions there is no up-front licence fee, but there is a cost associated with training users and administrators or buying third-party support.

After coffee, the conference split into parallel strands, I switched rooms to catch certain presentations and my write up will reflect this.  You should be able to watch the presentations on the OSGIS 2012 website.

Matt Walker, Astun Technology, demonstrated the open source system Loader, a simple GML loader written in Python that makes use of OGR 1.8.   Matt showed us how Astun were providing TMS/WMS for various clients and how they managed to run it all through Amazon web services.  Top tips from Matt included:

  • Amazon web services are great, you can even have fail-over instances, but be sure to manage your system or risk running up bills quite quickly
  • Use PGDump to speed up PostgreSQL loading (4x quicker) – see the sketch after this list
  • MapProxy rocks
  • UbuntuGIS makes life easy
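
For the PGDump tip above, a rough sketch of the workflow with the GDAL/OGR tooling follows; the file names, SRID and database are invented for the example, and the point is simply that writing a SQL dump with OGR’s PGDump driver and loading it via psql tends to be much faster than inserting features one by one over a live connection.

```python
import subprocess

# Hypothetical input file and target database, for illustration only.
SRC = "os_mastermap_chunk.gml"
DUMP = "chunk.sql"

# 1. Translate the GML into a PostgreSQL dump file with OGR's PGDump driver.
subprocess.run(
    ["ogr2ogr", "-f", "PGDump", DUMP, SRC,
     "-lco", "GEOMETRY_NAME=geom", "-lco", "SRID=27700"],
    check=True,
)

# 2. Load the dump with psql; COPY-based loading is usually far quicker than
#    per-feature INSERTs over a live OGR connection to PostgreSQL.
with open(DUMP) as f:
    subprocess.run(["psql", "-d", "gis", "-q"], stdin=f, check=True)
```
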
Next up was Fernando Gonzalez who presented the possibilities of collaborative geoprocessing with GGL2.  GGL2 is an evolution of GGL, which was a scripting application for GIS.  GGL2 makes scripts much simpler; fewer lines of code makes them easier for us humans to read.  GGL2 is available as a plugin for gvSIG and QGIS.  If you want to find out more about GGL2 then look at gearscape.org
EDINA’s Sandy Buchanan gave a demonstration of Cartogrammer, an online cartogram application. It allows users to upload shapefiles and KML files and then create cartograms.  This is very neat and really does remove the technical barrier to producing interesting info-graphics.  The service makes use of ScapeToad and is available as an online service, a widget and an API which can be called from your own website.  We will let you know when it goes live.
Anthony Scott of Sustain gave an excellent presentation on the work he has been doing for MapAction.  If you don’t know what MapAction is or what they do: they provide mapping and GIS services to areas that have suffered natural and humanitarian disasters.  Infrastructure is important if aid is to be delivered and this requires knowledge of what is on the ground at the time and, in some cases, what is left. Take 5 minutes to look at their website and if it sounds like something you would like to support, hit the big red donate button.
Jo Cook, Astun Technology, looked at how you might use open source software and open data to do something useful.  She looked at taking GeoRSS feeds from sites such as NHS Choices and Police UK to extract location-specific information, link it with other open data and then make this publicly available. According to Jo, you can do quite a lot with very basic Python scripting. The last slide of Jo’s presentation has a list of useful resources; seek it out when it is made available on the OSGIS website.
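
As a flavour of the “very basic Python scripting” involved, the sketch below pulls location points out of a GeoRSS feed using only the standard library; the feed URL is a placeholder and real feeds differ in which GeoRSS elements they use.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder feed URL; real GeoRSS feeds (news, incidents, etc.) will vary.
FEED_URL = "https://example.org/news.georss"

NS = {"georss": "http://www.georss.org/georss"}

with urllib.request.urlopen(FEED_URL) as resp:
    tree = ET.parse(resp)

# Assumes an RSS 2.0 feed whose <item> elements carry a <georss:point>.
for item in tree.iter("item"):
    title = item.findtext("title", default="(no title)")
    point = item.findtext("georss:point", namespaces=NS)
    if point:
        lat, lon = (float(v) for v in point.split())
        print(f"{title}: {lat:.4f}, {lon:.4f}")
```
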
The best presentation prize went to Ken Arroyo Ohori, TU Delft. Ken demonstrated some code that he had written which fixes overlapping and topologically incorrect polygons.  pprepair looks brilliant and is available on GitHub.  Ken plans to make it into a QGIS plugin when he has time; I think this will be really useful.  Nice aspects include being able to set a “trusted” polygon class which would be assumed to be correct if two polygons intersected.  Ken demonstrated pprepair’s capabilities by fixing polygons along the Spanish/Portuguese border. Because two mapping agencies have mapped the border independently, when you combine the two datasets you get horrible overlaps. Ken’s presentation was clear and informative and pprepair really does look useful.
The event finished with Steve Feldman of KnowWhere Consulting.  Steve has been working in GIS for many years but is, by his own admission, not a techie.  He approaches the subject with a business hat on and it is useful to hear this perspective.   Steve reiterated the point that open source is not free software: it is commercial software with no massive up-front lump sum and no long-term contract. You can pay for implementation and support.  You can fund developments that you want, rather than functionality you don’t need. Steve suggested that “free” was a confusing term, but a member of the audience suggested that free also relates to not being tied to a contract or service provider.  You can opt in and out as you wish.
Feldman

FOSS4G 2013

Steve then took the opportunity to officially launch FOSS4G 2013, which will be held in Nottingham in September next year.  This event will be huge and is definitely one to put in the calendar now; make sure you get along to it.  There will be over 500 delegates from around the world, all focused on doing more with open source geospatial tools.  In fact, better than that, volunteer to help at the event.  The local organising committee needs extra people to help make FOSS4G 2013 a success. If you want to help, pledge your support on the pledge page and someone from the LOC will get back to you.
 So, another great event.  Thanks to Suchith, Jeremy and their team for making it happen.  OSGIS will not happen in 2013, but FOSS4G will more than make up for it.

 

GISRUK 2012 – Thursday

The second part of GoGeo’s review of GISRUK 2012 covers Thursday. If you want to find out what happened on Wednesday, please read this post

Thursday saw a full programme of talks split between two parallel sessions.  I chose to go to the Landscape Visibility and Visualisation strand.

  • Steve Carver (University of Leeds) started proceedings with No High Ground: visualising Scotland’s renewable landscape using rapid viewshed assessment tools. This talk brought together new modelling software that allows multiple viewsheds to be analysed very quickly, with a practical and topical subject.  The SNP want Scotland to be self-sufficient in renewable energy by 2020. An ambitious target. In 2009, 42% of Scotland’s “views” were unaffected by human developments; this had declined to 28% by 2011.  Wind farms are threatening the “wildness” of Scotland and this may have implications for tourism.  Interestingly, the SNP also wants to double the income from tourism by 2020. So how can you achieve both?  By siting new wind farms in areas that do not further impact on the remaining wild areas.  This requires fast and efficient analysis of viewsheds, which is what Steve and his team presented.
  • Sam Meek (University of Nottingham) was next up, presenting on The influence of digital surface model choice on visibility-based mobile geospatial applications.  Sam’s research focused on an application called Zapp.  Sam is looking at how to efficiently and accurately run visibility models on mobile devices in the field and how the results are influenced by the surface model.  In each case, all processing is done on the device. Resampling detailed DTMs is obviously going to make processing less intensive, however this often leads to issues such as smoothing of features.  Other general issues with visibility models are stepping, where edges form in the DTM and interrupt the line of sight, and an overestimation of vegetation.  This research should help make navigation apps on mobiles that use visual landmarks to guide the user more accurate and usable.
  • Possibly the strangest and most intriguing paper title at GISRUK 2012 came from Neil Sang (Swedish University of Agricultural Sciences) with New Horizons for the Stanford Bunny – a novel method for view analysis.  The “bunny” reference was a bit of a red herring, but the research did look at horizon-based view analysis.  The essence was to identify horizons in a landscape to improve the speed of viewshed analysis, as the horizons often persist even when the local position changes.
  • The final paper of the session took a different direction, with David Miller of The James Hutton Institute looking at Testing the public’s preferences for the future. This linked public policy with public consultations through the use of virtual reality environments.  The research investigated whether familiarity with the location altered opinions of planned changes to the landscapes.  Findings showed agreement in developing amenity woodland adjacent to a village, and environmental protection, but differences arose in relation to proposals for medium-sized windfarms (note – medium-sized wind farms are defined as those that would perhaps be constructed to supply power to a farm, not commercial windfarms).

After coffee I chose to go to the Qualitative GIS session as it provided an interesting mix of papers that explored social media and enabling “the crowd”.

  • First up was Amy Fowler (Lancaster University) who asked How reliable is citizen-derived scientific data?  This research looked at the prevalence of aircraft contrails using data derived through the Open Air Laboratories (OPAL) Climate Survey. Given the dynamic nature of the atmosphere, it is impossible to validate user-contributed data. Amy hopes to script an automated confidence calculator to analyse nearly 9,000 observations, but initial analysis suggests that observations that have accompanying photographs tend to be more reliable.
  • Iain Dillingham (City University) looked at Characterising Locality Descriptors in crowd-sourced information.  This specifically focused on humanitarian organisations. Using the wealth of data available from the 2010 Haiti earthquake they investigated the uncertainty of location from social media. They looked at georeferencing locality descriptors in MaNIS (Mammal Network Information System).  The conclusion was that while there were similarities in the datasets, the crowd-sourced data presented significant challenges with respect to vagueness, ambiguity and precision.
  • The next presentation changed the focus somewhat: Scott Orford (Cardiff University) presented his work on Mapping interview transcript records: technical, theoretical and cartographical challenges. This research formed part of the WISERD project and aimed to geo-tag interview transcripts.  Geo-tagging was done using Unlock, but there were several issues with getting useful results out, or reducing the noise in the data.  Interviews were transcribed in England and complicated Welsh placename spellings often got transcribed incorrectly.  In addition, phrases such as “Erm” were quite frequent and got parsed, and then had to be removed as they did not actually relate to a place. Interesting patterns did emerge about what areas appeared to be of interest to different people in different regions of Wales; however, care had to be taken in preparing the dataset and parsing it.
  • Chris Parker (Loughborough University) looked at Using VGI in design for online usability: the case of access information. Chris used a number of volunteers to collect data on accessibility to public transport. The volunteers might be considered an expert group as they were all wheelchair users.  A comparison was made between an official map and one that used the VGI data. It was found that the public perception of quality increased when VGI data was used, making it an attractive and useful option for improving the confidence of online information. However, it would be interesting to look at this issue with a more mixed crowd of volunteers, rather than just the expert user group who seemed to have been commissioned (but not paid) to collect specific information. I am also not too sure where the term usability from the title fits.  Trusting the source of online data may increase its use, but this is not usability, which refers more to the ability of users to engage with and perform tasks on an interface.

There was a good demonstration from ESRI UK of their ArcGIS.com service.  This allows users to upload their own data, theme it and display it against one of a number of background maps. The service then allows you to publish the map and restrict the access to the map by creating groups.  Users can also embed the map into a website by copying some code that is automatically created for you. All good stuff, if you want to find out more about this then have a look at the ArcGIS.com website.

Most of Friday was given over to celebrating the career of Stan Openshaw.  I didn’t work with Stan but it is clear from the presentations that he made a significant contribution to the developing field of GIS and spatial analysis and had a huge effect on the development of many of the researchers that regularly attend GISRUK.  If you want to find out more about Stan’s career, have a look at the Stan Openshaw Collection website.

Friday’s keynote was given by Tyler Mitchell, who was representing the OSGeo community.    Tyler was a key force in the development of the OSGeo group and has championed the use of open source software in GIS.  Tyler’s presentation focused on interoperability and standards and how they combine to allow you to create a software stack that can easily meet your GIS needs.  I will try to get a copy of the slides of Tyler’s presentation and link to them from here.