Galaxy Zoo

This morning I am at the first seminar arranged by the University of Edinburgh Citizen Science and Crowdsourced Data and Evidence Network. The Network brings together those interested in citizen science and crowdsourcing from across the organisation and this event is also supported by the Academic Networking Fund, IAD. Today’s seminar looks at the Zooniverse crowdsourcing organisation and suite of projects with two guest speakers, and I’ll be taking live notes here. As usual, because these are live notes there may be errors, typos etc., and corrections are welcomed.

We are starting our day with an introduction by James Stewart to the network, which will focus particularly on methodological approaches.

Grant Miller (Zooniverse): ‘The Zooniverse – Real Science Online’

About Grant and his talk:

‘The Zooniverse is the world’s largest and most successful citizen science platform. I will discuss what we have learned from building over 40 projects, and where the platform is heading in the future.’

(Website: https://www.zooniverse.org/)

Grant Miller is a recovering astrophysicist who gained his PhD from the University of St Andrews, searching for planets orbiting distant stars. He is now the communications lead for the Zooniverse on-line citizen science platform.

I had kind of a weird introduction into crowdsourcing and citizen science… But the main thing I will be talking about today is how we engage the Zooniverse community to participate, to enjoy doing that, and to be part of our community.

Zooniverse all started with Kevin, a student at Oxford who was tasked with looking at thousands of images of the universe to find two sorts of galaxies: elliptical galaxies and spiral galaxies. He had a million to classify. He did 50,000 and then met with his supervisor and had some strong arguments: he didn’t want to spend his whole academic career classifying galaxies, and he argued that it didn’t require his training. So, by show of hands, who thinks this image of a galaxy (we are looking at one of many) is an elliptical, and how many think it is a spiral? The room votes that this is a spiral and it is indeed a spiral – and that’s basically how Zooniverse works. We show an image, we ask people what it is, and they choose. And people, en masse, really went for this. They went through huge numbers of images very quickly.

Other things started to happen too… The first community around the project was the Galaxy Zoo forum. A participant called Hanny found a thing (the “Voorwerp”)… It didn’t look like the galaxies she was classifying. This was a completely new astronomical phenomenon, which had never been known about. An amateur had found this through this very simple platform. People aren’t just good at recognising patterns, they also get distracted and find new things. And after discovering and publishing on this phenomenon – a huge cloud of gas associated with a galaxy – a group from the community decided to make a project of looking for more of these in other Galaxy Zoo images. And this is why communities are so brilliant. On another project our community found a whole new worm under the sea. That’s the power of having this community taking part.

So, how do we do this? Well, we really simplify the language of the task and make it easy for people to take part. And when Galaxy Zoo took off we found other scientists and researchers approaching us to build new projects, including humanities projects and biological projects. So we set up projects such as Snapshot Serengeti, where you indicate what you can see in images from camera traps on the Serengeti. I was working with a group of computer scientists trying to work out how to identify the object in the image, and I also asked my 4 year old nephew… He answered in seconds; the computer scientists are still looking for a solution.

So at this point in time we have 42 projects in the Zooniverse. Old Weather in 2010 was our first humanities project. It started as a climatology project, but because it was using historic ship logs, and those include so many other types of data, we found humanities researchers and historians coming on board, so it has had a second life. We have other humanities projects, cancer research projects, etc. Of those projects about 30-35 are currently live. We think this will expand rapidly soon but I’ll come back to that. And last year we passed the 1 million volunteer mark, and that’s registered volunteers. Mostly those are in Western Europe and North America, but we have participants in 200 countries (only 7 countries have none).

The community is expanding, the projects are expanding… But there is a lot of potential out there, a huge cognitive surplus we could be using. For instance Clay Shirky notes that 200 billion hours are spent watching TV by adults in the UK, while it took only 100 million hours to create Wikipedia. We are only beginning to tap that potential. On January 7th last year we relaunched a project called Space Warps – we had over a million classifications an hour – when Prof Brian Cox and Dara Ó Briain asked the public to do it on live TV. That meant that overnight we had discovered an object it can take astronomers years to discover. It’s good but it’s no 200 billion hours… Imagine what you could do with that much time. Every hour there are 16 years’ worth of human effort spent playing Angry Birds… How do we get that effort into citizen science?
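To put those figures side by side, here is a rough back-of-envelope sketch using only the numbers quoted above. The function names are mine, and the conversion of a "year of human effort" into 365.25 round-the-clock days is an assumption, not something stated in the talk.

```python
# Figures as quoted in the talk (noted live, so treat them as approximate).
TV_HOURS = 200e9                 # hours of TV watched
WIKIPEDIA_HOURS = 100e6          # hours it took to create Wikipedia
ANGRY_BIRDS_YEARS_PER_HOUR = 16  # years of effort spent every hour

# Assumption: one "year of human effort" is 365.25 days of 24 hours.
HOURS_PER_YEAR = 365.25 * 24


def wikipedias_in_tv_time(tv_hours=TV_HOURS, wiki_hours=WIKIPEDIA_HOURS):
    """How many Wikipedia-sized efforts fit into the quoted TV time."""
    return tv_hours / wiki_hours


def person_hours_per_hour(years=ANGRY_BIRDS_YEARS_PER_HOUR,
                          hours_per_year=HOURS_PER_YEAR):
    """Convert 'years of effort per hour' into person-hours per hour."""
    return years * hours_per_year


if __name__ == "__main__":
    print(wikipedias_in_tv_time())   # Wikipedia-equivalents of TV time
    print(person_hours_per_hour())   # person-hours of Angry Birds per hour
```

On these assumptions the quoted TV time is about 2,000 times the effort behind Wikipedia.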

So, is gamification the way to go? For those working in citizen science you could probably run a week long conference just on whether you should or should not do gamification. We have decided not to, but some of the most successful projects – Foldit and Eyewire – do use it. Those projects gave huge thought to how to reward participants’ efforts in the right way so that people don’t just game the system. We are worried that it won’t work for us: we are not convinced we would be good enough at building a game, and might end up with something that is neither game nor citizen science. But some of our projects have tried gamification and we have studied this. On Galaxy Zoo we used a leader board to start with but that caused some tension: those in the lead were doing hundreds of thousands of classifications and people felt the leaders might have cheated, while others felt that they could never get there so just left. On Old Weather we let participants who focused on a particular ship’s log become captain – but it put off as many people as it attracted. And those who became captain had nowhere to go.

This comes back to motivation for taking part. When we ask our volunteers, it frequently comes down to participants wanting to contribute to research. So, for instance, the Andromeda Project involved images that weren’t that exciting… Volunteers were asked to circle star clusters in the galaxy. The task is simple, they feel they are really contributing… They finished the task in a week. This time, when we had finished, we put up a message thanking participants for their contribution, saying that we had enough for the paper, but they were welcome to carry on… And the data shows a rapid fall to zero participation – they were only interested while the task at hand was useful. And that pattern reminds us not to mess with our community: they use precious spare time and they want to be doing something useful and meaningful.

Planet Hunters is a project we used to detect planets based on data. People don’t take part to discover planets, it is because they really are interested in the science. Some of our really active participants choose to download the data, write their own code, do work at PhD level as a volunteer, and send data back… The planets discovered in that project are rare and weird – things we didn’t spot with algorithms – the first one found had 4 suns. And recently we found a seven planet solar system, the largest other than our own.

Volunteers are keen to go further, so we have a discussion area – labelled Talk – for all of our projects. That means you can comment, Twitter style, or you can use old style discussion boards for long form discussions. Those areas are also used by the scientists, the researchers, the technical teams and developers, and the community can interact with them there – the most productive findings often come from that interaction between volunteers and scientists. The talk areas of our community are really important. In fact we have a network diagram for our community where we can see some of our most active participants – one huge green blob on this diagram is a wonderful woman called Elizabeth who posts and comments, moderates, and helps fellow volunteers come along. And we are looking at those networks, at who those lynchpins are, etc.

I said that people write their own code, do their own analysis… So can we get that on the site? We have been playing with the tools area, which we’ve tried for Galaxy Zoo and for Snapshot Serengeti. We’ve been funded to build a broader set of tools, to map data, etc. from the website itself.

One of the other big things we are trying to do is to translate the site. For instance here is Galaxy Zoo in traditional character Mandarin. And we are doing this through crowdsourcing. You pick your site, and you show words or sections for users to translate.

I talked about understanding the community and their interest and motivation. You also need to understand how we allocate images etc. We have done it based on seen/not seen but have been toying with the idea of shaping what images you see based on what you have seen, or particularly like, or are good at identifying. We tried that on Snapshot Serengeti, shaping images to suit interested folk. It wasn’t that successful, and we realised we hadn’t been showing them blank images… So we looked at usage data to see to what extent seeing blank images impacts classifying images. It seems that the more blank images a user sees, the more they classify. When they classify a few/lots in one go they leave the site sooner. But psychologically we aren’t sure why this is yet – to classify a blank image it’s one click, that’s quick… But also what is the reward there for that image – is it just as rewarding to classify a blank image? There seems to be a sweet spot here… The same team trying to automatically spot a zebra has also been looking at identifying whether there is anything in the image at all… But filtering out the blanks may mean volunteers leave the site sooner, so we could be shooting ourselves in the foot…

So, we’ve been thinking: who should see what? And as part of that we have been trying, with some of the space image projects, putting some simulated images into the mix to rank/detect expert level – and looking at that in comparison to volunteers’ experience/expert level within the system. We want to see if there is a smarter way to do a Zooniverse project.
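As an illustration of the simulated-image idea, here is a minimal sketch of scoring each volunteer by how often they get the simulated images right. It is not Zooniverse's actual method: the function name, the `(volunteer, image_id, label)` record layout and the plain accuracy score are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_on_simulations(classifications, known_answers):
    """Estimate each volunteer's skill from simulated images only.

    classifications: iterable of (volunteer, image_id, label) tuples.
    known_answers:   dict mapping simulated image_id -> true label.
    Returns a dict of volunteer -> fraction of simulations labelled correctly.
    Volunteers who saw no simulated images are left out of the result.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for volunteer, image_id, label in classifications:
        if image_id not in known_answers:
            continue  # a real image: no known truth to compare against
        seen[volunteer] += 1
        if label == known_answers[image_id]:
            correct[volunteer] += 1
    return {v: correct[v] / seen[v] for v in seen}


if __name__ == "__main__":
    truth = {"sim1": "lens", "sim2": "no lens"}
    records = [
        ("ann", "sim1", "lens"),
        ("ann", "sim2", "no lens"),
        ("bob", "sim1", "no lens"),
        ("bob", "sim2", "no lens"),
        ("bob", "real7", "lens"),  # ignored: not a simulated image
    ]
    print(accuracy_on_simulations(records, truth))
```

A score like this could then be compared with how long a volunteer has been classifying, which is the comparison described above.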

The other thing that can happen is fear, a sort of classification anxiety. For instance, for cancer images people can be quite scared to click the button and contribute to the research. So we are toying with showing volunteers how the consensus clustering works, so we can show people that their marking counts but that they are backed up by the wisdom of the crowds. We think that may help them trust themselves. At the moment we just blog about this stuff, but how can we show this on the site?
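The "backed up by the crowd" idea can be sketched with a simple majority vote over the labels that several volunteers give to one image. This is a deliberately simpler stand-in for the consensus clustering mentioned in the talk, whose details were not described; the function name and the agreement score are assumptions for illustration.

```python
from collections import Counter


def consensus(labels):
    """Return (winning_label, agreement) for one image.

    labels: list of labels given by different volunteers to the same image.
    agreement is the fraction of volunteers who chose the winning label.
    Ties are broken in favour of the label that was seen first.
    """
    if not labels:
        raise ValueError("need at least one classification")
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


if __name__ == "__main__":
    votes = ["spiral", "spiral", "elliptical", "spiral"]
    print(consensus(votes))
```

One wrong click among several volunteers changes the agreement score but not the outcome, which is the reassurance an anxious volunteer might need.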

Panoptes is our new infrastructure platform, which we’ve been building for the last year with 2 million dollars of funding from Google. And the first project using this appeared on Stargazing Live this year, looking for supernovae. We discovered five supernovae during the week long run of that programme.

Mark Hartswood (Oxford University & CSCS Data and Evidence network founder): ‘Intervening in Citizen Science: From incentives to value co-creation’

About Mark and his talk:

‘This talk reflects upon a collaboration between SmartSociety, an EU project exploring how to architect effective collectives of people and machines, and the Zooniverse, a leading on-line citizen science platform.

Our collaboration tackled the question of how to increase engagement of Zooniverse volunteers. In the talk I will chart how our thinking has progressed from framing volunteering in terms of motivation and incentives, and how it moved towards a much richer conceptualisation of multiple participating groups engaging in complicated relationships of value co-creation.’

Mark Hartswood is a Social Informatician whose main employer is Oxford University and who is currently working in the area of Responsible Research and Innovation.

EDINA Blogs

A Blogs.edina.ac.uk weblog

CSCS Network – Seminar 1 Science and the citizen worker: the Zooniverse – LiveBlog