Citizen Science and Open Data

I spoke on a panel last week about citizen science and open data. The panel was one of the events put on by the University of Alberta library for Open Access Week.

I went first, so my presentation didn’t reflect directly on the work of the others, although there was much to think about. I started by describing the key dilemma faced by many scientists who have turned to citizen science methods: how to deal with the huge amounts of data that science today needs and uses? I used GalaxyZoo as my example because I think it is one of the first projects to become mainstream by using citizen science. But I’m not trying to suggest that GZ was the first citizen science project. That is definitely not the case.

I’m just saying that GalaxyZoo came about at a propitious moment in the development of the internet and in our conceptions of what crowds could do. GalaxyZoo started in 2007, just a year after Jeff Howe coined the term crowdsourcing and a few years after James Surowiecki published the book The Wisdom of Crowds. Crowds were in the air in the mid-2000s. Web 2.0 was becoming a buzzword; open-source software had won legitimacy in the enterprise. User-generated content was becoming a powerful force in Wikipedia and other media sites. Facebook had only recently opened up beyond the ivy walls of universities, and Twitter was only a year or so past its initial release. Social tagging and folksonomies were all the rage in information science schools and journals. The crowd was at the top of the Gartner hype cycle.

There was plenty of precedent for what GalaxyZoo did. The National Phenology Network has been collecting volunteer observations since the 1950s. The Cornell Lab of Ornithology, creator of eBird, the largest citizen science project currently online, has been in business with volunteers since the 1980s. The Audubon Society goes back even further, to the early days of the twentieth century. The Community Collaborative Rain, Hail and Snow Network started in Colorado in the 1990s and is now a large collector of meteorological data from volunteers across the United States and Canada. All of these projects have benefited from the work of hundreds of thousands of dedicated volunteers.

After describing the background of these different projects to give the audience a sense of what citizen science is and how it works, I moved on to a discussion of open data. I used the Denton Declaration as a starting point for comparing the principles of citizen science and open data. Citizen science shares the conviction that access to open data is important. So is the idea of accretive value: GalaxyZoo could not exist without the previous work of the Sloan Digital Sky Survey. Many citizen science researchers support transparency in scientific research for multiple reasons, including building public trust in science. So too with the idea that research data needs to be stewarded by a broad coalition of stakeholders. There are parallels between all of the principles of open data and citizen science. But there are also some subtle differences.

The Denton Declaration, like a lot of the discussion I’ve read about open data, takes the perspective that it is a good thing for the public to have access to data as a consumer, but ends up saying very little about the public as a producer of data. Denton talks about validating with the peer community of other scientists and building trust among the public. There is an implicit us and them in such statements. I don’t think there was any malice on the part of the Denton writers or even an awareness that such a separation might be suggested. The people who work on open data are usually from institutions, like the government or the academy, where such thinking is pervasive. Open data becomes another milestone on the road to outreach or, in the parlance of the National Science Foundation, broader impacts.

So if there is a lesson for open data to learn from citizen science, it is to acknowledge the two-way street between consumers and producers of data, between the public and the researchers. If we accept this, then open data and citizen science can have a profoundly better effect on the world.

I was pleasantly surprised by the rest of the presentations, which covered mapping and open data, open data in academia, and open data in the government of Alberta. All of the presentations discussed the multiple stakeholders involved and talked about how open data is about more than just giving data away.