Steve Kelling on citizen science - #asist2011

Steve Kelling from the Cornell Lab of Ornithology and the eBird project presented the second ASIST 2011 keynote on citizen science. Kelling began by describing the background of the ornithology lab and its long-term commitment to involving the public in science. The lab was founded in 1915, so its centennial is rapidly approaching. It has engaged over 200,000 citizen scientists in various projects, many of them in eBird, for which Kelling serves as information science director.

There are approximately 70 million people in America who observe birds. The vision of the eBird project is to use these human sensors to identify birds, capture the observations, and then use the data to model and analyze the migration patterns of birds across the world. eBird captures only a fraction of the data that birders create, raising all sorts of interesting information science issues about identification of data, standardization, and incentives for contribution. Those who do choose to contribute data to eBird are on track to enter 7 million checklists during 2011, which works out to 5000 checklists per day, 24 million total observations, and 3 million volunteer hours.

Kelling described some of the design iterations that have occurred over the past decade to improve the eBird experience for contributors. One of the biggest lessons has been the value of feedback and response. When users are able to see their own observations as points on a map, or explore various visualizations of the data, they are more likely to return to the site and continue to contribute. Top 100 lists of contributors have built up a dynamic community that thrives on friendly competition. Kelling described the audience for eBird as a combination of nerdy bird-watchers and younger internet-engaged people.

Citizen science projects fit into a broader context of crowdsourcing and human computation. Other projects such as Foldit and Galaxy Zoo have shown the value of human-based data analysis. People can be better than machines at detecting patterns in data. They can also observe phenomena over a wide area and for lengthy timeframes. For birds, human observation is crucial because machine algorithms are too clumsy to make the nuanced judgments that go into identifying a bird.

Kelling gave an example of how the data collected by eBird is verified by a network of other birders who are automatically notified when unusual sightings are made in a geographic area. In July 2011 a rare seagull was reported on Coney Island, NY. The ensuing publicity included a report in the New York Times about birders traveling great distances to see this once-in-a-lifetime event.

eBird uses a regional network of full-time employees to vet the data entered by volunteers. As more data comes in, the algorithms can be tuned to notice anomalies. Extreme outliers are flagged for further review by experts, who may contact the observer to gather more information.
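
To make the flagging step concrete, here is a minimal sketch in Python of how an outlier filter of this kind might work. It is not eBird's actual implementation, which Kelling did not describe in detail. The `Observation` record, the `EXPECTED_MAX` table, the region code, and all the numbers are hypothetical and invented for illustration. The assumption is that each region and month has an expected maximum count per species, and that anything above it, or any species not listed at all, goes to a human reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One line of a checklist (hypothetical structure, not eBird's schema)."""
    species: str
    region: str
    month: int
    count: int


# Hypothetical filter table: the largest count a reviewer would accept
# without question for a (region, month, species) combination.
# The values are made up for this example.
EXPECTED_MAX = {
    ("US-NY", 7, "Herring Gull"): 500,
    ("US-NY", 7, "Laughing Gull"): 300,
}


def needs_review(obs, expected_max=EXPECTED_MAX):
    """Return True if the observation should go to an expert reviewer.

    A species missing from the table is treated as unexpected in that
    region and month, so its limit is 0 and any sighting is flagged.
    """
    limit = expected_max.get((obs.region, obs.month, obs.species), 0)
    return obs.count > limit


def flag_for_review(observations, expected_max=EXPECTED_MAX):
    """Keep only the observations that exceed their expected maximum."""
    return [obs for obs in observations if needs_review(obs, expected_max)]


checklist = [
    Observation("Herring Gull", "US-NY", 7, 40),    # ordinary count
    Observation("Herring Gull", "US-NY", 7, 900),   # unusually high count
    Observation("Rare Gull", "US-NY", 7, 1),        # placeholder for an unlisted species
]

for obs in flag_for_review(checklist):
    print(f"Review: {obs.count} x {obs.species} in {obs.region}, month {obs.month}")
```

In a sketch like this, "tuning" as more data arrives would amount to updating the limits in the table, so that the filter flags fewer ordinary sightings while still catching the genuinely unusual ones.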

One of the interesting questions from the audience highlighted the importance of structured versus unstructured data. There are many hundreds of email lists and bulletin boards where birders discuss their sightings and seek advice. eBird complements these sites but doesn’t try to mine them for data. The structured observation information collected on the eBird website is crucial for answering the research questions proposed by the scientists who manage the project. Any project of this scope and scale must balance convenience and precision: a more precise observation checklist may deter the amateur or novice birder. But the Cornell team has iterated its way past most of these problems and created an enviable citizen science project.