Data is a key part of information science. It comes to us in many different forms and from many sources. Over the past decade, data has become increasingly valuable to businesses and researchers, and that value is increasingly reported on in the media and elsewhere. Media reports claim that data scientist will be the sexiest job of the 21st century. “Big data” has become a byword for the future.
But there are potential drawbacks to a world awash in data. It’s not always clear that more data helps us to understand what is happening in the world. Scientists often say that “correlation is not causation”, a phrase that tries to capture the complex cognition and construction that has to take place in order to move from noticing a pattern in the world to understanding what causes that pattern. Machine learning and other advanced data techniques are incredibly good at noticing patterns, and sometimes at predicting future patterns, but explanations for why those patterns occur are sometimes missing.
Big data is also a challenge to our notions of privacy. Large corporations, such as Google and Facebook, collect massive amounts of data about our everyday activity. Governments collect similar amounts of information. A year rarely passes without a major scandal about privacy and surveillance, like the Snowden surveillance disclosures in 2013 or the Facebook emotional contagion study in 2014.
In the more specific context of the university, research libraries are struggling with their role in the process of research data management. At the University of Alberta I’ve been working with a variety of people on raising the awareness of researchers about better ways to manage their research data, how to share their data with others, and how to develop data management plans to satisfy the growing demands of funders. During my PhD program I was part of the sociocultural working group for the DataONE project. We studied the different ways social groups, norms, and behaviors affect the attitudes of scientists towards storing and sharing research data.
My current focus is on the following questions:
- How is research data management being adapted in different disciplines?
- Can the lessons learned in the physical sciences be applied to the social sciences and humanities?
- What are the ethical requirements for working with data from the web?
- How do we make judgments about the quality of big data?