This article examines key ethical issues that continue to emerge from archiving data scraped from online sources such as social media sites, blogs, and forums, particularly data pertaining to online harassment and hostile groups. Given the proliferation of digital social data, an understanding of ethics and data stewardship that evolves alongside the shifting landscape of digital societies is essential.
Our study involves a primary research archive consisting of data scraped for our project on Gamergate, a case study that involved numerous instances of hate speech in various online communities. This type of qualitative research offers advantages for the humanities and social sciences because it makes it possible to generate large, rich corpora about subjects of human interest. However, such data scraping also raises ethical issues around treating social media authors as research subjects and, moreover, as subjects who have provided informed consent. Once researchers consider content creators on these sites to be human research subjects, what would best efforts to adhere to the directive to ‘do no harm’ look like?
While we recognize that definitive rules cannot exist, we consider how one can best care for stakeholders given the challenges of their particular contexts. In this case, the stakeholders included Twitter authors, targets of online harassment, researchers, students, archivists, and the larger academic community. We also consider how the Ethics of Care may be extended to the research community, especially to student researchers exposed to toxic material.