Research Summary

I made this brief video about my research in January 2018.

Studying Web Archiving

My doctoral dissertation studies web archiving, focusing on the practices of collecting and creating a record of materials found on the World Wide Web to preserve digital cultural heritage. Ongoing access to content on the web is at risk as networked resources are changed or removed unpredictably over time. In order to combat loss of essential information, institutions like the Internet Archive and other founding members of the International Internet Preservation Consortium (IIPC) have been collecting web materials for over a decade. 

Recently the number of web archiving initiatives in libraries, archives, and research institutions has increased, and archived web materials are becoming the focus of historical research in the Internet era [1]. A key challenge is understanding the selective nature of these collections, and how scholars studying archived web material may evaluate and account for what is or is not included in the data. Previous work has tested the coverage of existing web archives collections against materials from the live web, but these studies focus on quantitative approaches that fail to address why, when, or how archival absences occur, or the implications for interpreting these materials and using them as evidence. As new methods for working with these collections emerge, a key barrier for research with web archives is understanding and interpreting the various social and technical factors that influence what is captured and preserved for the future [2]. Documentation for these different aspects of the web archiving process is also often opaque or inconsistent.

My doctoral research asks: how are web archives made and used? Specifically, I study the combination of people and technology involved in the process, including: the curatorial choices made in creating web archives collections; how these choices are shaped by the design of sociotechnical systems and infrastructures; and how this influences subsequent use and interpretation. In studying these processes, I hope to understand how these curatorial choices can be identified and described in order to make web archives more transparent to research users and enable diverse forms of digital scholarship. I also explore how web archiving practice can draw from the concepts, theory, and ongoing debates about provenance in archival theory in order to better understand and document these choices [3].

I use a multiple case study of select exemplary, innovative, and emerging practices to explore the different choices and contexts of web archiving. The first case is centered on the NetLab research group at Aarhus University and their use of the Royal Danish Library’s Netarkivet collection. The second focuses on the Archives Unleashed project, studying how the team of researchers from the University of Waterloo and York University are developing infrastructure and a ‘toolkit’ to support the use of web archives collections from research libraries across Canada.


[1] The increase is reflected in recent surveys on web archiving by the National Digital Stewardship Alliance. Examples of historical research using web archives as sources can be found in Niels Brügger and Ralph Schroeder, eds., The Web as History: Using Web Archives to Understand the Past and the Present (London: UCL Press, 2017).

[2] Recent work addressing these different social and technical aspects of web archiving includes: Jessica Ogden, Susan Halford, and Leslie Carr, “Observing Web Archives: The Case for an Ethnographic Study of Web Archiving,” in Proceedings of the 2017 ACM on Web Science Conference (WebSci’17, Troy, New York, USA: ACM Press, 2017), 299–308; Ed Summers and Ricardo Punzalan, “Bots, Seeds and People: Web Archives as Infrastructure,” in Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW’17, ACM Press, 2017), 821–34; Peter Webster, “Users, Technologies, Organisations: Towards a Cultural History of World Web Archiving,” in Web 25, ed. Niels Brügger, Digital Formations, vol. 112 (New York: Peter Lang, 2017), 175–90; Anat Ben-David and Adam Amram, “The Internet Archive and the Socio-Technical Construction of Historical Facts,” Internet Histories 2, no. 1–2 (2018): 179–201. Recent work by Lozana Rossenova also highlights approaches to surface the constructed nature of the archive through access interfaces – see for example her discussion on access interfaces as Windows and Mirrors during the “Curation and Power” panel at the National Forum for Ethics and Archiving the Web, 2018 (starting ~6:00 of the livestream).

[3] Douglas provides a clear and succinct overview of how concepts of provenance have developed in archival theory over the past century in the chapter: Jennifer Douglas, “Origins and Beyond: The Ongoing Evolution of Archival Ideas about Provenance,” in Currents of Archival Thinking, ed. Heather MacNeil and Terry Eastwood, Second edition (Santa Barbara, California; Denver, Colorado: Libraries Unlimited, an imprint of ABC-CLIO, LLC, 2017).