On Sourcery, or the enclosure(?) of remote access
Archivists and archival users alike have been thinking a lot about remote access to archival collections, and I was lucky to engage with colleagues on this at the Lighting the Way: A National Forum on Archival Discovery and Delivery that we convened in February 2020. A lot of this touches on past work by colleagues at UC Irvine on the notion of a “virtual reading room”, which is a controlled online space to provide access to digital materials with copyright or donor restrictions. This phrase and concept resonated further with Forum presenters like Heather Smedberg from UC San Diego and Greg Cram from NYPL, and these folks are refining these concepts further in a Lighting the Way Working Meeting group dedicated to the concept.
Of course the possibility of remote access raises real concerns about sustainability, ethics, and a whole host of other issues, and it’s become a lot more important as we’re still stuck in an unprecedented global health crisis, which has had its own material consequences for people’s lives and livelihoods. This has affected the lives of both archivists and academic researchers, with the latter lamenting how it impedes their research, while archivists tend to speak of frustration about demands for access. Restricted access can have a significant impact on a doctoral student’s ability to advance, or on an early career or adjunct faculty member’s ability to work towards securing tenure.
It is between these two contexts that I find myself thinking about Sourcery, in development at Greenhouse Studios at the University of Connecticut, and a project of the Corporation for Digital Scholarship, which also develops and manages Zotero and Tropy. Sourcery endeavors to make document requests “easy” by “[giving] archival staff and patrons one easy platform for requesting and receiving scans”. It aims to reduce costs for researchers by cutting travel expenses and, hopefully, the environmental impact of travel. As described by Tom Scheinfeldt in a panel at the Spring 2021 CNI Membership Meeting:
Sourcery is a sort of very specific solution to a very specific problem that scholars face more often than archivists may realize, which is access to known archival sources right a [particular] source that you know, is in the archive. … The scholar here in question needs access to a source that he already knows sits at the National Archives in London. And the question is, how does he get it – he’s not in London, not sure where he is, but he’s not in London – how does he get this source that he knows is there? (08:15-09:09)
Scheinfeldt acknowledges a few questions in this panel that are underscored more implicitly by the Sourcery website: 1) how does this intersect with existing rights and policies; 2) what are the integration points with systems and metadata managed by archives and libraries; 3) how does this impact the visibility of archival labor; and 4) what does entering the gig economy mean for librarians and archivists? Questions 1 and 2 are of great interest to me given Lighting the Way and the work on virtual reading rooms, but that’s perhaps a separate post and line of discussion. What I want to do in the rest of this post is to focus on 3 and 4, given my recent keynote at the SCA AGM.
The visibility of archival labor has much to do with the perception of archival labor. As I shared in the keynote, and as acknowledged by Scheinfeldt and his copanelists Dan Cohen, Barbara Rockenbach, and Greg Colati, this also needs to be understood in the context of resource and staffing availability within archives and libraries, which is further complicated by Cohen’s description (and to be fair, problematization) of a “gold standard” for archival access that’s had to shift because of the pandemic (41:22-43:13). What I think Scheinfeldt, Cohen, Rockenbach, Colati, and the moderator and CNI executive director Cliff Lynch ignore is the context of the “multi-decade cycle of poverty” as described by Eira Tansey in her article “Archives Without Archivists”. Tansey contextualizes this by looking at a number of reports from the 1980s onwards, notably Levy and Robles’ The Image of Archivists: Resource Allocators’ Perceptions. Levy and Robles describe stereotypes of archives and archival work in a way that made me go white like a ghost:
A stereotype of archivists and their work clearly emerges in the study of resource allocators. Archivists are viewed as quiet professionals, carrying out a practically frivolous activity. … Archival work is plagued by stereotypes, too. … Although resource allocators think they know what archives are, they are wrong. (“Archivists’ Resource Allocators: The Next Step”)
Tansey also provides tangible examples of underfunded archives programs: the Target corporate archives, the State Archives of Georgia, and Lincoln University, a historically Black university that closed its special collections and archives in 2010. It is impossible for me to engage with the Sourcery and remote access panel without considering the positionality of its speakers. Cohen and Rockenbach are university librarians/library deans at Northeastern University and Yale University respectively; Scheinfeldt and Colati run programs at the University of Connecticut; Lynch, as moderator, is an executive director of an organization that holds membership meetings with a limited number of slots where lots of deans, associate deans, and CIOs are in attendance. Their audience is, largely, academic library and IT administrators, although sometimes they let some of us folks in the “unwashed masses” (26:44) attend if we’re presenting. The reality in particular is that Cohen and Rockenbach are resource allocators, and Lynch is himself in a position of considerable influence over resource allocators. It’s easy to say that Sourcery obfuscates archival labor further, but it also obfuscates how we resolve issues of how archival work is perceived and resourced in times of increasing austerity. Colati argues in the panel (55:52) that our perceived notions of quality should go down, such as digitization at lower resolution, but I wonder if that just speaks to further cycles of poverty and introduces a form of technical debt to operations that still misunderstands the total cost of stewardship as defined by Chela Scott Weber, Martha O’Hara Conway, Nicholas Martin, Gioia Stevens, and Brigette Kamsler.
As for the gig economy, I recognize that Scheinfeldt and the Greenhouse Studios team are thinking about this carefully, as he’s generously engaged with me on Twitter about this. While he acknowledges that Sourcery is or will be open source, so far only the frontend web app and its associated components are; that’s less substantial to the rest of my concerns and critique. At its worst, and as I described in the SCA AGM keynote, my fear is that Sourcery potentially introduces a “quadruple threat of enclosure”:
- In an ungenerous view, linked with Colati’s suggestion above, it introduces the notion that archival digitization can be viewed as piecework to serve researchers with specific demands. This has two possible impacts. First, it incentivizes avoiding looking at archives holistically and systematically. While I recognize that many archives need to continue patron-driven digitization, that work is still understood and contextualized amongst other selection criteria that many institutions have overhauled to address concerns of diversity, equity and inclusion. Secondly, it suggests that digitization and the “last mile problem” of archival delivery can be compensated as piecework, which is among my biggest fears.
- Sourcery uses an “algorithm tuned to local conditions” to determine how much a researcher pays for a request. While Scheinfeldt indicates that Sourcery is open source, I haven’t been able to find that algorithm in the code yet (there is likely a distinct backend codebase not yet open), but it is described as using “the scope of the request, the number of local Sourcerers [i.e., Sourcery workers] available at the time, the urgency, and other variables.” Locality suggests that the application needs some type of position information from Sourcerers. While Sourcerers are arguably opting in to work and participate, they are also opting in to a form of surveillance that tracks their location. Whether this happens in real time, or is disclosed to Sourcery in other ways, is as yet unclear.
- Sourcery, in Lynch’s words, allows “credentialed people who essentially can monetize their credentials” (25:09-26:00) by participating in the platform. While Scheinfeldt, Rockenbach, and Cohen engage with this on questions of openness and have lofty goals of democratizing access, this does play to equity concerns faced by the archival profession about access itself, such as the problematic access policies of the Huntington Library juxtaposed with the fact that they hold the papers of Octavia Butler. I referenced Cecilia Caballero’s “Mothering While Brown in White Spaces” in my talk, albeit just through a footnote; Lindy Smith engages with this piece further in her blog post about access and inclusion in the reading room from 2018.
- What becomes of the access copies? As Colati suggests, perhaps archives are well served by relying more on lower resolution imaging to fulfill these needs. As the Folger Shakespeare Library has demonstrated, reference images have enough value to warrant retention and reuse, and the Folger is allowing such reference images taken with smartphones to accrete value through making them available. Colati and Cohen suggest in the panel that “you could run [reference images] through some machine learning or computer vision techniques” to add metadata. However, once that metadata is added, through any process, there’s the possibility that the images will be leveraged to create ML training data. While this could improve metadata creation and the like, the training data sets, or algorithms refined thereupon, could themselves be subject to enclosure, as I suggested in the talk, and as expanded on further by Sam Popowich.
None of these remarks is meant to say simply that Sourcery or any of these folks are “bad”; I know and respect them, and some are former coworkers. But nonetheless, I ask them, and us, to remember that what they say can, and does, influence their peers, and those of us charged with putting work into place. Sourcery, as envisioned, could be viewed as akin to a supply-chain management system’s interface as described by Miriam Posner, and as such, it could have significant effects and affects on Sourcerers as “mouseworkers.” I hope for further engagement with the Sourcery team, as well as the panelists, and I look forward to continuing the conversation.