A Bird's Eye View of Archival Collections
Mitchell Whitelaw is a Senior Lecturer in the Faculty of Design and Creative Practice at the University of Canberra and the 2008 winner of the National Archives of Australia's Ian Maclean Award. According to the NAA's site, the Ian Maclean Award commemorates archivist Ian Maclean, and is awarded
to individuals interested in conducting research that will benefit the archival and historical profession in Australia and promote the important contribution that archives make to society.
Dr. Whitelaw has been keeping the world up to date on his work using his blog, The Visible Archive. His work fits well with my colleague Jeanne Kramer-Smyth's archival data visualization project, ArchivesZ, as well as the multidimensional visualization projects underway at the Humanities Advanced Technology & Information Institute at the University of Glasgow. However, his project fascinates me for a few specific reasons.
First of all, the scale of the datasets he's working with is far larger than anything other archival visualization projects have tried to tackle so far. His visualizations include analysis at the Commonwealth Record Series level, where a single series can run to about 10,000 linear meters of material. In addition, he'll be working directly with Series A1, which holds 20,000 items in approximately 450 linear meters.
Secondly, he's been using Processing, an open source programming language for visualization and interaction design, to do a lot of the heavy lifting in creating interactive visualizations like the one depicted in the screenshot below. Processing works well for this because of its extensive third-party libraries, such as proXML, which allowed Dr. Whitelaw to parse the descriptive data he received from the NAA (note: it's not EAD, thank heavens).
A visualisation of 57000 series in the collection of the National Archives of Australia. The area of each square is proportional to the number of shelf metres that series occupies, while the size of the grey void in each square is related to the number of described items in the series. So a square with a large void (thin "walls") has relatively fewer items than one with a small void (thick "walls") - or no void at all. There's a minimum wall thickness of one unit, which is why the smallest squares have no voids.
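To make the geometry in that caption a little more concrete, here is a rough Python sketch of how one square and its void could be sized. This is not Dr. Whitelaw's Processing code: the caption only says the void is "related to" the number of described items, so the fill mapping, the function name `square_geometry`, and the `full_items_per_metre` parameter are all my own assumptions. Only the area-proportional square and the one-unit minimum wall come from the caption.

```python
import math


def square_geometry(shelf_metres, described_items,
                    full_items_per_metre=50.0, min_wall=1.0):
    """Return (side, void_side) for one series square.

    Assumptions (not from the original project): one square unit of area
    per shelf metre, and a series counts as "full" (no void) once it has
    full_items_per_metre described items per shelf metre.
    """
    if shelf_metres <= 0:
        return 0.0, 0.0

    # Area is proportional to shelf metres, so the side is the square root.
    side = math.sqrt(shelf_metres)

    # Hypothetical mapping: fraction of the square "filled" by described items.
    fill = min(1.0, described_items / (shelf_metres * full_items_per_metre))

    # The void takes up the unfilled share of the area.
    void_side = side * math.sqrt(1.0 - fill)

    # Enforce the minimum wall thickness of one unit mentioned in the caption.
    wall = max(min_wall, (side - void_side) / 2.0)
    void_side = max(0.0, side - 2.0 * wall)
    return side, void_side


if __name__ == "__main__":
    # (shelf metres, described items) for three made-up series.
    for metres, items in [(100, 0), (100, 2500), (1, 0)]:
        side, void_side = square_geometry(metres, items)
        print(f"{metres} m, {items} items: side={side:.2f}, void={void_side:.2f}")
```

Under these assumptions, a 100-metre series with no described items gets thin walls and a large void, the same series with more items gets thicker walls, and a one-metre series has no void at all because the minimum wall already fills the square.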
Finally, I have to give Dr. Whitelaw a lot of credit for sharing the source code; this will really jump-start my efforts to play around with Processing!
Hey Mark, thanks for the writeup; and I appreciate the links to those related projects. I should apologise in advance for my source code though: I'm a hacker more than a coder, so it's not pretty. On the other hand, as a hacker I can recommend Processing and its community without reservation! cheers - Mitchell