Archive for 2009

  • Onward And Upward...

    It's fitting that this is the hundredth (gosh, only the hundredth?) post, because I have rather important news. First, my fellow developers/producers/UX designers at The New York Public Library and I have been dealing with every minute detail on the upcoming, Drupal-based replacement to the NYPL website. A live preview is available. I can proudly say that this project has helped both me personally and NYPL overall play nice in the open source world - we've been actively contributing code, reporting bugs, and sending patches to the Drupal project. Also, our site search is based on Solr, which always bears mention. In addition, after working tirelessly as a developer at NYPL for the last year and a half, I have decided to move onward and upward. I am leaving the cozy environs of the still-recently renovated office space I share with my spectacular coworkers. It was not an easy decision by far, but it feels like the best one overall.
  • Clifford Lynch Clarifies Position on Open Source ILSes

    Clifford Lynch, Executive Director of the Coalition for Networked Information, has responded to the leaked SirsiDynix report that spreads horrific untruths about open source. Marshall Breeding posted Lynch's response on GuidePosts. In particular, Lynch notes the following: I don't think that I ever wrote those words down in an article; I suppose I may have said something to that effect in an interview or q&a in some conference program like ALA Top Tech, though perhaps no quite as strongly as it's expressed here. I have without question spoken out about my concerns regarding investment in open source ILS development in the last few years. IF I did say this, it feels like it's used a little out of context -- or maybe the better characterization is over-simplistically -- in the report. ... I think there are still major problems -- many of which we really don't know how to solve effectively, and which call for sustained and extensive research and development -- in various areas where ILS get involved in information discovery and the support of research and teaching.
  • SirsiDynix Report Leaked, Spreading Fear, Uncertainty and Doubt about Open Source

    Thanks to Twitter, I discovered that Wikileaks has posted a report written by SirsiDynix Vice President for Innovation Stephen Abram which spreads a fantastic amount of fear, uncertainty and doubt about both open source software in general and, more specifically, the suitability of open source integrated library systems. As the summary provided by Wikileaks states, This document was released only to a select number of existing customers of the company SirsiDynix, a proprietary library automation software vendor. It has not been released more broadly specifically because of the misinformation about open source software and possible libel per se against certain competitors contained therein ... The source states that the document should be leaked so that everyone can see to what extent SirsiDynix will attempt to spread falsehoods and smear open source and the proponents of open source. In addition, as you may have heard, the Queens Library is suing SirsiDynix for breach of contract; for what it's worth, the initial conference is scheduled for next Monday, November 2, 2009.
  • pybhl: Accessing the Biodiversity Heritage Library's Data Using OpenURL and Python

    Via Twitter, I heard about the Biodiversity Heritage Library's relatively new OpenURL Resolver, announced in their blog about a month ago. More specifically, I heard about Matt Yoder's new Ruby library, rubyBHL, which exploits the BHL OpenURL Resolver to provide metadata about items in their holdings and does some additional screenscraping to return things like links to the OCRed version of the text. In typical fashion, I've ported Matt's library to Python, and have released my code. pybhl is available from my site, PyPI, and Github. Use should be fairly straightforward, as seen below:

        >>> import pybhl
        >>> import pprint
        >>> b = pybhl.BHLOpenURLRequest(genre='book', aulast='smith', aufirst='john', date='1900', spage='5', volume='4')
        >>> r = b.get_response()
        >>> len(['citations'])
        3
        >>> pprint.pprint(['citations'][1])
        {u'ATitle': u'',
         u'Authors': [u'Smith, John Donnell,'],
         u'Date': u'1895',
         u'EPage': u'',
         u'Edition': u'',
         u'Genre': u'Journal',
         u'Isbn': u'',
         u'Issn': u'',
         u'ItemUrl': u'',
         u'Language': u'Latin',
         u'Lccn': u'',
         u'Oclc': u'10330096',
         u'Pages': u'',
         u'PublicationFrequency': u'',
         u'PublisherName': u'H.N. Patterson,',
         u'PublisherPlace': u'Oquawkae [Ill.] :',
         u'SPage': u'Page 5',
         u'STitle': u'',
         u'Subjects': [u'Central America', u'Guatemala', u'Plants', u''],
         u'Title': u'Enumeratio plantarum Guatemalensium imprimis a H.
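    For anyone curious what a wrapper like this does under the hood, here is a minimal Python 3 sketch of building the OpenURL query by hand. It is not pybhl's actual implementation: the resolver base URL, the `format` parameter, and the helper name `build_bhl_openurl` are assumptions made for illustration, so check BHL's own documentation before relying on any of them. Only the citation fields (genre, aulast, and so on) come from the session above.

```python
from urllib.parse import urlencode

# Assumed base URL for the BHL OpenURL resolver; verify against BHL's docs.
BHL_OPENURL_BASE = "https://www.biodiversitylibrary.org/openurl"


def build_bhl_openurl(base=BHL_OPENURL_BASE, response_format="json", **citation):
    """Return a resolver URL for the given citation fields (hypothetical helper)."""
    # Drop empty fields so the query string only carries what we actually know.
    params = {key: value for key, value in citation.items() if value}
    # "format" is an assumed parameter name for choosing the response type.
    params["format"] = response_format
    # Sort the parameters so the same citation always yields the same URL.
    return base + "?" + urlencode(sorted(params.items()))


url = build_bhl_openurl(
    genre="book", aulast="smith", aufirst="john",
    date="1900", spage="5", volume="4",
)
print(url)
```

    Fetching and parsing the response is left out on purpose; the point is only that the request is an ordinary query string built from citation fields.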
  • Access and Description Reconsidered

    What exactly is archival access, and how does archival description make it possible? I feel that, in some form or another, I've been struggling with this question throughout my career. Recently, this blog post from The Top Shelf, the blog of the University of Texas at San Antonio Archives and Special Collections Department, came across my radar, wherein they write (emphasis in original): UTSA Archives and Special Collections is among the growing number of archives to create an online presence for every one of its collections. ... We were able to utilize inventories generated by former and current collection assistants to create guides to the collection with folder-level and box-level descriptions. The project resulted in access to more than 130 collections and 2000 linear feet of materials. What defines that accessibility? I certainly don't intend to be a negative Nancy about this - adding finding aids and other descriptive metadata about collections is obviously useful. But how has it necessarily increased access to the materials themselves?
  • Perspectives of Encoded Archival Description at the Institutional, Research, and National Level

  • AIP Receives NHPRC Funding To Digitize Samuel Goudsmit Papers

    I'm happy to pass on the news that my former employer, the Niels Bohr Library & Archives of the American Institute of Physics, has received funding from the National Historical Publications and Records Commission to digitize the entirety of the Samuel Goudsmit papers. From the announcement on the Center for History of Physics/Niels Bohr Library & Archives Facebook page: Goudsmit (1902—1978) was a Dutch-educated physicist who spent his career in the US and was involved at the cutting edge of physics for over 50 years. He was an important player in the development of quantum mechanics in the 1920s and 1930s; he then served as scientific head of the Alsos Mission during World War II, which assessed the progress of the German atomic bomb project. Goudsmit became a senior scientist at Brookhaven National Laboratory and editor-in-chief of the American Physical Society. The papers consist of an estimated 66,000 documents, which include correspondence, research notebooks, lectures, reports, and captured German war documents; the collection is the most used in the library.
  • A Gentle Reminder

    On the eve of teaching the first class of my course (LIS901-08, or, Building Digital Libraries: Infrastructural and Social Aspects) at LIU's Palmer School of Information and Library Science, I'd like to remind you of the following. The syllabus is available online, if you're curious.
  • LIS 901-08: Building Digital Libraries: Infrastructural and Social Aspects (Fall 2009) - Long Island University

    This class aims to prepare students to think proactively, creatively, and critically about planning, implementing, and evaluating digital library projects in a variety of institutions. In addition, this class is designed as a seminar to allow proactive discussion of topics between both the students and the instructor; this format allows you to learn from each other as well as from me.

  • Privacy, Censorship, and Good Records Management: Brooklyn Public Library in the Crosshairs

    Over at, Jessamyn West has a brief write-up about a post on the New York Times' City Room blog about placing access restrictions on offensive material (in this case, one of Hergé's early Tintin books at the Brooklyn Public Library). More interesting, she notes, is that the Times was given access to, and accordingly republished, challenges from BPL patrons and other community members. Quite astutely, Jessamyn recognizes that the patrons' addresses are removed but their names and City/State information are published. If your name is, for example, [name redacted], redacting your address doesn't really protect your anonymity. I'm curious what the balance is between patron privacy and making municipal records available. It's a good question that doesn't have an incredibly straightforward answer. My first concern was about whether BPL had kept the challenge correspondence beyond the mandated dates in the New York State records schedules. After doing some digging on the New York State Archives' website, I came across Schedule MI-1 ("
  • Online Presence and Participation

  • Linked Data and Archival Description: Confluences, Contingencies, and Conflicts

  • Everything is Bigger in Texas, Including My Talks on The Semantic Web

    I'll be at the Society of American Archivists Annual Meeting next week in Austin, Texas. It looks to be a jam-packed week for me, with a full-day Standards Committee/TSDS meeting on Tuesday, followed by THATCamp Austin in the evening, an (expanded version of my) presentation on Linked Data and Archival Description during the EAD Roundtable on Wednesday, and Thursday's session (number 101): "Building, Managing, and Participating in Online Communities: Avoiding Culture Shock Online" (with Jeanne Kramer-Smyth, Deborah Wythe, and Camille Cloutier). And to think I haven't even considered which other sessions I'm going to! Anyhow, I hope to see you there, and please make either or both of my presentations if you can.
  • Must Contextual Description Be Bound To Records Description?

    I've been struggling with the fact that (American) archival practice seems to bind contextual description (i.e., description of records creators) to records description. Many of these thoughts have been stirring in my head as a result of my class at Rare Book School. If we take a relatively hardline approach, e.g. the kind suggested by Chris Hurley ("contextual data should be developed independently of the perceived uses to which it will be put", 1, see also 2), it makes total sense to separate them entirely. In fact, it starts making me mad that the <bioghist> tag exists at all in EAD. Contextual description requires that it be written from a standpoint relative to that of the creator it describes. I guess what I keep getting hung up on is whether there could be a relevant case that really merits this direct intellectual binding. I therefore appeal to you, humble readers, to provide me with your counsel. Do you think there are any such cases, and if so, why?
  • Seeking Nominations for Co-Chair, RLG Programs Roundtable

    Apologies for any duplication - we're just trying to get the word out! As co-chairs of the RLG Programs Roundtable of the Society of American Archivists, we're seeking nominees to co-chair the Roundtable for 2009-2011. If you'd like to nominate yourself or someone else, please email M.A. Matienzo, Co-Chair, at M.A. Please submit all nominations no later than 5 PM Eastern Time on Friday, August 7. Serving in a leadership position for a Section or Roundtable is a great way to learn about SAA and its governance, contribute to new directions for the Society, and work with other archivists on interesting projects. It is also a great way to serve the Society! Your RLG Roundtable Co-Chairs, Thomas G. Knoles Marcus A. McCorison Librarian American Antiquarian Society M.A. Matienzo Applications Developer, Digital Experience Group The New York Public Library
  • The Archival, The Irreconcilable, and The Unwebbable: Three Horsemen and/or Stooges

    This week in Charlottesville has been a whirlwind exploration of standards and implementation strategies thus far during my class, Designing Archival Description Systems, at Rare Book School. My classmates and I have been under the esteemed tutelage of Daniel Pitti, who has served as the technical architect for both EAD and EAC. Interestingly, there's been a whole lot of talk about linking data, linked data, and Linked Data, date normalization, and print versus online presentation, among other things. In addition, a few things have floated past on my radar screen this week that have seemed particularly pertinent to the class. The first of these was a post by Stefano Mazzocchi of Metaweb, "On Data Reconciliation Strategies and Their Impact on the Web of Data". In Stefano's post, he wrote about the problem of a priori data reconciliation vs. a posteriori; in other words, whether you iron out the kinks, apply properties like owl:sameAs, etc., on the way in or on the way out.
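    To make the a priori versus a posteriori distinction concrete, here is a rough Python sketch, not anything taken from Stefano's post. The identifiers, the `ex:role` property, and all function names are invented for illustration; the only idea borrowed is that sameAs-style links can either be collapsed when data comes in or resolved when it is queried.

```python
# Hypothetical sameAs assertions between identifiers from two sources.
SAME_AS = [("a:person/1", "b:agent/42")]

# Hypothetical statements as (subject, property, value) triples.
STATEMENTS = [
    ("a:person/1", "ex:role", "technical architect"),
    ("b:agent/42", "ex:role", "instructor"),
]


def canonical_map(same_as):
    """Map each identifier to the smallest identifier in its sameAs cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk it
            x = parent[x]
        return x

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {x: find(x) for x in list(parent)}


def ingest_a_priori(statements, same_as):
    """Reconcile on the way in: rewrite subjects to one canonical identifier."""
    canon = canonical_map(same_as)
    return [(canon.get(s, s), p, o) for s, p, o in statements]


def query_a_posteriori(statements, same_as, subject):
    """Reconcile on the way out: keep data as-is, follow sameAs at query time."""
    canon = canonical_map(same_as)
    target = canon.get(subject, subject)
    return [t for t in statements if canon.get(t[0], t[0]) == target]


merged = ingest_a_priori(STATEMENTS, SAME_AS)
found = query_a_posteriori(STATEMENTS, SAME_AS, "b:agent/42")
```

    The trade-off shows up in what survives: the a priori version loses the original identifiers once they are rewritten, while the a posteriori version keeps them and pays the reconciliation cost on every query.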
  • "Summer Camp for Archivists" Sounds So Much Better

    Crossposted to NYPL Labs. I'm staying with colleagues and good friends during my week-long stint in Charlottesville, Virginia for Rare Book School. If you're here - particularly if you're in my class (Daniel Pitti's Designing Archival Description Systems) - let me know. I'm looking forward to a heady week dealing with descriptive standards, knowledge representation, and as always, doing my best to sell the archives world on Linked Data. Notes and thoughts will follow, as always, on here.
  • "Using the OCLC WorldCat APIs" now available in Python Magazine

    As of last Thursday, I have been inducted into the pantheon of published Python programmers (aye, abuse of alliteration is always acceptable). My article, "Using the OCLC WorldCat APIs," appears in the latest issue (June 2009) of Python Magazine. I'd like to thank my editor, Brandon Craig Rhodes, for helping me along in the process, not the least of which includes catching bugs that I'd overlooked. The article includes a brief history lesson about OCLC, WorldCat, and the WorldCat Affiliate APIs, a detailed introduction to worldcat, my Python module to interact with OCLC's APIs, and a brief introduction to SIMILE Exhibit, which helps generate the holdings mashup referenced earlier on my blog. Subscribers to Python Magazine have access to a copy of the code containing a functional OCLC Web Services key ("wskey") to explore the application.
  • NYART Presentation: Archives & The Semantic Web

    This last Tuesday, I spoke at the Annual Meeting of the Archivists' Roundtable of Metropolitan New York, where I gave a talk on archives and the Semantic Web. The presentation went over very well, and colleagues from both the archives field and the semantic technology field were in attendance. I did my best to keep the presentation not overly technical and cover just enough to get archivists to think about how things could be in the future. I also have to give a big hat tip to Dan Chudnov, whose recent keynote at the Texas Conference on Digital Libraries helped me organize my thoughts. Enjoy the slides, and as always, I relish any feedback from the rest of you.
  • Archives and The Semantic Web (for Archivists)

  • Drupal For Archivists: Documenting the Asian/Pacific American Community with Drupal

    Over the course of the last academic year, I have been part of a team working on a survey project aimed at identifying and describing archival collections relating to the Asian and Pacific American community in the New York City metropolitan area. The results of the fifty-plus collections we surveyed have been posted on our Drupal-powered website, which has been an excellent fit for the needs of this project and has also enabled us to engage many of the challenges the project has presented. By way of introduction, this survey project seeks to address the underrepresentation of East Coast Asian/Pacific Americans in historical scholarship and archival repositories by working with community-based organizations and individuals to survey their records and raise awareness within the community about the importance of documenting and preserving their histories. Funded by a Documentary Heritage Project grant from METRO: Metropolitan New York Library Council, the project is a collaborative effort between the Asian/Pacific/American Institute and the Tamiment Library/Robert F.
  • Using the OCLC WorldCat APIs

  • worldcat In The Wild at OCLC's WorldCat Mashathon in Amsterdam

    It's good to see other people using your code. Thanks to the OCLC Devnet Blog, I found out that Etienne Posthumus used worldcat for a demo application he built during the WorldCat Mashathon in Amsterdam last week. Even more interesting is that Etienne's application was deployed on Google App Engine. Courtesy of OCLC's Alice Sneary, there is a brief video of Etienne presenting his application to the other Mashathon attendees:
  • Batch Reindexing for Drupal + Solr

    Crossposted to NYPL Labs. Sorry for any duplication! Hey, do you use Drupal on a site with several thousand nodes? Do you also use the Apache Solr Integration module? If you're like me, you've probably needed to reindex your site but couldn't be bothered to wait for those pesky cron runs to finish - in fact, that's what led me to file a feature request on the module to begin with. Well, fret no more, because thanks to me and Greg Kallenberg, my illustrious fellow Applications Developer at NYPL DGTL, you can finally use Drupal's Batch API to reindex your site. The module is available as an attachment from that same issue node. Nota bene: this is a really rough module, with code swiped pretty shamelessly from the Example Use of the Batch API page. It works, though, and it works well enough as we tear stuff down and build it back up over and over again.
  • Archives & The Semantic Web (for Semantic Technologists)

  • DigitalNZ and Brooklyn Museum API Modules for Python

    I've been busy the last few weeks, so I didn't even really announce this to begin with! I've been playing around with some of the cultural heritage APIs that are available, some of which I learned about while I was at Museums and the Web 2009. While I was away I released code for a Python module for interacting with the Brooklyn Museum Collections API. After chatting with Virginia Gow from DigitalNZ, I also got motivated to write a Python module to interact with the DigitalNZ API. The code for both is fairly unpolished, but I'm always ready for feedback! Both modules are available as Mercurial repositories linked from my Bitbucket account. There's also a small cluster of us working on a museum API wiki to begin sorting out some of these issues. Comparatively speaking, the library and archives world has it somewhat easy...
  • The Medium Is Not The Message

    "Electronic records" is a particularly awful phrase and does not even actually capture anything about the underlying records at all. As far as the term goes, it's not too far off from "machine readable records." As a profession, can we start actually thinking critically about the underlying technical issues and push for using terms that more accurately describe what it is we're dealing with? I understand it's a convenient catch-all term, but there is a large range of issues that differ with the kinds of data and systems.
  • Drupal for Archivists: A Drupal-built Archives Reference Blog

    When Mark asked me to write about our use of Drupal at the Dickinson College Archives and Special Collections, the first thing I thought about was when our Archives Reference Blog was initially launched in April 2007. I couldn't believe that it has been two years already. I am pleased to report that my colleagues at Dickinson and I are enormously happy with the results of those two years. I hope others may find this brief explanation of how and why we are using Drupal as a reference management tool to be helpful and instructive. The concept for our implementation of Drupal was a simple one. I was thinking about the fact that we help researchers every day to locate information that they want, but that what they discover among our collections or learn from them seldom gets shared, except by those who write for publication. So, what if we shared via the web, through a simple blog format, the basic questions posed by our researchers along with a simple summary of the results?
  • Why You Should Support Linked Data

    If you don't, I'll make your data linkable.
  • Coming Soon: Drupal for Archivists

    I've been fairly quiet lately as I've been busy with this and that, but I thought I'd let everyone know that I've been beginning to put together a series of posts entitled "Drupal for Archivists." Drupal, as you may or may not know, is a flexible and extensible open source content management system. There will be a general overview of some of the important concepts, but it'll focus less on the basics of getting people up and running — there are plenty of resources out there, such as the wonderful tutorials and articles available from Lullabot. Instead, I've drafted a handful of guest bloggers to discuss how and why they're using Drupal. Keep your eyes peeled!
  • Brooklyn Museum Releases API

    The always groundbreaking Brooklyn Museum has now released an API to allow the public to interact with their collections data. I can't even tell you how happy I am about this in terms of an open data perspective. Also, this is the direction that makes the whole "detailed curation by passionate amateurs" thing possible. There are only three simple methods for accessing the data. Ideally, it would be nice to see them put their collections metadata up as linked data, but now I'm daring to dream a little. Hey, wait a minute! I think that's the perfect way to start playing around with the API. Doing some digging through the documentation, I'm seeing that all the objects and creators seem to have URIs. Take a crack at it - the registration form is ready for you.
  • Moving worldcat to Mercurial and Bitbucket

    It's official - I've moved the codebase for worldcat, my Python module for working with the OCLC WorldCat APIs, to be hosted on Bitbucket, which uses the Mercurial distributed version control system. You can find the new codebase at

  • How I Failed With Distributed Version Control Systems, Archival Metadata, and Workflow Integration

  • HOWTO Meet People and Have Fun At Code4libcon 2009

  • Make Me A Structured Vocabulary Or I'll Make One For You

    The Society of American Archivists released the Thesaurus for Use in College and University Archives as an electronic publication this week. Specifically, it was issued as a series of PDF files. Is this data stored in some sort of structured format somewhere? If so, it's not available directly from the SAA site. There's no good reason why TUCUA shouldn't be converted to structured, linkable data, expressed using SKOS, the Simple Knowledge Organization System. It's not like I need another project, but I'm sure I could write some scraper to harvest the terms out of the PDF, and while I'm at it, I could write one to also harvest the Glossary of Archival Terminology. Someone, please stop me. I really don't need another project.
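    As a rough idea of what the SKOS end of such a conversion could look like, here is a minimal Python sketch that turns a term and its broader term into Turtle. The base URI, the helper names, and the two sample terms are all invented for illustration (they are not taken from TUCUA), and the hard part, getting terms out of the PDFs, is not attempted here.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

# Hypothetical base URI; a real vocabulary would need a stable home.
BASE = "http://example.org/tucua/"


def slug(label):
    """Turn a label into a URI-friendly fragment (hypothetical convention)."""
    return "-".join(label.lower().split())


def escape(label):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(label, broader=None):
    """Render one term as a skos:Concept, optionally linked to a broader term."""
    lines = [
        "<%s%s> a skos:Concept ;" % (BASE, slug(label)),
        '    skos:prefLabel "%s"@en' % escape(label),
    ]
    if broader:
        lines[-1] += " ;"
        lines.append("    skos:broader <%s%s>" % (BASE, slug(broader)))
    lines[-1] += " ."
    return "\n".join(lines)


# Sample terms are made up for the example.
turtle = "\n\n".join([
    SKOS_PREFIX,
    concept_to_turtle("Student life"),
    concept_to_turtle("Student organizations", broader="Student life"),
])
print(turtle)
```

    A real conversion would also want skos:altLabel for "use for" references and skos:related for related terms, but those depend on how the thesaurus marks them up.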
  • Go FOAF Yourself

    I'm really looking forward to next week's code4lib conference in Providence, despite my utter failure to complete or implement the project on which I am presenting. In particular, I'm really looking forward to the linked data preconference. Like some of my other fellow attendees, I've hammered out a FOAF file for the preconference already so that Ed Summers' combo FOAF crawler and attendee info web app can use it. This is what the sample output looks like using my FOAF data. It's good to see we're well on our way to having an easily creatable sample type of RDF data for people to play with. At a bare minimum, you can create your FOAF data using FOAF-A-Matic and then edit it to add the assertions you need to get it to play nice with Ed's application. See you in Providence, but go FOAF yourself first.
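    If you'd rather script it than fill in a form, a bare-bones FOAF file can be produced with Python's standard library. This is only a sketch: the person URI, name, and homepage are placeholders, the function name is made up, and it does not include whatever extra assertions Ed's application expects.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def foaf_document(uri, name, homepage=None):
    """Build a minimal RDF/XML FOAF description of one person (hypothetical helper)."""
    root = ET.Element("{%s}RDF" % RDF)
    person = ET.SubElement(root, "{%s}Person" % FOAF, {"{%s}about" % RDF: uri})
    ET.SubElement(person, "{%s}name" % FOAF).text = name
    if homepage:
        # foaf:homepage points at a resource rather than holding literal text.
        ET.SubElement(person, "{%s}homepage" % FOAF, {"{%s}resource" % RDF: homepage})
    return ET.tostring(root, encoding="unicode")


# Placeholder values; substitute your own URI, name, and homepage.
xml_text = foaf_document(
    "http://example.org/foaf.rdf#me", "Example Attendee", "http://example.org/"
)
print(xml_text)
```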
  • Developing Metrics for Experimental Forms of Outreach

    ArchivesNext recently inquired about how archivists measure success of 2.0 initiatives. It's hard to determine whether some 2.0-ish initiatives will really impact statistics when you don't really define the results you're trying to see. I'd like to open the question further - how do we begin developing metrics for things that sit on the cusp between forms of outreach? Furthermore, I'm curious to see where this information is captured - do archivists wait until the end to gather survey data, or are they working towards something like what we at NYPL Labs are doing with Infomaki, our new usability tool developed by Michael Lascarides, our user analyst?
  • dEAD Reckoning #2: Mixing/Matching With Namespaces and Application Profiles

    So, it's time for another rant about my issues with EAD. This one is a pretty straightforward and short one, and comes down to the issue that I should essentially be able to mix and match metadata schemas. This is not a new idea, and I'm tired of the archives community treating it like it is one. Application profiles, as they are called, allow us to define a structured way to combine elements from different schemas, prevent addition of new and arbitrary elements, and tighten existing standards for particular use cases. However, to a certain extent, the EAD community has accepted the concept of combining XML namespaces but on a very limited level. The creation of the EAD 2002 Schema allows EAD data to be embedded into other XML documents, such as METS. However, I can't do it the other way around; for example, I can't work a MODS or MARCXML record into a finding aid. Why not? As I said in my last dEAD Reckoning rant as well as during my talk at EAD@10, the use of encoding analog attributes is misguided, confusing, and just plain annoying.
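    To show the kind of mixing I mean, here is a small Python sketch that builds a finding aid component with a MODS fragment sitting alongside the usual EAD elements. The result is well-formed XML, but it would not validate against the EAD 2002 Schema, which is exactly the complaint. The function name and the sample titles are invented, and where the MODS element is placed is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

EAD = "urn:isbn:1-931666-22-9"       # EAD 2002 Schema namespace
MODS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace

ET.register_namespace("ead", EAD)
ET.register_namespace("mods", MODS)


def component_with_mods(unit_title, mods_title):
    """Build an EAD <c> element that also carries a MODS record (not schema-valid)."""
    component = ET.Element("{%s}c" % EAD, {"level": "item"})
    did = ET.SubElement(component, "{%s}did" % EAD)
    ET.SubElement(did, "{%s}unittitle" % EAD).text = unit_title

    # The foreign-namespace part: a MODS record embedded directly in the component.
    mods = ET.SubElement(component, "{%s}mods" % MODS)
    title_info = ET.SubElement(mods, "{%s}titleInfo" % MODS)
    ET.SubElement(title_info, "{%s}title" % MODS).text = mods_title
    return component


# Sample titles are placeholders.
element = component_with_mods("Letter to a colleague", "Letter to a colleague, 1950")
print(ET.tostring(element, encoding="unicode"))
```

    An application profile would be the place to say where such a fragment is allowed and which MODS elements it may contain, rather than leaving it to each implementer.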
  • You're All Sheep

    Made by Twittersheep, a new project made (in part) by my acquaintance Ted Roden, a creative technologist for New York Times Research & Development.
  • A Bird's Eye View of Archival Collections

    Mitchell Whitelaw is a Senior Lecturer in the Faculty of Design and Creative Practice at the University of Canberra and the 2008 winner of the National Archives of Australia's Ian Maclean Award. According to the NAA's site, the Ian Maclean Award commemorates archivist Ian Maclean, and is awarded to individuals interested in conducting research that will benefit the archival and historical profession in Australia and promote the important contribution that archives make to society. Dr. Whitelaw has been keeping the world up to date on his work using his blog, The Visible Archive. His work fits well with my colleague Jeanne Kramer-Smyth's archival data visualization project, ArchivesZ, as well as the multidimensional visualization projects underway at the Humanities Advanced Technology & Information Institute at the University of Glasgow. However, his project fascinates me for a few specific reasons. First of all, the scale of the datasets he's working with is astronomically larger than those that any other archival visualization project has tried to tackle so far.
  • API Fun: Visualizing Holdings Locations

    In my previous post, I included a screenshot of a prototype, but glossed over what it actually does. Given an OCLC record number and a ZIP code, it plots the locations of the nearest holdings of that item on a Google Map. Pulled off in Python (as all good mashups should be), along with SIMILE Exhibit, it uses the following modules: geopy, simplejson, and, of course, worldcat. If you want to try it out, head on over here. The current version of the code will soon be available as part of the examples directory in the distribution for worldcat, which can be found in my Subversion repository.
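    The "nearest holdings" step can be sketched without any of those modules. The Python below is not the prototype's code: it assumes the ZIP code has already been geocoded to a latitude and longitude and that each holding comes with coordinates, and the library names, coordinates, and function names are all made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_holdings(origin, holdings, limit=3):
    """Return up to `limit` holdings, closest to `origin` first."""
    lat, lon = origin
    ranked = sorted(
        holdings,
        key=lambda h: haversine_km(lat, lon, h["lat"], h["lon"]),
    )
    return ranked[:limit]


# Hypothetical holdings with approximate city-centre coordinates.
HOLDINGS = [
    {"name": "Library in Boston", "lat": 42.3601, "lon": -71.0589},
    {"name": "Library in Manhattan", "lat": 40.7128, "lon": -74.0060},
    {"name": "Library in Philadelphia", "lat": 39.9526, "lon": -75.1652},
]

# Assumed result of geocoding a Brooklyn ZIP code.
ORIGIN = (40.6782, -73.9442)

nearest = nearest_holdings(ORIGIN, HOLDINGS, limit=2)
print([h["name"] for h in nearest])
```

    The ranked list is what would then be handed to the map layer for plotting.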
  • This Is All I'm Going To Say On This Here Blogsite Concerning The Brouhaha About The Policy for Use and Transfer of WorldCat Records Because I Have Other, More Interesting And More Complex Problems To Solve (And So Do You)

    The moderated discussion hosted and sponsored by Nylink went pretty well. Also, I don't need the records to have fun with the data - I just need robust APIs. (In fact, as I said today, I'd prefer not to have to deal with the MARC records directly.) Robust APIs would help make prototypes like this one I hacked together in a few hours into a real, usable service.
  • Lightening the load: Drupal and Python

    Man, if this isn't a "you got your peanut butter in my chocolate" thing or what! As I wrote over on the NYPL Labs blog, we've been up to our necks in Drupal at MPOW, and I've found that one of the great advantages of using it is rapid prototyping without having to write a whole lot of code. Again, that's how I feel about Python, too, but you knew that already. Once you've got a prototype built, how do you start piping stuff into it? In Drupal 6, a lot of the contrib modules to do this need work - most notably, I'm thinking about node_import, which as of yet still has no (official) CCK support for Drupal 6 and CCK 2. In addition, you could be stuck with having to write PHP code for the heavy lifting, but where's the joy in that? Well, it so happens that the glue becomes the solvent in this slow, slow dance.
  • Old Stuff, New Tricks: How Archivists Are Making Special Collections Even More Special Using Web 2.0 Technologies