
Archive for 2008

  • dEAD Reckoning #1: A FaTHEADed Failure For Faceted Terms and Headings in EAD

    A while back, I wrote a Bad MARC Rant, and I considered titling this a Bad Metadata Rant. However, as the kids say, I got mad beef with a little metadata standard called Encoded Archival Description. Accordingly, I figured I should begin a new series of posts discussing some of the issues I have with something that is, for better or for worse, a technological fixture of our profession. This is prompted in part by thoughts I've had as a result of participating in EAD@10 and attending the Something New for Something Old conference sponsored by the PACSCL Consortial Survey Initiative. Anyhow, on to my first bone to pick with EAD. I'm incredibly unsatisfied with the controlled access heading tag <controlaccess/>. First of all, it can occur within itself, and because of this I fear some weird instance where I have to end up parsing these tags nested three levels deep. Also, it can contain a <chronlist/>, which seems pretty strange given that I've never seen an example of events being used as controlled access terms in this way.
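    The nesting worry is easy to make concrete. Here's a minimal sketch of my own (not part of any EAD tooling) that recursively collects terms from self-nested <controlaccess> elements using only the standard library; the sample fragment and its terms are contrived for illustration:

```python
import xml.etree.ElementTree as ET

# A contrived EAD fragment with <controlaccess> nested three levels deep,
# which the schema permits even though it rarely makes sense in practice.
EAD_FRAGMENT = """
<controlaccess>
  <subject>Jazz musicians</subject>
  <controlaccess>
    <persname>Parker, Charlie</persname>
    <controlaccess>
      <geogname>New York (N.Y.)</geogname>
    </controlaccess>
  </controlaccess>
</controlaccess>
"""

def collect_terms(element, depth=0):
    """Recursively walk <controlaccess> wrappers, yielding (depth, tag, text)
    for every access term found at any level of nesting."""
    for child in element:
        if child.tag == "controlaccess":
            # Recurse into the nested wrapper rather than treating it as a term.
            for term in collect_terms(child, depth + 1):
                yield term
        elif child.text and child.text.strip():
            yield (depth, child.tag, child.text.strip())

root = ET.fromstring(EAD_FRAGMENT)
terms = list(collect_terms(root))
```

    The recursion handles arbitrary depth, which is exactly the problem: any consumer of <controlaccess> has to be written this way just in case.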
  • Going off the Rails: Really Rapid Prototyping With Drupal

    Previously posted on

    The other Labs denizens and I are going off the rails on a crazy train, deeper down the rabbit hole of reimplementing the NYPL site in Drupal. As I pile my work on the fire, I've found that building things in Drupal is easier than I'd ever thought it to be. It's a scary thought, in part because I'm no fan of PHP (the language of Drupal's codebase). Really, though, some things are dead simple. It's a bit of a truism in the Drupal world at this point that you can build a heck of a lot just by using the CCK and Views modules. The important part is that you can build a heck of a lot without really having to know much code. This is what threw me off for so long: I didn't realize I was putting too much thought into building a data model, as I normally would with another application framework.
  • Does SAA Need To Support Who I Am?

    There's been a whole lot of discussion in the archivoblogosphere about the perceived need for quasi-informal interest groups that are fundamentally driven by identity. While I agree with this in theory, I must register my opposition to having SAA promote, support, or provide any sort of infrastructure for such groups. Fundamentally, I am against this because I believe it poses a strong threat to the privacy of archivists.
  • deliciouscopy: a dumb solution for a dumb problem

    You'd think there would be some tried and true script for Delicious users to repost bookmarks from their inboxes into their accounts, especially given that there are often shared accounts where multiple people will tag things as "for:foo" to have them show up on foo's Delicious account. Well, there wasn't, until now (at least as far as I could tell). Enter deliciouscopy. It uses pydelicious, as well as the Universal Feed Parser and simplejson. It reads a user's inbox, checks to see whether the poster of the for:whomever tag is in your network, and reposts accordingly, adding a via: tag for attribution. It even does some dead simple logging if you need that sort of thing. The code's all there, and GPL license blah blah blah. I hacked this together in about an hour for something at $MPOW - namely, to repost things to our shared account. It's based on Michael Noll's but diverges from it fairly quickly. Enjoy, and give any feedback if you must.
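    The repost decision itself is simple enough to sketch. This is not deliciouscopy's actual code, just my own illustration of the logic described above: given inbox entries (shaped roughly like what the Universal Feed Parser returns, simplified here to plain dicts) and the set of people in your network, keep only the posts aimed at your account and rewrite the for: tag into a via: tag.

```python
def plan_reposts(inbox_entries, network, account):
    """Return the bookmarks to repost from a Delicious inbox: only those
    tagged for:<account> by someone in our network, with a via:<poster>
    tag added for attribution.

    Each entry is a dict with 'link', 'title', 'author', and a
    space-separated 'tags' string (a simplification for illustration)."""
    reposts = []
    for entry in inbox_entries:
        tags = entry["tags"].split()
        # Skip posts not aimed at our account, or from outside our network.
        if "for:%s" % account not in tags:
            continue
        if entry["author"] not in network:
            continue
        new_tags = [t for t in tags if not t.startswith("for:")]
        new_tags.append("via:%s" % entry["author"])
        reposts.append({"url": entry["link"],
                        "description": entry["title"],
                        "tags": " ".join(new_tags)})
    return reposts
```

    The real script would then hand each planned repost to pydelicious for posting, and log the result.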
  • Idle Hands Are The Devil's Plaything

    I've had my hands full lately. Two weeks ago I was at the MCN conference (wherein, among other things, I have continued my dominion as Archduke of Archival Description by taking over the MCN Standards SIG chair position from The Bancroft Library's Mary Elings), and next week I'm off to Philadelphia for the PACSCL Something New for Something Old conference. I hammered out the coherent, written version of the paper I gave at EAD@10. I prepared a proposal for next February's code4lib conference in Providence (ahem, vote for mine, if you're so inclined): Building on Galen Charlton's investigations into distributed version control systems for metadata management, I offer a prototype system for managing archival finding aids in EAD (Encoded Archival Description). My prototype relies on distributed version control to help archivists maintain transparency in their work and uses post-commit hooks to initiate indexing and publishing processes. In addition, this prototype can be generalized for any XML-based metadata schema. On top of that, I'm working with a fine group of folks on the RLG Programs project to analyze EAD editing and creation tools, doing hardcore schema mapping at work, and somehow finding enough time to play a little Doukutsu Monogatari to unwind.
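    The post-commit idea in that abstract is easy to picture in miniature. Here's a sketch of what such a hook (saved as .git/hooks/post-commit) might do, assuming finding aids are tracked as .xml files in a Git repository; the reindex-findingaid command is a made-up placeholder for whatever indexing and publishing process would follow:

```python
import subprocess

def filter_ead(paths):
    """Keep only the XML files -- the finding aids we care about."""
    return [p for p in paths if p.endswith(".xml")]

def changed_ead_files(rev="HEAD"):
    """Ask git which files the given commit touched, then filter to EAD."""
    out = subprocess.check_output(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", rev])
    return filter_ead(out.decode("utf-8").split())

if __name__ == "__main__":
    for path in changed_ead_files():
        # Hypothetical downstream step: reindex and republish the finding aid.
        subprocess.call(["reindex-findingaid", path])
```

    The appeal of doing it this way is that publishing becomes a side effect of committing, so the archivist's normal editing workflow leaves a complete, transparent revision history behind.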
  • Developing Automated Repository Deposit Modules for Archivists' Toolkit?

    I'd like to gauge interest for people to help add code to Archivists' Toolkit to automate the deposit of digital objects into digital repositories. At first glance, the biggest issue is having to deal with differing deposit APIs for each repository, but using something like SWORD would make sense to bridge this gap. Any and all feedback is welcome!
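    For the curious, a SWORD (version 1.3) deposit is essentially an HTTP POST of a package to a collection URL with a few well-known headers, which is why it makes a good bridge across differing repository APIs. A rough sketch of the headers such an AT module might assemble; the values passed in below are examples, not a working endpoint or packaging profile:

```python
import base64

def sword_headers(username, password, filename, mime_type, packaging):
    """Build the HTTP headers for a SWORD 1.3 deposit POST.

    The X-Packaging URI tells the repository how to unpack the payload;
    Content-Disposition carries the suggested filename for the package."""
    credentials = base64.b64encode(
        ("%s:%s" % (username, password)).encode("utf-8")).decode("ascii")
    return {
        "Authorization": "Basic " + credentials,
        "Content-Type": mime_type,
        "Content-Disposition": "filename=%s" % filename,
        "X-Packaging": packaging,
    }
```

    The module would then POST the package bytes to the collection's deposit URL with these headers and read back the Atom entry the repository returns describing the new deposit.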
  • Python WorldCat Module v0.1.2 Now Available

    In preparation for the upcoming WorldCat Hackathon starting this Friday, I've made a few changes to worldcat, my Python module for interacting with OCLC's APIs. Most notably, I've added iterators for SRU and OpenSearch requests, which (like the rest of the module) painfully need documentation. It's available either via download from my site or via PyPI; please submit bug reports to the issue tracker as they arise. EDIT: I've bumped up the version number another micro number to 0.1.2 as I've just added the improvements mentioned by Xiaoming Liu on the WorldCat DevNet Blog (LCCN query support, support for tab-delimited and CSV responses for xISSNRequests, and support for PHP object responses for all xIDRequests). EDIT: Thanks to Thomas Dukleth, I was told that code for the Hackathon was to be licensed under the BSD License. Accordingly, I've now dual licensed the module under both GPL and BSD.
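    For those wondering what the iterators are for: SRU and OpenSearch both return results a page at a time, so the natural Python shape is a generator that keeps fetching until the result set runs out. This is a generic sketch of that pattern, not the worldcat module's actual code; fetch_page stands in for whatever performs the HTTP request:

```python
def paged_results(fetch_page, page_size=10):
    """Yield individual records from a paged search API.

    fetch_page(start, count) is assumed to return a list of records
    beginning at 1-based position start; a short or empty list signals
    that we've reached the end of the result set."""
    start = 1
    while True:
        page = fetch_page(start, page_size)
        for record in page:
            yield record
        if len(page) < page_size:
            break  # final, partial (or empty) page
        start += page_size
```

    The payoff is that callers can just write `for record in paged_results(...)` and never think about start positions or page boundaries.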
  • V8-Powered Libraries and the Happiness Engines that Run Them

    Previously posted on

    [Image: Games we used to play / writte... Digital ID: 1255304, New York Public Library]

    A week ago today, a few of my DEG colleagues and I went to see Liz Lawley from RIT's Lab for Social Computing give a talk entitled "Libraries as Happiness Engines." It was a modified version of a talk she gave at this year's CiL conference. The gist of the talk was that gaming in libraries means not just using established games to draw the public into the library, but also implementing game mechanics that allow libraries to flourish as social spaces. In particular, these game mechanics include things like collecting, points, feedback, exchanges, and customization.

    I've been ruminating on this for the last week or so in a couple different ways. First of all, I've been trying to figure out how we could implement game mechanics within NYPL.

  • An Open Letter to SAA Council and the 2009 Program Committee

    I apologize for using my blog to soapbox, but I felt like this was a significant concern that I should share with my readers. If you wish to support my position, please consider sending an e-mail to SAA Council and the 2009 Program Committee Chairs. Dear 2009 Program Committee Members and SAA Council Members, I understand that we are nearing the deadlines for submission of proposals for sessions at the 2009 Annual Meeting of the Society of American Archivists. I also understand the reasons for having an earlier deadline than in past years. However, I am deeply concerned with the decision to set the deadline for October 8, 2008, which is Yom Kippur, the day on which the Jewish High Holidays end. As is often the case, conference proposals coalesce at the last minute, and this is further complicated by the fact that the beginning of Rosh Hashana fell on September 29, 2008. I recognize that the deadline is most likely immutable at this point, but I am asking that SAA Council and future Program Committees pay attention to when the High Holidays fall in future years.
  • The Apex of Hipster XML GeekDOM: TEI-Encoded Dylan

    Via Language Log: The Electronic Textual Cultures Lab (ETCL) at the University of Victoria has, in an effort to draw more attention to TEI, prepared an encoded version of the lyrics to Bob Dylan's "Subterranean Homesick Blues" and overlaid the resulting XML on the song's video. The resulting video is available, naturally, on YouTube. ETCL's Ray Siemens writes about the reasoning behind this on the TEI Video Widgets blog: At the last gathering of the Text Encoding Initiative Consortium, in Maryland, a few of us were discussing the ways in which TEI has eluded some specific types of social-cultural representation that are especially current today . . . things like an avatar, or something that could manifest itself as a youtube posting. A quick search of youtube did reveal a significant and strong presence of sorts, but it was that of Tei the Korean pop singer (pronounced, we're told, "tay"); so, our quest began there, setting out modestly to create a video widget that would balance T-E-I and Tei in the youtube world.
  • Introducing djabberdjaw

    djabberdjaw is an alpha-quality Jabber bot written in Python that uses Django as an administrative interface to manage bot and user profiles. I've included a couple of plugins out of the box that will allow you to perform queries against Z39.50 targets and OCLC's xISBN API (assuming you have the requisite modules). djabberdjaw requires Django 1.0 or later, jabberbot, and xmpppy. It's available either from PyPI (including using easy_install) or via Subversion. You can browse the Subversion repository, too.
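    To give a flavor of what a plugin involves: jabberbot exposes chat commands as decorated methods, so an xISBN lookup is little more than a decorated method plus some input scrubbing. What follows is my own minimal sketch, not djabberdjaw's actual plugin code; the helper normalizes what a user might type, and the bot class (shown in comments so the snippet stays runnable without jabberbot installed) wires it to a command:

```python
def normalize_isbn(text):
    """Strip hyphens and whitespace from user input so it can be passed
    to OCLC's xISBN service; return None if it doesn't look like an ISBN."""
    candidate = text.replace("-", "").replace(" ", "").upper()
    if len(candidate) in (10, 13) and candidate[:-1].isdigit():
        return candidate
    return None

# In the bot itself, a plugin command is roughly (sketch only):
#
#     from jabberbot import JabberBot, botcmd
#
#     class DjabberdjawBot(JabberBot):
#         @botcmd
#         def xisbn(self, mess, args):
#             """Look up related editions for an ISBN."""
#             isbn = normalize_isbn(args)
#             if isbn is None:
#                 return "That doesn't look like an ISBN."
#             return query_xisbn(isbn)  # hypothetical helper calling the API
```

    Anyone chatting with the bot would then type something like "xisbn 0-306-40615-2" and get the editions back in the chat window.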
  • ArchivesBlogs 3.0

    Thanks to Jeanne from Spellbound Blog, I was made aware that ArchivesBlogs hadn't really been doing its job. So, I ripped out its guts and put it back together. It's running the latest, shiniest versions of WordPress, FeedWordPress, and Auto Delete Posts, and now has added Feedburner and WP Stats goodness. Let me know if you discover any peculiarities in the updated setup.
  • Slaying the Scary Monsters

    Previously posted on

    [Image: Drawings of monster and devil. Digital ID: 434322, New York Public Library]

    Getting up to speed is hard anywhere, and it's especially difficult in a large, complex institution like NYPL. Beyond just understanding the projects that you're given, you're also thrown headfirst into making sense of the culture, the organization, and all the unspoken and occasionally unseen things that allow you to do your job. There's no clear place to start, so a good portion of the time you have to keep on top of that while you start thrashing away at your work. The question remains, though: how do you organize this stuff? How do you enable sensemaking in yourself and your peers?

  • Everything Old is New Again

    Goodbye, WordPress - I've been drinking more of the Kool-Aid. I rebuilt my personal/professional site (not this blog) in Drupal. Migrating the content was pretty easy (about 15 static pages, no posts). The functionality is astounding - I only started redoing it yesterday and I've already got a great infrastructure. Expect a detailed post before too long, or at least a link to a colophon on said site.
  • Cheeseburgers With Everything: Context, Content, and Connections in Archival Description

  • Matienzo, The San Francisco Treat

    I'm packing up and heading out to SFO this evening for SAA2008. Right now I'm frantically backing up my Zotero repository, making sure I have a bunch of sources to peruse on the plane as I hack away on my slides for EAD@10. You might be surprised that my idea of jumping out of a cake shaped like an <archdesc> tag while wearing a bathing suit was not even considered, so it looks like I'll actually have to put some coherent thoughts together. I've got to make a grand entrance somehow. I'll be chairing the Description Section meeting as well, so behave yourselves, kids.
  • Bad MARC Rant #1: Leader Positions 06 and 08

    I understand why the MARC leader position 08 is a good idea in theory. In fact, MARBI Proposal 97-07 suggests: a change in definition to Leader/08 code "a" for clarification; making code "t" (Manuscript language materials) obsolete in Leader/06 and using code "a" instead; redefinitions of codes "a" and "p" in Leader/06; renaming the 008 for Books to "Textual (Nonserial)"; and deleting field 006 for Mixed material. I can safely say that some pretty funky stuff gets cataloged with the leader position 08 set as "a," and much of it is incorrect, at $MPOW and otherwise. What is Leader/08 actually supposed to be used for? MARBI Proposal 97-07 again states: Code a indicates that the material is described according to archival descriptive rules, which focus on the contextual relationships between items and on their provenance rather than on bibliographic detail. The specific set of rules for description may be found in 040 $e. All forms of material can be controlled archivally.
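    To make the positions concrete: the MARC leader is a fixed 24-character string, with the record type at offset 6 and the type-of-control flag at offset 8 (both zero-indexed). A quick illustrative check of my own, using a plain string rather than a full MARC library; the sample leaders are fabricated for the example:

```python
def describe_control(leader):
    """Report how a MARC record is controlled, per Leader/06 and Leader/08.

    Leader/06 gives the record type ('a' = language material; code 't'
    for manuscript language material was what MARBI Proposal 97-07 made
    obsolete). Leader/08 'a' means the material is described according
    to archival rules rather than bibliographic ones."""
    assert len(leader) == 24, "MARC leaders are exactly 24 characters"
    record_type = leader[6]
    archival = leader[8] == "a"
    return record_type, archival

# Language material under archival control: 'a' at offsets 6 and 8.
archival_leader = "00000camaa2200000 a 4500"
# The same record under ordinary bibliographic control: blank at offset 8.
biblio_leader = "00000cam a2200000 a 4500"
```

    Checking catalogs for "funky stuff" is then just a scan over leaders looking for records where Leader/08 is "a" but no archival description rules appear in 040 $e.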
  • Python WorldCat API module now available

    I'd like to humbly announce that I've written a pre-pre-alpha Python module for working with the WorldCat Search API and the xID APIs. The code needs a fair amount of work, namely unit tests and documentation. I've released the code under the GPL. The module, called "worldcat", is available from the Python Package Index. You can also checkout a copy of the code from my Subversion repository.
  • Canonization, Archivalization, and the 'Archival Imaginary'

  • Seriously, Follow Our Lead

    OCLC's Lorcan Dempsey makes a great point as usual in his post "Making tracks": In recent presentations, I have been suggesting that libraries will need to adopt more archival skills as they manage digital collections and think about provenance, evidential integrity, and context, and that they will also need to adopt more museum perspectives as they think about how their digital collections work as educational resources, and consider exhibitions and interpretive environments. I doubt that any archivist would disagree with this. Even better, I think this offers a great opportunity to reach out and have those in allied fields really understand how and why we've done things slightly differently for so long. I'm glad to see that my new employer has picked up on this holistic approach with platforms like the NYPL Blogs.
  • Now, It Can Be Told

    After a little over two years of processing, referencing, cataloging, and hacking at AIP, I'm skipping up to the City That Never Sleeps to join Jay Datema, Josh Greenberg, and company in the NYPL Labs. I'd be lying if I said I wasn't thrilled about this opportunity, and I'm ready to see where my new job will take me. The next major hurdle will be finding a place to live, so if you're privy to anything in Brooklyn, please let me know.
  • ICA Releases International Standard for Describing Functions

    The ICA's Committee on Best Practices and Standards has released the first edition of the International Standard for Describing Functions (ISDF). Like much of ICA's other work on descriptive standards for archives, ISDF is designed to be used in conjunction with established standards such as ISAD(G) and ISAAR(CPF), as well as standards in preparation such as ISIAH. ISDF will assist both archivists and users in understanding the contextual aspects of the creation of records of corporate bodies. Through ISDF and related standards, archivists will be able to develop improved descriptive systems that could potentially be implemented using a Linked Data model.
  • Google Message Discovery

    Amidst this week of notorious hoaxes, Google has launched Google Message Discovery as an enterprise-focused add-on for its Google Apps platform. Google Message Discovery goes well beyond a simple and reliable e-mail backup system and provides several key features of interest to records managers: content-addressable storage for electronic mail, captured immediately upon sending or retrieval; explicit retention policies based upon time; compliance with relevant laws and best practices; and straightforward discovery for any use, whether internal or concerning litigation. Google Message Discovery, as well as other related offerings such as e-mail security, clearly has its origins in Google's acquisition of Postini last year. Postini isn't some startup with dubious or perpetually beta offerings (e.g. Dodgeball or GrandCentral); some of its better-known clients include BASF and Merrill Lynch. At $25 per user per year, the service seems to be an incredible steal.
  • EPODe: Extensible Platform for Oral History Delivery

  • Easy Peasy: Using the Flickr API in Python

    Since I'm often required to hit the ground running at $MPOW on projects, I was a little concerned when I roped myself into assisting our photo archives with a Flickr project. The first goal was to get a subset of the photos uploaded, and quickly. Googling and poking around the Cheeseshop led me to Beej's FlickrAPI for Python. Little did I know that it would be dead simple to get this project going. To authenticate:

        def create_session(api_key, api_secret):
            """Creates a session using FlickrAPI."""
            session = flickrapi.FlickrAPI(api_key, api_secret)
            (token, frob) = session.get_token_part_one(perms='write')
            if not token:
                raw_input("Hit return after authorizing this program with Flickr")
            session.get_token_part_two((token, frob))
            return session

    That was less painful than the PPD test for tuberculosis. Oh, and uploading?

        flickr.upload(filename=fn, title=title, description=desc, tags=tags, callback=status)

    Using this little code plus a few other tidbits, I created an uploader that parses CSV files of image metadata exported from an Access database. And when done, the results look a little something like this.
  • Movin' and shakin' in the archives world

    ArchivesNext recently discussed Library Journal's annual list of "Movers and Shakers," pondering what a comparable list in the archival profession would look like. For those who don't know, the list recognizes "library advocates, community builders, 2.0 gurus, innovators, marketers, mentors, and problem solvers transforming libraries." After some rumination, ArchivesNext is now calling for nominations to generate a similar list. Do your civic duty and nominate either a project, an individual, or even a situation worthy of this recognition!
  • Behind The Times: Where I Finally Speak Of code4lib 2008

    OK, OK. A post about code4libcon 2008 is long overdue. The minor details: the weather was nice, food was decent, good beer was abundant, and live music was enjoyable. On to the real meat... This time around, I felt like I got a whole lot more out of attending; I'm not sure if this is due to the changing nature of my job, increased attention, or some other factor, like neckferrets and dongles. The great majority of the talks, be they keynotes, traditional presentations, or even just lightning talks, were excellent. Furthermore, this time around I felt a whole lot more connected to the miasma - so much so, in fact, that I ended up giving two lightning talks (or three, depending on whether you count the one I gave with Gabriel Farrell on kobold_chiefain Fac-Back-OPAC). The most impressive thing overall, though, was the lolcats that came out to play. Thanks to the work of Noel Peden and Dan Scott, the videos should be up soon enough.
  • Kochief: Kobold Chieftain, a.k.a. Fac-Back-OPAC

  • Archivists' Toolkit

  • HOWTO Meet People and Have Fun at Code4lib

  • and the Dream of a Web 2.0 Backup System

    I just discovered through Peter Van Garderen's blog post about it. I was entirely surprised that I'd heard nary a peep about it. Some basic examination (running a WHOIS query on the domain) shows that it's still a fairly new project. I have to say, though, that I'm entirely impressed. Those involved have given a whole lot of thought to how they're going to be doing things, as evidenced by those who have signed up to be involved and the DataPortability Charter. To wit, the Charter's principles tend to speak for themselves: We want sovereignty over the profiles, relationships, content and media we create and maintain. We want open formats, protocols and policies for identity discovery, data import, export and sync. We want to protect user rights and privacy. And, of course, the fourth principle made me squeal with delight like a pig in mud: DataPortability will not invent any new standards. I mean, that's probably the best news that someone like me could get.
  • Announcing, or, how I stopped worrying and learned to love Z39.50

    After more than a few late nights and long weekends, I'm proud to announce that I've completed my latest pet programming project. is a lightweight Z39.50-Web gateway, written, naturally, in Python. None of this would be possible without the following Python modules: Aaron Lav's PyZ3950, the beast of burden; Ed Summers' pymarc, the smooth-talking translator; and, quite possibly the best and most straightforward Python web framework available. I initially undertook this project as an excuse to play with PyZ3950 and to teach myself the workings of; I'd played with Django, but it seemed entirely excessive for what I was working on. First, I should mention that this isn't designed to be a complete implementation of a Z39.50 gateway. There are many areas that leave much to be desired, and it's probably not as elegant as some would like. However, that wasn't the point of the project. My ultimate goal was to create a simple client that could be used as a starting point from which to develop a complete web application.
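    For anyone wanting to try the underlying pieces themselves, the heart of such a gateway is only a few lines: PyZ3950 opens the connection and runs the query, and pymarc turns the raw records into something readable. The sketch below is my own illustration, not the project's code; the Library of Congress connection details are recalled from common examples and may have changed, so treat them as assumptions. The network half is kept under the __main__ guard so the formatting helper stands alone:

```python
def render_titles(titles):
    """Format a list of title strings as a plain-text result listing,
    one numbered line per hit -- the web-facing half of the gateway."""
    lines = ["%d. %s" % (i, t) for i, t in enumerate(titles, 1)]
    return "\n".join(lines) if lines else "No results."

if __name__ == "__main__":
    # Network half: search a Z39.50 target and decode the MARC records.
    from PyZ3950 import zoom
    import pymarc

    conn = zoom.Connection("z3950.loc.gov", 7090)  # example target
    conn.databaseName = "VOYAGER"
    conn.preferredRecordSyntax = "USMARC"
    results = conn.search(zoom.Query("CCL", 'ti="archives"'))
    titles = [pymarc.Record(data=r.data).title() for r in results]
    print(render_titles(titles))
    conn.close()
```

    A real gateway wraps roughly this logic in a web handler: the query string comes in from the request, and the rendered listing goes back out as the response.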
  • No Excuses To The Power of Infinity

    I have no excuses for not updating this blog. I thought about forcing myself to comply with some sort of resolution - you know, given the new year and all - but everyone knows how those turn out. Regardless, I have a whole backlog of things to post about, most notably the countless Python programming projects I've been working on lately. Expect more posts over the next few days as a result. Also, I have no excuses for botching up ArchivesBlogs temporarily by mucking about and wiping out some of the WordPress database tables that let FeedWordPress, the plugin that grabs content for ArchivesBlogs, do its thing. The recovery was simpler than I thought it would be, but this is probably the largest amount of unplanned downtime we've had. Keep your eyes open, as a replacement for FeedWordPress may itself be coming along sooner or later.