As you may have heard, the National Archives issued a press release today announcing the release of three data sets on Data.gov: The first milestone of the Open Government Directive was met on January 22 with the release of new datasets on Data.gov. Each major government agency has uploaded at least three datasets in this initial action. The National Archives released the 2007—2009 Code of Federal Regulations and two datasets from its Archival Research Catalog. This is the first time this material is available as raw data in XML format. The Archival Research Catalog, or ARC, is NARA's primary access system for archival description, representing 68% of NARA's entire holdings. This breaks down to the following: 2,720,765 cubic feet 520 record groups 2,365 collections 102,598 series 3,265,988 file units 292,887 items In addition, there are 6,354,765,793 logical data records and 465,050 artifacts described in ARC. NARA's decision to share this data is a breakthrough for archives and people who love data.
So, it's time for another rant about my issues with EAD. This one is a pretty straightforward and short one, and comes down to the issue that I should essentially be able to mix and match metadata schemas. This is not a new idea, and I'm tired of the archives community treating it like it is one. Application profiles, as they are called, allow us to define a structured way to combine elements from different schemas, prevent addition of new and arbitrary elements, and tighten existing standards for particular use cases. However, to a certain extent, the EAD community has accepted the concept of combining XML namespaces but on a very limited level. The creation of the EAD 2002 Schema allows EAD data to be embedded into other XML documents, such as METS. However, I can't do it the other way around; for example, I can't work a MODS or MARCXML record into a finding aid. Why not?
Man, if this isn't a "you got your peanut butter in my chocolate thing" or what! As I wrote over on the NYPL Labs blog, we've been up to our necks in Drupal at MPOW, and I've found that one of the great advantages of using it is rapid prototyping without having to write a whole lot of code. Again, that's how I feel about Python, too, but you knew that already. Once you've got a prototype built, how do you start piping stuff into it? In Drupal 6, a lot of the contrib modules to do this need work - most notably, I'm thinking about node_import, which as of yet still has no (official) CCK support for Drupal 6 and CCK 2. In addition, you could be stuck with having to write PHP code for the heavy lifting, but where's the joy in that? Well, it so happens that the glue becomes the solvent in this slow, slow dance.