
Posts

  • Description Peddlers and Data.gov: Two Peas In a Pod

    As you may have heard, the National Archives issued a press release today announcing the release of three data sets on Data.gov: The first milestone of the Open Government Directive was met on January 22 with the release of new datasets on Data.gov. Each major government agency has uploaded at least three datasets in this initial action. The National Archives released the 2007–2009 Code of Federal Regulations and two datasets from its Archival Research Catalog. This is the first time this material is available as raw data in XML format. The Archival Research Catalog, or ARC, is NARA's primary access system for archival description, representing 68% of NARA's entire holdings. This breaks down to the following:

        • 2,720,765 cubic feet
        • 520 record groups
        • 2,365 collections
        • 102,598 series
        • 3,265,988 file units
        • 292,887 items

    In addition, there are 6,354,765,793 logical data records and 465,050 artifacts described in ARC. NARA's decision to share this data is a breakthrough for archives and people who love data.
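    Because the ARC data arrives as raw XML describing millions of records, a streaming parser is the sensible way to start working with it. Below is a minimal sketch using the standard library's xml.etree.ElementTree.iterparse to tally descriptions by level of description without building the whole tree up front. The element and attribute names (description, level), the helper count_levels, and the tiny inline sample are hypothetical stand-ins; they are not taken from the actual schema of the released files.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real ARC export's element names will differ.
SAMPLE = b"""<arc>
  <description level="series"><title>Series A</title></description>
  <description level="fileUnit"><title>Folder 1</title></description>
  <description level="fileUnit"><title>Folder 2</title></description>
  <description level="item"><title>Letter</title></description>
</arc>"""


def count_levels(source, tag="description"):
    """Stream through an XML source and count `tag` elements by their
    (assumed) 'level' attribute.

    Each matching element is cleared once counted, so its children and
    text don't pile up in memory while the rest of the file is read.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            counts[elem.get("level", "unknown")] += 1
            elem.clear()  # drop the subtree we no longer need
    return counts


print(count_levels(io.BytesIO(SAMPLE)))
```

    The same function accepts a filename in place of the BytesIO object, which is how you would point it at a downloaded dump once the real element names are known.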
  • Onward And Upward...

    It's fitting that this is the hundredth (gosh, only the hundredth?) post, because I have rather important news. First, my fellow developers/producers/UX designers at The New York Public Library and I have been working through every minute detail of the upcoming Drupal-based replacement for the NYPL website. You can see a live preview at http://new.nypl.org/. I can proudly say that this project has helped both me personally and NYPL overall play nice in the open source world: we've been actively contributing code, reporting bugs, and sending patches to the Drupal project. Also, our site search is based on Solr, which always bears mention. Second, after working tirelessly as a developer at NYPL for the last year and a half, I have decided to move onward and upward. I am leaving the cozy environs of the still-recently renovated office space I share with my spectacular coworkers. It was by no means an easy decision, but it feels like the best one overall.
  • Clifford Lynch Clarifies Position on Open Source ILSes

    Clifford Lynch, Executive Director of the Coalition for Networked Information, has responded to the leaked SirsiDynix report that spreads horrific untruths about open source. Marshall Breeding posted Lynch's response on GuidePosts. In particular, Lynch notes the following: "I don't think that I ever wrote those words down in an article; I suppose I may have said something to that effect in an interview or q&a in some conference program like ALA Top Tech, though perhaps not quite as strongly as it's expressed here. I have without question spoken out about my concerns regarding investment in open source ILS development in the last few years. IF I did say this, it feels like it's used a little out of context -- or maybe the better characterization is over-simplistically -- in the report. ... I think there are still major problems -- many of which we really don't know how to solve effectively, and which call for sustained and extensive research and development -- in various areas where ILS get involved in information discovery and the support of research and teaching."
  • SirsiDynix Report Leaked, Spreading Fear, Uncertainty and Doubt about Open Source

    Thanks to Twitter, I discovered that Wikileaks has posted a report written by SirsiDynix Vice President for Innovation Stephen Abram which spreads a fantastic amount of fear, uncertainty and doubt about both open source software in general and, more specifically, the suitability of open source integrated library systems. As the summary provided by Wikileaks states: "This document was released only to a select number of existing customers of the company SirsiDynix, a proprietary library automation software vendor. It has not been released more broadly specifically because of the misinformation about open source software and possible libel per se against certain competitors contained therein ... The source states that the document should be leaked so that everyone can see to what extent SirsiDynix will attempt to spread falsehoods and smear open source and the proponents of open source." In addition, as you may have heard, the Queens Library is suing SirsiDynix for breach of contract; for what it's worth, the initial conference is scheduled for next Monday, November 2, 2009.
  • pybhl: Accessing the Biodiversity Heritage Library's Data Using OpenURL and Python

    Via Twitter, I heard about the Biodiversity Heritage Library's relatively new OpenURL Resolver, announced on their blog about a month ago. More specifically, I heard about Matt Yoder's new Ruby library, rubyBHL, which exploits the BHL OpenURL Resolver to provide metadata about items in their holdings and does some additional screenscraping to return things like links to the OCRed version of the text. In typical fashion, I've ported Matt's library to Python and have released my code. pybhl is available from my site, PyPI, and GitHub. Use should be fairly straightforward, as seen below:

        >>> import pybhl
        >>> import pprint
        >>> b = pybhl.BHLOpenURLRequest(genre='book', aulast='smith', aufirst='john', date='1900', spage='5', volume='4')
        >>> r = b.get_response()
        >>> len(r.data['citations'])
        3
        >>> pprint.pprint(r.data['citations'][1])
        {u'ATitle': u'',
         u'Authors': [u'Smith, John Donnell,'],
         u'Date': u'1895',
         u'EPage': u'',
         u'Edition': u'',
         u'Genre': u'Journal',
         u'Isbn': u'',
         u'Issn': u'',
         u'ItemUrl': u'http://www.biodiversitylibrary.org/item/15284',
         u'Language': u'Latin',
         u'Lccn': u'',
         u'Oclc': u'10330096',
         u'Pages': u'',
         u'PublicationFrequency': u'',
         u'PublisherName': u'H.N. Patterson,',
         u'PublisherPlace': u'Oquawkae [Ill.] :',
         u'SPage': u'Page 5',
         u'STitle': u'',
         u'Subjects': [u'Central America', u'Guatemala', u'Plants', u''],
         u'Title': u'Enumeratio plantarum Guatemalensium imprimis a H.
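    pybhl handles the request for you, but for anyone curious what an OpenURL query against the BHL resolver looks like, here is a minimal sketch using only Python's standard library. It builds the URL and decodes it again; it doesn't fetch anything. The base URL http://www.biodiversitylibrary.org/openurl, the format=json parameter, and the helper name build_openurl are my assumptions for illustration, not taken from pybhl's source or BHL's documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed resolver endpoint; check BHL's documentation before relying on it.
BHL_OPENURL_BASE = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(base=BHL_OPENURL_BASE, response_format="json", **citation):
    """Build an OpenURL-style query URL from citation keys.

    Hypothetical helper: keys such as genre, aulast, aufirst, date,
    spage and volume mirror the arguments in the pybhl example above.
    Empty values are dropped so they don't clutter the query.
    """
    params = {k: v for k, v in citation.items() if v not in (None, "")}
    # 'format' is an assumed parameter asking the resolver for JSON.
    params["format"] = response_format
    # Sort the keys so the URL is deterministic (handy for caching and tests).
    return base + "?" + urlencode(sorted(params.items()))


url = build_openurl(genre="book", aulast="smith", aufirst="john",
                    date="1900", spage="5", volume="4")
print(url)

# Round-trip the query string to confirm what the resolver would receive.
params = parse_qs(urlparse(url).query)
print(params["aulast"])
```

    To actually query the resolver you would pass url to urllib.request.urlopen and decode the body with json.loads; I've left the network call out so the sketch runs offline.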
  • Access and Description Reconsidered

    What exactly is archival access, and how does archival description make it possible? I feel that in some form or another I've been struggling with this question throughout my career. Recently, this blog post from The Top Shelf, the blog of the University of Texas at San Antonio Archives and Special Collections Department, came across my radar, wherein they write (emphasis in original): "UTSA Archives and Special Collections is among the growing number of archives to create an online presence for every one of its collections. ... We were able to utilize inventories generated by former and current collection assistants to create guides to the collection with folder-level and box-level descriptions. The project resulted in access to more than 130 collections and 2000 linear feet of materials." What defines that accessibility? I certainly don't intend to be a negative Nancy about this; adding finding aids and other descriptive metadata about collections is obviously useful. But how has it necessarily increased access to the materials themselves?
  • AIP Receives NHPRC Funding To Digitize Samuel Goudsmit Papers

    I'm happy to pass on the news that my former employer, the Niels Bohr Library & Archives of the American Institute of Physics, has received funding from the National Historical Publications and Records Commission to digitize the entirety of the Samuel Goudsmit papers. From the announcement on the Center for History of Physics/Niels Bohr Library & Archives Facebook page: Goudsmit (1902–1978) was a Dutch-educated physicist who spent his career in the US and was involved at the cutting edge of physics for over 50 years. He was an important player in the development of quantum mechanics in the 1920s and 1930s; he then served as scientific head of the Alsos Mission during World War II, which assessed the progress of the German atomic bomb project. Goudsmit became a senior scientist at Brookhaven National Laboratory and editor-in-chief of the American Physical Society. The papers consist of an estimated 66,000 documents, which include correspondence, research notebooks, lectures, reports, and captured German war documents; the collection is the most used in the library.
  • A Gentle Reminder

    On the eve of teaching the first class of my course (LIS901-08, or, Building Digital Libraries: Infrastructural and Social Aspects) at LIU's Palmer School of Information and Library Science, I'd like to remind you of the following. The syllabus is available online, if you're curious.
  • Privacy, Censorship, and Good Records Management: Brooklyn Public Library in the Crosshairs

    Over at librarian.net, Jessamyn West has a brief write-up about a post on the New York Times' City Room blog about placing access restrictions on offensive material (in this case, one of Hergé's early Tintin books at the Brooklyn Public Library). More interestingly, she notes that the Times was given access to challenges from BPL patrons and other community members and republished them. Quite astutely, Jessamyn points out that while the patrons' addresses were removed, their names and city/state information were published. If your name is, for example, [name redacted], redacting your address doesn't really protect your anonymity. I'm curious what the balance is between patron privacy and making municipal records available. It's a good question without a straightforward answer. My first concern was whether BPL had kept the challenge correspondence beyond the retention periods mandated in the New York State records schedules. After doing some digging on the New York State Archives' website, I came across Schedule MI-1 ("
  • Everything is Bigger in Texas, Including My Talks on The Semantic Web

    I'll be at the Society of American Archivists Annual Meeting next week in Austin, Texas. It looks to be a jam-packed week for me, with a full-day Standards Committee/TSDS meeting on Tuesday, followed by THATCamp Austin in the evening, an (expanded version of my) presentation on Linked Data and Archival Description during the EAD Roundtable on Wednesday, and Thursday's session (number 101): "Building, Managing, and Participating in Online Communities: Avoiding Culture Shock Online" (with Jeanne Kramer-Smyth, Deborah Wythe, and Camille Cloutier). And to think I haven't even considered which other sessions I'm going to! Anyhow, I hope to see you there, and please come to either or both of my presentations if you can.