metadata

  • RightsStatements.org

    RightsStatements.org is a consortial initiative that provides a set of interoperable rights statements, making it easier for people to engage with and reuse digital cultural heritage, particularly when it is provided by metadata aggregators. Since 2014, I have served as the co-chair of the Consortium’s Technical Working Group, which is responsible for maintaining the initiative’s vocabulary and infrastructure. We maintain our projects on GitHub.

  • Linked Data and Archives

    I’ve been involved in modeling archival information as linked data since about 2009. Most recently, this includes work with the ARLIS/NA, RBMS, and SAA Joint Task Force on Art and Rare Materials BIBFRAME Ontology Extension and the Archives and Linked Data Interest Group.

  • Metadata aggregation

    I have worked on metadata aggregation and consortial discovery projects since the beginning of my career, much of which focused on special collections and archives.

  • Hyku

    Hyku, formerly Hydra in a Box, is a next-generation repository solution based on Samvera. I served as the Project Director (2015-2016) and Data Modeler (2016-2017) for the implementation partnership grant between DPLA, Stanford University, and DuraSpace, funded by the Institute of Museum and Library Services.

  • Description Peddlers and Data.gov: Two Peas In a Pod

    As you may have heard, the National Archives issued a press release today announcing the release of three data sets on Data.gov:

    The first milestone of the Open Government Directive was met on January 22 with the release of new datasets on Data.gov. Each major government agency has uploaded at least three datasets in this initial action. The National Archives released the 2007-2009 Code of Federal Regulations and two datasets from its Archival Research Catalog. This is the first time this material is available as raw data in XML format.

    The Archival Research Catalog, or ARC, is NARA's primary access system for archival description, representing 68% of NARA's entire holdings. This breaks down to the following:

      • 2,720,765 cubic feet
      • 520 record groups
      • 2,365 collections
      • 102,598 series
      • 3,265,988 file units
      • 292,887 items

    In addition, there are 6,354,765,793 logical data records and 465,050 artifacts described in ARC. NARA's decision to share this data is a breakthrough for archives and people who love data.

  • pybhl: Accessing the Biodiversity Heritage Library's Data Using OpenURL and Python

    Via Twitter, I heard about the Biodiversity Heritage Library's relatively new OpenURL Resolver, announced on their blog about a month ago. More specifically, I heard about Matt Yoder's new Ruby library, rubyBHL, which uses the BHL OpenURL Resolver to provide metadata about items in their holdings and does some additional screen scraping to return things like links to the OCRed version of the text. In typical fashion, I've ported Matt's library to Python and released my code. pybhl is available from my site, PyPI, and GitHub. Use should be fairly straightforward, as seen below:

        >>> import pybhl
        >>> import pprint
        >>> b = pybhl.BHLOpenURLRequest(genre='book', aulast='smith', aufirst='john', date='1900', spage='5', volume='4')
        >>> r = b.get_response()
        >>> len(r.data['citations'])
        3
        >>> pprint.pprint(r.data['citations'][1])
        {u'ATitle': u'',
         u'Authors': [u'Smith, John Donnell,'],
         u'Date': u'1895',
         u'EPage': u'',
         u'Edition': u'',
         u'Genre': u'Journal',
         u'Isbn': u'',
         u'Issn': u'',
         u'ItemUrl': u'http://www.biodiversitylibrary.org/item/15284',
         u'Language': u'Latin',
         u'Lccn': u'',
         u'Oclc': u'10330096',
         u'Pages': u'',
         u'PublicationFrequency': u'',
         u'PublisherName': u'H.N. Patterson,',
         u'PublisherPlace': u'Oquawkae [Ill.] :',
         u'SPage': u'Page 5',
         u'STitle': u'',
         u'Subjects': [u'Central America', u'Guatemala', u'Plants', u''],
         u'Title': u'Enumeratio plantarum Guatemalensium imprimis a H.