
pybhl: Accessing the Biodiversity Heritage Library's Data Using OpenURL and Python

Via Twitter, I heard about the Biodiversity Heritage Library's relatively new OpenURL Resolver, announced on their blog about a month ago. More specifically, I heard about Matt Yoder's new Ruby library, rubyBHL, which uses the BHL OpenURL Resolver to retrieve metadata about items in their holdings and does some additional screen scraping to return things like links to the OCRed version of the text.

In typical fashion, I've ported Matt's library to Python and released the code. pybhl is available from my site, PyPI, and GitHub. Usage should be fairly straightforward, as seen below:

>>> import pybhl
>>> import pprint
>>> b = pybhl.BHLOpenURLRequest(genre='book',
...     aulast='smith', aufirst='john', date='1900',
...     spage='5', volume='4')
>>> r = b.get_response()
>>> len(r.data['citations'])
3
>>> pprint.pprint(r.data['citations'][1])
{u'ATitle': u'',
 u'Authors': [u'Smith, John Donnell,'],
 u'Date': u'1895',
 u'EPage': u'',
 u'Edition': u'',
 u'Genre': u'Journal',
 u'Isbn': u'',
 u'Issn': u'',
 u'ItemUrl': u'http://www.biodiversitylibrary.org/item/15284',
 u'Language': u'Latin',
 u'Lccn': u'',
 u'Oclc': u'10330096',
 u'Pages': u'',
 u'PublicationFrequency': u'',
 u'PublisherName': u'H.N. Patterson,',
 u'PublisherPlace': u'Oquawkae [Ill.] :',
 u'SPage': u'Page 5',
 u'STitle': u'',
 u'Subjects': [u'Central America', u'Guatemala', u'Plants', u''],
 u'Title': u'Enumeratio plantarum Guatemalensium imprimis a H. de Tuerckheim collectarum /quas edidit John Donnell Smith.',
 u'TitleUrl': u'http://www.biodiversitylibrary.org/bibliography/827',
 u'Url': u'http://www.biodiversitylibrary.org/page/707932',
 u'Volume': u'4'}
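
Since each citation comes back as a plain dictionary, it is easy to boil the results down to the fields you care about. The following is only a rough sketch: `summarize_citations` is a hypothetical helper of my own and not part of pybhl, and the sample record is a trimmed copy of the output above rather than a live response. It assumes every citation has the same keys as the one shown, and it is written so that it also runs on Python 3.

```python
def summarize_citations(citations):
    """Return a compact summary dict for each citation.

    Hypothetical helper, not part of pybhl. Assumes each citation is a
    dict shaped like the records in r.data['citations'] shown above.
    """
    summaries = []
    for citation in citations:
        summaries.append({
            'title': citation.get('Title', ''),
            # Author strings in the sample carry a trailing comma; strip it.
            'authors': [a.rstrip(',') for a in citation.get('Authors', []) if a],
            'year': citation.get('Date', ''),
            # The sample's subject list ends with an empty string; drop empties.
            'subjects': [s for s in citation.get('Subjects', []) if s],
            'page_url': citation.get('Url', ''),
        })
    return summaries


# Trimmed copy of the citation printed above, used as stand-in data.
sample_citations = [{
    'Authors': ['Smith, John Donnell,'],
    'Date': '1895',
    'Subjects': ['Central America', 'Guatemala', 'Plants', ''],
    'Title': 'Enumeratio plantarum Guatemalensium imprimis a '
             'H. de Tuerckheim collectarum /quas edidit John Donnell Smith.',
    'Url': 'http://www.biodiversitylibrary.org/page/707932',
}]

summaries = summarize_citations(sample_citations)
for summary in summaries:
    print('%s (%s)' % (summary['title'], summary['year']))
    print('  ' + summary['page_url'])
```

With a real response you would pass `r.data['citations']` in place of `sample_citations`.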

Let me know if you find it useful - I'd appreciate any feedback!
