ohilist.py

Processing MARC into HTML

Mark A. Matienzo

What is ohilist.py?

Python script that creates a static HTML list of the oral history interviews held by the Niels Bohr Library (NBL), generated from MARC data

http://aip.org/history/oral_history/ohilist.html

Part of a group of three Python scripts used to convert MARC data into HTML for different purposes
Run by professional archives staff every few months to generate a new list

Why Python?

Straightforward syntax, even for nonprogrammers
Old scripts used a number of languages (Perl, XSLT and Java for transforms, Unix shell, Windows batch)
pymarc

http://pypi.python.org/pypi/pymarc/
Does heavy lifting for all three scripts
Often faster than Perl's MARC modules
Active (but small) development community
I contributed code to its development
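The "heavy lifting" pymarc does is mostly decoding the MARC transmission format (ISO 2709): a 24-byte leader, a directory of 12-byte entries (tag, length, start offset), then the field data. The sketch below is not pymarc's code; it is a hand-written illustration of that layout. It assumes UTF-8 text (MARC-8 data would need a converter such as pymarc's marc8_to_unicode), and build_record is a hypothetical helper included only so the parser can be exercised.

```python
FT = b'\x1e'  # field terminator
SD = b'\x1f'  # subfield delimiter
RT = b'\x1d'  # record terminator


def parse_record(raw):
    """Parse one ISO 2709 record (bytes) into a list of (tag, value) pairs.

    Control fields (001-009) yield their text; data fields yield
    (indicators, [(code, text), ...]). Text is assumed to be UTF-8.
    """
    base = int(raw[12:17])                  # leader 12-16: base address of data
    directory = raw[24:raw.index(FT)]       # 12-byte entries up to the first FT
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3].decode('ascii')
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        data = raw[base + start:base + start + length - 1]  # drop trailing FT
        if tag < '010':
            fields.append((tag, data.decode('utf-8')))
        else:
            indicators = data[:2].decode('ascii')
            subfields = [(s[:1].decode('ascii'), s[1:].decode('utf-8'))
                         for s in data[2:].split(SD)[1:]]
            fields.append((tag, (indicators, subfields)))
    return fields


def build_record(fields):
    """Hypothetical test helper: assemble (tag, body bytes) pairs into a record."""
    directory, data = b'', b''
    for tag, body in fields:
        body += FT
        directory += b'%s%04d%05d' % (tag.encode('ascii'), len(body), len(data))
        data += body
    directory += FT
    base = 24 + len(directory)
    length = base + len(data) + 1
    leader = b'%05dnam a22%05d   4500' % (length, base)
    return leader + directory + data + RT
```

A library that hides the offsets, terminators and character-set handling is what lets the three scripts stay short.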

Using ohilist.py

Create full dump of MARC data from Horizon, using specific export target
From command line: python ohilist.py [marcfile]
Upload HTML file to AIP webserver

Script architecture

Consists of three files:

ohilist.py: script itself
ohitemplate.py: template for HTML
aipmarc.py: AIP extensions for pymarc

Template used to separate the layout from the rest of the code
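The slides do not show what ohitemplate.py contains. As a minimal sketch, assuming the layout is a single string with placeholders (here Python's string.Template, with hypothetical placeholder names title and body), the script only has to fill in the generated fragments:

```python
from string import Template

# Hypothetical stand-in for ohitemplate.py: all layout lives in one string,
# and the script supplies only the generated content.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<a name="top"></a>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_page(title, listbody):
    """Join the generated HTML fragments and substitute them into the layout."""
    return PAGE.substitute(title=title, body=''.join(listbody))
```

Keeping the markup in its own module means a layout change never touches the MARC-processing code.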

How it works (1) - the Main Loop

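import sys
from pymarc import MARCReader, marc8_to_unicode

reader = MARCReader(open(sys.argv[1], 'rb'))
interviews = []     # (url, label) pairs for the HTML list
recordcounter = 0   # number of oral history records found

for record in reader:
  # Keep only records whose local 998 $c collection code is 'oh'
  field998 = record['998']
  if field998 is not None and field998['c'] == 'oh':
    catdb = getCatdb(record)   # AIP-specific helpers, not shown
    bibno = getBibno(record)
    url = 'http://www.aip.org/history/catalog/%s/%s.html' % (catdb, bibno)
    # The main entry (author) is the interviewee; convert MARC-8 text to Unicode
    interviewee = marc8_to_unicode(record.author())
    interviewdate = '(Interview date: %s)' % getDate(record)
    label = ' '.join([interviewee, interviewdate])
    interviews.append((url, label))
    recordcounter += 1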
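The filter-and-label step of the loop can be sketched as a pure function, which makes the (url, label) tuples consumed by the sorting step easy to check. This is an illustration, not the script's code: the record is a simplified stand-in for a pymarc Record (a dict of tag to subfield dict), the interviewee is assumed to be in 100 $a, and catdb and bibno are passed in because getCatdb and getBibno are not shown.

```python
CATALOG_URL = 'http://www.aip.org/history/catalog/%s/%s.html'


def interview_entry(record, catdb, bibno, date):
    """Return a (url, label) pair for an oral history record, else None.

    `record` is a simplified stand-in for a pymarc Record: a dict mapping
    each tag to a dict of {subfield code: value}.
    """
    # Same test as the main loop: local 998 $c must be 'oh'
    if record.get('998', {}).get('c') != 'oh':
        return None
    url = CATALOG_URL % (catdb, bibno)
    # Assumption: the interviewee is the personal-name main entry, 100 $a
    label = '%s (Interview date: %s)' % (record['100']['a'], date)
    return (url, label)
```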

How it works (2) - Getting the Date

import sys

def getDate(record):
  """Return a display date for an interview record, or None if none is found."""
  # 1. Prefer the inclusive ($f) and bulk ($g) dates in the 245 title field
  datelist = [d for d in (record['245']['f'], record['245']['g']) if d]
  if datelist:
    return ' '.join(datelist)
  # 2. Fall back to 260 $c (date of publication, distribution, etc.)
  if record['260'] is not None and record['260']['c']:
    return record['260']['c']
  # 3. Fall back to 008 Date 1 (positions 07-10) and Date 2 (positions 11-14)
  fixed = record['008'].value() if record['008'] is not None else ''
  if fixed[7:11].isdigit():
    datelist = [fixed[7:11]]
    if fixed[11:15].isdigit():
      datelist.append(fixed[11:15])
    return '-'.join(datelist)
  # 4. Nothing usable: report the record on stderr
  bibno = getBibno(record)
  if bibno is not None:
    sys.stderr.write('Could not derive date for bib number %s\n' % bibno)
  else:
    sys.stderr.write('No date or bib number in: %s\n' % record['245'].format_field())
  return None
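The 008 fallback depends only on fixed character positions, so it can be isolated and checked against sample fixed-field strings. In MARC 21, Date 1 occupies positions 07-10 and Date 2 positions 11-14, which Python slices as [7:11] and [11:15]. A minimal sketch (fixed_field_dates is a hypothetical name):

```python
def fixed_field_dates(field008):
    """Return 'Date1' or 'Date1-Date2' from a MARC 21 008 string, else None."""
    date1, date2 = field008[7:11], field008[11:15]
    if not date1.isdigit():
        return None          # blank or partly unknown, e.g. 'uuuu' or '19uu'
    if date2.isdigit():
        return '%s-%s' % (date1, date2)
    return date1
```

Requiring every character to be a digit means partially known dates such as '19uu' are skipped rather than shown to users.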

How it works (3) - Sorting/Index

import string

letters = string.ascii_uppercase   # assumed: index headings are A-Z
ohiindex = {}                      # initial letter -> list of HTML links
listbody = []                      # HTML fragments for the page body

# Sort interviews case-insensitively by label (interviewee name first)
interviews.sort(key=lambda interview: interview[1].upper())

# File each link under the initial letter of its label
for url, label in interviews:
  initial = label.upper()[0]
  if initial in letters:
    linkdata = '%s<br/>\n' % makeLink(url, label)
    addToIndex(ohiindex, initial, linkdata)

# Build the A-Z shortcut bar, then one section per letter
ohikeys = sorted(ohiindex)
shortcutlist = [makeLink('#' + key, key) for key in ohikeys]
listbody.append(' '.join(shortcutlist))
for key in ohikeys:
  listbody.append('<h2><a name="%s">%s</a></h2>\n' % (key, key))
  listbody.extend(ohiindex[key])
  listbody.append('<br/><a href="#top">Back to Top</a>\n')
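The helpers makeLink and addToIndex are not shown on the slides. A minimal sketch of what they might look like, assuming makeLink returns an HTML anchor and addToIndex appends to a per-letter list (both bodies are hypothetical), is:

```python
from html import escape


def makeLink(url, text):
    """Hypothetical helper: return an HTML anchor with the URL and text escaped."""
    return '<a href="%s">%s</a>' % (escape(url), escape(text))


def addToIndex(index, key, value):
    """Hypothetical helper: append value to the list stored under key."""
    index.setdefault(key, []).append(value)
```

Because the interviews are sorted before indexing, appending in order keeps each letter's links alphabetical, and escaping guards against names containing characters such as "&".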

Evaluation/Questions

Good, but not perfect
Would be better if it didn't need to be run manually
Nonetheless, best that we can do with what we have
E-mail: mark @ matienzo.org
