planning for disaster
how to set up unmaintainable indexing workflows
mark a. matienzo
center for history of physics
american institute of physics
background
- disclaimer: opinions expressed are mine alone and don't reflect those of CHP or AIP
- CHP has an archives, but we have little space
- one of our main responsibilities is coordinating placement of collections
- even though we don't keep the collections, we maintain metadata
- ICOS: MARC data served through Horizon
- PHFAWS: EAD/HTML/PDF finding aids
background: PHFAWS
- http://aip.org/history/ead/
- goal: create searchable index of finding aids for collections in history of physics, etc.
- contains both finding aids for CHP-held collections and ones held by other archives
- originates in early collaborative EAD project (ca. 1999)
- we still host a few finding aids for other archives
background: indexing
- we were pushed to use Verity (AIP had lock-in already)
- in 4 years of use, it never really worked well
- 2 or 3 months ago we began to rethink the process
- BUT we still had to use Verity, and we couldn't run the indexer ourselves
implementation (2)
- we were told that Verity could parse this file
- in reality, it was used to create (via XSLT) a bash script to call indexer
- we still couldn't run indexer ourselves
- "solution": someone created a CGI Perl script where we could "just click a button" to run bash script
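
a minimal sketch of the "generate a script from an XML file" step, written in Python rather than the XSLT actually used. the manifest layout, the `file`/`path` names, and the `run-indexer` command are all hypothetical stand-ins; the slides do not show the real file or the real indexer invocation.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical manifest: the real file format is not shown in the slides.
MANIFEST = """<findingaids>
  <file path="ead/bohr.xml"/>
  <file path="ead/o'brien papers.xml"/>
</findingaids>"""


def build_script(manifest_xml, indexer_cmd="run-indexer"):
    """Return bash script text with one indexer call per listed file.

    `indexer_cmd` is a placeholder name, not the actual Verity command.
    """
    root = ET.fromstring(manifest_xml)
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failing call
    for node in root.iter("file"):
        path = node.get("path")
        if path:
            # Quote each argument so spaces or quotes in a path stay intact.
            lines.append(f"{shlex.quote(indexer_cmd)} {shlex.quote(path)}")
    return "\n".join(lines) + "\n"


script = build_script(MANIFEST)
print(script)
```

the generated script is only as current as the manifest it was built from, which is where the trouble starts.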
end product?
credit: this old house home inspection nightmares
problems
- transforms are done manually
- if we don't remember to run transform, then indexer won't add new data
- we still can't debug indexer problems ourselves
- CGI page not behind firewall (potential DoS vulnerability)
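
one way the "forgot to run the transform" problem could be caught, sketched under assumptions: sources and transformed output live in two flat directories, and an output file shares its source's base name. the directory layout, extensions, and function name are hypothetical and not part of the setup described here.

```python
import os
import tempfile
from pathlib import Path


def stale_sources(src_dir, out_dir, src_ext=".xml", out_ext=".html"):
    """List source files whose transformed output is missing or older."""
    stale = []
    for src in sorted(Path(src_dir).glob(f"*{src_ext}")):
        out = Path(out_dir) / (src.stem + out_ext)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src.name)
    return stale


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src, out = Path(tmp, "src"), Path(tmp, "out")
    src.mkdir()
    out.mkdir()
    (src / "a.xml").write_text("<ead/>")
    (src / "b.xml").write_text("<ead/>")   # never transformed
    (out / "a.html").write_text("<html/>")
    os.utime(src / "a.xml", (1000, 1000))  # source older than its output
    os.utime(out / "a.html", (2000, 2000))
    demo_result = stale_sources(src, out)

print(demo_result)
```

a check like this only reports what needs transforming; it does nothing about not being able to run or debug the indexer.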