Is Open Data the Point?

I've been thinking about the biblioblogosphere's reaction to Casey Bisson's decision to use the $50,000 he was awarded by the Mellon Foundation for his work on WPopac to purchase LC bibliographic data and open it up to anyone who wanted to take a crack at it. Yes, this is a "Good Thing," and valuable to the library community as a whole, but I feel like there are some things we're overlooking. Dan Chudnov and I seem to agree, but I'm not going to go so far as to damn those who herald this as a "new era." It's a little premature to say where it will go, but I have to admit that I'm occasionally confused and often a little bit insulted by some of the talk surrounding this issue.

I wonder how interesting all the bibliographic data of LC is to begin with. What's in the dump paid for by the Mellon Award money? I'd guess monographs and serials, and probably audiovisual materials. What about archival records? What would anyone do with those? Those won't be interesting to the small libraries that could benefit the most from this altruistic move. In fact, I believe that the biggest problem, other than keeping the records up to date as they change, will be separating the wheat from the chaff, which is ultimately an institutional (departmental, consortial, individual ...) decision. I'd love a dump of all the archival records, but I don't know what I'd do with them all; for the time being, it's much easier for me to wade through them using their OPAC when I do institutional surveys.

Dan already emphasized that much of the discussion ignores existing collaborative workflows and that catalogers around the world are busting their humps to create this data. Listening to the Talis podcast with Tim Spalding of LibraryThing and Ross Singer (among others), I was a little surprised that Tim said "Librarians are very restricted in terms of what they can do with [bibliographic data]." I can really respect that people want to experiment with bibliographic data to improve access to information or just for the fun of hacking through it, but the assumption that librarians are very restricted in what they do with it seems a little misguided. I guess some are, but my gut reaction to this comment is something that I ended up leading a discussion about at Library Camp East last September: the divide between "techies" and "librarians," something that's often a false dichotomy yet often very real. I get this feeling often when I read threads on the NGC4LIB list, too. Maybe it's not quite the same divide, but I feel that those who want to play with our catalog data aren't talking to the catalogers. I realize some are expressing fear, uncertainty and doubt that anyone can mark up the sacred cow of an OPAC, but for every one of them I feel like there are several of us who are willing to play along and even help do the work. In nearly any situation, I'd be glad to provide a dump of a catalog to whoever wanted it as long as they gave me at least a vague idea of what they wanted to do with it.



  • 💬 Ross at December 29, 2006, 22:58 UTC:

    I actually don't dispute what you're saying here. And, yes, there is a disconnect between the techies and the catalogers, but I'm not sure how opening up the data interferes with catalogers' workflows (neither you nor Dan has really made that clear).

    The problem with your last statement (and this was my point and, really, the only reason I was on the podcast at all, as far as I can tell) is, are you sure you can give anybody a dump of your catalog? Are you sure that you actually own your data?

    I don't really care much about the LC data. I don't really care much about your data. But somebody might. I am, however, interested in data from my own catalog, Emory's catalog (we have a joint degree program) and the public libraries that serve our communities. But why should I have to explain to the myriad public libraries what my intentions are?

  • 💬 Clay at January 5, 2007, 07:44 UTC:

    It seems as if the NUCMC records will no longer be available in the distributions, as well:

    That's a really rich source of records for archivists to lose, especially for a place like AIP and its ICOS. It's a shame, because not all small archives can afford any of the OCLC/RLG services that will pick up the slack.