Lightening the load: Drupal and Python
Man, if this isn't a "you got your peanut butter in my chocolate" thing, I don't know what is! As I wrote over on the NYPL Labs blog, we've been up to our necks in Drupal at MPOW, and I've found that one of its great advantages is rapid prototyping without having to write a whole lot of code. That's how I feel about Python, too, but you knew that already.
Once you've got a prototype built, how do you start piping stuff into it? In Drupal 6, a lot of the contrib modules for this need work - most notably, I'm thinking about node_import, which still has no (official) CCK support for Drupal 6 and CCK 2. In addition, you could be stuck with having to write PHP code for the heavy lifting, but where's the joy in that?
Well, it so happens that the glue becomes the solvent in this slow, slow dance. Using Python becomes a breeze because of its batteries-included model. I've been playing around with the Services module and its XMLRPC server a bit, and given that xmlrpclib was added in Python 2.2, there's pretty much no excuse not to use it. Say what you will about RESTful interfaces, but of the available options, Services' XMLRPC server is the most robust, with the possible exception of AMFPHP.
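To give a flavor of how little code this takes, here's a minimal sketch rather than a finished importer. It uses xmlrpc.client, which is what xmlrpclib is called in Python 3. The endpoint URL, the node fields, and the assumption that the site exposes a node.save method without requiring API keys or a session ID are all placeholders of mine; check what your own Services setup actually requires.

```python
import xmlrpc.client  # Python 3 name for the old xmlrpclib module

# Hypothetical endpoint: the real path depends on how Services is configured.
ENDPOINT = "http://example.com/services/xmlrpc"


def build_node(title, body, node_type="story"):
    """Return a dict shaped like a minimal Drupal node (assumed field names)."""
    return {"type": node_type, "title": title, "body": body}


def save_node(endpoint, node):
    """Send the node to an assumed node.save method on the XML-RPC server.

    Not called below, since it needs a live Drupal site to talk to.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.node.save(node)


# Serialize the call locally to see what would go over the wire.
node = build_node("Hello from Python", "Imported via XML-RPC")
payload = xmlrpc.client.dumps((node,), methodname="node.save")
print(payload)
```

Printing the payload is a handy way to sanity-check the request before pointing save_node at a real site.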
Lately, I've been tinkering with it to figure out how to ingest metadata into Drupal that's stored either in other extremely complex databases or just as hunks of XML on a file system. I've been using lxml because of its XPath support, but given the fact that a lot of this XML data is remarkably dirty, I'll probably take some time to look at BeautifulSoup's BeautifulStoneSoup parser. However, this will take some work as some of this data will need explicit handling by that parser (nestable tags and the like).
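For the "hunks of XML on a file system" case, the extraction step might look something like the sketch below. The record and its element names are invented for illustration, and the code sticks to the simple path expressions that both lxml and the standard library's ElementTree understand, so it falls back to the latter if lxml isn't installed. Full XPath via lxml's xpath() method and any handling of dirty input are left out.

```python
try:
    from lxml import etree  # preferred: the library I've been using
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API

# Made-up record; real metadata will have its own element names.
RECORD = (
    "<record>"
    "<title>Map of Manhattan</title>"
    "<subject>Maps</subject>"
    "<subject>New York (N.Y.)</subject>"
    "</record>"
)


def record_to_fields(xml_text):
    """Pull a title and a list of subjects out of one XML record."""
    root = etree.fromstring(xml_text)
    return {
        # findtext returns the text of the first matching child element
        "title": root.findtext("title"),
        # findall returns every matching child element, in document order
        "subjects": [el.text for el in root.findall("subject")],
    }


fields = record_to_fields(RECORD)
print(fields)
```

A dict like this is the sort of thing you'd then reshape into whatever node structure the Drupal side expects.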
It's also not completely unheard of to mix up Drupal and Python into a tasty delight. Migraine (unfortunately somewhat aptly named) is a dev-to-production migration tool written in Python. DevelopmentSeed, the folks behind Managing News, wrote about using Python to lighten the load on Drupal when performing semantic analysis, and Jeff Miccolis (the post's author) threw together a BoF session at Drupalcon Boston 2008 on Python/Drupal integration. Then there's the Drupy project, which, unfortunately, still sounds completely nutty to me. Someone else hacked a module together to work with a Python daemon handling some workflow-related business. And last but not least there's PyBridge, a seemingly abandoned project to create page-generating modules for Drupal in Python.
It's a relief to know I'm not alone in my good taste, finally...
Nice article! Have you already had a look at ElementSoup (http://codespeak.net/lxml/e...), lxml's robust HTML parser?
Not yet, but thanks for the heads up. Does ElementSoup have the same sort of XML/SGML support as BeautifulStoneSoup?