Code Snippets

Shortname: codesnippets

Snippet is a programming term for a small region of re-usable source code or text. Ordinarily, these are formally-defined operative units to incorporate into larger programming modules. Snippets are often used to clarify the meaning of an otherwise "cluttered" function, or to minimize the use of repeated code that is common to other functions.

Drupal

Accessing Drupal's XMLRPC/Services API from Python

Probably ripped off from somewhere on drupal.org...

import xmlrpclib
 
s = xmlrpclib.ServerProxy('http://mmatienzo/services/xmlrpc')
 
class DrupalNode:
    def __init__(self, title, body, path, ntype='page', uid=1, username='mmatienzo'):
        self.title = title
        self.body = body
        self.path = path
        self.type = ntype
        self.uid = uid
        self.nid = 67
        self.name = username
        self.promote = True
        self.taxonomy = {'3': '3'} #how do i create new taxonomy terms???
try:
    sessid, user = s.system.connect()
    n = DrupalNode('ZA WARUDO!', 'toki wo tomare', 'saworhhhjjjdss')
    s.node.save('roadroallerdawryyy', n)
 
except xmlrpclib.Fault, err:
    print "A fault occurred"
    print "Fault code: %d" % err.faultCode
    print "Fault string: %s" % err.faultString
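
For readers on Python 3, where xmlrpclib became xmlrpc.client, here is a minimal sketch of the same request. It only serialises the call, so no Drupal server is contacted. The field values and the node.save method name come from the snippet above; build_node and the placeholder session ID are hypothetical, and the two-argument call simply mirrors the original rather than any documented Services signature.

```python
import xmlrpc.client


def build_node(title, body, path, ntype="page", uid=1, username="mmatienzo"):
    """Return the node as a plain dict (hypothetical helper).

    XML-RPC sends this as a struct, much like the DrupalNode instance above.
    """
    return {
        "title": title,
        "body": body,
        "path": path,
        "type": ntype,
        "uid": uid,
        "name": username,
        "promote": True,
        "taxonomy": {"3": "3"},
    }


# Placeholder only; a real session ID would come from system.connect().
session_id = "SESSION-ID-PLACEHOLDER"
node = build_node("ZA WARUDO!", "toki wo tomare", "saworhhhjjjdss")

# Serialise the call without sending it, to inspect what goes over the wire.
request_xml = xmlrpc.client.dumps((session_id, node), methodname="node.save")
print(request_xml)
```

To actually send it, the same arguments would go through xmlrpc.client.ServerProxy, with xmlrpc.client.Fault caught in place of xmlrpclib.Fault.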

Autoassign default taxonomy terms based on content type and CCK fields

This works similarly to the Taxonomy Defaults module, but has no UI to set this up - this is for programmatic, backend stuff. It tests whether a node is of a particular content type and then checks the value of a given CCK field to determine the appropriate term.

<?php
function custom_taxonomy_defaults_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  if ($op == 'presave') {
    $taxonomy = $node->taxonomy;
    switch ($node->type) {
      case 'online_project_old':
        $vid = '15';
        $projtype = $node->field_external_resource_type[0]['value'];
        switch ($projtype) {
          case '1': # online exhibition
          case '2': # special project
            $taxonomy[$vid] = '902'; # special projects
            break;
          case '3': # index
            $taxonomy[$vid] = '901'; # research tools
            break;
          case '4': # ebooks/digital collection
            $taxonomy[$vid] = '895'; # collections
            break;
        }
        break;
    }
    if (isset($taxonomy)) {
      $node->taxonomy = $taxonomy;
    }
  }
}
?>

Programmatic CCK node creation from CSV files using node_save()

The long and the short of it, for Drupal 6 - use content_insert(), which should be fired anyway but isn't. See "node_save() with CCK fields" for more details. drupal_execute() is too robust for me - I just needed to import a SQL Server table with 4500+ rows quickly. Sample code follows, using parsecsv for PHP.

<?php
include_once('./includes/bootstrap.inc');
include_once('./includes/form.inc');
include_once('./modules/node/content_types.inc');
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
bootstrap_invoke_all('init');
ini_set('memory_limit', '512M');
user_authenticate('user','password');
require_once('/home/mmatienzo/parsecsv.lib.php');
$csv = new parseCSV();
$csv->auto('/home/mmatienzo/manu_collections.uniq.csv');
module_load_include('inc', 'node', 'node.pages');
 
$i = 0;
//print_r($csv->data);
foreach ($csv->data as $row) {
    $node = new stdClass();
    #$node = array('type'=>'amatimport');
    $form_state = array();
    $node->type = 'amatimport';
    $node->title = $row['ManuscriptTitle'];
    //$node->name = 'mmatienzo';
    $node->language = 'en';
    //$node->uid = '1';
    $node->field_arms_amat_id[0]['value'] = $row['Id'];
    $node->field_arms_findingaids[0]['value'] = $row['FindingAids'];
    $node->field_arms_amat_printfindingaid[0]['value'] = $row['PrintFindingAid'];
    $node->field_arms_catnyp_old[0]['value'] = $row['CatnypLink'];
    $node->field_arms_amat_project_link[0]['value'] = $row['ProjectLink'];
    $node->field_arms_amat_seealso[0]['value'] = $row['SeeAlso'];
    $node->field_arms_closed[0]['value'] = $row['Closed']; // CCK values are keyed by delta
    $node->field_arms_amat_location[0]['value'] = $row['Location'];
    $node->field_arms_sgmcatnyp[0]['value'] = $row['SgmCatnyp'];
    $node->field_arms_sgmead[0]['value'] = $row['SgmEad'];
    $node->field_arms_seeref[0]['value'] = $row['SeeRef'];
    $node->field_arms_creator[0]['value'] = $row['ManuscriptAuthor'];
    $node->field_arms_mssdbid[0]['value'] = $row['MSS_ID'];
    $node->op = t('Save');
    // Compare rather than assign, so rows other than 'TRUE' get 0.
    if ($row['Wilson'] == 'TRUE') {
        $wilson = 1;
    } else {
        $wilson = 0;
    }
    $node->field_arms_amat_wilson[0]['value'] = $wilson;
    content_presave($node);
    node_save($node);
    content_insert($node);
    $i++;
    print $i . '<br/>';
    #break;
}
?>

Programmatic file migration for existing nodes over HTTP using CCK filefield and nodeapi

This is similar to the CCK migration snippet I posted before. I found a peculiarity when dealing with filefield and nodeapi - for some reason, the filefield wasn't actually updated until I ran a separate node_save()...

<?php
function archivalcollection_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  if ($op == 'presave') {
    switch ($node->type) {
      case 'archivalcollection':
        if ((empty($node->field_arms_pdffile[0]['fid']) || $node->field_arms_pdffile[0]['fid'] == 0) 
          && !empty($node->field_arms_amat_printfindingaid[0]['value'])) {
          $pdfurl = $node->field_arms_amat_printfindingaid[0]['value'];
          $pdfpath = file_directory_path() .'/archivalcollections/pdf';
          $tmppath = file_directory_temp();
          $tmpfile = $tmppath.'/'.basename($pdfurl);
          if (!$pdfdata = file_get_contents($pdfurl)) {
            watchdog('AMAT Migration', "nid $node->nid - Could not read $pdfurl", NULL, WATCHDOG_ERROR);
          } elseif (!$tmppdf = file_save_data($pdfdata, $tmpfile)) {
            watchdog('AMAT Migration', "nid $node->nid - Could not write to $tmpfile", NULL, WATCHDOG_ERROR);
          } elseif (!$file = field_file_save_file($tmpfile, array(), $pdfpath)) {
            watchdog('AMAT Migration', "nid $node->nid - Could not create file object for file $tmpfile", NULL, WATCHDOG_ERROR);
          } else {
            $fc = $file;
            $fid = $fc['fid'];
            $file = field_file_save($node, $file);
            $node->field_arms_pdffile = array( 0 =>
              array(
                'fid' => $fc['fid'],
                'title' => basename($fc['filename']),
                'filename' => $fc['filename'],
                'filepath' => $fc['filepath'],
                'filesize' => $fc['filesize'],
                'mimetype' => $fc['filemime'],
                'data' => array('description' => ''),
                'list' => 1,
              ),
            );
            node_save($node);
          }
        }
        // }
        break;
    }
  }
}

Shrew CCK computed field

A sample chunk of CCK computed field code for use with my Shrew module for Drupal. Shrew is a PHP library for interacting with Innovative Interfaces online library catalog systems. You can see how this code snippet evolved over time in the revisions.

<?php
if (!$node->nid) {
  node_save($node);
}
$delta = 0;
$node_field = array();
if ($bnum = $node->field_iii_single[0]['value']) {
  $record = shrew_record_get($bnum);
  if ($fields = shrew_field_get($record[$bnum]->bibliographicRecord, '520')) {
    foreach ($fields as $field) {
      $node_field[$delta]['value'] = $field;
      $delta++;
    }
  }
  else $node_field[0]['value'] = NULL;
}

UUID Computed Field for CCK

This code snippet autopopulates a Computed Field CCK field for Drupal using the functionality provided by the UUID module. Note that this is for the 6.x-1.x-dev version of the module and requires the patch attached to this comment. It assumes that you only want to generate UUIDs once. It probably should be abstracted into another CCK module, but I'm too lazy to do that at the moment.

<?php
if (empty($node_field[0]['value'])) {
  $node_field[0]['value'] = uuid_uuid();
}

Auto-Delete Posts patch for WordPress 2.3

Note: This patch is now deprecated as a new version of the plugin has been released.

Because of changes in the database schema for WordPress 2.3, the Auto Delete Posts plugin does not work. I've patched it to work, but note that this patch makes the plugin only work with WordPress 2.3.

Attachment: autodeleteposts.patch (4.01 KB)

MARC21 to CSV in Python

A coworker at MFPOW wanted me to generate a list of records matching certain diverse criteria from our Horizon database. I wrote the following SQL query to get the data out.

SELECT DISTINCT item.bib#
FROM item, bib WHERE bib.bib# = item.bib#
AND item.collection NOT IN ('oh', 'icos', 'mi')
AND item.location = 'icos'
AND (bib.text LIKE '%audio%' OR bib.text LIKE '%video%'
OR bib.text LIKE '%cassette%' OR bib.text LIKE '%tape%'
OR bib.text LIKE '%recording%' OR bib.text LIKE '%film%')

I then slapped together this nasty little bit of Python to get it into Excel for easy viewing and formatting.

from pymarc import MARCReader, marc8_to_unicode
import csv, sys
 
r = MARCReader(file(sys.argv[1]))
w = open(sys.argv[2], 'wt')
 
try:
        writer = csv.writer(w, lineterminator='\n')
        writer.writerow(('Bib #','Main Entry','Title'))
        for record in r:
                try: creator = marc8_to_unicode(record.author())
                except: creator = u' '
                creator = creator.encode('iso-8859-1', 'ignore')
                title = marc8_to_unicode(record['245'].formatField())
                title = title.encode('iso-8859-1', 'ignore')
                writer.writerow((record['998']['b'], creator, title))
 
finally: w.close()
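
The script above is Python 2. As a rough sketch of how the CSV-writing half might look in Python 3, the manual encode() calls can move into the file object, which does the ISO-8859-1 conversion and drops unencodable characters itself. The write_rows function and the plain tuples it takes are hypothetical stand-ins for the values pulled out of each pymarc record; the MARC parsing is left out here.

```python
import csv


def write_rows(path, rows):
    """Write (bib number, creator, title) tuples as a Latin-1 CSV file.

    errors="ignore" drops characters that ISO-8859-1 cannot represent,
    matching the .encode('iso-8859-1', 'ignore') calls in the original.
    """
    with open(path, "w", encoding="iso-8859-1", errors="ignore", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(("Bib #", "Main Entry", "Title"))
        for bib_number, creator, title in rows:
            # Fall back to a single space when there is no main entry,
            # as the original does when record.author() fails.
            writer.writerow((bib_number, creator or " ", title))
```

The with block replaces the try/finally pair, since it closes the file even if a row fails to write.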

mail2rss.pl

mail2rss.pl is a rewritten version of another script written by Nick Gerakines.

His script was a good starting block, but Feedvalidator.org noted that the RSS it produced was invalid. I believed that the areas in which it failed were rather important and needed to be fixed so we could adhere to standards.

Disclaimer

I am not responsible for any ill effects on the privacy or security of you or any system on which you run this script. Don't place the RSS file in a publicly accessible web directory without realizing the implications of doing so.

Modifications

Nick's version of mail2rss.pl created four major issues for the feed validator. These were:

  1. A missing version attribute in the <rss> tag. This attribute is required for a feed to be valid, and changing the code to reflect this was very basic.

  2. An invalid <guid> tag. The GUID must be a full URL unless the isPermaLink attribute is set to false. This was also an easy fix. I also changed the GUID to be an MD5 hash of the sender and the time().

  3. An invalid URI in the <link> tag. The original specified the <link> to be "1", and that just didn't sit right with me. The link needed to begin with an IANA-registered URI scheme, so I sifted through them for a while until I decided upon the mid: URI scheme for message IDs. I had to add another variable for procmail to pull the mid and a corresponding flag for the script. The mid had to be scrubbed for spaces and other extraneous characters before mid: was prepended. I have to admit that this approach is a bit kludgy, given that it produces a link to the message referenced by the mid: URI. These links won't be handled properly by most RSS readers, and I'd dare say that they won't be handled properly by any. Nonetheless, it's a valid reference to the message in question.

  4. An invalid <pubDate>, i.e. not in RFC 822 format. The original script just made a call to time, which returns the current epoch time. After a quick Google search I came across this code snippet, which processes the call to time() to output an RFC 822-compliant time.

In addition, I cleaned up the code here and there, but nothing worth mentioning overall. If you're really curious, run diff on the two scripts.
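
The script itself is Perl, but fixes 2 to 4 above can be sketched in Python 3 for illustration. This is not the code from mail2rss.pl: the function names are hypothetical, the exact characters scrubbed from the message ID are an assumption (whitespace and angle brackets), and the item layout is only a guess at a plausible shape.

```python
import hashlib
import re
from email.utils import formatdate
from xml.sax.saxutils import escape


def make_guid(sender, timestamp):
    """Return an MD5 hex digest of the sender and the epoch time."""
    return hashlib.md5(f"{sender}{timestamp}".encode("utf-8")).hexdigest()


def make_mid_link(message_id):
    """Scrub whitespace and angle brackets (assumed to be the extraneous
    characters), then prepend the mid: scheme."""
    return "mid:" + re.sub(r"[\s<>]", "", message_id)


def make_pub_date(timestamp):
    """Format epoch seconds as an RFC 822 style date in GMT."""
    return formatdate(timestamp, usegmt=True)


def make_item(sender, subject, message_id, timestamp):
    """Assemble one <item>; the element layout is illustrative only."""
    return (
        "<item>"
        f"<title>{escape(subject)}</title>"
        f"<link>{escape(make_mid_link(message_id))}</link>"
        f'<guid isPermaLink="false">{make_guid(sender, timestamp)}</guid>'
        f"<pubDate>{make_pub_date(timestamp)}</pubDate>"
        "</item>"
    )
```

Because the hash is not a URL, the guid carries isPermaLink="false". Fix 1, the version attribute, belongs on the enclosing <rss> element and is not shown here.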

Usage

Copy the Perl script and procmail script to the appropriate server. The example procmail script calls bmf (Bayesian Mail Filter) first, which helps to weed out the spam. After moving identified spam to the appropriate mailbox, it reads data from the message and calls the Perl script.

Files