Friday, August 6, 2010

Python titbits

I was generating sql data load scripts from excel documents using python. Here’s the code snippet
import csv

with open('cases.csv', 'rU') as f:
# r = csv.DictReader(f, dialect=csv.excel)
r = csv.reader(f, dialect=csv.excel)
cases = [row[:-3] for row in r]

Few interesting aspects in the code above

  1. I am using python’s with statement to automatically close the csv file after use.

  2. I am opening the file in ‘rU’ mode - i.e. readonly and Universal mode. ‘U’ is critical for csv module to parse excel generated csv’s. Here’s the gory details

  3. Use csv.DictReader if you want data in a dictionary keyed by the column names. DictReader exposes a
    convenient property ‘fieldnames’ that has the column names if your csv has a header row.


  4. Notice the use of python’s list comprehension to populate the cases array. Also, I am chopping
    off the last 3 columns (which was empty in the data file) using python’s slice operator.



For better clarity, I changed the code to
import csv
import collections

Case = collections.namedtuple('Case',"program, title, meeting_time, meeting_date")
with open('foo.csv', 'rU') as f:
# r = csv.DictReader(f, dialect=csv.excel)
r = csv.reader(f, dialect=csv.excel)
cases = [Case(*row[:-3]) for row in r]

namedtuple is similar to ruby’s Struct class.
This is nicely encapsulated now. The rest of the code need not worry about which column
the data was in the original data file. Also note the use * in Case(*row[:-3]). I am using * to pass the parsed array as arguments to Case. In clojure, you would use apply

Now consumers of cases can use it like so
qry = '''INSERT INTO cases (program, meeting_time, meeting_date) VALUES ('%s','%s','%s');'''
for c in cases:
print qry%(c.program, c.meeting_time, c.meeting_date)

Before I wrap up, one last thing - if you want to output this to a file instead of stdout
with open('cases.sql', 'w') as f:
qry = '''INSERT INTO cases (program, meeting_time, meeting_date) VALUES ('%s','%s','%s');'''
for c in cases:
print >>f, qry%(c.program, c.meeting_time, c.meeting_date)

Notice how I am redirecting print output to the file handle ‘f’. Here’s a blog article about it.

No comments: