[Baypiggies] my slides

Shannon -jj Behrens jjinux at gmail.com
Fri Mar 27 22:49:03 CET 2009


Here are my elaborate "slides" from yesterday:

This is a random collection of topics related to Python tools.

Talk about the UNIX philosophy:
    Small tools.
    My problems tend to be too large for RAM, but not too big for one machine.
    UNIX and batch processing are a natural fit.
    Multiple processes = multiple CPUs.
    Multiple programming languages = more flexibility.
    Pipes = concurrency without the pain.
    Scales linearly and predictably, unlike databases.
    UNIX tools that already exist are helpful and fast.
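
A minimal sketch of the small-tool idea, assuming a filter that drops blank lines. The function names are made up for illustration and are not from the talk. It handles one line at a time, so memory use stays flat however large the input is, and it composes with other tools through a pipe.

```python
import sys


def nonblank_lines(lines):
    """Yield only the lines that contain something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Copy stdin to stdout, skipping blank lines."""
    # Iterating over a file object reads one line at a time, so the whole
    # input never has to fit in RAM.
    for line in nonblank_lines(stdin):
        stdout.write(line)
```

Saved as a script (say, a hypothetical dropblank.py) with a call to main() at the bottom, it could sit in a pipeline such as: python dropblank.py < big.tsv | sort | uniq -c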

Use the optparse module to provide consistent command line APIs:
    Here's an example of the setup from the docs:
        : from optparse import OptionParser
        : parser = OptionParser()
        : parser.add_option("-f", "--file", dest="filename",
        :                   help="write report to FILE", metavar="FILE")
        : parser.add_option("-q", "--quiet",
        :                   action="store_false", dest="verbose", default=True,
        :                   help="don't print status messages to stdout")
        : (options, args) = parser.parse_args()
    Here's an example of my own help text:
        : Usage: cleancuttsv.py [options]
        :
        : Options:
        :   -h, --help            show this help message and exit
        :   --assert-head=FIELD1\tFIELD2\t...
        :                         assert that the first line of the file matches this
        :   --delete-head         delete the first line of input
        :   -n NUM, --num-fields=NUM
        :                         assert that there are this many fields per line
        :   --drop-blank-lines    delete blank lines instead of raising an error
        :
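
The help text above could come from a parser along these lines. This is a guess at the setup, not the actual cleancuttsv.py source: the option names and help strings are copied from the help output, while the types, defaults, and the build_parser name are assumptions.

```python
from optparse import OptionParser


def build_parser():
    """Return an OptionParser that would print help similar to the above."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("--assert-head", metavar=r"FIELD1\tFIELD2\t...",
                      help="assert that the first line of the file matches this")
    # store_true flags default to False here; that default is an assumption.
    parser.add_option("--delete-head", action="store_true", default=False,
                      help="delete the first line of input")
    # type="int" is an assumption; the help text only shows NUM.
    parser.add_option("-n", "--num-fields", type="int", metavar="NUM",
                      help="assert that there are this many fields per line")
    parser.add_option("--drop-blank-lines", action="store_true", default=False,
                      help="delete blank lines instead of raising an error")
    return parser
```

optparse derives the attribute name from the long option, so --num-fields ends up as options.num_fields after parse_args().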

sort:
    http://jjinux.blogspot.com/2008/08/python-sort-uniq-c-via-subprocess.html
    sort -S 20% -T /mnt/some_other_drive ...
        -S sets the size of sort's in-memory buffer; -T sets the directory for its temporary files.
    http://jjinux.blogspot.com/2008/08/python-memory-conservation-tip-sort.html
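
A rough sketch of driving sort from Python, assuming GNU sort is on the PATH and the input is already a file on disk. build_sort_command and iter_sorted_file are hypothetical names; -S and -T are the same flags as in the command above. The sort process does the heavy lifting within its own memory budget, and Python only streams the result.

```python
import subprocess


def build_sort_command(buffer_size="20%", temp_dir=None, extra_args=()):
    """Return an argv list for sort with a capped buffer and optional temp dir."""
    cmd = ["sort", "-S", buffer_size]
    if temp_dir is not None:
        cmd.extend(["-T", temp_dir])
    cmd.extend(extra_args)
    return cmd


def iter_sorted_file(path, **sort_options):
    """Yield the lines of path in sorted order, as bytes, one at a time."""
    # "--" keeps a path that starts with "-" from being read as an option.
    cmd = build_sort_command(**sort_options) + ["--", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # Lines stay as bytes: no decoding, so no encoding surprises here.
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

Because the output is consumed lazily, the Python side never holds more than one line at a time.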

tsv:
    You need a consistent format.
    Downsides:
        Most UNIX tools don't understand true TSV, but only an approximation thereof:
            My own code raises an exception in cases where it would actually matter.
        Many UNIX tools are ignorant of encoding issues:
            Sometimes playing dumb works and sometimes it hurts.
    Using the csv module:
        : import csv
        :
        : DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')
        : MYSQL_LOAD_DATA_INFILE_DESC = """\
        :     FIELDS TERMINATED BY '\t'
        :            OPTIONALLY ENCLOSED BY '"'
        :            ESCAPED BY ''
        :     LINES TERMINATED BY '\n'"""
        :
        : def create_default_reader(iterable):
        :     """Return a csv.reader with our default options."""
        :     return csv.reader(iterable, **DEFAULT_KARGS)
        : ...
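
To see where true TSV and the approximation part ways, here is a small sketch using the same DEFAULT_KARGS. create_default_writer, check_plain_row, and roundtrip are hypothetical helpers, not part of the original module.

```python
import csv
import io

# Same defaults as above, repeated so the sketch stands alone.
DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')


def create_default_writer(fileobj):
    """Hypothetical counterpart to create_default_reader."""
    return csv.writer(fileobj, **DEFAULT_KARGS)


def check_plain_row(row):
    """Raise ValueError if a field would need quoting, else return the row."""
    for field in row:
        if any(ch in field for ch in ('\t', '\n', '\r', '"')):
            raise ValueError("field is not safe for naive TSV tools: %r" % (field,))
    return row


def roundtrip(rows):
    """Write rows as TSV text, then parse that text back with csv.reader."""
    buf = io.StringIO()
    create_default_writer(buf).writerows(rows)
    text = buf.getvalue()
    return text, list(csv.reader(io.StringIO(text), **DEFAULT_KARGS))
```

A field containing a tab gets written inside double quotes. csv.reader reads it back as one field, but cut or sort would split on the embedded tab, which is the kind of case where raising an exception up front is the safer choice.
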
    Using mysqlimport:
        : mysqlimport \
        :     --user=$MYSQL_USERNAME \
        :     --password=$MYSQL_PASSWORD \
        :     --columns=id,name \
        :     --fields-optionally-enclosed-by='"' \
        :     --fields-terminated-by='\t' \
        :     --fields-escaped-by='' \
        :     --lines-terminated-by='\n' \
        :     --local \
        :     --lock-tables \
        :     --replace \
        :     --verbose \
        :     $DATABASE ${BUILD}/sometable.tsv
        To see warnings:
            http://jjinux.blogspot.com/2009/03/mysql-encoding-hell.html

Show pdb in the context of a web app:
    : import pdb
    : from pprint import pprint
    : pdb.set_trace()
    : pprint(request.environ)
    http://localhost:5000/api/ratio
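
A sketch of the same idea as plain WSGI, assuming no particular framework. The snippet above uses a framework's request object; debug_middleware and show_path_app are invented names. With break_on_request=True the server process stops at the (Pdb) prompt in its terminal on each request, where "pp environ" prints the environment.

```python
import pdb


def debug_middleware(app, break_on_request=False):
    """Wrap a WSGI app; optionally stop in pdb at the start of each request."""
    def wrapper(environ, start_response):
        if break_on_request:
            # Execution pauses here, in the terminal running the server.
            pdb.set_trace()
        return app(environ, start_response)
    return wrapper


def show_path_app(environ, start_response):
    """Tiny stand-in app that echoes the request path as plain text."""
    body = ("path: %s\n" % environ.get("PATH_INFO", "")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Leaving break_on_request off by default keeps a stray breakpoint from hanging a server that has no terminal attached.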