attribute access and indirection

Ethan Furman ethan at stoneleaf.us
Thu Oct 22 09:21:13 EDT 2009


Greetings, List!

Say I have an old-fashioned dbf-style table, with a single name field of
40 characters:

   names = dbf.Table(':memory:', 'name C(40)')

Then I add a bunch of names from who-knows-where:

   for name in some_iterable():
     names.append((name,))

Now I want to know how many start with a 'J'...
I have two options: 1) brute force:

   matches = [rec for rec in names if rec.name[0] == 'J']

or, 2) binary search after ordering:

   names.order('name[:1]')
   matches = names.search('J', startswith=True)
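
Just to show what option 2 amounts to in plain stdlib terms (a rough
sort-then-bisect sketch, assuming each record can be treated as a
simple object with a .name string attribute -- in practice I use the
.order()/.search() calls above):

   import bisect

   # sort once on the first character, then binary-search the sorted
   # key list for the block of names starting with 'J'
   ordered = sorted(names, key=lambda rec: rec.name[:1])
   keys = [rec.name[:1] for rec in ordered]
   start = bisect.bisect_left(keys, 'J')
   stop = bisect.bisect_right(keys, 'J')
   matches = ordered[start:stop]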

So far so good.  Now it gets a little more complicated.  In my use case
I am trying to match records from one database to records from another
database;  therefore, I do _not_ know what text I will be searching for,
only the fields I will be using.

If I had only one criterion, I'd still be okay:

   different_table.order('zipcode[:5], last_name')
   for record in original_table:
     matches = different_table.search([record.zipcode[:5], record.last_name])

However, I have three different sets of matches:
   'first_name[:1], last_name, city, dlvryaddrs[:4]'
   'first_name[:1], last_name[:5], dlvryaddrs[:8]'
   'first_name, last_name, city, state'

This is not a problem for the ordering, as I can just do
   for criteria in choices:
     different_table.order(criteria)
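
where choices is just the tuple of those three criteria strings:

   choices = ('first_name[:1], last_name, city, dlvryaddrs[:4]',
              'first_name[:1], last_name[:5], dlvryaddrs[:8]',
              'first_name, last_name, city, state')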

The problem comes at the matching stage:  the .search method expects a
list of the pieces it is supposed to find, so what I need is a way to
apply a criteria string such as 'first_name[:1], last_name[:5],
dlvryaddrs[:8]' to the current record to yield the text to search for.

Current code follows, more comments towards the end.

<code>
import dbf
import shutil
from collections import defaultdict

source_tables = [ '/temp/kaz15514',
                   '/temp/kks15515',
                   '/temp/kmn15585',
                   '/temp/knv15516',
                   '/temp/ktx15722',
                   '/temp/kwa15584',
                   '/temp/mco15902',
                   '/temp/msq15994' ]

counts = defaultdict(int)

# open each source table in place
for i, path in enumerate(source_tables):
     source_tables[i] = dbf.Table(path)

shutil.copy('z:/orders/25105/mbk16508_02', '.')
match_back = dbf.Table('mbk16508_02')
match_back.add_fields('f1ltcta4 C(100), f1l5a8 C(100), ftltctst C(100)')

for field, criteria in \
        (('f1ltcta4', 'first_name[:1], last_name, city, dlvryaddrs[:4]'),
         ('f1l5a8', 'first_name[:1], last_name[:5], dlvryaddrs[:8]'),
         ('ftltctst', 'first_name, last_name, city, state')):
     match_back.order(criteria)
     for table in source_tables:
         counts = defaultdict(int)
         for record in match_back:
             matches = table.search(?????)
             .
             .
             .
</code>

The only idea I have at the moment is to parse the string (much like I
do in the order method), and after the string is parsed pluck out the
needed pieces.  If that is the best and/or most practical way to do
it, I was thinking of adding __call__ to the record class.  Then,
besides being able to do:

   matches = table.search([record.zip4[:5], record.dlvryaddrs])

I could also do:

   matches = table.search(record('zip4[:5], dlvryaddrs'))

or, equivalently,
   criteria = 'this, that[:7], the_other'
   matches = table.search(record(criteria))
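
Roughly what I have in mind, as a sketch only -- the pluck() name is
made up, the parsing here is a bare regex rather than the real order()
parser, and it assumes plain getattr works on a record (as the
rec.name examples above suggest):

   import re

   def pluck(record, criteria):
       # turn 'first_name[:1], last_name[:5]' into
       # [record.first_name[:1], record.last_name[:5]]
       pieces = []
       for term in criteria.split(','):
           field, width = re.match(r'\s*(\w+)(?:\[:(\d+)\])?\s*$',
                                   term).groups()
           value = getattr(record, field)
           if width is not None:
               value = value[:int(width)]
           pieces.append(value)
       return pieces

__call__ on the record class would then just delegate:

   def __call__(self, criteria):
       return pluck(self, criteria)

so the ????? in the code above could be filled in as either
table.search(pluck(record, criteria)) or table.search(record(criteria)).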

Any better ideas?  Am I missing anything already in the stdlib?

Any and all tips appreciated!

~Ethan~

P.S.
<shameless plug>  Python d-Base currently lives at
http://groups.google.com/group/python-dbase, and its main purpose in
life is to ease the transition between old dbf files and newer SQL
tables.  It can, however, be used for read/write access to dBase III and
VFP 6 tables, including memo fields.

Success stories and bug reports both equally welcome!  :D
</shameless plug>



