dbf files and indexes

Ethan Furman ethan at stoneleaf.us
Thu May 27 15:45:58 EDT 2010


Let's say I have two tables:

CatLovers                DogLovers
-------------------      -------------------
| name      | age |      | name      | age |
|-----------------|      |-----------------|
| Allen     |  42 |      | Alexis    |   7 |
| Jerod     |  29 |      | Michael   |  21 |
| Samuel    |  17 |      | Samuel    |  17 |
| Nickalaus |  55 |      | Lawrence  |  63 |
| Frederick |  34 |      | Frederick |  34 |
-------------------      -------------------

NumberOfPets
---------------------------
| name      | cats | dogs |
|-------------------------|
| Allen     |   2  |   0  |
| Alexis    |   0  |   3  |
| Michael   |   0  |   1  |
| Samuel    |   1  |   2  |
| Jerod     |   3  |   0  |
| Nickalaus |   5  |   0  |
| Lawrence  |   0  |   1  |
| Frederick |   3  |   2  |
---------------------------

(I know, I know -- coming up with examples has never been my strong 
point. ;)

catlovers = dbf.Table('CatLovers')
doglovers = dbf.Table('DogLovers')
petcount  = dbf.Table('NumberOfPets')

For the sake of this highly contrived example, let's say I'm printing a 
report, in alphabetical order by name, of those who love both cats and 
dogs...

def names(record):
    return record.name

c_idx = catlovers.create_index(key=names)
d_idx = doglovers.create_index(key=names)
p_idx = petcount.create_index(key=names)

# method 1
for record in c_idx:
    if record in d_idx:
        print record.name, record.age, \
          p_idx[record].cats, p_idx[record].dogs

*or*

# method 2
for record in c_idx:
    if d_idx.key(record) in d_idx:  # or if names(record) in d_idx:
        print record.name, record.age, \
          p_idx[record].cats, p_idx[record].dogs

Which is better (referring to the _in_ test)?  Part of the issue is 
whether _any_ record from the CatLovers table can really be in the 
DogLovers index.  Obviously not -- so when you ask that question in code, 
you are really asking whether a record from CatLovers has a matching key 
value in DogLovers.  That means either the __contains__ code applies the 
key function to the record (implicit, as in method 1 above) or the 
calling code does it (explicit, as in method 2 above).
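
To make the two options concrete, here is a toy sketch in plain Python. 
It is not the dbf module's code: ToyIndex, contains_key and the Person 
namedtuple are made-up stand-ins, and the dict-based storage is only an 
assumption for illustration.  __contains__ plays the implicit role 
(method 1), while contains_key shows what the index would do if the 
caller passed in the key itself (method 2).

```python
from collections import namedtuple

# Hypothetical stand-in for a table record; real dbf records differ.
Person = namedtuple('Person', 'name age')


class ToyIndex:
    """Toy index that stores records under key(record)."""

    def __init__(self, records, key):
        self.key = key
        self._by_key = {key(r): r for r in records}

    def __iter__(self):
        # Yield records in key order, so a report comes out sorted.
        for k in sorted(self._by_key):
            yield self._by_key[k]

    def __contains__(self, record):
        # Implicit: the index applies the key function itself.
        return self.key(record) in self._by_key

    def contains_key(self, value):
        # Explicit: the caller has already computed the key value.
        return value in self._by_key


def names(record):
    return record.name


cats = [Person('Allen', 42), Person('Jerod', 29), Person('Samuel', 17),
        Person('Nickalaus', 55), Person('Frederick', 34)]
dogs = [Person('Alexis', 7), Person('Michael', 21), Person('Samuel', 17),
        Person('Lawrence', 63), Person('Frederick', 34)]

c_idx = ToyIndex(cats, key=names)
d_idx = ToyIndex(dogs, key=names)

# Method 1 style: hand the record over and let the index find its key.
implicit = [r.name for r in c_idx if r in d_idx]

# Method 2 style: compute the key in the calling code.
explicit = [r.name for r in c_idx if d_idx.contains_key(names(r))]
```

Both lists contain the same names in the same order; the only thing 
that changes is who calls the key function.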

I'm leaning towards method 1, even though the key function is then 
called behind the scenes, because I think it makes the calling code cleaner.

Opinions?

~Ethan~


