Python newbie asking for help

Wed Mar 24 13:44:53 EST 2004

Hi,

I'm teaching myself Python for fun.  I've picked a little project that
is interesting to me.  I've made some progress over the last couple of
nights, mostly reading Guido's tutorial, but I think it would go a lot
more quickly if I had some advice from experienced Python programmers.
I think what I really need is a suggestion on the best data structure(s)
to use.  Here's what I am doing.

I have three files of CSV records (one record per line).  We can have
column headers at the top of each file if that helps.  The first field
in each record is a name; otherwise the files mostly contain different
data.  I'd like to create unique id numbers (in a new column) so that
names which appear in more than one file have the same id number in each
file.  And of course every line in every file should end up with an id
number.
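
To make the goal concrete, here is a minimal sketch of the id assignment
for names that match exactly.  It assumes a dictionary keyed by name is a
reasonable structure for this, which is only one possibility.  The
function name assign_ids and the sample rows are hypothetical.

```python
def assign_ids(files):
    """Give every row an id; equal names share one id across files.

    `files` is a list of tables, each table a list of rows whose first
    field is the name.  The id is inserted as a new first column.
    """
    ids = {}        # name -> id number
    next_id = 0
    for table in files:
        for row in table:
            name = row[0]
            if name not in ids:
                ids[name] = next_id
                next_id = next_id + 1
            row.insert(0, ids[name])
    return ids


# Hypothetical sample data: name first, then whatever else the file holds.
file1 = [['Smith_Bill', 'NYY'], ['Jones_Al', 'BOS']]
file2 = [['Jones_Al', '1B'], ['Smith_William', 'OF']]
ids = assign_ids([file1, file2])
```

Names that differ in form (Smith_Bill and Smith_William) still get
different ids in this sketch, so they would need a separate step.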

The catch is that while most names will appear in every file, some will
not.  Worse yet, there are names that will have different forms in
different files (e.g. Smith_Bill and Smith_William).  I would like to
process that last group "manually", maybe by popping up a tkinter
listbox with all records that match "Smith".  Then I could select the
ones which *really* match and give them a uniform id number.
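
As a rough illustration of how the candidates for manual review might be
collected before any listbox is involved, the sketch below groups rows by
lower-cased last name and flags groups whose full names differ.  The
helper names lastname_groups and needs_review are hypothetical, and the
name is assumed to sit in the first column in Lastname_Firstname form.

```python
def lastname_groups(files, col=0):
    """Map lower-cased last name -> list of (file_index, row) pairs."""
    groups = {}
    for file_index in range(len(files)):
        for row in files[file_index]:
            last = row[col].split('_')[0].lower()
            groups.setdefault(last, []).append((file_index, row))
    return groups


def needs_review(group, col=0):
    """True if the rows in a group do not all carry the same full name."""
    first = group[0][1][col]
    for file_index, row in group:
        if row[col] != first:
            return True
    return False


# Hypothetical sample data.
file1 = [['Smith_Bill', 'NYY'], ['Jones_Al', 'BOS']]
file2 = [['Smith_William', 'OF'], ['Jones_Al', '1B']]
groups = lastname_groups([file1, file2])
review = [last for last in groups if needs_review(groups[last])]
```

Each flagged group keeps the original row objects, so whatever is
selected by hand can have its id set directly on those rows.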

The good news is that I have already made sure that there are no
duplicate name fields in any one file.

Speed is a total non-issue here since it only has to run once for a
given set of files.  And no, this is not for merging spam lists!  :-)
I'm trying to make it easier to merge sets of projected statistics for
fantasy baseball every spring.  I do most of the processing in a
spreadsheet but I thought indexing the sets of projections would be a
fun way to learn Python.

I read each file into a list of lists (using a csv module for Python
2.2).  I've already added a new first column to hold the id number (so
name is now in the second column).  For example, lst1 looks like this:

[['', 'name1_fname', 'team', 'pos', ...], ['', 'name2_fname', ...], ...]
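
For reference, a list of that shape could be built roughly as follows.
This sketch assumes the csv module from the standard library of current
Python, which may behave differently from the add-on module for Python
2.2 mentioned above.  The function name read_rows and the sample data
are made up.

```python
import csv
import io


def read_rows(stream):
    """Read CSV rows from an open text stream, adding an empty id column."""
    return [[''] + row for row in csv.reader(stream) if row]


# Stand-in for an open file; the column contents are made up.
sample = io.StringIO("Smith_Bill,NYY,OF\nJones_Al,BOS,1B\n")
lst1 = read_rows(sample)
```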

I've got two search functions: one which looks for exact name matches
and one which looks for last name matches (which I now notice needs to
be made insensitive to capitalization).

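def search_exact(l, c, v):
   #return every row in list l whose column c equals v
   result=[]
   for r in l:
      if r[c] == v:
         result.append(r)
   return result

def search_lastname(l, c, v):
   #return every row whose last name (the text before '_') equals v
   result=[]
   for r in l:
      if r[c].split('_')[0] == v:
         result.append(r)
   return result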
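
A case-insensitive version of the last name search might look like the
sketch below.  The name search_lastname_nocase is hypothetical, and it
assumes that lower-casing both sides is enough for these names.

```python
def search_lastname_nocase(rows, col, last_name):
    """Return rows whose last name equals last_name, ignoring case."""
    wanted = last_name.lower()
    result = []
    for row in rows:
        if row[col].split('_')[0].lower() == wanted:
            result.append(row)
    return result


# Made-up rows with an empty id column followed by the name.
rows = [['', 'Smith_Bill'], ['', 'SMITH_William'], ['', 'Jones_Al']]
found = search_lastname_nocase(rows, 1, 'smith')
```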

Here is the main loop:

id = 0
for lst in [lst1, lst2, lst3]:
   for row in lst:
      if row[0] != '':  #if it already has an id, skip it
         continue
      else:
         match=[]
         for ls in [lst1, lst2, lst3]:
            tmp = search_exact(ls, 1, row[1])
            if tmp != []:
               match.append(tmp)
         if len(match) == 3:
            #every file had one exact match
            #so assign an id number
            for i in match:
               i[0] = id
            id = id + 1   #one id per name, shared by all three files
         else:
            match = []
            for ls in [lst1, lst2, lst3]:
               tmp = search_lastname(ls, 1, row[1].split('_')[0])
               if tmp != []:
                  match.append(tmp)
            #pop up a tkinter window and manually select matches
            #assign an id number to selected entries
#write out modified files

The problem I ran into is that the objects in "match" seem to be copies
rather than references to the actual data in the lstX structures.  So I
cannot just change the id field the way I have things written.  Based on
my reading I thought all the operations I used worked with references,
but I clearly missed something.
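
A small experiment may narrow this down.  The sketch below, with made-up
rows, suggests that the rows returned by a function like search_exact
are the same objects as the rows in the original list, and that the
surprise comes from "match" holding whole result lists rather than rows.
This is a guess at the cause under those assumptions, not a confirmed
diagnosis.

```python
def search_exact(rows, col, value):
    """Same logic as search_exact above: rows whose column equals value."""
    result = []
    for row in rows:
        if row[col] == value:
            result.append(row)
    return result


lst1 = [['', 'Smith_Bill'], ['', 'Jones_Al']]

tmp = search_exact(lst1, 1, 'Jones_Al')   # a new list holding the same row
same_object = tmp[0] is lst1[1]           # the row is shared, not copied

match = [tmp]            # as in the main loop: a list of result lists
for i in match:
    i[0] = 7             # replaces the row inside tmp; lst1 is untouched
untouched = lst1[1][0]

match = [search_exact(lst1, 1, 'Jones_Al')]
for found in match:
    for row in found:
        row[0] = 7       # changes the shared row, so lst1 sees it
updated = lst1[1][0]
```

If that reading is right, looping one level deeper (or building "match"
with extend instead of append) would reach the actual rows.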

So, I'm getting the feeling that there is probably a *much* better way
to do this.  :-) Any suggestions?

 - Bill