Python newbie asking for help

wes weston wweston at att.net
Wed Mar 24 14:06:19 EST 2004


Bill Hamblen wrote:
> Hi,
>                                                                                 
> I'm teaching myself Python for fun.  I've picked a little project that
> is interesting to me.  I've made some progress over the last couple
> nights, mostly reading Guido's tutorial, but I think it would go a lot
> more quickly if I had some advice from experienced Python programmers.
> I think what I really need is a suggestion on the best data structure(s)
> to use.  Here's what I am doing.
>                                                                                 
> I have three files with csv records (lines).  We can have column headers
> at the top of each file if that helps.  The first field in each record
> is a name; otherwise the files mostly contain different data.  I'd like
> to create unique id numbers (in a new column) so that names which appear
> in more than one file have the same id number in each file.  And of
> course every line in every file should end up with an id number.
>                                                                                 
> The catch is that while most names will appear in every file, some will
> not.  Worse yet, there are names that will have different forms in
> different files (e.g.  Smith_Bill and Smith_William).  That last group I
> would like to "manually" process.  Maybe by popping up a tkinter listbox
> with all records that match "Smith".  Then I could select the ones which
> *really* match and give them a uniform id number.
>                                                                                 
> The good news is that I have already made sure that there are no
> duplicate name fields in any one file.
>                                                                                 
> Speed is a total non-issue here since it only has to run once for a
> given set of files.  And no, this is not for merging spam lists!  :-)
> I'm trying to make it easier to merge sets of projected statistics for
> fantasy baseball every spring.  I do most of the processing in a
> spreadsheet but I thought indexing the sets of projections would be a
> fun way to learn Python.
>                                                                                 
> I read each file into a list of lists (using a csv module for Python
> 2.2).  I've already added a new first column to hold the id number (so
> name is now in the second column).  For example, lst1 looks like this:
>                                                                                 
> [['', 'name1_fname', 'team', 'pos', ...],['', 'name2_fname', ...], ...]
>                                                                                 
> I've got two search functions: one which looks for exact name matches
> and one which looks for last name matches (which I now notice needs to
> be made insensitive to capitalization).
> 
> def search_exact(l, c, v):
>    result=[]
>    for r in l:
>       if r[c] == v:
>          result.append(r)
>    return result
>                                                                                 
> def search_lastname(l, c, v):
>    result=[]
>    for r in l:
>       if r[c].split('_')[0] == v:
>          result.append(r)
>    return result
>                                                                                 
> The main loop
>                                                                                 
> id = 0
> for lst in [lst1, lst2, lst3]:
>    for row in lst:
>       if row[0] <> '':  #if it already has an id, skip it
>          continue
>       else:
>          match=[]
>          for ls in [lst1, lst2, lst3]:
>             tmp = search_exact(ls, 1, row[1])
>             if tmp <> []:
>                match.append(tmp)
>          if len(match) == 3:
>             #every file had one exact match
>             #so assign an id number
>             for i in match:
>                i[0] = id
>                id = id + 1
>          else:
>             match = []
>             for ls in [lst1, lst2, lst3]:
>                tmp = search_lastname(ls, 1, row[1].split('_')[0])
>                if tmp <> []:
>                   match.append(tmp)
>             #pop up a tkinter window and manually select matches
>             #assign an id number to selected entries
> #write out modified files
>                                                                                 
> The problem I ran into is that the objects in "match" are not references
> to the actual data in the lstX structures but copies.  So I cannot just
> change the id field the way I have things written.  Based on my reading
> I thought all the operations I used worked with references but I clearly
> missed something.
>                                                                                 
> So, I'm getting the feeling that there is probably a *much* better way
> to do this.  :-) Any suggestions?
>                                                                                 
>  - Bill
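
As far as the posted code shows, the search functions do hand back references, not copies. The catch seems to be that each `tmp` is a list of rows, so `match` ends up as a list of lists of rows, and `i[0] = id` replaces the first slot of a result list instead of writing into the row. A minimal sketch of that behaviour follows, in current Python syntax; the sample rows and the names `hits`, `wrong`, `same_object`, `after_wrong` and `after_right` are made up for illustration.

```python
def search_exact(l, c, v):
    """Return every row of l whose column c equals v (same logic as the posted function)."""
    result = []
    for r in l:
        if r[c] == v:
            result.append(r)
    return result

# Hypothetical sample data in the posted layout: [id, name, team]
lst1 = [['', 'Smith_Bill', 'NYY'], ['', 'Jones_Al', 'BOS']]

hits = search_exact(lst1, 1, 'Smith_Bill')
same_object = hits[0] is lst1[0]    # the result holds the very same row object

# What the posted loop effectively does: overwrite slot 0 of the result list.
wrong = search_exact(lst1, 1, 'Smith_Bill')
wrong[0] = 7
after_wrong = lst1[0][0]            # the row itself is untouched

# Indexing one level deeper reaches the id field of the shared row.
hits[0][0] = 7
after_right = lst1[0][0]
```

So the data in lstX can be changed through the search results, provided the assignment targets the row and not the list that holds it.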

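#for i in match:
#    i[0] = id
#    id = id + 1

# Each entry in match is the list of rows returned by search_exact,
# so the id field of the matching row is at rows[0][0].
for rows in match:
    rows[0][0] = id
# One id per name, shared by all three files.
id = id + 1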
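
On the data-structure question, a dictionary keyed by name may be simpler than searching every list for every row. The sketch below is only one possible arrangement, written in current Python syntax. The function `assign_ids`, its parameters `name_col` and `id_col`, and the sample rows are hypothetical. It relies on the stated fact that no name is duplicated within a single file.

```python
def assign_ids(tables, name_col=1, id_col=0):
    """Give one shared id to every name that appears in all tables.

    Returns the rows left without an id, as candidates for manual matching.
    Assumes a name occurs at most once per table.
    """
    # Map each name to the row objects that carry it, across all tables.
    rows_by_name = {}
    for table in tables:
        for row in table:
            rows_by_name.setdefault(row[name_col], []).append(row)

    next_id = 0
    unmatched = []
    for name in sorted(rows_by_name):
        rows = rows_by_name[name]
        if len(rows) == len(tables):
            # Exact match in every file: write the id into the rows in place.
            for row in rows:
                row[id_col] = next_id
            next_id += 1
        else:
            unmatched.extend(rows)
    return unmatched

# Hypothetical sample data: [id, name]
lst1 = [['', 'Smith_Bill'], ['', 'Jones_Al']]
lst2 = [['', 'Smith_William'], ['', 'Jones_Al']]
lst3 = [['', 'Jones_Al'], ['', 'Smith_Bill']]

unmatched = assign_ids([lst1, lst2, lst3])
```

Because the dictionary stores the row objects themselves, the ids land directly in lst1, lst2 and lst3, and whatever comes back in `unmatched` is what would go to the manual step.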

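For the last-name lookup that needs to ignore capitalization, one option is to lowercase the part before the underscore and group the leftover rows on it, so each group can be shown in a listbox for manual selection. This is a sketch under that assumption; `group_by_lastname` and the sample rows are invented names, not part of the posted code.

```python
def group_by_lastname(rows, name_col=1):
    """Group rows by lowercased last name (the text before the first '_')."""
    groups = {}
    for row in rows:
        last = row[name_col].split('_')[0].lower()
        groups.setdefault(last, []).append(row)
    return groups

# Hypothetical leftover rows that found no exact match in every file
leftovers = [['', 'Smith_Bill'], ['', 'smith_William'], ['', 'Ortiz_David']]
groups = group_by_lastname(leftovers)
smith_rows = groups['smith']    # both spellings end up in the same group
```

The groups hold the original row objects, so an id chosen by hand can be written with `row[0] = some_id` for each selected row.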


