Python newbie asking for help
wes weston
wweston at att.net
Wed Mar 24 14:06:19 EST 2004
Bill Hamblen wrote:
> Hi,
>
> I'm teaching myself Python for fun. I've picked a little project that
> is interesting to me. I've made some progress over the last couple
> nights, mostly reading Guido's tutorial, but I think it would go a lot
> more quickly if I had some advice from experienced Python programmers.
> I think what I really need is a suggestion on the best data structure(s)
> to use. Here's what I am doing.
>
> I have three files with csv records (lines). We can have column headers
> at the top of each file if that helps. The first field in each record
> is a name; otherwise the files mostly contain different data. I'd like
> to create unique id numbers (in a new column) so that names which appear
> in more than one file have the same id number in each file. And of
> course every line in every file should end up with an id number.
>
> The catch is that while most names will appear in every file, some will
> not. Worse yet, there are names that will have different forms in
> different files (e.g. Smith_Bill and Smith_William). That last group I
> would like to "manually" process. Maybe by popping up a tkinter listbox
> with all records that match "Smith". Then I could select the ones which
> *really* match and give them a uniform id number.
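For the manual step, it may help to gather every row, from every file, that shares a last name, so one group can fill one listbox. Below is a minimal sketch in modern Python 3. It assumes names are stored as 'Last_First' in column 1. `group_by_lastname` and the sample rows are hypothetical, made up for illustration.

```python
from collections import defaultdict

def group_by_lastname(files, name_col=1):
    """Map lowercased last name -> list of (file_index, row) pairs.

    Assumes names look like 'Last_First' (e.g. 'Smith_Bill'), so the last
    name is the text before the first underscore.
    """
    groups = defaultdict(list)
    for file_index, rows in enumerate(files):
        for row in rows:
            last = row[name_col].split('_')[0].lower()
            groups[last].append((file_index, row))  # the row itself, not a copy
    return groups

# Hypothetical sample data: id column first, name second.
lst1 = [['', 'Smith_Bill', 'NYY'], ['', 'Jones_Chipper', 'ATL']]
lst2 = [['', 'Smith_William', 'NYY'], ['', 'Jones_Chipper', 'ATL']]
groups = group_by_lastname([lst1, lst2])
for file_index, row in groups['smith']:
    print(file_index, row[1])
# 0 Smith_Bill
# 1 Smith_William
```

Each group is a candidate list for one manual decision. Because the pairs hold the original row objects, an id chosen in the GUI can be written straight into `row[0]`.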
>
> The good news is that I have already made sure that there are no
> duplicate name fields in any one file.
>
> Speed is a total non-issue here since it only has to run once for a
> given set of files. And no, this is not for merging spam lists! :-)
> I'm trying to make it easier to merge sets of projected statistics for
> fantasy baseball every spring. I do most of the processing in a
> spreadsheet but I thought indexing the sets of projections would be a
> fun way to learn Python.
>
> I read each file into a list of lists (using a csv module for Python
> 2.2). I've already added a new first column to hold the id number (so
> name is now in the second column). For example, lst1 looks like this:
>
> [['', 'name1_fname', 'team', 'pos', ...],['', 'name2_fname', ...], ...]
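With the csv module in current Python, the empty id column can be prepended while reading, producing rows shaped like the ones above. This is a sketch only. It assumes the files have no header row (skip it first if they do). The helper names and the sample text are hypothetical.

```python
import csv
import io

def rows_with_id_column(lines):
    """Parse CSV lines into lists, prepending an empty id field to each row."""
    return [[''] + row for row in csv.reader(lines)]

def read_csv_file(path):
    # For a real file, open with newline='' as the csv docs recommend.
    with open(path, newline='') as f:
        return rows_with_id_column(f)

# Hypothetical file contents, read from a string instead of disk.
sample = io.StringIO("Smith_Bill,NYY,1B\nJones_Chipper,ATL,3B\n")
lst1 = rows_with_id_column(sample)
print(lst1)
# [['', 'Smith_Bill', 'NYY', '1B'], ['', 'Jones_Chipper', 'ATL', '3B']]
```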
>
> I've got two search functions: one which looks for exact name matches
> and one which looks for last name matches (which I now notice needs to
> be made insensitive to capitalization).
>
> def search_exact(l, c, v):
>     result = []
>     for r in l:
>         if r[c] == v:
>             result.append(r)
>     return result
>
> def search_lastname(l, c, v):
>     result = []
>     for r in l:
>         if r[c].split('_')[0] == v:
>             result.append(r)
>     return result
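Making the last-name search case-insensitive only requires lowercasing both sides of the comparison. The sketch below keeps the quoted function's name and argument order. Otherwise it is an illustration rather than the poster's code, and the sample rows are hypothetical.

```python
def search_lastname(rows, col, lastname):
    """Return the rows whose name field (column `col`) has the given last name.

    Names are assumed to be 'Last_First'; the comparison ignores case.
    """
    target = lastname.lower()
    return [r for r in rows if r[col].split('_')[0].lower() == target]

lst = [['', 'Smith_Bill', 'NYY'], ['', 'SMITH_Lee', 'BAL'], ['', 'Jones_Chipper', 'ATL']]
hits = search_lastname(lst, 1, 'smith')
print([r[1] for r in hits])
# ['Smith_Bill', 'SMITH_Lee']
```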
>
> The main loop:
>
> id = 0
> for lst in [lst1, lst2, lst3]:
>     for row in lst:
>         if row[0] != '':    # if it already has an id, skip it
>             continue
>         else:
>             match = []
>             for ls in [lst1, lst2, lst3]:
>                 tmp = search_exact(ls, 1, row[1])
>                 if tmp != []:
>                     match.append(tmp)
>             if len(match) == 3:
>                 # every file had one exact match
>                 # so assign an id number
>                 for i in match:
>                     i[0] = id
>                     id = id + 1
>             else:
>                 match = []
>                 for ls in [lst1, lst2, lst3]:
>                     tmp = search_lastname(ls, 1, row[1].split('_')[0])
>                     if tmp != []:
>                         match.append(tmp)
>                 # pop up a tkinter window and manually select matches
>                 # assign an id number to selected entries
> # write out modified files
>
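For the "write out modified files" step, csv.writer can write the lists back out. It converts the integer ids to text on output. A minimal sketch, with a hypothetical helper and hypothetical data:

```python
import csv
import io

def write_rows(rows, f):
    """Write a list of lists to an open file object as CSV rows."""
    csv.writer(f).writerows(rows)

lst1 = [[0, 'Smith_Bill', 'NYY'], [1, 'Jones_Chipper', 'ATL']]
out = io.StringIO()          # stands in for a real file in this example
write_rows(lst1, out)
print(out.getvalue())

# For a real (hypothetical) output file:
# with open('file1_with_ids.csv', 'w', newline='') as f:
#     write_rows(lst1, f)
```

The default dialect ends each line with '\r\n'. Passing `lineterminator='\n'` to csv.writer changes that if the spreadsheet prefers it.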
> The problem I ran into is that the objects in "match" are not references
> to the actual data in the lstX structures but copies. So I cannot just
> change the id field the way I have things written. Based on my reading
> I thought all the operations I used worked with references but I clearly
> missed something.
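A quick experiment suggests the rows are in fact shared rather than copied. The catch is one level of nesting. search_exact returns a new list *containing* the matching rows, and `match.append(tmp)` stores that list. So in `for i in match: i[0] = id`, each `i` is a hit list, and the assignment overwrites the hit list's reference to the row, not the row's id field. The sketch below uses hypothetical data and a hypothetical `collect` helper that mirrors the quoted loop.

```python
def search_exact(rows, col, value):
    # Same logic as the quoted search_exact: collects the matching row objects.
    return [r for r in rows if r[col] == value]

def collect(files, name):
    # Mirrors the quoted loop: one hit list appended per file.
    match = []
    for ls in files:
        tmp = search_exact(ls, 1, name)
        if tmp:
            match.append(tmp)
    return match

lst1 = [['', 'Smith_Bill', 'NYY']]
lst2 = [['', 'Smith_Bill', '1B']]

match = collect([lst1, lst2], 'Smith_Bill')
print(match[0][0] is lst1[0])    # True: the row itself is shared

for i in match:
    i[0] = 99                    # replaces the hit list's first element...
print(lst1[0][0] == '')          # True: ...so the row's id field is untouched

match = collect([lst1, lst2], 'Smith_Bill')
for hits in match:
    hits[0][0] = 99              # index into the hit list, then into the row
print(lst1[0][0], lst2[0][0])    # 99 99
```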
>
> So, I'm getting the feeling that there is probably a *much* better way
> to do this. :-) Any suggestions?
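Since no name repeats within a file, a dictionary from name to id is a natural structure here. Each row is looked up once, so no nested searches are needed. Any name that is missing from at least one file is a candidate for the manual pass (Smith_Bill vs Smith_William). A minimal sketch, assuming the id is in column 0 and the name in column 1. The function names and sample rows are hypothetical.

```python
def assign_exact_ids(files, name_col=1, id_col=0):
    """Give each distinct name one id, shared by every row (in any file) with that name.

    Writes the id into row[id_col] in place and returns the name -> id dict.
    """
    ids = {}
    for rows in files:
        for row in rows:
            # len(ids) is evaluated before insertion, so new names get 0, 1, 2, ...
            row[id_col] = ids.setdefault(row[name_col], len(ids))
    return ids

def names_missing_somewhere(files, name_col=1):
    """Names that do not appear in every file: candidates for manual matching."""
    name_sets = [{row[name_col] for row in rows} for rows in files]
    return set.union(*name_sets) - set.intersection(*name_sets)

lst1 = [['', 'Smith_Bill', 'NYY'], ['', 'Jones_Chipper', 'ATL']]
lst2 = [['', 'Jones_Chipper', '3B'], ['', 'Smith_William', '1B']]
ids = assign_exact_ids([lst1, lst2])
print(ids)
# {'Smith_Bill': 0, 'Jones_Chipper': 1, 'Smith_William': 2}
print(sorted(names_missing_somewhere([lst1, lst2])))
# ['Smith_Bill', 'Smith_William']
```

After the manual pass decides that two names are the same player, the rows of one can simply be given the other's id.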
>
> - Bill
#for i in match:
#    i[0] = id
#    id = id + 1
for i in range(len(match)):
    match[i] = id
    id = id + 1
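One more point applies to the exact-match branch. Incrementing `id` inside the loop over `match` gives the same name a different id in each file, while the goal is one shared id per name. Below is a sketch of that branch under stated assumptions. It builds a flat list with `extend`, so each element is a row, and it bumps the id once per name. `assign_if_in_all`, `next_id` and the sample rows are hypothetical.

```python
def search_exact(rows, col, value):
    # Rows (not copies) whose column `col` equals `value`.
    return [r for r in rows if r[col] == value]

def assign_if_in_all(files, row, next_id):
    """If row's name matches exactly in every file, stamp one shared id on all
    matching rows and return next_id + 1; otherwise return next_id unchanged."""
    match = []
    for ls in files:
        match.extend(search_exact(ls, 1, row[1]))  # flat list of row objects
    if len(match) == len(files):   # relies on names being unique within a file
        for r in match:
            r[0] = next_id         # same id in every file
        next_id += 1               # bump once per name, not once per row
    return next_id

lst1 = [['', 'Smith_Bill', 'NYY'], ['', 'Jones_Chipper', 'ATL']]
lst2 = [['', 'Smith_Bill', '1B'], ['', 'Jones_Larry', '3B']]
files = [lst1, lst2]
next_id = 0
for lst in files:
    for row in lst:
        if row[0] == '':
            next_id = assign_if_in_all(files, row, next_id)
print([r[0] for r in lst1], [r[0] for r in lst2])
# [0, ''] [0, '']
```

Rows still holding '' afterwards are the ones left for the manual last-name pass.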