newbie:unique problem
Brian van den Broek
bvande at po-box.mcgill.ca
Thu Mar 17 15:47:48 EST 2005
Leeds, Mark said unto the world upon 2005-03-17 14:08:
> I have a function uniqueList that is below :
>
>
>
> def uniqueList(origList):
>
>
>
>     nodups = {}
>
>     for temp in origList:
>
>         nodups[temp] = None
>
>     return nodups.keys()
>
>
>
> When used in the following context :
>
>
>
> industryList = uniqueList(jpbarradata[group])
>
>
>
> where jpbarradata[group] might look like
>
>
>
> ["AAA BC","BBB KK","CCC TD","AAA KP","CCC TD"]
>
>
>
> ,the function works in the sense that it would return
>
>
>
> ["AAA BC","BBB KK","CCC TD","AAA KP"]
>
>
>
> because CCC TD is duplicated.
>
>
>
> But, I also want it to get rid of the AAA KP because
>
> there are two AAA's even though the last two letters
>
> are different. It doesn't matter to me which one
>
> is gotten rid of but I don't know how to change
>
> the function to handle this ? I have a feeling
>
> it's not that hard though ? Thanks.

Hi Mark,

Please turn off the HTML formatting when posting. It makes your
message quite a lot bigger than it needs to be and, in this case anyway,
makes the plain-text version double-spaced (as above) and thus a bit
nasty to read. Thanks.

For the question:

Is the order of your output important? If so, I wouldn't use a dictionary
to store the unique items. I see why you did it, but since
dictionaries don't have order, your output might get permuted.

How about this (don't take the naming as a model!):

def unique_up_to_n_char(orig_list, n):
    '''-> list of elements where each is unique up to the first n chars.
    '''
    # Needs Python 2.4 for set type. You could use a list, too.
    seen_leading_chars = set()
    output_list = []
    for member in orig_list:
        if member[:n] not in seen_leading_chars:
            seen_leading_chars.add(member[:n])
            output_list.append(member)
    return output_list

test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]
print(unique_up_to_n_char(test_list, 3))
print(unique_up_to_n_char(test_list, 2))
which produces:
['AAA BC', 'BBB KK', 'CCC TD', 'AAB KP']
['AAA BC', 'BBB KK', 'CCC TD']
There may well be better ways, but this one is general and preserves order.
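
If a fixed-length prefix turns out to be too rigid (say, the leading code is not always three characters), one possible variation is to pass in a function that computes what "the same" means. What follows is only a sketch under that assumption: the name unique_by_key and its key parameter are made up for illustration, and splitting on whitespace is a guess about how the data is laid out.

```python
def unique_by_key(items, key):
    """Return the items whose key(item) has not been seen yet, in input order.

    `key` is a hypothetical parameter: any function mapping an item to a
    hashable value that decides whether two items count as duplicates.
    """
    seen = set()      # keys encountered so far
    result = []       # first item seen for each key, in original order
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]

# Assumption: the part before the first space is the code to de-duplicate on.
by_first_word = unique_by_key(test_list, lambda s: s.split()[0])
print(by_first_word)
```

Passing lambda s: s[:n] as the key gives the same behaviour as the fixed-prefix function, so this is the same idea with the comparison rule pulled out into an argument.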
Best,
Brian vdB