List of Numbers

Jim Richardson warlock at eskimo.com
Sat Apr 12 22:44:37 EDT 2003


On Sat, 12 Apr 2003 21:11:39 GMT,
 Alex Martelli <aleax at aleax.it> wrote:
> Jim Richardson wrote:
> 
>> On Sat, 05 Apr 2003 20:13:45 +0100,
>>  Simon Faulkner <news at titanic.co.uk> wrote:
>>> I have a list of about 5000 numbers in a text file - up to 14 digits
>>> each.
>>> 
>>> I need to check for duplicates.
>>> 
>>> What would people suggest as a good method?
>>> 
>>> Simon
>> 
>> I'd use sort|uniq, but I don't know if that's available for MS-type OSes.
> 
> You can get implementations of sort and uniq for MS, but a short
> Python script is better IMHO.

yeah, but that's usually the case :) 
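
For comparison, here is a short Python sketch of what `sort | uniq -d` does (uniq's `-d` flag prints only the lines that are repeated, one per group). It assumes one number per line and compares them as text, the way the shell tools do. The function name `sorted_duplicates` is just illustrative.

```python
from itertools import groupby


def sorted_duplicates(lines):
    """Mimic `sort | uniq -d`: sort the values, then report each value
    that occurs in a run of more than one equal adjacent item."""
    # Strip surrounding whitespace so "42\n" and "42" compare equal;
    # skip blank lines. Values are compared as strings, like sort/uniq.
    values = sorted(line.strip() for line in lines if line.strip())
    dups = []
    # groupby collapses runs of equal adjacent items, much like uniq does
    for value, group in groupby(values):
        if sum(1 for _ in group) > 1:
            dups.append(value)
    return dups


if __name__ == "__main__":
    sample = ["12345678901234\n", "42\n", "12345678901234\n", "7\n", "42\n"]
    print(sorted_duplicates(sample))  # ['12345678901234', '42']
```

Sorting first is what makes the adjacent-run trick work, exactly as with the shell pipeline; the cost is an O(n log n) sort.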

> 
> 
>> In Python, just stuff them all in a dictionary; any repeats will be
>> eliminated. But this is rather crude and probably slow. But it would
>> work.
> 
> Anything but slow!  Python dictionaries are quite fast.  But removing
> duplicates is not the same as 'checking for duplicates' -- Simon
> might rather want (e.g.) a list of all numbers that WERE in fact
> duplicate.  A script that plays with a Python dict is still no doubt
> the right solution, but it's hard to write one without more precise
> specifications regarding what is desired.
> 
> 

Yeah, I skipped the "check for" part and just parsed it as "get rid
of"... <sigh> I must need a brain upgrade.
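
Taking the "check for" reading, a minimal sketch of the dict-based approach that both drops repeats and reports which numbers were repeated. It assumes the file holds one integer per line and that "0042" and "42" should count as the same number. `check_duplicates` is a hypothetical name, and `collections.Counter` (a dict subclass) does the counting.

```python
import io
from collections import Counter


def check_duplicates(lines):
    """Given lines holding one number each, return
    (unique_numbers, duplicated_numbers)."""
    # int() ignores surrounding whitespace, so "42\n" parses fine, and it
    # treats "0042" and "42" as the same number. Blank lines are skipped.
    numbers = [int(line) for line in lines if line.strip()]
    # Counter maps each number to how many times it occurred.
    counts = Counter(numbers)
    unique = list(counts)  # first-seen order on Python 3.7+
    duplicated = [n for n, c in counts.items() if c > 1]
    return unique, duplicated


if __name__ == "__main__":
    # An in-memory stand-in for the real file; an open file object works
    # the same way, since iterating over it also yields lines.
    sample = io.StringIO("12345678901234\n42\n0042\n7\n12345678901234\n")
    unique, duplicated = check_duplicates(sample)
    print(unique)      # [12345678901234, 42, 7]
    print(duplicated)  # [12345678901234, 42]
```

With a real file you would pass the file object returned by `open(...)` instead of the `StringIO`.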

To tell the truth, I don't know how fast or slow the dict would be; it
just doesn't seem that "elegant", and elegance is often (wrongly, I know)
associated with speed.
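
Rather than guessing, the speed question can be measured. A small sketch using the standard `timeit` module, assuming 5000 random integers below 10**14 (up to 14 digits) are a fair stand-in for the real file. The helper names are hypothetical, and no timings are quoted here because they depend on the machine.

```python
import random
import timeit


def make_sample(n=5000, seed=0):
    """Generate n synthetic numbers of up to 14 digits (not real data)."""
    rng = random.Random(seed)
    return [rng.randrange(10**14) for _ in range(n)]


def dedupe_with_dict(numbers):
    # dict.fromkeys keeps one key per distinct value, in first-seen order
    return list(dict.fromkeys(numbers))


if __name__ == "__main__":
    data = make_sample()
    runs = 100
    # Average the total time over several runs to smooth out noise.
    seconds = timeit.timeit(lambda: dedupe_with_dict(data), number=runs) / runs
    print(f"{len(data)} numbers deduplicated in {seconds * 1000:.3f} ms per run")
```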

Having said that, I have usually found that the simpler the script, and
the closer it stays to the Pythonic "metal" (built-in types such as dict,
which CPython implements in C), the faster it runs. For some reason, the
folks who wrote Python are a lot better at programming than I am :) 

-- 
Jim Richardson         http://www.eskimo.com/~warlock

Linux, because eventually, you grow up enough to be trusted with a fork()
