A little advice please? (Convert my boss to Python)

Alex Martelli aleax at aleax.it
Tue Apr 16 06:30:48 EDT 2002


Paul Rubin wrote:

> "Duncan Smith" <buzzard at urubu.freeserve.co.uk> writes:
>> So what I'm looking for is speed, and some advice so that I don't end up
>> trying too many alternatives.
> 
> If you have to do something like that over and over for zillions of
> huge files, you're best off writing in C and tuning carefully.

Not necessarily.  Python dictionaries are pretty amazing.  Matching
their functionality and speed in your own C code takes a lot more
than "tuning carefully".


> Regarding duplicates, maybe you can just sort the file with an
> external sort utility, so the duplicates will all be next to each
> other.  Then you don't have to mess with dicts.  I didn't examine your
> code closely enough to figure out if that makes sense, so maybe it
> doesn't.

Sorting is O(N log N).  Inserting N entries in a dictionary can be
pretty close to O(N), since entry insertion is darn close to an
amortized O(1).  Therefore, it's anything but obvious that sorting
should be a performance win.
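
To make the comparison concrete, here is a minimal sketch of the two
approaches, written for current Python.  The function names and the
sample data are made up for illustration (they are not from Duncan's
code), and the sort is done in memory rather than with an external
sort utility, so treat it as a rough model of the trade-off, not as a
benchmark of anybody's real program.

```python
import timeit
from itertools import groupby


def count_with_dict(lines):
    """Count occurrences with one dictionary update per line.

    Each update is amortized O(1) on average, so the whole pass is
    expected to be close to O(N).
    """
    counts = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    return counts


def count_with_sort(lines):
    """Count occurrences by sorting first, so duplicates are adjacent.

    The sort dominates the cost at O(N log N); the grouping pass
    afterwards is linear.
    """
    return {key: len(list(group)) for key, group in groupby(sorted(lines))}


def compare_timings(lines, repeat=5):
    """Return (dict_seconds, sort_seconds) for `repeat` runs of each.

    Hypothetical helper: actual numbers depend on the data and machine.
    """
    dict_time = timeit.timeit(lambda: count_with_dict(lines), number=repeat)
    sort_time = timeit.timeit(lambda: count_with_sort(lines), number=repeat)
    return dict_time, sort_time


# Tiny made-up input with duplicates.
sample = ["b", "a", "b", "c", "a", "b"]
```

Both functions return the same counts for the same input; only the
amount of work differs.  Calling compare_timings() on a realistic
data set is the only way to find out which one wins in practice.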


Alex



