number of different lines in a file

Ben Stroud bens_dev_lists at waypointdatasolutions.com
Fri May 19 11:22:01 EDT 2006


>
>It never occurred to me to use the Python dict/set approach.  Now I
>wonder if it would've worked better somehow.  Of course my file was
>26,000 X larger than the one in this problem, and definitely would
>not fit in memory.  I suspect that there were as many as a million
>duplicates for some messages in that file.  Would the generator
>version above have helped me out, I wonder?
>

You could use a dbm file, which provides an external (on-disk) 
dict/set interface through Python bindings.  Because the keys are kept 
on disk rather than in a Python dict, this would use less memory.

1.  Add records to dbm as keys
2.  dbm (if configured correctly) will only keep unique keys
3.  Count keys
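
The three steps above might look roughly like this.  It is only a 
sketch, assuming Python 3's standard dbm module (anydbm in older 
Pythons); the function name count_distinct_lines and the file paths 
are made up for illustration.  Instead of counting the keys in a 
separate pass at the end, it counts each key the first time it is added, 
which gives the same number.

```python
import dbm
import os
import tempfile


def count_distinct_lines(path, db_path):
    """Count distinct lines in `path`, using a dbm file as an on-disk set.

    Both names are hypothetical; `db_path` is where the dbm file(s) go.
    """
    distinct = 0
    # 'n' always creates a new, empty database at db_path.
    with dbm.open(db_path, "n") as db:
        with open(path, "rb") as infile:
            for line in infile:
                # Normalise the line ending so a final line without a
                # newline matches earlier copies, and the key is never empty.
                key = line.rstrip(b"\r\n") + b"\n"
                if key not in db:
                    db[key] = b"1"  # the value is a dummy; only keys matter
                    distinct += 1
    return distinct


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "wb") as f:
            f.write(b"a\nb\na\nc\nb\n")
        print(count_distinct_lines(sample, os.path.join(tmp, "seen")))
```

Only one line is held in memory at a time; the set of seen lines 
lives in the dbm file.  Which dbm backend gets used, and how fast it is, 
depends on what is installed on the machine.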

Cheers,
Ben



