Orders of magnitude

PF peufeu at free.fr
Mon Mar 29 04:02:38 EST 2004



	It all boils down to how much space your keys take.
	When you look for dupes, you only need to hold the keys in memory, not 
the data (it'll be a lot faster this way).

	I'd say create a bsddb with btree sort to hold all your keys. Should take 
about 20 minutes to fill it. Then scan it in sorted key order, and 
duplicates will appear next to each other.
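
	A rough sketch of the idea, not a tuned implementation. bsddb is no 
longer part of the standard library in Python 3, so this swaps in sqlite3 
(whose indexes are also B-trees) as a stand-in for the btree store. The 
function name find_duplicate_keys and the table and index names are made up 
for illustration, and it assumes the keys are strings.

```python
import sqlite3


def find_duplicate_keys(keys, db_path=":memory:"):
    """Return a list of (key, count) for every key seen more than once.

    Only the keys are stored, never the data they belong to.
    Pass a fresh file path as db_path if the keys do not fit in RAM
    (assumption: the file does not already contain a "keys" table).
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE keys (k TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO keys (k) VALUES (?)", ((k,) for k in keys)
        )
        # Build the B-tree index after loading; usually cheaper than
        # maintaining it on every insert.
        conn.execute("CREATE INDEX keys_k ON keys (k)")
        conn.commit()

        dupes = []
        previous, count = None, 0
        # Scan in sorted key order: equal keys come out next to each other.
        for (k,) in conn.execute("SELECT k FROM keys ORDER BY k"):
            if k == previous:
                count += 1
            else:
                if count > 1:
                    dupes.append((previous, count))
                previous, count = k, 1
        if count > 1:  # flush the final run
            dupes.append((previous, count))
        return dupes
    finally:
        conn.close()
```

	Usage would look like find_duplicate_keys(row_key(r) for r in rows), 
where row_key is whatever (hypothetical) function extracts the key from one 
of your records. Once you know which keys are duplicated, you can go back 
and fetch the full data for just those.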


