Orders of magnitude

Paul Rubin http
Mon Mar 29 02:53:08 EST 2004


"Robert Brewer" <fumanchu at amor.org> writes:
> I'm dedup'ing a 10-million-record dataset, trying different approaches
> for building indexes. The in-memory dicts are clearly faster, but I get
> Memory Errors (Win2k, 512 MB RAM, 4 G virtual). Any recommendations on
> other ways to build a large index without slowing down by a factor of
> 25?

Sort, then remove dups.
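
A minimal sketch of what that could look like when the data doesn't fit in RAM, assuming one record per line with no embedded newlines. The names dedup_sorted and chunk_size are made up for illustration, and this is an external merge sort I'm filling in, not something spelled out above: sort manageable chunks in memory, spill each to a temp file, then merge the sorted files and drop adjacent duplicates.

```python
import heapq
import itertools
import os
import tempfile


def _spill_sorted(chunk):
    """Write one already-sorted chunk to a temp file and rewind it."""
    tmp = tempfile.NamedTemporaryFile(
        mode="w+", encoding="utf-8", newline="\n", delete=False
    )
    tmp.writelines(chunk)
    tmp.seek(0)
    return tmp


def dedup_sorted(records, chunk_size=100_000):
    """Yield each distinct record once, in sorted order.

    records: iterable of strings, one record each (hypothetical input shape).
    chunk_size: how many records to sort in memory at a time (tune to RAM).
    """
    chunks = []
    try:
        it = iter(records)
        while True:
            # Normalize to exactly one trailing newline *before* sorting, so
            # the order on disk matches the order heapq.merge compares by.
            chunk = sorted(
                line.rstrip("\n") + "\n"
                for line in itertools.islice(it, chunk_size)
            )
            if not chunk:
                break
            chunks.append(_spill_sorted(chunk))

        # Merge the sorted runs lazily; equal records come out adjacent,
        # so groupby collapses each run of duplicates to a single key.
        merged = heapq.merge(*chunks)
        for key, _group in itertools.groupby(merged):
            yield key[:-1]  # strip the newline added above
    finally:
        for tmp in chunks:
            tmp.close()
            os.unlink(tmp.name)
```

Memory use is bounded by chunk_size plus one buffered line per temp file, and the work is sequential I/O rather than random lookups. If the platform's sort utility is available, sorting the file with it and then dropping adjacent duplicates is the same idea with less code.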


