Orders of magnitude
Buck Nuggets
bucknuggets at yahoo.com
Mon Mar 29 18:39:50 EST 2004
"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.38.1080542935.20120.python-list at python.org>...
> I'm dedup'ing a 10-million-record dataset, trying different approaches
> for building indexes. The in-memory dicts are clearly faster, but I get
> Memory Errors (Win2k, 512 MB RAM, 4 G virtual). Any recommendations on
> other ways to build a large index without slowing down by a factor of
> 25?
In case you are interested in alternative approaches, here's how I
typically do this:
step 1: sort the file using a separate sort utility (unix sort, cygwin
sort, etc)
step 2: have a python program read in rows,
compare each row to the prior,
write out only one row for each set of duplicates
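
A rough sketch of step 2, assuming the whole line is the dedup key and the
input has already been sorted so that identical rows sit next to each other.
The function name dedup_sorted and the sample data are made up for
illustration. Only one previous row is held in memory at a time, so memory
use does not grow with the size of the file.

```python
import io

def dedup_sorted(lines, out):
    """Write each distinct line once.

    Assumes equal lines are adjacent (i.e. the input is sorted) and that
    the entire line is the key. Returns the number of lines written.
    """
    previous = None
    kept = 0
    for line in lines:
        # A line is a duplicate only if it equals the one just before it.
        if line != previous:
            out.write(line)
            kept += 1
            previous = line
    return kept

# In-memory stand-ins for the sorted input file and the output file.
sorted_input = io.StringIO("apple\napple\nbanana\ncherry\ncherry\n")
result = io.StringIO()
count = dedup_sorted(sorted_input, result)
```

With real files you would pass open file objects instead of the StringIO
stand-ins. If only part of each row is the key, compare that part rather
than the whole line, and sort on the same key in step 1.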
ks