"Robert Brewer" <fumanchu at amor.org> writes: > I'm dedup'ing a 10-million-record dataset, trying different approaches > for building indexes. The in-memory dicts are clearly faster, but I get > Memory Errors (Win2k, 512 MB RAM, 4 G virtual). Any recommendations on > other ways to build a large index without slowing down by a factor of > 25? Sort, then remove dups.