removing duplication from a huge list.

odeits odeits at gmail.com
Fri Feb 27 07:49:39 EST 2009


On Feb 27, 1:18 am, Stefan Behnel <stefan... at behnel.de> wrote:
> bearophileH... at lycos.com wrote:
> > odeits:
> >> How big a list are we talking about? If the list is so big that it
> >> cannot all fit in memory at the same time, this approach won't
> >> work, e.g. when removing duplicate lines from a very large file.
>
> > If the data are lines of a file, and keeping the original order isn't
> > important, then the first thing to try may be the Unix (or Cygwin on
> > Windows) commands sort and uniq.
>
> or preferably "sort -u", if that's supported.
>
> Stefan

Although this is true, that is more of an answer to the question "How
do I remove duplicates from a huge list in Unix?"
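
For a Python-side answer, here is a rough sketch rather than a definitive
solution. It streams the lines one at a time, keeps the original order, and
remembers only a fixed-size digest of each distinct line instead of the line
itself. The names unique_lines and dedupe_file are made up for illustration,
and the sketch assumes that the set of digests (20 bytes each, plus set
overhead) fits in memory even when the file does not. If it doesn't, an
external sort is still the way to go.

```python
import hashlib


def unique_lines(lines):
    """Yield each line the first time it is seen, preserving input order.

    Only a SHA-1 digest of each distinct line is kept, so memory use grows
    with the number of distinct lines, not with their total length.
    Assumption: a digest collision is unlikely enough to ignore.
    """
    seen = set()
    for line in lines:
        digest = hashlib.sha1(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated lines (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Iterating the file object reads one line at a time, so the whole
        # file is never held in memory.
        dst.writelines(unique_lines(src))
```

Something like dedupe_file("input.txt", "output.txt") would then write the
deduplicated copy; both file names are placeholders.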
