Re: sets and subsets

François Pinard pinard at iro.umontreal.ca
Tue Feb 17 16:30:39 EST 2004


> >from sets import Set
> >file('pruned_ips.txt', 'w').writelines(
> >        Set(file('ips.txt')) - Set(file('excluded_ips.txt')))
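
Python 3 no longer has the sets module: it was deprecated in 2.6 and removed in 3.0. Here is a minimal sketch of the same in-memory approach using the built-in set type. The function names are hypothetical. As in the quoted one-liner, lines are compared verbatim, trailing newline included.

```python
def prune_with_sets(ip_lines, excluded_lines):
    """Return the lines of ip_lines that do not appear in excluded_lines.

    Both inputs are read entirely into sets, so memory grows with the size
    of both files.  Duplicates collapse and order is arbitrary, as with
    the old sets.Set.
    """
    return set(ip_lines) - set(excluded_lines)


def prune_files(ips_path, excluded_path, out_path):
    # Hypothetical wrapper: open both inputs, write the surviving lines.
    with open(ips_path) as ips, open(excluded_path) as excluded:
        kept = prune_with_sets(ips, excluded)
    with open(out_path, "w") as out:
        out.writelines(kept)
```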

[Dave K]
> file('pruned_ips.txt', 'w').writelines([ip for ip in file('ips.txt')
>                               if ip not in file('excluded_ips.txt')])

The Set solution above swallows both files into memory, but executes
rather quickly.  The list comprehension solution uses much less memory,
but since the second file is read in full for each line of the first
file, it may become prohibitive as soon as the files are not small.  For
very big files, both solutions are inadequate anyway: one should likely
sort both files on disk, then read the two sorted results simultaneously,
in a single merging pass.
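
The sort-then-merge approach can be sketched in Python 3 as follows. This sketch assumes both files have already been sorted on disk in the same order. One way to do that is LC_ALL=C sort, which gives a byte ordering that agrees with Python's comparison of ASCII strings. The names sorted_difference and prune_sorted_files are hypothetical. Only one line of each file is held in memory at a time.

```python
def sorted_difference(sorted_ips, sorted_excluded):
    """Yield lines of sorted_ips that are absent from sorted_excluded.

    Assumes both iterables are already sorted in the same ascending order.
    """
    excluded = iter(sorted_excluded)
    current = next(excluded, None)  # None means the exclusions are exhausted
    for ip in sorted_ips:
        # Advance the exclusion stream until it catches up with ip.
        while current is not None and current < ip:
            current = next(excluded, None)
        # Keep ip unless it matches the current exclusion (None never matches).
        if current != ip:
            yield ip


def prune_sorted_files(ips_path, excluded_path, out_path):
    # Hypothetical wrapper; both input files must already be sorted.
    with open(ips_path) as ips, open(excluded_path) as excluded:
        with open(out_path, "w") as out:
            out.writelines(sorted_difference(ips, excluded))
```

Unlike the set version, this keeps duplicate lines of ips.txt that are not excluded, and it writes them in sorted order.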

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard

More information about the Python-list mailing list