How to remove subset from a file efficiently?

Raymond Hettinger python at rcn.com
Fri Jan 13 01:29:22 EST 2006


AJL wrote:
> How fast does this run?
>
> a = set(file('PSP0000320.dat'))
> b = set(file('CBR0000319.dat'))
> file('PSP-CBR.dat', 'w').writelines(a.difference(b))

Turning PSP into a set takes extra time, consumes unnecessary memory,
eliminates duplicates (possibly a bad thing), and loses the original
input ordering (probably a bad thing).
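
The duplicate and ordering points are easy to see on a small in-memory list. This is only a sketch in Python 3; the sample lines are made up and stand in for the contents of the two files.

```python
# Hypothetical sample data standing in for the lines of the two files.
psp_lines = ["c\n", "a\n", "b\n", "a\n", "d\n"]
cbr_lines = ["b\n"]

exclude = set(cbr_lines)

# Set difference: duplicates collapse and iteration order is arbitrary.
via_set = set(psp_lines).difference(exclude)

# Filtering the original sequence keeps both duplicates and input order.
via_filter = [line for line in psp_lines if line not in exclude]
```

Here via_filter keeps both copies of "a\n" in their original positions, while via_set holds only three distinct lines in no guaranteed order.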

To jam the action into a couple of lines, try this:

import itertools
b = set(file('CBR0000319.dat'))
file('PSP-CBR.dat', 'w').writelines(
    itertools.ifilterfalse(b.__contains__, file('PSP0000320.dat')))
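
For anyone on Python 3, where file() is spelled open() and ifilterfalse has become itertools.filterfalse, the same idea might look roughly like the sketch below. The function name remove_subset and its parameters are hypothetical, chosen only for illustration, and the with blocks are an addition to make sure the files get closed.

```python
from itertools import filterfalse


def remove_subset(source_path, exclude_path, output_path):
    """Copy lines from source_path to output_path, skipping any line
    that also appears in exclude_path (hypothetical helper)."""
    # Only the exclusion file is loaded into memory, as a set of lines.
    with open(exclude_path) as exclude_file:
        exclude = set(exclude_file)

    # The source file is streamed line by line, so duplicates and the
    # original ordering are preserved in the output.
    with open(source_path) as source, open(output_path, "w") as output:
        output.writelines(filterfalse(exclude.__contains__, source))
```

Called as remove_subset('PSP0000320.dat', 'CBR0000319.dat', 'PSP-CBR.dat'), this should produce the same output as the two-liner. Lines are compared with their trailing newline included, so a final line that lacks one will not match an otherwise identical line that has it.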

Raymond



