How to remove subset from a file efficiently?

Raymond Hettinger python at rcn.com
Sat Jan 14 07:38:01 EST 2006


> >     b = set(file('/home/sajid/python/wip/stc/2/CBR0000333'))
> >
> > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP0000333')))
> >
> >     --
> >     $ time ./cleanup_ray.py
> >
> >     real    0m5.451s
> >     user    0m4.496s
> >     sys     0m0.428s
> >
> > (-: Damn!  That saves a bit more time!  Bravo!
> >

[bonono at gmail.com]
> Have you tried the explicit loop variant with psyco?  My experience is
> that psyco is pretty good at optimizing for loops, which usually results
> in faster code than even the built-in map/filter variant.
>
> Though it would just be a 1 or 2 sec difference (given what you already
> have), so it may not be important but could be fun.

The code is pretty tight and is now most likely I/O bound.  If so,
further speed-ups will be hard to come by (even with psyco).  The four
principal steps of reading, membership testing, filtering, and writing
are all C-coded methods that are linked together directly, with no
interpreter loop overhead or method lookups.  Hard to beat.
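
For anyone reading this on Python 3, here is a minimal sketch of the same
pipeline.  The function name remove_subset and its three path parameters
are hypothetical, made up for illustration; itertools.filterfalse is the
Python 3 name for ifilterfalse, and open() replaces file().  Treat it as
a sketch under those assumptions, not a tuned implementation.

```python
import itertools


def remove_subset(source_path, exclude_path, dest_path):
    """Copy source_path to dest_path, skipping lines found in exclude_path.

    Hypothetical helper: the name and parameters are illustrative only.
    Lines are compared whole, including the trailing newline, so a final
    line without a newline will not match a copy that has one.
    """
    # Step 1: read every line of the exclusion file into a set.
    with open(exclude_path) as exclude_file:
        unwanted = set(exclude_file)

    with open(source_path) as source_file, open(dest_path, "w") as dest_file:
        # Steps 2-4: the file iterator feeds filterfalse, which calls the
        # set's bound __contains__ for each line, and writelines consumes
        # the surviving lines.  No Python-level loop body runs per line.
        dest_file.writelines(
            itertools.filterfalse(unwanted.__contains__, source_file)
        )
```

Passing the bound method unwanted.__contains__ is what keeps the per-line
work out of the interpreter loop: the lookup happens once, up front, and
filterfalse calls it directly for each line.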



