How to remove subset from a file efficiently?

fynali iladijas at gmail.com
Sat Jan 14 01:16:11 EST 2006


    $ time fgrep -x -v -f CBR0000333 PSP0000333 > PSP-CBR.dat.fgrep

    real    0m31.551s
    user    0m16.841s
    sys     0m0.912s

    --
    $ time ./cleanup.py

    real    0m6.080s
    user    0m4.836s
    sys     0m0.408s

    --
    $ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python
      3872421 PSP-CBR.dat.fgrep
      3872421 PSP-CBR.dat.python

Fantastic! At any rate, the time is down from my initial ~4 min.

Thank you, Chris.  The fgrep approach is clean and to the point, and it
is one more reason to love the *nix approach to handling everyday problems.

Fredrik's set|dict approach in Python above gives me one more reason to
love Python.  And it is indeed fast: about 5x faster than fgrep on this
data (roughly 6 s versus 32 s of real time).
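
For readers without the earlier messages at hand, here is a minimal
sketch of what a set-based cleanup might look like.  It is not the
actual cleanup.py timed above (that script is not shown in this post);
the function name remove_subset and its parameters are made up for
illustration, and it is written for current Python rather than the
2006 interpreter.  It aims to behave like fgrep -x -v -f: keep every
line of the main file that does not exactly match a line of the subset
file.

```python
def remove_subset(main_path, subset_path, out_path):
    """Write the lines of main_path that do not appear in subset_path.

    Hypothetical helper, not the original cleanup.py.  Lines are compared
    whole (without their trailing newline), like fgrep -x.
    """
    # Load the lines to drop into a set so each lookup is O(1) on average.
    with open(subset_path) as subset_file:
        unwanted = {line.rstrip("\n") for line in subset_file}

    kept = 0
    # Stream the main file so only the subset has to fit in memory.
    with open(main_path) as main_file, open(out_path, "w") as out_file:
        for line in main_file:
            if line.rstrip("\n") not in unwanted:
                out_file.write(line)
                kept += 1
    return kept
```

With the file names from the timings above, the call would be
remove_subset("PSP0000333", "CBR0000333", "PSP-CBR.dat.python").  The
set holds only the lines to be removed, so memory use depends on the
size of the subset file, not the main one.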

Thank you all for all your help.

-- 
fynali



