python vs. grep

Ricardo Aráoz ricaraoz at gmail.com
Thu May 8 13:11:00 EDT 2008


Anton Slesarev wrote:
> I'm trying to save my time, not CPU cycles. :)
> 
> I've got a file which I really need to parse:
> -rw-rw-r--  1 xxx  xxx  3381564736 May  7 09:29 bigfile
> 
> Here are my results:
> 
> $ time grep "python" bigfile | wc -l
>     2470
> 
> real    0m4.744s
> user    0m2.441s
> sys     0m2.307s
> 
> And the Python script:
> 
> import sys
> 
> if len(sys.argv) != 3:
>    print 'grep.py <pattern> <file>'
>    sys.exit(1)
> 
> f = open(sys.argv[2],'r')
> 
> print ''.join((line for line in f if sys.argv[1] in line)),
> 
> $ time python grep.py "python" bigfile | wc -l
>     2470
> 
> real    0m37.225s
> user    0m34.215s
> sys     0m3.009s
> 
> Second script:
> 
> import sys
> 
> if len(sys.argv) != 3:
>    print 'grepwc.py <pattern> <file>'
>    sys.exit(1)
> 
> f = open(sys.argv[2],'r',100000000)
> 
> print sum((1 for line in f if sys.argv[1] in line)),
> 
> 
> $ time python grepwc.py "python" bigfile
> 2470
> 
> real    0m39.357s
> user    0m34.410s
> sys     0m4.491s
> 
> 40 seconds versus 5. This is really sad...
> 
> That was on FreeBSD.
> 
> 
> 
> On Windows (Cygwin):
> 
> The size of bigfile is ~50 MB.
> 
> $ time grep "python" bigfile | wc -l
> 51
> 
> real    0m0.196s
> user    0m0.169s
> sys     0m0.046s
> 
> $ time python grepwc.py "python" bigfile
> 51
> 
> real    0m25.485s
> user    0m2.733s
> sys     0m0.375s
> 


All these examples read the file one line at a time, so they assume the
pattern will never span multiple lines, but it easily can. How would you
process the file with regular expressions that span multiple lines?
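
One possible answer, as a rough sketch rather than a tested recipe for the
3 GB file above: memory-map the file and let the re module scan the mapping
as a single buffer, so a match can cross line boundaries. This assumes
Python 3, a bytes pattern, a non-empty file, and enough address space to map
the whole file (in practice a 64-bit build for a file that large). The name
count_multiline_matches is made up for illustration.

```python
import mmap
import re


def count_multiline_matches(path, pattern, flags=0):
    """Count non-overlapping matches of a bytes regex over a whole file.

    Hypothetical helper: 'pattern' must be a bytes pattern, because the
    mapped file is scanned as raw bytes rather than decoded text.
    """
    regex = re.compile(pattern, flags)
    with open(path, 'rb') as f:
        # Length 0 maps the entire file. Pages are loaded on demand, so the
        # file is not copied into memory up front. Mapping an empty file
        # raises ValueError.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # finditer yields matches lazily, so a large number of hits
            # does not build a large list.
            return sum(1 for _ in regex.finditer(mm))
        finally:
            mm.close()
```

For example, count_multiline_matches('bigfile', rb'start.*?end', re.DOTALL)
would count blocks that begin with "start" and finish with "end" on a later
line; re.DOTALL is what lets "." match the newline. Whether this is any
faster than the line-by-line scripts is something you would have to measure.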







