python vs. grep
Ricardo Aráoz
ricaraoz at gmail.com
Thu May 8 13:11:00 EDT 2008
Anton Slesarev wrote:
> I'm trying to save my time, not CPU cycles :)
>
> I've got a file which I really need to parse:
> -rw-rw-r-- 1 xxx xxx 3381564736 May 7 09:29 bigfile
>
> These are my results:
>
> $ time grep "python" bigfile | wc -l
> 2470
>
> real 0m4.744s
> user 0m2.441s
> sys 0m2.307s
>
> And the Python scripts:
>
> import sys
>
> if len(sys.argv) != 3:
>     print 'grep.py <pattern> <file>'
>     sys.exit(1)
>
> f = open(sys.argv[2],'r')
>
> print ''.join((line for line in f if sys.argv[1] in line)),
>
> $ time python grep.py "python" bigfile | wc -l
> 2470
>
> real 0m37.225s
> user 0m34.215s
> sys 0m3.009s
>
> Second script:
>
> import sys
>
> if len(sys.argv) != 3:
>     print 'grepwc.py <pattern> <file>'
>     sys.exit(1)
>
> f = open(sys.argv[2],'r',100000000)
>
> print sum((1 for line in f if sys.argv[1] in line)),
>
>
> time python grepwc.py "python" bigfile
> 2470
>
> real 0m39.357s
> user 0m34.410s
> sys 0m4.491s
>
> 40 seconds versus 5. This is really sad...
>
> That was on FreeBSD.
>
>
>
> On Windows (Cygwin):
>
> Size of bigfile is ~50 MB
>
> $ time grep "python" bigfile | wc -l
> 51
>
> real 0m0.196s
> user 0m0.169s
> sys 0m0.046s
>
> $ time python grepwc.py "python" bigfile
> 51
>
> real 0m25.485s
> user 0m2.733s
> sys 0m0.375s
>
All these examples assume the pattern will not span multiple lines,
but a regular expression can easily do so. How would you process the
file with regular expressions that match across line boundaries?
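
One possible approach, sketched here for a current Python 3 rather than
the Python 2 used in the scripts above, is to memory-map the file and run
the compiled pattern over the whole mapping, so line boundaries stop
mattering. The name count_multiline_matches is made up for illustration,
the pattern has to be a bytes pattern, and I'm assuming the file is
non-empty and fits in the process's address space (a 3 GB file will not
on a 32-bit build). I have not timed this against grep.

```python
import mmap
import re


def count_multiline_matches(path, pattern):
    """Count matches of a bytes regex that may span line boundaries.

    Hypothetical helper for illustration; `pattern` must be bytes.
    """
    # DOTALL lets '.' match newlines, so a match can cross lines.
    regex = re.compile(pattern, re.DOTALL)
    with open(path, 'rb') as f:
        # Map the file read-only; the OS pages data in on demand,
        # so the file is never read into a Python string in one go.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return sum(1 for _ in regex.finditer(mm))
```

For instance, count_multiline_matches('bigfile', rb'python\s+grep') would
also count occurrences where "python" ends one line and "grep" starts the
next, since \s matches the newline.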