Python slow for filter scripts
Alex Martelli
aleax at aleax.it
Tue Oct 28 18:25:41 EST 2003
Peter Mutsaers wrote:
> Hello,
>
> Up to now I mostly wrote simple filter scripts in Perl, e.g.
>
> while(<>) {
> # do something with $_, regexp matching, replacements etc.
> print;
> }
>
> Now I learned Python and like it much more as a language.
>
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
>
> I found that on my (linux) PC, the Python version was 4 times slower.
>
> Is that normal, does it disqualify Python for simple filter scripts?
It really depends on what you're doing. I tried the following:
cio.pl:
while(<>) {
    print;
}
cio.py:
import sys
import fileinput
import shutil

emit = sys.stdout.write

def io_1(emit=emit):
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    for line in fileinput.input(): emit(line)

def io_3():
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        sys.stdout = sys.stderr
        print "Usage: %s N" % sys.argv[0]
        print "N indicates what stdin->stdout copy function to run"
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print "valid values for N:", ns
        print "invalid args:", sys.argv[1:]
        sys.exit()

    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    sys.argv.pop()
    func()
and I'm specifically reading the King James Bible (an easily
available text, so you can reproduce my results!) and writing
to either /dev/null or a tempfile on my own Linux box. I see...:
[alex at lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r-- 1 alex alex 4404445 Mar 29 2003 /x/kjv.txt
[alex at lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps
[alex at lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps
So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
depends mostly on what else is going on in the machine, and thus
on the %CPU available, of course). Let's see Python now:
[alex at lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps
[alex at lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps
Python with fileinput IS slower -- 270 to 310 msecs CPU, about a
factor of 3. However, that IS mostly fileinput's issue. Videat:
[alex at lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps
[alex at lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps
A plain line-by-line copy takes 100-130 msec -- a bit slower than Perl,
but nothing major. Can we do better yet...?
[alex at lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps
[alex at lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps
...sure! Bulk copy, 50-80 msec, FASTER than Perl. Of course, I'm sure
you can program it faster in Perl, too. After all, cat takes 20-60
msec CPU, so there's clearly room to do better.
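To make "bulk copy" concrete: as far as I know, shutil.copyfileobj
boils down to a loop that reads a block and writes it straight back
out, with no per-line work at all. Here is a minimal sketch of that
idea. The name copy_in_chunks and the 16 KB block size are my own
picks for illustration, not necessarily what shutil uses in your
Python version.

```python
def copy_in_chunks(src, dst, bufsize=16 * 1024):
    """Copy src to dst in fixed-size blocks; return how much was copied."""
    copied = 0
    while True:
        block = src.read(bufsize)
        if not block:          # an empty read means end of file
            break
        dst.write(block)
        copied += len(block)
    return copied
```

Calling copy_in_chunks(sys.stdin, sys.stdout) should behave like io_3
in the script; the number of Python-level loop iterations depends on
file size divided by bufsize rather than on the number of lines.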
What kind of files do your scripts most often process? For me, a
textfile of 4.4 MB is larger than typical. How much do those few
tens of milliseconds' difference matter? You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language. Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second's elapsed time.
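The back-of-envelope arithmetic behind that estimate can be sketched
as follows. It assumes throughput scales linearly with file size and
that the machine gives the script the same share of CPU as in the
measured run; both are assumptions, and largest_file_within is just a
name made up for this illustration.

```python
def largest_file_within(budget_secs, sample_bytes, sample_elapsed_secs):
    """Scale one measured run linearly to a time budget (bytes that fit)."""
    bytes_per_sec = sample_bytes / float(sample_elapsed_secs)
    return bytes_per_sec * budget_secs


# Slowest run above: the fileinput copy to /tmp/kjv,
# 4404445 bytes in 0:00.62 elapsed at about half the CPU.
limit = largest_file_within(1.0, 4404445, 0.62)
```

With those figures limit comes out at roughly 7.1 million bytes, so
"about 6 MB" leaves a little margin.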
Of course, you can easily edit my script and play with many other
I/O methods, until you find one that best suits you. Personally,
I tend to use fileinput just because it's so handy (like perl's <>),
not caring all that much about those "wasted" milliseconds...:-)
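As one example of such an experiment, here is a sketch of a fourth
copy function that reads whole lines in batches instead of one at a
time. The name io_4, the optional infile/outfile parameters and the
64 KB hint are assumptions made for illustration; it is untimed, so
no claim about where it lands relative to the other three.

```python
import sys


def io_4(infile=None, outfile=None, hint=64 * 1024):
    """Copy by reading batches of whole lines, roughly `hint` bytes each."""
    if infile is None:
        infile = sys.stdin
    if outfile is None:
        outfile = sys.stdout
    while True:
        lines = infile.readlines(hint)   # a list of lines, [] at end of file
        if not lines:
            break
        outfile.writelines(lines)
```

Since the driver looks functions up by their io_ prefix, pasting this
next to io_3 in cio.py should be enough to run it as "python cio.py 4".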
Alex