Python slow for filter scripts

Tue Oct 28 18:25:41 EST 2003

Peter Mutsaers wrote:

> Hello,
> 
> Up to now I mostly wrote simple filter scripts in Perl, e.g.
> 
> while(<>) {
>   # do something with $_, regexp matching, replacements etc.
>   print;
> }
> 
> Now I learned Python and like it much more as a language.
> 
> However, I tried the most simple while(<>) {print;} in Perl versus
> Python, just a copy from stdin to stdout, to see how fast the basic
> filter can be.
> 
> I found that on my (linux) PC, the Python version was 4 times slower.
> 
> Is that normal, does it disqualify Python for simple filter scripts?

It really depends on what you're doing.  I tried the following:

cio.pl:
while(<>) {
        print;
}

cio.py:
import sys
import fileinput
import shutil

emit = sys.stdout.write

def io_1(emit=emit):
    for line in sys.stdin: emit(line)

def io_2(emit=emit):
    for line in fileinput.input(): emit(line)

def io_3():
    shutil.copyfileobj(sys.stdin, sys.stdout)

if __name__=='__main__':
    import __main__

    def usage():
        sys.stdout = sys.stderr
        print "Usage: %s N" % sys.argv[0]
        print "N indicates what stdin->stdout copy function to run"
        ns = [x[3:] for x in dir(__main__) if x[:3]=='io_']
        ns.sort()
        print "valid values for N:", ns
        print "invalid args:", sys.argv[1:]
        sys.exit()
    if len(sys.argv) != 2: usage()
    func = getattr(__main__, 'io_'+sys.argv[1], None)
    if func is None: usage()
    sys.argv.pop()  # drop N so fileinput.input() in io_2 reads stdin, not a file named N
    func()

and I'm specifically reading the King James Bible (an easily
available text, so you can reproduce my results!) and writing
either /dev/null or a tempfile on my own Linux box.  I see...:

[alex at lancelot bo]$ ls -l /x/kjv.txt
-rw-rw-r--    1 alex     alex      4404445 Mar 29  2003 /x/kjv.txt

[alex at lancelot bo]$ time perl cio.pl </x/kjv.txt >/dev/null
0.07user 0.01system 0:00.11elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

[alex at lancelot bo]$ time perl cio.pl </x/kjv.txt >/tmp/kjv
0.04user 0.06system 0:00.19elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (330major+61minor)pagefaults 0swaps

So, Perl is taking 80 to 100 milliseconds of CPU time (elapsed time
mostly depends on what else is going on in the machine, and thus
on the %CPU available, of course).  Let's see Python now:

[alex at lancelot bo]$ time python cio.py 2 </x/kjv.txt >/dev/null
0.27user 0.00system 0:00.30elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

[alex at lancelot bo]$ time python cio.py 2 </x/kjv.txt >/tmp/kjv
0.30user 0.01system 0:00.62elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (448major+278minor)pagefaults 0swaps

Python with fileinput IS slower -- 270 to 300 msecs CPU, about a
factor of 3.  However, that IS mostly fileinput's issue.  Videat:

[alex at lancelot bo]$ time python cio.py 1 </x/kjv.txt >/dev/null
0.07user 0.03system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

[alex at lancelot bo]$ time python cio.py 1 </x/kjv.txt >/tmp/kjv
0.06user 0.07system 0:00.29elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+276minor)pagefaults 0swaps

A plain line-by-line copy takes 100-130 msec -- a bit slower than Perl,
but nothing major.  Can we do better yet...?

[alex at lancelot bo]$ time python cio.py 3 </x/kjv.txt >/dev/null
0.03user 0.02system 0:00.10elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

[alex at lancelot bo]$ time python cio.py 3 </x/kjv.txt >/tmp/kjv
0.02user 0.06system 0:00.16elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (447major+275minor)pagefaults 0swaps

...sure!  Bulk copy, 50-80 msec, FASTER than Perl.  Of course, I'm sure
you can program it faster in Perl, too.  After all, cat takes 20-60
msec CPU, so there's clearly space to do better.

What kind of files do your scripts most often process?  For me, a
textfile of 4.4 MB is larger than typical.  How much do those few
tens of milliseconds' difference matter?  You know your apps, I
don't, but I _would_ find it rather strange if they "disqualified"
either language.  Anything below about a second is typically fine
with me, so even the slowest of these programs could still handle
files of about 6 MB, assuming the 50% CPU it got is pretty typical,
while still taking no more than about 1 second's elapsed time.
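
To check that back-of-envelope figure against the timings above, here is
a small sketch.  It is written for a current Python 3 interpreter (unlike
cio.py above, which uses the older print statement), and the names runs,
FILE_BYTES and mb_per_elapsed_second are made up for illustration.  The
numbers are copied from the runs that write to /tmp/kjv.  For the slowest
run it gives roughly 7 MB per elapsed second, the same ballpark as the
estimate above.

```python
FILE_BYTES = 4404445  # size of /x/kjv.txt, from the ls -l output above

# (user + system CPU seconds, elapsed seconds) for the runs writing to /tmp/kjv
runs = {
    "perl cio.pl": (0.04 + 0.06, 0.19),
    "python cio.py 1": (0.06 + 0.07, 0.29),
    "python cio.py 2": (0.30 + 0.01, 0.62),
    "python cio.py 3": (0.02 + 0.06, 0.16),
}


def mb_per_elapsed_second(elapsed, nbytes=FILE_BYTES):
    """Megabytes copied per second of wall-clock time."""
    return nbytes / elapsed / 1e6


# The slowest run is the one with the largest elapsed time.
slowest = max(runs, key=lambda name: runs[name][1])

for name, (cpu, elapsed) in runs.items():
    rate = mb_per_elapsed_second(elapsed)
    print(f"{name:16s} cpu={cpu * 1000:4.0f} ms  {rate:5.1f} MB per elapsed second")
print("slowest:", slowest)
```

This only rescales the measurements already shown; on another machine,
or with a different load, the elapsed times will of course differ.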

Of course, you can easily edit my script and play with many other
I/O methods, until you find one that best suits you.  Personally,
I tend to use fileinput just because it's so handy (like perl's <>),
not caring all that much about those "wasted" milliseconds...:-)
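
As one possible starting point for that kind of experiment, here is a
sketch of three copy functions for a current Python 3 interpreter.  Only
the line-by-line and shutil.copyfileobj variants correspond to io_1 and
io_3 above; copy_chunks, its 64 KB bufsize default, and all three
function names are assumptions added for illustration, not something
that was timed here.

```python
import shutil


def copy_lines(src, dst):
    """Line-by-line copy, the analogue of io_1."""
    write = dst.write  # bind the method once, as cio.py does with emit
    for line in src:
        write(line)


def copy_chunks(src, dst, bufsize=64 * 1024):
    """Fixed-size block copy; bufsize is an arbitrary illustrative choice."""
    while True:
        block = src.read(bufsize)
        if not block:  # empty read means end of input
            break
        dst.write(block)


def copy_bulk(src, dst):
    """Bulk copy via shutil.copyfileobj, the analogue of io_3."""
    shutil.copyfileobj(src, dst)
```

To use one of these as a filter, call it on the standard streams, e.g.
copy_chunks(sys.stdin.buffer, sys.stdout.buffer) after import sys.  In
Python 3 the .buffer attributes are the underlying binary streams, so no
text decoding is involved; passing sys.stdin and sys.stdout instead copies
decoded text.  Which variant is fastest would have to be measured, the
same way as above.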

Alex