I could use some help making this Python code run faster using only Python code.

George Sakkis george.sakkis at gmail.com
Fri Sep 21 02:46:05 EDT 2007


On Sep 20, 7:13 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
> On Sep 20, 5:46 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>
>
>
> > On Sep 20, 10:59 pm, Python Maniac <raych... at hotmail.com> wrote:
>
> > > I am new to Python; however, I would like some feedback from those
> > > who know more about Python than I do at this time.
>
> > > def scrambleLine(line):
> > >     s = ''
> > >     for c in line:
> > >         s += chr(ord(c) | 0x80)
> > >     return s
>
> > > def descrambleLine(line):
> > >     s = ''
> > >     for c in line:
> > >         s += chr(ord(c) & 0x7f)
> > >     return s
> > > ...
>
> > Well, scrambleLine will remove line-endings, so when you're
> > descrambling you'll be processing the entire file at once. This is
> > particularly bad because of the way your functions work, adding a
> > character at a time to s.
>
> > Probably your easiest bet is to iterate over the file using read(N)
> > for some small N rather than doing a line at a time. Something like:
>
> > process_bytes = (descrambleLine, scrambleLine)[action]
> > while 1:
> >     r = f.read(16)
> >     if not r: break
> >     ff.write(process_bytes(r))
>
> > In general, rather than building strings by starting with an empty
> > string and repeatedly adding to it, you should use ''.join(...)
>
> > For instance...
> > def descrambleLine(line):
> >   return ''.join(chr(ord(c) & 0x7f) for c in line)
>
> > def scrambleLine(line):
> >   return ''.join(chr(ord(c) | 0x80) for c in line)
>
> > It's less code, more readable and faster!
>
> I would have thought that also from what I've heard here.
>
> def scrambleLine(line):
>     s = ''
>     for c in line:
>         s += chr(ord(c) | 0x80)
>     return s
>
> def scrambleLine1(line):
>     return ''.join([chr(ord(c) | 0x80) for c in line])
>
> if __name__=='__main__':
>     from timeit import Timer
>     t = Timer("scrambleLine('abcdefghijklmnopqrstuvwxyz')",
>               "from __main__ import scrambleLine")
>     print t.timeit()
>
> ##  scrambleLine
> ##  13.0013366039
> ##  12.9461998318
> ##
> ##  scrambleLine1
> ##  14.4514098748
> ##  14.3594400695
>
> How come it's not? Then I noticed you don't have brackets in
> the join statement. So I tried without them and got
>
> ##  17.6010847978
> ##  17.6111472418
>
> Am I doing something wrong?


It has to do with the input string length; try multiplying it by 10 or
100. Below is a more complete benchmark; for largish strings, the imap
version is the fastest among those using the original algorithm. Of
course, using a lookup table as Diez showed is even faster. FWIW, here
are some timings in seconds (Python 2.5, WinXP; 2600-character input,
best of 3 runs of 1000 calls each):

scramble:            1.818
scramble_listcomp:   1.492
scramble_gencomp:    1.535
scramble_map:        1.377
scramble_imap:       1.332
scramble_dict:       0.817
scramble_dict_map:   0.419
scramble_dict_imap:  0.410
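
A quick way to see the length effect for yourself is to time just two
of the variants at several input sizes. What follows is only a sketch,
written for Python 3 rather than the 2.5 used in this thread; the names
scramble_concat, scramble_join and time_by_length are made up for
illustration, and the numbers will vary by machine and interpreter
version.

```python
from timeit import Timer

def scramble_concat(line):
    # Original approach: grow the result one character at a time.
    s = ''
    for c in line:
        s += chr(ord(c) | 0x80)
    return s

def scramble_join(line):
    # Same algorithm, but the pieces are joined once at the end.
    return ''.join([chr(ord(c) | 0x80) for c in line])

def time_by_length(base='abcdefghijklmnopqrstuvwxyz',
                   factors=(1, 10, 100), number=200):
    """Return {factor: {function name: best time in seconds}}."""
    results = {}
    for factor in factors:
        line = base * factor
        results[factor] = {}
        for func in (scramble_concat, scramble_join):
            # The timer is run right away, so the lambda sees the
            # current func and line.
            timer = Timer(lambda: func(line))
            results[factor][func.__name__] = min(timer.repeat(3, number))
    return results

if __name__ == '__main__':
    for factor, timings in sorted(time_by_length().items()):
        for name, seconds in sorted(timings.items()):
            print('x%-4d %-16s %.4f' % (factor, name, seconds))
```

Which variant wins at each size is something to measure rather than
assume; the point is only that a single 26-character input is not
enough to tell.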

And the benchmark script:

from itertools import imap

def scramble(line):
    s = ''
    for c in line:
        s += chr(ord(c) | 0x80)
    return s

def scramble_listcomp(line):
    return ''.join([chr(ord(c) | 0x80) for c in line])

def scramble_gencomp(line):
    return ''.join(chr(ord(c) | 0x80) for c in line)

def scramble_map(line):
    return ''.join(map(chr, map(0x80.__or__, map(ord, line))))

def scramble_imap(line):
    return ''.join(imap(chr, imap(0x80.__or__, imap(ord, line))))


# Cover all 256 byte values (0..255); xrange(255) would leave out chr(255).
scramble_table = dict((chr(i), chr(i | 0x80)) for i in xrange(256))

def scramble_dict(line):
    s = ''
    for c in line:
        s += scramble_table[c]
    return s

def scramble_dict_map(line):
    return ''.join(map(scramble_table.__getitem__, line))

def scramble_dict_imap(line):
    return ''.join(imap(scramble_table.__getitem__, line))


if __name__=='__main__':
    funcs = [scramble, scramble_listcomp, scramble_gencomp,
             scramble_map, scramble_imap,
             scramble_dict, scramble_dict_map, scramble_dict_imap]
    s = 'abcdefghijklmnopqrstuvwxyz' * 100
    assert len(set(f(s) for f in funcs)) == 1
    from timeit import Timer
    setup = "import __main__; line = %r" % s
    for name in (f.__name__ for f in funcs):
        timer = Timer("__main__.%s(line)" % name, setup)
        print '%s:\t%.3f' % (name, min(timer.repeat(3,1000)))
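
The lookup-table idea can probably be pushed one step further by
letting translate() do the whole loop in C. A minimal sketch, assuming
Python 3 and bytes input (not the Python 2.5 str objects benchmarked
above); SCRAMBLE_TABLE, DESCRAMBLE_TABLE and the two function names are
hypothetical, and this variant was not part of the timings above:

```python
# 256-entry tables: position i holds the byte that replaces byte value i.
SCRAMBLE_TABLE = bytes(i | 0x80 for i in range(256))
DESCRAMBLE_TABLE = bytes(i & 0x7f for i in range(256))

def scramble_translate(data):
    # bytes.translate does the per-byte table lookup without a Python loop.
    return data.translate(SCRAMBLE_TABLE)

def descramble_translate(data):
    return data.translate(DESCRAMBLE_TABLE)

if __name__ == '__main__':
    line = b'abcdefghijklmnopqrstuvwxyz' * 100
    scrambled = scramble_translate(line)
    # Same result as setting the high bit byte by byte.
    assert scrambled == bytes(c | 0x80 for c in line)
    assert descramble_translate(scrambled) == line
```

In Python 2.5 the counterpart would be str.translate with a
256-character table string. Note that descrambling only round-trips
input that was 7-bit to begin with, since the high bit is discarded.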


George



