performance of tight loop

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Dec 13 22:29:38 EST 2010


On Mon, 13 Dec 2010 18:50:38 -0800, gry wrote:

> [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram] I
> have a little data generator that I'd like to go faster... any
> suggestions?
> maxint is usually 9223372036854775808(max 64bit int), but could
> occasionally be 99.
> width is usually 500 or 1600, rows ~ 5000.
> 
> from random import randint
> 
> def row(i, wd, mx):
>     first = ['%d' % i]
>     rest =  ['%d' % randint(1, mx) for i in range(wd - 1)] 
>     return first + rest
> ...
>     while True:
>         print "copy %s from stdin direct delimiter ',';" % table_name
>         for i in range(i,i+rows):
>             print ','.join(row(i, width, maxint))
>         print '\.'


This isn't entirely clear to me. Why is the while loop indented? I assume 
it's part of some other function that you haven't shown us, rather than 
part of the function row().

Assuming this, I would say that the overhead of I/O (the print commands) 
will likely be tens or hundreds of times greater than the overhead of the 
loop, so you're unlikely to see much benefit. You might shave a few 
seconds off something that runs for many minutes. I don't see the point, 
really.
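
If you would rather measure than take my guess on trust, a rough sketch 
along these lines times the two parts separately. It is written for 
Python 3 (it will not run unchanged on 2.4), the names make_rows and 
time_generate_and_write are made up for illustration, and the in-memory 
buffer is only a stand-in for wherever your output really goes.

```python
import io
import time
from random import randint

def make_rows(start, rows, width, maxint):
    """Build the comma-joined lines without writing them anywhere."""
    lines = []
    for i in range(start, start + rows):
        row = [str(randint(1, maxint)) for _ in range(width)]
        row[0] = str(i)  # first column is the row number
        lines.append(','.join(row))
    return lines

def time_generate_and_write(rows=200, width=500, maxint=2**63, out=None):
    """Return (seconds spent generating, seconds spent writing)."""
    if out is None:
        # Hypothetical stand-in; pass a real file or sys.stdout to
        # measure the I/O you actually care about.
        out = io.StringIO()
    t0 = time.perf_counter()
    lines = make_rows(0, rows, width, maxint)
    t1 = time.perf_counter()
    for line in lines:
        out.write(line + '\n')
    out.flush()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == '__main__':
    gen, wr = time_generate_and_write()
    print('generate: %.4fs  write: %.4fs' % (gen, wr))
```

Whichever number dominates on your machine tells you where to spend 
your effort.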

If the print statements are informative rather than necessary, I would 
print every tenth (say) line rather than every line. That should save 
*lots* of time.

Replacing "while True" with "while 1" may save a tiny bit of overhead. 
Whether it is significant or not is another thing.

Replacing range with xrange should also make a difference, especially if 
rows is a large number.

Moving the code from row() inline, replacing string interpolation with 
calls to str(), may also help. Making local variables of any globals may 
also help a tiny bit. But as I said, you're shaving microseconds of 
overhead and spending milliseconds printing -- the difference will be tiny.
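
To see how much (or how little) those micro-optimisations buy, a small 
timeit comparison is enough. This is only a sketch in Python 3 syntax; 
the function names are invented for the example, and the defaults just 
mirror the width and maxint you mentioned.

```python
import timeit
from random import randint

def with_interpolation(width, maxint):
    # String interpolation, as in the original row() function.
    return ['%d' % randint(1, maxint) for _ in range(width)]

def with_str(width, maxint, _str=str, _randint=randint):
    # Default arguments bind str and randint once, at definition time,
    # so the loop avoids repeated global and builtin lookups.
    return [_str(_randint(1, maxint)) for _ in range(width)]

def compare(width=500, maxint=2**63, number=200):
    """Return the seconds taken by each variant for `number` calls."""
    t_interp = timeit.timeit(
        lambda: with_interpolation(width, maxint), number=number)
    t_str = timeit.timeit(
        lambda: with_str(width, maxint), number=number)
    return t_interp, t_str

if __name__ == '__main__':
    t_interp, t_str = compare()
    print('interpolation: %.4fs  str(): %.4fs' % (t_interp, t_str))
```

Compare the gap between the two figures with the total run time before 
deciding whether the change is worth making.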

But for what it's worth, I'd try this:


    # Avoid globals in favour of locals.
    from random import randint
    _maxint = maxint
    inner_loop = xrange(width)  # Note 1 more than before.
    while 1:
        print "copy %s from stdin direct delimiter ',';" % table_name
        # Rebuild the range on each pass so that i keeps advancing, as in
        # your version. Where does i come from?
        for i in xrange(i, i+rows):
            row = [str(randint(1, _maxint)) for _ in inner_loop]
            row[0] = str(i)  # replace in place
            print ','.join(row)
        print '\.'



Hope it helps.



-- 
Steven


