shuffle the lines of a large file

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Mar 10 08:37:25 EST 2005


Simon Brunning wrote:
> On Tue, 8 Mar 2005 14:13:01 +0000, Simon Brunning wrote:
>     selected_lines = list(None for line_no in xrange(lines))

Just a short note on this line. If lines is really large, it's much faster to use

from itertools import repeat
selected_lines = list(repeat(None, lines))

which only repeats None without having to create huge numbers of integer 
objects as xrange does.

BTW, a list comprehension is usually faster than list() over a generator expression, so

[None for no in xrange(lines)]

ends up somewhere between the two.

Proof (in 2.4):

# python -m timeit 'from itertools import repeat
a = [ None for i in range(10000) ]'
100 loops, best of 3: 3.68 msec per loop

# python -m timeit 'from itertools import repeat
a = [ None for i in xrange(10000) ]'
100 loops, best of 3: 3.49 msec per loop

# python -m timeit 'from itertools import repeat
a = list(repeat(None, 10000))'
1000 loops, best of 3: 308 usec per loop

There. Factor 10. That's what I call optimization...
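
To rerun the comparison on a current interpreter, something like the following sketch should do. It assumes Python 3, where xrange is gone and range is already lazy, so the gap may look different from the 2.4 figures above. The function names and the constant N are made up for illustration; they are not from the original thread.

```python
from itertools import repeat
from timeit import timeit

N = 10000  # same list size as in the timings above


def with_genexp():
    # list() over a generator expression, as in the quoted line
    return list(None for _ in range(N))


def with_listcomp():
    # list comprehension
    return [None for _ in range(N)]


def with_repeat():
    # repeat() hands back the same None object N times; no loop counter involved
    return list(repeat(None, N))


if __name__ == "__main__":
    for func in (with_genexp, with_listcomp, with_repeat):
        seconds = timeit(func, number=1000)
        # seconds covers 1000 calls, so seconds * 1000 is usec per call
        print("%-14s %8.1f usec per call" % (func.__name__, seconds * 1000))
```

All three functions build the same list; only the time taken differs, and the exact numbers depend on the interpreter version and the machine.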

Stefan


