shuffle the lines of a large file
Stefan Behnel
stefan.behnel-n05pAM at web.de
Thu Mar 10 08:37:25 EST 2005
Simon Brunning wrote:
> On Tue, 8 Mar 2005 14:13:01 +0000, Simon Brunning wrote:
>> selected_lines = list(None for line_no in xrange(lines))
Just a short note on this line. If lines is really large, it's much faster to use
from itertools import repeat
selected_lines = list(repeat(None, lines))
which just yields None over and over, without creating a huge number of
integer objects the way xrange does.
BTW, a list comprehension is usually faster than list() over a generator
expression, so
[None for no in xrange(lines)]
ends up somewhere between the two.
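For readers on Python 3, where range plays the role of the old xrange, the three spellings discussed above can be checked side by side. This is a minimal sketch: the function names and the line count of 5 are hypothetical, chosen only for illustration. All three build the same list of None placeholders; they differ only in how the list gets built.

```python
from itertools import repeat


def placeholders_repeat(n):
    # itertools.repeat hands back the same None object n times;
    # no per-index integers are involved
    return list(repeat(None, n))


def placeholders_comprehension(n):
    # list comprehension driven by a range of indices
    # (Python 3 range behaves like Python 2 xrange)
    return [None for _ in range(n)]


def placeholders_genexpr(n):
    # list() wrapped around a generator expression, as in the quoted line
    return list(None for _ in range(n))


if __name__ == "__main__":
    lines = 5  # hypothetical line count
    results = [f(lines) for f in (placeholders_repeat,
                                  placeholders_comprehension,
                                  placeholders_genexpr)]
    print(all(r == [None] * lines for r in results))  # prints: True
```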
Proof (in 2.4):
# python -m timeit 'from itertools import repeat
a = [ None for i in range(10000) ]'
100 loops, best of 3: 3.68 msec per loop
# python -m timeit 'from itertools import repeat
a = [ None for i in xrange(10000) ]'
100 loops, best of 3: 3.49 msec per loop
# python -m timeit 'from itertools import repeat
a = list(repeat(None, 10000))'
1000 loops, best of 3: 308 usec per loop
There. Factor 10. That's what I call optimization...
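The shell commands above put the import inside the timed statement, so it is re-executed on every loop. A minimal Python 3 sketch using the standard timeit module can move the import into the setup string and also time the list(generator) spelling, which the commands above do not measure. The labels and the best_times helper are hypothetical names; absolute numbers on a modern interpreter will differ from the 2.4 figures above, so none are claimed here.

```python
import timeit

# Setup runs once per timing round and is not included in the measurement.
SETUP = "from itertools import repeat; n = 10000"

# The three spellings from the post, with range standing in for xrange.
CANDIDATES = {
    "list comprehension": "[None for i in range(n)]",
    "list(generator)": "list(None for i in range(n))",
    "list(repeat(...))": "list(repeat(None, n))",
}


def best_times(number=100, rounds=3):
    """Return the best per-loop time in seconds for each statement."""
    results = {}
    for label, stmt in CANDIDATES.items():
        runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=rounds)
        # like "python -m timeit", report the fastest round divided by loops
        results[label] = min(runs) / number
    return results


if __name__ == "__main__":
    for label, secs in best_times().items():
        print(f"{label:20s} {secs * 1e6:8.1f} usec per loop")
```

Taking the minimum over several rounds mirrors the "best of 3" reporting in the output above.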
Stefan
More information about the Python-list mailing list