shuffle the lines of a large file
Heiko Wundram
modelnine at ceosg.de
Tue Mar 8 09:49:35 EST 2005
On Tuesday 08 March 2005 15:28, Simon Brunning wrote:
> This has the advantage that every line had the same chance of being
> picked regardless of its length. There is the chance that it'll pick
> the same line more than once, though.
Problem being: if the file the OP is talking about really is 80GB in size, and
you consider a sentence to have 80 bytes on average (it's likely to have fewer
than that), that makes at least 10^9 sentences in the file. Now, multiply that
by the per-entry memory overhead of a list of 10^9 None(s), and reconsider
whether that algorithm really works for the posted conditions. I don't think
that any machine I have access to has anywhere near enough memory just to
store this list... ;)
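
To put rough numbers on that estimate, here is a back-of-the-envelope sketch in Python. The function name estimate_list_overhead and the pointer sizes passed in are assumptions made for illustration only. It counts just one reference per list slot (in CPython, None is a single shared object, so each entry costs one pointer), which makes the result a lower bound rather than an exact figure.

```python
def estimate_list_overhead(file_size_bytes, avg_line_bytes, pointer_bytes):
    """Return (estimated line count, bytes needed for one reference per line).

    Hypothetical helper: ignores the list object's fixed header and any
    over-allocation, so the true memory use would be somewhat higher.
    """
    n_lines = file_size_bytes // avg_line_bytes
    return n_lines, n_lines * pointer_bytes


FILE_SIZE = 80 * 10**9   # 80GB, as stated in the post
AVG_LINE = 80            # assumed average sentence length in bytes

# Assumed pointer sizes: 4 bytes (32-bit build) and 8 bytes (64-bit build).
for pointer_bytes in (4, 8):
    n_lines, list_bytes = estimate_list_overhead(FILE_SIZE, AVG_LINE, pointer_bytes)
    print(f"{pointer_bytes}-byte pointers: {n_lines} lines, "
          f"{list_bytes / 10**9:.0f} GB for the list alone")
```

Under these assumptions the list alone needs several gigabytes before a single line has been read, and shorter average lines only push the count up.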
--
--- Heiko.