shuffle the lines of a large file

Simon Brunning simon.brunning at gmail.com
Tue Mar 8 09:55:17 EST 2005


On Tue, 8 Mar 2005 15:49:35 +0100, Heiko Wundram <modelnine at ceosg.de> wrote:
> Problem being: if the file the OP is talking about really is 80GB in size, and
> you consider a sentence to have 80 bytes on average (it's likely to have less
> than that), that makes 10^9 sentences in the file. Now, multiply that by
> the per-entry memory overhead of a list of 10^9 None(s), and reconsider
> whether that algorithm really works for the posted conditions. I don't think
> that any machine I have access to even has near enough memory just to store
> this list... ;)

Ah, but that's the clever bit; it *doesn't* store the whole list -
only the selected lines.
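
For anyone following along, here is a rough sketch of the general idea,
assuming the approach is something like reservoir sampling. The function
name sample_lines and its parameters are made up for illustration; this is
not necessarily the code discussed earlier in the thread. It reads the
input one line at a time and never holds more than k lines in memory:

```python
import random

def sample_lines(lines, k, rng=random):
    """Pick k lines uniformly at random from an iterable of lines.

    Hypothetical helper for illustration; only the k selected lines
    are kept in memory, however long the input is.
    """
    reservoir = []
    for index, line in enumerate(lines):
        if index < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep this line with probability k / (index + 1),
            # overwriting a randomly chosen earlier pick.
            slot = rng.randint(0, index)  # inclusive at both ends
            if slot < k:
                reservoir[slot] = line
    return reservoir
```

Passing an open file object as the first argument should work, since
iterating over a file yields one line at a time. Note that this selects a
random subset of lines; it does not by itself shuffle the whole file.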

-- 
Cheers,
Simon B,
simon at brunningonline.net,
http://www.brunningonline.net/simon/blog/


