Loading contents behind the scenes

Diez B. Roggisch deets at nospam.web.de
Thu May 22 10:59:00 EDT 2008


s0suk3 at gmail.com wrote:

> On May 22, 8:51 am, "A.T.Hofkamp" <h... at se-162.se.wtb.tue.nl> wrote:
>> On 2008-05-22, s0s... at gmail.com <s0s... at gmail.com> wrote:
>>
>> > Hi, I wanted to know how safe it is to do something like:
>>
>> > f = open("filename", "rb")
>> > f.read()
>>
>> > for a possibly huge file. When calling f.read() and not doing
>> > anything with the return value, what is Python doing internally? Is it
>> > loading the content of the file into memory (even if it discards it
>> > immediately)?
>>
>> I am not a Python interpreter developer, but as a user, yes, I'd expect that
>> to happen. The method doesn't know you are not doing anything with its
>> return value.
>>
>> > In my case, what I'm doing is sending the return value through a
>> > socket:
>>
>> > sock.send(f.read())
>>
>> > Is that gonna make a difference (memory-wise)? I guess I'm just
>> > concerned with whether I can do a file.read() for any file in the
>> > system in an efficient and memory-friendly way, and with low overhead in
>> > general. (For one thing, I'm not loading the contents into a
>> > variable.)
>>
>> Doesn't matter. You allocate a string into which the contents are loaded
>> (the return value of 'f.read()'), and you hand over (a reference to) that
>> string to the 'send()' method.
>>
>> Note that memory is allocated by data *values*, not by *variables* in
>> Python (they are merely references to values).
>>
>> > Not that I'm saying that loading a huge file into memory will horribly
>> > crash the system, but it's good to try to program in the safest way
>> > possible. For example, if you try something like this in the
>>
>> Depends on your system, and your biggest file.
>>
>> On a 32-bit platform, anything bigger than about 4GB (usually already at
>> around 3GB) will crash the program for the simple reason that you are
>> running out of address space to store bytes in.
>>
>> To fix this, read and write in blocks by passing a block size to the
>> 'read()' call.
> 
> I see... Thanks for the reply.
> 
> So what would be a good approach to solve that problem? The best I can
> think of is something like:

You are aware that read() takes an integer argument that limits the number of
bytes returned, and of course advances the internal file position for you?
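A minimal sketch of that approach in Python 3 (the thread itself uses Python 2's `file()`): call `read(n)` in a loop until it returns an empty bytes object, and push each block to the socket as it arrives, so at most one block is held in memory at a time. The function name `send_file_in_chunks` and the 64 KiB block size are illustrative assumptions, not anything from the thread. It uses `sendall()` rather than `send()`, because `send()` may transmit only part of the buffer you hand it.

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary, assumed value


def send_file_in_chunks(sock, path, chunk_size=CHUNK_SIZE):
    """Send the file at `path` over `sock` without loading it all into memory.

    Hypothetical helper for illustration; returns the total bytes sent.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            # read(n) returns at most n bytes and advances the file position
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            # sendall() keeps sending until the whole chunk is delivered;
            # a plain send() may transmit only part of it
            sock.sendall(chunk)
            total += len(chunk)
    return total
```

On Python 3.5 and later, `socket.socket.sendfile()` does the same job and uses `os.sendfile()` where the platform supports it, falling back to repeated `send()` calls otherwise.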

Diez



More information about the Python-list mailing list