Loading contents behind the scenes

s0suk3 at gmail.com
Thu May 22 10:20:11 EDT 2008


On May 22, 8:51 am, "A.T.Hofkamp" <h... at se-162.se.wtb.tue.nl> wrote:
> On 2008-05-22, s0s... at gmail.com <s0s... at gmail.com> wrote:
>
> > Hi, I wanted to know how safe it is to do something like:
>
> > f = file("filename", "rb")
> > f.read()
>
> > for a possibly huge file. When calling f.read(), and not doing
> > anything with the return value, what is Python doing internally? Is it
> > loading the content of the file into memory (regardless of whether it
> > is discarding it immediately)?
>
> I am not a Python interpreter developer, but as user, yes I'd expect that to
> happen. The method doesn't know you are not doing anything with its return
> value.
>
> > In my case, what I'm doing is sending the return value through a
> > socket:
>
> > sock.send(f.read())
>
> > Is that gonna make a difference (memory-wise)? I guess I'm just
> > concerned with whether I can do a file.read() for any file in the
> > system in an efficient and memory-kind way, and with low overhead in
> > general. (For one thing, I'm not loading the contents into a
> > variable.)
>
> Doesn't matter. You allocate a string into which the contents are loaded (the
> return value of 'f.read()'), and you hand over (a reference to) that string to
> the 'send()' method.
>
> Note that in Python memory is allocated for data *values*, not for *variables*
> (variables are merely references to values).
>
> > Not that I'm saying that loading a huge file into memory will horribly
> > crash the system, but it's good to try to program in the safest way
> > possible. For example, if you try something like this in the
>
> Depends on your system, and your biggest file.
>
> On a 32-bit platform, anything bigger than about 4GB (usually already at around
> 3GB) will crash the program for the simple reason that you are running out of
> address space to store bytes in.
>
> To fix this, read and write in blocks by specifying a block size in the 'read()' call.

I see... Thanks for the reply.

So what would be a good approach to solve that problem? The best I can
think of is something like:

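MAX_BUF_SIZE = 100000000  # about 100 MB

f = file("filename", "rb")
f.seek(0, 2)  # whence=2: seek to EOF so tell() gives the file length
length = f.tell()
bPos = 0  # offset of the next byte that still has to be sent

while bPos < length:
    # send() may send only part of the block, so seek back to the first
    # unsent byte and advance by the number of bytes actually sent
    f.seek(bPos)
    bPos += sock.send(f.read(MAX_BUF_SIZE))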
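
For comparison, here is a minimal sketch of the block-wise idea written for current Python 3 (open() instead of file()). The names send_file and CHUNK_SIZE and the 64 KB block size are assumptions made for illustration, not something from this thread. It relies on sock.sendall(), which keeps sending until the whole block has gone out or raises an error, so no seeking back is needed after a partial send:

```python
CHUNK_SIZE = 64 * 1024  # assumed block size (64 KB); tune for your workload


def send_file(sock, path, chunk_size=CHUNK_SIZE):
    """Send the file at `path` over `sock` in blocks of at most `chunk_size` bytes.

    `sock` is any object with a sendall(bytes) method, e.g. a connected
    socket.socket. Returns the total number of bytes sent.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)  # holds at most chunk_size bytes in memory
            if not block:  # an empty bytes object means end of file
                break
            sock.sendall(block)  # returns only after the whole block is sent
            total += len(block)
    return total
```

With this shape, memory use should stay around one block regardless of the file size, and each part of the file is read only once. A call would look like send_file(sock, "filename"), where sock is an already connected socket.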


