Loading contents behind the scenes

MRAB google at mrabarnett.plus.com
Thu May 22 13:05:42 EDT 2008


On May 22, 3:20 pm, s0s... at gmail.com wrote:
> On May 22, 8:51 am, "A.T.Hofkamp" <h... at se-162.se.wtb.tue.nl> wrote:
>
> > On 2008-05-22, s0s... at gmail.com <s0s... at gmail.com> wrote:
>
> > > Hi, I wanted to know how safe it is to do something like:
>
> > > f = file("filename", "rb")
> > > f.read()
>
> > > for a possibly huge file. When calling f.read(), and not doing
> > > anything with the return value, what is Python doing internally? Is it
> > > loading the content of the file into memory (regardless of whether it
> > > is discarding it immediately)?
>
> > I am not a Python interpreter developer, but as a user, yes, I'd expect that to
> > happen. The method doesn't know you are not doing anything with its return
> > value.
>
> > > In my case, what I'm doing is sending the return value through a
> > > socket:
>
> > > sock.send(f.read())
>
> > > Is that gonna make a difference (memory-wise)? I guess I'm just
> > > concerned with whether I can do a file.read() for any file in the
> > > system in an efficient and memory-friendly way, and with low overhead in
> > > general. (For one thing, I'm not loading the contents into a
> > > variable.)
>
> > Doesn't matter. You allocate a string into which the contents are loaded (the
> > return value of 'f.read()'), and you hand over (a reference to) that string to
> > the 'send()' method.
>
> > Note that memory is allocated by data *values*, not by *variables* in Python
> > (they are merely references to values).
>
> > > Not that I'm saying that loading a huge file into memory will horribly
> > > crash the system, but it's good to try to program in the safest way
> > > possible. For example, if you try something like this in the
>
> > Depends on your system, and your biggest file.
>
> > On a 32-bit platform, anything bigger than about 4GB (usually already at around
> > 3GB) will crash the program for the simple reason that you are running out of
> > address space to store bytes in.
>
> > To fix this, read and write in blocks by passing a block size to the 'read()' call.
>
> I see... Thanks for the reply.
>
> So what would be a good approach to solve that problem? The best I can
> think of is something like:
>
> MAX_BUF_SIZE = 100000000  # about 100 MB
>
> f = file("filename", "rb")
> f.seek(0, 2)  # relative to EOF
> length = f.tell()
> bPos = 0
>
> while bPos < length:
>     f.seek(bPos)
>     bPos += sock.send(f.read(MAX_BUF_SIZE))
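A note on the quoted loop: sock.send() returns the number of bytes it actually accepted, which can be fewer than it was given, and the loop compensates by seeking back and re-reading up to MAX_BUF_SIZE bytes from the file. The sketch below instead keeps the unsent tail of the current chunk in memory and retries it, slicing a memoryview so the retry does not copy the data. It assumes Python 3; the helper name send_stream is hypothetical, and the demo uses a local socket.socketpair() with a reader thread draining the other end. This retry loop is the kind of work that sock.sendall() performs for you on a single buffer.

```python
import io
import socket
import threading

def send_stream(f, sock, chunk_size):
    """Send all remaining bytes of binary file f over sock; return the count.

    Hypothetical helper. Reads at most chunk_size bytes at a time. If
    sock.send() accepts only part of a chunk, the unsent tail is retried
    without seeking or re-reading the file.
    """
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:                  # b"" signals end of file
            return total
        view = memoryview(chunk)       # slicing a memoryview does not copy
        while len(view) > 0:
            sent = sock.send(view)     # may be smaller than len(view)
            view = view[sent:]
            total += sent

# Demo: push 1 MB through a local socket pair while a thread drains it.
a, b = socket.socketpair()
payload = b"x" * 1_000_000
received = bytearray()

def drain():
    while True:
        data = b.recv(65536)
        if not data:                   # peer closed: no more data
            break
        received.extend(data)

reader = threading.Thread(target=drain)
reader.start()
n = send_stream(io.BytesIO(payload), a, 100_000)
a.close()                              # lets drain() see end of stream
reader.join()
b.close()
print(n)                               # 1000000
```

Memory use per iteration is bounded by chunk_size rather than by the file size, which is the point of reading in blocks.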

I would go with:

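with open("filename", "rb") as f:   # file is closed even on error
    while True:
        data = f.read(MAX_BUF_SIZE)  # at most MAX_BUF_SIZE bytes
        if not data:                 # empty result at end of file
            break
        sock.sendall(data)           # retries until the whole block is sent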
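The same while/break loop can also be written with the two-argument form of iter(), which calls f.read(size) repeatedly and stops when it returns a given sentinel. A sketch, assuming Python 3 (where the end-of-file value for a binary file is b""; in Python 2 it was "") and a hypothetical helper name iter_chunks, exercised on an in-memory io.BytesIO rather than a real file:

```python
import functools
import io

def iter_chunks(f, size):
    """Return an iterator over chunks of at most `size` bytes from binary file f."""
    # iter(callable, sentinel) keeps calling f.read(size) and stops as soon
    # as it returns b"" (end of file). A for loop over it therefore holds
    # only one chunk at a time.
    return iter(functools.partial(f.read, size), b"")

data = bytes(range(256)) * 10          # 2560 bytes of sample data
chunks = list(iter_chunks(io.BytesIO(data), 1000))  # list() only to inspect
print([len(c) for c in chunks])        # [1000, 1000, 560]

# Applied to the original problem (sock and MAX_BUF_SIZE as in the thread):
# with open("filename", "rb") as f:
#     for chunk in iter_chunks(f, MAX_BUF_SIZE):
#         sock.sendall(chunk)
```

On Python 3.5 and later, socket objects also have a sendfile() method that sends a binary file until EOF, using os.sendfile() where the platform supports it and falling back to send() otherwise.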



More information about the Python-list mailing list