Manipulate Large Binary Files

George Sakkis george.sakkis at gmail.com
Wed Apr 2 13:09:37 EDT 2008


On Apr 2, 11:50 am, Derek Martin <c... at pizzashack.org> wrote:
> On Wed, Apr 02, 2008 at 10:59:57AM -0400, Derek Tracy wrote:
> > I generated code that works wonderfully for files under 2GB in size
> > but the majority of the files I am dealing with are over the 2GB
> > limit
>
> > ary = array.array('H', INPUT.read())
>
> You're trying to read the file all at once.  You need to break your
> reads up into smaller chunks, in a loop.  You're essentially trying to
> store more data in memory than your OS can actually access in a single
> process...
>
> Something like this (off the top of my head, I may have overlooked
> some detail, but it should at least illustrate the idea):
>
> # read a meg at a time
> buffsize = 1048576
> while True:
>         buff = INPUT.read(buffsize)
>         OUTPUT.write(buff)
>         if len(buff) != buffsize:
>                 break

Or more idiomatically:

from functools import partial
for buff in iter(partial(INPUT.read, 10 * 1024**2), ''):
    OUTPUT.write(buff)  # or otherwise process each 10MB chunk
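
Combining that with the array.array('H', ...) call from the original code
might look roughly like the sketch below.  It assumes Python 2 (as in the
thread), a file made up of whole 16-bit items so every chunk length stays
even, and purely illustrative filenames; INPUT and OUTPUT are the same
file objects the earlier snippets use:

from array import array
from functools import partial

INPUT = open('big_input.bin', 'rb')     # illustrative filename
OUTPUT = open('big_output.bin', 'wb')   # illustrative filename

chunksize = 10 * 1024**2                # 10MB, a multiple of the 2-byte item size
for buff in iter(partial(INPUT.read, chunksize), ''):
    ary = array('H', buff)              # per-chunk array instead of whole-file
    # ... operate on ary here ...
    OUTPUT.write(ary.tostring())        # Python 2 spelling; tobytes() in 3

INPUT.close()
OUTPUT.close()

This keeps peak memory at roughly one chunk (plus its array copy) instead
of the whole multi-gigabyte file.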

George
