[Numpy-discussion] Not enough storage for memmap on 32 bit WinXP for accumulated file size above approx. 1 GB

David Cournapeau david at ar.media.kyoto-u.ac.jp
Mon Jul 27 06:04:56 EDT 2009


Kim Hansen wrote:
> From my (admittedly ignorant) point of view it seems like an
> implementation detail for me, that there is a problem with some
> intermediate memory address space.
>   

Yes, it is an implementation detail, but so is 32 vs 64 bits :)

> My typical use case would be to access and process the large
> filemapped, readonly recarray in chunks of up to 1,000,000 records 100
> bytes each, or for instance pick every 1000th element of a specific
> field. Those are data structures that I can easily hold in RAM while
> working on them.
>
> I think it would be cool to have an alternative (possibly read-only)
> memmap implementation (filearray?), which is not just a wrapper around
> mmap.mmap (with its 32 bit address space limitation), but which
> (simply?) operates directly on the files with seek and read. I think
> that could be very useful (well, for me at least). In my
> specific case, I will probably now proceed and make some poor man's
> wrapping convenience methods implementing just the specific features
> I need as I do not have the insight to subclass an ndarray myself and
> override the needed methods. In that manner I can go to >2GB still
> with low memory usage, but it will not be pretty.
>   

I think it would be quite complicated. One fundamental "limitation" of
numpy is that an array views a single contiguous block of memory. You
can't have one numpy array that is the union of two memory blocks with
a hole in between, so if you take every 1000th item, the result is a
strided view and the underlying memory still needs to span the whole
thing. I don't think what you want can be supported with a single numpy
array.
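
To make the point about views concrete, here is a minimal sketch using a
small in-memory array as a stand-in for the large mapped file (the sizes
are illustrative assumptions, not taken from the original data). It shows
that a strided slice keeps referring to the original buffer, and that only
an explicit copy detaches the small result from it.

```python
import numpy as np

# Small stand-in for the large record array (size chosen for illustration).
a = np.arange(10_000, dtype=np.int64)

# Taking every 1000th element yields a strided view, not a copy.
s = a[::1000]

print(s.base is a)             # the view still refers to the original array
print(s.strides)               # distance between consecutive elements, in bytes
print(np.shares_memory(a, s))  # the view and the original share one buffer

# An explicit copy owns its own small buffer and no longer needs `a`.
c = s.copy()
print(c.base is None, c.nbytes)
```

With a memmap the same holds: the sliced result stays tied to the full
mapping until it is copied.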

I think the simple solution really is to go 64 bits; that's exactly the
kind of thing it is for. If your machine is relatively recent, it
supports 64-bit addressing.
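
If staying on 32 bits, the seek-and-read workaround described in the quoted
message could look roughly like the sketch below. It is only a sketch under
stated assumptions: the 100-byte record layout `rec` is hypothetical, and
`read_chunk` and `read_field_strided` are illustrative helper names, not
part of numpy. Each call returns an ordinary in-memory array, so only one
chunk is held in RAM at a time and no mmap is involved.

```python
import os
import tempfile
import numpy as np

# Hypothetical 100-byte record layout, for illustration only.
rec = np.dtype([("id", "<i4"), ("value", "<f8"), ("pad", "V88")])

def read_chunk(path, start, count, dtype=rec):
    """Read `count` records beginning at record index `start`."""
    with open(path, "rb") as f:
        f.seek(start * dtype.itemsize)
        return np.fromfile(f, dtype=dtype, count=count)

def read_field_strided(path, field, step, chunk=100_000, dtype=rec):
    """Collect every `step`-th value of `field`, one chunk at a time."""
    n = os.path.getsize(path) // dtype.itemsize
    out = []
    for start in range(0, n, chunk):
        block = read_chunk(path, start, min(chunk, n - start), dtype)
        # Local index of the first record whose global index is a multiple of step.
        first = (-start) % step
        out.append(block[field][first::step].copy())
    if not out:
        return np.empty(0, dtype=dtype[field])
    return np.concatenate(out)

# Small demonstration file written to a temporary directory.
data = np.zeros(2500, dtype=rec)
data["id"] = np.arange(2500)
path = os.path.join(tempfile.mkdtemp(), "records.bin")
data.tofile(path)

chunk = read_chunk(path, 1000, 5)
ids = read_field_strided(path, "id", step=1000, chunk=700)
print(chunk["id"], ids)
```

This gives up the convenience of indexing one big array, which is the
trade-off discussed above.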

cheers,

David


