mmap 2GB allocation limit on Win XP, 32-bits, Python 2.5.4

Slaunger Slaunger at gmail.com
Mon Jul 27 07:50:52 EDT 2009


On 27 Jul., 13:21, Dave Angel <da... at ieee.org> wrote:
> (forwarding this message, as the reply was off-list)
>
>
>
> Kim Hansen wrote:
> > 2009/7/24 Dave Angel <da... at ieee.org>:
>
> >> It's not a question of how much disk space there is, but how much virtual
> >> space 32 bits can address.  2**32 is about 4 gig, and Windows XP reserves
> >> about half of that for system use.  Presumably a 64 bit OS would have a much
> >> larger limit.
>
> >> Years ago I worked on Sun Sparc system which had much more limited shared
> >> memory access, due to hardware limitations.  So 2gig seems pretty good to
> >> me.
>
> >> There is supposed to be a way to tell the Windows OS to only use 1 gb of
> >> virtual space, leaving 3gb for application use.  But there are some
> >> limitations, and I don't recall what they are.  I believe it has to be done
> >> globally (probably in Boot.ini), rather than per process.  And some things
> >> didn't work in that configuration.
>
> >> DaveA
>
> > Hi Dave,
>
> > In the related post I did on the numpy discussions:
>
> >http://article.gmane.org/gmane.comp.python.numeric.general/31748
>
> > another user was kind enough to run my test program on both 32 bit and
> > 64 bit machines. On the 64 bit machine, there was no such limit, very
> > much in line with what you wrote. Adding the /3GB option in boot.ini
> > did not increase the available memory either. Apparently, Python
> > needs to have been compiled in a way that makes it possible to take
> > advantage of that switch, and either that is not the case or I did
> > something else wrong.
>
> > I acknowledge the explanation concerning the address space available.
> > Being ignorant of the inner details of the implementation of mmap,
> > it seems like somewhat of an "implementation detail" to me that such an
> > address wall is hit. There may be some good arguments from a
> > programming point of view and it may be a relatively high limit
> > compared to other systems but it is certainly on the low side for my
> > application: I work with data files typically 200 GB in size
> > consisting of datapackets each having a fixed size frame and a
> > variable size payload. To handle these large files, I generate an
> > "index" file consisting of just the frames (which has all the metadata
> > I need for finding the payloads I am interested in) and "pointers" to
> > where in the large data file each payload begins. This index file can
> > be up to 1 GB in size and at times I need to have access to two of
> > those at the same time (and then I hit the address wall). I would
> > really really like to be able to access these index files in a
> > read-only manner as an array of records on a file for which I use
> > numpy.memmap (which wraps mmap.mmap) such that I can pick a single
> > element, or extract, e.g., every thousandth value of a specific field in
> > the record using the convenient indexing available in Python/numpy.
> > Now it seems like I have to resort to making my own encapsulation
> > layer, which seeks to the relevant place in the file, reads sections
> > as bytestrings into recarrays, etc. Well, I must just get on with
> > it...
>
> > I think it would be worthwhile specifying this 32 bit OS limitation in
> > the documentation of mmap.mmap, as I doubt I am the only one being
> > surprised about this address space limitation.
>
> > Cheers,
> > Kim
>
> I agree that some description of system limitations should be included
> in a system-specific document.  There probably is one, I haven't looked
> recently.  But I don't think it belongs in mmap documentation.
>
> Perhaps you still don't recognize what the limit is.  32 bits can only
> address 4 gigabytes of things as first-class addresses.  So roughly the
> same limit that's on mmap is also on list, dict, bytearray, or anything
> else.  If you had 20 lists taking 100 meg each, you would fill up
> memory.  If you had 10 of them, you might have enough room for a 1gb
> mmap area.  And your code takes up some of that space, as well as the
> Python interpreter, the standard library, and all the data structures
> that are normally ignored by the application developer.
>
> BTW,  there is one difference between mmap and most of the other
> allocations.  Most data is allocated out of the swapfile, while mmap is
> allocated from the specified file (unless you use -1 for fileno).  
> Consequently, if the swapfile is already clogged with all the other
> running applications, you can still take your 1.8gb or whatever of your
> virtual space, when much less than that might be available for other
> kinds of allocations.
>
> Executables and dlls are also (mostly) mapped into memory just the same
> as mmap.  So they tend not to take up much space from the swapfile.  In
> fact, with planning, a DLL needn't take up any swapfile space (well, a
> few K is always needed, realistically).  But that's a linking issue for
> compiled languages.
>
> DaveA
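
For reference, the arithmetic behind the quoted explanation can be
written out as a small sketch. The 1 GB index file size is the upper
figure from my earlier message; the even 2 GB / 2 GB split is the
default Windows XP layout described above.

```python
GIB = 2 ** 30

address_space = 2 ** 32           # bytes addressable with 32-bit pointers
user_space = address_space // 2   # default split: half reserved for the system
index_file = 1 * GIB              # assumed size of one index file

print(address_space // GIB)       # 4
print(user_space // GIB)          # 2

# Two fully mapped index files would need all of user space, leaving
# nothing for the interpreter, the libraries, and ordinary data.
fits = 2 * index_file < user_space
print(fits)                       # False
```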

I do understand the 2 GB address space limitation. However, I think I
have found a solution to my original numpy.memmap problem (which spun
off into this one), and that is PyTables, where I can address up to
2^64 bytes of data on a 32 bit machine using HDF5 files, thus
circumventing the "implementation detail" of the intermediate 2^32
memory address limit in the numpy.memmap/mmap.mmap implementation.

http://www.pytables.org/moin
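
For comparison, the hand-rolled fallback mentioned in the quoted
message (seek to the relevant record, read the bytes, unpack them)
might look roughly like the sketch below. The record layout (a uint32
frame id followed by a uint64 payload pointer) and the RecordFile name
are made up for illustration; the real index format is not described
in this thread.

```python
import struct

# Hypothetical record layout: little-endian uint32 frame id followed by
# a uint64 pointer into the large data file (12 bytes per record).
RECORD = struct.Struct("<IQ")


class RecordFile(object):
    """Read-only, list-like view of fixed-size records stored in a file."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        self._fh.seek(0, 2)                       # 2 means "from end of file"
        self._count = self._fh.tell() // RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves start/stop/step against the length
            return [self._read(i) for i in range(*key.indices(self._count))]
        if key < 0:
            key += self._count
        if not 0 <= key < self._count:
            raise IndexError(key)
        return self._read(key)

    def _read(self, index):
        # Only one record is ever held in memory, so file size is bounded
        # by the file offset range, not by the process address space.
        self._fh.seek(index * RECORD.size)
        return RECORD.unpack(self._fh.read(RECORD.size))

    def close(self):
        self._fh.close()
```

With something like this, index[::1000] would return every thousandth
record, at the cost of one seek and read per record rather than the
vectorised field access that numpy.memmap offers.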

I just watched the first tutorial video, and that seems like just what
I am after (if it works as well in practice as it appears to).

http://showmedo.com/videos/video?name=1780000&fromSeriesID=178
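
Should PyTables not work out, another option that stays with mmap is
to map only a window of the file at a time, so that no single mapping
needs 1 GB of contiguous address space. A minimal sketch, assuming a
Python whose mmap.mmap accepts the offset keyword (added in 2.6, so
not the 2.5.4 from the subject line); map_window is a hypothetical
helper name:

```python
import mmap


def map_window(fileobj, start, length):
    """Map only bytes [start, start + length) of an open file, read-only.

    Returns (mapping, delta); the requested bytes are
    mapping[delta:delta + length].
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran   # offset must be a multiple of gran
    delta = start - aligned            # extra leading bytes from alignment
    mapping = mmap.mmap(fileobj.fileno(), delta + length,
                        access=mmap.ACCESS_READ, offset=aligned)
    return mapping, delta
```

The caller has to close each mapping before opening the next window,
otherwise the address space fills up just as before.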

Cheers,
Kim


