mmap 2GB allocation limit on Win XP, 32-bits, Python 2.5.4

Dave Angel davea at ieee.org
Mon Jul 27 07:21:44 EDT 2009


(forwarding this message, as the reply was off-list)
Kim Hansen wrote:
> 2009/7/24 Dave Angel <davea at ieee.org>:
>   
>> It's not a question of how much disk space there is, but how much virtual
>> space 32 bits can address.  2**32 is about 4 gig, and Windows XP reserves
>> about half of that for system use.  Presumably a 64 bit OS would have a much
>> larger limit.
>>
>> Years ago I worked on a Sun SPARC system which had much more limited
>> shared memory access, due to hardware limitations.  So 2 gig seems
>> pretty good to me.
>>
>> There is supposed to be a way to tell the Windows OS to only use 1 gb of
>> virtual space, leaving 3gb for application use.  But there are some
>> limitations, and I don't recall what they are.  I believe it has to be done
>> globally (probably in Boot.ini), rather than per process.  And some things
>> didn't work in that configuration.
>>
>> DaveA
>>
>>
>>     
> Hi Dave,
>
> In the related post I did on the numpy discussions:
>
> http://article.gmane.org/gmane.comp.python.numeric.general/31748
>
> another user was kind enough to run my test program on both 32 bit and
> 64 bit machines. On the 64 bit machine, there was no such limit, very
> much in line with what you wrote. Adding the /3GB option in boot.ini
> did not increase the available memory either. Apparently, Python
> needs to have been compiled in a way that lets it take advantage of
> that switch, and either that is not the case or I did something else
> wrong.
>
> I acknowledge the explanation concerning the available address space.
> Being ignorant of the inner details of the mmap implementation, I
> find it somewhat of an "implementation detail" that such an address
> wall is hit. There may be good arguments for it from a programming
> point of view, and it may be a relatively high limit compared to
> other systems, but it is certainly on the low side for my
> application. I work with data files, typically 200 GB in size,
> consisting of data packets that each have a fixed-size frame and a
> variable-size payload. To handle these large files, I generate an
> "index" file consisting of just the frames (which have all the
> metadata I need for finding the payloads I am interested in) and
> "pointers" to where in the large data file each payload begins. This
> index file can be up to 1 GB in size, and at times I need access to
> two of them at the same time (and then I hit the address wall).
>
> I would really like to access these index files read-only, as an
> array of records on a file, for which I use numpy.memmap (which wraps
> mmap.mmap). That way I can pick a single element or extract, e.g.,
> every thousandth value of a specific field in the record using the
> convenient indexing available in Python/numpy. Now it seems I have to
> resort to writing my own encapsulation layer, which seeks to the
> relevant place in the file, reads sections as bytestrings into
> recarrays, etc. Well, I must just get on with it...
>
> I think it would be worthwhile to mention this 32 bit OS limitation
> in the documentation of mmap.mmap, as I doubt I am the only one
> surprised by this address space limitation.
>
> Cheers,
> Kim
>
>   
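
The "encapsulation layer" mentioned in the quoted message (seek to a record, read it as a bytestring, decode it) could look roughly like the sketch below. It is only an illustration: the record layout (a 4-byte frame id followed by an 8-byte payload offset), the IndexFile class and the demo file are all hypothetical, and plain struct is used in place of numpy recarrays to keep it self-contained.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte frame id + 8-byte payload offset.
RECORD = struct.Struct("<IQ")


class IndexFile(object):
    """Read-only, sequence-like view over fixed-size records on disk.

    Only the record being read is held in memory, so the size of the
    file is not limited by the process address space.
    """

    def __init__(self, path):
        self._f = open(path, "rb")
        self._count = os.path.getsize(path) // RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve start/stop/step against the number of records.
            return [self._read(i) for i in range(*key.indices(self._count))]
        if key < 0:
            key += self._count
        if not 0 <= key < self._count:
            raise IndexError(key)
        return self._read(key)

    def _read(self, i):
        # Seek to record i and decode exactly one record.
        self._f.seek(i * RECORD.size)
        return RECORD.unpack(self._f.read(RECORD.size))

    def close(self):
        self._f.close()


# Build a small demo index file (made-up data) and query it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    for i in range(5000):
        out.write(RECORD.pack(i, i * 1000))

index = IndexFile(path)
single = index[1234]                                  # one record
every_thousandth = [rec[1] for rec in index[::1000]]  # one field, strided
index.close()
os.remove(path)
```

Each access costs a seek and a read, so this trades speed for address space; reading larger blocks at a time would be the obvious refinement.
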
I agree that some description of system limitations should be included 
in a system-specific document.  There probably is one; I haven't looked 
recently.  But I don't think it belongs in the mmap documentation.

Perhaps you still don't recognize what the limit is.  32 bits can only 
address 4 gigabytes, and everything in the process has to fit within 
that range (less the roughly half that Windows XP reserves for itself).  
So roughly the same limit that's on mmap is also on list, dict, 
bytearray, or anything else.  If you had 20 lists taking 100 meg each, 
you would fill up the 2gb address space.  If you had 10 of them, you 
might have enough room for a 1gb mmap area.  And your code takes up 
some of that space, as do the Python interpreter, the standard library, 
and all the data structures that are normally ignored by the 
application developer.
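
The arithmetic behind those numbers can be written out as a rough sketch.  The helper names are made up for illustration, the 2 gb reserved figure is the approximate Windows XP default mentioned earlier in the thread, and the space taken by the interpreter and libraries is ignored.

```python
import struct

# Width of a native pointer in bits: 32 on a 32-bit build of Python,
# 64 on a 64-bit build.
POINTER_BITS = struct.calcsize("P") * 8

GIB = 1024 ** 3
MIB = 1024 ** 2


def user_address_space(pointer_bits, reserved_by_os):
    """Bytes of virtual address space left for a single process."""
    return 2 ** pointer_bits - reserved_by_os


def room_left(total, allocations):
    """Bytes still unallocated after the given allocations."""
    return total - sum(allocations)


# Assumed 32-bit Windows XP default: about half of 4 GiB is reserved.
xp_user_space = user_address_space(32, 2 * GIB)

twenty_lists = [100 * MIB] * 20
ten_lists = [100 * MIB] * 10

print(POINTER_BITS)                                   # depends on the build
print(xp_user_space // GIB)                           # 2
print(room_left(xp_user_space, twenty_lists) // MIB)  # 48
print(room_left(xp_user_space, ten_lists) // MIB)     # 1048
```

So twenty 100 meg lists leave only about 48 meg, while ten of them leave just over the 1024 meg a 1gb mapping needs, and that is before fragmentation or anything else in the process is counted.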

BTW, there is one difference between mmap and most other allocations.  
Most data is backed by the swapfile, while an mmap region is backed by 
the specified file (unless you pass -1 for fileno, which gives an 
anonymous mapping backed by the swapfile).  Consequently, even if the 
swapfile is already clogged by all the other running applications, you 
can still map your 1.8gb or whatever of virtual space, when much less 
than that might be available for other kinds of allocations.
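
A small sketch of the two kinds of mapping, using only the standard mmap module.  The 64 KB size and the temporary file are just for illustration; either way the mapping still consumes address space in the process, only the backing store differs.

```python
import mmap
import os
import tempfile

SIZE = 64 * 1024  # deliberately small; illustrative only

# Anonymous mapping: fileno -1, not tied to any named file.
anon = mmap.mmap(-1, SIZE)
anon[:5] = b"hello"
anon_head = anon[:5]
anon.close()

# File-backed mapping: the pages come from the file itself.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * SIZE)  # an empty file cannot be mapped
os.close(fd)

with open(path, "r+b") as f:
    # Length 0 means "map the whole file"; ACCESS_READ makes it read-only.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    file_len = len(mapped)
    first = mapped[:4]
    mapped.close()

os.remove(path)
```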

Executables and DLLs are also (mostly) mapped into memory the same way 
as mmap, so they tend not to take up much space in the swapfile.  In 
fact, with planning, a DLL needn't take up any swapfile space (well, 
realistically, a few K is always needed).  But that's a linking issue 
for compiled languages.

DaveA



