using mmap on large (> 2 Gig) files

Chetan pandyacus.xspam at xspam.sbcglobal.net
Thu Oct 26 03:25:38 EDT 2006


Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> "sturlamolden" <sturlamolden at yahoo.no> writes:
>> However, "memory mapping" a file by means of fseek() is probably more
>> efficient than using UNIX' mmap() or Windows'
>> CreateFileMapping()/MapViewOfFile().
>
> Why on earth would you think that?!  It is counterintuitive.  fseek beyond
> whatever is buffered in stdio (usually no more than 1kbyte or so)
> requires a system call, while mmap is just a memory access.
That is on top of the buffer copy required on every I/O to or from the application.
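
A rough sketch of the two access patterns being compared, written for
present-day Python 3 rather than the Python of this thread. The helper names
(make_sample_file, read_with_seek, read_with_mmap) are made up for
illustration. Note that slicing an mmap object still copies the slice into a
new bytes object; what it avoids is the per-access seek/read round trip
through the file object.

```python
import mmap
import os
import tempfile


def make_sample_file(size):
    """Hypothetical helper: write `size` bytes of a repeating 0..255 pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * (size // 256))
    return path


def read_with_seek(path, offsets, size):
    """Random access through the file object: one seek plus one read per offset."""
    chunks = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            chunks.append(f.read(size))
    return chunks


def read_with_mmap(path, offsets, size):
    """Random access through a read-only mapping, using ordinary slicing."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[off:off + size] for off in offsets]
```

Both functions return the same bytes for the same offsets; which one is
faster depends on the access pattern and the platform, so it is worth
measuring rather than assuming.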

>> In Python, we don't always need the file memory mapped, we normally
>> just want to use slicing-operators, for-loops and other goodies on
>> the file object -- i.e. we just want to treat the file as a Python
>> container object. There are many ways of achieving that.
>
> Some of the time we want to share the region with other processes.
> Sometimes we just want random access to a big file on disk without
> having to do a lot of context switches seeking around in the file.
>
>> There is in any case room for improving Python's mmap object.
>
> IMO it should have some kind of IPC locking mechanism added, in
> addition to the offset stuff suggested.
The type of IPC required depends on who else is using the shared region:
another Python process or an external program. Apart from spinlock
primitives, the OS already provides other synchronization mechanisms.
Still, I do see value in providing a shared-memory-based spinlock
mechanism; such services can be built on top of the shared memory
infrastructure. I am not sure what kind of real-world Python applications
would use it.
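
As an illustration of leaning on an OS-provided mechanism instead of a
spinlock, here is a minimal sketch, assuming a POSIX system and Python 3. It
guards a counter stored in a shared mapping with an advisory fcntl lock. This
is a blocking file lock, not a spinlock: as far as I know the mmap module
exposes no atomic compare-and-swap, so a true spinlock cannot be written in
pure Python this way. The function names and the 8-byte counter layout are
assumptions made up for the example.

```python
import fcntl
import mmap
import os
import struct
import tempfile


def increment_shared_counter(path, times):
    """Add 1 to the 8-byte counter at offset 0, `times` times, under a lock."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 8) as m:  # shared, read/write mapping
            for _ in range(times):
                fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is free
                try:
                    (value,) = struct.unpack_from("<q", m, 0)
                    struct.pack_into("<q", m, 0, value + 1)
                finally:
                    fcntl.lockf(f, fcntl.LOCK_UN)


def demo(workers=4, times=1000):
    """Fork `workers` processes that all update the same mapped counter."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, struct.pack("<q", 0))
        os.close(fd)
        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:
                # Child: do the work, then exit without running parent code.
                try:
                    increment_shared_counter(path, times)
                finally:
                    os._exit(0)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        with open(path, "rb") as f:
            return struct.unpack("<q", f.read(8))[0]
    finally:
        os.unlink(path)
```

Because the lock lives on the file rather than inside the interpreter, an
external program that maps the same file can take part, provided it honours
the same advisory lock.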

-Chetan


