[Numpy-discussion] Issues with the memmap object
Sturla Molden
sturla at molden.no
Mon Jun 18 11:30:39 EDT 2007
After struggling with NumPy's memmap object, I examined the code and
found three severe problems. I suggest that memmap be removed from
NumPy, at least on Windows, as its shortcomings are severe and
undocumented.
Problem 1: I/O errors are never detected on Win32:
On Windows, I/O errors on memory-mapped files are reported through
structured exception handling (SEH): touching a page that cannot be read
raises an EXCEPTION_IN_PAGE_ERROR exception. Neither NumPy nor Python uses
structured exception handling on Win32. This means that I/O errors (such
as network or disk failures) will go undetected and be a source of
obscure bugs.
The fix is to wrap every access to a PyArrayObject's "data" pointer in
__try and __except blocks, and to compile with MSVC on Windows.
GCC/MinGW cannot be used, as it does not support structured exception
handling. In other words:
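#include <Python.h>
#include <numpy/arrayobject.h>
#include <windows.h>

PyArrayObject *memmap;  /* array whose data buffer is a mapped file view */
__try {
    /* safe read/write access to memmap->data here */
}
__except (GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR ?
          EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
    /* Windows signaled an I/O error, handle the problem here;
       any other exception is passed on to outer handlers */
}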
Not only must NumPy itself be rewritten, but so must any library that
obtains a data pointer from a NumPy memmap array. Fixing this will be
extremely difficult, if not impossible. The only safe way to access file
data from NumPy is numpy.fromfile() and numpy.ndarray.tofile().
Problem 2: Mapping always starts from the beginning of the file:
Python's standard mmap object always maps from the beginning of the
file, regardless of the size. NumPy's memmap object depends on Python's
mmap through the buffer protocol. Even though NumPy's memmap object takes
an offset parameter, the actual memory mapping starts from the beginning
of the file. Thus, virtual memory equal in size to the memmap object's
offset is leaked until the memmap object is deleted.
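For comparison, a mapping that does not waste the address space before
the offset only has to round the offset down to the mapping granularity
and map from there, so at most one page of extra memory is used. The
sketch below assumes a POSIX system (mmap, sysconf); align_offset and
map_region are hypothetical helper names, not NumPy or Python APIs. On
Windows the rounding would use the allocation granularity reported by
GetSystemInfo rather than the page size.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t even on 32-bit Linux */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round `offset` down to a multiple of `granularity` (the page size for
   mmap) and report how many bytes lie between that boundary and `offset`. */
off_t align_offset(off_t offset, long granularity, size_t *slack)
{
    assert(offset >= 0 && granularity > 0);
    off_t aligned = offset - offset % granularity;
    *slack = (size_t)(offset - aligned);
    return aligned;
}

/* Hypothetical helper: map `length` bytes of `fd` starting at `offset`.
   Only the partial page before `offset` is mapped, not the whole prefix
   of the file. On success *data points at the byte at `offset`; pass
   *base and *maplen to munmap() when done. Returns -1 on failure. */
int map_region(int fd, off_t offset, size_t length,
               void **base, size_t *maplen, char **data)
{
    size_t slack;
    off_t aligned = align_offset(offset, sysconf(_SC_PAGESIZE), &slack);
    *maplen = length + slack;
    *base = mmap(NULL, *maplen, PROT_READ, MAP_SHARED, fd, aligned);
    if (*base == MAP_FAILED)
        return -1;
    *data = (char *)*base + slack;
    return 0;
}
```

With a 4096-byte page, an offset of 5000 maps from byte 4096 and skips
904 bytes into the view, instead of mapping the first 5000 bytes.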
Problem 3: No 64-bit support on Windows or Linux:
On Linux, large files must be memory mapped using mmap64 (or mmap2 if 4k
boundaries are acceptable). On Windows, CreateFileMapping and
MapViewOfFile have 64-bit support, but Python's mmap does not use it (the
high offset DWORD is always zero). Only files smaller than 4 GB can be
memory mapped.
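The arithmetic a 64-bit-capable wrapper would need is small. The sketch
below uses only <stdint.h> so it compiles anywhere; on Windows the two
halves would be passed as the dwFileOffsetHigh and dwFileOffsetLow
arguments of MapViewOfFile, and mmap2 takes its offset in 4096-byte units
on most architectures (an assumption worth checking per platform). The
function names split_offset and to_mmap2_units are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit file offset into the high and low 32-bit halves that
   MapViewOfFile expects; a wrapper that always passes 0 as the high
   half cannot reach anything beyond the first 4 GB. */
void split_offset(uint64_t offset, uint32_t *high, uint32_t *low)
{
    assert(high && low);
    *high = (uint32_t)(offset >> 32);
    *low  = (uint32_t)(offset & 0xFFFFFFFFu);
}

/* Convert a byte offset to mmap2's unit (assumed to be 4096 bytes).
   Returns -1 if the offset is not on a 4k boundary. */
int64_t to_mmap2_units(uint64_t offset)
{
    if (offset % 4096u != 0)
        return -1;
    return (int64_t)(offset / 4096u);
}
```

For example, an offset of 5000000000 bytes splits into a high half of 1
and a low half of 705032704.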
Regards,
Sturla Molden