[issue5888] mmap enhancement - resize with sequence notation

Josh Rosenberg report at bugs.python.org
Mon Jun 16 23:40:09 CEST 2014


Josh Rosenberg added the comment:

I see a few issues with this:

1. Changing the default behavior is a compatibility issue. I've written code that depends on exceptions being raised if slice assignment sizes don't match.
2. The performance cost is high. Changing slice assignment from rewriting in place to shrinking or expanding requires (in different orders for shrink and expand) truncating the file to the correct length, memcpy-ing an amount of data proportional to what follows the end of the slice (not to the slice size), and probably remapping the file (which causes problems if someone has a buffer attached to the existing mapping). With non-file-backed sequences, this kind of work happens entirely in memory and the data is typically smallish; with a file, most of it has to be read from and written to disk, and I'd assume the data being worked with is "largish" (if it's reliably small, the advantages of mmap-ing are small).
3. Behavior in cases where the whole file isn't mapped is hard to intuit or define reasonably. If I map the first 1024 bytes of a 2 GB file, and I add 20 bytes in the middle of the block, what happens? Does data from the unmapped portions get moved? Overwritten? What about removing 20 bytes from the middle of the block? Do we write 0s, or copy down the data that appears after? And remember, for all but the "shrink and write 0s" option, we're moving or modifying data the user explicitly didn't mmap.
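
To make points 1 and 2 concrete, here is a minimal sketch, assuming a whole-file mapping and a small temporary file. It first shows the current behavior (a size-mismatched slice assignment raises IndexError), then spells out the steps an "expanding" assignment would have to perform by hand. The name insert_bytes is a hypothetical helper written for illustration; it is not part of the mmap module and not the implementation proposed in this issue.

```python
import mmap
import os
import tempfile


def insert_bytes(path, offset, payload):
    """Hypothetical helper: insert payload at offset, shifting the tail up."""
    old_size = os.path.getsize(path)
    new_size = old_size + len(payload)
    with open(path, "r+b") as f:
        # 1. Grow the file first so a mapping can cover the final length.
        f.truncate(new_size)
        # 2. Create a fresh mapping of the new length; a mapping made
        #    earlier would still have the old size.
        with mmap.mmap(f.fileno(), new_size) as mm:
            # 3. Shift everything after the insertion point. The cost scales
            #    with the tail length (old_size - offset), not len(payload).
            mm.move(offset + len(payload), offset, old_size - offset)
            # 4. Same-size slice assignment, which mmap already supports.
            mm[offset:offset + len(payload)] = payload


mismatch_error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    # Current behavior: a 2-byte slice cannot take a 5-byte value.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        try:
            mm[2:4] = b"abcde"
        except IndexError as exc:
            mismatch_error = exc

    # Manual "expanding" assignment: truncate, remap, move, then write.
    insert_bytes(path, 2, b"abc")
    with open(path, "rb") as f:
        result = f.read()

print(repr(mismatch_error))
print(result)
```

The sketch sidesteps point 3 by always mapping the entire file; with a partial mapping, step 3 would have to touch bytes outside the mapped region, which is exactly the case that is hard to define.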

----------
nosy: +josh.rosenberg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5888>
_______________________________________

