Manipulate Large Binary Files

Paul Rubin http
Thu Apr 3 03:03:09 EDT 2008


Derek Martin <code at pizzashack.org> writes:
> > Both are clocking in at the same time (1m 5sec for 2.6Gb), are there
> > any ways I can optimize either solution?  

Getting 40+ MB/sec through a file system is pretty impressive. 
Sounds like a RAID?

> That said, due to normal I/O generally involving double-buffering, you
> might be able to speed things up noticeably by using Memory-Mapped I/O
> (MMIO).  It depends on whether or not the implementation of the Python
> things you're using already use MMIO under the hood, and whether or
> not MMIO happens to be broken in your OS. :)

Python has the mmap module and I use it sometimes, but it's not
necessarily the right thing for something like this.  Each page you
try to read from incurs its own delay while the resulting page fault
is serviced, so the only overlapped I/O you get comes from the OS being
nice enough to do predictive readahead on sequential access, if it
does that at all.  By coincidence, a couple of other threads mention
AIO, which is a somewhat more powerful mechanism.
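
For anyone who wants to measure rather than guess, here is a rough
sketch of the two approaches side by side: plain read() calls in large
chunks versus a read-only mmap, each making one sequential pass over
the file.  The function names, the 1 MiB chunk size, and the use of
CRC-32 as a stand-in for "real work" are all my own assumptions, not
anything from the original poster's code.  Which one is faster will
depend on the OS and its readahead behaviour.

```python
import mmap
import os
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per step; an assumed value, tune as needed


def crc_buffered(path, chunk_size=CHUNK_SIZE):
    """CRC-32 of a file using ordinary read() calls in large chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def crc_mmap(path, chunk_size=CHUNK_SIZE):
    """CRC-32 of the same file read through a read-only memory map."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be mapped; CRC-32 of b"" is 0
    crc = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing the map touches its pages in order; each page not
            # already cached is brought in by a page fault.
            for start in range(0, len(m), chunk_size):
                crc = zlib.crc32(m[start:start + chunk_size], crc)
    return crc & 0xFFFFFFFF
```

Timing both on the same large file (and remembering that a second run
may be served from the page cache) should show whether mmap buys
anything on a given system.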
