Very Slow Disk Writes when Writing Large Data Blocks

remmm remmmav at gmail.com
Fri Jun 2 14:14:31 EDT 2017


I'm seeing slow write speeds from both Python and C code on some Windows workstations.  In particular, both Python's "write" and numpy's "tofile" method suffer from this issue.  I'm wondering if anyone knows whether this is a known issue, what the cause might be, or how to resolve it.  The details are below.

The slow write speed issue seems to occur when writing data in blocks larger than 32767 512-byte disk sectors.  Write speed is as expected up to this 32767-sector limit and then falls off, as if all data beyond it were processed byte-by-byte.  I can't prove this is what is happening -- but speed tests generally support the theory.  The degraded write speeds are in the range of 18 to 25 MBytes/sec for spinning disks and about 50 MBytes/sec for SSDs.  Keep in mind these numbers should be more like 120 MBytes/sec for spinning disks and 300 MBytes/sec for SSDs.
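For reference, below is a minimal sketch of the kind of timing test I mean (the output path, total size, and block sizes are placeholders, not my exact script).  It writes roughly the same amount of data with block sizes on either side of the 32767-sector mark and reports the throughput:

import os
import time

PATH = "testfile.bin"      # placeholder output path
TOTAL = 1024**3            # write roughly 1 GByte per test; should be large
                           # relative to the OS write cache for a fair number
SECTOR = 512

for sectors in (16384, 32767, 32768, 65536):
    block_size = sectors * SECTOR
    buf = b"\0" * block_size
    start = time.time()
    written = 0
    with open(PATH, "wb") as f:
        while written < TOTAL:
            f.write(buf)
            written += block_size
    elapsed = time.time() - start
    print("%6d sectors (%9d bytes): %6.1f MBytes/sec"
          % (sectors, block_size, written / elapsed / 1e6))
    os.remove(PATH)

On the affected machines it is the block sizes above the 32767-sector mark that come in at the degraded speeds.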

This issue seems to be system specific.  I originally saw it on my HP Z640 workstation using Python 2.7 under Windows 7.  It was numpy writes of large arrays in the 100 GB size range that first highlighted the problem, but I've since written test code using plain Python "write" as well and get similar results with various block sizes.  I've also verified the behavior with Cygwin mingw64 C and with Visual Studio C 2013, and tested on a variety of other systems.  My laptop does not show the slowdown, and not all Z640 systems seem to show it, though I've found several that do.  IT has tested this on a clean Windows 7 image and on a Windows 10 image using yet another Z640 and gets similar results.  I've not seen any Linux system show this issue, though I don't have any Z640s with Linux on them.  I have, however, run my tests on Linux Mint 17 under VirtualBox on the same Z640 that showed the speed issue, using both Wine and native Python, and both showed good performance and no slowdown.

One workaround seems to be to enable full write caching for the drive in Device Manager, with the attendant risk of data corruption.  This suggests, for example, that the issue is byte-by-byte flushing of data beyond the 32767-sector limit and that full caching somehow mitigates it.  The other workaround is to write all data in blocks smaller than the 32767-sector limit (about 16 MBytes), as mentioned above.  Of course, reducing the block size only works if you have the source code and the time and inclination to modify it.  There is an indication that some of the commercial code we use for science and engineering may also suffer from this issue.
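To make that second workaround concrete, here is a rough sketch of the kind of chunked write I have in mind in plain Python; the chunk size is just my assumption of a value safely under the limit:

CHUNK = 32766 * 512   # one sector under the apparent 32767-sector limit

def chunked_write(f, data, chunk=CHUNK):
    """Write 'data' to the open binary file 'f' in pieces of at most 'chunk' bytes."""
    view = memoryview(data)   # avoids copying the slices
    for offset in range(0, len(view), chunk):
        f.write(view[offset:offset + chunk])

A single call like f.write(data) then becomes chunked_write(f, data), and each underlying write stays under the 32767-sector mark.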

The impact of this issue also seems application specific.  It only becomes annoying when you're regularly writing files of significant size (above, say, 10 GB).  It also depends on how an application writes data, so not every application that creates large files will exhibit the problem.  As an example, numpy's tofile method hits this issue for large enough arrays, and that is what prompted me to investigate.
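For the numpy case specifically, the same chunking idea can be applied by slicing the array and calling tofile on each slice rather than writing the whole array in one call.  This is only a sketch of the workaround I have in mind, not something I've exhaustively tested:

import numpy as np

CHUNK_BYTES = 32766 * 512   # stay under the apparent 32767-sector limit

def tofile_chunked(arr, path, chunk_bytes=CHUNK_BYTES):
    """Write 'arr' to 'path' in slices whose raw size stays under 'chunk_bytes'."""
    flat = arr.ravel()                             # a view when possible, else a copy
    step = max(1, chunk_bytes // flat.itemsize)    # elements per slice
    with open(path, "wb") as f:
        for start in range(0, flat.size, step):
            flat[start:start + step].tofile(f)

# e.g. tofile_chunked(my_array, "out.bin") instead of my_array.tofile("out.bin")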

I don't really know where to go with this.  Is this a Windows issue?  A runtime-library (RTL) issue?  A hardware, device driver, or BIOS issue?  Is there a stated OS or library limit on buffer sizes for things like C fwrite or Python write that would make this an application issue?  Thoughts?

Thanks,
remmm


