Flushing buffer on file copy on linux

Antoine Pitrou solipsis at pitrou.net
Wed Aug 15 11:26:02 EDT 2012


J <dreadpiratejeff <at> gmail.com> writes:
> 
> Now, the problem I have is that linux tends to buffer data writes to a
> device, and I want to work around that.  When run in normal non-stress
> mode, the program is slow enough that the linux buffers flush and put
> the file on disk before the hash occurs.  However, when run in stress
> mode, what I'm finding is that it appears that the files are possibly
> being hashed while still in the buffer, before being flushed to disk.

Your analysis is only partly right. The files can indeed be hashed from 
in-memory buffers; but even if you flush those buffers to disk using standard 
techniques (such as fsync()), the data stays cached in memory, and therefore 
the file will still be read, and hashed, from memory rather than from the 
device (for obvious efficiency reasons).
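
To make that concrete, here is a minimal sketch (not your actual program) of a
copy followed by fsync() and a hash. The helper names copy_and_fsync and
sha256_of are made up for illustration, and the choice of SHA-256 is an
assumption. The fsync() call only guarantees the data has been handed to the
device; the later read can still be satisfied from the cached pages.

```python
import hashlib
import os
import shutil


def copy_and_fsync(src, dst):
    """Copy src to dst, then ask the kernel to write dst's data to disk.

    Hypothetical helper for illustration only.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write dirty pages to the device


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks.

    The reads may still be served from in-memory buffers, even after fsync().
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

So a matching hash right after copy_and_fsync() tells you the data reached the
kernel intact, not necessarily that it can be read back from the device.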

I don't think there's a portable way to bypass the in-memory buffers 
entirely, but under Linux you can write "1" to the special file 
/proc/sys/vm/drop_caches:

$ sudo sh -c "echo 1 > /proc/sys/vm/drop_caches"

Or, to quote the /proc man page:

       /proc/sys/vm/drop_caches (since Linux 2.6.16)
              Writing to this file causes the kernel to drop clean
              caches, dentries and inodes from memory, causing that
              memory to become free.

              To free pagecache, use echo 1 > /proc/sys/vm/drop_caches;
              to free dentries and inodes, use
              echo 2 > /proc/sys/vm/drop_caches; to free pagecache,
              dentries and inodes, use echo 3 > /proc/sys/vm/drop_caches.

              Because this is a nondestructive operation and dirty
              objects are not freeable, the user should run sync(8)
              first.
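
If you would rather do this from inside the Python program, something along
these lines might work. It is only a sketch: drop_page_cache is a made-up
name, the path parameter exists mainly so the function can be exercised
against an ordinary file, and writing to the real /proc file requires root.
os.sync() is assumed to be available (Python 3.3+ on Unix).

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path=DROP_CACHES, mode="1"):
    """Flush dirty data, then ask the kernel to drop clean caches.

    Hypothetical helper. mode follows the man page: "1" frees pagecache,
    "2" frees dentries and inodes, "3" frees both.
    """
    if mode not in ("1", "2", "3"):
        raise ValueError("mode must be '1', '2' or '3'")
    os.sync()  # dirty objects are not freeable, so write them out first
    with open(path, "w") as f:
        f.write(mode + "\n")
```

Called between the copy and the hash, this should force the hash to read the
file back from the device, at the cost of dropping the cache for the whole
system, not just for your file.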


Regards

Antoine.


-- 
Software development and contracting: http://pro.pitrou.net




