Version Control Software

Roy Smith roy at panix.com
Sun Jun 16 09:50:19 EDT 2013


In article <mailman.3442.1371389433.3114.python-list at python.org>,
 Chris “Kwpolska” Warrick <kwpolska at gmail.com> wrote:

> (I’m using wc -c to count the bytes in all files there are.  du is
> inaccurate with files smaller than 4096 bytes.)

It's not that du is not accurate, it's that it's measuring something 
different.  It's measuring how much disk space the file is using.  For 
most files, that's the number of characters in the file rounded up to a 
full block.  For large files, I believe it also includes the overhead of 
indirect blocks or extent trees.  And, finally, for sparse files, it 
takes into account that some logical blocks in the file may not be 
mapped to any physical storage.
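
To make the distinction concrete, here is a rough Python sketch that reports 
both numbers for a single file.  The helper name logical_and_allocated is 
made up for illustration.  It assumes a Unix-like system where os.stat() 
exposes st_blocks, and it assumes st_blocks is counted in 512-byte units, 
which is common but not guaranteed everywhere.

```python
import os


def logical_and_allocated(path):
    """Return (bytes of data, bytes of disk allocated) for one file.

    Hypothetical helper, not part of any library.
    """
    st = os.stat(path)
    # st_size is the logical length of the file, the number "wc -c" prints.
    logical = st.st_size
    # st_blocks is Unix-only.  Assumption: it is in 512-byte units, so this
    # approximates what "du" reports for the file.
    allocated = st.st_blocks * 512
    return logical, allocated
```

On many filesystems a one-byte file shows a logical size of 1 and an 
allocated size of one full block, while a sparse file can show an allocated 
size smaller than its logical size.  Some filesystems store tiny files 
inline, so the allocated figure may even be 0.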

So, whether you want to use "du" or "wc -c" depends on what you're 
trying to measure.  If you want to know how much disk space you're 
using, du is the right tool.  If you want to know how much data will be 
transmitted if the file is serialized (i.e. packed in a tarball or sent 
via a "{hg,git} clone" operation), then "wc -c" is what you want.
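
If you want both totals for a whole directory tree, something along these 
lines would do as a first approximation.  This is only a sketch: tree_totals 
is a hypothetical name, it counts regular files only (real du also charges 
for directories and counts hard-linked files once), and it again assumes 
st_blocks is in 512-byte units on a Unix-like system.

```python
import os
import stat


def tree_totals(root):
    """Return (total data bytes, total allocated bytes) under root.

    Hypothetical helper; a simplification of what "wc -c" and "du" report.
    """
    logical = 0
    allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are not followed.
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices, etc.
            logical += st.st_size            # what "wc -c" would add up
            allocated += st.st_blocks * 512  # assumed 512-byte units
    return logical, allocated
```

Comparing the two numbers for a given tree gives a quick feel for how much 
of the disk usage is block rounding rather than data.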

All that being said, for the vast majority of cases (and I would be 
astonished if this was not true for any real-life vcs repo), the 
difference between what wc and du tell you is not worth worrying about.  
And du is going to be a heck of a lot faster.


