File holes in Linux

Grant Edwards invalid at invalid.invalid
Wed Sep 29 16:38:14 EDT 2010


On 2010-09-29, Ned Deily <nad at acm.org> wrote:
>  Tom Potts <karaken12 at gmail.com> wrote:
>> Hi, all.  I'm not sure if this is a bug report, a feature request or what,
>> so I'm posting it here first to see what people make of it.  I was copying
>> over a large number of files using shutil, and I noticed that the final
>> files were taking up a lot more space than the originals; a bit more
>> investigation showed that files with a positive nominal filesize which
>> originally took up 0 blocks were now taking up the full amount.  It seems
>> that Python does not write back file holes as it should; here is a simple
>> program to illustrate:
>>   data = '\0' * 1000000
>>   file = open('filehole.test', 'wb')
>>   file.write(data)
>>   file.close()
>> A quick `ls -sl filehole.test' will show that the created file actually
>> takes up about 980k, rather than the 0 bytes expected.
>
> I would expect the file size to be 980k in that case.  AFAIK, simply 
> writing null bytes doesn't automatically create a sparse file on Unix-y 
> systems.

Correct.  As Ned says, you create holes by seeking past the end of the
file before writing data, not by writing 0x00 bytes.  Here's a
demonstration:

Writing 0x00 values:

  $ dd if=/dev/zero of=foo1 bs=1M count=10  
  10+0 records in
  10+0 records out
  10485760 bytes (10 MB) copied, 0.0315967 s, 332 MB/s

  $ ls -l foo1
  -rw-r--r-- 1 grante users 10485760 Sep 29 15:32 foo1

  $ du -s foo1
  10256   foo1

Seeking, then writing a single byte:
  
  $ dd if=/dev/zero of=foo2 bs=1 count=1 seek=10485759
  1+0 records in
  1+0 records out
  1 byte (1 B) copied, 8.3075e-05 s, 12.0 kB/s
  
  $ ls -l foo2
  -rw-r--r-- 1 grante users 10485760 Sep 29 15:35 foo2

  $ du -s foo2
  16      foo2
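
The same two cases can be reproduced from Python. What follows is only a
minimal sketch: the helper names (write_zeros, write_hole, report) are made
up for illustration, and it assumes a Unix-like system where os.stat()
exposes st_blocks in 512-byte units, as Linux does. How many blocks actually
get allocated depends on the filesystem, so no particular numbers are
promised here.

```python
import os
import tempfile

SIZE = 10 * 1024 * 1024  # 10 MiB, the same size as the dd examples


def write_zeros(path, size=SIZE):
    """Write `size` literal 0x00 bytes (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)


def write_hole(path, size=SIZE):
    """Seek to the last byte and write a single 0x00 byte,
    leaving everything before it as a hole (hypothetical helper)."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def report(path):
    """Return (nominal size, allocated bytes) for `path`.
    Assumes st_blocks counts 512-byte units, as on Linux."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, writer in (("foo1", write_zeros), ("foo2", write_hole)):
            path = os.path.join(d, name)
            writer(path)
            size, allocated = report(path)
            print(name, "size:", size, "allocated:", allocated)
```

On a filesystem that supports holes, foo2 should normally show far fewer
allocated bytes than foo1 even though both report the same size.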


-- 
Grant Edwards               grant.b.edwards        Yow! Please come home with
                                  at               me ... I have Tylenol!!
                              gmail.com            


