[Python-checkins] CVS: python/dist/src/Lib/test test_largefile.py,1.7,1.8

Tim Peters tim@zope.com
Thu, 6 Sep 2001 13:14:12 -0400


>> Dubious assumptions:
>>
>> 1. That seeking beyond the end of a file increases the size of a file.

> I haven't seen the code that allegedly made this assumption, but the
> Unix/Posix rule is actually subtly different: seeking beyond the end
> of a file *and then writing* increases the size of the file.

That code hasn't changed; it does write after seeking beyond the end, and
did before too.
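Here is a minimal Python 3 sketch of the seek-then-write rule. It assumes a scratch file in a temporary directory, and the helper name `size_after_seek_and_write` is made up for illustration. It records the size after the seek alone and again after a one-byte write. On POSIX systems, and per the SetFilePointer() docs, only the write should grow the file.

```python
import os
import tempfile


def size_after_seek_and_write(offset):
    """Return (size after seek only, size after seek + 1-byte write).

    Hypothetical helper for illustration; uses a throwaway temp file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "gap.bin")
        with open(path, "wb") as f:
            f.seek(offset)      # move the file pointer past EOF; nothing written
            f.flush()           # buffer is empty, so this writes nothing
            size_after_seek = os.path.getsize(path)
            f.write(b"4")       # writing at the new position extends the file
        # The file is closed here, so the buffered byte has reached the OS.
        size_after_write = os.path.getsize(path)
        return size_after_seek, size_after_write


print(size_after_seek_and_write(1_000_000))
```

The expected pair is `(0, offset + 1)`: the seek by itself leaves the file empty.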

The MS C routines are inadequately documented.  The (presumably) underlying
Win32 routines are better documented, and the SetFilePointer() docs are
clear, provided you stick to the Win32 API:

    Note that it is not an error to set the file pointer to a position
    beyond the end of the file.  The size of the file does not increase
    until you call the SetEndOfFile, WriteFile, or WriteFileEx function.
    A write operation increases the size of the file to the file pointer
    position plus the size of the buffer written, leaving the intervening
    bytes uninitialized.

Note that it's also clear there's no guarantee about what shows up in the
gap.

>> 2. That files so extended are magically filled with null bytes.

> Interesting.  This is worth knowing -- the "fill with null bytes" is
> holy dogma on Unix.

See last point.
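A test that doesn't want to bake in the null-byte assumption can read the gap back and report what it finds. The sketch below uses Python 3, a temporary file, and a hypothetical helper name, and it returns the set of distinct byte values in the gap. POSIX's lseek() spec promises that the gap reads as zeros until something is written there. The Win32 docs quoted above promise only "uninitialized" bytes, so on Windows the result is whatever the filesystem happens to do.

```python
import os
import tempfile


def gap_bytes(offset, chunk=1 << 16):
    """Write one byte at `offset`, then return the distinct byte values
    found in the gap [0, offset). Hypothetical helper for illustration.
    """
    seen = set()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "gap.bin")
        with open(path, "wb") as f:
            f.seek(offset)
            f.write(b"4")
        with open(path, "rb") as f:
            remaining = offset
            while remaining:
                block = f.read(min(chunk, remaining))
                if not block:           # unexpected EOF: stop rather than loop
                    break
                seen.update(block)      # iterating bytes yields ints 0..255
                remaining -= len(block)
    return seen


print(sorted(gap_bytes(1_000_000)))
```

On a POSIX system this should print `[0]`. Anything else means the gap was not zero-filled.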

On Win98SE w/ FAT32, test_largefile ran in an eyeblink.  On Win2K w/ NTFS,
it took more than a minute.  By hand on the latter, all the time is consumed
during the close:

>>> f = open('ga', 'wb')
>>> f.seek(2500000000)   # eyeblink
>>> f.write('4')         # eyeblink
>>> f.close()  # loooooooong delay here

Inspection of the file then shows it filled with 0 bytes, although I see
nothing in the docs guaranteeing that.  NTFS 5.0 introduced a "sparse file"
concept for quick manipulation of giant files w/ lots of zeroes, but it
looks like you have to ask for one of those explicitly (and I don't see a
libc way to do that -- just a new flag to Win32 DeviceIoControl()).
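One way to tell whether a filesystem actually stored the gap or just recorded a hole is to compare allocated space with the logical size. This is a POSIX-oriented sketch assuming Python 3. The Python docs describe `st_blocks` as a count of 512-byte blocks, and it is absent on Windows, where (as above) a sparse file has to be requested through DeviceIoControl() with FSCTL_SET_SPARSE. The helper name is hypothetical.

```python
import os
import tempfile


def allocated_vs_logical(offset):
    """Write one byte at `offset`; return (allocated_bytes, logical_size).

    allocated_bytes is None where os.stat() has no st_blocks (e.g. Windows).
    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ga")
        with open(path, "wb") as f:
            f.seek(offset)
            f.write(b"4")
        st = os.stat(path)
        blocks = getattr(st, "st_blocks", None)
        # st_blocks counts 512-byte units, per the Python os.stat docs.
        allocated = None if blocks is None else blocks * 512
        return allocated, st.st_size


allocated, logical = allocated_vs_logical(64 * 1024 * 1024)
print(f"logical size {logical}, allocated {allocated}")
```

On a filesystem that supports holes, the allocated figure should be a tiny fraction of the logical size. A filesystem that zero-fills the gap, as NTFS appeared to do here without the sparse flag, pays for the whole range.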

Whatever, C doesn't guarantee any of this stuff.