creating/modifying sparse files on linux

Bengt Richter bokr at oz.net
Wed Aug 17 21:41:49 EDT 2005


On 17 Aug 2005 11:53:39 -0700, "draghuram at gmail.com" <draghuram at gmail.com> wrote:

>
>Hi,
>
>Is there any special support for sparse file handling in Python? My
>initial search didn't bring up much (not a thorough search). I wrote
>the following piece of code:
>
>options.size = 6442450944
>options.ranges = ["4096,1024","30000,314572800"]
>fd = open("testfile", "w")
>fd.seek(options.size-1)
>fd.write("a")
>for drange in options.ranges:
>    off = int(drange.split(",")[0])
>    len = int(drange.split(",")[1])
>    print "off =", off, " len =", len
>    fd.seek(off)
>    for x in range(len):
>        fd.write("a")
>
>fd.close()
>
>This piece of code takes a very long time, and in fact I had to kill it as
>the Linux system started doing a lot of swapping. Am I doing something
>wrong here? Is there a better way to create/modify sparse files?
>
>Thanks
I'm unclear as to what your goal is. Do you just need an object that provides
an interface like a file object, but internally is more efficient than a
normal file object when you access it as above[1], or do you need to create
a real file and record all the bytes in full (with what default for gaps?)
on disk, so that it can be opened by another program and read as an ordinary file?

Some operating system file systems may have some support for virtual zero-block runs
and lazy allocation/representation of non-zero blocks in files. It's easy to imagine
the rudiments, but I don't know of such a file system, not having looked ;-)
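
For what it's worth, most native Linux file systems (ext2/ext3, for example)
do allocate blocks lazily, so a region that is never written becomes a "hole"
that reads back as zero bytes. A minimal sketch of how one might try this,
assuming Python 3 and a file system that supports holes; the helper names
make_sparse_file, write_at and sizes are made up for illustration:

```python
import os

def make_sparse_file(path, size):
    """Create a file whose apparent size is `size` without writing data."""
    with open(path, "wb") as f:
        # Extending with truncate() writes no data; on a file system that
        # supports holes (an assumption here) no blocks are allocated.
        f.truncate(size)

def write_at(path, offset, data):
    """Overwrite part of an existing file, leaving the rest untouched."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

def sizes(path):
    """Return (apparent size, bytes allocated on disk)."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512
```

If the file system cooperates, sizes() should report far fewer allocated bytes
than the apparent size; du and ls -l show the same two numbers from the shell.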

You could write your own "sparse-file"-representation object, and maybe use pickle
for persistence. Or maybe you could use zipfiles. The kind of data you are creating above
would probably compress really well ;-)
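
To make the first suggestion concrete, here is a rough sketch of such an
object, assuming Python 3 and assuming the data really is runs of one repeated
byte, as in your example. The class SparseRuns and its methods are
hypothetical, not anything from the standard library:

```python
import pickle

class SparseRuns:
    """Remember only (offset, length, byte) runs, not the bytes themselves."""

    def __init__(self, size, default=b"\0"):
        self.size = size
        self.default = default      # value reported for gaps
        self.runs = []              # kept in write order; later runs win

    def fill(self, offset, length, byte):
        if len(byte) != 1:
            raise ValueError("byte must be a single byte")
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("run falls outside the declared size")
        self.runs.append((offset, length, byte))

    def read(self, offset, length):
        end = min(offset + length, self.size)
        buf = bytearray(self.default * max(0, end - offset))
        for run_off, run_len, byte in self.runs:
            lo = max(offset, run_off)
            hi = min(end, run_off + run_len)
            if lo < hi:             # the run overlaps the requested window
                buf[lo - offset:hi - offset] = byte * (hi - lo)
        return bytes(buf)

    def save(self, path):
        # Pickle plain data only, so loading does not depend on this class.
        with open(path, "wb") as f:
            pickle.dump((self.size, self.default, self.runs), f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            size, default, runs = pickle.load(f)
        obj = cls(size, default)
        obj.runs = list(runs)
        return obj
```

The pickled state for your two ranges is a few dozen bytes rather than
300+ MB, but of course another program cannot open it as an ordinary file.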

[1] Writing 314+ million identical bytes one by one is silly, of course ;-)
BTW, len is a built-in function, and using built-in names for variables
is frowned upon as a bug-prone practice.
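
A sketch of what I mean, assuming Python 3; write_run, parse_range and the
1 MiB chunk size are my own choices for illustration. Note also that in
Python 2, range(len) builds the entire list of 314+ million ints in memory
before the loop starts, which may well be what sent your machine into swap:

```python
def parse_range(text):
    """Turn an "offset,length" string into a pair of ints."""
    offset, length = (int(part) for part in text.split(","))
    return offset, length

def write_run(f, offset, length, byte=b"a", chunk_size=1 << 20):
    """Write `length` copies of `byte` at `offset`, one chunk at a time."""
    f.seek(offset)
    chunk = byte * chunk_size
    remaining = length
    while remaining > 0:
        n = min(remaining, chunk_size)
        f.write(chunk[:n])          # the last chunk may be shorter
        remaining -= n
```

With a file opened in binary mode you would then call
write_run(fd, *parse_range(drange)) for each entry in options.ranges.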

Regards,
Bengt Richter


