Creating huge data in very little time.

Tim Chase python.list at tim.thechases.com
Tue Mar 31 08:41:12 EDT 2009


andrea wrote:
> On 31 Mar, 12:14, "venutaurus... at gmail.com" <venutaurus... at gmail.com>
> wrote:
>> That time is reasonable. The randomness should be such that no two
>> files have the same MD5 checksum. The main reason for needing such a
>> huge data set is to stress-test our product.
> 
> 
> If randomness is not necessary (as I understood), you can just create
> one single file and then modify one bit of it, iteratively, 1000
> times. That's enough to make the checksum change (see the sketch
> below).
> 
> Is there a way to create a file that big without actually writing
> anything in Python (just give me the garbage that is already on the
> disk)?
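
A minimal sketch of that bit-flip idea (assuming a pre-generated base
file, here called 'base.bin' purely for illustration; Python 2 style,
like the rest of this thread): vary a few leading bytes per copy so
every MD5 sum differs while the bulk of the data stays identical:

   # 'base.bin' is an assumed, pre-generated chunk of filler data
   base = open('base.bin', 'rb').read()
   for i in range(1000):
       f = open('copy_%04d.bin' % i, 'wb')
       f.write('%d\n' % i)   # unique leading bytes => unique MD5 sum
       f.write(base)         # the rest is identical across copies
       f.close()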

Not exactly AFAIK, but this line of thinking does remind me of 
sparse files[1] if your filesystem supports them:

   # one ~1 GB sparse file per iteration, each ending in unique data
   for i in range(1000):
       f = file('%i.txt' % i, 'wb')
       data = str(i) + '\n'
       # seek past ~1 GB of "hole" and write only the unique tail
       f.seek(1024*1024*1024 - len(data))
       f.write(data)
       f.close()

On FS's that support sparse files, it's blindingly fast and 
creates a virtual file of that size without the overhead of 
writing all the bits to the file.  However, this same 
optimization may also throw off any benchmarking you do, as it 
doesn't have to read a gig off the physical media.  This may be a 
good metric for hash calculation across such files, but not a 
good metric for I/O.
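
One way to confirm the files really are sparse (a sketch, assuming a
POSIX system, where os.stat() exposes st_blocks in 512-byte units):

   import os

   st = os.stat('0.txt')            # one of the files created above
   apparent = st.st_size            # logical size: ~1 GB
   on_disk = st.st_blocks * 512     # bytes actually allocated on disk
   print apparent, on_disk          # on_disk stays tiny if sparse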

-tkc

[1] http://en.wikipedia.org/wiki/Sparse_file
