Creating huge data in very little time.

venutaurus539 at gmail.com
Tue Mar 31 06:14:43 EDT 2009


On Mar 31, 1:15 pm, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 30 Mar 2009 22:44:41 -0700, venutaurus... at gmail.com wrote:
> > Hello all,
> >             I've a requirement where I need to create around 1000
> > files under a given folder with each file size of around 1GB. The
> > constraints here are each file should have random data and no two files
> > should be unique even if I run the same script multiple times.
>
> I don't understand what you mean. "No two files should be unique" means
> literally that only *one* file is unique, the others are copies of each
> other.
>
> Do you mean that no two files should be the same?
>
> > Moreover
> > the filenames should also be unique every time I run the script. One
> > possibility is that we can use Unix time format for the file names
> > with some extensions.
>
> That's easy. Start a counter at 0, and every time you create a new file,
> name the file by that counter, then increase the counter by one.
>
> > Can this be done within a few minutes? Is it
> > possible using only threads, or can it be done in some other way? This
> > has to be done on Windows.
>
> Is it possible? Sure. In a couple of minutes? I doubt it. 1000 files of
> 1GB each means you are writing 1TB of data to a HDD. The fastest HDDs can
> reach about 125 MB per second under ideal circumstances, so that will
> take at least 8 seconds per 1GB file or 8000 seconds in total. If you try
> to write them all in parallel, you'll probably just make the HDD waste
> time seeking backwards and forwards from one place to another.
>
> --
> Steven
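
For reference, Steven's counter suggestion reduces to a few lines. A
minimal sketch (the names and the time.time() prefix are hypothetical,
not from the thread; the prefix keeps names unique across runs, along
the lines of the OP's Unix-time idea):

    import time

    # Hypothetical scheme: per-run timestamp prefix plus a counter.
    run_id = int(time.time())     # differs between runs
    names = ["data_%d_%04d.bin" % (run_id, i) for i in range(1000)]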

That time is reasonable. The randomness should be such that no two
files ever have the same MD5 checksum. The main reason for creating
such a huge data set is to stress-test our product.
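
For completeness, a minimal sketch of the whole job (the testdata/
folder and file names are hypothetical, not from the thread). Using
os.urandom() for the payload makes an MD5 collision between any two
files practically impossible, though it is slow for a terabyte of
output; a per-file seeded PRNG would be faster at the cost of weaker
randomness:

    import os
    import time

    OUT_DIR = "testdata"          # hypothetical output folder
    NUM_FILES = 1000
    FILE_SIZE = 1024 ** 3         # ~1 GiB per file
    CHUNK = 1024 * 1024           # write in 1 MiB chunks

    if not os.path.isdir(OUT_DIR):
        os.makedirs(OUT_DIR)

    run_id = int(time.time())     # timestamp prefix keeps names unique per run

    for i in range(NUM_FILES):
        path = os.path.join(OUT_DIR, "stress_%d_%04d.bin" % (run_id, i))
        with open(path, "wb") as f:
            written = 0
            while written < FILE_SIZE:
                n = min(CHUNK, FILE_SIZE - written)
                f.write(os.urandom(n))    # random payload, distinct MD5s
                written += n

The files are written one at a time; per Steven's point above, parallel
writers would mostly make the disk waste time seeking back and forth.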


