Joining Big Files

Paul McGuire ptmcg at austin.rr.com
Sat Aug 25 21:48:33 EDT 2007


On Aug 25, 8:15 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Aug 25, 4:57 am, mosscliffe <mcl.off... at googlemail.com> wrote:
>
> > I have 4 text files, each approx. 50 MB.
>
> <yawn> 50 MB? Really?  Did you actually try this and find out it was
> a problem?
>
> Try this:
> import time
>
> start = time.time()  # wall-clock; time.clock() measures CPU time on some platforms
> outname = "temp.dat"
> outfile = open(outname, "w")
> for inname in ['file1.dat', 'file2.dat', 'file3.dat', 'file4.dat']:
>     infile = open(inname)
>     outfile.write(infile.read())  # read each input file whole, append it
>     infile.close()
> outfile.close()
> end = time.time()
>
> print end - start, "seconds"
>
> For four 30 MB files, this takes just over 1.3 seconds on my system.
> (You may need to open the files in binary mode, depending on the
> contents, but I was in a hurry.)
>
> -- Paul

My bad: my test file was a binary file, not a text file.
Retesting with a 50 MB text file took 24.6 seconds on my machine.
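
If that text-mode pass is too slow for you, a low-effort variant is to
open everything in binary mode and copy in fixed-size chunks, which
skips newline translation and never holds a whole input file in
memory.  A minimal sketch, reusing the placeholder file names from the
code above and the standard library's shutil.copyfileobj:

import shutil

outname = "temp.dat"
outfile = open(outname, "wb")  # binary mode: bytes pass through untranslated
for inname in ['file1.dat', 'file2.dat', 'file3.dat', 'file4.dat']:
    infile = open(inname, "rb")
    # stream in fixed-size chunks instead of slurping whole files
    shutil.copyfileobj(infile, outfile)
    infile.close()
outfile.close()

Wrapping this in the same start/end timing calls as above makes the
two approaches directly comparable.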

Still in your working range?  If not, you will need to pursue more
exotic approaches.  But 25 seconds on an infrequent basis does not
sound too bad, especially since I doubt they would buy you much: as a
baseline, I timed a raw OS-level "copy" command on the resulting
200 MB file, and it took about 20 seconds.
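
For reference, that baseline can be timed from Python as well.  A
sketch, assuming a Windows shell (where "copy" is a cmd builtin) and
the placeholder temp.dat name from above; temp2.dat is just a scratch
name for the copy:

import subprocess, time

start = time.time()
# "copy" is built into cmd.exe, so it has to run through the shell;
# on Unix the equivalent would be ["cp", "temp.dat", "temp2.dat"]
subprocess.call(["cmd", "/c", "copy", "temp.dat", "temp2.dat"])
end = time.time()
print end - start, "seconds"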

Keep it simple.

-- Paul
