Break large file down into multiple files

Chris cwitts at gmail.com
Fri Feb 13 06:36:11 EST 2009


On Feb 13, 1:19 pm, Chris <cwi... at gmail.com> wrote:
> On Feb 13, 10:02 am, redbaron <ivanov.ma... at gmail.com> wrote:
>
>
>
> > > New to python.... I have a large file that I need to break up into
> > > multiple smaller files. I need to break the large file into sections
> > > where there are 65535 lines and then write those sections to separate
> > > files.
>
> > If your lines are variable-length, then look at itertools recipes.
>
> > from itertools import izip_longest
>
> > def grouper(n, iterable, fillvalue=None):
> >     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
> >     args = [iter(iterable)] * n
> >     return izip_longest(fillvalue=fillvalue, *args)
>
> > with open("/file","r") as f:
> >     for lines in grouper(65535,f,""):
> >         data_to_write = ''.join(lines)   # lines already end in "\n"
> >         ...
> >         <write data where you need it here>
> >         ...
>
> I really would not recommend joining a large number of lines; that will
> take some time.
>
> fIn = open(input_filename, 'rb')
> chunk_size = 65535
>
> for i,line in enumerate(fIn):
>     if not i:
>         # First line in the file: create a file to start writing to
>         filenum = '%04d' % (i % chunk_size + 1)
>         fOut = open('%s.txt' % filenum, 'wb')
>     if i and not i % chunk_size:
>         # Once at chunk_size, close the old file object and create a new one
>         fOut.close()
>         filenum = '%04d' % (i % chunk_size + 1)
>         fOut = open('%s.txt' % filenum, 'wb')
>     if not i % 1000:
>         fOut.flush()
>     fOut.write(line)
>
> fOut.close()
> fIn.close()

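Whoops, day-dreaming mistake.  Use "filenum = '%04d' % (i // chunk_size + 1)"
and not i % chunk_size.  The +1 also has to sit inside the parentheses;
outside them it is added to the formatted string and raises a TypeError.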
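
With that correction applied, the whole loop might look roughly like the
sketch below. It is written for Python 3, and the function name split_file,
the out_dir parameter and the returned list of paths are assumptions added
for illustration; none of them appear in the code above.

```python
import os


def split_file(input_filename, out_dir, chunk_size=65535):
    """Copy input_filename into out_dir as 0001.txt, 0002.txt, ...

    Each output file receives at most chunk_size lines. The name and
    signature of this helper are hypothetical, not from the original post.
    """
    out_paths = []
    f_out = None
    try:
        with open(input_filename, 'rb') as f_in:
            for i, line in enumerate(f_in):
                if i % chunk_size == 0:
                    # Start of a new chunk: close the previous output
                    # file (if any) and open the next numbered one.
                    if f_out is not None:
                        f_out.close()
                    filenum = '%04d' % (i // chunk_size + 1)
                    path = os.path.join(out_dir, '%s.txt' % filenum)
                    f_out = open(path, 'wb')
                    out_paths.append(path)
                f_out.write(line)
    finally:
        # Close the last output file even if reading or writing failed.
        if f_out is not None:
            f_out.close()
    return out_paths
```

Only one line is held in memory at a time, so nothing is joined.  The
flush every 1000 lines is left out here on the assumption that closing
each file is enough to get the data written.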


