Break large file down into multiple files

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Fri Feb 13 05:16:47 EST 2009


En Fri, 13 Feb 2009 05:43:02 -0200, brianrpsgt1 <brianlong at cox.net>  
escribió:

> On Feb 12, 11:02 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>> En Fri, 13 Feb 2009 04:44:54 -0200, brianrpsgt1 <brianl... at cox.net>  
>> escribió:
>>
>> > New to python.... I have a large file that I need to break upinto
>> > multiple smallerfiles. I need to break the large fileintosections
>> > where there are 65535 lines and then write thosesectionsto seperate
>> >files.  I am familiar with opening and writingfiles, however, I am
>> > struggling with creating thesectionsand writing the different
>> >sectionsto their ownfiles.
>>
>> This function copies at most n lines from fin to fout:
>>
>> def copylines(fin, fout, n):
>>    for i, line in enumerate(fin):
>>      fout.write(line)
>>      if i+1>=n: break
>>
>> Now you have to open the source file, create newfilesas needed and  
>> repeatedly call the above function until the end of source file. You'll  
>>  
>> have to enhace it bit, to know whether there are remaining lines or not.
>>
>> --
>> Gabriel Genellina
>
> Gabriel ::
>
> Thanks for the direction.  Do I simply define fin, fout and n as
> variables above the def copylines(fin, fout, n): line?
>
> Would it look like this?
>
> fin = open('C:\Path\file')
> fout = 'C:\newfile.csv')
> n = 65535

Warning: see this FAQ entry  
http://www.python.org/doc/faq/general/#why-can-t-raw-strings-r-strings-end-with-a-backslash
>
> def copylines(fin, fout, n):
>     for i, line in enumerate(fin):
>       fout.write(line)
>       if i+1>=n: break

Almost. You have to *call* the copylines function, not just define it.  
After calling it with: copylines(fin, fout, 65535), you'll have the  
*first* chunk of lines copied. So you'll need to create a second file,  
call the copylines function again, create a third file, call... You'll  
need some kind of loop, and a way to detect when to stop and break out of  
it.

The copylines function already knows what happened (whether there are more  
lines to copy or not) so you should enhace it and return such information  
to the caller.

It isn't so hard, after working out the tutorial (linked from  
http://wiki.python.org/moin/BeginnersGuide ) you'll know enough Python to  
finish this program.

If you have some previous programming experience, Dive into Python (linked  
 from the Beginners Guide above) is a good online book.

Feel free to come back when you're stuck with something.

-- 
Gabriel Genellina




More information about the Python-list mailing list