Split single file into multiple files based on patterns

satyam mukherjee dirac.sat at gmail.com
Tue Oct 23 23:44:38 EDT 2012


Thanks, I will take a look... My actual data is 2.5 GB in size.
Satyam
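Since the real input is about 2.5 GB, here is a minimal sketch that streams the file line by line and keeps only one output file open at a time. It assumes the same whitespace-separated layout as the sample data quoted below, and works best when all rows for a given ID are contiguous, as they are in that sample. `split_grouped_file` and its `out_dir` parameter are hypothetical names, not part of any library.

```python
import itertools
import os


def split_grouped_file(source_path, out_dir="."):
    """Write the columns after the first to <first column>.txt.

    Hypothetical helper: streams the input line by line, so only one
    output file is open at a time. Works best when rows sharing an ID
    are contiguous (an assumption based on the sample data).
    """
    with open(source_path) as reader:
        # Skip blank lines so every remaining line has a first field.
        rows = (line for line in reader if line.strip())
        # groupby starts a new group each time the first field changes.
        for key, group in itertools.groupby(rows, key=lambda ln: ln.split(None, 1)[0]):
            path = os.path.join(out_dir, key + ".txt")
            # Append mode: if an ID reappears later, its rows are added
            # to the existing file instead of overwriting it.
            with open(path, "a") as writer:
                for line in group:
                    parts = line.split(None, 1)
                    if len(parts) == 2:  # ignore lines that hold only an ID
                        writer.write(parts[1].rstrip() + "\n")
```

Usage: `split_grouped_file("source")` writes A1980JE39300007.txt, A1980KK18700010.txt, and so on, into the current directory. If the IDs are not grouped, groupby simply produces more groups; because of append mode the rows still land in the right file, just with more open/close calls. Append mode also means you should delete old output files before rerunning.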

On Tue, Oct 23, 2012 at 10:43 PM, Jason Friedman <jason at powerpull.net> wrote:

> On Tue, Oct 23, 2012 at 9:01 PM, satyam <dirac.sat at gmail.com> wrote:
> > I have a text file like this:
> >
> > A1980JE39300007 2732 4195 12.527000
> > A1980JE39300007 3465 9720 22.000000
> > A1980JE39300007 1853 3278 12.500000
> > A1980JE39300007 2732 2732 187.500000
> > A1980JE39300007 19 4688 3.619000
> > A1980KK18700010 30 186 1.285000
> > A1980KK18700010 30 185 4.395000
> > A1980KK18700010 185 186 9.000000
> > A1980KK18700010 25 30 3.493000
> >
> > I want to split the file and get multiple files like A1980JE39300007.txt
> > and A1980KK18700010.txt, where each file will contain columns 2, 3 and 4.
>
> Unless your source file is very large, this should be sufficient:
>
> $ cat source
> A1980JE39300007 2732 4195 12.527000
> A1980JE39300007 3465 9720 22.000000
> A1980JE39300007 1853 3278 12.500000
> A1980JE39300007 2732 2732 187.500000
> A1980JE39300007 19 4688 3.619000
> A1980JE39300007 2995 9720 6.667000
> A1980JE39300007 1603 9720 30.000000
> A1980JE39300007 234 4195 42.416000
> A1980JE39300007 2732 9720 18.000000
> A1980KK18700010 130 303 4.985000
> A1980KK18700010 7 4915 0.435000
> A1980KK18700010 25 1620 1.722000
> A1980KK18700010 25 186 0.654000
> A1980KK18700010 50 130 3.199000
> A1980KK18700010 186 3366 4.780000
> A1980KK18700010 30 186 1.285000
> A1980KK18700010 30 185 4.395000
> A1980KK18700010 185 186 9.000000
> A1980KK18700010 25 30 3.493000
>
> $ python3
> Python 3.2.3 (default, Sep 10 2012, 18:14:40)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> for line in open("source"):
> ...     file_name, remainder = line.strip().split(None, 1)
> ...     with open(file_name + ".txt", "a") as writer:
> ...         print(remainder, file=writer)
> ...
> >>>
>
> $ ls *txt
> A1980JE39300007.txt  A1980KK18700010.txt
>
> $ cat A1980JE39300007.txt
> 2732 4195 12.527000
> 3465 9720 22.000000
> 1853 3278 12.500000
> 2732 2732 187.500000
> 19 4688 3.619000
> 2995 9720 6.667000
> 1603 9720 30.000000
> 234 4195 42.416000
> 2732 9720 18.000000
>
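The loop above opens and closes an output file once for every input line, which adds noticeable overhead on a multi-gigabyte file. A minimal single-pass alternative, assuming the same whitespace-separated layout, keeps one open handle per ID in a dict and closes them all at the end. `split_file_by_key` is a hypothetical name used only for illustration.

```python
import os


def split_file_by_key(source_path, out_dir="."):
    """Split a file into one output file per distinct first-column value.

    Hypothetical helper: each output file is opened once and kept open in
    a dict, instead of being reopened for every input line. The input is
    streamed, so memory grows with the number of distinct IDs, not file size.
    """
    writers = {}  # ID -> open file object
    try:
        with open(source_path) as reader:
            for line in reader:
                parts = line.split(None, 1)
                if len(parts) < 2:
                    continue  # skip blank or ID-only lines
                key, remainder = parts
                writer = writers.get(key)
                if writer is None:
                    # "w" truncates output left over from a previous run.
                    writer = open(os.path.join(out_dir, key + ".txt"), "w")
                    writers[key] = writer
                writer.write(remainder.rstrip() + "\n")
    finally:
        # Close every handle, even if reading fails part-way through.
        for writer in writers.values():
            writer.close()
```

Design note: the number of simultaneously open files equals the number of distinct IDs, so with many thousands of IDs this can exceed the operating system's per-process open-file limit. In that case, sorting the input by ID first and handling one ID at a time keeps only one file open. Opening with "w" rather than "a" also means a rerun replaces earlier output instead of appending duplicates.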





More information about the Python-list mailing list