[Tutor] Script to collect values from .csv

Prasad, Ramit ramit.prasad at jpmorgan.com
Thu Jul 12 19:07:03 CEST 2012


> I have a very large .csv (correlationfile, which is 16 million lines long)
> which I want to split into smaller .csvs. The smaller csvs should be created
> by searching for a value and printing any line which contains that value - all
> these values are contained in another .csv (vertexfile). I think that I have
> an indentation problem or have made a mistake with my loops because I only get
> data in one of the output .csvs (outputfile) which is for the first one of the
> values. The other .csvs are empty.
> 
> Can somebody help me please?
> 
> Thanks so much!
> 
> Emma
> 
> import os
> path = os.getcwd()
> x = ''
> for v in vertexfile:
>     vs = v.replace('\n','')
>     outputfile = open(os.path.join(path,vs+'.csv'),'w')
>     for c in correlationfile:
>         cs = c.replace('\n','').split(',')
>         if vs == cs[0]: print vs
>     outputfile.write(x)
> outputfile.close()

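Indent the outputfile.close() so that it is inside the outer for loop;
as written, only the last output file is explicitly closed.

That alone may not fix the empty files, though. If correlationfile is
an open file object, the inner loop reads it to the end on the first
pass, so for every later value of vs there is nothing left to read.
Rewind it with correlationfile.seek(0) (or reopen it) before each inner
loop. Also, x is never changed from '', so outputfile.write(x) writes
nothing; write the matching line c to outputfile instead of printing vs.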
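
A minimal sketch of the corrected loop, assuming vertexfile and
correlationfile are file objects already opened for reading. The
function name split_by_vertex and the out_dir parameter are made up
for illustration. It rewinds correlationfile before each inner pass
(a file object yields nothing more once it has been iterated to the
end), writes each matching line, and closes every output file before
moving on to the next value:

```python
import os

def split_by_vertex(vertexfile, correlationfile, out_dir):
    """Write one <value>.csv per line of vertexfile (hypothetical helper)."""
    for v in vertexfile:
        vs = v.strip()
        if not vs:
            continue  # skip blank lines in the vertex file
        # The with-block closes each output file inside the outer loop.
        with open(os.path.join(out_dir, vs + '.csv'), 'w') as outputfile:
            correlationfile.seek(0)  # rewind; the previous pass consumed the file
            for c in correlationfile:
                cs = c.rstrip('\n').split(',')
                if cs[0] == vs:
                    outputfile.write(c)  # keep the whole matching line
```

This rereads the whole correlation file once per value, so with 16
million lines it is likely to be slow when the vertex file is long.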

I would recommend working with the csv module instead. Then there is
no need to worry about stripping newlines or about commas contained
inside your data. Note: when using the csv module on Python 2, open
the files as 'rb' and 'wb'; on Python 3, open them in text mode with
newline='' instead.
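
A rough sketch of the csv-module approach, assuming Python 3 (files
opened in text mode with newline=''), a vertex file with one value per
row, and the value to match in the first column of the correlation
file. The function name and path parameters are placeholders. It reads
the large file only once and routes each row to the output file for
its value. That keeps one file handle open per vertex value, so it
presumes the number of values stays below the operating system's
open-file limit:

```python
import csv
import os

def split_with_csv(vertex_path, correlation_path, out_dir):
    # Collect the values to search for (first column of each non-empty row).
    with open(vertex_path, newline='') as f:
        vertices = {row[0] for row in csv.reader(f) if row}

    handles = {}  # value -> open output file
    writers = {}  # value -> csv writer wrapping that file
    try:
        for vs in vertices:
            handles[vs] = open(os.path.join(out_dir, vs + '.csv'), 'w', newline='')
            writers[vs] = csv.writer(handles[vs])
        # Single pass over the large file.
        with open(correlation_path, newline='') as f:
            for row in csv.reader(f):
                if row and row[0] in writers:
                    writers[row[0]].writerow(row)
    finally:
        # Close every output file, even if an error occurred part-way.
        for h in handles.values():
            h.close()
```

The csv reader and writer take care of quoting, so a field such as
"x,y" stays a single field in the output.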


Ramit


