[Tutor] Writing SQL statments on multicolumn tab delimitted text file

Wed Nov 24 11:53:21 CET 2004

kumar,

If you are writing to a database, the order of the writes shouldn't 
matter. Most databases do not guarantee that insert order is preserved. 
Instead you should sort when you read from the database by including an 
'order by' clause in your query SQL.

If you still want to generate the SQL in column order, you have to read 
all the data into columns before you start generating SQL. This is 
similar to the last question you had. You can create lists of the row 
data, then use zip() to swap rows and columns.

Kent

kumar s wrote:
> Dear group, 
>  I have a large tab-delimitted file with 82 columns
> (experiment name) and 13000 rows ( gene names).
> 
>      e1     e2       e3       e4       ..... e80
> g1  23.5    34      34.5     45.3            23.5
> g2  63.5    34.6    32.4     23.4            78.90
> g3  12.2    23.4    56.3     14.6            98.20
> g4  9.9     20.1    98.3     21.1            19.4
> g5  23.5    23.3    34.3     19.5            18.5
> ..
> g13K 
> 
> I wrote a script, that writes these numbers into SQL
> statement.
> 
> for example:
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e1','g1',23.5);
> 
> However, this program prints across samples
> (horizontally):
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e1','g1',23.5);
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e2','g1',34);
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e3','g1',34.5);
> 
> 
> INSTEAD Of this:
> I WANT To print the statements vertically (down the
> list of an experiment):
> 
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e1','g1',23.5);
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e1','g2',63.5);
> INSERT into experiment(exp_name,gene_name,exp_value)
> VALUES ('e1','g3',12.2);
> 
> 
> How can I do this:
> 
> Here is my code :
> 
> 
> import string
> from string import split
> 
> file = open('sql_input.txt','r')
> read_lines = file.read()
> matrix = split(read_lines,'\n')
> first_row = matrix[0]
> samps = split(first_row,'\t')
> samples = samps[2:]
> filew = open('beer.sql','w')
> for i in range(len(matrix)-1):
>         rows = split(matrix[i+1],'\t')
>         for k in range(len(samples)):
>                 filew.write("INSERT INTO
> affy_proc_data(affy_proc_gene_name,affy_proc_probeset_id,affy_proc_sample,affy_proc_exprs)VALUES
> (" "'"+rows[0]+"'" ","
> "'"+rows[1]+"'"",""'"+samples[k]+"'"
> ","+rows[2+k]+");"'\n')
> 
> filew.close()
> 
> 
> Can any one help me suggesting, how to make this loop
> running over a column and then jump to next , instead
> of running row by row.
> 
> Thanks
> 
> kumar.
> 
> 
> 
> 		
> __________________________________ 
> Do you Yahoo!? 
> Take Yahoo! Mail with you! Get it on your mobile phone. 
> http://mobile.yahoo.com/maildemo 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>