[Tutor] Iterating over columns in a tab-delimited text and writing SQL statements

Fri Nov 5 17:34:15 CET 2004

Dear group, 
 I have a file with ~200 Columns and 20,000 rows. 

Every column designates a patient sample.  Each such
patient sample data should be made as an experiment in
my database. So, 
I have to write SQL insert statements on these. 

My data resembles likes this in its simplest form:

Gene Name  probe_name   AL21     AL11	AL12  
xyz         1000_at     272.3   345.2   -32.45
abc         1001_at     134.4    45.3   342.59
...............................................

up to 20K rows.

Now I want to create an sql statement like the
following:

INSERT into expression (gene_name,probe_name,sample,
exprs) VALUES ('xyz', '1000_at','AL21',272.3)

import string
>>> from string import split
>>> f1 = open('insert_test.txt','r')
>>> lines = f1.read()
>>> rows = split(lines,'\n')
>>> rows
['Gene Name\tprobe_name\tAL10\tAL21\tAL23',
'xyz\t1000_at\t272.3\t345.2\t-32.45',
'abc\t1001_at\t134.4\t45.3\t342.59', '']
>>> f2 = open('sql.txt','w')
>>> for i in range(len(lst1)):
	cols = split(lst1[i],'\t')
	f2.write('(INSERT INTO
expr(gene_name,probe_name,sample,expr)
VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
')''\n')

Traceback (most recent call last):
  File "<pyshell#63>", line 3, in -toplevel-
    f2.write('(INSERT INTO
expr(gene_name,probe_name,sample,expr)
VALUES('+cols[0]+','+cols[1]+','+sample+','+cols[2]+
')''\n')
IndexError: list index out of range
>>> f2.close()

Result from sql.txt:

(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(Gene Name,probe_name,AL21,AL10)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL21,272.3)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL21,134.4)

My problem:

I am unable to write a function that will take my
matrix and write statements for every sample such as
AL21, AL11,AL12. 

I know I am not good at looping over columns here.
Take column 3 first and column 4 and then column 5
like that... However 
column 1 and 2 remains same for every sample, because
they are common for evert sample column.
Something like:

(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(Gene Name,probe_name,AL21,AL10)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL21,272.3)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL21,134.4)

(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL11,345.2)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL11,45.3)

(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(xyz,1000_at,AL12,-32.5)
(INSERT INTO expr(gene_name,probe_name,sample,expr)
VALUES(abc,1001_at,AL12,345.59)

2. Also why am I getting INdex out of range ,
IndexError: ? I have no explanation to it. 

Can any one please help me. 

Thank you. 

Kumar.

__________________________________ 
Do you Yahoo!? 
Check out the new Yahoo! Front Page. 
www.yahoo.com