[Tutor] Iterating over columns in a tab-delimited text and writing SQL statements

Kent Johnson kent_johnson at skillsoft.com
Sat Nov 6 13:41:36 CET 2004


Oh, I forgot - my guess is the reason you get an IndexError is that you 
have a blank line in your file. I would check for the expected number of 
fields in each data line before processing it. I've modified my program 
below to do this.

<...goes to change the program to check for short lines...>

No, actually, I don't have to change my program. The semantics of slicing 
and zip() make it work fine for short lines.
data[2:] just returns an empty list if len(data) <= 2
zip(header, []) also returns an empty list
so the iteration over zip(header, data[2:]) is just empty when a data line 
is too short. For lines that are longer than 2 items but shorter than 
expected it will just use what is there.

Kent

At 12:33 AM 11/6/2004 -0500, Kent Johnson wrote:
>You need two nested loops to do this. zip() is also very handy, it lets 
>you iterate two lists at the same time. I think this does what you want:
>
>f1 = open('GeneName.txt', 'r')
>header = f1.readline().split()[3:]
>
>for line in f1:
>     data = line.split()
>     for sample, expr in zip(header, data[2:]):
>         sql = '''INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('%s', '%s', '%s', %s)''' % (data[0], data[1], sample, 
> expr)
>         print sql
>
>For your sample data, it prints
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('xyz', '1000_at', 'AL21', 272.3)
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('xyz', '1000_at', 'AL11', 345.2)
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('xyz', '1000_at', 'AL12', -32.45)
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('abc', '1001_at', 'AL21', 134.4)
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('abc', '1001_at', 'AL11', 45.3)
>INSERT into expression (gene_name,probe_name,sample,exprs)
>             VALUES ('abc', '1001_at', 'AL12', 342.59)



More information about the Tutor mailing list