[Tutor] Malformed CSV

Jan Eden lists at janeden.org
Fri Dec 2 14:50:17 CET 2005


Hi,

I need to parse a CSV file using the csv module:

"hotel","9,463","95","1.00"
"hotels","7,033","73","1.04"
"hotels hamburg","2,312","73","3.16"
"hotel hamburg","2,708","42","1.55"
"Hotels","2,854","41","1.44"
"hotel berlin","2,614","31","1.19"

The idea is to use each keyword (field 1) as a dictionary key and sum up its clicks (field 2) and transactions (field 3):

try:
    keywords[keyword]['clicks'] += clicks
    keywords[keyword]['transactions'] += transactions
except KeyError:
    # the keyword has not been seen yet: create its entry
    keywords[keyword] = {'clicks': clicks, 'transactions': transactions}
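For reference, here is a minimal, self-contained sketch of the same aggregation in modern Python 3, run on the well-formed rows above. It swaps the try/except KeyError for collections.defaultdict, which is an equivalent alternative rather than a requirement. It assumes the comma in values like "9,463" is a thousands separator, ignores field 4, and treats keywords case-sensitively, so "hotels" and "Hotels" stay separate. The names to_int and sum_by_keyword are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the well-formed part of the export.
SAMPLE = '''"hotel","9,463","95","1.00"
"hotels","7,033","73","1.04"
"hotels hamburg","2,312","73","3.16"
"hotel hamburg","2,708","42","1.55"
"Hotels","2,854","41","1.44"
"hotel berlin","2,614","31","1.19"
'''


def to_int(field):
    """Convert a count such as '9,463' to an int (assumes ',' is a thousands separator)."""
    return int(field.replace(",", ""))


def sum_by_keyword(rows):
    """Sum clicks (field 2) and transactions (field 3) per keyword (field 1)."""
    # Every unseen keyword starts with zero clicks and zero transactions.
    keywords = defaultdict(lambda: {"clicks": 0, "transactions": 0})
    for row in rows:
        keyword, clicks, transactions = row[0], row[1], row[2]
        keywords[keyword]["clicks"] += to_int(clicks)
        keywords[keyword]["transactions"] += to_int(transactions)
    return dict(keywords)


totals = sum_by_keyword(csv.reader(io.StringIO(SAMPLE)))
print(totals["hotel"])  # {'clicks': 9463, 'transactions': 95}
```

Converting inside the loop keeps the dictionary numeric from the start, so repeated keywords are added arithmetically rather than concatenated.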

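Unfortunately, quote characters inside some fields are not properly escaped. In valid CSV, a quote inside a quoted field is written twice and the whole field is still enclosed in quotes (e.g. """hotel,hamburg"""), but the export contains lines like these: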

""hotel,hamburg"","1","0","0"
""hotel,billig, in berlin tegel"","1","0","0"
""hotel+wien"","1","0","0"
""hotel+nürnberg"","1","0","0"
""hotel+london"","1","0","0"
""hotel" "budapest" "billig"","1","0","0"

which leads to the following output (example):

hotel    9,463hamburg""billig    951 in berlin tegel""

As you can see, Python appended 'hamburg""' and 'billig' to the click value of the first 'hotel' row (9,463), and '1' as well as ' in berlin tegel' to its transactions (95); since the fields are still strings, += concatenates them. I am aware that I need to convert real clicks/transactions to integers before adding them, but I wanted to sort out the parsing problem first.
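The '951' in the transactions column is exactly what string concatenation produces: the csv module returns every field as a string, so += joins text instead of adding numbers. A minimal sketch of the effect and of the conversion; to_int is a hypothetical helper that assumes ',' is a thousands separator.

```python
# csv returns fields as strings, so += concatenates instead of adding.
transactions = "95"
transactions += "1"
print(transactions)  # 951


def to_int(field):
    """Convert a count like '9,463' to an int (assumes ',' is a thousands separator)."""
    return int(field.replace(",", ""))


print(to_int("95") + to_int("1"))     # 96
print(to_int("9,463") + to_int("1"))  # 9464
```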

Is there a way to deal with the incorrect quoting automatically?
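One possible workaround, sketched under assumptions about this particular export rather than as a general fix, is to pre-parse each line before handing it to the csv module. The sketch assumes that only the first field is affected, that a malformed line starts with two quote characters followed by something other than a quote or a comma, that the first field ends at the first occurrence of "", (two quotes and a comma), and that no field spans several lines. It keeps the inner quotes as part of the keyword. parse_line is a hypothetical name.

```python
import csv


def parse_line(line):
    """Parse one line of the export, tolerating a malformed first field.

    Heuristic (an assumption about this export): a malformed line starts
    with two quotes followed by a non-quote, non-comma character, and its
    first field ends at the first occurrence of '"",'.
    """
    line = line.rstrip("\r\n")
    malformed = line.startswith('""') and len(line) > 2 and line[2] not in '",'
    if malformed:
        end = line.find('"",')
        if end != -1:
            # Keep the inner quotes as part of the keyword, e.g. '"hotel,hamburg"'.
            keyword = line[1:end + 1]
            rest = line[end + 3:]
            return [keyword] + next(csv.reader([rest]))
    # Well-formed lines (and anything the heuristic cannot handle) go to csv.
    return next(csv.reader([line]))


for line in ['""hotel,hamburg"","1","0","0"',
             '""hotel" "budapest" "billig"","1","0","0"',
             '"hotel","9,463","95","1.00"']:
    print(parse_line(line))
```

Reading the file line by line and calling parse_line on each line gives lists of strings in the same shape as csv.reader rows, so they can be fed to the summing code. A keyword that itself contains "", would still be split in the wrong place.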

Thanks,

Jan
-- 
I was gratified to be able to answer promptly, and I did. I said I didn't know. - Mark Twain

