[Tutor] Malformed CSV

Kent Johnson kent37 at tds.net
Fri Dec 2 15:29:42 CET 2005


Jan Eden wrote:
> Hi,
> 
> I need to parse a CSV file using the csv module:
> 
> "hotel","9,463","95","1.00"
> "hotels","7,033","73","1.04"
> "hotels hamburg","2,312","73","3.16"
> "hotel hamburg","2,708","42","1.55"
> "Hotels","2,854","41","1.44"
> "hotel berlin","2,614","31","1.19"
> 
> Unfortunately, the quote characters are not properly escaped within fields:
> 
> ""hotel,hamburg"","1","0","0"
> ""hotel,billig, in berlin tegel"","1","0","0"
> ""hotel+wien"","1","0","0"
> ""hotel+nürnberg"","1","0","0"
> ""hotel+london"","1","0","0"
> ""hotel" "budapest" "billig"","1","0","0"
> 
> Is there a way to deal with the incorrect quoting automatically?

I'm not entirely sure how you want to interpret the data above. One possibility is to just change the double "" to single " before processing with csv. For example:

# data is the raw data from the whole file
data = '''""hotel,hamburg"","1","0","0"
""hotel,billig, in berlin tegel"","1","0","0"
""hotel+wien"","1","0","0"
""hotel+nurnberg"","1","0","0"
""hotel+london"","1","0","0"
""hotel" "budapest" "billig"","1","0","0"'''

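# Collapse each doubled quote into a single quote, then split into lines.
data = data.replace('""', '"')
data = data.splitlines()

import csv

# csv.reader accepts any iterable of strings, so a list of lines works.
for line in csv.reader(data):
    print(line)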

Output is:
['hotel,hamburg', '1', '0', '0']
['hotel,billig, in berlin tegel', '1', '0', '0']
['hotel+wien', '1', '0', '0']
['hotel+nurnberg', '1', '0', '0']
['hotel+london', '1', '0', '0']
['hotel "budapest" "billig"', '1', '0', '0']

which looks pretty reasonable except for the last line, and I don't really know what you would consider correct there.
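
If the inner quotes in that last row are meant literally, one possible alternative is to skip csv for the split and peel the three well-formed columns off the right-hand side instead. This is only a sketch under assumptions: split_row and n_trailing are names made up here for illustration, only the first column is assumed to be malformed, and the first field is assumed never to contain the sequence "," itself.

```python
def split_row(line, n_trailing=3):
    """Split one row whose first field may contain stray quotes.

    Hypothetical helper: assumes the last n_trailing columns are always
    well-formed quoted values and only the first column is malformed.
    """
    # Split from the right so quotes and commas inside the first field
    # are never treated as separators.
    parts = line.rstrip("\r\n").rsplit('","', n_trailing)
    if len(parts) != n_trailing + 1:
        raise ValueError("unexpected row layout: %r" % line)
    first, last = parts[0], parts[-1]
    if not (first.startswith('"') and last.endswith('"')):
        raise ValueError("row is not wrapped in quotes: %r" % line)
    # Drop only the opening quote of the first field and the closing
    # quote of the last one; everything else is kept verbatim.
    parts[0] = first[1:]
    parts[-1] = last[:-1]
    return parts


sample = [
    '"hotel","9,463","95","1.00"',
    '""hotel,hamburg"","1","0","0"',
    '""hotel" "budapest" "billig"","1","0","0"',
]

for row in sample:
    print(split_row(row))
```

Unlike the replace() approach, this keeps the inner quotes as part of the first field, so the last row comes out with the first field "hotel" "budapest" "billig" (quotes included), and ""hotel,hamburg"" comes out as "hotel,hamburg" with its quotes. Whether that is what you want depends on how the data is supposed to be read. Note also that replace('""', '"') would turn a legitimately empty quoted field ("") into a lone quote, so it is only safe if no such fields occur.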

Kent
-- 
http://www.kentsjohnson.com


