[issue7185] csv reader utf-8 BOM error

Istvan Szirtes report at bugs.python.org
Sat Oct 24 10:13:55 CEST 2009


Istvan Szirtes <istvan.szirtes at gmail.com> added the comment:

Hi Everyone,

I have tried "utf-8-sig" and it does not fully work in this case, or 
rather I think it is not the csv module that is wrong. seek() does not 
work correctly on the csv file or object.

With "utf-8-sig" the file is opened correctly and the first row does not 
include the BOM. That is great. 
I am sorry I did not know about this until now. (I am not a Python expert 
yet :))
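
For illustration, a minimal sketch of the difference between the two 
codecs, assuming Python 3. The sample bytes are made up and are not taken 
from the csv file in this report:

```python
import codecs

# Made-up sample: a UTF-8 BOM followed by two csv lines.
raw = codecs.BOM_UTF8 + "name,value\nAFTER,1\n".encode("utf-8")

with_bom = raw.decode("utf-8")         # the BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")  # the leading BOM is stripped

print(repr(with_bom[:5]))
print(repr(without_bom[:5]))
```

With plain "utf-8" the first field of the first row would start with 
U+FEFF, which is the original symptom.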

However, I have got some errors like this 'AFTE\ufeffVALUE".WAV' 
while running my script.

"AFTER" is a valid string in the given csv file, but the BOM follows it.
This happens after I seek back to "0" several times in the csv file.
And the string "aftevalue" LEAVE_HIGHWAY-E" is produced, which is wrong.
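
One way to spot such fields is to scan the parsed rows for a stray 
U+FEFF. This is only a sketch, assuming Python 3; the helper name 
find_bom_fields and the sample rows are hypothetical:

```python
BOM = "\ufeff"

def find_bom_fields(rows):
    """Return (row_index, column_index, value) for each field containing a BOM."""
    hits = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if BOM in value:
                hits.append((r, c, value))
    return hits

# Hypothetical rows; the second one mimics the broken string from the report.
sample_rows = [["AFTER", "1"], ["AFTE\ufeffVALUE", "2"]]
hits = find_bom_fields(sample_rows)
print(hits)
```

This only reports where a BOM turned up in the data; it does not explain 
why seek() put it there.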

My solution is to convert the csv object into a list after opening the 
file:

        import codecs
        import csv

        InDistancesFile = codecs.open( Root, 'r', encoding='utf-8' )
        # drop the BOM (the first character)
        txt = InDistancesFile.read()[1:]
        # drop the first row, which is a header
        lines = txt.splitlines()[1:]
        # convert the csv reader object into a simple list
        InDistancesObj = list(csv.reader( lines ))
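
The same idea can probably be combined with "utf-8-sig", so that the BOM 
does not have to be sliced off by hand. The following is a sketch under 
assumptions, not a tested fix for this issue: it assumes Python 3, and 
the helper name load_rows and the sample file contents are made up.

```python
import csv
import os
import tempfile

def load_rows(path):
    """Read a csv file once and return its data rows (header skipped) as a list."""
    # "utf-8-sig" drops a leading BOM if one is present; newline="" is what
    # the csv module documentation recommends for files given to csv.reader.
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return list(reader)

# Made-up sample file, written with a BOM, to exercise the helper.
with tempfile.NamedTemporaryFile("w", encoding="utf-8-sig", suffix=".csv",
                                 delete=False, newline="") as tmp:
    tmp.write("name,value\r\nAFTER,1\r\nBEFORE,2\r\n")
    sample_path = tmp.name

rows = load_rows(sample_path)
os.remove(sample_path)

# The list can be traversed as often as needed without seeking in the file.
first_pass = [row[0] for row in rows]
second_pass = [row[0] for row in rows]
```

Because the rows live in a plain list, going back to the start needs no 
seek(0) on the file object at all.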

Many thanks for your help,
Istvan

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7185>
_______________________________________

