_csv.Error: string with NUL bytes
Peter Otten
__peter__ at web.de
Thu May 3 15:00:15 EDT 2007
dustin at v.igoro.us wrote:
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot. It kind of makes it *not* a CSV file, but oh well. Try
>
> print open("test.csv").read().decode('utf-16').replace("\0",
> ">>>NUL<<<")
>
> I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
> way to get the CSV reader to handle such encoding without reading in the
> whole file, decoding it, and setting up a StringIO file.
Not pretty, but seems to work:
from __future__ import with_statement
import csv
import codecs

def recoding_reader(stream, from_encoding, args=(), kw={}):
    intermediate_encoding = "utf8"
    efrom = codecs.lookup(from_encoding)
    einter = codecs.lookup(intermediate_encoding)
    rstream = codecs.StreamRecoder(stream, einter.encode, efrom.decode,
                                   efrom.streamreader, einter.streamwriter)

    for row in csv.reader(rstream, *args, **kw):
        yield [unicode(column, intermediate_encoding) for column in row]

def main():
    file_encoding = "utf16"

    # generate sample data:
    data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"

    with open("tmp.txt", "wb") as f:
        f.write(data.encode(file_encoding))

    # read it
    with open("tmp.txt", "rb") as f:
        for row in recoding_reader(f, file_encoding):
            print u" | ".join(row)

if __name__ == "__main__":
    main()
Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
output is decoded to unicode.
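
To see the recoding step on its own, here is a minimal sketch that leaves out csv.reader() and reads straight from the StreamRecoder. It uses an in-memory io.BytesIO buffer in place of the file, and the names raw, recoder and recoded are only illustrative. It should run unchanged on Python 2.6+ and Python 3.3+.

```python
import codecs
import io

# Sample text encoded as UTF-16; the raw bytes contain NUL bytes,
# which is what makes csv.reader() fail on the unconverted file.
raw = u"\xe4hnlich,\xfcblich\r\n".encode("utf-16")

efrom = codecs.lookup("utf-16")
einter = codecs.lookup("utf-8")

# Reading from the recoder decodes the underlying stream with the
# UTF-16 stream reader and hands back the text re-encoded as UTF-8.
recoder = codecs.StreamRecoder(io.BytesIO(raw), einter.encode, efrom.decode,
                               efrom.streamreader, einter.streamwriter)
recoded = recoder.read()

print(repr(raw))      # UTF-16 bytes, with NULs
print(repr(recoded))  # UTF-8 bytes, no NULs
```

The recoded bytes contain no NULs, so csv.reader() can consume them; the last line of recoding_reader() then turns each UTF-8 column back into unicode.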
Peter