_csv.Error: string with NUL bytes
dustin at v.igoro.us
Thu May 3 13:40:33 EDT 2007
On Thu, May 03, 2007 at 10:28:34AM -0700, IAmStarsky at gmail.com wrote:
> On May 3, 10:12 am, dus... at v.igoro.us wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in the CSV file.
> >
> > > > Ciao,
> > > > Marc 'BlackJack' Rintsch
> >
> > > How would I go about identifying where it is?
> >
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
>
> Hmm, interesting if I run:
>
> print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> every single character gets a >>>NUL<<< between them...
>
> What the heck does that mean?
>
> Example, here is the first field in the csv
>
> 89114608511,
>
> the above code produces:
> >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
I'm guessing that your file is in UTF-16, then -- Windows seems to do
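that a lot. It kind of makes it *not* a CSV file, but oh well. Try:

print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")

(The decode has to come after the read(): the file object itself has no
decode method, only the string that read() returns does.)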
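
One way to test the UTF-16 guess before decoding anything is to look at the first two bytes of the file for a byte order mark. The following is only a sketch, written for Python 3 (which postdates this thread); the function name guess_utf16 is made up for illustration, and it assumes the file was written with a BOM, which Windows tools usually do but which is not guaranteed. A UTF-32 little-endian file starts with the same two bytes, so a match is a strong hint rather than proof.

```python
import codecs

def guess_utf16(path):
    """Return a UTF-16 codec name if the file starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration; it only inspects the first two bytes.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == codecs.BOM_UTF16_LE:   # bytes FF FE
        return "utf-16-le"
    if head == codecs.BOM_UTF16_BE:   # bytes FE FF
        return "utf-16-be"
    return None                       # no BOM found; encoding still unknown
```

If this returns None, the file may still be UTF-16 without a BOM, in which case the alternating NUL bytes seen above remain the best clue.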
I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.
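
For later readers: in Python 3 the decoding can be pushed into open() itself, so the csv reader sees decoded text row by row and the whole file never has to be read into memory. This is a minimal sketch under a few assumptions: the file really is UTF-16 with a BOM, the function name read_utf16_csv is invented here, and the default csv dialect suits the data. It is not how the code in this thread (Python 2) would have had to do it.

```python
import csv

def read_utf16_csv(path):
    """Yield rows from a UTF-16 encoded CSV file, one at a time.

    Hypothetical helper; assumes a BOM is present so the "utf-16"
    codec can work out the byte order.
    """
    # encoding="utf-16" consumes the BOM and decodes on the fly;
    # newline="" leaves line-ending handling to the csv module.
    with open(path, encoding="utf-16", newline="") as f:
        for row in csv.reader(f):
            yield row
```

Usage would be something like "for row in read_utf16_csv('test.csv'): print(row)". If the file has no BOM, passing "utf-16-le" or "utf-16-be" explicitly is the usual fallback.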
Dustin