_csv.Error: string with NUL bytes

dustin at v.igoro.us
Thu May 3 13:40:33 EDT 2007


On Thu, May 03, 2007 at 10:28:34AM -0700, IAmStarsky at gmail.com wrote:
> On May 3, 10:12 am, dus... at v.igoro.us wrote:
> > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > As Larry said, this most likely means there are null bytes in the CSV file.
> >
> > > > Ciao,
> > > >         Marc 'BlackJack' Rintsch
> >
> > > How would I go about identifying where it is?
> >
> > A hex editor might be easiest.
> >
> > You could also use Python:
> >
> >   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
> >
> > Dustin
> 
> Hmm, interesting if I run:
> 
> print open("test.csv").read().replace("\0", ">>>NUL<<<")
> 
> every single character gets a >>>NUL<<< between them...
> 
> What the heck does that mean?
> 
> Example, here is the first field in the csv
> 
> 89114608511,
> 
> the above code produces:
> >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,

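I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot.  In UTF-16 every ASCII character takes two bytes, one of
which is zero, which is why a NUL shows up next to each character.  It
kind of makes it *not* a CSV file, but oh well.  Try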

  print(open("test.csv", "rb").read().decode("utf-16").replace("\0", ">>>NUL<<<"))
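
One rough way to check the UTF-16 guess before decoding is to look for a
byte-order mark at the start of the file.  This is only a sketch, written
for Python 3 (the one-liners in this thread are Python 2 era), and
guess_utf16 is a made-up helper name, not something from the csv module.
A file without a BOM can still be UTF-16, so None here does not rule it out.

```python
import codecs


def guess_utf16(head):
    """Return a codec name if `head` starts with a UTF-16 BOM, else None.

    `head` is the first few bytes of the file, read in binary mode.
    Hypothetical helper for illustration only.
    Limitation: a UTF-32-LE BOM also begins with the UTF-16-LE BOM bytes,
    so this check cannot tell those two apart.
    """
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None
```

It would be used on something like open("test.csv", "rb").read(2).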

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.
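
For reference, here is a sketch of that whole-file approach, plus a
streaming variant.  Both assume Python 3, where the csv module works on
text and open() accepts an encoding argument; the function names are
made up for illustration, and "utf-16" is still only my guess at the
file's encoding.

```python
import csv
import io


def rows_from_utf16_bytes(raw):
    """Decode the whole input, wrap it in a StringIO, and parse it as CSV."""
    # The "utf-16" codec uses a leading BOM to pick the byte order
    # and drops the BOM from the decoded text.
    text = raw.decode("utf-16")
    return list(csv.reader(io.StringIO(text, newline="")))


def rows_from_utf16_file(path):
    """Yield CSV rows while letting open() decode the file incrementally."""
    # newline="" is what the csv module documentation recommends for
    # files handed to csv.reader.
    with open(path, encoding="utf-16", newline="") as f:
        for row in csv.reader(f):
            yield row
```

The second function avoids holding the decoded file in memory, since the
reader pulls lines from the file object as it goes.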

Dustin



More information about the Python-list mailing list