_csv.Error: string with NUL bytes

John Machin sjmachin at lexicon.net
Thu May 3 18:03:43 EDT 2007


On May 4, 3:40 am, dus... at v.igoro.us wrote:
> On Thu, May 03, 2007 at 10:28:34AM -0700, IAmStar... at gmail.com wrote:
> > On May 3, 10:12 am, dus... at v.igoro.us wrote:
> > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
> > > > > As Larry said, this most likely means there are null bytes in the CSV file.
>
> > > > > Ciao,
> > > > >         Marc 'BlackJack' Rintsch
>
> > > > How would I go about identifying where it is?
>
> > > A hex editor might be easiest.
>
> > > You could also use Python:
>
> > >   print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
>
> > > Dustin
>
> > Hmm, interesting if I run:
>
> > print open("test.csv").read().replace("\0", ">>>NUL<<<")
>
> > every single character gets a >>>NUL<<< between them...
>
> > What the heck does that mean?
>
> > Example, here is the first field in the csv
>
> > 89114608511,
>
> > the above code produces:
> > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
>
> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is
highly likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode big endian" option in
Notepad's "Save As" dialogue). Further, the data is likely to have a
BOM prepended.
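
To make the byte layouts concrete, here is a small sketch (Python 3) that encodes the first field from the example three ways and applies the same replace() trick to the raw bytes. The helper name show_nuls is made up for illustration; the codec names and codecs.BOM_UTF16_LE are standard library.

```python
import codecs

field = "89114608511,"

def show_nuls(data: bytes) -> str:
    # Hypothetical helper: map each byte to one character (latin-1),
    # then mark the NULs the same way as the one-liner quoted above.
    return data.decode("latin-1").replace("\0", ">>>NUL<<<")

# "utf-16-be" / "utf-16-le" write no BOM; the byte order is fixed by the name.
be = field.encode("utf-16-be")   # high byte first: NUL precedes each ASCII character
le = field.encode("utf-16-le")   # low byte first: NUL follows each ASCII character

# What Windows software typically writes: a BOM, then little-endian data.
le_with_bom = codecs.BOM_UTF16_LE + le

print(show_nuls(be))
print(show_nuls(le))
print(repr(le_with_bom[:4]))
```

Only the big-endian, BOM-free encoding starts with a NUL marker immediately followed by the first digit.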

The output above, with a NUL before every character and nothing in front of
the first NUL, is consistent with BOM-free UTF-16BE.
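
If you would rather have Python make the guess, something along these lines may do (Python 3). It is only a rough sketch: the function names guess_utf16 and read_csv_rows and the thresholds are my own invention, and the NUL-counting heuristic assumes the data is mostly ASCII characters, as in the example field.

```python
import codecs
import csv
import io

def guess_utf16(raw: bytes):
    """Guess a UTF-16 codec name for raw, or return None if it does not look like UTF-16."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"              # this codec reads the BOM and strips it
    sample = raw[:1000]
    sample = sample[:len(sample) - len(sample) % 2]   # whole 2-byte units only
    half = len(sample) // 2
    if half == 0:
        return None
    even_nuls = sample[0::2].count(0)    # NULs in the first byte of each pair
    odd_nuls = sample[1::2].count(0)     # NULs in the second byte of each pair
    # Arbitrary thresholds, chosen for illustration only.
    if even_nuls > half / 2 and odd_nuls < half / 10:
        return "utf-16-be"
    if odd_nuls > half / 2 and even_nuls < half / 10:
        return "utf-16-le"
    return None

def read_csv_rows(raw: bytes):
    """Decode raw with the guessed codec, then hand the text to the csv module."""
    encoding = guess_utf16(raw)
    if encoding is None:
        raise ValueError("data does not look like UTF-16")
    text = raw.decode(encoding)
    # newline="" leaves line endings alone so csv can handle them itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Usage would be along the lines of read_csv_rows(open("test.csv", "rb").read()); once the text is decoded there are no NUL bytes left for the csv module to complain about.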



