csv module and NULL data byte

Thu Mar 1 19:15:50 EST 2018

On 2018-03-01 23:57, John Pote wrote:
> On 01/03/2018 01:35, Tim Chase wrote:
> > While inelegant, I've "solved" this with a wrapper/generator
> >
> >    f = file(fname, …)
> >    g = (line.replace('\0', '') for line in f)  
> I wondered about something like this but thought if there's a way
> of avoiding the extra step it would keep the execution speed up.

There shouldn't be any noticeable performance cost to using a
generator.  It's also lazy, so it isn't pulling the entire file into
memory; it holds no more than one line at a time.
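
To illustrate the laziness, here's a minimal sketch.  The names
source() and pulled are made up for the example, and a hard-coded
list stands in for the real file; the generator expression is the
same shape as the one quoted above.

```python
pulled = []  # records which lines have actually been read so far

def source():
    """Hypothetical stand-in for an open file: yields lines one at a time."""
    for text in ['a\0b\n', 'c\n', 'd\0\n']:
        pulled.append(text)
        yield text

# Same shape as the wrapper above: strip NULs from each line as it passes.
g = (line.replace('\0', '') for line in source())

# Nothing has been read yet; asking for one line reads exactly one line.
first = next(g)
print(repr(first), len(pulled))
```

Only the first line has been pulled from the source at that point;
the rest stay unread until something asks for them.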

> My next thought was to pass a custom encoder to the open() that 
> translates NULLs to, say, 0x01. It won't make any difference to
> change one corrupt value to a different corrupt value.
> >    reader = csv.reader(g, …)
> >    for row in reader:
> >      process(row)  

...which is pretty much exactly what my generator solution does: it
puts a translating step between the open() and the csv.reader() call,
except that it drops the NULs rather than mapping them to another
value.
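
Put together, the whole thing might look roughly like this.  It's
only a sketch in Python 3 syntax: strip_nuls() and read_rows() are
names invented for the example, and an io.StringIO stands in for the
real file, which you'd normally get from open(fname, newline='').

```python
import csv
import io

def strip_nuls(lines):
    """Yield each line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace('\0', '')

def read_rows(f):
    """Parse CSV rows from the file-like object f, dropping NULs first."""
    return list(csv.reader(strip_nuls(f)))

# Sample data with stray NULs; a real file would be opened with
# open(fname, newline='') as the csv docs recommend.
sample = io.StringIO('a,b\0,c\r\n1,\x002,3\r\n', newline='')
rows = read_rows(sample)
for row in rows:
    print(row)
```

The csv.reader() only ever sees the cleaned lines, so it doesn't need
to know the NULs were there.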

-tkc