Is this the right way to write a codec error handler?

Serhiy Storchaka storchaka at gmail.com
Sat Jan 20 05:57:45 EST 2018


20.01.18 10:32, Steven D'Aprano wrote:
> I want an error handler that falls back on Latin-1 for anything which
> cannot be decoded. Is this the right way to write it?
> 
> 
> def latin1_fallback(exception):
>      assert isinstance(exception, UnicodeError)
>      start, end = exception.start, exception.end
>      obj = exception.object
>      if isinstance(exception, UnicodeDecodeError):
>          return (obj[start:end].decode('latin1'), end+1)
>      elif isinstance(exception, UnicodeEncodeError):
>          return (obj[start:end].encode('latin1'), end+1)
>      else:
>          raise

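Just `end` instead of `end+1`. The second item of the returned tuple is the position where decoding or encoding resumes, and `exception.end` already points just past the offending span.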
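A small sketch of the difference. It assumes two throwaway handlers registered under made-up names (`demo_end` and `demo_end_plus_one`) that only deal with decode errors. The ASCII codec reports each non-ASCII byte as a one-byte error, so resuming at `end + 1` silently drops the byte that follows it:

```python
import codecs

def resume_at_end(exc):
    # Replace the bad byte(s) with their Latin-1 reading, resume right after them.
    return (bytes.decode(exc.object[exc.start:exc.end], 'latin1'), exc.end)

def resume_past_end(exc):
    # Same replacement, but resumes one position too far (the end+1 bug).
    return (bytes.decode(exc.object[exc.start:exc.end], 'latin1'), exc.end + 1)

# Hypothetical handler names, chosen only for this demonstration.
codecs.register_error('demo_end', resume_at_end)
codecs.register_error('demo_end_plus_one', resume_past_end)

data = b'caf\xe9 ok'
good = data.decode('ascii', 'demo_end')
bad = data.decode('ascii', 'demo_end_plus_one')
print(good)  # café ok
print(bad)   # caféok  (the space after \xe9 was skipped)
```

If the bad byte is the last one in the input, `end + 1` lies past the end of the data and CPython raises IndexError instead.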

And it is safer to use `bytes.decode(obj[start:end], 'latin1')` or
`str(obj[start:end], 'latin1')` instead of
`obj[start:end].decode('latin1')`, just in case obj has an
overridden decode() method.

Otherwise LGTM.
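Putting the suggestions above together, a corrected handler might look like the following. This is a sketch, not code from the thread: the registered name `latin1_fallback` is an arbitrary choice, and replacing the bare `raise` with `raise exception` is an extra change not discussed here (the codec passes the exception in as an argument rather than raising it, so a bare `raise` has nothing to re-raise).

```python
import codecs

def latin1_fallback(exception):
    """Codec error handler: fall back on Latin-1 for the offending span."""
    assert isinstance(exception, UnicodeError)
    start, end = exception.start, exception.end
    obj = exception.object
    if isinstance(exception, UnicodeDecodeError):
        # Unbound call, so an overridden decode() on obj is never used.
        # Resume at `end`, which already points past the bad bytes.
        return (bytes.decode(obj[start:end], 'latin1'), end)
    elif isinstance(exception, UnicodeEncodeError):
        # Encode handlers may return bytes, which are copied into the output.
        # Characters above U+00FF have no Latin-1 form, so this raises for them.
        return (str.encode(obj[start:end], 'latin1'), end)
    else:
        # e.g. UnicodeTranslateError: nothing sensible to do, so re-raise it.
        raise exception

codecs.register_error('latin1_fallback', latin1_fallback)

print(b'na\xefve'.decode('ascii', 'latin1_fallback'))  # naïve
print('caf\xe9'.encode('ascii', 'latin1_fallback'))    # b'caf\xe9'
```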

More information about the Python-list mailing list