Unicode encoding - ignoring errors

Mon Dec 29 07:10:46 EST 2008

On Mon, Dec 29, 2008 at 4:06 AM, Michal Ludvig <mludvig at logix.net.nz> wrote:
> Hi,
>
> in my script I have sys.stdout and sys.stderr redefined to output
> unicode strings in the current system encoding:
>
>        encoding = locale.getpreferredencoding()
>        sys.stdout = codecs.getwriter(encoding)(sys.stdout)
>
> However, on some systems the locale doesn't let all the unicode chars be
> displayed and I eventually end up with a UnicodeEncodeError exception.
>
> I know I could explicitly "sanitize" all output with:
>
>        whatever.encode(encoding, "replace")
>
> but it's quite inconvenient. I'd much prefer to embed this "replace"
> operation into the sys.stdout writer.
>
> Is there any way to set a conversion error handler in codecs.getwriter()
> or perhaps chain it with some other filter somehow? I prefer to have
> question marks in the output instead of experiencing crashes with
> UnicodeEncodeErrors ;-)

You really should read the fine module docs (namely,
http://docs.python.org/library/codecs.html ).

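codecs.getwriter(encoding) returns the StreamWriter class (or factory
function) for that encoding.
The constructor of said class has the signature:
    StreamWriter(stream[, errors])
You want the 'errors' argument. It defaults to 'strict', which is what
raises UnicodeEncodeError on characters the encoding can't represent.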
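
For reference, the 'errors' argument takes the same handler names as the
second argument of unicode.encode(). A minimal sketch of what the common
ones do, assuming an ASCII target purely to force a failure (the sample
string and variable names are made up for illustration):

```python
# Sample text containing U+00EF, which ASCII cannot represent.
text = u"na\u00efve"

ignored = text.encode("ascii", "ignore")            # drops the character
replaced = text.encode("ascii", "replace")          # substitutes '?'
escaped = text.encode("ascii", "backslashreplace")  # substitutes an escape

# 'strict' is the default and raises instead of substituting.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

'replace' is the one that gives you question marks; 'ignore' silently
loses the character, which is usually harder to notice later.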

So all you have to do is add one argument to your stdout reassignment:

    sys.stdout = codecs.getwriter(encoding)(sys.stdout, 'replace')
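
If you want to convince yourself before touching the real stdout, here is
a small sketch that wraps an in-memory byte stream instead. The BytesIO
object and the hard-coded 'ascii' encoding are stand-ins I picked for the
demo, not part of your script; in practice you would keep using
locale.getpreferredencoding() and sys.stdout.

```python
import codecs
import io

# In-memory byte stream standing in for sys.stdout (assumption for the demo).
raw = io.BytesIO()

# 'ascii' is hard-coded only to guarantee an unencodable character below.
writer = codecs.getwriter("ascii")(raw, "replace")

# U+00E9 cannot be encoded as ASCII; with 'replace' it becomes '?'
# instead of raising UnicodeEncodeError.
writer.write(u"caf\u00e9\n")

output = raw.getvalue()
```

Here output ends up as the bytes for "caf?" plus a newline. On Python 3,
where sys.stdout is already a text stream, the equivalent byte stream to
wrap would be sys.stdout.buffer.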

Yay Python, for making such things easy!

Cheers,
Chris

-- 
Follow the path of the Iguana...
http://rebertia.com