logging of strings with broken encoding

Fri Jul 3 03:25:28 EDT 2009

Thomas Guettler wrote:
> Stefan Behnel wrote:
>> Thomas Guettler wrote:
>>> My quick fix is this:
>>>
>>> import logging
>>>
>>> class MyFormatter(logging.Formatter):
>>>     def format(self, record):
>>>         msg = logging.Formatter.format(self, record)
>>>         # Python 2: decode byte strings, replacing undecodable bytes
>>>         if isinstance(msg, str):
>>>             msg = msg.decode('utf8', 'replace')
>>>         return msg
>>>
>>> But I still think handling of non-ascii byte strings should be better.
>>> A broken logging message is better than none.
>> Erm, may I note that this is not a problem in the logging library but in
>> the code that uses it?
> 
> I know that my code passes the broken string to the logging module. But maybe
> I get the non-ascii byte string from a third party (psycopg2 sometimes passes
> latin1 byte strings from postgres in error messages).

If the database contains non-ASCII byte strings, then you could repr()
them before logging (repr() also adds some niceties such as quotes). I
think that's the best solution, unless you want to decode the byte
strings (which might be overkill, depending on the situation).
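
A minimal sketch of both options, written for Python 3, where the byte
string is a bytes object. The logger name and the sample message are made
up for illustration; the message only stands in for what a database driver
might hand back.

```python
import io
import logging

# Hypothetical latin-1 encoded error message (illustration only).
raw = "Fehler: ung\u00fcltiger Wert".encode("latin-1")

# Log into an in-memory stream so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("repr_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# Option 1: log the repr(), which is pure ASCII and keeps the raw bytes visible.
logger.info("db error: %s", repr(raw))

# Option 2: decode with errors="replace"; this never raises, but every byte
# that is not valid UTF-8 is replaced by U+FFFD and the original is lost.
logger.info("db error: %s", raw.decode("utf-8", "replace"))

logged = stream.getvalue().splitlines()
```

The repr() variant keeps the exact bytes in the log, which helps when you
later need to find out which encoding the third party actually used.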

> I like Python very much because it "refuses to guess". But in this case, "best effort"
> is a better approach.

One time it refuses to guess, then the next time it tries a best effort. I
don't think Guido would like such inconsistency.

> It worked in 2.5 and will in py3k. I think it is a bug that it does not in 2.6.

In Python 3.x, the default string type is a Unicode string. If it works in
Python 2.5, then it is a bug in 2.5.
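
To illustrate the Python 3 side, here is a small sketch (the logger name
and sample text are assumptions for the example). A str is logged as-is,
and a bytes object passed to %s is rendered via str(), which for bytes is
the same as its repr(), so no implicit decoding is attempted.

```python
import io
import logging

# Log into an in-memory stream so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("py3_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

text = "caf\u00e9"            # str: already Unicode in Python 3
raw = text.encode("latin-1")  # bytes: b'caf\xe9', not valid UTF-8

logger.info("text: %s", text)  # logged unchanged
logger.info("raw: %s", raw)    # %s uses str(raw), which equals repr(raw)

lines = stream.getvalue().splitlines()
```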