problem with logging exceptions with non-ASCII __str__ result

Vinay Sajip vinay_sajip at yahoo.co.uk
Mon Jan 14 16:31:38 EST 2008


On Jan 14, 5:46 pm, Karsten Hilbert <Karsten.Hilb... at gmx.net> wrote:
> Dear all,
>
> I have a problem with logging an exception.
>
> environment:
>
>         Python 2.4, Debian testing
>
>         ${LANGUAGE} not set
>         ${LC_ALL} not set
>         ${LC_CTYPE} not set
>         ${LANG}=de_DE.UTF-8
>
>         activating user-default locale with <locale.setlocale(locale.LC_ALL, '')> returns: [de_DE.UTF-8]
>
>         locale.getdefaultlocale() - default (user) locale: ('de_DE', 'utf-8')
>         encoding sanity check (also check "locale.nl_langinfo(CODESET)" below):
>         sys.getdefaultencoding(): [ascii]
>         locale.getpreferredencoding(): [UTF-8]
>         locale.getlocale()[1]: [utf-8]
>         sys.getfilesystemencoding(): [UTF-8]
>
>         _logfile = codecs.open(filename = _logfile_name, mode = 'wb', encoding = 'utf8', errors = 'replace')
>
>         logging.basicConfig (
>                 format = fmt,
>                 datefmt = '%Y-%m-%d %H:%M:%S',
>                 level = logging.DEBUG,
>                 stream = _logfile
>         )
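For comparison, here is a rough Python 3 counterpart of this setup, offered as a sketch rather than the poster's code. It uses a hypothetical helper name, `configure_utf8_logging`, and an illustrative default format string, because the original `fmt` is not shown. `logging.FileHandler` with `encoding="utf-8"` takes the place of the manual `codecs.open()` call.

```python
import logging

def configure_utf8_logging(log_path, fmt="%(asctime)s %(levelname)s %(message)s"):
    """Hypothetical helper: send DEBUG-and-above records to log_path as UTF-8.

    The default fmt is an assumption; the original format string is not shown.
    """
    # FileHandler opens log_path in text mode with the given encoding,
    # so every formatted (str) record is encoded to UTF-8 on write.
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter(fmt=fmt, datefmt="%Y-%m-%d %H:%M:%S"))
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(handler)
    return handler
```

In Python 3, log messages are already text, so the handler only has to encode them. No implicit decode with a default encoding takes place.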
>
> I am using psycopg2 which in turn uses libpq. When trying to
> connect to the database and providing faulty authentication
> information:
>
>         try:
>                 ... try to connect ...
>         except StandardError, e:
>                 _log.error(u"login attempt %s/%s failed:", attempt+1, max_attempts)
>
>                 print "exception type  :", type(e)
>                 print "exception dir   :", dir(e)
>                 print "exception args  :", e.args
>                 msg = e.args[0]
>                 print "msg type        :", type(msg)
>                 print "msg.decode(utf8):", msg.decode('utf8')
>                 t,v,tb = sys.exc_info()
>                 print "sys.exc_info()  :", t, v
>                 _log.exception(u'exception detected')
>
> the following output is generated:
>
>         exception type  : <type 'instance'>
>         exception dir   : ['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args']
>         exception args  : ('FATAL:  Passwort-Authentifizierung f\xc3\xbcr Benutzer \xc2\xbbany-doc\xc2\xab fehlgeschlagen\n',)
>         msg type        : <type 'str'>
>         msg.decode(utf8): FATAL:  Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
>
>         sys.exc_info()  : psycopg2.OperationalError FATAL:  Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
>
>         Traceback (most recent call last):
>           File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
>             self.stream.write(fs % msg.encode("UTF-8"))
>         UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 191: ordinal not in range(128)
>
> Now, the string "FATAL: Passwort-Auth..." comes from libpq
> via psycopg2. It is translated to German via gettext within
> libpq (at the C level). As we can see, it is of type str.
> I know from the environment that it is likely encoded in
> UTF-8, and manually decoding it as UTF-8 (see the decode call) succeeds.
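The manual decode can be made explicit in a small Python 3 sketch. It uses a hypothetical helper, `decode_libpq_message`, that converts the raw bytes to text with the locale's preferred encoding (as reported by `locale.getpreferredencoding()`) unless the caller passes one. Undecodable bytes become U+FFFD instead of raising an error. Whether the locale encoding really matches what libpq used is an assumption.

```python
import locale

def decode_libpq_message(raw, encoding=None):
    """Hypothetical helper: turn a C-level byte message into text."""
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    # Assume the message is in the locale's preferred encoding
    # unless the caller knows better.
    enc = encoding or locale.getpreferredencoding(False)
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(enc, errors="replace")

# The byte string from the exception args shown above.
raw = b"FATAL:  Passwort-Authentifizierung f\xc3\xbcr Benutzer \xc2\xbbany-doc\xc2\xab fehlgeschlagen\n"
text = decode_libpq_message(raw, "utf-8")
```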
>
> On _log.exception() the logging module wants to output the
> message encoded as UTF-8 (that's what the log file is set
> up for). So it looks at the string, sees that it is of type
> "str" and decodes it with the *Python default encoding* to get
> to type "unicode", after which it re-encodes it as UTF-8
> to get back to type "str", ready for writing to the log
> file.
>
> However, since the Python default encoding is "ascii" that
> conversion fails.
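The failing conversion can be reproduced on its own. This Python 3 sketch uses a hypothetical helper, `round_trip`, to perform explicitly the two steps described above. In Python 2 the decode step happened implicitly using `sys.getdefaultencoding()`, so here it is simulated by passing `"ascii"` as the default encoding.

```python
def round_trip(raw, default_encoding, target_encoding="utf-8"):
    """Mimic the implicit str -> unicode -> str round trip described above.

    default_encoding stands in for Python 2's sys.getdefaultencoding().
    """
    text = raw.decode(default_encoding)  # the implicit decode step in Python 2
    return text.encode(target_encoding)  # re-encode for the UTF-8 log file

raw = b"Passwort-Authentifizierung f\xc3\xbcr Benutzer"

# With a UTF-8 default the round trip is lossless.
assert round_trip(raw, "utf-8") == raw

# With the 'ascii' default it fails on the first non-ASCII byte.
try:
    round_trip(raw, "ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)
```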
>
> Changing the Python default encoding isn't really an option,
> as doing so is discouraged and it would have to be made to work
> reliably on other users' machines.
>
> One could, of course, write code to specifically check for
> this condition and manually pre-convert the message string
> to unicode, but that doesn't seem like the way things should be.
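If pre-conversion is unavoidable, it can at least live in one place rather than at every call site. The following Python 3 sketch assumes a hypothetical `DecodeBytesArgsFilter` class and a known encoding (UTF-8 by default). A `logging.Filter` attached to a handler rewrites bytes arguments into text before the record is formatted. It does not touch exception text, which would need a custom `Formatter.formatException()`.

```python
import logging

class DecodeBytesArgsFilter(logging.Filter):
    """Hypothetical filter: decode bytes arguments of a log call to text.

    The encoding is an assumption the caller must supply (UTF-8 by default).
    """

    def __init__(self, encoding="utf-8"):
        super().__init__()
        self.encoding = encoding

    def filter(self, record):
        # Only positional (tuple) args are handled; mapping args pass through.
        if isinstance(record.args, tuple):
            record.args = tuple(
                arg.decode(self.encoding, errors="replace")
                if isinstance(arg, bytes) else arg
                for arg in record.args
            )
        return True  # never drop the record, only rewrite it
```

Because the filter mutates the shared `LogRecord`, any handlers that run after this one also see the decoded arguments.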
>
> How can I cleanly handle this situation?
>
> Should the logging module internally use an encoding obtained
> from the locale module rather than the default string encoding?
>
> Karsten
> --
> GPG key ID E4071346 @ wwwkeys.pgp.net
> E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Please reduce this to a minimal program that demonstrates the issue, and
log an issue on bugs.python.org.

Best regards,

Vinay Sajip


