[py-dev] Unicode support in py.log

Wed Jul 13 17:49:58 CEST 2005

--- holger krekel <hpk at trillke.net> wrote:

> Hi Grig! 
> 
> On Wed, Jul 13, 2005 at 08:11 -0700, Grig Gheorghiu wrote:
> > I was using the default STDOUT consumer in py.log when I ran into a
> > problem printing strings that contain non-ASCII characters. I think we
> > should modify at a minimum the 'content' method of the Message class in
> > producer.py so that it uses unicode as opposed to str. Here's what I
> > did temporarily to get past my problem:
> > 
> > def content(self):
> >     return " ".join(map(unicode, self.args)).encode('utf-8')
> >
> > Of course, the encoding should be configurable, but I'm not sure
> > whether it's best to make it a global variable in producer.py or to
> > configure it some other way.
> 
> Hmm, I am not sure about the best way to go about unicode handling in
> the py.log context. I guess that content() should always return
> a unicode object and the log consumer should take care of encodings.
> The default STDOUT/STDERR consumer should convert to the system
> encoding. If a user wants something different, he has to register
> an appropriate consumer. Makes sense?
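Holger's proposal (text in, the consumer chooses the bytes) can be sketched as below. This is a minimal sketch, not py.log code: `EncodingConsumer` is a hypothetical name, it assumes the consumer receives an already-unicode string rather than a Message object, and it assumes locale.getpreferredencoding() is a reasonable stand-in for "the system encoding". Characters the target encoding cannot represent are replaced with '?' instead of raising.

```python
import io
import locale


class EncodingConsumer(object):
    """Hypothetical consumer: encodes unicode log text for a byte stream.

    `stream` must accept bytes. `encoding` defaults to the locale's
    preferred encoding (assumed here to mean "the system encoding"),
    falling back to ASCII if the locale reports nothing.
    """

    def __init__(self, stream, encoding=None, errors="replace"):
        self.stream = stream
        self.encoding = encoding or locale.getpreferredencoding() or "ascii"
        self.errors = errors

    def __call__(self, text):
        # With errors="replace", unencodable characters become '?'
        # instead of raising UnicodeEncodeError.
        data = text.encode(self.encoding, self.errors)
        self.stream.write(data + b"\n")


if __name__ == "__main__":
    out = io.BytesIO()
    EncodingConsumer(out, encoding="latin-1")(u"caf\u00e9 \u20ac")
    print(repr(out.getvalue()))
```

A user who wants something other than the system encoding would simply register a consumer built with an explicit `encoding` argument, which matches the "register an appropriate consumer" idea above.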

It seems that calling str(msg) always gives you back a string (byte)
object rather than a unicode object, even if the __str__ method of the
Message class returns a unicode object: Python converts that result with
the default ASCII codec, which fails on non-ASCII characters. So the
solution I found was to define a __unicode__ method for the Message
class; consumers will then have to call
unicode(msg).encode('desired_encoding'). With that in place, the __str__
method is not doing anything useful, unless we apply some default
encoding ourselves. Here's how I modified the Message class, with both
content() and prefix() now returning unicode objects:

class Message(object):
    def __init__(self, keywords, args):
        self.keywords = keywords
        self.args = args

    def content(self):
        # Convert every argument to unicode before joining.
        return u" ".join(map(unicode, self.args))

    def prefix(self):
        return u"[%s] " % (u":".join(self.keywords))

    def __unicode__(self):
        # Called by unicode(msg); the consumer encodes the result.
        return self.prefix() + self.content()

    def __str__(self):
        # Fallback for str(msg): always encodes as UTF-8.
        s = self.prefix() + self.content()
        return s.encode('utf-8')

If we leave __str__ as defined above, consumers will still be able to
call str(msg) and silently get UTF-8 output, so we're not really
enforcing the policy that consumers are in charge of encoding. So maybe
we should just drop the __str__ method?
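For illustration, here is a sketch of the class with __str__ dropped on Python 2, so the only text conversion is unicode(msg) and the consumer must pick the encoding. It is written to run on both Python 2 and 3 under stated assumptions: the `text_type` alias and the `write_message` consumer are hypothetical names, not py.log API, and on Python 3 (where str already is the text type) __str__ is aliased to __unicode__ instead of being dropped.

```python
import io
import sys

# The text type: `unicode` on Python 2, `str` on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # Python 2 only name; never evaluated on Python 3


class Message(object):
    """Log message whose text form is always a unicode object."""

    def __init__(self, keywords, args):
        self.keywords = keywords
        self.args = args

    def content(self):
        return u" ".join(text_type(arg) for arg in self.args)

    def prefix(self):
        return u"[%s] " % u":".join(self.keywords)

    def __unicode__(self):
        # Used by unicode(msg) on Python 2. No __str__ is defined there,
        # so str(msg) falls back to the default object representation.
        return self.prefix() + self.content()

    if sys.version_info[0] >= 3:
        # On Python 3, str(msg) is the text conversion.
        __str__ = __unicode__


def write_message(msg, stream, encoding="utf-8"):
    """Hypothetical consumer: it, not Message, decides the encoding."""
    stream.write(text_type(msg).encode(encoding))
    stream.write(b"\n")


if __name__ == "__main__":
    out = io.BytesIO()
    write_message(Message(["py", "log"], [u"caf\u00e9", 42]), out)
    print(repr(out.getvalue()))
```

The trade-off is that a consumer calling str(msg) on Python 2 gets an unhelpful default representation instead of quietly receiving UTF-8, which makes the "consumers encode" policy visible rather than optional.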

Grig