[Tutor] Encode problem

Mark Tolonen metolone+gmane at gmail.com
Tue May 5 16:16:54 CEST 2009


"Kent Johnson" <kent37 at tds.net> wrote in message 
news:1c2a2c590905050337j1afc177ene64f800dcc3a7e7a at mail.gmail.com...
> On Tue, May 5, 2009 at 1:14 AM, Mark Tolonen <metolone+gmane at gmail.com> wrote:

> > The below works. ConfigParser isn't written to support Unicode correctly.
> > I was able to get Unicode sections to write out, but it was just luck.
> > Unicode keys and values break as the OP discovered. So treat everything
> > as byte strings:
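For comparison, here is a minimal sketch that assumes Python 3 rather than the Python 2 ConfigParser discussed in this thread. In Python 3, configparser keeps sections, keys and values as str (Unicode), so the byte-string workaround isn't needed. The byte encoding is chosen only where the file is opened or passed to read(). The key, value and file name are hypothetical sample data.

```python
import configparser
import os
import tempfile

# Python 3 sketch (not the Python 2 code quoted in this thread): configparser
# stores str (Unicode) throughout, so no manual encode/decode is needed.
config = configparser.ConfigParser()
config.add_section('files')
# Hypothetical sample key/value containing non-ASCII characters.
config.set('files', 'café', 'naïve 中文.txt')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.ini')  # hypothetical file name
    # The text is encoded to UTF-8 only at the file boundary.
    with open(path, 'w', encoding='utf-8') as f:
        config.write(f)
    with open(path, 'rb') as f:
        raw = f.read()  # the bytes actually written to disk
    # Read it back, telling configparser which encoding the file uses.
    loaded = configparser.ConfigParser()
    loaded.read(path, encoding='utf-8')

value = loaded.get('files', 'café')  # 'naïve 中文.txt'
```

The design choice is to decode and encode once, at the I/O boundary, and keep Unicode everywhere inside the program.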

> Thanks for the complete example.

> > files = glob.glob('*.txt')
> > c.add_section('files')
> >
> > for i,fn in enumerate(files):
> >     fn = fn.decode(sys.getfilesystemencoding())

> I think if you give a Unicode string to glob.glob(), e.g.
> glob.glob(u'*.txt'), then the strings returned will also be unicode
> and this decode step will not be needed.

You're right, that's why I had the comment above it :^)

    # The following could be glob.glob(u'*.txt') to get filenames in
    # Unicode, but this is to illustrate that the encoding of the
    # source file has no bearing on the encoding of strings other than
    # the ones hard-coded in the source file.
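Kent's point carries over to Python 3 as a str/bytes rule: glob returns names of the same type as the pattern. Below is a minimal sketch under that assumption. The temporary directory and the notes.txt file are hypothetical and exist only to give glob something to match. os.fsdecode roughly plays the role of fn.decode(sys.getfilesystemencoding()) in the Python 2 code.

```python
import glob
import os
import tempfile

# Python 3 sketch: glob returns the same string type as its pattern.
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file so the pattern has something to match.
    with open(os.path.join(tmp, 'notes.txt'), 'w'):
        pass

    # A str pattern yields str (Unicode) names: no decode step needed.
    text_names = glob.glob(os.path.join(tmp, '*.txt'))
    # A bytes pattern yields bytes names in the filesystem encoding.
    byte_names = glob.glob(os.path.join(os.fsencode(tmp), b'*.txt'))

    # Bytes need an explicit decode with the filesystem encoding.
    decoded = [os.fsdecode(name) for name in byte_names]
```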

The OP had wondered why his source file encoding "doesn't use the encoding
defined for the application (# -*- coding: utf-8 -*-).", and I thought this
would illustrate that byte strings could be in other encodings. It also
shows why spir could say "... you shouldn't even need explicit
encoding; they should pass through silently because they fit in an 8 bit
latin charset.". If I'd left out the Chinese, I could've used a latin-1
encoding for everything and skipped decoding and encoding entirely (assuming
the file system was latin-1).

-Mark



