[Tutor] Encode problem
Mark Tolonen
metolone+gmane at gmail.com
Tue May 5 16:16:54 CEST 2009
"Kent Johnson" <kent37 at tds.net> wrote in message
news:1c2a2c590905050337j1afc177ene64f800dcc3a7e7a at mail.gmail.com...
> On Tue, May 5, 2009 at 1:14 AM, Mark Tolonen <metolone+gmane at gmail.com>
> wrote:
> > The below works. ConfigParser isn't written to support Unicode
> > correctly. I was able to get Unicode sections to write out, but it was
> > just luck. Unicode keys and values break as the OP discovered. So treat
> > everything as byte strings:
> Thanks for the complete example.
> > files = glob.glob('*.txt')
> > c.add_section('files')
> >
> > for i,fn in enumerate(files):
> > fn = fn.decode(sys.getfilesystemencoding())
> I think if you give a Unicode string to glob.glob(), e.g.
> glob.glob(u'*.txt'), then the strings returned will also be unicode
> and this decode step will not be needed.
You're right, that's why I had the comment above it :^)
# The following could be glob.glob(u'.') to get a filename in
# Unicode, but this is for illustration that the encoding of the
# source file has no bearing on the encoding of strings other than
# ones hard-coded in the source file.
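For anyone on Python 3, here is a minimal sketch of the same glob/ConfigParser idea. It assumes Python 3 semantics, where `str` is Unicode and `configparser` accepts it directly. It shows Kent's point in its Python 3 form: `glob.glob()` returns names of the same type as its pattern. A bytes pattern yields bytes that must be decoded with the filesystem encoding, like the `fn.decode(...)` line above. A str pattern yields text that needs no decode step. The helpers `list_txt_files` and `build_config` and the `file0`, `file1`, ... option names are hypothetical, chosen only for illustration.

```python
import configparser
import glob
import io
import os
import sys
import tempfile


def list_txt_files(directory):
    """Return the *.txt names in directory twice: once via a str pattern,
    once via a bytes pattern (hypothetical helper for illustration)."""
    text_pattern = os.path.join(directory, '*.txt')
    byte_pattern = os.path.join(os.fsencode(directory), b'*.txt')
    # glob returns str for a str pattern and bytes for a bytes pattern.
    as_text = sorted(os.path.basename(p) for p in glob.glob(text_pattern))
    as_bytes = sorted(os.path.basename(p) for p in glob.glob(byte_pattern))
    return as_text, as_bytes


def build_config(names):
    """Put the names into a [files] section and return the INI text."""
    config = configparser.ConfigParser()
    config.add_section('files')
    for i, name in enumerate(names):
        # Hypothetical option names: file0, file1, ...
        config.set('files', 'file%d' % i, name)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        for fname in ('notes.txt', 'café.txt', 'skip.dat'):
            open(os.path.join(tmp, fname), 'w').close()
        text_names, byte_names = list_txt_files(tmp)
        # Bytes names need an explicit decode with the filesystem encoding.
        decoded = [b.decode(sys.getfilesystemencoding()) for b in byte_names]
        print(text_names == decoded)
        print(build_config(text_names))
```

In Python 3, `os.fsdecode(b)` performs the same decode but also applies the filesystem error handler. That makes it the more robust choice for arbitrary file names.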
The OP had wondered why his source file encoding "doesn't use the encoding
defined for the application (# -*- coding: utf-8 -*-)," and I thought this
would illustrate that byte strings could be in other encodings. It also
shows why spir could say "... you shouldn't even need explicit
encoding; they should pass through silently because they fit in an 8 bit
latin charset." If I'd left out the Chinese, I could've used a latin-1
encoding for everything and not needed to decode or encode at all (assuming
the file system was latin-1).
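The points in this paragraph can be checked with a short Python 3 sketch. This assumes Python 3; the thread itself used Python 2, where the same ideas apply to `str` and `unicode`. The sketch makes four points:

- The coding declaration only controls how Python reads the bytes of the source file itself.
- The same characters become different bytes under different codecs.
- latin-1 maps every byte value 0-255 to a character, so any byte string survives a latin-1 decode/encode round trip.
- Chinese text cannot be encoded in latin-1 at all.

The function names `literal_from_source`, `latin1_roundtrips`, and `fits_latin1` are hypothetical.

```python
def literal_from_source(source_bytes):
    """Compile source bytes (which may carry a coding declaration) and
    return the value bound to the name s (hypothetical helper)."""
    namespace = {}
    exec(compile(source_bytes, '<demo>', 'exec'), namespace)
    return namespace['s']


def latin1_roundtrips(data):
    """latin-1 maps byte N to code point U+00NN, so any bytes round-trip."""
    return data.decode('latin-1').encode('latin-1') == data


def fits_latin1(text):
    """True if every character of text exists in latin-1."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True


if __name__ == '__main__':
    # The declaration tells Python how to decode this source's own bytes:
    # byte 0xE9 is read as 'é' because the source declares latin-1.
    src = b"# -*- coding: latin-1 -*-\ns = 'caf\xe9'\n"
    print(literal_from_source(src))

    # Same characters, different bytes, depending on the codec chosen.
    chinese = '中文'
    print(chinese.encode('utf-8'), chinese.encode('gbk'))

    # Decoding as latin-1 never fails, but bytes that were not latin-1
    # to begin with come out as the wrong text (mojibake).
    print(chinese.encode('gbk').decode('latin-1'))

    print(latin1_roundtrips(bytes(range(256))))
    print(fits_latin1('café'), fits_latin1(chinese))
```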
-Mark