[Tutor] Encode problem
Mark Tolonen
metolone+gmane at gmail.com
Tue May 5 07:14:24 CEST 2009
"spir" <denis.spir at free.fr> wrote in message
news:20090501220601.31891dfc at o...
> On Fri, 1 May 2009 15:19:29 -0300,
> "Pablo P. F. de Faria" <pablofaria at gmail.com> wrote:
>
>> self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))
>>
>> As one can see, the character encoding is explicitly UTF-8. But
>> ConfigParser keeps trying to save it as an 'ascii' file and gives me an
>> error for directory names containing characters with codes above 127
>> (like "Á"). It is just a horrible thing to me, for my app will be used
>> mostly by Brazilians.
>
> Just some superficial suggestions, only because it's the 1st of May and the
> weekend, so better answers may not come up before Monday.
>
> If everything you describe is right, then there must be something wrong with
> the character encoding in ConfigParser's write method. Have you had a look
> at it? I can hardly imagine why or how ConfigParser would limit file paths
> to 7-bit ASCII...
> Also, for Portuguese characters, you shouldn't even need explicit
> encoding; they should pass through silently because they fit in an 8-bit
> Latin charset. (I never encode French path/file names.)
The script below works. ConfigParser isn't written to support Unicode
correctly. I was able to get Unicode section names to write out, but that was
just luck. Unicode keys and values break, as the OP discovered. So treat
everything as byte strings:
----------------------------------------------------
# coding: utf-8
# Note coding is required because of non-ascii
# in the source code. This ONLY controls the
# encoding of the source file characters saved to disk.
import ConfigParser
import glob
import sys
c = ConfigParser.ConfigParser()
c.add_section('马克')  # a UTF-8 encoded byte string (note: no u'' prefix)
c.set('马克','多少','明白') # so are these
# The following could be glob.glob(u'*.txt') to get the filenames in
# Unicode, but this illustrates that the encoding of the source file
# has no bearing on the encoding of strings other than the ones
# hard-coded in the source file. The 'files' list will be byte
# strings in the default file system encoding, which for Windows
# is 'mbcs'...a magic value that changes depending on which
# country's version of Windows is running.
files = glob.glob('*.txt')
c.add_section('files')
for i, fn in enumerate(files):
    fn = fn.decode(sys.getfilesystemencoding())
    fn = fn.encode('utf-8')
    c.set('files', 'file%d' % (i + 1), fn)
# Don't need a codec here...everything is already UTF8.
c.write(open('chinese.txt','wt'))
--------------------------------------------------------------
Here is the content of my utf-8 file:
-----------------------------
[files]
file3 = ascii.txt
file2 = chinese.txt
file1 = blah.txt
file5 = ÀÈÌÒÙ.txt
file4 = other.txt
[马克]
多少 = 明白
----------------------------
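The decode/encode step inside the loop can also be tried on its own. Here is a minimal sketch, assuming a file name that arrives as Latin-1 bytes. The sample bytes and the helper name to_utf8 are made up for illustration; on a real system the source encoding would come from sys.getfilesystemencoding(), as in the script above.

```python
import sys


def to_utf8(raw, source_encoding=None):
    """Transcode a byte string from source_encoding to UTF-8.

    to_utf8 is a hypothetical helper for illustration only; it is
    not part of ConfigParser or the standard library.
    """
    if source_encoding is None:
        # Same call the script uses for names returned by glob.glob().
        source_encoding = sys.getfilesystemencoding()
    text = raw.decode(source_encoding)  # bytes -> Unicode
    return text.encode('utf-8')         # Unicode -> UTF-8 bytes


# Assumed sample input: A-acute followed by ".txt", encoded as Latin-1
# (0xC1 is A-acute in Latin-1).
latin1_name = b'\xc1.txt'
utf8_name = to_utf8(latin1_name, 'latin-1')
print(repr(utf8_name))
```

The single Latin-1 byte 0xC1 becomes the two UTF-8 bytes 0xC3 0x81, while pure ASCII names come through unchanged. That is why ascii.txt and friends look the same in the output file no matter which encoding the file system uses.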
Hope this helps,
Mark