problems writing utf8

Sat Apr 13 05:20:16 EDT 2002

Martin v. Loewis wrote:

> Boudewijn Rempt <boud at valdyas.org> writes:
> 
>> Then I tried to write the utf-8 data to a file. I have tried to
>> construct that file with two methods:
>> 
>>     f = open("syllables", "w+")
>>     f2 = codecs.EncodedFile(f, "unicode_internal", "utf-8")
>>     f2.write(u"a")
>>     f2.close()
> 
> This can't work (even though it should not crash). EncodedFile
> performs transparent recoding between two named encodings. In this
> context, unicode_internal is the name of a *byte* encoding, namely the
> encoding which exposes the internal memory layout of Unicode objects.
> 

Right. That wasn't clear to me.
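
To make the point about byte encodings concrete, here is a minimal sketch of
what EncodedFile does when both names are ordinary byte encodings. It assumes
a current Python 3 (where bytes and text are separate types), and it uses an
in-memory io.BytesIO and Latin-1 as illustrative stand-ins that are not part
of the code quoted above.

```python
import codecs
import io

# In-memory stand-in for a file opened in binary mode (illustrative only).
raw = io.BytesIO()

# Both names refer to byte encodings: the caller writes Latin-1 bytes,
# and the wrapper stores them in the underlying file as UTF-8.
recoder = codecs.EncodedFile(raw, data_encoding="latin-1", file_encoding="utf-8")
recoder.write(u"\xe9".encode("latin-1"))  # a single Latin-1 byte for e-acute

stored = raw.getvalue()
print(repr(stored))
```

The wrapper never sees a Unicode object here, only bytes in one encoding that
it recodes into another, which is why handing it a Unicode string directly is
the wrong tool for the job.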

> 
> import codecs
> f3 = codecs.open("syllables2", "w+", "utf-8")
> f3.write(u"\N{LATIN LETTER GLOTTAL STOP}")
> f3.close()
> 
> print repr(open("syllables2").read())
> 
> I get
> 
> '\xca\x94'
> 
> which indeed is the UTF-8 representation of the glottal stop. What did
> you get?
>

'\xc3\x8a\xc2\x94' ...

At least, when I do exactly what you did, I get the same result.

Oh, darn, I see what I did wrong. On my previous computer, which
has melted down, I had hacked sys so my default encoding was the same
as that in my locale -- utf-8.
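
As a hedged illustration: the four bytes I reported are exactly what you get
when the correct two-byte UTF-8 form is misread as Latin-1 text and then
encoded to UTF-8 a second time. Whether that is precisely the path my data
took is an assumption; the sketch below (written for a current Python 3) only
shows that the byte values match.

```python
glottal = u"\N{LATIN LETTER GLOTTAL STOP}"  # U+0294

# Correct encoding: one character becomes two UTF-8 bytes.
once = glottal.encode("utf-8")

# Misread those bytes as Latin-1 text, then encode that text as UTF-8 again.
twice = once.decode("latin-1").encode("utf-8")

print(repr(once))
print(repr(twice))
```

Each of the two original bytes is above 0x7f, so the second pass expands each
of them into two bytes, giving four in total.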

On my new computer, I just compiled and installed the latest Python,
and I can only enter source in ASCII and escape sequences.

Was PEP 263 implemented already? Must investigate...

-- 
Boudewijn Rempt | http://www.valdyas.org