problems writing utf8
Boudewijn Rempt
boud at valdyas.org
Sat Apr 13 05:20:16 EDT 2002
Martin v. Loewis wrote:
> Boudewijn Rempt <boud at valdyas.org> writes:
>
>> Then I tried to write the utf-8 data to a file. I have tried to
>> construct that file with two methods:
>>
>> f = open("syllables", "w+")
>> f2 = codecs.EncodedFile(f, "unicode_internal", "utf-8")
>> f2.write(u"a")
>> f2.close()
>
> This can't work (even though it should not crash). The EncodedFile
> performs transparent recoding between two named encodings. In this
> context, unicode_internal is the name of a *byte* encoding, namely the
> encoding which exposes the internal memory layout of Unicode objects.
>
Right. That wasn't clear to me.
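
To make the point about EncodedFile concrete, here is a minimal sketch of
recoding between two byte encodings. It is written for current Python 3
rather than the Python of this thread, and the in-memory buffer and the
choice of latin-1 as the caller-side encoding are assumptions made only
for illustration.

```python
import codecs
import io

# An in-memory byte buffer stands in for the real file (illustrative only).
raw = io.BytesIO()

# EncodedFile(file, data_encoding, file_encoding):
# bytes written by the caller are decoded as data_encoding (latin-1 here)
# and re-encoded as file_encoding (utf-8) before reaching the buffer.
recoder = codecs.EncodedFile(raw, "latin-1", "utf-8")

# 0xE9 is "e with acute accent" as a single Latin-1 byte.
recoder.write(b"\xe9")

# The buffer now holds the same character as two UTF-8 bytes.
stored = raw.getvalue()
print(repr(stored))
```

Both sides of the wrapper deal in bytes; neither side is a Unicode string,
which is why a byte-level name is expected for each encoding argument.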
>
> import codecs
> f3 = codecs.open("syllables2", "w+", "utf-8")
> f3.write(u"\N{LATIN LETTER GLOTTAL STOP}")
> f3.close()
>
> print repr(open("syllables2").read())
>
> I get
>
> '\xca\x94'
>
> which indeed is the UTF-8 representation of the glottal stop. What did
> you get?
>
'\xc3\x8a\xc2\x94' ...
At least, when I do exactly what you did, I get the same result.
Oh, darn, I see what I did wrong. On my previous computer, which
has melted down, I had hacked sys so my default encoding was the same
as that in my locale -- utf-8.
On my new computer, I just compiled and installed the latest Python,
and I can only enter source in ASCII and escape sequences.
Was PEP 263 implemented already? Must investigate...
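
For reference, a small sketch of how the four bytes above relate to the
expected two. It uses current Python 3 syntax, and the latin-1 step is an
assumption: it is one way such double encoding can arise, not necessarily
what happened on either machine.

```python
# The glottal stop is U+0294.
glottal = "\N{LATIN LETTER GLOTTAL STOP}"

# Encoding it once gives the two-byte UTF-8 form Martin reported.
utf8_bytes = glottal.encode("utf-8")

# If those two bytes are mistaken for two Latin-1 characters
# (U+00CA and U+0094) and encoded to UTF-8 again, each byte
# becomes a two-byte sequence of its own.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

print(repr(utf8_bytes))
print(repr(double_encoded))
```

Reversing the assumed mistake, double_encoded.decode("utf-8").encode("latin-1"),
gives back the original two bytes.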
--
Boudewijn Rempt | http://www.valdyas.org