Can't print Chinese to HTTP

Alf P. Steinbach alfps at start.no
Sat Dec 5 06:30:31 EST 2009


* Lie Ryan:
> On 12/5/2009 2:57 PM, Gnarlodious wrote:
>> On Dec 1, 3:06 pm, Terry Reedy wrote:
>>> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))
>>
>> Here is a better solution that lets me send any string to the
>> function:
>>
>> def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))
> 
> No, that's wrong. You're serving HTML with Content-type:text/plain; it 
> should've been text/html or application/xhtml+xml (the latter is technically 
> correct, but some older browsers have problems with it).
> 
>> Why this changed in Python 3 I do not know, nor why it was nowhere to
>> be found on the internet.
>>
>> Can anyone explain it?
> 
> Python 3's str() is what was Python 2's unicode().
> Python 2's str() turned into Python 3's bytes().
> 
> Python 3's print() now takes a unicode string, which is the regular string.
> 
> Because of the switch to unicode str, a simple print('晉') should've 
> worked flawlessly if your terminal can accept the character, but the 
> problem is your terminal does not.
> 
> The correct fix is to fix your terminal's encoding.
> 
> In Windows, due to the prompt's poor support for Unicode, the only real 
> solution is to switch to a better terminal.

A bit off-topic perhaps, but that last is a misconception. Windows' [cmd.exe] 
does have poor support for UTF-8: in short, it Does Not Work in Windows XP, and 
probably does not work in Vista or Windows 7 either. However, Windows console 
windows have full support for the Basic Multilingual Plane of Unicode: they're 
pure Unicode beasts.

Thus, the problem is an interaction between two systems that Do Not Work: the 
[cmd.exe] program's practically non-existent support for UTF-8 (codepage 65001), 
and the very unfortunate confusion of stream i/o and interactive i/o in *nix, 
which has ended up as a "feature" (it's more like a design bug) in a lot of 
programming languages with *nix origins, and that includes Python.

Windows' "terminal", its console window support, is INNOCENT... :-)

In Windows, as opposed to *nix, interactive character i/o is separated at the 
API level. There is integration with stream i/o, but the interactive i/o can be 
accessed separately. This is the "console function" API.

So for interactive console i/o one solution could be some Python module for 
interactive console i/o, on Windows internally using the Windows console 
function API, which is fully Unicode (based on UCS-2, i.e. the BMP).
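
A minimal sketch of what such a module's output path might look like, assuming
Python 3 and ctypes. The helper name write_console is hypothetical, not an
existing module; on Windows it calls the console function WriteConsoleW
directly, and everywhere else (or when output is redirected, where
WriteConsoleW fails) it falls back to ordinary stream i/o:

```python
import sys


def write_console(text):
    """Write text interactively; return the number of characters written
    (UTF-16 code units when the Windows console API path is taken)."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE,                  # console output handle
            wintypes.LPCWSTR,                 # UTF-16 text to write
            wintypes.DWORD,                   # number of code units
            ctypes.POINTER(wintypes.DWORD),   # receives units written
            wintypes.LPVOID,                  # reserved, must be NULL
        ]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        STD_OUTPUT_HANDLE = 0xFFFFFFF5        # (DWORD)-11
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        units = len(text.encode("utf-16-le")) // 2
        written = wintypes.DWORD(0)
        if kernel32.WriteConsoleW(handle, text, units,
                                  ctypes.byref(written), None):
            return written.value
        # The handle is not a console (e.g. redirected to a file or pipe):
        # fall through to stream i/o.
    return sys.stdout.write(text)
```

Calling write_console("晉\n") in a real console window would go through the
console API and so not depend on the active codepage; whether the glyph is
displayed still depends on the console font.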

Cheers,

- Alf


> Another workaround is to use a real file:
> 
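> import sys
> # open a real file that encodes everything written to it as UTF-8
> f = open('afile.html', 'w', encoding='utf-8')
> # either pass the file to print() explicitly...
> print("晉", file=f)
> # ...or replace sys.stdout so every later print() goes to the file
> sys.stdout = f
> print("晉")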
> 
> or slightly better is to rewrap the buffer with io.TextIOWrapper:
> import sys, io
> sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
> print("晉")
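
Putting the points above together for the original HTTP case, here is a hedged
sketch, assuming a CGI-style script that writes its response to standard
output. The function names build_cgi_response and send are hypothetical. It
declares text/html rather than text/plain and writes encoded bytes to
sys.stdout.buffer, so the terminal or locale encoding never gets involved:

```python
import sys


def build_cgi_response(html, charset="utf-8"):
    """Return the header plus body as bytes, ready to write to the client."""
    # The header itself is plain ASCII; only the body uses the charset.
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    return header.encode("ascii") + html.encode(charset)


def send(html, out=None):
    """Write the encoded response to a binary stream (stdout by default)."""
    if out is None:
        out = sys.stdout.buffer
    out.write(build_cgi_response(html))
    out.flush()
```

A script would then call send("<p>晉</p>") once, instead of redefining print().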


