how can i write a hello world in chinese with python

MRAB google at mrabarnett.plus.com
Wed Dec 13 17:31:42 EST 2006


Dennis Lee Bieber wrote:
> On 12 Dec 2006 23:40:41 -0800, "kernel1983" <kernel1983 at gmail.com>
> declaimed the following in gmane.comp.python.general:
>
> > and I tried unicode and utf-8
> > I tried to both use unicode&utf-8 head just like "\xEF\xBB\xBF" and not
> > to use
> >
> 	"unicode" is a term covering many sins. "utf-8" is a specification
> for encoding elements of specific unicode characters using 8-bit
> elements (I believe by using certain codes x00 to x7F alone as "normal",
> and then x80 to xFF to represent an "escape" to higher [16-bit] element
> sets).
>
> 	"\xEF\xBB\xBF" is just a byte string with no identifier of what
> encoding is in use (unless the first one or two are supposed to be
> BOM)... In the "Windows: Western" character set, it is equivalent to
> small-i-diaeresis/right-guillemet/upside-down-? (ï»¿). In MS-DOS: Western
> Europe, those same bytes represent an
> acute-accent/double-down&left-box-drawing/solid-down&left.
>
> 	I've not done any unicode work (iso-latin-1, or subset thereof, has
> done for me). I also don't know Mac's, so I don't know if the windowing
> API has specific calls for Unicode data... But you probably have to
> encode or decode that bytestring into some compatible unicode
> representation.
>
When you save a text file as UTF-8 in Notepad.exe (Windows), it puts the
bytestring "\xEF\xBB\xBF" at the start to indicate that the file is UTF-8
and not ANSI (i.e. a legacy 8-bit code page). Those bytes are the BOM
character U+FEFF (bytes "\xFE\xFF" in big-endian UTF-16) encoded in UTF-8.
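
A minimal sketch of the above, assuming Python 3 (where str is Unicode and
bytes is a separate type) rather than the Python 2 of this thread. The
variable names are only illustrative. It checks that U+FEFF encodes to the
three bytes in question, and shows that the standard "utf-8-sig" codec drops
the BOM on decoding while plain "utf-8" keeps it as a leading character:

```python
import codecs

BOM = "\ufeff"  # the byte order mark as a Unicode code point

# The same code point in two encodings.
utf8_bom = BOM.encode("utf-8")         # b'\xef\xbb\xbf'
utf16be_bom = BOM.encode("utf-16-be")  # b'\xfe\xff'

# Roughly what Notepad writes for a UTF-8 file: the BOM, then the text.
text = "你好，世界"  # "Hello, world"
notepad_bytes = codecs.BOM_UTF8 + text.encode("utf-8")

# Plain "utf-8" keeps the BOM as the first character of the result;
# "utf-8-sig" strips it if it is present.
plain = notepad_bytes.decode("utf-8")
stripped = notepad_bytes.decode("utf-8-sig")

# Reading the BOM bytes with the wrong (Windows Western) code page gives
# the three stray characters described in the quoted message.
misread = utf8_bom.decode("cp1252")

print(utf8_bom)
print(utf16be_bom)
print(plain[0] == BOM, stripped == text)
```

If the file comes from Notepad, opening it with
open(path, encoding="utf-8-sig") should give the text without the BOM.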



