[I18n-sig] Reading UTF-16 Scripts

M.-A. Lemburg mal@lemburg.com
Tue, 11 Apr 2000 14:14:00 +0200


Doug Edmunds wrote:
> 
> python ver: 1.6a
> os: Win98
> 
> Are there any plans to allow
> python to be able to read scripts
> written entirely in UTF-16 format
> (such as those written by
> Win98's Wordpad program and saved
> as unicode text?)
> 
> Since each of these files begins
> with 'FFFE' it would seem to be
> not too difficult for python
> to recognize that format and convert
> the non-string context to 8bit, i.e.,
> p r i n t -> print.

As I understand it, Python scripts are supposed to
be ASCII (or maybe UTF-8).

Your proposal would only work if *all* strings were
Unicode in Python. There are currently two string types:
8-bit strings and 16-bit Unicode strings.
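
For illustration only, here is a rough sketch of the detection step
being asked about: check for the UTF-16 byte order mark and decode the
rest of the file. It is written against present-day Python 3 (an
assumption, not the 1.6a interpreter discussed here), and the helper
name decode_utf16_source is made up for this example. It does not make
the interpreter itself accept UTF-16 scripts.

```python
import codecs

def decode_utf16_source(raw):
    """Return the text of a script saved as UTF-16, or None if no BOM is found.

    Hypothetical helper for illustration; not part of Python itself.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops it.
        return raw.decode("utf-16")
    return None

# Simulate the bytes an editor writes when saving a script as "Unicode text".
raw = codecs.BOM_UTF16_LE + "print('hello')\n".encode("utf-16-le")
source = decode_utf16_source(raw)

# A plain 8-bit script has no BOM, so the helper leaves it alone.
plain = decode_utf16_source(b"print('hello')\n")
```

The decoded text could then be handed to compile() or written back out
in an encoding the interpreter accepts.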

> The advantage is that mixed language
> scripts (i.e English/Russian) can
> be written and saved unambiguously,
> not dependent upon selection
> of a particular 'font script' such as
> cp1251 or KOI8-r for Russian.
> 
> The motivation for getting away from
> these scripts (encodings, whatever)
> is to be able to write multiple languages
> in a single string.
> 
> This kind of scripting could be avoided:
> a = unicode ('Правда - газета', 'cp1251')
> print a.encode('cp1251')
> 
> and replaced with a simpler:
> print "In Russian, newspaper is ____; in Polish it is ______"
> 
> Notes:
> 1. Cyrillic fonts do not appear in IDLE (US English is base).
> 2. In PythonWin, even with a Cyrillic 'script' selected,
>    such as Courier New (Cyrillic), output appears in English
>    -- the 'script' aspect is being ignored.
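
To make the encoding point concrete, a small sketch (again assuming
present-day Python 3, where every str is Unicode; the sample strings
are my own): the same Russian text gives different bytes under cp1251
and KOI8-R, a string mixing Cyrillic and Polish letters cannot be
encoded in cp1251 at all, and UTF-16 round-trips both.

```python
text = "газета"

# The same text yields different bytes under the two 8-bit encodings.
cp1251_bytes = text.encode("cp1251")
koi8r_bytes = text.encode("koi8_r")
same_bytes = cp1251_bytes == koi8r_bytes

# Reading cp1251 bytes as KOI8-R succeeds but produces the wrong letters.
garbled = cp1251_bytes.decode("koi8_r")

# A string mixing Cyrillic with Polish letters has no cp1251 representation.
mixed = "Москва, Łódź"
try:
    mixed.encode("cp1251")
    cp1251_ok = True
except UnicodeEncodeError:
    cp1251_ok = False

# UTF-16 can carry the mixed string and round-trips it unchanged.
roundtrip = mixed.encode("utf-16").decode("utf-16")
```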

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/