[Python-Dev] Unicode Proposal: Version 0.4
M.-A. Lemburg
mal@lemburg.com
Fri, 12 Nov 1999 13:38:44 +0100
I've uploaded a new version of the proposal which incorporates
a lot of what has been discussed on the list.
Thanks to everybody who has helped so far. Note that I have extended
the list of references for those who want to join in but need
more background information.
The latest version of the proposal is available at:
http://starship.skyport.net/~lemburg/unicode-proposal.txt
Older versions are available as:
http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt
Some points of discussion (PODs) that are still open:
· support for line breaks (see
http://www.unicode.org/unicode/reports/tr13/ )
· support for case conversion:
Problems: string lengths can change because case mappings are not
always one-to-one (several characters may map to a single new one,
or one character to several), a capital letter starting a word can
differ from the one used in the middle of a word (title case vs.
upper case), and there are locale-dependent deviations from the
standard mappings.
· support for numbers, digits, whitespace, etc.
· support (or no support) for private code point areas
· should Unicode objects support %-formatting ?
One possibility would be to emulate this via strings and
<default encoding>:
s = '%s %i abcäöü' # a Latin-1 encoded string
t = (u, 3) # u is some Unicode object
# Convert Latin-1 s to a <default encoding> string
s1 = unicode(s,'latin-1').encode()
# The '%s' will now add u in <default encoding>
s2 = s1 % t
# Finally, convert the <default encoding> encoded string to Unicode
u1 = unicode(s2)
· specifying file wrappers:
Open issues: what to do with Python strings fed to the .write()
method (the wrapper may need to know their encoding), and whether
and when the .read() method should return Python strings rather
than Unicode objects.
Perhaps we need more than one type of wrapper here.
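To make the case-conversion problems listed above concrete, here is a
minimal sketch. It assumes the built-in str type of a present-day
Python 3 interpreter, which is not the API described in the proposal;
the variable names are only illustrative.

```python
# Assumption: Python 3 str methods, which apply the full Unicode case
# mappings and ignore the current locale.

german = "stra\u00dfe"             # 'straße', 6 characters
german_upper = german.upper()      # sharp s expands to 'SS'

dotted_i_lower = "\u0130".lower()  # LATIN CAPITAL LETTER I WITH DOT ABOVE

dz = "\u01c6"                      # LATIN SMALL LETTER DZ WITH CARON
dz_upper = dz.upper()              # U+01C4, used inside an all-caps word
dz_title = dz.title()              # U+01C5, used at the start of a word

plain_i_upper = "i".upper()        # always 'I', even under a Turkish locale

print(len(german), len(german_upper))
print(len("\u0130"), len(dotted_i_lower))
print(hex(ord(dz_upper)), hex(ord(dz_title)))
```

The first two prints show lengths changing in both directions of the
conversion, the third shows that title case and upper case can be
different code points. The last assignment illustrates the locale
issue: a locale-independent mapping cannot produce the dotted capital
that Turkish expects for 'i'.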
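As an illustration of what "support for numbers, digits, whitespace"
could cover, the following sketch queries a few character properties.
It assumes the unicodedata module and str methods of a present-day
Python 3 interpreter rather than anything specified in the proposal.

```python
import unicodedata

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
one_half = "\u00bd"      # VULGAR FRACTION ONE HALF
em_space = "\u2003"      # EM SPACE

# Decimal digits carry an integer value, whatever the script.
digit_value = unicodedata.decimal(arabic_three)
parsed = int("\u0661\u0662\u0663")  # Arabic-Indic digits one, two, three

# Numeric characters that are not digits still have a numeric value.
fraction_value = unicodedata.numeric(one_half)
half_is_digit = one_half.isdigit()
half_is_numeric = one_half.isnumeric()

# Whitespace is a character property too, not just ' ', '\t' and '\n'.
em_space_is_space = em_space.isspace()
```

The distinction between "digit" and "numeric" matters for any API
that parses numbers: the fraction has a value but cannot appear in an
integer literal.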
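One possible shape for such a file wrapper is sketched below. The
class UnicodeFileWrapper and its parameters file_encoding and
bytes_encoding are hypothetical names invented for this example, and
the code assumes present-day Python 3 (str for Unicode text, bytes
for encoded strings). It picks one answer to each open issue:
.write() refuses encoded strings unless their encoding was declared,
and .read() always returns Unicode text.

```python
import io


class UnicodeFileWrapper:
    """Hypothetical wrapper: Unicode text in memory, encoded bytes on the stream."""

    def __init__(self, stream, file_encoding, bytes_encoding=None):
        self.stream = stream                  # underlying binary stream
        self.file_encoding = file_encoding    # encoding used on the stream
        self.bytes_encoding = bytes_encoding  # encoding of byte strings passed to write()

    def write(self, data):
        if isinstance(data, bytes):
            # An encoded string is only usable if we know its encoding.
            if self.bytes_encoding is None:
                raise TypeError("byte string written without a declared encoding")
            data = data.decode(self.bytes_encoding)
        return self.stream.write(data.encode(self.file_encoding))

    def read(self):
        # Always hand back Unicode text, never the raw encoded bytes.
        return self.stream.read().decode(self.file_encoding)


buf = io.BytesIO()
f = UnicodeFileWrapper(buf, "utf-8", bytes_encoding="latin-1")
f.write("abc\u00e4")  # Unicode text
f.write(b"\xf6")      # Latin-1 encoded 'ö', recoded to UTF-8 on the way out
buf.seek(0)
text = f.read()
```

A second wrapper type could make the opposite choice and return
encoded strings from .read(), which is one reason more than one kind
of wrapper may be needed.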
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 49 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/