[issue4153] Unicode HOWTO up to date?

Ezio Melotti report at bugs.python.org
Thu Sep 1 10:04:12 CEST 2011


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

After the recent discussions on python-dev I went through the Unicode HOWTO and fixed a few things. I then found this issue, so I'm attaching the patch here.
The patch mostly addresses markup issues, but it also removes the uses of the term 'byte string'.
A few more things that should be done:
  * clarify some more terms (e.g. code points, code units, characters, and possibly scalar values);
  * mention the differences between narrow and wide builds, including:
    - a discussion about the UCS-2/UTF-16 implementation of narrow builds;
    - something about surrogates and surrogate pairs;
    - effects of slicing and indexing on narrow builds;
    - functions/methods that (don't) accept non-BMP chars on narrow builds;
  * something about Unicode support in the re module (this can probably wait until after the 'regex' inclusion).
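
As a rough illustration of the surrogate and indexing items above (a sketch only, not part of the attached patch; the helper names utf16_code_units and surrogate_pair are made up for this example), the following shows how a non-BMP character maps to two UTF-16 code units, which is what a narrow build stores and indexes:

```python
import sys


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    # 'surrogatepass' lets lone surrogates through instead of raising.
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Split a non-BMP code point into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary (non-BMP) code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


s = "a\U0001F600"  # one BMP character followed by one non-BMP character
units = utf16_code_units(s)

# sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF otherwise.
print("narrow build:", sys.maxunicode == 0xFFFF)
# len(s) counts code units on a narrow build and code points elsewhere,
# so the two numbers only agree on a narrow build.
print(len(s), len(units))
print([hex(u) for u in units])
print([hex(u) for u in surrogate_pair(0x1F600)])
```

On a narrow build, s[1] would be the high surrogate alone rather than the whole character, which is the slicing/indexing effect the HOWTO should explain.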

Also, the codecs documentation has a section about Unicode and encodings that could be moved to the HOWTO.

----------
assignee: georg.brandl -> 
resolution: fixed -> 
stage:  -> commit review
versions: +Python 3.3
Added file: http://bugs.python.org/file23081/issue4153-2.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4153>
_______________________________________

