[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Neil Hodgson nyamatongwe at gmail.com
Tue Oct 25 01:13:51 CEST 2005


M.-A. Lemburg:

> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.
> ...
>     next_<indextype>(u, index) -> integer
>
>         Returns the Unicode object index for the start of the next
>         <indextype> found after u[index] or -1 in case no next element
>         of this type exists.

   Should entity breakage be further discouraged by returning a slice
here rather than an object index?

   Something like:

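# Here i is a slice covering one whole grapheme, not an integer index.
i = first_grapheme(u)
x = 0
while x < width and u[i] != "\n":
    # draw() is assumed to return the new (x, y) position; y comes from the caller.
    x, _ = draw(u[i], (x, y))
    i = next_grapheme(u, i)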

   Neil

