Flexible string representation, unicode, typography, ...

Ian Kelly ian.g.kelly at gmail.com
Sun Aug 26 11:40:13 EDT 2012


On Sun, Aug 26, 2012 at 5:49 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>> Sorry, you do not get it.
>>
>> The rune is an alias for int32. A sequence of runes is a sequence of
>> int32's.
>
> It certainly is not. Runes are variable-width. Here, for example, are a
> number of Go functions which return a single rune and its width in bytes:
>
> http://golang.org/pkg/unicode/utf8/

I think the documentation for those functions is simply badly worded.
The "width in bytes" they return is not the width of the rune (which, as
jmf notes, is simply an alias for int32 that stores a single code
point).  It is the UTF-8 width of the character, i.e. the number of
UTF-8 bytes the function "consumed" while decoding, presumably so that
the caller can then reslice the data past those bytes.
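
To make the distinction concrete, here is a rough sketch of how I would
expect a caller to use utf8.DecodeRune.  The helper name decodeAll and
the sample string are my own inventions for illustration; only
utf8.DecodeRune itself comes from the package linked above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is a hypothetical helper (not part of the Go library).
// It decodes one rune at a time from a UTF-8 byte slice and reslices
// past the bytes that each call consumed.
func decodeAll(b []byte) (runes []rune, widths []int) {
	for len(b) > 0 {
		// size is the number of UTF-8 bytes consumed, not the size of r.
		r, size := utf8.DecodeRune(b)
		runes = append(runes, r)
		widths = append(widths, size)
		b = b[size:] // drop the bytes just decoded
	}
	return runes, widths
}

func main() {
	runes, widths := decodeAll([]byte("a€"))
	for i, r := range runes {
		fmt.Printf("%U consumed %d byte(s)\n", r, widths[i])
	}

	// This assignment compiles without a conversion because rune is an
	// alias for int32, whatever the UTF-8 width of the character was.
	var asInt32 int32 = runes[1]
	fmt.Println(asInt32)
}
```

Both decoded values are plain int32s; only the number of bytes consumed
from the input differs (1 for "a", 3 for "€").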


