Flexible string representation, unicode, typography, ...

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Aug 26 16:13:21 EDT 2012


On Sun, 26 Aug 2012 09:40:13 -0600, Ian Kelly wrote:

> I think the documentation for those functions is simply badly worded.
> The "width in bytes" it returns is not the width of the rune (which as
> jmf notes is simply an alias for int32 that stores a single code point).

Is this documented somewhere?

I can't tell you how long I spent unsuccessfully googling for variations 
on "go language runes", which unsurprisingly mostly came back with pages 
about Germanic runes and elf runes but not Go runes. I read the golang 
FAQs, which mentioned Unicode *once* and runes not at all. Obviously Go 
language programmers don't care much about Unicode.


> It means the UTF-8 width of the character, i.e. the number of UTF-8
> bytes the function "consumed", presumably so that the caller can then
> reslice the data with that many bytes fewer.

That makes sense, given the lousy string implementation and API they're 
working with.
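
For what it's worth, here is a rough Python sketch of the "decode one 
character, report how many bytes were consumed, reslice" pattern Ian 
describes. It is only an illustration of the idea, not Go's actual 
implementation; the names decode_first and iter_runes are made up for 
this example.

```python
def decode_first(data: bytes):
    """Return (code_point, width) for the first UTF-8 character in data.

    width is the number of bytes consumed, not the size of the result.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    # The lead byte alone tells us how long the encoded sequence is.
    if first < 0x80:
        width = 1
    elif 0xC2 <= first <= 0xDF:
        width = 2
    elif 0xE0 <= first <= 0xEF:
        width = 3
    elif 0xF0 <= first <= 0xF4:
        width = 4
    else:
        raise ValueError("invalid UTF-8 lead byte: %#x" % first)
    # Let the built-in codec validate the continuation bytes; it raises
    # UnicodeDecodeError (a ValueError) if the sequence is malformed or cut short.
    char = data[:width].decode("utf-8")
    return ord(char), width


def iter_runes(data: bytes):
    """Collect (code_point, width) pairs by repeatedly reslicing the input."""
    result = []
    while data:
        code_point, width = decode_first(data)
        result.append((code_point, width))
        data = data[width:]  # drop the bytes just consumed
    return result
```

The caller never needs to know the encoded length in advance; it just 
keeps slicing off however many bytes the previous call reported.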

I note that not all 32-bit ints are valid code points: Unicode code points 
run only from U+0000 to U+10FFFF. I suppose I can see sense in having rune 
be a 32-bit integer value limited to those valid code points. (But, dammit, 
why not call it a code point?) But if rune is merely an alias for int32, 
why not just call it int32?
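
To make the distinction concrete, here is a small Python sketch of the 
check a "restricted" rune type would have to enforce. The function names 
are hypothetical, and this says nothing about what Go itself does with 
out-of-range values.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode code space, U+0000 to U+10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """True if n is a code point other than a surrogate (U+D800 to U+DFFF).

    Surrogates are code points, but they cannot be encoded in well-formed UTF-8.
    """
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)
```

A signed 32-bit integer can hold about 4.3 billion distinct values, of 
which only 1,114,112 pass is_code_point, so a plain int32 alias leaves 
nearly all of its range meaningless as text.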


-- 
Steven


