[Python-Dev] len(chr(i)) = 2?

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Nov 25 03:35:50 CET 2010


On 24/11/10 22:03, Stephen J. Turnbull wrote:
> But
> if you actually need to remember positions, or regions, to jump to
> later or to communicate to other code that manipulates them, doing
> this stuff the straightforward way (just copying the whole iterator
> object to hang on to its state) becomes expensive.

If the internal representation of a text pointer (I won't call it
an iterator because that means something else in Python) is a byte
offset or something similar, it shouldn't take up any more space
than a Python int, which is what you'd be using anyway if you
represented text positions by grapheme indexes or whatever.

If you want the text pointer to also remember which string it
points into, it'll be a bit bigger, but again, no bigger than
you would need to get the same functionality using a grapheme
index plus a reference to the original string. Probably smaller,
because it would all be encapsulated in one object.

So I don't really see what you're arguing for here. How do
*you* think positions in unicode strings should be represented?

-- 
Greg


More information about the Python-Dev mailing list