PEP 262: Unicode Indexing Helper Module

M.-A. Lemburg mal at lemburg.com
Fri Jul 13 09:44:55 EDT 2001


> Paul Moore (in private mail):
>
> You have methods for finding
> the start and end of various <indextypes>, but you don't have a method for
> finding the length of an <indextype>. In the case of words (which is the one
> I understand :-), the length of a word is not the same as the difference
> between the starts of consecutive words - the intervening whitespace should
> be excluded (at least for some applications). I would suggest
> 
> length_<indextype>(u, index) -> integer
> Returns the length in Unicode objects of the <indextype> found at u[index]
> or -1 in case u[index] is not in an element of this type (for example, in
> the whitespace between words). [XXX Should this be the number of Unicode
> objects between index and the end of the element, or should it be the length
> from start to end even if you are in the middle?]
> 
> or maybe better
> 
> nextend_<indextype>(u, index) -> integer
> Returns the Unicode object index for the end of the next <indextype> found
> after u[index] or -1 in case no next element of this type exists.
> 
> [But that runs into issues when you are in a word: if index is not the
> first Unicode object, nextend is the end of *this* element, whereas next is
> the start of the *next* element. I think I'm starting to show my
> ignorance...]
> 
> Even though I suspect my suggested methods are too simplistic, I'd suggest
> at least a comment in the PEP on how to work out the length of the element
> you're in (or why it's hard, and you'd never want to do it :-)...
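To make the two proposals concrete, here is a minimal Python sketch for the word case only. The names length_word, nextend_word and the helper word_bounds are hypothetical; they just instantiate Paul's <indextype> pattern. A "word" is assumed to be a maximal run of non-whitespace characters, which is not necessarily how the PEP would define word boundaries. length_word answers the [XXX] question one way: it returns the full start-to-end length even when index falls in the middle of a word.

```python
def _is_word_char(ch):
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    return not ch.isspace()


def word_bounds(u, index):
    """Return (start, end) of the word containing u[index], or None.

    end is exclusive, so u[start:end] is the word itself.
    """
    if not (0 <= index < len(u)) or not _is_word_char(u[index]):
        return None
    start = index
    while start > 0 and _is_word_char(u[start - 1]):
        start -= 1
    end = index + 1
    while end < len(u) and _is_word_char(u[end]):
        end += 1
    return start, end


def length_word(u, index):
    """Hypothetical length_<indextype> for words.

    Returns the full length of the word at u[index] (start to end, even
    if index is in the middle of it), or -1 if u[index] is whitespace.
    """
    bounds = word_bounds(u, index)
    if bounds is None:
        return -1
    start, end = bounds
    return end - start


def nextend_word(u, index):
    """Hypothetical nextend_<indextype> for words.

    Returns the (exclusive) end of the word containing u[index], or of
    the next word after it if u[index] is whitespace; -1 if none exists.
    """
    if index < 0:
        return -1
    i = index
    # Skip any whitespace until we reach the next word character.
    while i < len(u) and not _is_word_char(u[i]):
        i += 1
    if i >= len(u):
        return -1
    return word_bounds(u, i)[1]


u = "hello  world"
print(length_word(u, 2))   # 5
print(length_word(u, 5))   # -1 (index 5 is whitespace)
print(nextend_word(u, 2))  # 5 (end of the word index 2 is already in)
print(nextend_word(u, 5))  # 12 (end of "world")
```

Note that nextend_word(u, 2) returns 5, the end of the word index 2 already sits in, whereas a function returning the start of the *next* word would give 7 for the same input. That is the asymmetry Paul points out.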

The two suggested APIs probe into the Unicode object. I think it would
be more useful to return a slice object representing the <indextype>
element found at the given index in u, e.g.

<indextype>_slice(u, index) -> slice object or None

    Returns the slice pointing to the <indextype> element found in 
    u at the given index or None in case no such element can be found
    at that position.

Hmm, I wonder whether slice objects can be "applied" to sequences
somehow...
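A minimal sketch of the slice-returning variant, again for words only. word_slice is a hypothetical name following the <indextype>_slice pattern, and a word is again assumed to be a maximal run of non-whitespace characters. In present-day Python (3.x) a slice object can be applied to a sequence simply by indexing with it, so u[s] extracts the element, and the length Paul asked for falls out as s.stop - s.start.

```python
from typing import Optional


def _is_word_char(ch: str) -> bool:
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    return not ch.isspace()


def word_slice(u: str, index: int) -> Optional[slice]:
    """Hypothetical <indextype>_slice for words.

    Returns a slice covering the word found in u at the given index,
    or None if u[index] is out of range or is whitespace.
    """
    if not (0 <= index < len(u)) or not _is_word_char(u[index]):
        return None
    start = index
    while start > 0 and _is_word_char(u[start - 1]):
        start -= 1
    end = index + 1
    while end < len(u) and _is_word_char(u[end]):
        end += 1
    return slice(start, end)


u = "hello  world"
s = word_slice(u, 8)
print(s)                  # slice(7, 12, None)
print(u[s])               # world  (the slice object applied directly)
print(s.stop - s.start)   # 5, the word's length
print(word_slice(u, 5))   # None (index 5 is whitespace)
```

Returning a slice keeps the start and end together, so callers can derive the length, the next position (s.stop), or the element text without any further probing of u.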

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
-------------- next part --------------
An embedded message was scrubbed...
From: "Moore, Paul" <Paul.Moore at atosorigin.com>
Subject: PEP 262: Unicode Indexing Helper Module
Date: Fri, 13 Jul 2001 13:26:52 +0100
Size: 3184
URL: <http://mail.python.org/pipermail/python-list/attachments/20010713/2d065ec4/attachment.mht>

