[Python-Dev] [NPERS] Re: a feature i'd like to see in python #2: indexing of match objects

Thu Dec 7 03:49:04 CET 2006

"Michael Urman" <murman at gmail.com> wrote:
> On 12/6/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > *We* may not be confused, but it's not about us (I'm personally happy to
> > use the .group() interface); it's about relative newbies who, generally
> > speaking, desire/need consistency (see [1] for a paper showing that
> > certain kinds of inconsistencies are bad - at least in terms of grading
> > - for new computer science students). Being inconsistent because it's
> > *easy* is what I consider silly. We've got the brains, we've got the
> > time, if we want slicing, let's produce a match object. If we don't want
> > slicing, or if producing a slice would produce a semantically
> > questionable state, then let's not do it.
> 
> The idea that slicing a match object should produce a match object
> sounds like a foolish consistency to me. It's a useful invariant of
> lists that slicing them returns lists. It's not a useful invariant of
> sequences in general. This is similar to how it's a useful invariant
> that indexing a string returns a string; indexing a list generally
> does not return a list.

The string and unicode case for S[i] is special.  That has already been
discussed ad nauseam.  As for seq[i:j] returning an object of the same
type, if it were "foolish consistency", then why is it consistent across
literally the entire standard library (except for buffer), and (in my
experience) many third-party libraries?
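
A minimal sketch of that point, assuming Python 3 built-ins (where
buffer no longer exists); the variable names are illustrative only:

```python
# Slicing a built-in sequence yields the same type; indexing generally does not.
samples = [[1, 2, 3, 4], (1, 2, 3, 4), "abcd", b"abcd", bytearray(b"abcd")]

# For each sample: does seq[i:j] have the same type as seq?
slice_keeps_type = {type(s).__name__: type(s[1:3]) is type(s) for s in samples}

# For each sample: what type does seq[i] have?
index_types = {type(s).__name__: type(s[0]).__name__ for s in samples}

print(slice_keeps_type)
print(index_types)
```

Only str returns its own type from indexing here, which is the special
case mentioned above; slicing preserves the type in every sample.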

> I only found a couple __getslice__ definitions in a quick perusal of
> stdlib. ElementTree.py's _ElementInterface class returns a slice from
> a contained list; whereas sre_parse.py's SubPattern returns another
> SubPattern. UserList and UserString also define __getslice__ but I
> don't consider them representative of the standards of non-string/list
> classes.
> 
> As an aside, if you're trying to show that inconsistencies in a
> language are bad by referencing a paper showing that people who used
> consistent (if incorrect) mental models scored better than those who
> did not, you may have to explain further; I don't see the connection.

The idea is that those who were consistent in their behavior, regardless
of whether they were incorrect, can be trained to do things the correct
way.  That is to say, people who understand that X = Y will behave
consistently regardless of context tend to do better than those who
believe that it will do different things.  Introducing inconsistencies
because it is *easy* for the writer of an API makes it more difficult
to learn said API.

In this context, the assumption that one makes when slicing in Python
(as stated in this thread by someone whose name I can't remember) is:
X[0:0] == type(X)().  That works _everywhere_ in Python where slices are
allowed (except for buffers, which are generally rarely used except by
certain crazies (like myself)). By not making it true here, we would be
adding an exception to the rule.
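
A quick check of the X[0:0] == type(X)() assumption, sketched for a few
Python 3 built-in sequence types (the selection of types is an
assumption of this example, not an exhaustive survey):

```python
# For each built-in sequence, compare the empty slice with a fresh empty instance.
empties_ok = {}
for value in ([1, 2, 3], (1, 2, 3), "abc", b"abc", bytearray(b"abc")):
    empties_ok[type(value).__name__] = value[0:0] == type(value)()

print(empties_ok)
```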

Special cases aren't special enough to break the rules.

I'm not going to go all gloom and doom on you; maybe no one will ever
have a situation where it is necessary. But implementing "slice of match
returns a match" isn't impossible, whether it is done via subclass or
by direct manipulation of the match struct.  And not implementing the
functionality because we are *lazy* isn't a terribly good excuse to give
someone if/when they run into this.
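
To make the idea concrete, here is a rough pure-Python sketch of a
match-like sequence whose slices have the same type as the original.
MatchGroups and from_match are hypothetical names invented for this
example; this is a wrapper around the groups, not the subclass or
struct-level change discussed above, and not how the re module behaves.

```python
import re


class MatchGroups:
    """Hypothetical sequence view over the groups of a regex match."""

    def __init__(self, groups=()):
        # Stored as a tuple so the view is immutable, like a match object.
        self._groups = tuple(groups)

    @classmethod
    def from_match(cls, match):
        # Position 0 is the whole match, followed by the numbered groups.
        return cls((match.group(0),) + match.groups())

    def __len__(self):
        return len(self._groups)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing returns the same type, so X[0:0] == type(X)() holds.
            return type(self)(self._groups[index])
        return self._groups[index]

    def __eq__(self, other):
        if not isinstance(other, MatchGroups):
            return NotImplemented
        return self._groups == other._groups

    def __repr__(self):
        return f"{type(self).__name__}({self._groups!r})"


m = re.match(r"(\w+)@(\w+)", "user@example")
g = MatchGroups.from_match(m)
print(g[1])      # a single group, as a string
print(g[1:])     # a MatchGroups holding the remaining groups
print(g[0:0] == type(g)())
```

Indexing yields the group text while slicing yields another
MatchGroups, which keeps the empty-slice rule intact.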

 - Josiah