[Python-Dev] Remove str.find in 3.0?

Sun Aug 28 14:05:54 CEST 2005

[Guido]
> Another observation: despite the derogatory remarks about regular
> expressions, they have one thing going for them: they provide a higher
> level of abstraction for string parsing, which this is all about.
> (They are higher level in that you don't have to be counting
> characters, which is about the lowest-level activity in programming --
> only counting bytes is lower!)
> 
> Maybe if we had a *good* way of specifying string parsing we wouldn't
> be needing to call find() or index() so much at all! (A good example
> is the code that Raymond lifted from ConfigParser: a semicolon
> preceded by whitespace starts a comment, other semicolons don't.
> Surely there ought to be a better way to write that.)

A higher level abstraction is surely the way to go.

I looked over the use cases for find and index.  Aside from cases which
are now covered by the "in" operator, it looks like you almost always
want the index to support a subsequent partition of the string.
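
For illustration, the find-then-slice idiom tends to look something like
the sketch below.  The helper name split_scheme is made up for this
example and is not taken from the library; the point is only that the
caller has to track both the index and the separator length by hand.

```python
def split_scheme(url):
    """Hypothetical helper: split a URL at '://' using find() and slices."""
    sep = '://'
    i = url.find(sep)
    if i == -1:
        # Not found: the caller must special-case the -1 sentinel.
        return url, ''
    # Found: slice around the separator, skipping len(sep) characters.
    return url[:i], url[i + len(sep):]

print(split_scheme('http://www.python.org'))
```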

That suggests that we need a variant of split() that has been customized
for typical find/index use cases.  Perhaps introduce a new pair of
methods, partition() and rpartition() which work like this:

    >>> s = 'http://www.python.org'
    >>> s.partition('://')
    ('http', '://', 'www.python.org')
    >>> s.rpartition('.')
    ('http://www.python', '.', 'org')
    >>> s.partition('?')
    ('http://www.python.org', '', '')
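
A rough pure-Python sketch of the behavior shown above, written in terms
of find().  The function name partition_sketch is hypothetical, and the
choice of TypeError and ValueError for bad separators is an assumption;
the proposal only says that those calls do not succeed.

```python
def partition_sketch(s, sep):
    """Return (head, sep, tail) split at the first occurrence of sep."""
    if not isinstance(sep, str):
        # Assumed exception type for a non-string separator.
        raise TypeError('separator must be a string')
    if not sep:
        # Assumed exception type for an empty separator.
        raise ValueError('empty separator')
    i = s.find(sep)
    if i == -1:
        # Not found: the whole string is the head, the rest is empty.
        return s, '', ''
    return s[:i], sep, s[i + len(sep):]

print(partition_sketch('http://www.python.org', '://'))
```

An rpartition() counterpart could presumably be written the same way on
top of rfind().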

The idea is still preliminary and I have only applied it to a handful of
the library's find() and index() examples.  Here are some of the design
considerations:

* The function always succeeds unless the separator argument is not a
string type or is an empty string.  So, a typical call doesn't have to
be wrapped in a try-suite for normal usage.

* The split invariant is:   s == ''.join(s.partition(t))

* The result of the partition is always a three element tuple.  This
allows the results to be unpacked directly:

   head, sep, tail = s.partition(t)

* The use cases for find() indicate a need both to test for the
presence of the split element and then to make a slice at that point.
If we used a contains test for the first step, we could end up having to
search the string twice (once for detection and once for splitting).
However, by providing the middle element of the result tuple, we can
determine found or not-found without an additional search.  Accordingly,
the middle element has a nice Boolean interpretation with '' for
not-found and a non-empty string meaning found.  Given
(a,b,c)=s.partition(p), the following invariant holds:

   b == '' or b is p

* Returning the left, center, and right portions of the split supports a
simple programming pattern for repeated partitions:

   while s:
       head, part, s = s.partition(t)
       . . .
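
The considerations above can be exercised with a short sketch.  It
assumes an interpreter that provides str.partition() with the semantics
described here; the helper names check_invariants and fields are made up
for the example.

```python
def check_invariants(s, t):
    """Check the split invariants and report whether t was found."""
    head, sep, tail = s.partition(t)
    # Joining the three pieces always reproduces the original string.
    assert s == ''.join((head, sep, tail))
    # The middle element is either empty (not found) or equal to t.
    assert sep == '' or sep == t
    # The middle element doubles as the found/not-found flag.
    return bool(sep)

def fields(s, t):
    """Collect the pieces of s between occurrences of t by repeated partition."""
    result = []
    while s:
        head, sep, s = s.partition(t)
        result.append(head)
    return result

print(check_invariants('http://www.python.org', '://'))
print(fields('a,b,c', ','))
```

The loop terminates because a failed search leaves the tail empty, so no
separate not-found branch is needed.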

Of course, if this idea survives the day, then I'll meet my own
requirements and write a context diff against the standard library.  That
ought to give a good indication of how well the new methods meet
existing needs and whether the resulting code is better, cleaner,
clearer, faster, etc.

Raymond