"in" operator for strings

Fri Feb 2 08:02:14 EST 2001

"Cliff Crawford" <cjc26 at nospam.cornell.edu> wrote in message
news:QPee6.1195$o91.132115 at typhoon.nyroc.rr.com...
> * Alex Martelli <aleaxit at yahoo.com> wrote:
    [snip]
> | of this 'obvious' generalization, special-casing strings would
> | surely not be warranted).
>
> Strings seem to be special-cased already; for example, you don't have to
> type
>
> >>> print ''.join(['H','e','l','l','o',',',' ','w','o','r','l','d','!'])
>
> if you don't want to <wink>.

Literals have different syntax sugar for various kinds of sequences
(tuples, lists, strings, Unicode strings; that's it, since others,
such as array.array instances, are built without any special syntax
sugar).  That is vastly different from having one operator, with
identical syntax, take on completely different semantics for one
sequence type (or two?) versus all others; THAT is a recipe for
disaster and confusion.
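To make the "one operator, one meaning" point concrete, here is a minimal Python 3 sketch.  The helper seq_contains is a hypothetical name, not part of any library: it defines membership purely by iterating and comparing, and the checks confirm that 'in' agrees with it for sequence types that have literal sugar (list, tuple) and for one that has none (array.array).  For strings the sketch deliberately sticks to one-character left operands, the case this thread is about.

```python
import array

def seq_contains(seq, x):
    """Hypothetical helper: element membership defined only through
    iteration, i.e. True iff some item of seq compares equal to x."""
    return any(item == x for item in seq)

# Sequence types with literal syntax sugar...
with_sugar = [[1, 2, 3], (1, 2, 3)]
# ...and one built without any: array.array has no literal form.
without_sugar = [array.array('i', [1, 2, 3])]

for seq in with_sugar + without_sugar:
    # One operator, one meaning: 'in' asks "is x an element of seq?"
    assert (2 in seq) == seq_contains(seq, 2)
    assert (5 in seq) == seq_contains(seq, 5)

# Strings, with a one-character left operand, agree as well.
assert ('l' in 'hello') == seq_contains('hello', 'l')
assert ('z' in 'hello') == seq_contains('hello', 'z')
```

How a value is spelled as a literal has no bearing on what 'in' means for it; that is the distinction drawn above.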

> Anyway, I think having 'in' look for substrings, when used with strings,
> is more useful than its current behavior.

And maybe, since hardly anybody ever does modulo operations on
complex numbers, having '%' mean something utterly different
for them than for any other kind of number would be "more useful";
and somebody might argue that it's OK, because complex numbers
"seem to be special-cased already" (thereby horribly confusing
issues of syntax sugar used for literals with issues of uniform
operator semantics across various kinds of numbers).

And wouldn't it be "more useful" to have "foo"() look for a
function NAMED foo and call it, rather than complain about a
"call of non-function"?

Etc., etc.  This way Per^H^H^H madness lies.  Building a language
out of a collection of "more useful" (the usual euphemism for
such unspeakable horrors is "convenient") irregularities, ad
hoc "solutions", special cases, and second-guessing of the coder's
intent... ***thanks, but NO, THANKS***.  Python's *regularity and
simplicity* are its key strengths.

Python might perfectly well have decided that strings do not
expose a sequence interface, but rather are 'atoms' (one would
then explicitly call some suitable method, returning a list or
tuple of the string's characters, on those relatively rare
occasions where treating a string as a sequence is needed).  That
would have some pluses and some minuses, breaking roughly
even overall IMHO.  But strings are sequences, and I see
little prospect of that changing (short of the mythical
"Python 3000", and I don't think such a change will in fact
happen even then).  Given that, having the 'in' operator mean
something utterly different for strings than it does for all
other sequences would be nothing short of a disaster.  (I'd
_almost_ rule this out as a possibility, except that we have
one historical precedent: if somebody on the core development
team feels strongly about it and writes a patch, and the BDFL
unexpectedly agrees, then a horrid, unmitigated wart CAN be
added to Python.  It's happened once, and now the team spirit
of the core developers turns them into apologists for the
wart; oh well, on past performance we can expect about one
such occurrence per decade.)
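Because strings are sequences, generic code written only against the sequence protocol runs on them unchanged.  A minimal Python 3 sketch: describe is a hypothetical helper name, and the built-ins list() and tuple() stand in for the "suitable method" an 'atoms' design would have needed (the post names no such method).

```python
def describe(seq):
    """Hypothetical generic helper: uses only the sequence protocol
    (len, indexing, slicing, iteration), so any sequence type works."""
    return len(seq), seq[0], seq[-1], seq[::-1], [x for x in seq]

# A tuple and a string go through exactly the same code path.
print(describe((1, 2, 3)))  # (3, 1, 3, (3, 2, 1), [1, 2, 3])
print(describe("abc"))      # (3, 'a', 'c', 'cba', ['a', 'b', 'c'])

# The explicit "give me the characters" step that the 'atoms' design
# would require is what list() and tuple() already do for any iterable.
assert list("abc") == ['a', 'b', 'c']
assert tuple("abc") == ('a', 'b', 'c')
```

Under the 'atoms' alternative, describe("abc") would fail, and callers would have to convert explicitly first.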


> Again, I don't think this would be a problem if substring 'in' only
> worked with strings and not all sequences, because I don't think
> "for x in y", where y is a string, is a common construction.  In fact,

It's quite frequent in my code (mostly with strings built by
struct, or read from binary files, etc., which represent "an
array of bytes" rather than a 'veritable' string in any real
sense).  If strings were atomic, I would of course turn them
into buffers for such purposes (and buffers should probably
be used in many places where strings are used now).  But since
strings are sequences, I expect them to behave just like any
other immutable sequence and, in particular, to keep a loose
parallel between 'if x in s' and 'for x in s'.  E.g., for
any sequences a and b of the same type:

def inboth1(a, b):
    return [x for x in a if x in b]

def inboth2(a, b):
    return [x for x in b if x in a]

switching the roles of a and b may affect the ordering and
repetitions in the result sequence, but NOT its membership
(which distinct elements it contains).  That is a handy and
useful axiom: it allows certain refactorings to be applied
without needing much global context to check their
correctness, which is, of course, one of the greatest boons
of such axioms.
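Both the axiom and the 'for'/'if' parallel can be checked mechanically.  A minimal Python 3 sketch: same_membership and for_in_parallel are hypothetical helper names, and comparing results via set() assumes hashable elements.  Note that under Python 3, struct.pack returns bytes, and iterating over bytes yields ints; in the Python of this thread it returned a plain string of one-character strings.  The property holds either way, since every element that iteration produces is a valid left operand for 'in'.

```python
import struct

def inboth1(a, b):
    # Elements of a (in a's order, with a's repetitions) that occur in b.
    return [x for x in a if x in b]

def inboth2(a, b):
    # Same question with the roles of a and b swapped.
    return [x for x in b if x in a]

def same_membership(a, b):
    """The axiom: swapping a and b may change the order and repetitions
    of the result, but not which distinct elements it contains.
    (Assumes hashable elements, so set() can be used to compare.)"""
    return set(inboth1(a, b)) == set(inboth2(a, b))

def for_in_parallel(s):
    """The loose parallel: everything 'for x in s' yields passes 'x in s'."""
    return all(x in s for x in s)

# Lists: order and repetitions differ, membership does not.
print(inboth1([1, 2, 2, 3], [3, 2, 5]))  # [2, 2, 3]
print(inboth2([1, 2, 2, 3], [3, 2, 5]))  # [3, 2]

# A "string as an array of bytes", built with struct.
packed_a = struct.pack('<4B', 1, 2, 2, 3)
packed_b = struct.pack('<2B', 3, 9)

for a, b in [([1, 2, 2, 3], [3, 2, 5]),
             ("hello", "world"),
             (packed_a, packed_b)]:
    assert same_membership(a, b)
    assert for_in_parallel(a) and for_in_parallel(b)
```

A refactoring that swaps the loop and test roles in code like inboth1 is safe exactly as long as such checks hold for every sequence type involved.  That is why a single type with a different meaning for 'in' would undermine it.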

> This is probably the best solution, since I'm sure no one will go for
> yet another change to the language <0.3wink>.

The language keeps being changed (and there are grumblings
whenever that happens, of course).  A change to make strings
not sequences would probably be too pervasive to consider,
breaking zillions of lines of working code.  A change to
break the uniformity in the behavior of sequences of various
types is unfortunately more conceivable, but it would be a
very, very bad idea if it ever happened.

Alex