'<char> in <string>' works, why doesn't '<string> in <string>'?

Magnus Lie Hetland mlh at vier.idi.ntnu.no
Mon Mar 11 22:44:13 EST 2002


In article <4abd9ce7.0203101059.444f1ec1 at posting.google.com>, damien
morton wrote:
[snip]
>> No.  That is NOT a good reason to change a language in an inconsistent way.
>
>The 'in' operator is now a verb that can be defined to mean pretty
>much any function with two arguments.

Well, yes, it could be made to print out both its arguments, for
instance, but why would that be a good idea? It is used (in the main
language and standard libraries) as a container membership operator.
Using it as something else is possible, but seems like a strange thing
to do -- and to suggest such a change in parts of the language itself
is even stranger.

>It can't be a set membership operator because Python doesn't have sets.

Yet. There is a PEP on the way, and the membership operator will
certainly function in the expected way there as well.

> The issue at stake is that its meaning should be well understood.

Yes.

> Strings are like sequences in that their individual elements
> (characters ONLY) can be accessed as a sequence, but there are
> important differences in the kinds of things people want to do with
> strings.

What do you mean? In Python (and all programming languages I can think
of at the moment) strings are sequences. I think "a sequence of
characters" is a reasonable model of a string (since the word "string"
means "(1) a series of things arranged in or as if in a line, (2) a
sequence of like items"). It seems that you are suggesting that
strings are to be modelled as "a set of substrings" or something.

>From the perspective of an English speaker, the string 'fox' is
>IN the string 'the quick brown fox'.

No it isn't. It's in the *phrase* 'the quick brown fox', which isn't a
sequence of characters, but a sequence of *words*. What you're looking
for is

  'fox' in 'the quick brown fox'.split()

Or, if you insist that 'ox' is IN the string 'fox' (with 'in' meaning
'is a part of'), wouldn't you also say that the sequence 3, 4, 5 can
be found IN the sequence 1, 2, 3, 4, 5, 6? But would you say that this
interpretation of the word 'in' should be used in Python? That would
make things very ambiguous:

>>> (3, 4, 5) in (1, 2, 3, 4, 5, 6)
1
>>> (3, 4, 5) in (1, 2, (3, 4, 5), 6)
1
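
For comparison, here is a sketch of the two readings side by side:
plain element membership, and the "is a part of" reading written as an
explicit helper. The name contains_run is hypothetical, made up for
this illustration; nothing like it exists in the language.

```python
def contains_run(needle, haystack):
    """Return True if needle occurs as a contiguous run inside haystack.

    Hypothetical helper for illustration only; not part of Python.
    """
    n = len(needle)
    # Slide a window of length n over haystack and compare each slice.
    for i in range(len(haystack) - n + 1):
        if tuple(haystack[i:i + n]) == tuple(needle):
            return True
    return False


flat = (1, 2, 3, 4, 5, 6)
nested = (1, 2, (3, 4, 5), 6)

# Element membership: is (3, 4, 5) one of the items?
member_flat = (3, 4, 5) in flat        # false: no item equals (3, 4, 5)
member_nested = (3, 4, 5) in nested    # true: the third item is (3, 4, 5)

# "Is a part of": does the run 3, 4, 5 appear consecutively?
run_flat = contains_run((3, 4, 5), flat)      # true
run_nested = contains_run((3, 4, 5), nested)  # false
```

The two questions give opposite answers on the same data, which is
why a single operator cannot sensibly answer both.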

> Under the covers, the IN operator is defined as the __contains__
> function.

Method, actually.
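
A minimal sketch of what that means in practice. The class Basket is
a hypothetical container invented for this example; the only real
piece is the __contains__ hook itself.

```python
class Basket:
    """Hypothetical container, used only to show the __contains__ hook."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        # 'x in basket' is evaluated as basket.__contains__(x).
        return item in self._items


basket = Basket(['apple', 'pear'])
has_apple = 'apple' in basket   # true: 'apple' is one of the items
has_plum = 'plum' in basket     # false
```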

> Clearly, the string 'the quick brown fox' CONTAINS the string 'fox'.

As a substring, yes. As an element, no. The string 'fox' cannot be
said to be a member of the string 'the quick brown fox'.

> I think the verb IN means something slightly different for strings
> than it does for sequences and dicts.

What about sets? A set certainly CONTAINS a subset -- do you think
the membership operator should be used as a subset operator in the
upcoming set type? In that case, what should be used as a membership
operator? Or perhaps you think it should be used as both, making
things ambiguous?
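
A sketch of the distinction, assuming the set type ends up behaving
like the set and frozenset built-ins found in later Python releases
(that is an assumption here, not something the PEP is quoted as
promising):

```python
numbers = set([1, 2, 3, 4])
pair = frozenset([1, 2])

# Membership: is the object one of the elements?
two_is_member = 2 in numbers          # true
pair_is_member = pair in numbers      # false: no element equals {1, 2}

# Subset: are all elements of pair also elements of numbers?
pair_is_subset = pair.issubset(numbers)   # true

# A set that really has {1, 2} as an element answers differently.
set_of_sets = set([pair, frozenset([3])])
pair_in_set_of_sets = pair in set_of_sets  # true
```

Keeping the two operations separate is what lets both questions be
asked of the same set without ambiguity.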

> Check this out:
>
> ('fox' in 'the quick brown fox')
> ('the quick brown fox'.find('fox') != -1)
>
> Which of these expressions is clearer, more intuitive and
> convenient?

Are you saying that making one sequence behave unlike all others is
clear and intuitive? It may be 'convenient', but that's a common and
dangerous trap.
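
If convenience is the concern, a small helper keeps the substring
test explicit without touching the operator. The name has_substring
is hypothetical; it is just a thin wrapper over the existing find
method, shown next to the word-membership test.

```python
def has_substring(text, part):
    """Hypothetical helper: explicit substring test built on str.find."""
    return text.find(part) != -1


sentence = 'the quick brown fox'

# Substring search, spelled out by name.
found_fox = has_substring(sentence, 'fox')   # true
found_cat = has_substring(sentence, 'cat')   # false

# Word membership, using 'in' on an actual sequence of words.
word_member = 'fox' in sentence.split()      # true
partial_word = 'ox' in sentence.split()      # false: 'ox' is not a word here
```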

Just to poke another hole in your suggestion -- what would the
following code produce?

  for x in 'the quick brown fox':
      print x

If you want the members of the string to be words, it should produce

the
quick
brown
fox

If you want to include all substrings, it should produce

<empty line>
t
th
the
...
the quick brown fox
h
he
he q
he qu

Etc.

And how would you number the elements? If s is the string above,
should s[0] be 'the' or ''? Or perhaps it should be 't', making the
proposal thoroughly self-inconsistent?
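
A sketch of the consistency being argued for: iteration, indexing and
membership all describe the same elements, whether you take the
character view (the string itself) or the word view (an explicit list
from split()). The variable names are illustrative only.

```python
s = 'the quick brown fox'

# Character view: the string itself.
chars_by_iteration = [c for c in s]
chars_by_index = [s[i] for i in range(len(s))]

# Word view: an explicit list of words.
words = s.split()
words_by_iteration = [w for w in words]
words_by_index = [words[i] for i in range(len(words))]

# In each view, every item that iteration yields is also a member
# under 'in', and indexing walks the same items in the same order.
chars_are_members = all(c in s for c in chars_by_iteration)
words_are_members = all(w in words for w in words_by_iteration)
```

Here s[0] is 't' and words[0] is 'the'; each view is consistent with
itself, and you pick the one you want by choosing the sequence.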

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


