"in" operator for strings

Cliff Crawford cjc26 at nospam.cornell.edu
Thu Feb 1 09:57:52 EST 2001


* Alex Martelli <aleaxit at yahoo.com> wrote:
| "Magnus Lie Hetland" <mlh at idi.ntnu.no> wrote in message
| news:95beco$e0u$1 at tyfon.itea.ntnu.no...
|     [snip]
| > This isn't quite logical... A string works like a sequence
| > of characters, and sequence membership only works on
| > single elements (in this case characters), not subsequences
| > (in this case, substrings).
| 
| Right, and an extension of this is basically what's being
| asked for (though the original poster may not have thought
| of this 'obvious' generalization, special-casing strings would
| surely not be warranted).

Strings seem to be special-cased already; for example, you don't have to
type

>>> print ''.join(['H','e','l','l','o',',',' ','w','o','r','l','d','!'])

if you don't want to <wink>.

Anyway, I think having 'in' look for substrings, when used with strings,
is more useful than its current behavior.  Having 'in' look for a single
element works well with lists and tuples, but when using strings I think
it's more common to search for a substring than to search for a single
character.
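
To put the two kinds of test side by side, here is a minimal sketch.
The sample string is made up for illustration; the substring test is
spelled with the find method, the same call Alex's wrapper further down
relies on:

```python
# Hypothetical sample text, chosen only for illustration.
text = "Where's Waldo?"

# Substring test, spelled with str.find: -1 means "not found".
has_sub = text.find("ald") != -1

# Single-character membership test, which 'in' already supports.
has_char = "W" in text

print(has_sub)
print(has_char)
```

The proposal amounts to letting the first test be written as
'ald' in text as well.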


| Unfortunately, for general cases
| it doesn't scale well -- i.e., now:
| 
| >>> print [1,2] in [6, 4, [1,2], 7]
| 1
| >>> print [6,4] in [6, 4, [1,2], 7]
| 0
| 
| and having it return 1 in the second case too would be making
| this 'in' very ambiguous and confusing, alas.

Right, but we will never run into this problem with strings, because
their elements are always single characters.  So there wouldn't be any
ambiguity in extending the meaning of 'in' in this case.
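
A small sketch of why lists are ambiguous and strings are not.  The
helper contains_run is a hypothetical name invented for this example
(it is not a builtin), and the scan is deliberately naive:

```python
def contains_run(seq, sub):
    """Return True if sub occurs as a contiguous run inside seq."""
    n = len(sub)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == sub:
            return True
    return False

nested = [6, 4, [1, 2], 7]

# For lists the two readings of 'in' disagree:
element_hit = [1, 2] in nested            # True: [1, 2] is an element
run_of_elem = contains_run(nested, [1, 2])  # False: 1, 2 is not a run
element_miss = [6, 4] in nested           # False: no element equals [6, 4]
run_hit = contains_run(nested, [6, 4])    # True: 6, 4 is a contiguous run

# For strings a single character is both an element and a length-1 run,
# so the two readings always agree on the cases 'in' handles today.
same = contains_run("Waldo", "a") == ("a" in "Waldo")
```

Since a string can never contain another string as a single element,
the substring reading never contradicts the element reading.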


| Also, of course, this would throw any parallel between
| "x in y" and "for x in y" out of the window unless the
| latter starts looping on all *subsequences* -- eeep!-)

Again, I don't think this would be a problem if substring 'in' only
worked with strings and not all sequences, because I don't think
"for x in y", where y is a string, is a common construction.  In fact,
whenever this pops up in my code, it usually means there's a bug
somewhere (e.g. I forgot to call y.split() or something like that).
"for x in str" can be useful sometimes, but I think most of the time
string methods (or functions from the string module, if you're
old-fashioned ;) and indexing are used instead.
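
A quick sketch of the kind of slip meant here, using a made-up input
line: looping over the string itself visits characters, while looping
over the result of split() visits words.

```python
# Hypothetical input line, for illustration only.
line = "spam eggs ham"

# Looping directly over a string yields one character at a time,
# which is rarely what was intended for whitespace-separated data.
chars = [x for x in line]

# Splitting first yields the words.
words = [x for x in line.split()]

print(chars[:4])
print(words)
```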


| [snip]
| 
| For general substring-matching, a class wrapper is not
| too bad:
| 
| class subsOf:
|     def __init__(self, seq):
|         self.seq = seq
|     def __contains__(self, subseq):
|         return self.seq.find(subseq) != -1
| 
| this only works for strings, as written, AND only to
| enable such idioms as
| 
|     if 'ald' in subsOf("Waldo"):
|         print 'yep!'
| 
| but it's not too hard to generalize it to any sequence
| type (at least if you're content to use elementary
| algorithms for __contains__!-), implement __getitem__
| to allow looping on all subsequences:-), etc.

This is probably the best solution, since I'm sure no one will go for
yet another change to the language <0.3wink>.
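
A rough sketch of what such a generalization could look like.  The
class name SubsequencesOf and the naive slice-comparison scan are
assumptions made for this example, not Alex's code, and it uses
__iter__ with a generator rather than __getitem__ for the looping part:

```python
class SubsequencesOf(object):
    """Wrap a sequence so that 'in' tests for a contiguous subsequence."""

    def __init__(self, seq):
        self.seq = seq

    def __contains__(self, subseq):
        # Elementary scan: compare every slice of the right length.
        # subseq should be the same type as the wrapped sequence,
        # since a list slice never compares equal to a tuple.
        n = len(subseq)
        for i in range(len(self.seq) - n + 1):
            if self.seq[i:i + n] == subseq:
                return True
        return False

    def __iter__(self):
        # Yield every non-empty contiguous slice, ordered by start index.
        for start in range(len(self.seq)):
            for stop in range(start + 1, len(self.seq) + 1):
                yield self.seq[start:stop]


if 'ald' in SubsequencesOf("Waldo"):
    print('yep!')
if [6, 4] in SubsequencesOf([6, 4, [1, 2], 7]):
    print('yep again!')
```

The wrapper keeps plain 'in' on lists unchanged, so the ambiguity
discussed above only shows up where the caller asks for it.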


-- 
Cliff Crawford               http://www.people.cornell.edu/pages/cjc26/
                             print "Just another Python hacker"


