Characters contain themselves?

Atanas Banov enterr at gmail.com
Fri Apr 7 22:40:34 EDT 2006


congratulations on (ostensibly) discovering the Barber's paradox (if
the village barber shaves all and only those who don't shave
themselves, who shaves the barber?
http://en.wikipedia.org/wiki/Barber_paradox) in python! :-D

as far as i see it, your complaint is not just that any string X
contains itself but that a string X can contain another string Y (i.e.
an object of class string can contain another object of class string) -
where you understand "contain", as per the operator "in", to be the
set-theory operator, when in fact the meaning given to it for strings
is "has a substring".

therefore your grudge is not just with
'a' in 'a'
but also with
'a' in 'abcd'
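
a small sketch of the difference, with made-up names (text, letters)
chosen only for illustration: on a string, "in" asks about substrings;
on a list, it asks whether some element equals the left operand.

```python
# Substring test on strings versus element test on lists.
# 'text' and 'letters' are illustrative names, not from the original post.
text = 'abcd'
letters = list(text)              # ['a', 'b', 'c', 'd']

in_text = 'bc' in text            # True: 'bc' is a substring of 'abcd'
in_letters = 'bc' in letters      # False: no single element equals 'bc'
str_in_self = 'a' in 'a'          # True: every string is a substring of itself
list_in_self = ['a'] in ['a']     # False: the only element is 'a', not ['a']

print(in_text, in_letters, str_in_self, list_in_self)
```

so a one-character string behaves like a "container of itself" only
because the test is about substrings, not elements.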

here is an excerpt from the reference manual:
----------------------------------------
The operators in and not in test for set membership. x in s evaluates
to true if x is a member of the set s, and false otherwise. x not in s
returns the negation of x in s. The set membership test has
traditionally been bound to sequences; an object is a member of a set
if the set is a sequence and contains an element equal to that object.
However, it is possible for an object to support membership tests
without being a sequence. In particular, dictionaries support
membership testing as a nicer way of spelling key in dict; other
mapping types may follow suit.

For the list and tuple types, x in y is true if and only if there
exists an index i such that x == y[i] is true.

For the Unicode and string types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Note, x and y
need not be the same type; consequently, u'ab' in 'abc' will return
True. Empty strings are always considered to be a substring of any
other string, so "" in "abc" will return True.
----------------------------------------
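
the equivalence stated in the excerpt can be checked directly. this is
only a sketch; contains_by_find and the sample pairs are my own names
and data, not part of the manual.

```python
def contains_by_find(x, y):
    """Spell 'x in y' for strings the way the excerpt describes it."""
    return y.find(x) != -1

# Sample (x, y) pairs, chosen here for illustration only.
pairs = [('a', 'a'), ('a', 'abcd'), ('bd', 'abcd'), ('', 'abc'), ('', '')]

# Compare the operator with the find()-based spelling for every pair.
results = [((x in y), contains_by_find(x, y)) for x, y in pairs]
for via_in, via_find in results:
    assert via_in == via_find

print([via_in for via_in, _ in results])   # [True, True, False, True, True]
```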

it is apparent "in" was overridden for strings for convenience's sake,
not to get freaky on the theory of sets.

what can you do about it? well, you can check for string type
specifically but there are no guarantees in life: someone else can
define a new type with "in" that behaves like that: say "interval(x,y)",
where "interval(x,y) in interval(a,b)" checks if [x,y] is a
sub-interval of [a,b] - very intuitive - but there you have the problem
again!
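
to make that concrete, here is a minimal sketch of such a type. the
Interval class is hypothetical (no such builtin exists); it only shows
how defining __contains__ gives "in" whatever meaning the author picks.

```python
class Interval:
    """Hypothetical closed interval [lo, hi]; not a standard library type."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def __contains__(self, other):
        # 'other in self' is defined here as "other is a sub-interval of self".
        return self.lo <= other.lo and other.hi <= self.hi


unit = Interval(0, 1)
print(Interval(0.25, 0.5) in unit)   # True
print(unit in unit)                  # True: the interval "contains" itself
print(Interval(0.5, 2) in unit)      # False
```

a check that singles out strings would not catch this class, which is
the point of the paragraph above.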

or you can specifically check if the objects are from a "semantically
supported group" of classes - but that will hamper automatic extension
by introducing new types.
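
a rough sketch of that second approach, assuming the generic lists are
built from list and tuple only. flatten and CONTAINER_TYPES are names i
made up for the example; they are not from the original code.

```python
# Assumption: these are the only types the algorithm should descend into.
CONTAINER_TYPES = (list, tuple)

def flatten(item):
    """Return the leaves of a nested structure as a flat list."""
    if isinstance(item, CONTAINER_TYPES):
        leaves = []
        for child in item:
            leaves.extend(flatten(child))
        return leaves
    # Strings, numbers and everything else are treated as leaves,
    # so a one-character string is never recursed into.
    return [item]

print(flatten(['a', ['bc', ('d', 1)], 'e']))   # ['a', 'bc', 'd', 1, 'e']
```

the recursion terminates on strings because they are not in the
whitelist, but any new container type has to be added to
CONTAINER_TYPES by hand before it is traversed.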

- Nas

WENDUM Denis 47.76.11 (agent) wrote:
> While testing recursive algorithms dealing with generic lists I stumbled
> on infinite loops which were triggered by the fact that (at least for my
> version of Python) characters contain themselves. See session:
>
>  >>> 'a' is 'a'
> True
>  >>> 'a' in 'a'
> True
>  >>> 'a' in ['a']
> True
>  >>>  ....
>
> Leading to paradoxes and loops, objects which contain themselves (and
> other kinds of monsters) are killed in set theory with the Axiom of
> Foundation :=)
>
> But let's go back to more earthly matters. I couldn't find any clue in a
> python FAQ after having googled with the following "Python strings FAQ"
> about why this design choice and how to avoid falling in this trap
> without having to litter my code everywhere with tests for stringiness
> each time I process a generic list of items.
> 
> Any hints would be appreciated.



