Why can't I xor strings?

Bengt Richter bokr at oz.net
Sun Oct 10 18:03:08 EDT 2004


On Sun, 10 Oct 2004 19:25:16 GMT, Jeremy Bowers <jerf at jerf.org> wrote:

>On Sun, 10 Oct 2004 15:30:32 +0000, Grant Edwards wrote:
>> While I agree with your points, they're immaterial to the
>> argument I was making.  The poster to which I responded was
>> arguing that "xor" didn't make sense because having it coerce
>> its arguments to booleans was "wrong".
>
>I didn't say "wrong", I said "non-obvious".
>
>And, with respect, the fact that you have to argue your point is evidence
>in my favor. Of course "proof" would require a comprehensive survey, and I
>think we can all agree this point isn't worth such effort :-)
>
>But I do think that having a bitwise operator silently turn into a
>logical operator for strings is not obvious.
>
>What is 388488839405842L ^ "hello"?
>
>Python says that's an "unsupported operand type(s) for ^: 'long' and
>'str'". Why is it wrong?
I agree it is not wrong. The user is reminded that s/he should be explicit
and create a type that behaves as desired. However (;-) ...

What's right about accepting 3^7? Why should xor be defined for integers?
IMO we implicitly treat integers as vectors (or column matrices) of bools,
apply the operation element by element, and implicitly re-present the
result as an integer.
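Spelled out, that element-by-element reading of 3^7 looks roughly like the following Python 3 sketch. to_bits, from_bits and elementwise_xor are hypothetical helpers, and they handle only non-negative ints; negative ones raise the length question discussed in the next paragraph.

```python
def to_bits(n, length):
    """Low `length` bits of a non-negative int, least significant first."""
    return [bool((n >> i) & 1) for i in range(length)]

def from_bits(bits):
    """Re-present a list of bools (least significant first) as an int."""
    return sum(1 << i for i, b in enumerate(bits) if b)

def elementwise_xor(a, b):
    # Treat both non-negative ints as bool vectors long enough for either.
    length = max(a.bit_length(), b.bit_length())
    pairs = zip(to_bits(a, length), to_bits(b, length))
    return from_bits([x != y for x, y in pairs])  # != on bools is xor

print(elementwise_xor(3, 7))  # 4, the same as 3 ^ 7
```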

This dual interpretation of course reflects CPU hardware functionality, and
I would argue that familiarity (not to mention the expediency of conciseness)
has bred acceptance of operator notation without explicit cruft such as
int(boolvec(3)^boolvec(7)) instead of 3^7. The trouble is that a boolvec
should have a length. boolvec(3) really hides an implicit determination of length
(by dropping the sign bits of an unsigned value, or by extending the sign bits of
a signed integer infinitely), and the ^ operation between boolvecs of different
lengths hides an implicit normalization to the length of the longer one (or to
infinity, with a compressed representation). Again, CPU hardware legacy comes into
play, with the hidden imposition of length=32 (sometimes 64 now), and we are forced
(happily IMO) to define what we mean in terms of abstractions that are independent
of physical representation.
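Python's own integers already make one of these choices: a negative value behaves as if its sign bits were extended infinitely, and a shorter non-negative operand is effectively zero-extended. A Python 3 sketch, where xor_fixed is a hypothetical helper that imposes a CPU-style fixed length by masking:

```python
def xor_fixed(a, b, bits=32):
    """Hypothetical helper: XOR a and b as bit vectors of a fixed length."""
    mask = (1 << bits) - 1  # keep only the low `bits` bits of each operand
    return (a & mask) ^ (b & mask)

# Python ints act like infinitely sign-extended bit vectors:
print(-1 ^ 5)            # -6, i.e. ...11111010
# Imposing a 32-bit length gives an unsigned result instead:
print(xor_fixed(-1, 5))  # 4294967290, i.e. 0xFFFFFFFA
# Operands of different lengths: the shorter one is zero-extended:
print(3 ^ 0b1111000)     # 123
```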

So the question becomes

    What is boolvec(388488839405842L) ^ boolvec("hello")?

and boolvec("hello") is the more ambiguous one. If we wrote

    boolvec("hello", as_bytes=True)

I think most would have an idea of what it meant -- until they
started to think about endianness ;-) I.e., what is the value of the following?

    (boolvec("hello", as_bytes=True) & boolvec(0xffff)).as_string()  # note: '&' for a simpler example

should it be (reading the first character as the least significant byte, i.e. little-endian)

    "he\x00\x00\x00"

or do you see it as big-endian

    "\x00\x00\x00lo"

and should the "sign" bits be dropped, so that the result would be "he" or "lo"?
Or -- should boolvec("hello") default to boolvec(bool("hello"), length=1)?
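For comparison, Python 3's int.from_bytes and int.to_bytes (which work on bytes, not str) force you to answer exactly this question through their byteorder argument. A sketch standing in for the hypothetical boolvec(..., as_bytes=True); masked is a hypothetical helper, and keep_length=False is one reading of dropping the "sign" (leading zero) bytes:

```python
def masked(s, mask, byteorder, keep_length=True):
    """Hypothetical helper: read bytes s as one integer in the given byte
    order, AND it with mask, and convert the result back to bytes."""
    n = int.from_bytes(s, byteorder) & mask
    # Either keep the original length or use the minimal number of bytes.
    length = len(s) if keep_length else (n.bit_length() + 7) // 8
    return n.to_bytes(length, byteorder)

print(masked(b"hello", 0xffff, "little"))  # b'he\x00\x00\x00'
print(masked(b"hello", 0xffff, "big"))     # b'\x00\x00\x00lo'
print(masked(b"hello", 0xffff, "little", keep_length=False))  # b'he'
print(masked(b"hello", 0xffff, "big", keep_length=False))     # b'lo'
```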

>
>This would also work if Python were more like C++ and we could define
>
>xor(string, string)
>xor(int, int) 
>
>and be done with it, but in Python, there should be an obvious meaning for
>int ^ string, and there isn't.
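Python can approximate that kind of overloading with functools.singledispatch, though it dispatches on the type of the first argument only, so the second has to be checked by hand. A minimal Python 3 sketch; the name xor and the pairwise character-code XOR chosen for strings are assumptions for illustration, not an agreed meaning:

```python
from functools import singledispatch

@singledispatch
def xor(a, b):
    """Fallback for operand types with no registered meaning."""
    raise TypeError(f"unsupported operand types for xor: "
                    f"{type(a).__name__} and {type(b).__name__}")

@xor.register(int)
def _xor_int(a, b):
    # singledispatch only looks at the type of `a`, so check `b` by hand.
    if not isinstance(b, int):
        raise TypeError(f"xor(int, {type(b).__name__}) is not defined")
    return a ^ b  # bitwise, like the ^ operator

@xor.register(str)
def _xor_str(a, b):
    if not isinstance(b, str) or len(a) != len(b):
        raise TypeError("xor(str, str) needs two strings of equal length")
    # One arbitrary choice of meaning: XOR the character codes pairwise.
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))
```

With this, xor(3, 7) gives 4 and xor("hello", "HELLO") gives five spaces, while xor(388488839405842, "hello") still raises TypeError, just as Python does for int ^ str.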

Notice that no one complains about intgr^intgr
not being defined as int(bool(intgr)^bool(intgr)) ?;-)
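The two readings really do give different answers for ints; a quick Python 3 check:

```python
a, b = 3, 7
bitwise = a ^ b                   # 0b011 ^ 0b111 == 0b100
logical = int(bool(a) ^ bool(b))  # True ^ True == False
print(bitwise, logical)           # 4 0
```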

>
>It is also true that I recommended the OP consider subclassing string to
>make ^ do what he wants. But it seems to be reasonably well expected that,
>while user classes can do what they like with operators (as long as they
>are willing to pay the sometimes-subtle prices, especially the ones
>involved with poorly defining __cmp__), the core language should not
>do such things.
Except where there is an accepted legacy of such things being done already ;-)
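The subclassing route might look like this minimal Python 3 sketch. XorStr is a hypothetical name, and the sketch assumes (as the quoted discussion suggests) that the OP wanted ^ to coerce both operands to booleans. Defining __rxor__ as well means plain_str ^ XorStr(...) and int ^ XorStr(...) also work, because Python falls back to the right operand's reflected method when the left operand doesn't handle ^.

```python
class XorStr(str):
    """Hypothetical str subclass: ^ is a logical xor of truth values."""

    def __xor__(self, other):
        # Coerce both operands to booleans and xor those.
        return bool(self) != bool(other)

    # Logical xor is symmetric, so the reflected version is the same.
    __rxor__ = __xor__


print(XorStr("hello") ^ "")  # True
print("" ^ XorStr("x"))      # True, via __rxor__
```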

BTW, what is the rationale behind this:

 >>> ['',(),[],0, 0.0, 0L].count(0)
 3
 >>> ['',(),[],0, 0.0, 0L].count(())
 1
 >>> ['',(),[],0, 0.0, 0L].count([])
 1
 >>> ['',(),[],0, 0.0, 0L].count('')
 1
 >>> ['',(),[],0, 0.0, 0L].count(0.0)
 3
 >>> ['',(),[],0, 0.0, 0L].count(0L)
 3
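The behavior follows from list.count testing equality (==) rather than truth value: zeros of the numeric types compare equal across int, long and float, while an empty str, tuple or list equals only an empty object of its own type, even though all of them are false in a boolean context. The same session in Python 3 (which merged long into int, so 0L is gone):

```python
items = ['', (), [], 0, 0.0]  # no separate 0L in Python 3

# Numeric zeros compare equal across numeric types (and bool is an int)...
print(items.count(0))      # 2
print(items.count(0.0))    # 2
print(items.count(False))  # 2
# ...but each empty container equals only an empty container of its type,
print(items.count(()))     # 1
print(items.count(''))     # 1
# even though every item is false in a boolean context.
print([bool(x) for x in items])  # [False, False, False, False, False]
```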

BTW2, just had the thought that ',' could be generalized as an operator. E.g.,

    obj, x

would mean

    type(obj).__dict__['__comma__'](obj, x)

unless __comma__ was undefined. And you could have __rcomma__ for x, obj.
Returning an object of the same type would support multiple commas, as in obj, x, y.
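No such hook exists in Python, but the proposed dispatch rule can be emulated with an ordinary function. A hypothetical Python 3 sketch: comma stands in for the ',' operator, Chain is a toy class, and getattr on the type is used instead of type(obj).__dict__ so that inherited __comma__ methods are found too.

```python
def comma(obj, x):
    """Hypothetical emulation of the proposed ',' operator for `obj, x`.

    Tries __comma__ on type(obj), then __rcomma__ on type(x), and
    otherwise falls back to building an ordinary tuple.
    """
    method = getattr(type(obj), "__comma__", None)
    if method is not None:
        return method(obj, x)
    rmethod = getattr(type(x), "__rcomma__", None)
    if rmethod is not None:
        return rmethod(x, obj)
    return (obj, x)


class Chain:
    """Toy type whose __comma__ returns another Chain, so commas compose."""

    def __init__(self, *items):
        self.items = items

    def __comma__(self, x):
        return Chain(*self.items, x)


c = comma(comma(Chain("a"), "b"), "c")  # stands in for: Chain("a"), "b", "c"
print(c.items)      # ('a', 'b', 'c')
print(comma(1, 2))  # (1, 2): no __comma__, so an ordinary tuple
```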

Of course, you would get interesting effects in such contexts as print obj,x,y
and foo(obj,x,y) or even (obj,x,y) vs (obj,x,y,) vs ((obj,x,y),) ;-)

Probably sequence-building should have priority and be overridden by a parenthesized
expression. So print obj,x would effectively be print str(obj),x, and
print (obj,x) would be print str(type(obj).__dict__['__comma__'](obj,x)).
Similarly for arg list formation.

Regards,
Bengt Richter


