Why can't I xor strings?

Sun Oct 10 22:19:03 EDT 2004

On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf at jerf.org> wrote:

>On Sun, 10 Oct 2004 22:03:08 +0000, Bengt Richter wrote:
>> What's right about accepting 3^7 ? Why should xor be defined for integers?
>> IMO we have implicit subtyping of integers as vectors or column matrices
>> of bools and operations element by element and implicit re-presentation
>> of the result as integer.
>
>I'm intrigued but torn by your arguments.
>
>For positive numbers, a number really is its vector of bits. In the
              ^^^^^[1]              [2]^^ ^^^[3]        ^^^^[4]
[1] Abstract, non-negative integers?
[2] "is" as in "is identical with"? I doubt it.
[3] In what sense does an abstract number possess (its) a vector of bits?
    I take it 'its' is in the sense of a 1:1 mapping to the corresponding
    (unique) vector of "bits".
[4] A two's complement representation of an (abstract) integer, has "bits"
    as _numerical_ values in the sum series with corresponding numerical
    coefficients that are powers of two. But "bits" in the context of
    bitwise ^ or & or | do not represent abstract _numerical_ values of 0 or 1,
    they represent logical values False or True (and 0<->False, 1<->True is
    just a convention that we're used to).

>mathematical sense of "equal" (which I usually express in English as "two
>equal things are fully substitutable with each other in all relevant
>contexts"), where all base numbers are written in base 10:
>
>10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2
>
>The fact that it happens to really be a bit vector in the computer in this
            [1]^^         [2]^^^^^^^^^   ^^^[3]
[1] The abstract integer value?
[2] have 1:1 mapping to?
[3] bit is short for binary _digit_, not binary logic-value -- i.e., it's numeric,
    and ties in with your base 2 representation.
>case actually doesn't count for anything, because just as that bit vector
>*is*, in every conceivable mathematical way, a base 10 number, it is also
1:1 correspondence is not identity. IOW, there's no such thing as "a base 10 number".
There is a set of abstract integers, and each can be put into 1:1 correspondence with
a set of symbols and rules. A "base 10 number" is composed with a set of digits [0-9]
and a rule for ordered composition and a rule for an interpretation to map back to
the abstract value. We can discuss each part of this process, but the bottom line is
that when we say that represented values are equal, we are saying that representations
map back to the same unique abstract value, not that representations per se are identical.
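
A minimal sketch of that last point, in modern Python 3 rather than the
Python 2 used elsewhere in this thread. The digit strings are the ones from
the quoted line above; the variable names are mine, for illustration only.
Four distinct representations map back to one abstract value:

```python
# Each pair is (digit string, base): a representation plus its interpretation rule.
representations = [("10", 10), ("11", 9), ("22", 4), ("1010", 2)]

# int(digits, base) applies the rule and maps back to the abstract integer.
values = [int(digits, base) for digits, base in representations]
print(values)

# The representations differ as symbol strings ...
distinct_strings = len({digits for digits, _ in representations}) == 4
# ... yet all of them map back to the same value.
all_same_value = len(set(values)) == 1
print(distinct_strings, all_same_value)
```

Equality here is a statement about the mapped-back values, not about the
strings themselves.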

>a base 3 number. I can and have defined ternary math before, and
>experimental computers have been built on ternary bases and ternary logic,
>and 44 in a binary computer is 44 in a ternary computer.
Sure, but again we are talking unique abstract integers vs various representations
and the rules for composing them and interpreting them. BTW, did you ever hear
of bi-quinary? That was a 4-bit representation of a decimal digit where the msb
had a "coefficient" of 5 and the 3 lsb's counted in binary modulo 5 ;-)

>
>Only our choice of negative numbers, practically speaking, makes a
>difference between the number-as-bitstring and number-qua-number, and
>something as high level as Python can conceptually use an extra "negative"
>bit so even that distinction goes away. In fact, playing with xor'ing
>longs makes me wonder if Python doesn't *already* do that.
Yes, it plays the as-if-extended-infinitely-to-the-left game with sign bits,
but the hex representation of negative longs is not much good for seeing what
the bits are in a negative number >;-(
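
One way to see the bits anyway is to mask to a chosen width first. This is
only a sketch in modern Python 3 (no L suffix there), and
twos_complement_bits is a helper name I made up for the purpose:

```python
def twos_complement_bits(n, width):
    """Return the low `width` bits of n as a string of 0s and 1s.

    Masking a negative int behaves as if its sign bit were extended
    infinitely to the left, so this shows the two's complement pattern.
    """
    mask = (1 << width) - 1
    return format(n & mask, "0{}b".format(width))

print(hex(-5))                        # sign-and-magnitude style, hides the bits
print(twos_complement_bits(-5, 8))    # 11111011
print(twos_complement_bits(-5, 16))   # 1111111111111011
print(-5 ^ 3)                         # -8, consistent with ...11111011 ^ ...00000011
```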
>
>>     What is boolvec(388488839405842L) ^ boolvec("hello")?
>
>I'd rather see this as the fairly meaningless "388488839405842L base 2",
>meaningless because the base of a number does not affect its value, and
>bitwise-xor, as the name implies, operates on the number base two.
Well, to be nit-picky, I still think it does not really operate on a number
at all. It converts a number to an ordered set of bools, by way of an intermediate
two's complement _representation_ whose _numeric_ bit values are mapped to bool values.
Then, after operating on the ordered sets of bools, the resulting set is mapped
back to a two's complement integer representation again, which can be interpreted as
an abstract integer value again ;-)
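
The pipeline I have in mind can be spelled out as a rough sketch (modern
Python 3). The names to_bools, from_bools and xor_via_bools are hypothetical,
and the fixed width of 16 is an assumption that stands in for the
as-if-infinite sign extension:

```python
def to_bools(n, width):
    """Map the two's complement bits of n (least significant first) to bools."""
    return [bool((n >> i) & 1) for i in range(width)]

def from_bools(bools):
    """Map an ordered set of bools back to a two's complement integer."""
    value = sum(1 << i for i, bit in enumerate(bools) if bit)
    if bools and bools[-1]:          # top bool is the sign position
        value -= 1 << len(bools)
    return value

def xor_via_bools(a, b, width=16):
    """Xor as a purely logical operation on paired bools."""
    pairs = zip(to_bools(a, width), to_bools(b, width))
    return from_bools([x != y for x, y in pairs])

print(xor_via_bools(3, 7), 3 ^ 7)
print(xor_via_bools(-5, 3), -5 ^ 3)
```

For operands that fit in the chosen width this agrees with the built-in ^,
but the only "xor" that actually happens is x != y on logical values.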

>
>Since I reject the need to cast ^ in terms of boolvec, I don't feel
>compelled to try to define "boolvec" for strings. Cast it into a number
>explicitly, refusing the temptation to guess.
>
>> Except where there is an accepted legacy of such things being done already ;-)
>> 
>> BTW, what is the rationale behind this:
>> 
>>  >>> ['',(),[],0, 0.0, 0L].count(0)
>>  3
>>  >>> ['',(),[],0, 0.0, 0L].count(())
>>  1
>>  >>> ['',(),[],0, 0.0, 0L].count([])
>>  1
>>  >>> ['',(),[],0, 0.0, 0L].count('')
>>  1
>>  >>> ['',(),[],0, 0.0, 0L].count(0.0)
>>  3
>>  >>> ['',(),[],0, 0.0, 0L].count(0L)
>>  3
>
I guess this shows what's happening

 >>> class joker(object):
 ...     def __cmp__(self, other): return 0
 ...
 >>> ['',(),[],0, 0.0, 0L, joker()].count('')
 2
 >>> ['',(),[],0, 0.0, 0L, joker()].count(())
 2
 >>> ['',(),[],0, 0.0, 0L, joker()].count(0)
 4
 >>> ['',(),[],0, 0.0, 0L, joker()].count(0.0)
 4
 >>> ['',(),[],0, 0.0, 0L, joker()].count(0L)
 4
 >>> ['',(),[],0, 0.0, 0L, joker()].count(joker())
 7
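
For anyone running this on Python 3, where __cmp__ and the L suffix are gone,
a rough equivalent defines __eq__ instead. This is a sketch, not the session
above; the class name Joker and the shorter list are my own choices:

```python
class Joker:
    """Claims equality with anything (stand-in for __cmp__ returning 0)."""
    def __eq__(self, other):
        return True

items = ['', (), [], 0, 0.0, Joker()]

# list.count compares each element with == ; when the built-in type does not
# know how to compare itself with a Joker, Python falls back to Joker.__eq__.
print(items.count(''))       # 2: the '' itself plus the Joker
print(items.count(()))       # 2
print(items.count(0))        # 3: 0, 0.0 and the Joker
print(items.count(0.0))      # 3
print(items.count(Joker()))  # 6: everything "equals" a Joker
```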

>Again, considered abstractly as numbers, 0 is zero no matter how you slice
>it. Abstractly, even floats have a binary representation, just as they
I think of "float" as indicating representation technique, as opposed to "real"
which I think of as a point in the continuous abstract -+infinity interval
of real numbers.

>have a decimal representation, and we could even xor them. Realistically,
>it is much less useful and there is the infinite-length problem of floats
>to deal with.
Infinite-length problem...must...not...go...there... ;-)
>
>Theoretically, no float should ever equal an int or a long, since an Int
>or Long conceptually identifies exactly one point and a float should be
>seen as representing a range of numbers depending on the precision that
IMO a float should be seen as what it is (a _representation_), which may be
interpreted variously, just like integers ;-)

IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
possible states of the representation machinery. Not all of
those states are legal floating point number representations, but each one
that _is_ legal corresponds to an _exact_ abstract real value. It _can_ be
the exact value you wanted to represent, or not, depending on your programming
problem. Inexactness is not a property of floating point representations, it is
a property of their _relation_ to the exact abstract values the programmer is trying
to represent. 0.0 represents the exact same abstract value as integer 0
(perhaps by way of mapping integers conceived as separate to a subset of reals).
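
A small sketch of the distinction, using the standard fractions module in
modern Python 3 (variable names are mine). The float written 0.1 holds one
exact rational value; it just is not the value one tenth:

```python
from fractions import Fraction

# Fraction(float) recovers the exact value the float represents.
exact_tenth = Fraction(0.1)
print(exact_tenth)                     # 3602879701896397/36028797018963968
print(exact_tenth == Fraction(1, 10))  # False: exact, but not the value intended

# 0.0 maps to the same abstract value as the integer 0.
zero_matches = (Fraction(0.0) == 0) and (0.0 == 0)
print(zero_matches)
```

The denominator is 2**55, which is what one would expect from a binary
representation.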

>can be made arbitrarily small but never truly identifies one point. This
I disagree. 0.0 truly identifies one point. So does

    3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation.
But this is _not_ the exact value of the abstract mathematical pi.
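
The digits can be reproduced with the standard decimal module. A sketch in
modern Python 3, where Decimal accepts a float directly and converts it
without rounding (exact_pi_double is just my name for the result):

```python
import math
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact_pi_double = Decimal(math.pi)
print(exact_pi_double)

# Going back the other way lands on the very same float.
print(float(str(exact_pi_double)) == math.pi)
```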

But that is not a problem with floating point numbers' not representing
exact values. It's that the set of available exact values is finite, and
you have to choose one to approximate (including right on sometimes) an abstract value.
(Or you might choose two to represent _exact_ bounds on a value, for interval math).
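
A sketch of the two-exact-bounds idea, assuming Python 3.9 or later for
math.nextafter. PI_DIGITS is my own stand-in for the abstract pi, a 36-digit
decimal truncation used only to check the bracket:

```python
import math
from decimal import Decimal

# Assumption: a truncated decimal expansion of pi, far finer than a double.
PI_DIGITS = Decimal("3.14159265358979323846264338327950288")

lo = math.pi                            # an exact value just below pi
hi = math.nextafter(math.pi, math.inf)  # the next representable value above it

# Decimal(float) is exact, so these comparisons involve no rounding.
print(Decimal(lo) < PI_DIGITS < Decimal(hi))
print(Decimal(hi) - Decimal(lo))        # width of the bracket, one ulp
```

Both bounds are exact values; the approximation lies only in which ones
were chosen.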

>is another one of those "practicality beats purity" things.
I think the fact that repr(math.pi) is not printed as above is
a case of "practicality beats purity" ;-)

 >>> import math
 >>> repr(math.pi)
 '3.1415926535897931'
 >>> float(repr(math.pi))
 3.1415926535897931
 >>> float(repr(math.pi)) == float(3.141592653589793115997963468544185161590576171875)
 True
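
For comparison, a sketch of the same check on modern Python 3, where repr
picks the shortest digit string that still round-trips to the same float
(so one digit fewer than in the session above):

```python
import math

short = repr(math.pi)
print(short)                      # 3.141592653589793

# The shortest repr and the full exact expansion name the same float.
full = "3.141592653589793115997963468544185161590576171875"
print(float(short) == math.pi)    # True
print(float(short) == float(full))  # True
```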

IMO floats have their own kind of purity, besides being practical ;-)

Regards,
Bengt Richter