Why can't I xor strings?

Jeremy Bowers jerf at jerf.org
Mon Oct 11 00:08:49 EDT 2004


On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:

> On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf at jerf.org> wrote:
>>For positive numbers, a number really is its vector of bits. In the
>               ^^^^^[1]              [2]^^ ^^^[3]        ^^^^[4]
> [1] Abstract, non-negative integers?

Integers is more convenient here, but any one dimensional number will do.
It so happens that computers do not encode "reals" in the naive way, to
allow for more variation in magnitude.

> [2] "is" as in "is identical with"? I doubt it.

With respect, this isn't something to doubt or not doubt. There is one,
and only one, way to represent any positive number in base two, since
encoding sign is not an issue. Assuming an extra bit to show sign, there
is one and only one way to represent any negative number, too. (Zero gets
to be the exception since then you can have positive and negative zero,
which are theoretically identical, but on some machines that implement
them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really
change that. 60000 + 50000 = 110000 no matter what endianess you have.

> [3] In what sense does an abstract number posess (its) a vector of bits?

It does not. It *is* a series of ones and zeros, in base two. It *is* a
series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
in base ten. It is all of these things at once, and an infinite number of
other things besides.

1 + 1 fully *is* 2. There is no difference mathematically. If there is,
your math is broken.

> [4] A two's complement representation of an (abstract) integer, has
> "bits"
>     as _numerical_ values in the sum series with corresponding numerical
>     coefficients that are powers of two. But "bits" in the context of
>     bitwise ^ or & or | do not represent abstract _numerical_ values or
>     0 or 1, they represent logical values False or True (and 0<->False,
>     1<->True is just a convention that we're used to.

No. They represent both of those things at once. 1 *is* True, and 0 *is*
false, in the domain of bitwise operators. 

There is no "either/or" here, only "both/and".

> IOW, there's no such thing as "a base 10 number"...  A "base 10 number" is composed with a set of
> digits [0-9] and a rule for ordered composition and a rule for an
> interpretation to map back to the abstract value. 

Well, I thank you for supplying the missing definition, I suppose.

> We can discuss each
> part of this process, but the bottom line is that when we say that
> represented values are equal, we are saying that representations map
> back to the same unique abstract value, not that representations per se
> are identical.

Yep, that is part of my point, see below.

>>>     What is boolvec(388488839405842L) ^ boolvec("hello")?
>>
>>I'd rather see this as the fairly meaningless "388488839405842L base 2",
>>meaningless because the base of a number does not affect its value, and
>>bitwise-xor, as the name implies, operates on the number base two.
> Well, to be nit-picky, I still think it does not really operate on a
> number at all. It converts a number to an ordered set of bools, by way
> of an intermediate two's complement _representation_ whose _numeric_ bit
> values are mapped to bool values. Then, after the operating with the
> ordered sets of bools, the resulting set is mapped back to a two's
> complement integer representation again, which can be interpreted as an
> abstract integer value again ;-)

In the domain of binary logic, I do not concede a difference between "1"
and "True", on the grounds that there is no way in the domain of binary
logic to distinguish the two.

> IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
> possible states of the representation machinery. Not all of those states
> are legal floating point number representations, but each one that _is_
> legal corresponds to an _exact_ abstract real value. It _can_ be the
> exact value you wanted to represent, or not, depending on your
> programming problem. Inexactness in not a property of floating point
> representations, it is a property of their _relation_ to the exact
> abstract values the programmer is trying to represent. 0.0 represents
> the exact same abstract value as integer 0 (perhaps by way of mapping
> integers conceived as separate to a subset of reals).

This is a matter of definition. My problem with your definition mostly
lies in the fact that it is not as practically useful as mine. Someone
operating on the "floats are almost, but not quite, exact values" is
*much* less well equipped to handle the way floats bite you than someone
who operates with "floats represent a value and have an error range around
that value", because then they should understand they need to compute not
just with the values, but track the buildup of that error range as well.
It's a point-of-view thing.

>>can be made arbitrarily small but never truly identifies one point. This
> I disagree. 0.0 truly identifies one point. So does
> 
>     3.141592653589793115997963468544185161590576171875
> 
> which is the _exact_ value of math.pi in decimal representation. But
> this is _not_ the exact value of the abstract mathematical pi.

Python 2.3.4 (#1, Sep 28 2004, 20:15:25) 
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>  3.1415926535897931159979634685441851615905761718758327502398752309 == 3.14159265358979311599796346854418516159057617187500000000000
True

Look at the last digits. (Python is just a convenience here, this is
fundamental to all float representation schemes, as opposed to things like
Decimal.)

This is why I say they represent a range. In set theory, we'd call this an
equivalence class for equality. There is *no* way to distinguish between 
3.1415926535897931159979634685441851615905761718758327502398752309 and
3.14159265358979311599796346854418516159057617187500000000000 with floats
(of the size Python uses, add more digits for some other bit count), not
even in theory. They are the same in every way. That's because the closest
float represents a range that covers them both. You can shrink that range
arbitrarily small but you can not get to a unique point, such that only
one true real will compare equal to that number and all others do not, not
even in theory.



More information about the Python-list mailing list