Why can't I xor strings?

Bengt Richter bokr at oz.net
Mon Oct 11 06:51:49 EDT 2004


On Mon, 11 Oct 2004 04:08:49 GMT, Jeremy Bowers <jerf at jerf.org> wrote:

>On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:
>
>> On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf at jerf.org> wrote:
>>>For positive numbers, a number really is its vector of bits. In the
>>               ^^^^^[1]              [2]^^ ^^^[3]        ^^^^[4]
>> [1] Abstract, non-negative integers?
>
>Integers are more convenient here, but any one-dimensional number will do.
>It so happens that computers do not encode "reals" in the naive way, to
>allow for more variation in magnitude.
True. But my point (just about everywhere in the post) was to draw attention
to the distinction between abstract entities (e.g. numbers) and concrete representations.
>
>> [2] "is" as in "is identical with"? I doubt it.
>
>With respect, this isn't something to doubt or not doubt. There is one,
>and only one, way to represent any positive number in base two, since
I think we are focusing on different things. When you say "a number really is
its vector of bits" I think wait: a number is a number and a vector of bits
is a vector of bits, but you can't say one _is_ the other. You can say they
correspond in some way. When you say of a number "its vector of bits," I presume
"its" refers to the number's partner in the number<->bit-vector correspondence,
not to components the number has as an aggregate. To me, a (one-dimensional)
number is a primitive entity and "has" no parts, though it may correspond to
a representation that does have parts, order, etc.

I am hard put to think of anything that is _identical_ with any of the
possible representations that it may be put in correspondence with.

>encoding sign is not an issue. Assuming an extra bit to show sign, there
>is one and only one way to represent any negative number, too. (Zero gets
>to be the exception since then you can have positive and negative zero,
>which are theoretically identical, but which have practical differences on
>some machines that implement them.)
>
>Endianness is just some wackiness in the ordering, but does not really
>change that. 60000 + 50000 = 110000 no matter what endianness you have.
Wackiness must be accounted for ;-)
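
To make the wackiness concrete, the struct module can show the same
abstract value under both byte orders (Python 3 bytes notation below,
so take it as an after-the-fact illustration):

 >>> import struct
 >>> struct.pack('<I', 110000)   # little-endian byte ordering
 b'\xb0\xad\x01\x00'
 >>> struct.pack('>I', 110000)   # big-endian byte ordering
 b'\x00\x01\xad\xb0'

Same abstract value, two concrete orderings.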

>
>> [3] In what sense does an abstract number possess (its) a vector of bits?
>
>It does not. It *is* a series of ones and zeros, in base two. It *is* a
>series of ones, zeros, and twos, in base three. It *is* a series of [0-9]
>in base ten. It is all of these things at once, and an infinite number of
>other things besides.
Depends on what the meaning of "is" is ;-) Your "is" seems to mean
"can be represented by" in my terms. That's not "is" in my book ;-)

>
>1 + 1 fully *is* 2. There is no difference mathematically. If there is,
>your math is broken.
You are apparently ignoring representation and just thinking in terms of
what is indicated. So yes, '1 + 1' represents something and '2' represents
something, and for some getabstractvalue, getabstractvalue('1 + 1') _is_ 
getabstractvalue('2'). But whatever your representation, there is such a
function that maps it to the abstract mathematical value.
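
A toy version of that mapping, with eval standing in for whatever
interpretation rule the representation scheme defines (a sketch only):

 >>> def getabstractvalue(rep):
 ...     return eval(rep)   # interpret the representation to get the value
 ...
 >>> getabstractvalue('1 + 1') == getabstractvalue('2')
 True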

>
>> [4] A two's complement representation of an (abstract) integer, has
>> "bits"
>>     as _numerical_ values in the sum series with corresponding numerical
>>     coefficients that are powers of two. But "bits" in the context of
>>     bitwise ^ or & or | do not represent abstract _numerical_ values or
>>     0 or 1, they represent logical values False or True (and 0<->False,
>>     1<->True is just a convention that we're used to.)
>
>No. They represent both of those things at once. 1 *is* True, and 0 *is*
>false, in the domain of bitwise operators. 
ISTM we have different concepts of "is" ;-)
>
>There is no "either/or" here, only "both/and".
>
>> IOW, there's no such thing as "a base 10 number"...  A "base 10 number" is composed from a set
>> of digits [0-9], a rule for ordered composition, and a rule for
>> interpretation that maps back to the abstract value.
>
>Well, I thank you for supplying the missing definition, I suppose.
>
Well, I made up an ad hoc definition to try to express my thinking on it ;-)
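
Spelled out as code, that ad hoc definition might look like this (a
sketch; the helper name is made up):

 >>> def interpret_base10(digit_string):
 ...     value = 0
 ...     for d in digit_string:   # ordered composition, left to right
 ...         value = value * 10 + '0123456789'.index(d)
 ...     return value             # map back to the abstract value
 ...
 >>> interpret_base10('110000')
 110000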

>> We can discuss each
>> part of this process, but the bottom line is that when we say that
>> represented values are equal, we are saying that representations map
>> back to the same unique abstract value, not that representations per se
>> are identical.
>
>Yep, that is part of my point, see below.
>
>>>>     What is boolvec(388488839405842L) ^ boolvec("hello")?
>>>
>>>I'd rather see this as the fairly meaningless "388488839405842L base 2",
>>>meaningless because the base of a number does not affect its value, and
>>>bitwise-xor, as the name implies, operates on the number in base two.
>> Well, to be nit-picky, I still think it does not really operate on a
>> number at all. It converts a number to an ordered set of bools, by way
>> of an intermediate two's complement _representation_ whose _numeric_ bit
>> values are mapped to bool values. Then, after the operating with the
>> ordered sets of bools, the resulting set is mapped back to a two's
>> complement integer representation again, which can be interpreted as an
>> abstract integer value again ;-)
>
>In the domain of binary logic, I do not concede a difference between "1"
>and "True", on the grounds that there is no way in the domain of binary
>logic to distinguish the two.
It's a matter of interpretation. I view 1 and True as different, sort of like

 >>> type(1), type(bool(1))
 (<type 'int'>, <type 'bool'>)
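
And the convert / operate / convert-back pipeline I described above,
roughly, in code (made-up names, non-negative ints only, so I can
dodge the two's complement sign business):

 >>> def to_bools(n, width):
 ...     return [bool((n >> i) & 1) for i in range(width)]
 ...
 >>> def from_bools(bits):
 ...     total = 0
 ...     for i, b in enumerate(bits):
 ...         total += int(b) << i
 ...     return total
 ...
 >>> xored = [x != y for x, y in zip(to_bools(12, 4), to_bools(10, 4))]
 >>> xored                        # position-by-position logical xor
 [False, True, True, False]
 >>> from_bools(xored), 12 ^ 10
 (6, 6)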
 
>
>> IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64
>> possible states of the representation machinery. Not all of those states
>> are legal floating point number representations, but each one that _is_
>> legal corresponds to an _exact_ abstract real value. It _can_ be the
>> exact value you wanted to represent, or not, depending on your
>> programming problem. Inexactness is not a property of floating point
>> representations, it is a property of their _relation_ to the exact
>> abstract values the programmer is trying to represent. 0.0 represents
>> the exact same abstract value as integer 0 (perhaps by way of mapping
>> integers conceived as separate to a subset of reals).
>
>This is a matter of definition. My problem with your definition mostly
>lies in the fact that it is not as practically useful as mine. Someone
>operating on the "floats are almost, but not quite, exact values" premise is
I didn't say that. I said they do represent _exact_ values, and that you
can use the available exact values as you like, including interpreting them
as tokens representing an interval on the real axis containing all the values
that will round to the exact value in question.

>*much* less well equipped to handle the way floats bite you than someone
>who operates with "floats represent a value and have an error range around
>that value", because then they should understand they need to compute not
>just with the values, but track the buildup of that error range as well.
>It's a point-of-view thing.
>
>>>can be made arbitrarily small but never truly identifies one point. This
>> I disagree. 0.0 truly identifies one point. So does
>> 
>>     3.141592653589793115997963468544185161590576171875
>> 
>> which is the _exact_ value of math.pi in decimal representation. But
>> this is _not_ the exact value of the abstract mathematical pi.
>
>Python 2.3.4 (#1, Sep 28 2004, 20:15:25) 
>[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2
>Type "help", "copyright", "credits" or "license" for more information.
>>>>  3.1415926535897931159979634685441851615905761718758327502398752309 == 3.14159265358979311599796346854418516159057617187500000000000
>True
Two different numbers were rounded to the same exact number ;-)
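
A much later Python can even display that exact number directly, since
Decimal construction from a float is exact (this postdates the 2.3
session above, so take it as a latter-day illustration):

 >>> import math
 >>> from decimal import Decimal
 >>> Decimal(math.pi)
 Decimal('3.141592653589793115997963468544185161590576171875')

That is the same exact decimal value I quoted earlier.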
>
>Look at the last digits. (Python is just a convenience here, this is
>fundamental to all float representation schemes, as opposed to things like
>Decimal.)
>
>This is why I say they represent a range. In set theory, we'd call this an
I agree they can be used to _represent_ a range. The range (actually a half-open
continuous interval, I think) will be defined in terms of the _exact_ value
and the neighboring _exact_ values available in the set of values with concrete
representations in the particular hardware.

>equivalence class for equality. There is *no* way to distinguish between 
>3.1415926535897931159979634685441851615905761718758327502398752309 and
>3.14159265358979311599796346854418516159057617187500000000000 with floats
>(of the size Python uses, add more digits for some other bit count), not
>even in theory. They are the same in every way. That's because the closest
>float represents a range that covers them both. You can shrink that range
Now tell me _exactly_ what that "range" is. I know it's a messy algorithm,
but the intervals are _exactly_ defined, so where will the _exact_ mathematical
values of the limits come from? ;-) (we'll ignore fpu rounding modes ;-)
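
The neighboring exact values, at least, are easy to exhibit
(math.nextafter is a Python 3.9 addition, so this is anachronistic
here; under round-to-nearest the interval limits sit halfway to each
neighbor):

 >>> import math
 >>> below = math.nextafter(math.pi, 0.0)
 >>> above = math.nextafter(math.pi, 4.0)
 >>> math.pi - below, above - math.pi
 (4.440892098500626e-16, 4.440892098500626e-16)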
>arbitrarily small but you can not get to a unique point, such that only
>one true real will compare equal to that number and all others do not, not
>even in theory.
Well, yes, we can only create finite sets of representations ;-)
But you can "get to a unique point" by switching interpretation. I.e.,
if you take the floating point number's exact value as such, instead of
using that value and neighbors as a means to define an interval, then
you do have a unique point. E.g., all the exact values of a 53-bit integer
can be represented _exactly_ by a typical 64-bit double. The floating point
representation is no less exact than the corresponding int or long.
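
For instance (going through int() so the comparison is exact in any
Python version; 2**53 is the usual cutoff for an IEEE-754 double):

 >>> int(float(2**53)) == 2**53
 True
 >>> int(float(2**53 + 1)) == 2**53 + 1
 False

Every integer up through 2**53 survives the float round trip exactly;
2**53 + 1 is the first that doesn't.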

Regards,
Bengt Richter


