Does Python need a '>>>' operator?

Ken Peek Ken.Peek at SpiritSongDesigns.comNOSPAM
Mon Apr 15 22:40:49 EDT 2002


"Bengt Richter" <bokr at oz.net> wrote in message
news:a9fcut$e9l$0 at 216.39.172.122...
| On Mon, 15 Apr 2002 00:06:34 -0700, "Ken Peek"
| <Ken.Peek at SpiritSongDesigns.comNOSPAM> wrote:
|
| >
| >"Martin v. Loewis" <martin at v.loewis.de> wrote in message
| >news:m34ridmq00.fsf at mira.informatik.hu-berlin.de...
| >
| >| That's what I'm trying to tell you all the time: the
| >| '>>>' operator is meaningless if it is defined as
| >| "fill in zeroes". How does it know where to start
| >| inserting zeroes?
| >
| >No, it isn't meaningless:
| >
| >The 'long' class has an internal representation for the
| >long number.  The number of bytes that are currently
| >being used to contain the number are known (internally)
| >to the object.  The zeroes get shifted into the high bit
| >of the number, no matter how many bytes are used to
| >contain the number.
| >
| Actually the longs may not be byte-based at all. They could
| be based on an array of shorts with 15-bit values. You don't
| want to see the actual bit pattern of that, do you? At least,
| not as a general purpose thing.

What I want is an object that "looks and feels" like a 'binary twos-complement
integer', that can be extended a byte at a time.  I don't _CARE_ how the machine
represents it internally.  It might even be in BCD for some weird CPU (for all we
know)-- but the interface that _I_ see is a 'binary twos-complement integer'
made up of a finite number of bytes.  I don't want to know HOW the machine
STORES it-- I just want a consistent interface to it.

When I print it out (in decimal)-- I want it to appear the way it does now--
sign and magnitude.

When I print it out in hexadecimal, I want the bit representation of a 'binary
twos-complement' number, of a size (in bytes) that will represent the number
without truncation (regardless of HOW the number is stored inside the machine).
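
A minimal sketch of the hex output described above, written for modern
Python 3 (where int and long are already one type). The helper names
min_bytes and twos_hex are made up for illustration; neither is part of
Python, and this is only one way to read "enough bytes to avoid truncation".

```python
def min_bytes(n):
    """Fewest whole bytes that hold n as a signed twos-complement value."""
    # Magnitude bits (of n, or of ~n when n is negative) plus one sign bit.
    bits = (n if n >= 0 else ~n).bit_length() + 1
    return (bits + 7) // 8


def twos_hex(n):
    """Hex string of n's twos-complement pattern, two digits per byte."""
    width = min_bytes(n)
    mask = (1 << (8 * width)) - 1
    return f"0x{n & mask:0{2 * width}x}"


print(twos_hex(2252477196))   # 0x008642130c
print(twos_hex(-2252477196))  # 0xff79bdecf4
```

Both values come out as 5 bytes (10 hex digits), with the leading byte
carrying the sign.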

| >An int type simply shifts a zero into bit 31, so that
| >this works the same as it will on a long.

Let me clarify the above-- the '>>' should behave in a manner such that the sign
bit is COPIED to the high bit during a right shift.  So-- if the number starts
out as a positive number, it STAYS a positive number after the shift.  The
proposed '>>>' operator forces a zero into the high bit, having no effect on the
sign bit of a positive number, but forcing a negative (twos-complement) number
to now be positive after the shift.  Again-- I don't care HOW the number is
represented internally-- this is the way it should always appear to the
programmer...
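
The difference between the two shifts can be sketched as follows, in
Python 3. The function names and the explicit width_bits parameter are
assumptions for illustration only: Python has no '>>>' operator, so the
zero-fill shift is emulated here by masking to a caller-chosen width first.

```python
def arithmetic_shift_right(n, count):
    # Python's built-in >> copies the sign: a negative n stays negative.
    return n >> count


def logical_shift_right(n, count, width_bits):
    # Treat n as a width_bits-wide twos-complement pattern, then shift
    # zeroes in at the top. The width has to come from somewhere; here
    # the caller supplies it.
    mask = (1 << width_bits) - 1
    return (n & mask) >> count


print(arithmetic_shift_right(-8, 1))        # -4
print(hex(logical_shift_right(-8, 1, 32)))  # 0x7ffffffc
print(logical_shift_right(8, 1, 32))        # 4
```

For a positive number the two shifts agree; only a negative number is
turned positive by the zero-fill version.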

| >The '>>' operator should also work the same for a long
| >as it does for an int.
| It can't, because the long does not represent a fixed-width
| bit pattern, it represents an infinite bit pattern. The infinite
| comes from extending the sign bit. Thus shifting in zeroes at
| the 'top' of a negative number implies shifting them in at bit
| position infinity. You can say that we don't actually use an
| infinite number of bits in the internal representations, but
| the/a point of numeric unification is to hide that representation.
| All you can legitimately refer to is the implied abstraction, which
| has indefinitely extended sign bits.
|
| If you do choose some top bit in the indefinite extension as 'the'
| sign bit, it should be done consistently, and what you are doing
| should be clear. For example, if you scanned down from the left until
| you found two adjacent differing bits (or a single bit if you reach the
| bottom) you could say the ms bit of those was 'the' sign bit. Now you have
| a width, and this can be computed for any integer, positive or negative.
|
| On the basis of that width, you could shift in zeroes. I'm not sure how
| useful that is, but I would say it would be more consistent than picking
| an arbitrarily occurring k*4 or k*8 or k*15 bit width value that you might
| get from the current state of an internal representation.
|
| In any case, you can define a function to whatever you actually want.
| (I suspect you'll wind up specifying width explicitly). But if
| you want a >>> operation, what width should it use? Remember, different
| platforms may use different internal long representations.
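
The width rule in the quoted text (scan down from the top until two adjacent
bits differ, and call the upper one the sign bit) can be sketched in Python 3
as below. The names signed_width and shift_in_zeroes are hypothetical, and
using int.bit_length() to find that position is a shortcut, not something
the quoted text specifies.

```python
def signed_width(n):
    # Bits below the sign bit, plus the sign bit itself. For a negative n,
    # ~n has the same number of significant bits below the sign position.
    return (n if n >= 0 else ~n).bit_length() + 1


def shift_in_zeroes(n, count):
    # Mask n to its own minimal signed width, then shift zeroes in on top.
    width = signed_width(n)
    return (n & ((1 << width) - 1)) >> count


print(signed_width(5))         # 4  (pattern 0101)
print(signed_width(-8))        # 4  (pattern 1000)
print(shift_in_zeroes(-8, 1))  # 4  (1000 -> 0100)
print(shift_in_zeroes(-1, 1))  # 0  (width 1: 1 -> 0)
```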

You CANNOT do this in a portable way from OUTSIDE of the object.  Only the
OBJECT "knows" how many bytes in an equivalent twos-complement binary number are
required to represent the number contained in the object.  Even if the internal
representation is in BCD, or a funny "17-bit per word" [an old IBM machine did
this] format, the programmer should always "see" an array of BYTES that
represent a twos-complement binary integer.  If the long class overloads the
'>>>' operator, and it "knows" how many "pseudo-bytes" are needed to represent
the number, then it can damn well "do the good thing" when this method is called
(as an operator on the long.)  YOU however, are not supposed to be peeking
inside the object (OOP rule #1-- encapsulation)-- let the object figure out how
to do the '>>>' (and the '>>' and '<<' for that matter.)
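
As a rough Python 3 sketch of "let the object figure it out": the class
below is hypothetical (TwosInt, byte_width and urshift are invented names),
and because '>>>' is not a Python token it is written as an ordinary method
rather than an overloaded operator. The object derives its own byte count
from its value, not from any internal storage layout.

```python
class TwosInt:
    """Hypothetical integer that presents itself as whole bytes."""

    def __init__(self, value):
        self.value = value

    def byte_width(self):
        # Magnitude bits plus one sign bit, rounded up to whole bytes.
        v = self.value
        bits = (v if v >= 0 else ~v).bit_length() + 1
        return (bits + 7) // 8

    def urshift(self, count):
        # Stand-in for the proposed '>>>': mask to the object's own byte
        # width, then shift zeroes in at the high end.
        mask = (1 << (8 * self.byte_width())) - 1
        return TwosInt((self.value & mask) >> count)


print(hex(TwosInt(-2252477196).urshift(1).value))  # 0x7fbcdef67a
print(hex(TwosInt(-1).urshift(1).value))           # 0x7f
```

Under this sketch the result for a negative number depends on the byte count
the object settles on: -1 occupies one byte (0xff), so shifting it by one
gives 0x7f.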

| >The '<<' operator should also work the same for a long
| >as it does for an int.
| >
| That one's easy ;-)
|
| >The 'hex()' method should work the same for a long as
| >it does for an int.
| >
| I think that's agreed too, just not the details. I agree with you that
| sign/magnitude is not very useful if you're interested in bits. (I might
| be referring to different bits than you though).

See my comments above-- IF we ALWAYS have the "look and feel" of a binary
twos-complement integer that is ALWAYS made up of an array of BYTES (no matter
HOW it is represented internally to the machine), then we can ALWAYS print a
hexadecimal in the twos-complement form, without any silly sign characters...
Example:

>>> a = 0x43210986

# a is an 'int' now (on most machines,
# but that is none of our business)

>>> b = a << 1

# b is now a long, equal to: 0x008642130c, or the
# equivalent of 5 bytes (no matter HOW it is
# represented internally.)  This is because we need
# the extra byte to convey the sign.  We could
# shift this left 7 more times, without having to
# add another byte...

>>> print b
2252477196

>>> b = -b  # change sign

>>> print b
-2252477196

>>> print hex(b)

0xff79bdecf4     # no 'L' when version >= 2.3.x!!

# THIS is what _I_ expect--  the number of "bytes"
# (or at least the equivalent) is still the same,
# and we can see that the number is now negative,
# and the number of "bytes" is still 5 (no matter
# HOW it is represented internally), so now we
# print 10 digits for 5 bytes...

# Now, if we type:

>>> d = 0x8642130c # d is a 'long' on most machines

>>> print type(d)
<type 'long'>

# when we reach Python version >= 2.3.x:
>>> d = d >> 1  # d is NOW an 'int' on most machines
>>> print type(d)
<type 'int'>

>>> print hex(d)
0x43210986

# ===============================

And now, to address the so-called "type casts" 'int()'
and 'long()'-- well, these just aren't needed anymore
because the system handles this automatically now, so
they are deleted from Python...

You see? Maybe I should have provided some examples.

This is how _I_ believe things should work.