[Python-Dev] Octal literals

Thu Feb 2 20:11:13 CET 2006

On Wed, 1 Feb 2006 13:54:49 -0500 (EST), Paul Svensson <paul-python at svensson.org> wrote:

>On Wed, 1 Feb 2006, Barry Warsaw wrote:
>
>> The proposal for something like 0xff, 0o664, and 0b1001001 seems like
>> the right direction, although 'o' for octal literal looks kind of funky.
>> Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).
>
>Shouldn't it be 0t644 then, and 0n1001001 for binary ?
>That would sidestep the issue of 'b' and 'c' being valid
>hexadecimal digits as well.
>
>Regarding negative numbers, I think they're a red herring.
>If there is any need for a new literal format,
>it would be to express ~0x0f, not -0x10.
>1xf0 has been proposed before, but I think YAGNI.
>
YMMV re YAGNI, but you have an excellent point re negative numbers vs ~.

If you look at examples, the representation digits _are_ actually "~" ;-)
I.e., I first proposed 'c' in place of 'r' for 16cf0, where "c" stands for
radix _complement_, and 0 and 1 are complements wrt 2, as are
hex 0 and f wrt radix 16.

So the actual notation has digits that are radix-complement, and
are evaluated as such to get the integer value.

So ~0x0f is represented as 16r-f0, which does produce a negative number
(but whose integer value BTW is -0x10, not 0x0f). I.e., -16r-f0 == 16r+10,
and the sign after the 'r' is a complement-notation indicator, not
an algebraic sign. (Perhaps '^' would be a better indicator, as -16r^f0 == 0x10.)

Thank you for making the point that the negative value per se is a red herring.

Still, that is where the problem shows up: e.g. when we want to define a hex bit mask
as an int and the sign bit happens to be set. IMO it's a wart that if you want
to define bit masks as integer data, you have to invoke computation for the sign bit,
e.g.,

BIT_0 = 0x1
BIT_1 = 0x02
...
BIT_30 = 0x40000000
BIT_31 = int(-0x80000000)

instead of defining true literals all the way, e.g.,

BIT_0 = 16r1
BIT_1 = 16r2 # or 16r00000002 obviously
...
BIT_30 = 16r+40000000
BIT_31 = 16r-80000000

and if you wanted to define the bit-wise complement masks as literals,
you could, though radix-2 is certainly easier to see (introducing '_' as transparent elision)

CBIT_0 = 16r-e # or 16r-fffffffe or 2r-0 or 2r-11111111_11111111_11111111_11111110
CBIT_1 = 16r-d # or 16r-fffffffd or 2r-01 or 2r-11111111_11111111_11111111_11111101
...
CBIT_30 = 16r-bfffffff # or 2r-10111111_11111111_11111111_11111111
CBIT_31 = 16r+7fffffff # or 2r+01111111_11111111_11111111_11111111
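
For comparison, here is a minimal sketch in present-day Python of the integer values those
mask constants are meant to have, assuming a 32-bit target. The helper name as_int32 is
made up for illustration; it is not part of the language or the stdlib.

```python
def as_int32(u):
    """Reinterpret the low 32 bits of u as a signed two's-complement int.

    Hypothetical helper, assuming a 32-bit target platform.
    """
    u &= 0xFFFFFFFF
    return u - 0x100000000 if u & 0x80000000 else u

BIT_0 = 0x1
BIT_1 = 0x2
BIT_30 = 0x40000000
BIT_31 = as_int32(0x80000000)   # computed, since the sign bit is set

# Complement masks, computed with ~ rather than written as literals.
CBIT_0 = ~BIT_0     # low 32 bits: 0xfffffffe
CBIT_1 = ~BIT_1     # low 32 bits: 0xfffffffd
CBIT_30 = ~BIT_30   # low 32 bits: 0xbfffffff
CBIT_31 = ~BIT_31   # low 32 bits: 0x7fffffff
```

Masking with 0xFFFFFFFF, e.g. hex(CBIT_30 & 0xFFFFFFFF), recovers the 32-bit pattern for checking by eye.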

With constant-folding optimization and some kind of inference-guiding for expressions like
-sys.maxint-1, perhaps computation vs true literals will become moot. And practically
it already is, since a one-time computation is normally insignificant in time or space.

But aren't we also targeting platforms where space is at a premium, where being able to
define constants as literal data without resorting to workaround pre-processing would be nice?

BTW, base-complement decoding works by generalized analogy to two's complement decoding: assume
that the most significant digit is a signed coefficient for base**digitpos in radix-complement form,
where the upper half of the digit range represents negative values (digit - radix) and the rest positive values (digit).
The remaining digits are all positive coefficients for their base powers.

E.g., to decode our simple example[1] represented as a literal in base-complement form (very little tested):

 >>> def bclitval(s, digits='0123456789abcdefghijklmnopqrstuvwxyz'):
 ...     """
 ...     decode base complement literal of form <base>r<sign><digits>
 ...     where
 ...         <base> is in range(2,37) or more if digits supplied
 ...         <sign> is a mnemonic + for digits[0] and - for digits[<base>-1] or absent
 ...         <digits> are decoded as base-complement notation after <sign> if
 ...             present is changed to appropriate digit.
 ...         The first digit is taken as a signed coefficient with value
 ...         digit-<base> (negative) if the digit*2>=B and digit (positive) otherwise.
 ...     """
 ...     B, s = s.split('r', 1)
 ...     B = int(B)
 ...     if s[0] =='+': s = digits[0]+s[1:]
 ...     elif s[0] =='-': s = digits[B-1]+s[1:]
 ...     ds = digits.index(s[0])
 ...     if ds*2 >= B: acc = ds-B
 ...     else: acc = ds
 ...     for c in s[1:]: acc = acc*B + digits.index(c)
 ...     return acc
 ...
 >>> bclitval('16r80000004')
 -2147483644
 >>> bclitval('2r10000000000000000000000000000100')
 -2147483644

BTW, because of the decoding method, extended "sign" bits
don't force promotion to a long value:

 >>> bclitval('16rffffffff80000004')
 -2147483644
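
Going the other way, a rough inverse can be sketched as follows (equally little tested).
The name bclitstr is hypothetical, and picking the shortest digit string is just one possible
choice; the decoder is repeated in compact form only so the block runs on its own.

```python
DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def bclitval(s, digits=DIGITS):
    """Decode <base>r<sign><digits>, same logic as the decoder in this post."""
    B, s = s.split('r', 1)
    B = int(B)
    if s[0] == '+': s = digits[0] + s[1:]
    elif s[0] == '-': s = digits[B-1] + s[1:]
    ds = digits.index(s[0])
    acc = ds - B if ds*2 >= B else ds   # leading digit is a signed coefficient
    for c in s[1:]:
        acc = acc*B + digits.index(c)
    return acc

def bclitstr(n, B, digits=DIGITS):
    """Encode int n as the shortest <base>r<digits> base-complement string."""
    out = []
    while True:
        n, d = divmod(n, B)   # floor division: 0 <= d < B even for negative n
        out.append(digits[d])
        # Stop once what remains is pure sign extension (0 or -1) and the
        # digit just emitted already decodes with that sign.
        if (n == 0 and d*2 < B) or (n == -1 and d*2 >= B):
            break
    return '%dr%s' % (B, ''.join(reversed(out)))
```

Since Python's divmod floors toward negative infinity, negative values need no special casing;
the loop just runs until only sign extension is left.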

[1] To reduce all this eye-glazing discussion to a simple example, how do people now
use hex notation to define an integer bit-mask constant with bits 31 and 2 set?
(assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign).

Regards,
Bengt Richter