Clever hack or code abomination?

Fri Dec 2 01:02:01 EST 2011

On Fri, Dec 2, 2011 at 4:34 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:
>> I would consider integer representations of ASCII to be code smell. It's
>> not instantly obvious that 45 means '-', even if you happen to know the
>> ASCII table by heart (which most people won't).

Note, I'm not saying that C's way is perfect; merely that using the
integer 45 to represent a hyphen is worse.

> In what mad universe would you describe
> 65 as a letter?

I dunno, this universe looks pretty mad from where I am... wait, where
am I? Oh, right. Rutledge's Private Clinic... nice soft walls they
have here...

> To say nothing of the fact that C's trick only works (for some definition
> of works) for ASCII. Take for example one of the many EBCDIC encodings,
> cp500. If you expect 'I' + 1 to equal 'J', you will be sorely
> disappointed:
>
> py> u'I'.encode('cp500')
> '\xc9'
> py> u'J'.encode('cp500')
> '\xd1'

That's nothing to do with C; having 'I' + 1 equal 'J' is a feature of
ASCII. Anything involving arithmetic on ordinals depends on your
encoding, regardless of whether the language conflates int and char.
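
To make that concrete, here's a rough sketch in Python 3 (the quoted
session above is Python 2). The helper name byte_value is made up for
illustration; it only uses str.encode and assumes a character that
encodes to a single byte.

```python
def byte_value(ch, encoding):
    """Return the single byte that ch encodes to (hypothetical helper)."""
    raw = ch.encode(encoding)
    if len(raw) != 1:
        raise ValueError("expected a character that encodes to one byte")
    return raw[0]

# Distance between 'I' and 'J' under two encodings.
ascii_gap = byte_value("J", "ascii") - byte_value("I", "ascii")  # 1
cp500_gap = byte_value("J", "cp500") - byte_value("I", "cp500")  # 8
```

No int/char conflation anywhere in that, and the arithmetic still
gives different answers depending on the encoding.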

> Anyone unfamiliar with C's model would have trouble guessing what 'A' + 1
> should mean. Should it be?
>
> -  an error
> -  'B'
> -  'A1'
> -  the numeric value of variable A plus 1
> -  66  (assuming ascii encoding)
> -  194  (assuming cp500 encoding)
> -  some other number
> -  something else?

Agreed. But implicit conversion is both a minefield and a source of
immense amounts of clarity. Imagine if you couldn't implicitly convert
int to float - it'd stop you from losing precision on large integers,
but it would get in the way any time you want to work with integers
and floating point together.
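
Both sides of that trade-off show up in a few lines of Python. This
is only a sketch, and it assumes the usual IEEE 754 double for float,
which is what CPython uses on common platforms.

```python
# The convenient side: int and float mix without any ceremony.
mixed = 3 + 0.5                      # 3.5

# The minefield: above 2**53 a double cannot represent every integer,
# so the implicit int -> float conversion silently drops the low bit.
big = 2**53 + 1
as_float = big + 0.0                 # implicit conversion happens here
lost_precision = int(as_float) != big  # True
```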

> Note that this still doesn't work the way we might like in EBCDIC, but
> the very fact that you are forced to think about explicit conversion
> steps means you are less likely to make unwarranted assumptions about
> what characters convert to.

I don't know about that. Anyone brought up on ASCII and moving to
EBCDIC will likely have trouble with this, no matter how many function
calls it takes.

> Better than both, I would say, would be for string objects to have
> successor and predecessor methods, that skip ahead (or back) the
> specified number of code points (defaulting to 1):
>
> 'A'.succ()  => 'B'
> 'A'.succ(5)  => 'F'
>
> with appropriate exceptions if you try to go below 0 or above the largest
> code point.

... and this still has that same issue. Arithmetic on codepoints
depends on the encoding just as much.
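
For what it's worth, here's roughly what such a thing might look like
as a plain function in Python 3. Python's str has no succ method;
this succ is a hypothetical sketch built on ord and chr, so "next"
means whatever Unicode code point order says it means.

```python
import sys

def succ(ch, n=1):
    """Hypothetical helper: the character n code points after ch."""
    if len(ch) != 1:
        raise TypeError("succ() expects a single character")
    codepoint = ord(ch) + n
    if not 0 <= codepoint <= sys.maxunicode:
        raise ValueError("code point out of range")
    return chr(codepoint)

next_one = succ("A")       # 'B'
five_on = succ("A", 5)     # 'F'
after_z = succ("Z")        # '[', not a letter at all
```

The function call makes the conversion explicit, but the answer is
still dictated by the ordering of the character set underneath.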

I'd be fine with a simple syntax that gives a Unicode codepoint,
rather than an ASCII one; at least that's standardized. Being able to
work with characters as though they're integers is a huge advantage in
low-level code, but not so vital in Python. But if you're going to do
it, you may as well do it without all the syntactic salt of explicit
conversion functions.

ChrisA