flaming vs accuracy [was Re: Performance of int/long in Python 3]

Thu Mar 28 12:55:46 EDT 2013

Chris,

Your problem with int/long, the start of this thread, is
very interesting.

This is not a demonstration or a proof, but rather an illustration.

Assume you have a set of integers {0...9} and an operator,
let us say, addition.

Idea.
Just divide this set into two chunks, {0...4} and {5...9},
and work hard to optimize the addition of 2 operands in
the set {0...4}.

The problems.
- When optimizing "{0...4}", your algorithm will most probably
weaken "{5...9}".
- When using "{5...9}", you do not benefit from your algorithm; you
are penalized just by the fact that you have optimized "{0...4}".
- And the first mistake: you are penalized and impacted by the
fact that you have to select which subset your operands are in when
working with "{0...9}".
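
As a toy sketch of the illustration above (not a benchmark, and not how
any real interpreter is written), the split can be modelled like this.
The names SMALL, SMALL_SUMS, add_split and add_plain are hypothetical,
made up for this example only:

```python
# Toy model of the {0...9} illustration; all names are hypothetical.
SMALL = range(0, 5)  # the "optimized" subset {0...4}

# Precomputed sums for operands in {0...4}: the "optimized" path.
SMALL_SUMS = {(a, b): a + b for a in SMALL for b in SMALL}


def add_split(a, b):
    """Add two integers from {0...9}, with a special case for {0...4}."""
    # This subset test is executed on every call, whichever subset
    # the operands finally belong to.
    if a in SMALL and b in SMALL:
        return SMALL_SUMS[(a, b)]  # special path, only for {0...4}
    return a + b  # general path for everything else


def add_plain(a, b):
    """Add two integers from {0...9} with no subset selection."""
    return a + b
```

Both functions return the same sums; the only point of the sketch is
that add_split has to decide which subset it is in before it can do
anything. Whether that decision costs more than it saves is something
to measure (for example with timeit), not something this sketch shows.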

Very interestingly, working with the representation (bytes) of
these integers will not help. You have to consider {0...9}
conceptually as numbers.

Now, replace numbers by characters, bytes by "encoded code points",
and you have qualitatively the flexible string representation.
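
For readers who have not looked at it: the flexible string
representation (PEP 393, CPython 3.3 and later) stores a string with
1, 2 or 4 bytes per code point, depending on the widest code point it
contains. A small sketch to observe this, assuming CPython (other
implementations may report different sizes); bytes_per_char is a
hypothetical helper written for this example:

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate storage per character from the size difference
    between a string of 2*n copies and a string of n copies."""
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


# One sample from each range: ASCII, Latin-1, BMP, non-BMP.
for ch in ("a", "\u00e9", "\u20ac", "\U0001f600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```

The subset a string falls into is selected from its content, which is
the same kind of selection as in the {0...4} / {5...9} illustration.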

In Unicode, there is one more level of abstraction: conceptually, one
works neither with characters nor with "encoded code points", but
with Unicode Transformation Format "entities" (see my previous post).

That means you can work very hard on the "bytes level",
but you will never solve the problem, which is one level higher
in the Unicode hierarchy:
character -> code point -> utf -> bytes (implementation)
with the important fact that this construct can only go
from left to right.
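
The chain can be followed from left to right in Python for a single
character. This is only a sketch of the levels, using the EURO SIGN as
an arbitrary example and the little-endian codecs to avoid a byte
order mark:

```python
ch = "\u20ac"                    # character: EURO SIGN
code_point = ord(ch)             # code point: U+20AC

# The same code point under three Unicode transformation formats.
utf8 = ch.encode("utf-8")        # 3 bytes
utf16 = ch.encode("utf-16-le")   # 2 bytes
utf32 = ch.encode("utf-32-le")   # 4 bytes

print("U+%04X" % code_point)
print(utf8, utf16, utf32)
```

One character, one code point, but three different byte sequences,
depending on the utf that is chosen.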

---

In fact, by proposing a flexible representation of ints, you may
just fall into the same trap that the flexible string representation
presents.

----

All this stuff is explained in good books about character
encoding and/or Unicode.
The unicode.org documentation explains it too. It is a little
bit harder to discover, because the documentation always presents
this stuff from a "technical" perspective.
You get it when reading a large part of the Unicode documentation.

jmf