[Tutor] Re:Base 207 compression algorithm [numbers are represented as on/off switches]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Fri Jun 27 19:02:01 2003


On Thu, 26 Jun 2003, cino hilliard wrote:

> You are not understanding what my program does. Have you tried it? This
> bas converter is my unique design allowing characters from ascii 48 -
> 255. So you will get ??  for 255 base 10 to base 16.


Hi Cino,

What Jeff is trying to say is that a single Python character actually must
need to represent a numeric range from 0 to 255: a character in Python is
a "byte", a sequence of eight bits.


What we are seeing on screen, when we say something like:

###
>>> x = 42
>>> x
42
###


is actually misleading, if not an outright lie: it's not how modern
computers store numbers!  Computers don't store them in decimal base ten:
they actually store them in binary base 2.


That is, there is a bit of software that transparently translates the
binary representation of 42,

    42 == 2     +   8     +   32
       == 2**1  +   2**3  +   2**5

and in the computer, this is actually represented as a bunch of switches:


 position:       0      1      2      3      4    5     6     7
 value:         off     on    off     on    off   on   off   off



Anyway, there's a real program out there that just deals with the binary
"on/off" representation of a number, and generates a string that people in
our civilization can read: a programmer had to write this software to turn
something like 00101010 into the string '42' on a computer screen.

It's to the credit of computer designers today that most people don't
realize how stupid computers really are.  *grin*



If you're interested, you can see a sample implementation of what such a
"binary" to "decimal" number stringing function might look like here:

    http://www.linet.gr.jp/~mituiwa/h8/printf-20020909.c

This file contains a set of definitions in the C language for the 'printf'
string formatting function, and several of those functions deals with
taking binary numbers and transforming it into ASCII.  In no sense is your
PC designed to use base ten natively: it's all base two, and programmers
have written software to make it only look like base ten.

(By the way, anyone interested in coding printf in Python?  It would make
a cute Useless Python project.  *grin*)




A long long time ago, some computers did used to store numbers in a
different base system other than binary, but computer architects
eventually agreed to standardize on base two, and now binary computers are
the dominant species on this planet.


If you're interested in the history of how numbers are actually
represented in computers, take a look at:

    http://www2.hursley.ibm.com/decimal/

near the bottom; there's a good set of links on that page that talk about
different base systems.


Another awesome resource is in the book "The Art of Computer Programming".
If you visit your local technical library or bookstore, check out Chapter
Four of that book (it's in Vol 2): you'll can find a heck of a lot of
information on the history of the positional number systems.




Getting back on track: Jeff's point is that your compression scheme makes
a subtle mistake: it is assuming that computers can use an arbitrary base
to represent numbers.  The problem is that every "number" eventually has
to revert to base two to be stored by a computer.  The hardware of today's
modern computers can only represent on/off.


That's where the savings of your compression will evaporate.  You're
seeing:

    computer ---> human ----------> cino's algorithm

    base 2 -----> base 10 --------> base 207



and if this were the end of the story, then yes, your algorithm will save
space.  But you're forgetting that we ultimately have to put this back
into the computer:


    computer ---> human ----------> cino's algorithm --> computer

    base 2 -----> base 10 --------> base 207  ---------> base 2


and it's this invisible translation back to computer representation that
wrecks the savings of your compression scheme.



I hope that made sense.  When we're saying that your scheme won't work,
please don't take it personally!  We are criticizing the technical parts
of the algorithm you described, and not you personally.  Because you
invented the algorithm, though, I can understand if you feel a little
touchy.

Anyway, please feel free to ask more questions on Python-Tutor;  we'll try
what we can to expose the facade of these stupid computers.  *grin*


Good luck to you.