[Tutor] Re:Base 207 compression algorithm

Blake Winton bwinton@latte.ca
Thu Jun 26 23:10:04 2003


* Jeff Shannon <jeff@ccvcorp.com> [030626 20:39]:
> >>However, you're not going to have any luck in actually doing any math 
> >>with either of these strings.
> >Sure you can. You convert back to decimal. 
> No, because your computer can't do math on a string of decimal digits. 
> It needs to convert that into a binary number somehow before it can do 
> math.  And it can *store* it as a binary number a lot more efficiently 
> than it can store it as a string of digit characters, no matter what 
> encoding scheme you use for those characters.

Well, that's not entirely true...
_I_ know how to do arithmatic in decimal.  It's kind of painful,
but it's possible.

i.e.:
def stringPlus( x, y ):
    if len(x) < len(y):
        x = ("0" * (len(y)-len(x))) + x
    elif len(y) < len(x):
        y = ("0" * (len(x)-len(y))) + y

    carry = "0"
    answer = ""
    digitCount = len(x)
    for i in range(digitCount):
        xDigit = x[len(x)-1-i]
        yDigit = y[len(x)-1-i]
        (newDigit, carry) = add( xDigit, yDigit, carry )
        answer = newDigit + answer
    if carry != "0":
        answer = carry + answer
    return answer

def add( x, y, carry ):
    (newDigit,carry) = addTwo( x, y )
    (newDigit,carry) = addTwo( newDigit, carry )
    return (newDigit,carry)

def addTwo( x, y ):
    if x == "0":
        return y,"0"
    elif y == "0":
        return x,"0"
    elif x == "1":
        if y == "1":
            return "2","0"
        if y == "2":
            return "3","0"
        # The rest of this elif left as an exercise for the reader.
    elif x == "2":
        if y == "1":
            return "3","0"
        if y == "2":
            return "4","0"
    elif x == "3":
        if y == "1":
            return "4","0"
        if y == "2":
            return "5","0"
    # Aw, heck, you can fill in the rest of this method too.

if __name__ == "__main__":
    print stringPlus( "123", "11" )

Notice that I didn't even cheat and use "ord" and "chr".  :)

> Like I said, learn how your computer works at the level of registers, 
> and how the floating-point unit operates, and you'll understand this a 
> bit better.

This, of course, is always good advice.

> >So you admit I have reduced the file size of 5000 digits of pi to 2100 

Let me just theorize for a second here on how I might compress
5000 digits of pi.  First, I'm going to assume that those 5000
digits are in base 10.  So, I can get an integer by multiplying
that number by 10^5000.  Follow me?

Here's some code:
>>> # Okay, so I'll approximate pi, so sue me.
>>> x = "3141592654" + "0"*4990
>>> len(x)
5000
>>> y = long(x)
>>> # So now we have pi as a number.  Let's shift it right by a bit.
>>> y >> 16607
1L
>>> # Neat, so we now know that y takes 16607 bits to store.
>>> # Each byte is 8 bits, so we can store y as 16608/8 bytes.
>>> 16608/8
2076

So, the smallest you can store 5000 random digits as is 2076 bytes.
If you try to get it any smaller, you'll need to try and store more
than 8 bits in a byte, and since there are only 8 bits in a byte,
you've got a problem.  Cino, if you want to mail me privately about
this, I would be more than happy to try and explain it further.  But
I think that Jeff's right, and it should really go off-list now.

Later,
Blake.
-- 
 11:04pm  up 42 days, 11:11,  4 users,  load average: 0.00, 0.00, 0.00