[Tutor] Re:Base 207 compression algorithm

cino hilliard hillcino368@hotmail.com
Thu Jun 26 15:33:08 2003


This is a multi-part message in MIME format.

------=_NextPart_000_4c5d_617c_2695
Content-Type: text/plain; format=flowed

To: Jeff Shannon and all other Python users

Cino Hilliard wrote:
>># we can determine the compression ratio for various bases. E.g., base 2 = 332%,
>># base 8 = 111%, base 10 = 100%, base 100 = 50%, base 207 = 43.2%.
>
>
Jeff Shannon wrote:
>This is mistaken, because you're only changing how many characters are used 
>to display it on the screen.  No matter what base it's displayed in, it is 
>still *stored* in binary, and this will not compress anything. Whether you 
>see 'FF' or '255' or '@' (or whatever other character might be used to 
>represent that number in whatever base you try to use), it still must 
>occupy one byte of memory / hard drive space.  Once again, you're confusing 
>the display with internals.  Changing the base only affects display.

Does anyone else in the Python community agree with this?


Attached is a script that can be used to zip numbers. It is a base 2-207
compression algorithm that I developed using Python's arbitrary-precision
integers. Base 207 was used for the output in this example.
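
A minimal sketch of the idea, written for Python 3 and not taken from the
attached script: the digit string is parsed into one big integer, and that
integer is written out again with repeated divmod. The names ALPHABET,
to_int and to_base are illustrative assumptions; ALPHABET uses the same
character range, chr(48) through chr(254), as the attachment.

```python
# Illustrative sketch (Python 3). Helper names are assumptions, not from practical.py.
ALPHABET = "".join(chr(c) for c in range(48, 255))  # 207 symbols: chr(48)..chr(254)

def to_int(text, radix):
    """Parse a digit string in the given radix into a Python int."""
    value = 0
    for ch in text:
        digit = ALPHABET.index(ch)
        if digit >= radix:
            raise ValueError("digit %r is not valid in base %d" % (ch, radix))
        value = value * radix + digit
    return value

def to_base(value, radix):
    """Write a non-negative int as a digit string in the given radix."""
    if value == 0:
        return ALPHABET[0]
    out = []
    while value > 0:
        value, digit = divmod(value, radix)  # least significant digit first
        out.append(ALPHABET[digit])
    return "".join(reversed(out))

decimal = "31415926535897932384626433832795"
packed = to_base(to_int(decimal, 10), 207)
restored = to_base(to_int(packed, 207), 10)
print(len(decimal), len(packed), restored == decimal)
```

Leading zeros do not survive this round trip, because they do not change
the value of the integer.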

Here is the output for 5000 digits of Pi, stored in two files: pib10.txt for
the decimal expansion and pib207.txt for the base 207 conversion. I included
WinZip and pkzip files also. You will notice that the base 207 file is
smaller than either Zip file. The output length shrinks in proportion to
1/log(base), so each further increase in the base helps less.
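
The percentages quoted at the top of this message follow from a standard
digit-count estimate: a number with n digits in base 10 needs about
n*log(10)/log(b) digits in base b. A small sketch (Python 3; expected_ratio
is an illustrative name, not part of the attached script) that evaluates
this for the bases mentioned, plus 256 for comparison with a symbol set
that uses every value of a byte:

```python
import math

def expected_ratio(base, source_base=10):
    """Approximate (output length / input length) when converting between bases."""
    return math.log(source_base) / math.log(base)

for b in (2, 8, 10, 100, 207, 256):
    print("base %3d: %.1f%%" % (b, 100 * expected_ratio(b)))
```

This is only an estimate of the character count. It says nothing about how
each character is stored on disk.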


Also shown is the compression for 1000! in base 207.

This is in no way intended to imply that this code is better than the
professional zip code. It is just a demonstration of a way to compress a
string of digits using base conversion. Currently, it only works on numbers.


C:\Python23>dir pib*.*
Volume in drive C has no label.
Volume Serial Number is E834-0F93

Directory of C:\Python23

06/26/2003  12:37 PM             5,000 pib10.txt
06/26/2003  12:37 PM             2,159 pib207.txt
06/26/2003  12:44 PM             2,665 pib10.zip
06/26/2003  01:34 PM             2,594 pib10pk.ZIP
06/26/2003  02:02 PM             2,568 fact10.txt
06/26/2003  02:02 PM             1,109 fact207.txt
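
The fact10.txt and fact207.txt sizes in the listing can be cross-checked
without writing any files. This sketch (Python 3; digit_count is an
illustrative helper, and it assumes each output character is written as one
byte) counts the digits of 1000! in base 10 and base 207 and, for
comparison, the number of bytes needed to hold the same integer in plain
binary:

```python
import math

def digit_count(value, radix):
    """Count the digits of a positive int in the given radix, using integer math only."""
    count = 0
    while value > 0:
        value //= radix
        count += 1
    return count

n = math.factorial(1000)
decimal_digits = digit_count(n, 10)      # characters in the decimal text
base207_digits = digit_count(n, 207)     # characters after base 207 conversion
raw_bytes = (n.bit_length() + 7) // 8    # bytes if the integer is stored in binary

print(decimal_digits, base207_digits, raw_bytes)
```

The first two counts should agree with the 2,568 and 1,109 bytes listed
above. The third is the base 256 digit count: a byte can take 256 values
and the script uses 207 of them, so packing the integer directly into bytes
comes out slightly smaller again.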


------=_NextPart_000_4c5d_617c_2695
Content-Type: text/plain; name="practical.py"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="practical.py"

#                     A practical application of base conversion.
#                                 By Cino Hilliard
#                                    6/24/2003
# This little program demonstrates a practical use of base conversion to
# compress base 10 numbers using the ASCII set 48 - 254, allowing bases
# 2 - 207. With a little work, it can be changed to compress text also.
# Using the testpi function for 1000 digits, we can determine the
# compression ratio for various bases. E.g., base 2 = 332%, base 8 = 111%,
# base 10 = 100%, base 100 = 50%, base 207 = 43.2%.
# Perhaps others on the list can tweak it to get better compression. It may
# be possible to use another character set, say super ASCII 511. Processing
# gets slow as we increase the number of digits to, say, 10000. This may be
# improved by doing 1000 characters at a time, getting 10 packets of base
# 207 to be converted back 1000 at a time. Also, this could be used as an
# encryption scheme for sensitive data. If you are a mystic, you can look
# for words or messages in the characters of Pi. Go out far enough and you
# will read the Bible word for word with this. You will have to place the
# spaces and punctuation in, though. Prime number enthusiasts can use the
# base converter to find prime words or phrases.

def testpi(r1,r2,n):
    f1 = open('pib10.txt','w')
    f2 = open('pib207.txt','w')
    pi = piasn(n)
    print pi
    print ""
    x = base(r1,r2,pi)
    print x
    print len(x)
    y = base(r2,r1,x)
    print y
    f1.write(pi)
    f2.write(x)
    f1.close()
    f2.close()

def testfact(r1,r2,n):
    f1 = open('fact10.txt','w')
    f2 = open('fact207.txt','w')
    fact1 = fact(n)
    print fact1
    print ""
    x = base(r1,r2,fact1)
    print x
    print len(x)
    y = base(r2,r1,x)
    print y
    f1.write(fact1)
    f2.write(x)
    f1.close()
    f2.close()

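def base(r1,r2,num):
    # Convert num, a string of digits in base r1, to a string in base r2.
    # Digit values 0..206 map to the characters chr(48)..chr(254).
    digits=""
    for j in range(48,255):
          digits = digits + chr(j)
    num = str(num)
    dec = 0
    for ch in num:          # Horner's rule: parse the base r1 string
          dec = dec*r1 + (ord(ch)-48)
    if dec == 0:
          return digits[0]
    # Repeated divmod needs no floating-point log to find the digit count;
    # math.log(1000)/math.log(10) is slightly below 3 and would miscount.
    RDX = ""
    while dec > 0:          # peel off base r2 digits, least significant first
          dec, Q = divmod(dec, r2)
          RDX = digits[Q] + RDX
    return RDX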

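def pix(n):  # Compute the digits of Pi
    # Euler's series: pi/2 = 1 + 1/3 + (1*2)/(3*5) + (1*2*3)/(3*5*7) + ...
    # Each term adds about 0.3 digits, so about 3.4*n terms are used.
    n1 = n*34//10
    m = n+5                 # 5 guard digits
    p=d =10**m              # fixed-point 1.0, scaled by 10**m
    k=1
    while k < n1:
           d = d*k//(k+k+1)
           p = p+d
           k+=1
    p*=2
    p=str(p)
    return p[:n]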

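def piasn(n):  # My faster version to compute Pi.
    # Arcsine series: pi = 6*asin(1/2)
    #                    = 3*(1 + 1/(2*3*4) + (1*3)/(2*4*5*4**2) + ...)
    n1=n*7//2+5             # d steps by 2, so about 1.75*n terms are summed
    m=n+5                   # 5 guard digits
    p=x=10**(m)             # fixed-point 1.0, scaled by 10**m
    d =1
    while d <= n1:
         x=x*d//(d+1)//4
         p=p+x//(d+2)
         d += 2
    p*=3
    p=str(p)
    return p[:n]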

def fact(n):
    f = j = 1
    while j <= n:
            f*=j
            j+=1
    f=str(f)
    return f



------=_NextPart_000_4c5d_617c_2695--