Algorithm that achieves maximum compression of completely diffused data.

Chris Angelico rosuav at gmail.com
Wed Oct 30 19:41:11 EDT 2013


On Thu, Oct 31, 2013 at 10:01 AM, Tim Chase
<python.list at tim.thechases.com> wrote:
> On 2013-10-30 21:30, Joshua Landau wrote:
>> started talking about compressing *random data*
>
> If it's truly random bytes, as long as you don't need *the same*
> random data, you can compress it quite easily.  Lossy compression is
> acceptable for images, so why not random files?  :-)

Maybe. But what if it's not truly random, but only pseudo-random?

# create a file full of random data
import random
seed = random.getrandbits(32)
length = random.getrandbits(16) # in four-byte units
random.seed(seed)
inname = "random.txt"
namez = inname + '.rnz'
with open(inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

# compress that file
with open(namez, "wb") as smallfile:
    smallfile.write(seed.to_bytes(4, "big"))
    smallfile.write(length.to_bytes(4, "big"))

# uncompress it
with open(namez, "rb") as f:
    seed = int.from_bytes(f.read(4), "big")
    length = int.from_bytes(f.read(4), "big")
random.seed(seed)
with open("out_" + inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

Voila! Very impressive compression ratio, and exploits the very
randomness of the data!
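The trick is easy to verify in isolation: since the Mersenne Twister is fully determined by its seed, reseeding and regenerating yields byte-identical output. A minimal sketch (the seed and length values here are arbitrary, chosen just for illustration):

```python
import random

seed = 12345    # assumed seed, for illustration only
length = 100    # number of 4-byte units to generate

# "Original" data: a pseudo-random byte stream from the seeded PRNG.
random.seed(seed)
original = b"".join(random.getrandbits(32).to_bytes(4, "big")
                    for _ in range(length))

# "Decompression": reseed and regenerate the exact same stream.
random.seed(seed)
restored = b"".join(random.getrandbits(32).to_bytes(4, "big")
                    for _ in range(length))

assert original == restored  # lossless: identical bytes from 8 bytes of state
```

Of course, this only works because the data was never truly random in the first place; the eight stored bytes are the file's entire Kolmogorov description.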

ChrisA
