Algorithm that makes maximum compression of completely diffused data.

Modulok modulok at gmail.com
Wed Oct 30 15:46:57 EDT 2013


On Wed, Oct 30, 2013 at 12:21 PM, <jonas.thornvall at gmail.com> wrote:

> I am searching for the program or algorithm that makes the best possible
> of completly (diffused data/random noise) and wonder what the state of art
> compression is.
>
> I understand this is not the correct forum but since i think i have an
> algorithm that can do this very good, and do not know where to turn for
> such question i was thinking to start here.
>
> It is of course lossless compression i am speaking of.



>> I am searching for the program or algorithm that makes the best possible of
>> completly (diffused data/random noise) and wonder what the state of art
>> compression is.

None. If the data to be compressed is truly homogeneous random noise, as you
describe (for example, a 100 MB file read from a cryptographically secure
random bit generator such as /dev/random on *nix systems), the state of the
art in lossless compression achieves no reduction at all, and that will
remain the case for the foreseeable future.

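There is no lossless algorithm that will reduce truly random (high entropy)
data by any significant margin. In classical information theory, such an
algorithm can never be invented. The counting argument is short: there are
2^n distinct strings of n bits but only 2^n - 1 strings shorter than n bits,
so no reversible scheme can map every input to a shorter output. See:
Kolmogorov complexity.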
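
A quick way to check this yourself is the Python sketch below. It assumes
os.urandom is an acceptable stand-in for reading /dev/random, and the 1 MiB
size is an arbitrary choice to keep the run short. It only measures three
standard-library compressors, so it illustrates the point rather than proving
it.

```python
import bz2
import lzma
import os
import zlib

# Assumption: os.urandom stands in for a cryptographically secure source.
data = os.urandom(1 << 20)  # 1 MiB of random bytes

# Compress the same input with three general-purpose lossless compressors.
sizes = {
    "zlib": len(zlib.compress(data, 9)),
    "bz2": len(bz2.compress(data, 9)),
    "lzma": len(lzma.compress(data)),
}

for name, size in sizes.items():
    # A ratio at or above 1.0 means the output is no smaller than the input.
    print(f"{name:5s} {size:8d} bytes  ratio {size / len(data):.4f}")
```

On random input each ratio should come out at roughly 1.0 or slightly above,
because the container overhead is added on top of data that cannot be
shortened.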

Real-world data is rarely completely random. You would have to test various
algorithms on the data set in question. Small things such as non-obvious
statistical clumping can make a big difference in the compression ratio from
one algorithm to another. Data that looks "random" might not actually be
random in the entropy sense of the word.
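
To illustrate data that looks random but is not, here is a small Python
sketch. The example input (random bytes written out as hexadecimal text) and
the helper name byte_entropy are my own choices for illustration. The helper
gives only a per-byte frequency estimate, which is a crude measure and misses
any structure between bytes.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte, from byte frequencies only."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


raw = os.urandom(1 << 16)                              # 65536 random bytes
hex_text = os.urandom(1 << 15).hex().encode("ascii")   # 65536 hex characters

for label, blob in (("raw", raw), ("hex", hex_text)):
    ratio = len(zlib.compress(blob, 9)) / len(blob)
    print(f"{label}: {byte_entropy(blob):.3f} bits/byte, zlib ratio {ratio:.3f}")
```

The hex text looks just as patternless to the eye, but it uses only 16 of the
256 possible byte values, so it carries at most 4 bits per byte and a general
compressor can shrink it substantially.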

>> I understand this is not the correct forum but since i think i have an
>> algorithm that can do this very good, and do not know where to turn for such
>> question i was thinking to start here.

Not to sound like a downer, but I would wager that the data you're testing
your algorithm on is not as truly random as you imply, or is not a large
enough body of test data to draw such conclusions from. It's akin to
inventing a perpetual motion machine, an inertial propulsion engine, or any
other classically impossible device. (This only applies to truly random
data.)
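
If you want to put a candidate algorithm to the test, something like the
Python sketch below would be a reasonable start. The function name evaluate,
the trial count and the block size are all hypothetical choices of mine, and
zlib is used only as a placeholder for the algorithm under test. The idea is
to check that every input round-trips exactly and to average the output size
over many fresh random inputs instead of a single file.

```python
import os
import zlib


def evaluate(compress, decompress, trials=200, size=4096):
    """Average output/input size ratio over many fresh random inputs."""
    total_in = 0
    total_out = 0
    for _ in range(trials):
        data = os.urandom(size)
        packed = compress(data)
        # Lossless means the original must come back byte for byte.
        if decompress(packed) != data:
            raise ValueError("round trip failed: the scheme is not lossless")
        total_in += len(data)
        total_out += len(packed)
    return total_out / total_in


# Placeholder: swap in the compress/decompress pair you want to test.
ratio = evaluate(zlib.compress, zlib.decompress)
print(f"average output/input ratio: {ratio:.4f}")
```

Any dictionary, seed or table that the decompressor needs has to be counted
as part of the output size, otherwise the ratio is misleading.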

-Modulok-