Compression of random binary data

Terry Reedy tjreedy at udel.edu
Mon Jul 11 15:31:46 EDT 2016


On 7/11/2016 2:09 PM, Joonas Liik wrote:
> On 11 July 2016 at 20:52,  <jonas.thornvall at gmail.com> wrote:
>> What kind of statistical law, mathematical conjecture, or even
>> physical law is violated by compression of random binary data?

Off-topic, but...  It is unclear whether you mean 'random' in the 
technical sense of 'unpredictable' or the common sense that adds 'of 
equal probability'.

Bell engineers discovered that physical communication channels have a 
finite information transmission capacity that could be formalized as 
bits per second.  You should be able to find good articles on the web, 
and I suggest you read some.

If every message could be compressed, i.e. made strictly shorter, then 
by compressing repeatedly every message could be reduced to a single 0 
or 1, which is absurd: one bit can only distinguish two messages.
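
A minimal counting sketch of this (mine, in Python; the classic 
pigeonhole argument):

# Pigeonhole sketch: there are 2**n distinct n-bit messages, but only
# 2**n - 1 distinct bit strings of length strictly less than n
# (the sum of 2**k for k = 0 .. n-1).  So any lossless compressor that
# shortens every n-bit input must map two inputs to the same output.
def count(n):
    inputs = 2 ** n
    shorter_outputs = sum(2 ** k for k in range(n))  # == 2**n - 1
    return inputs, shorter_outputs

for n in (1, 4, 8):
    print(n, count(n))  # inputs always exceed shorter outputs by one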

>> I only know that Shannon theorised it could not be done, but was
>> there any proof?

Shannon meant random in the technical sense and explicitly considered 
unequal probabilities.  Random bit streams with unequal probabilities 
*can* be compressed by recoding.
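
A quick sketch of how far (my numbers, just for illustration): 
Shannon's binary entropy function gives the best achievable bits per 
source bit, and it falls below 1 as soon as the two probabilities 
differ:

import math

def h(p):
    """Binary entropy in bits/symbol for P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.9, 0.99):
    print(f"P(1) = {p}: best case {h(p):.3f} bits per source bit")
# 0.5 -> 1.000 (no compression possible); 0.9 -> 0.469; 0.99 -> 0.081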

> Compression relies on some items in the dataset being more frequent
> than others,

Perhaps better to say that compression relies on removing redundancy, 
*if there is any*.  The two ideas are related.

> if you have some dataset that is completely random it
> would be hard to compress as most items have very similar numbers of
> occurrences.

Assuming equal bit probabilities.  Uncorrelated bits of unequal 
probability result in blocks of any given size having unequal 
probabilities, and hence redundancy that can be removed by replacing 
blocks with coded blocks.  Huffman encoding does this by replacing 
blocks of equal size with code blocks of unequal size, assigning 
shorter codes to more probable blocks.
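
A hand-worked sketch (assuming independent bits with P(1) = 0.9, 
probabilities chosen just for illustration):

p1 = 0.9                      # P(bit == 1); bits assumed independent
blocks = {'11': p1 * p1, '10': p1 * (1 - p1),
          '01': (1 - p1) * p1, '00': (1 - p1) * (1 - p1)}
# A Huffman code for block probabilities 0.81, 0.09, 0.09, 0.01:
code = {'11': '0', '10': '10', '01': '110', '00': '111'}
avg = sum(blocks[b] * len(code[b]) for b in blocks)  # per 2-bit block
print(f"{avg:.2f} code bits per block = {avg / 2:.3f} bits per source bit")
# 1.29 / 2 = 0.645 < 1; longer blocks approach the 0.469-bit entropy limit.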

-- 
Terry Jan Reedy



