xor: how come so slow?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Oct 19 03:40:11 EDT 2008


On Sun, 19 Oct 2008 16:38:37 +1300, Lawrence D'Oliveiro wrote:

> In message <010a9c3f$0$20653$c3e8da3 at news.astraweb.com>, Steven D'Aprano
> wrote:
> 
>> On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>> 
>>> Data can come in fractional bits. That's how compression works.
>> 
>> If you don't believe me, try compressing a single bit and see if you
>> get a "fractional bit".
> 
> If both states of the bit are not equally likely, then you do indeed
> have a fractional bit, since
> 
>     nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2


That's an arithmetic mean of the logarithms. It doesn't imply that there 
are fractional bits any more than an average family having 2.3 children 
implies that there are 0.3 of a child wandering around the streets.

Using the Shannon measure of information, you can have messages which 
contain fractional information (technically, "surprisal"), when measured 
in bits. But that doesn't imply the existence of fractional bits. Look at 
it this way: consider a barter economy where I agree to swap 5 chickens 
for 2 axes. So each axe is equivalent to 2.5 chickens. But that doesn't 
imply that there is such a thing as 0.5 of a chicken -- at least not a 
*live* chicken. While I can blithely talk about bartering fractional 
chickens, in practice when I actually go to make good on my promise, it 
must be an integer number of chickens.

Similarly, we can talk about messages containing fractional bits of 
information, but when we actually store or transmit that message in 
practice, we can only use integer numbers of bits.

As Wikipedia puts it:

It is important to differentiate between the use of "bit" in referring to 
a discrete storage unit and the use of "bit" in referring to a 
statistical unit of information. The bit, as a discrete storage unit, can 
by definition store only 0 or 1. A statistical bit is the amount of 
information that, on average[citation needed], can be stored in a 
discrete bit. ... If these two ideas need to be distinguished, sometimes 
the name bit is used when discussing data storage while shannon is used 
for the statistical bit.

http://en.wikipedia.org/wiki/Bit



-- 
Steven



More information about the Python-list mailing list