random

David C. Ullrich ullrich at math.okstate.edu
Mon Jun 4 11:52:37 EDT 2001


On Sun, 3 Jun 2001 18:14:47 -0400, "Tim Peters" <tim.one at home.com>
wrote:

>[David C. Ullrich]
>> ...
>> Not that readers are likely to be reading carefully enough that it's
>> going to matter, but just for the sake of telling no lies: I put
>> my finger on what was bothering me about what I said here.
>> It's not true that the sequence of bits of Omega is recursively
>> enumerable. What's true (assuming that the N-th bit is 1 if the
>> N-th program halts and 0 otherwise) is that the sequence of N
>> for which the N-th bit is 1 is recursively enumerable. The
>> set of N for which the N-th bit is 0 is _not_ recursively
>> enumerable (and hence the sequence of bits of Omega is not
>> recursively enumerable.)
>
>It's the "assuming that the N-th bit is 1 [iff] the N-th program halts"
>that's off.  Omega is the probability that "a random" program halts, and it
>won't take long to figure out that "N'th-bit = halts(N-th program)" doesn't
>work for that (e.g., if the 0'th program happens to halt, that would imply
>the halting probability is >= 1/2 across all programs -- but the 0'th
>program isn't generally that smart <wink>). 

Um, actually I'm pretty sure that Omega _is_ what I said it is, and
is also what you say it is.

<mini essay on subtleties in probability theory>

The first thing you need to note is that "probability that a random
program halts" is _meaningless_ until we have specified how we are
_choosing_ our "random program". Cf Bertand's Paradox: What's the
probability that the length of a chord of a circle is larger than 
the radius? The answer is that this question has no answer until
we specify how we choose that random chord. Any point inside the
circle is the midpoint of a unique chord - if you choose a "random"
point inside the circle and calculuate the probability that the
length of the chord is longer than the radius you get one answer.
But a chord is also determined by two points on the boundary -
if you choose two points on the boundary "at random" and calculuate
the probability that the chord is longer than the radius you
get a different answer.
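
Here's a quick Monte Carlo sketch - my own illustration, nothing
from Chaitin, and all the names are mine - showing that the two
chord-picking schemes really do disagree:

    import math, random

    def longer_than_radius_midpoint():
        # Choose a uniform random point in the unit disk by
        # rejection sampling; it's the midpoint of a unique
        # chord, of length 2*sqrt(1 - d^2) with d the distance
        # from the center.
        while 1:
            x, y = random.uniform(-1, 1), random.uniform(-1, 1)
            d2 = x*x + y*y
            if d2 <= 1:
                return 2 * math.sqrt(1 - d2) > 1

    def longer_than_radius_endpoints():
        # Choose two uniform random points on the unit circle;
        # the chord between angles a and b has length
        # 2*sin(|a - b|/2).
        a = random.uniform(0, 2 * math.pi)
        b = random.uniform(0, 2 * math.pi)
        return 2 * math.sin(abs(a - b) / 2) > 1

    n = 100000
    print(sum(longer_than_radius_midpoint() for _ in range(n)) / n)
    # -> about 0.75
    print(sum(longer_than_radius_endpoints() for _ in range(n)) / n)
    # -> about 0.667

Same question, two defensible answers (3/4 and 2/3), depending
entirely on the distribution.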

Not an actual "paradox" - it simply shows that the meaning of
"at random" is not well-defined. Technically, one needs to
_specify_ a probability distribution on one's sample space
before talking about the probability of anything.

In some cases there is a "natural" distribution determined by
symmetry. For example if someone says "random integer between
0 and 5 inclusive" it's reasonable to assume that the uniform
distribution is intended, that is, we have X so that

P(X=0) = P(X=1) = ... = P(X=5) = 1/6.

(But in mathematics written by careful mathematicians one
would not assume this was the default, one would specify
the distribution. Never mind...)

Now, what about a "random positive integer"? Is there a
"natural" distribution to assume is meant there? No, when
someone says "random positive integer" we cannot assume
that the default uniform distribution is intended, because
there _is_ no uniform distribution!

Really. Suppose that I'm wrong, and saying "random positive
integer" actually does specify something. Suppose that
X is a random positive integer. Now what is P(X=1)?
If we're talking about a "natural" distribution then
P(X=1) = P(X=2) = P(X=3) = ... , by symmetry. But

P(X=1) + P(X=2) + ... = 1.

Now if all those numbers are equal then they're either
all positive or all 0. If P(X=1) = 0 then the sum is 0 and
it follows that 0 = 1. If P(X=1) > 0 then the sum diverges
and it follows that infinity = 1. Contradiction either way.

So. There _is_ no "natural" default distribution on the
positive integers - if you say something about
a "random positive integer" you have not said anything
unless you _specify_ a distribution. There are various
different ways to specify a distribution on the
positive integers, and they give different answers to
questions about the properties of a "random positive
integer."

</mini essay on subtleties in probability theory>

Ok. So Omega is the probability that a "random program"
halts. This is one of those things I mentioned earlier:
you read people talking about Chaitin's stuff and you
say "what??", then you read what Chaitin actually wrote
and you're relieved to see he included the details:

Yesterday I looked at that link you provided and I
saw Chaitin define Omega as the probability that a
random program halts. My first reaction was that
of course that's meaningless until you specify
how you _select_ your "random program" (more
precisely but less intuitively, until you specify
what distribution on the space of all programs
you are referring to.) But I read on:

Chaitin is very well aware of everything I just
said. Although he did not explain why he needed
to do so, he _did_ "specify" what distribution
he had in mind. Even though it was just a survey
article - he knew I was going to be reading it
someday. He says something like this (um, I may
not get the details exactly right because I just
skimmed it, but it was something along these lines):

There are countably many Turing machines. Enumerate
them as T1, T2, ... . Now perform an experiment:
Flip a coin. If it's heads then let T = T1.
If it's tails flip again and if it's heads let T = T2.
If it's tails flip again and if it's heads let T = T3.
etc. With probability one we eventually get a head,
so we have in fact chosen a random Turing machine T,
with a well-defined distribution. Now Omega is the
probability that T halts.
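
In code the experiment is just a geometric sampler - a sketch,
with a name of my own choosing:

    import random

    def choose_machine_index():
        # Flip a fair coin until the first head; if the head
        # comes up on flip n, we have chosen machine Tn.
        # P(we choose Tn) = 2^(-n).
        n = 1
        while random.random() < 0.5:   # tails - flip again
            n += 1
        return n

(And the loop terminates with probability one, matching the
"with probability one we eventually get a head" above.)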

The previous paragraph is a precise/rigorous
specification of "Omega is the probability that
a random program halts" (except it doesn't quite
pin down what Omega _is_ since it does not specify
exactly what order we put the Turing machines
in at the start. To actually _define_ the sequence
T1, T2,... we'd need to get into details of the
syntax we were using to define TM's.)

Now what _is_ the probability that T halts?
We know that T is one of T1, T2, ... but we
don't know which one. So

P(T halts) = 
    P(T = T1) * P(T1 halts)
  + P(T = T2) * P(T2 halts)
  + P(T = T3) * P(T3 halts)
  + ...

But P(T = Tn) = 2^(-n). And there's nothing
random about Tn: it halts or it doesn't.
If we say that Halts(Tn) takes values 0 or 1
instead of True or False then we have exactly

P(Tn halts) = Halts(Tn)

for each n. Plug this and that into the previous
equation and you get

P(T halts) =
    2^(-1) * Halts(T1)
  + 2^(-2) * Halts(T2)
  + 2^(-3) * Halts(T3)
  + ...

And this is exactly the sum of 2^(-n), where n ranges
over all the values such that Tn halts. Which is
a number whose n-th bit is 1 iff Tn halts.
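
A toy illustration, with completely made-up halting data - say
there were only machines T1 through T8, and T1, T2, T4 and T7
halt while the rest don't:

    halts = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 0}
    omega8 = sum(2.0 ** (-n) for n, h in halts.items() if h)
    print(omega8)                      # 0.8203125
    print(bin(int(omega8 * 2 ** 8)))   # 0b11010010 - bit n is Halts(Tn)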

****************

And there you are. The phrase "probability that a
random program halts" _is_ meaningless mathematically
until you specify a distribution on the space
of all programs. Chaitin does specify a distribution.
It's possible that I'm recalling the way he chooses
a Turing machine by flipping coins incorrectly, but
I don't think so. And in any case if you _do_
choose your program by flipping a coin as above that
specifies a distribution, and with that distribution
the probability that a random program halts _is_
a number with n-th bit = 1 if and only if the n-th
TM halts.

Yes, this does give us a lot of choice regarding
what Omega should be. Um, "choice" is the wrong
word. It does leave a lot of ambiguity regarding
what the value of Omega actually is. In particular 
there exists an ordering T1, T2,... so that

Omega = 1010101010101010...

in binary. We can't _specify_ how to _define_
that ordering, but it exists (and when we start
with an unspecified ordering we cannot know
that it is _not_ one that gives Omega = .1010....)

In particular if Omega is "the probability that
a random program halts" and that's all we know
about Omega then we cannot know that Omega is
not 0.10101010..., which might strike some as
not the best choice for an RNG.

(Exercise: Although we cannot know all of Omega,
and so we cannot _specify_ an ordering of our
Turing machines that will make Omega = 0.1010...,
we _can_ __specify__ an _explicit_ ordering
which will make Omega have the form

0.11?11?11?11?... .)

(And in fact, when I think about how to do that,
and about how one would "naturally" order the TM's,
it's not clear to me that such non-randomness
doesn't creep into an Omega defined using a
"natural" ordering...)

If I have the "definition" straight (and
note that nothing I've said contradicts
what you said about the definition!) then
the bits of Omega are not going to be much
use for an RNG.

But it turns out that Omega _does_ solve a 
_different_ practical problem! What's the
other pressing problem? Determining which
reals have exact floating-point representations.

Thm1. Omega does not have an exact representation
in finite-but-unbounded-precision floating point.

Thm2. Thm1 is all you can say about Omega, given
the definition above: If x is any real (between
0 and 1) and x does not have an exact representation
in finite-but-unbounded-precision floating point
then there is an ordering T1, T2,... of our
Turing machines that makes Omega = x.

So: If you're wondering whether x can be represented
exactly, simply first jiggle things so 0 < x < 1, and then
determine whether or not you can prove that x <> Omega.
x can be represented exactly if and only if it
is possible to prove x <> Omega from the information
about Omega above.

I mention this for the benefit of those naive
programmers we've been hearing about who want to
know how to tell what can be represented exactly.
HTH.
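
For the rationals, at least, there's a more pedestrian test,
no Omega required - a sketch (mine, and tongue only partly
in cheek):

    from fractions import Fraction

    def exactly_representable(q):
        # A rational has a finite binary expansion iff its
        # reduced denominator is a power of 2.
        q = Fraction(q)
        return q.denominator & (q.denominator - 1) == 0

    print(exactly_representable(Fraction(3, 8)))    # True:  0.011 in binary
    print(exactly_representable(Fraction(1, 10)))   # False: repeats in binary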

>Not that the number you define
>there isn't interesting in its own right!  It's a classic example of a
>non-computable real that can be approximated from below, and it shares that
>much with Chaitin's Omega; but, IIRC, it doesn't meet the criteria for
>randomness 

No, it doesn't - this is exactly why I said to Alex just
now that, come to think of it, I don't think we could use
the bits of Omega for an RNG even if we could figure out
what they were.

I may have the definition of Omega wrong (but note again
that what I'm taking for the "definition" is exactly
consistent with what you say the definition is; I've
just added more details). If I have the definition
right then this really shows that the notion of 
"random" being used here is very different from
other versions of the word.

(Devil's advocate: But if Omega can be 0.10101010...
then how can it be incompressible? You're not actually
saying Chaitin's wrong about that, are you?

Ans: Have you read the _proof_ that it's incompressible?
In particular can you say exactly what the hypotheses
are? It may well be that if we start with a _definable_
ordering T1, T2, ... then we get an incompressible
Omega. That's consistent with the above: the ordering
giving .10101010... is _not_ definable, and while the
ordering giving .11?11?11?... is definable I don't see
why that sequence has to be compressible.)

Could well be that he simply didn't include the
hypothesis about the ordering being definable in
his survey paper. (Just like nobody _ever_ includes
all the hypotheses in Gödel's theorems when they're
discussing them informally - people just assume that
the axioms are recursive without mentioning it.)

> (machine I can be related to machine I+1 in an exploitable way
>s.t. halt(I) gives information about halt(I+1) -- for example, define the
>encoding s.t. odd-numbered machines J merely prepend a no-op to the J-1
>machine, and then halt(J) == halt(J-1), so adjacent bits in the derived sum
>are strongly correlated).
>
>Perhaps the easiest gimmick for programmers to grok is that the set of
>rationals less than omega is recursively enumerable:  Chaitin gives you a
>generator object g such that the loop
>
>    while 1:
>        r = g.next()
>
>eventually generates every rational number r less than omega.  Then the
>generated sequence of values r_0, r_1, .. approximates omega from below
>(although only the biggest-so-far-seen are interesting), and for any real
>epsilon greater than 0 there's *some* i s.t. omega - r_i < epsilon (since
>every rational less than omega is eventually generated).  So a subsequence
>of the r_i converges to omega.  Two results of the theory are that there's
>no effective way to compute i as a function of epsilon ("you can't tell how
>close you are"), yet there's no better way to compute omega than by running
>that loop (you can study the implementation of g.next() all you like, but
>there's no *fundamentally* better way to do it). 

Yup. Doesn't contradict the statement that the n-th bit is 1 if and
only if the n-th program halts.
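
For the simplified Omega above, here's a sketch of how such a
generator could go - dovetailing, with a hypothetical
runs_and_halts(n, k) that simulates Tn for k steps (hypothetical
because I'm not about to write a TM simulator in a Usenet post):

    def lower_bounds(runs_and_halts):
        # Dovetail: at stage k, run each of T1..Tk for k steps.
        # Whenever some Tn is first seen to halt, its bit of
        # Omega is 1, so add 2^(-n) to the running lower bound.
        seen = set()
        total = 0.0
        k = 1
        while 1:
            for n in range(1, k + 1):
                if n not in seen and runs_and_halts(n, k):
                    seen.add(n)
                    total += 2.0 ** (-n)
            yield total   # nondecreasing, creeps up toward Omega
            k += 1

    # E.g. with a made-up stand-in where Tn halts (after n steps)
    # iff n is even, the bounds creep up toward 1/3 = 0.010101... binary:
    g = lower_bounds(lambda n, k: n % 2 == 0 and k >= n)
    for _ in range(6):
        print(next(g))

Every bound it yields is below Omega and the bounds converge to
it - but, as you say, with no effective way to tell how close
you are at any given stage.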

> That last part eats at the
>optimizers among us <wink>.
>
>



David C. Ullrich
*********************
"Sometimes you can have access violations all the 
time and the program still works." (Michael Caracena, 
comp.lang.pascal.delphi.misc 5/1/01)


