converting strings to their most efficient types '1' --> 1, 'A' --> 'A', '1.2' --> 1.2

James Stroud jstroud at mbi.ucla.edu
Sun May 20 04:09:02 EDT 2007


James Stroud wrote:
> Now with one test positive for Int, you are getting pretty certain you 
> have an Int column. Now we take a second cell randomly from the same 
> column and find that it too casts to Int.
> 
> P_2(H) = 0.9607843    --> Confidence it's an Int column from round 1
> P(D|H) = 0.98
> P(D|H') = 0.02
> 
> P_2(H|D) = 0.9995836
> 
> 
> Yikes! But I'm still not convinced it's an Int because I haven't even had 
> to wait a millisecond to get the answer. Let's burn some more clock cycles.
> 
> Let's say we really have an Int column and get "lucky" with our tests (P 
> = 0.98**4 = 92% chance) and find two more random cells successfully cast 
> to Int:
> 
> P_4(H) = 0.9999957
> P(D|H) = 0.98
> P(D|H') = 0.02
> 
> P(H|D) = 0.9999999


I had typos. P(D|H') should be 0.01 for all rounds.
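
To check the quoted posteriors against the corrected rate, here is a 
minimal sketch of the round-by-round update. The function name and the 
starting prior of 0.2 are my assumptions (0.2 is inferred from the round 1 
figure of 0.9607843); only 0.98 and 0.01 come from the text above.

```python
def update(prior, p_d_given_h=0.98, p_d_given_not_h=0.01):
    """One round of Bayes' rule after a cell successfully casts to Int.

    prior            -- P(H), current belief that the column is Int
    p_d_given_h      -- P(D|H), chance a cell casts if the column is Int
    p_d_given_not_h  -- P(D|H'), chance a cell casts if it is not Int
    """
    numerator = p_d_given_h * prior
    return numerator / (numerator + p_d_given_not_h * (1.0 - prior))


belief = 0.2  # assumed starting prior, not stated in this message
posteriors = []
for round_number in range(1, 5):
    # The posterior from one round becomes the prior for the next.
    belief = update(belief)
    posteriors.append(belief)
    print("round %d: P(H|D) = %.7f" % (round_number, belief))
```

Under those assumptions the four values agree with the quoted 0.9607843, 
0.9995836, 0.9999957 and 0.9999999 to within the last printed digit, which 
suggests the quoted posteriors were already computed with 0.01.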

Also, I should clarify that all 4 of 4 tests are positive, with no fails 
observed. A fail would be integrated the same way: the last posterior 
becomes the prior for the next update.
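
A sketch of how a fail might be folded in, assuming the likelihood of a 
failed cast is simply the complement of the success likelihood (0.02 under 
H and 0.99 under H'). The function name, the 0.2 starting prior and the 
sample sequence of outcomes are hypothetical.

```python
def update_with_outcome(prior, cast_ok, p_ok_given_h=0.98, p_ok_given_not_h=0.01):
    """Bayes update for one cell, whether the cast to Int succeeded or failed."""
    if cast_ok:
        like_h, like_not_h = p_ok_given_h, p_ok_given_not_h
    else:
        # Assumption: a failed cast has the complementary probabilities.
        like_h, like_not_h = 1.0 - p_ok_given_h, 1.0 - p_ok_given_not_h
    numerator = like_h * prior
    return numerator / (numerator + like_not_h * (1.0 - prior))


belief = 0.2  # assumed starting prior
outcomes = [True, True, False, True]  # hypothetical: one fail among four cells
for cast_ok in outcomes:
    # Each posterior is reused as the prior for the next observation.
    belief = update_with_outcome(belief, cast_ok)
    print("cast ok: %-5s  P(H|D) = %.7f" % (cast_ok, belief))
```

A single fail pulls the belief down sharply, since a fail is about 50 
times more likely under H' than under H with these numbers, and later 
successes pull it back up.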

Also, given a 1% false positive rate, after only 4 rounds you are 1 - 
(0.01**4) = 99.999999% sure your observations aren't because you 
accidentally pulled 4 of the false positives in succession.

James


