converting strings to their most efficient types '1' --> 1, 'A' --> 'A', '1.2' --> 1.2

James Stroud jstroud at mbi.ucla.edu
Sat May 19 07:17:22 EDT 2007


John Machin wrote:
> The approach that I've adopted is to test the values in a column for all 
> types, and choose the non-text type that has the highest success rate 
> (provided the rate is greater than some threshold e.g. 90%, otherwise 
> it's text).
> 
> For large files, taking a 1/N sample can save a lot of time with little 
> chance of misdiagnosis.
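
A minimal sketch of the quoted approach, for illustration only. The names infer_column_type, CANDIDATES, threshold and step are hypothetical and not from John's code; only int and float are tried as non-text types, and the 1/N sample is taken as every step-th value.

```python
def _parses(cast, text):
    """Return True if cast(text) succeeds, False on ValueError."""
    try:
        cast(text)
        return True
    except ValueError:
        return False


# Candidate non-text types, narrowest first (an assumption of this sketch).
CANDIDATES = [("int", int), ("float", float)]


def infer_column_type(values, threshold=0.9, step=1):
    """Guess a column type from a list of strings.

    Every step-th value is sampled (step=N gives a 1/N sample). The
    non-text type with the highest parse success rate wins, provided
    that rate is greater than threshold; otherwise the column is text.
    """
    sample = values[::step]
    if not sample:
        return "text"
    best_name, best_rate = "text", 0.0
    for name, cast in CANDIDATES:
        rate = sum(_parses(cast, v) for v in sample) / len(sample)
        # Strict comparison: on a tie the earlier (narrower) type is kept.
        if rate > best_rate:
            best_name, best_rate = name, rate
    return best_name if best_rate > threshold else "text"
```

With this sketch, a column of '1', '2', '3' comes out as int, a column that mixes '1' and '2.5' comes out as float, and a column that is mostly letters stays text.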


Why stop there? You could shrink the sample fraction 1/N even further by 
straightforward application of Bayesian statistics, using results from 
previous tables as priors.
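
One way the idea might look in code, as a rough sketch rather than a worked-out method. Everything here is an assumption made for illustration: the three hypotheses (int, float, text), the numbers in the LIKELIHOOD table, the add-one smoothing of the prior, and the function names.

```python
import math

TYPES = ("int", "float", "text")

# Assumed P(observed kind | true column type). Illustrative numbers only;
# in practice these would be estimated from previously typed tables.
LIKELIHOOD = {
    "int":   {"int": 0.95, "float": 0.025, "text": 0.025},
    "float": {"int": 0.30, "float": 0.65,  "text": 0.05},
    "text":  {"int": 0.10, "float": 0.05,  "text": 0.85},
}


def observed_kind(text):
    """Return the narrowest type the string parses as."""
    for kind, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return kind
        except ValueError:
            pass
    return "text"


def prior_from_counts(counts):
    """Turn column-type counts from earlier tables into a prior.

    Add-one smoothing keeps every hypothesis possible.
    """
    total = sum(counts.get(t, 0) + 1 for t in TYPES)
    return {t: (counts.get(t, 0) + 1) / total for t in TYPES}


def posterior(prior, sample):
    """Posterior over column types after seeing the sampled strings."""
    log_post = {t: math.log(prior[t]) for t in TYPES}
    for value in sample:
        kind = observed_kind(value)
        for t in TYPES:
            log_post[t] += math.log(LIKELIHOOD[t][kind])
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_post.values())
    weights = {t: math.exp(lp - peak) for t, lp in log_post.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}
```

If earlier tables were mostly integer columns, the same few sampled values push the posterior for int higher than a flat prior would, so sampling could stop sooner for a given confidence level.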


James


