converting strings to their most efficient types '1' --> 1, 'A' --> 'A', '1.2' --> 1.2

John Machin sjmachin at lexicon.net
Sat May 19 20:18:09 EDT 2007


On 19/05/2007 9:17 PM, James Stroud wrote:
> John Machin wrote:
>> The approach that I've adopted is to test the values in a column for 
>> all types, and choose the non-text type that has the highest success 
>> rate (provided the rate is greater than some threshold e.g. 90%, 
>> otherwise it's text).
>>
>> For large files, taking a 1/N sample can save a lot of time with 
>> little chance of misdiagnosis.
> 
> 
> Why stop there? You could lower the minimum 1/N by straightforward 
> application of Bayesian statistics, using results from previous tables 
> as priors.
> 

The example I gave related to one file out of several files prepared at 
the same time by the same organisation from the same application by the 
same personnel using the same query tool for a yearly process which has 
been going on for several years. All files for a year should be in the 
same format, and the format should not change year by year, and the 
format should match the agreed specifications ... but this doesn't 
happen. Against that background, please explain to me how I can use 
"results from previous tables as priors".
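
For anyone following along, here is a rough sketch of the per-column approach quoted above: try each candidate type on the values, take the non-text type with the highest success rate, and fall back to text unless that rate beats the threshold. The names (CANDIDATES, success_rate, guess_column_type, sample_step), the single ISO date format, and reading "1/N sample" as "every Nth value" are all assumptions made for illustration, not the code actually used on those files.

```python
from datetime import datetime

# Hypothetical candidate types, listed most specific first so that a tie
# (every int also parses as a float) resolves to the narrower type.
CANDIDATES = [
    ("int", int),
    ("float", float),
    # Assumed date format; real files would need whatever formats they use.
    ("date", lambda s: datetime.strptime(s, "%Y-%m-%d")),
]


def success_rate(values, convert):
    """Fraction of values that convert() accepts without ValueError."""
    if not values:
        return 0.0
    ok = 0
    for v in values:
        try:
            convert(v.strip())
            ok += 1
        except ValueError:
            pass
    return ok / len(values)


def guess_column_type(values, threshold=0.9, sample_step=1):
    """Guess a column's type from its string values.

    sample_step=N tests every Nth value only (a simple 1/N sample).
    Returns "text" unless some non-text type succeeds on more than
    `threshold` of the tested values.
    """
    sample = values[::sample_step]
    rates = [(success_rate(sample, conv), name) for name, conv in CANDIDATES]
    # max() keeps the first of equal maxima, so list order breaks ties.
    best_rate, best_name = max(rates, key=lambda r: r[0])
    return best_name if best_rate > threshold else "text"
```

Something like guess_column_type(column, threshold=0.9, sample_step=20) would then test one value in twenty. Note that float() also accepts strings such as 'nan' and 'inf', so a real version would probably want stricter converters.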

Cheers,
John


