converting strings to their most efficient types: '1' --> 1, 'A' --> 'A', '1.2' --> 1.2

John Machin sjmachin at lexicon.net
Sat May 19 21:16:06 EDT 2007


On 19/05/2007 3:14 PM, Paddy wrote:
> On May 19, 12:07 am, py_genetic <conor.robin... at gmail.com> wrote:
>> Hello,
>>
>> I'm importing large text files of data using csv.  I would like to add
>> some more auto-sensing abilities.  I'm considering sampling the data
>> file and doing some fuzzy-logic scoring on the attributes (columns in a
>> database/CSV file, e.g. height, weight, income) to determine the
>> most efficient type to convert each attribute column into, for further
>> processing and efficient storage.
>>
>> Example rows from the sampled file data:
>> [['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000'], [next row...], ...]
>>
>> Apart from a missing-attribute designator, we can assume that the same
>> type of data continues through a column, for example a string, int8,
>> int16, float, etc.
>>
>> 1. What is the most efficient way in Python to test whether a string
>> can be converted into a given numeric type, or should be left alone if
>> it's really a string like 'A' or 'hello'?  Speed is key.  Any thoughts?
>>
>> 2. Is there anything out there already which deals with this issue?
>>
>> Thanks,
>> Conor
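For question 1, a minimal sketch (the name convert_value is hypothetical, not from any library): try the strict conversions in order and fall back to the original string. The try/except style lets int() and float() do the validation themselves, so no separate parser is needed.

```python
def convert_value(text):
    """Return text converted to int, else float, else unchanged (str).

    int is tried first so that '8' becomes 8 rather than 8.0.
    """
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass  # not this type; try the next, wider one
    return text


row = ['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000']
print([convert_value(v) for v in row])
# [8, 2.33, 'A', 'BB', 'hello there', '100,000,000,000']
```

Note that int() rejects thousands separators, so '100,000,000,000' stays a string unless the commas are stripped first, and that float() also accepts words such as 'nan' and 'inf', which may or may not be wanted in a text column.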
> 
> You might try investigating what generates your data. With luck,
> it could turn out that the data generator is methodical and column
> data types are consistent and easily determined by testing the
> first or second row. At worst, you will get to know how much you
> must check for human errors.
> 
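Paddy's check of the first rows and Conor's sampling idea can both be sketched as one routine: for each column, pick the narrowest of int, float and str that accepts every sampled value, skipping a missing-value marker (assumed here to be the empty string). The helper int_width, also hypothetical, maps an integer column's range onto the int8/int16/... widths Conor mentions. All names are illustrative, not from any library.

```python
import csv
import io


def _accepts(convert, text):
    """True if convert(text) succeeds, e.g. _accepts(int, '8')."""
    try:
        convert(text)
    except ValueError:
        return False
    return True


def infer_column_types(rows, missing=("",)):
    """Guess int, float or str for each column of a sample of rows.

    Values listed in `missing` are skipped, so a blank cell does not
    force a numeric column to str. A column with no usable values
    defaults to str.
    """
    ncols = max((len(row) for row in rows), default=0)
    types = []
    for col in range(ncols):
        values = [row[col] for row in rows
                  if col < len(row) and row[col] not in missing]
        if values and all(_accepts(int, v) for v in values):
            types.append(int)
        elif values and all(_accepts(float, v) for v in values):
            types.append(float)
        else:
            types.append(str)
    return types


def int_width(values):
    """Smallest signed width in bits (8/16/32/64) holding all values,
    or None if even 64 bits is too small."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        limit = 1 << (bits - 1)
        if -limit <= lo and hi < limit:
            return bits
    return None


sample = "height,weight,name\n180,72.5,Ann\n165,,Bob\n"
rows = list(csv.reader(io.StringIO(sample)))[1:]  # drop header row
print(infer_column_types(rows))  # [<class 'int'>, <class 'float'>, <class 'str'>]
print(int_width([180, 165]))     # 16
```

The result is only as good as the sample: a column whose first rows all look like integers can still hold '12.5' or 'N/A' further down.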

Here you go, Paddy: the following has been generated very methodically.
What data type is the first column? What is the value in the first
column of the 6th row likely to be?

"$39,082.00","$123,456.78"
"$39,113.00","$124,218.10"
"$39,141.00","$124,973.76"
"$39,172.00","$125,806.92"
"$39,202.00","$126,593.21"

N.B. I've kindly given you five lines instead of one or two :-)
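These five rows show why sniffing the text alone is risky. A minimal sketch (helper names hypothetical): stripped of '$' and ',', the first column parses cleanly as money, yet the same numbers are also consistent with spreadsheet day serial numbers counted from a 1899-12-30 epoch. That reading is an assumption for illustration; the post does not say what the column holds.

```python
import csv
import io
from datetime import date, timedelta

# The five rows above, verbatim.
DATA = '''"$39,082.00","$123,456.78"
"$39,113.00","$124,218.10"
"$39,141.00","$124,973.76"
"$39,172.00","$125,806.92"
"$39,202.00","$126,593.21"
'''


def parse_money(text):
    """'$39,082.00' -> 39082.0: drop the currency sign and thousands commas."""
    return float(text.replace("$", "").replace(",", ""))


def serial_to_date(serial):
    """Treat a number as a spreadsheet day serial (1900 date system).

    The 1899-12-30 epoch gives correct dates for serials from
    1 March 1900 onward, which covers these values.
    """
    return date(1899, 12, 30) + timedelta(days=int(serial))


for first, _ in csv.reader(io.StringIO(DATA)):
    amount = parse_money(first)
    print(first, "->", amount, "or", serial_to_date(amount))
```

Under that assumption the first column decodes to 2006-12-31, 2007-01-31, 2007-02-28, 2007-03-31 and 2007-04-30, i.e. month-ends, so if the pattern held the 6th row would be "$39,233.00" (2007-05-31). A guesser looking only at the text would call the column currency.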

Cheers,
John


