converting strings to their most efficient types '1' --> 1, 'A' ---> 'A', '1.2'---> 1.2

Paddy paddy3118 at googlemail.com
Sat May 19 01:14:40 EDT 2007


On May 19, 12:07 am, py_genetic <conor.robin... at gmail.com> wrote:
> Hello,
>
> I'm importing large text files of data using csv.  I would like to add
> some more auto-sensing abilities.  I'm considering sampling the data
> file and doing some fuzzy logic scoring on the attributes (cols in a
> database/csv file, e.g. height, weight, income, etc.) to determine the
> most efficient 'type' to convert the attribute col into for further
> processing and efficient storage...
>
> Example row from sampled file data: [ ['8', '2.33', 'A', 'BB', 'hello
> there', '100,000,000,000'], [next row...] ....]
>
> Aside from a missing attribute designator, we can assume that the same
> type of data continues through a col.  For example, a string, int8,
> int16, float, etc.
>
> 1. What is the most efficient way in Python to test whether a string
> can be converted into a given numeric type, or left alone if it's
> really a string like 'A' or 'hello'?  Speed is key.  Any thoughts?
>
> 2. Is there anything out there already which deals with this issue?
>
> Thanks,
> Conor

You might try investigating what can generate your data. With luck,
it could turn out that the data generator is methodical and column
data-types are consistent and easily determined by testing the
first or second row. At worst, you will get to know how much you
must check for human errors.
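
If the columns do turn out to be consistent, something like the following
might be a starting point. It is only a sketch: the function names are
made up for illustration, it assumes that one sample row is enough to
decide each column's type, and it leaves a value as a string whenever the
chosen conversion fails. Note that a field such as '100,000,000,000' stays
a string here, because int() and float() do not accept thousands
separators.

```python
def to_number_or_str(text):
    """Return an int if possible, else a float, else the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def guess_column_casts(sample_row):
    """Pick one type (int, float or str) per column from a single sample row."""
    return [type(to_number_or_str(field)) for field in sample_row]


def convert_rows(rows, casts):
    """Apply the per-column casts; keep the raw string if a cast fails."""
    for row in rows:
        converted = []
        for cast, field in zip(casts, row):
            try:
                converted.append(cast(field))
            except ValueError:
                # Probably a human error or an odd value; leave it alone.
                converted.append(field)
        yield converted


sample = ['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000']
casts = guess_column_casts(sample)
first = next(convert_rows([sample], casts))
# first == [8, 2.33, 'A', 'BB', 'hello there', '100,000,000,000']
```

Deciding the casts once and reusing them for every row avoids retrying
int() and then float() on each field, which matters when speed is key.
Counting how often the fallback branch is hit would also tell you how
much human error is in the file.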

- Paddy.



