very high-level IO functions?

Tom Anderson twic at urchin.earth.li
Tue Sep 20 06:00:01 EDT 2005


On Mon, 19 Sep 2005, Bruno Desthuilliers wrote:

> York wrote:
> (snip)
>
>> I love python. However, as a biologist, I like some high-level 
>> functions in R. I don't want to spend my time parsing a data file.
>
> http://www.python.org/doc/current/lib/module-csv.html
>
>> Then in my python script, I call R to read the data file and write them 
>> into a MySQL table. If python can do this easily, I don't need R at 
>> all.
>
> So you don't need R at all.

Did you even read the OP's post? Specifically, this bit:

R language has very high-level IO functions, its read.table can read a 
total .csv file and recognize the types of each column.
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python's csv module gives you lists of strings; it makes no effort to 
recognise the types of the data. AFAIK, python doesn't have any IO 
facilities like this.
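A quick check of what csv.reader actually hands back makes the point concrete. The sample data below is made up for illustration, and io.StringIO stands in for a real file (which, in Python 3, should be opened with newline=''):

```python
import csv
import io

# Made-up sample data; io.StringIO stands in for open("data.csv", newline="").
sample = io.StringIO("name,age,height\nalice,34,1.68\nbob,29,1.82\n")

rows = list(csv.reader(sample))

# Every row is a list, and every cell is still a string: no type detection.
print(rows[1])  # ['alice', '34', '1.68']
print(all(isinstance(cell, str) for row in rows for cell in row))  # True
```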

Larry's point that automagical type detection is risky because it can make 
mistakes is a good one, but that doesn't mean that magic is useless; on 
the contrary, in the majority of cases it works fine and is extremely 
convenient.

The good news is that it's reasonably easy to write such a function: you 
just need a function 'type_convert' which takes a string and returns an 
object of the right type; then you can do:

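import csv

def read_table(f):
    # f: an open file (in Python 3, opened with newline='') or any iterable
    # of lines. A list comprehension, rather than map, gives a real list for
    # each row under both Python 2 and Python 3.
    for row in csv.reader(f):
        yield [type_convert(cell) for cell in row]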

This is a very, very rough cut: it doesn't strip comments, skip blank 
lines, handle a header line or cope with different separators, etc., but 
all of that is pretty easy to add. Also, note that read_table is a 
generator, so it returns an iterator rather than a list; use 
list(read_table(f)) if you want an actual list, or change the 
implementation of the function.
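Here is one possible sketch of those additions, written for Python 3 and not taken from the post. The parameter names loosely mirror the header, sep and comment.char arguments of R's read.table, and `convert` is whatever per-cell function you like (type_convert, say). Blank lines and full-line comments are filtered line by line before the csv module sees them, which assumes no quoted field spans several lines; `_data_lines` and `simple_convert` are hypothetical helpers for this example.

```python
import csv
import io


def _data_lines(f, comment):
    # Drop blank lines and full-line comments before csv sees them.
    for line in f:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            yield line


def read_table(f, convert, sep=",", header=False, comment="#"):
    """Yield converted rows: lists, or dicts keyed by column name if header=True."""
    reader = csv.reader(_data_lines(f, comment), delimiter=sep)
    names = next(reader, None) if header else None
    for row in reader:
        values = [convert(cell) for cell in row]
        yield dict(zip(names, values)) if names is not None else values


def simple_convert(s):
    # Stand-in for type_convert, just for this demo.
    return int(s) if s.isdigit() else s


sample = io.StringIO("# measurements\nname\tcount\n\nalpha\t3\nbeta\t5\n")
for record in read_table(sample, simple_convert, sep="\t", header=True):
    print(record)
# {'name': 'alpha', 'count': 3}
# {'name': 'beta', 'count': 5}
```

Returning dicts when there is a header is just one choice; it matches what the standard library's csv.DictReader does, but keeps the per-cell conversion.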

type_convert is itself fairly simple:

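def _bool(s):
    # Helper function for booleans: accepts "true"/"false" in any case.
    s = s.lower()
    if s == "true":
        return True
    elif s == "false":
        return False
    else:
        raise ValueError(s)

# Converters are tried in order; str always succeeds, so it must come last.
types = (int, float, complex, _bool, str)

def type_convert(s):
    for convert in types:
        try:
            return convert(s)
        except ValueError:
            pass
    # Unreachable while str is in types; kept as a safeguard.
    raise ValueError(s)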
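Larry's worry is easy to reproduce with this version: a zip code such as "02134" becomes the int 2134, and an R-style "NA" or an empty cell stays a string in an otherwise numeric column. One hypothetical way to be more cautious, not from the original post, is to map missing values to None and leave zero-padded digit strings as text. The NA markers chosen here are an assumption, and `cautious_type_convert` is an illustrative name:

```python
# Hypothetical, more cautious variant of the post's type_convert.

NA_STRINGS = {"", "NA"}  # assumption: R-style missing-value markers


def _bool(s):
    # Same boolean helper as in the post.
    s = s.lower()
    if s == "true":
        return True
    if s == "false":
        return False
    raise ValueError(s)


CONVERTERS = (int, float, complex, _bool)


def cautious_type_convert(s):
    text = s.strip()
    if text in NA_STRINGS:
        return None  # missing value, like R's NA
    if len(text) > 1 and text.isdigit() and text.startswith("0"):
        return s  # zero-padded identifier such as a zip code: keep as text
    for convert in CONVERTERS:
        try:
            return convert(text)
        except ValueError:
            pass
    return s  # no numeric or boolean reading fits: keep the original string


for cell in ["42", "02134", "NA", "", "3.5", "TRUE", "BRCA1"]:
    print(repr(cell), "->", repr(cautious_type_convert(cell)))
```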

This whole thing isn't quite as sophisticated as R's type.convert; R 
reads the whole table in, then tries to find a type for each column that 
will fit all the values in that column, whereas I convert each cell 
individually. Again, it wouldn't be too hard to do this the other way 
round.

Anyway, hope this helps. Bear in mind that there are python bindings for 
the R engine, so you could just use R's version of read.table in python.
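For the OP's original goal of writing the parsed rows into a MySQL table, any DB-API driver can take the converted rows through executemany. The sketch below uses the standard library's sqlite3 module as a stand-in, since a MySQL driver isn't available everywhere. `load_into_table` is a hypothetical helper that assumes the first row holds trusted column names (they are pasted into the SQL text). With a MySQL driver such as MySQLdb the placeholders would be %s rather than ?, and since sqlite3 cannot store complex numbers, the demo uses a simpler converter than type_convert.

```python
import csv
import io
import sqlite3


def load_into_table(conn, table, f, convert):
    """Hypothetical helper: create `table` from the CSV header, insert the rest.

    Table and column names are interpolated into the SQL text, so they must
    come from a trusted source. Values go through DB-API placeholders.
    """
    reader = csv.reader(f)
    names = next(reader)
    columns = ", ".join(names)
    placeholders = ", ".join("?" for _ in names)  # MySQLdb would use %s
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        ([convert(cell) for cell in row] for row in reader),
    )
    conn.commit()


def simple_convert(s):
    # Stand-in for type_convert; sqlite3 cannot store complex numbers.
    return int(s) if s.isdigit() else s


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = io.StringIO("name,n\nalpha,3\nbeta,5\n")
    load_into_table(conn, "samples", sample, simple_convert)
    print(conn.execute("SELECT name, n FROM samples").fetchall())
```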

tom

-- 
Don't trust the laws of men. Trust the laws of mathematics.

