Reading 'scientific' csv using Pandas?

Peter Otten __peter__ at web.de
Mon Nov 19 06:32:35 EST 2018


Martin Schöön wrote:

> My pandas is up to date.
> 
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfect for me too.
> 
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
> 
> col1	col2
> 1,1	0
> 10,24e-05	1
> 9,492e-10	2
> 
> I use
> 
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
> decimal=',', engine='python')
> 
> to read from such files. This works so so. 'Common floats' (3,1415 etc)
> works just fine but 'scientific' stuff (1,6023e23) does not work.

With

>>> with open("file.csv", "w") as f:
...     f.write("col1\tcol2\n"
... "1,1\t0\n"
... "10,24e-05\t1\n"
... "9,492e-10\t2\n")
... 
40

the following works on my system:

>>> pd.read_csv("file.csv", delimiter="\t", decimal=",")
           col1  col2
0  1.100000e+00     0
1  1.024000e-04     1
2  9.492000e-10     2

[3 rows x 2 columns]

The version is a bit old, though:

>>> pd.__version__
'0.13.1'

The engine="python" produces an exception over here:

"""
ValueError: The 'decimal' option is not supported with the 'python' engine
"""

Maybe you can try and omit that option?
If that doesn't work you can specify a converter:

>>> pd.read_csv("file.csv", sep="\t", converters={0: lambda s: 
float(s.replace(",", "."))})
           col1  col2
0  1.100000e+00     0
1  1.024000e-04     1
2  9.492000e-10     2

[3 rows x 2 columns]





More information about the Python-list mailing list