Most efficient way to replace "," with "." in an array and/or dataframe

MRAB python at mrabarnett.plus.com
Sat Sep 21 21:50:54 EDT 2019


On 2019-09-22 00:42, Markos wrote:
> Hi,
> 
> I have a table.csv file with the following structure:
> 
> , Polyarene conc ,, mg L-1 ,,,,,,,
> Spectrum, Py, Ace, Anth,
> 1, "0,456", "0,120", "0,168"
> 2, "0,456", "0,040", "0,280"
> 3, "0,152", "0,200", "0,280"
> 
> I open it as a dataframe with the command:
> 
> data = pd.read_csv ('table.csv', sep = ',', skiprows = 1)
> 
> and the variable "data" has the structure:
> 
> Spectrum,  Py,  Ace, Anth,
> 0  1         0,456  0,120  0,168
> 1  2         0,456 0,040 0,280
> 2  3         0,152 0,200 0,280
> 
> I copy the numeric fields to an array with the command:
> 
> data_array = data.values [:, 1:]
> 
> And the data_array variable gets the fields in string format:
> 
> [['0,456' '0,120' '0,168']
> ['0,456' '0,040' '0,280']
> ['0,152' '0,200' '0,280']]
> 
> The only way I found to change comma "," to dot "." was using the method
> replace():
> 
> for i, line in enumerate (data_array):
>     data_array [i] = ([float (element.replace (',', '.')) for element in
>                        data_array [i]])
> 
> But I'm wondering if there is another, more "efficient" way to make this
> change without having to iterate over all elements of the array with a
> "for" loop.
> 
> I'm also wondering if there would be any benefit in making this
> modification in the dataframe before extracting the numeric fields to the
> array.
> 
> Any comments or tips, please?
> 
I'd suggest doing all of the replacements in the CSV file first, 
something like this:

import re

with open('table.csv') as file:
    csv_data = file.read()

# Convert the decimal commas to decimal points and drop the surrounding
# quotes so that the values parse as numbers.
csv_data = re.sub(r'"(-?\d+),(\d+)"', r'\1.\2', csv_data)

with open('fixed_table.csv', 'w') as file:
    file.write(csv_data)
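
As a quick check of that substitution, here is a minimal sketch that applies
the same pattern to the sample rows in memory instead of to a file on disk.
The `sample` string and the `fix_decimal_commas` helper name are only for
illustration, and `skipinitialspace=True` is there because the sample has a
space after each separator.

```python
import csv
import io
import re

# Data rows copied from the question (an in-memory stand-in for table.csv).
sample = '''1, "0,456", "0,120", "0,168"
2, "0,456", "0,040", "0,280"
3, "0,152", "0,200", "0,280"
'''


def fix_decimal_commas(text):
    """Turn quoted decimal-comma values such as "0,456" into bare 0.456."""
    return re.sub(r'"(-?\d+),(\d+)"', r'\1.\2', text)


fixed = fix_decimal_commas(sample)

# Parse the fixed text; skip the first column (the spectrum number) and
# convert the remaining fields to float.
rows = [
    [float(field) for field in row[1:]]
    for row in csv.reader(io.StringIO(fixed), skipinitialspace=True)
]
print(rows)
```

Once the quotes and decimal commas are gone, the fields are ordinary numeric
text, so no per-element replace() is needed after loading.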

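On the question of doing the conversion in the dataframe: pandas can also
handle decimal commas while reading, through the `decimal` parameter of
read_csv. This is a sketch only, assuming the file really looks like the
sample in the question. The in-memory `sample` string stands in for
table.csv, and `skipinitialspace=True` is my assumption, added so that the
space after each comma does not stop the quotes from being recognised.

```python
import io

import pandas as pd

# Sample copied from the question (an in-memory stand-in for table.csv).
sample = ''', Polyarene conc ,, mg L-1 ,,,,,,,
Spectrum, Py, Ace, Anth,
1, "0,456", "0,120", "0,168"
2, "0,456", "0,040", "0,280"
3, "0,152", "0,200", "0,280"
'''

# decimal=',' asks pandas to treat "0,456" as 0.456 while parsing.
data = pd.read_csv(
    io.StringIO(sample),
    skiprows=1,
    decimal=',',
    skipinitialspace=True,
)

# Select the numeric columns by name and extract them as a float array.
data_array = data[['Py', 'Ace', 'Anth']].to_numpy()
print(data_array.dtype)
print(data_array)
```

If that works on the real file, neither the regex pass nor the loop is
needed, since the columns arrive as floats.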