issue in handling CSV data

Sharan Basappa sharan.basappa at gmail.com
Sat Sep 7 22:12:05 EDT 2019


On Saturday, 7 September 2019 21:18:11 UTC-4, MRAB  wrote:
> On 2019-09-08 01:19, Sharan Basappa wrote:
> > I am trying to read a log file that is in CSV format.
> > 
> > The code snippet is below:
> > 
> > ###############################
> > import matplotlib.pyplot as plt
> > import seaborn as sns; sns.set()
> > import numpy as np
> > import pandas as pd
> > import os
> > import csv
> > from numpy import genfromtxt
> > 
> > # read the CSV and get into X array
> > os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> > X = []
> > #with open("constraints.csv", 'rb') as csvfile:
> > #    reader = csv.reader(csvfile)
> > #    data_as_list = list(reader)
> > #myarray = np.asarray(data_as_list)
> > 
> > my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> > print (my_data)
> > 
> > my_data_1 = np.delete(my_data, 0, axis=1)
> > print (my_data_1)
> > 
> > my_data_2 = np.delete(my_data_1, 0, axis=1)
> > print (my_data_2)
> > 
> > my_data_3 = my_data_2.astype(np.float)
> > ################################
> > 
> > Here is what print (my_data_2) shows:
> > ##############################
> > [['"\t"81' '"\t5c']
> >   ['"\t"04' '"\t11']
> >   ['"\t"e1' '"\t17']
> >   ['"\t"6a' '"\t6c']
> >   ['"\t"53' '"\t69']
> >   ['"\t"98' '"\t87']
> >   ['"\t"5c' '"\t4b']
> > ##############################
> > 
> > Finally, I am trying to get rid of the strings and get an array of numbers using NumPy's astype function. At this stage, I get an error.
> > 
> > This is the error:
> > my_data_3 = my_data_2.astype(np.float)
> > could not convert string to float: " "81
> > 
> > As you can see, the string "\t"81 is causing the error.
> > It seems to be due to char "\t".
> > 
> > I don't know how to resolve this.
> > 
> > Thanks for your help.
> > 
> Are you sure it's CSV (Comma-Separated Values) and not TSV (Tab-Separated 
> Values)?
> 
> Also the values look like hexadecimal to me. I think that 
> .astype(np.float) assumes that the values are decimal.
> 
> I'd probably start by reading them using the csv module, convert the 
> values to decimal, and then pass them on to numpy.
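
A minimal sketch of that suggestion, under stated assumptions: the real constraints.csv is not shown in the thread, so SAMPLE is a made-up stand-in whose last two fields mimic the printed values, and parse_hex_field and load_hex_columns are hypothetical helper names. It reads rows with the csv module, strips the stray quotes and tabs, parses what is left as hexadecimal, and only then builds the NumPy array.

```python
import csv
import io

import numpy as np

# Assumed sample data: the real constraints.csv is not shown in the thread.
# The first two columns are dropped, mirroring the two np.delete calls.
SAMPLE = 'id0,n0,"\t"81,"\t5c\nid1,n1,"\t"04,"\t11\n'


def parse_hex_field(field):
    """Strip stray quotes and whitespace, then parse the rest as hexadecimal."""
    return int(field.replace('"', '').strip(), 16)


def load_hex_columns(fileobj, skip_columns=2):
    """Read CSV rows and convert every field after skip_columns from hex."""
    # QUOTE_NONE leaves quote characters in the fields, so they can be
    # removed explicitly instead of being interpreted by the csv module.
    reader = csv.reader(fileobj, quoting=csv.QUOTE_NONE)
    rows = [[parse_hex_field(f) for f in row[skip_columns:]]
            for row in reader if row]
    return np.array(rows, dtype=float)


data = load_hex_columns(io.StringIO(SAMPLE))
print(data)
```

With a real file, the io.StringIO object would be replaced by an open file handle (opened in text mode with newline=''), and skip_columns adjusted to the actual layout.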

Yes, it is CSV. The commas are gone once csv.reader has processed the file.
The tabs are still there, though, and they seem to be what is causing the issue.
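
If the array from genfromtxt is kept as it is, the leftover quote and tab characters can also be removed afterwards. The following is only a sketch: my_data_2 is rebuilt by hand from the first two printed rows, and the elements are assumed to be str rather than bytes.

```python
import numpy as np

# Assumed stand-in for my_data_2, copied from the first two printed rows.
my_data_2 = np.array([['"\t"81', '"\t5c'],
                      ['"\t"04', '"\t11']])

# Remove leading and trailing quote and tab characters from every element.
cleaned = np.char.strip(my_data_2, '"\t')

# Parse each remaining token as hexadecimal, then restore the 2-D shape.
my_data_3 = np.array([int(tok, 16) for tok in cleaned.ravel()],
                     dtype=float).reshape(cleaned.shape)
print(my_data_3)
```

If genfromtxt returns bytes instead of str, the elements would need decoding first (for example with np.char.decode) before this works.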

Thanks for your response


