[SciPy-user] What can be improved ?
Stef Mientki
S.Mientki at ru.nl
Wed May 16 05:38:53 EDT 2007
hello,
I've just written a function,
(with a lot of trial and error,
converting strings to float, reshaping arrays etc)
to read a tab delimited file, exported from Excel,
and I'm glad it's working ok now.
But I've the unpleasant feeling, that this function is written in a very
clumsy way,
so may I ask some guru for some comment about improvements.
thanks,
Stef Mientki
#
******************************************************************************
#
******************************************************************************
def Read_SenseWear_Tab_File (filename, Print_Info = False):
from scipy import *
from time import strptime
# open the data file and read the column names (and print if desired)
Datafile = open(filename,'r')
line = Datafile.readline()
column_names = line.rstrip('\n').split('\t')
if Print_Info:
for items in column_names: print items
# initialize Number of columns and an empty sample-set
N = len(column_names)
zero_vals = N * [0]
SR = 5
# read the first dataline, to determine the start time
# (we forget this first sampleset)
line = Datafile.readline()
vals = line.rstrip('\n').split('\t')
start = datetime(*strptime(vals[0][0:16], "%Y-%m-%d %H:%M")[0:6])
prev_tyd = 0 # time of the previous sample
# create an empty array
data = asarray([])
sample_reduction = asarray([])
# read and interpretate all lines in file
for line in Datafile:
# remove EOL, split the line on tabs
vals = line.rstrip('\n').split('\t')
# calculate number of minutes from start
tyd = datetime(*strptime(vals[0][0:16], "%Y-%m-%d %H:%M")[0:6])
s = tyd - start
tyd = s.seconds/60 + s.days*24*60
# if there are sample-sets missing, fill them empty sample-sets
# (beware of sample reduction)
if tyd - prev_tyd > 1:
zero_vals = (( tyd - prev_tyd )/SR) * N * [0]
data = r_[data, zero_vals]
prev_tyd = tyd # remember the time of this sample-set
vals[0] = tyd # replace the datetime with number of minutes
# be sure all lines are of equal length
# (sometimes Excel omits the last columns if they are empty)
if len(vals) < N:
vals = vals + ( N- len(vals) )*[0]
# replace empty strings, otherwise float conversion raises an error
for i in range(len(vals)):
if vals[i] == '' : vals[i] = '0'
# convert the string vector to a float vector
# VERY STRANGE: the next 2 operation may not be done at once
vals = asarray(vals)
vals = vals.astype(float)
# append new sampleset, with a sample reduction of 5
sample_reduction = r_ [ sample_reduction, vals ]
if len(sample_reduction) == SR * N:
# reshape sample array, for easy ensemble average
sample_reduction = sample_reduction.reshape(SR, N)
sample_reduction = sample_reduction.mean(0)
# add mean value of SAMPLE_REDUCTION sample-sets to the total array
# and clear the averaging sample-set
data = r_[data, sample_reduction]
sample_reduction = asarray([])
# reshape into N signal vectors
data = data.reshape(size(data)/N,N)
data = transpose(data)
return data
#
******************************************************************************
Kamer van Koophandel - handelsregister 41055629 / Netherlands Chamber of Commerce - trade register 41055629
More information about the SciPy-User
mailing list