Splitting a file from specific column content

Yigit Turgut y.turgut at gmail.com
Sun Jan 22 09:32:36 EST 2012


Hi all,

I have a text file approximately 20mb in size and contains about one
million lines. I was doing some processing on the data but then the
data rate increased and it takes very long time to process. I import
using numpy.loadtxt, here is a fragment of the data ;

0.000006 	 -0.0004
0.000071 	 0.0028
0.000079 	 0.0044
0.000086 	 0.0104
.
.
.

First column is the timestamp in seconds and second column is the
data. File contains 8seconds of measurement, and I would like to be
able to split the file into 3 parts seperated from specific time
locations. For example I want to divide the file into 3 parts, first
part containing 3 seconds of data, second containing 2 seconds of data
and third containing 3 seconds. Splitting based on file size doesn't
work that accurately for this specific data, some columns become
missing and etc. I need to split depending on the column content ;

1 - read file until first character of column1 is 3 (3 seconds)
2 - save this region to another file
3 - read the file where first characters  of column1 are between 3 to
5 (2 seconds)
4 - save this region to another file
5 - read the file where first characters  of column1 are between 5 to
5 (3 seconds)
6 - save this region to another file

I need to do this exactly because numpy.loadtxt or genfromtxt doesn't
get well with missing columns / rows. I even tried the invalidraise
parameter of genfromtxt but no luck.

I am sure it's a few lines of code for experienced users and I would
appreciate some guidance.




More information about the Python-list mailing list