Splitting a file from specific column content

Sun Jan 22 10:19:49 EST 2012

On 22/01/2012 14:32, Yigit Turgut wrote:
> Hi all,
>
> I have a text file approximately 20 MB in size that contains about one
> million lines. I was doing some processing on the data, but then the
> data rate increased and it now takes a very long time to process. I
> import it using numpy.loadtxt; here is a fragment of the data:
>
> 0.000006 	 -0.0004
> 0.000071 	 0.0028
> 0.000079 	 0.0044
> 0.000086 	 0.0104
> .
> .
> .
>
> The first column is the timestamp in seconds and the second column is
> the data. The file contains 8 seconds of measurement, and I would like
> to be able to split it into 3 parts separated at specific time
> locations. For example, I want to divide the file into 3 parts: the
> first containing 3 seconds of data, the second containing 2 seconds
> and the third containing 3 seconds. Splitting based on file size doesn't
> work accurately for this data; some columns go missing, and so on.
> I need to split depending on the column content:
>
> 1 - read the file until the first character of column1 is 3 (3 seconds)
> 2 - save this region to another file
> 3 - read the file where the first character of column1 is between 3
> and 5 (2 seconds)
> 4 - save this region to another file
> 5 - read the file where the first character of column1 is between 5
> and 8 (3 seconds)
> 6 - save this region to another file
>
> I need to do this exactly because numpy.loadtxt and genfromtxt don't
> cope well with missing columns / rows. I even tried the invalid_raise
> parameter of genfromtxt, but no luck.
>
> I am sure it's a few lines of code for experienced users and I would
> appreciate some guidance.
>
Here's a solution in Python 3:

input_path = "..."
section_1_path = "..."
section_2_path = "..."
section_3_path = "..."

with open(input_path) as input_file:
     try:
         line = next(input_file)

         # Copy section 1.
         with open(section_1_path, "w") as output_file:
             while line[0] < "3":
                 output_file.write(line)
                 line = next(input_file)

         # Copy section 2.
         with open(section_2_path, "w") as output_file:
             while line[0] < "5":
                 output_file.write(line)
                 line = next(input_file)

         # Copy section 3.
         with open(section_3_path, "w") as output_file:
             while True:
                 output_file.write(line)
                 line = next(input_file)
     except StopIteration:
         pass
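
One caveat: the code above compares single characters as text, which relies on every timestamp being below 10 seconds. A rough sketch of a variant that parses the first column as a number instead, so the boundaries can be arbitrary values, might look like the following. The function name split_by_time and its parameters are made up for illustration; it assumes the timestamps never decrease, and skipping blank or non-numeric lines is an assumption about what is wanted, not something stated in the original question.

```python
def split_by_time(input_path, boundaries, output_paths):
    """Copy lines from input_path into output_paths, switching to the next
    output file each time the first-column timestamp reaches a boundary.

    Hypothetical helper: boundaries are in seconds and must be ascending,
    and there must be exactly one more output path than boundaries.
    """
    if len(output_paths) != len(boundaries) + 1:
        raise ValueError("need exactly one more output path than boundaries")

    outputs = [open(path, "w") for path in output_paths]
    try:
        section = 0
        with open(input_path) as input_file:
            for line in input_file:
                fields = line.split()
                if not fields:
                    continue  # assumption: blank lines are dropped
                try:
                    timestamp = float(fields[0])
                except ValueError:
                    continue  # assumption: non-numeric lines are dropped
                # Move on to the section this timestamp belongs to.
                while section < len(boundaries) and timestamp >= boundaries[section]:
                    section += 1
                outputs[section].write(line)
    finally:
        for output_file in outputs:
            output_file.close()
```

With the paths defined earlier, the 3 s / 2 s / 3 s split from the question would be split_by_time(input_path, [3.0, 5.0], [section_1_path, section_2_path, section_3_path]).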