Reading data from 2 different files and writing to a single file

Wed Jan 30 08:23:05 EST 2013

On 01/30/2013 05:43 AM, inshu chauhan wrote:
> On Mon, Jan 28, 2013 at 6:05 PM, Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
>
>> On Mon, 28 Jan 2013 14:31:31 +0100, inshu chauhan
>> <insideshoes at gmail.com> declaimed the following in
>> gmane.comp.python.general:
>>
>>> In the code below I am trying to read 2 files f1 and f2, extract some data
>>> from them and then trying to write them into a single file that is 'nf'.
>>>
>>> import cv
>>> f1 = open(r"Z:\modules\Feature_Vectors_300.arff")
>>> f2 = open(r"Z:\modules\Feature_Vectors_300_Pclass.arff")
>>> nf = open(r"Z:\modules\trial.arff", "w")
>>>
>>>
>>> for l in f1:
>>>      sp = l.split(",")
>>
>>          If you are going to be splitting on commas, you might want to read
>> up on the csv (comma-separated values) module.
>>
>
> The csv module has many functions, but they are not of much use to me and
> it makes my programme slower.
>
>>
>>>
>>>      if len(sp)!= 12:
>>>          continue
>>>      else:
>>
>>          Given the apparent block structure, you could drop the
>> continue/else, and more cleanly just use
>>
>
> Yeah, that's right.
>
>>
>>                  if len(sp) == 12:
>>>          ix = sp[0].strip()
>>>          iy = sp[1].strip()
>>>          print ix, iy
>>>
>>>         for s in f2:
>>
>>          It's been mentioned that the indentation is wrong here
>>
>
> I don't know why the indentation is wrong?

Your for statement is not lined up with the print that precedes it.  If 
your code were really that way, you'd be getting an indentation error. 
So we assume it's because your email editor is mangling the code.  Post 
in text email, not in html.
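
To see what I mean, here is a small sketch that feeds a mis-indented
fragment to the compiler. The fragment is made up for illustration (it is
not your program); it only mimics a for statement that is dedented to a
column matching no enclosing block. I wrote it so it runs on Python 3.

```python
# Hypothetical fragment: the "for" sits at 3 spaces, which matches neither
# the "if" (0 spaces) nor the assignment above it (4 spaces).
snippet = (
    "if True:\n"
    "    x = 1\n"
    "   for s in []:\n"
    "        pass\n"
)

try:
    compile(snippet, "<posted code>", "exec")
    result = "compiled"
except IndentationError as exc:
    # The interpreter rejects the code before running any of it
    result = type(exc).__name__

print(result)
```

Nothing in the fragment executes; the error is raised while the source is
still being parsed.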

>
>>
>>>              st = s.split(",")
>>>
>>          csv module again
>>
>>>              if len(st)!= 11:
>>>                  continue
>>>              else:
>>
>>          I'm tempted to repeat the comment on reversing the conditional BUT
>>
>>>                  clas = st[10].strip()
>>>
>>>               print ix, iy, clas
>>>               print >> nf, ix, iy, clas
>>>
>>          The indentation of the print statements is not aligned with the
>> previous assignment -- the effect is the same however as everything
>> under the else is executed anyway.
>>
>>          But as has also been mentioned, ignoring indentation, the apparent
>> algorithm you have here is going to process every line of f2 for the
>> first line of f1 -- and then for later lines in f1 it will find f2 is at
>> the end of file, and do nothing. If it is supposed to process every line
>> of f2 for each line of f1, you'll need to rewind f2.
>>
>
> For that I added a 'break' statement as suggested by Chris in the mails above.
>
>>
>>          If you mean to match one line of f1 with one line of f2, you do not
>> want nested loops. But now you have to define the behavior if one of the
>> two files has a line of the correct length and the other does not. Do you
>> skip both, or read the next line from the wrong-length file? And how will
>> you handle files with different numbers of records?
>>
>
> Yes, actually my program was like this:
> for l in f1:
>      sp = l.split(",")
>
>      if len(sp)!= 12:
>          continue
>      else:
>          ix = sp[0].strip()
>          iy = sp[1].strip()
>
>
> for s in f2:

This is not nested, it's back at the left margin.  Or it could be 
posting wrong because you're still posting in html, instead of plain 
text email.

>      st = s.split(",")
>
>      if len(st)!= 11:
>          continue
>      else:
>          clas = st[10].strip()
>
>      print ix, iy, clas
>      print >> nf, ix, iy, clas
>      break
>
>
> f1.close()
> f2.close()
> nf.close()
>
> I actually don't want nested loops but can't find another way to achieve what
> I want. But for these files I am sure that they have equal lengths, that's
> why I am taking the risk of using nested loops.

You have that backwards.  Because you say you can assume they're the 
same length, you don't need the flexibility (and unreadability) of the 
nested approach.  The zip approach works great, and nested is unnecessary.
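
A minimal sketch of the zip approach, under some assumptions of mine: the
function name pair_lines and the in-memory sample data are made up so the
example runs on its own, and I chose to skip a pair whenever either line
has the wrong number of fields (you may want a different rule). It is
written for Python 3; in Python 2, itertools.izip avoids building the
whole list of pairs in memory.

```python
import io

def pair_lines(f1, f2, out):
    """Walk both files in lockstep, writing 'ix iy clas' per valid pair."""
    written = 0
    for l, s in zip(f1, f2):
        sp = l.split(",")
        st = s.split(",")
        # Assumed policy: drop the pair unless both lines have the
        # expected field counts (12 for f1, 11 for f2).
        if len(sp) != 12 or len(st) != 11:
            continue
        ix = sp[0].strip()
        iy = sp[1].strip()
        clas = st[10].strip()
        out.write("%s %s %s\n" % (ix, iy, clas))
        written += 1
    return written

# In-memory stand-ins for the real .arff files (made-up data)
f1 = io.StringIO("1,2,a,b,c,d,e,f,g,h,i,j\n@header\n3,4,a,b,c,d,e,f,g,h,i,j\n")
f2 = io.StringIO("a,b,c,d,e,f,g,h,i,j,X\n@header\na,b,c,d,e,f,g,h,i,j,Y\n")
nf = io.StringIO()

count = pair_lines(f1, f2, nf)
print(nf.getvalue())
```

With real files you would pass the three objects returned by open() in
place of the StringIO objects. Note that zip stops at the shorter input,
so it silently ignores extra lines if the files turn out to differ in
length after all.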

> Can you suggest any
> different way to go around this problem, which could be flexible and
> non-erroneous?
>
>

-- 
DaveA