Need help with a program

Thu Jan 28 16:22:10 EST 2010

Arnaud Delobelle wrote:
> nn <pruebauno at latinmail.com> writes:
>
> > On Jan 28, 10:50 am, evilweasel <karthikramaswam... at gmail.com> wrote:
> >> I will make my question a little more clearer. I have close to 60,000
> >> lines of the data similar to the one I posted. There are various
> >> numbers next to the sequence (this is basically the number of times
> >> the sequence has been found in a particular sample). So, I would need
> >> to ignore the ones containing '0' and write all other sequences
> >> (excluding the number, since it is trivial) in a new text file, in the
> >> following format:
> >>
> >> >seq59902
> >>
> >> TTTTTTTATAAAATATATAGT
> >>
> >> >seq59903
> >>
> >> TTTTTTTATTTCTTGGCGTTGT
> >>
> >> >seq59904
> >>
> >> TTTTTTTGGTTGCCCTGCGTGG
> >>
> >> >seq59905
> >>
> >> TTTTTTTGTTTATTTTTGGG
> >>
> >> The number next to 'seq' is the line number of the sequence. When I
> >> run the above program, what I expect is an output file that is similar
> >> to the above output but with the ones containing '0' ignored. But, I
> >> am getting all the sequences printed in the file.
> >>
> >> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> >> improve in the next few months. Thanks to all those who replied. I
> >> really appreciate it. :)
> >
> > People have already given you some pointers to your problem. In the
> > end you will have to "tweak the details" because only you have access
> > to the data not us.
> >
> > Just as example here is another way to do what you are doing:
> >
> > with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
> >    partgen=(line.split() for line in infile)
> >    dnagen=(str(i+1)+'\n'+part[0]+'\n'
> >            for i,part in enumerate(partgen)
> >            if len(part)>1 and part[1]!='0')
> >    outfile.writelines(dnagen)
>
> I think that generator expressions are overrated :) What's wrong with:
>
> with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
>     for i, line in enumerate(infile):
>         parts = line.split()
>         if len(parts) > 1 and parts[1] != '0':
>             outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))
>
> (untested)
>
> --
> Arnaud

Nothing really.
After posting I was thinking I should have posted a more
straightforward version like the one you wrote. Now there is one! It
probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting. Since I can't resist the
urge to post crazy code, here goes the bonus round (don't do this at
work):

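# Neither file is closed explicitly; both are only flushed and closed
# when the file objects are garbage collected (one reason to avoid this).
open('dnaout.dat','w').writelines(
   '>seq%s\n%s\n'%(i+1,part[0])
   for i,part in enumerate(line.split() for line in open('dnain.dat'))
   if len(part)>1 and part[1]!='0')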
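
For comparison, here is a tamer sketch of the same "pipe" idea, with one
named generator per stage so each step can be read on its own. It assumes
the input lines look like "SEQUENCE COUNT" separated by whitespace, which
is only a guess at the real file layout. The function name
nonzero_records and the sample data are made up for illustration, and
in-memory io.StringIO objects stand in for dnain.dat and dnaout.dat.

```python
import io

def nonzero_records(lines):
    """Build '>seqN' records for lines whose count field is not '0'."""
    # Stage 1: split each line into whitespace-separated fields.
    parts = (line.split() for line in lines)
    # Stage 2: number every input line from 1, so skipped lines still count.
    numbered = enumerate(parts, start=1)
    # Stage 3: keep lines that have a second field different from '0'.
    kept = ((n, p) for n, p in numbered if len(p) > 1 and p[1] != '0')
    # Stage 4: format each surviving line as a two-line record.
    return ('>seq%s\n%s\n' % (n, p[0]) for n, p in kept)

# Hypothetical sample input (assumed layout: sequence, then count).
sample = io.StringIO(
    "TTTTTTTATAAAATATATAGT 0\n"
    "TTTTTTTATTTCTTGGCGTTGT 3\n"
    "\n"
    "TTTTTTTGGTTGCCCTGCGTGG 12\n"
)
out = io.StringIO()
out.writelines(nonzero_records(sample))
result = out.getvalue()
print(result, end='')
```

With real files, the two StringIO objects would be replaced by the file
objects from a with statement like the one in Arnaud's version, so both
files get closed properly. Nothing is read until writelines starts
pulling from the last stage, so the whole input is never held in memory.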