Need help with a program

Mensanator mensanator at aol.com
Thu Jan 28 13:38:17 EST 2010


On Jan 28, 12:28 pm, Steven Howe <howe.ste... at gmail.com> wrote:
> On 01/28/2010 09:49 AM, Jean-Michel Pichavant wrote:
>
> > evilweasel wrote:
> >> Let me make my question a little clearer. I have close to 60,000
> >> lines of data similar to the sample I posted. There are various
> >> numbers next to the sequence (this is basically the number of times
> >> the sequence has been found in a particular sample). So, I would need
> >> to ignore the ones containing '0' and write all other sequences
> >> (excluding the number, since it is trivial) to a new text file, in the
> >> following format:
> >>
> >> >seq59902
> >> TTTTTTTATAAAATATATAGT
> >>
> >> >seq59903
> >> TTTTTTTATTTCTTGGCGTTGT
> >>
> >> >seq59904
> >> TTTTTTTGGTTGCCCTGCGTGG
> >>
> >> >seq59905
> >> TTTTTTTGTTTATTTTTGGG
> >>
> >> The number next to 'seq' is the line number of the sequence. When I
> >> run the above program, what I expect is an output file that is similar
> >> to the above output but with the ones containing '0' ignored. But, I
> >> am getting all the sequences printed in the file.
>
> >> Kindly excuse the 'newbieness' of the program. :) I am hoping to
> >> improve in the next few months. Thanks to all those who replied. I
> >> really appreciate it. :)
> > Using regexp may increase readability (if you are familiar with it).
> > What about
>
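> > import re
> > import sys
>
> > output = open("sequences1.txt", 'w')
>
> > for index, line in enumerate(open(sys.argv[1], 'r')):
> >     # sequence at the start of the line, whitespace, then a count starting with '1'
> >     match = re.match(r'(?P<sequence>[GATC]+)\s+1', line)
> >     if match:
> >         output.write('seq%s\n%s\n' % (index, match.group('sequence')))
> > output.close()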
>
> > Jean-Michel
>
> Finally!
>
> After reading 8 or 9 messages about finding a line ending with '1', someone
> suggests a regex.
> It was my first thought.

And as a first thought, it is, of course, wrong.

You don't want lines ending in '1'; you want ANY count other than 0.

Likewise, you don't want to exclude lines ending in '0' because
you'll end up excluding counts of 10, 20, 30, etc.
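
To make that concrete, here is a small sketch comparing a last-character
test with a test on the whole count. It assumes each line is a sequence,
whitespace, then an integer count, as in the sample data; the two helper
names are made up for illustration.

```python
def keep_by_last_char(line):
    """Naive test: reject a line if its last character is '0'."""
    return not line.rstrip().endswith('0')


def keep_by_count(line):
    """Numeric test: parse the whole count and reject only a count of 0."""
    # Assumes the count is the last whitespace-separated field on the line.
    return int(line.split()[-1]) != 0


samples = ["TTTTTTTATAAAATATATAGT 0",
           "TTTTTTTATTTCTTGGCGTTGT 1",
           "TTTTTTTGGTTGCCCTGCGTGG 10",
           "TTTTTTTGTTTATTTTTGGG 20"]

for s in samples:
    # The last-character test wrongly drops the 10 and 20 lines.
    print(s, keep_by_last_char(s), keep_by_count(s))
```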

You need a regex that extracts ALL the numeric characters at the end of
the line, and then you exclude the lines whose count evaluates to 0.
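
Something along these lines, as a minimal sketch. It assumes each line is
a sequence, whitespace, then an integer count; that the seq number is the
1-based line number; and that the header is the FASTA-style '>seqN' my
reading of the sample suggests. LINE_RE, nonzero_sequences and write_fasta
are my own names, not from anyone's earlier post.

```python
import re
import sys

# Assumed line layout: sequence, whitespace, integer count, e.g.
# "TTTTTTTATAAAATATATAGT 12". \d+ grabs ALL the trailing digits.
LINE_RE = re.compile(r'^(?P<sequence>[GATC]+)\s+(?P<count>\d+)\s*$')


def nonzero_sequences(lines):
    """Yield (line_number, sequence) for each line whose count is not 0.

    Line numbers start at 1 (an assumption); non-matching lines are skipped.
    """
    for line_number, line in enumerate(lines, 1):
        match = LINE_RE.match(line)
        if match and int(match.group('count')) != 0:
            yield line_number, match.group('sequence')


def write_fasta(pairs, out):
    """Write each pair as a '>seqN' header line followed by the sequence."""
    for line_number, sequence in pairs:
        out.write('>seq%d\n%s\n' % (line_number, sequence))


if __name__ == '__main__':
    # Small in-memory sample with counts 0, 1, 10 and 20.
    sample = ["TTTTTTTATAAAATATATAGT 0\n",
              "TTTTTTTATTTCTTGGCGTTGT 1\n",
              "TTTTTTTGGTTGCCCTGCGTGG 10\n",
              "TTTTTTTGTTTATTTTTGGG 20\n"]
    # Prints >seq2, >seq3 and >seq4 with their sequences; the 0 line is dropped.
    write_fasta(nonzero_sequences(sample), sys.stdout)
```

For the real data, pass an open file instead of the sample list, e.g.
with open(sys.argv[1]) as infile:
write_fasta(nonzero_sequences(infile), sys.stdout), and redirect stdout
to the output file. Converting with int() is what makes 10, 20, 30 and
so on survive while a bare 0 (or 00) is dropped.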

>
> Steven




More information about the Python-list mailing list