Need help with a program

Steven Howe howe.steven at gmail.com
Thu Jan 28 13:28:54 EST 2010


On 01/28/2010 09:49 AM, Jean-Michel Pichavant wrote:
> evilweasel wrote:
>> I will make my question a little clearer. I have close to 60,000
>> lines of the data similar to the one I posted. There are various
>> numbers next to the sequence (this is basically the number of times
>> the sequence has been found in a particular sample). So, I would need
>> to ignore the ones containing '0' and write all other sequences
>> (excluding the number, since it is trivial) in a new text file, in the
>> following format:
>>
>>> seq59902
>> TTTTTTTATAAAATATATAGT
>>
>>> seq59903
>> TTTTTTTATTTCTTGGCGTTGT
>>
>>> seq59904
>> TTTTTTTGGTTGCCCTGCGTGG
>>
>>> seq59905
>> TTTTTTTGTTTATTTTTGGG
>>
>> The number next to 'seq' is the line number of the sequence. When I
>> run the above program, what I expect is an output file that is similar
>> to the above output but with the ones containing '0' ignored. But, I
>> am getting all the sequences printed in the file.
>>
>> Kindly excuse the 'newbieness' of the program. :) I am hoping to
>> improve in the next few months. Thanks to all those who replied. I
>> really appreciate it. :)
> Using regexp may increase readability (if you are familiar with it). 
> What about
>
> import re
> import sys
>
> output = open("sequences1.txt", 'w')
>
> for index, line in enumerate(open(sys.argv[1], 'r')):
>    # a run of G/A/T/C letters, then whitespace, then a count starting with '1'
>    match = re.match(r'(?P<sequence>[GATC]+)\s+1', line)
>    if match:
>        output.write('seq%s\n%s\n' % (index, match.group('sequence')))
>
> output.close()
>
>
> Jean-Michel

Finally!

After reading 8 or 9 messages about finding a line ending with '1', someone 
suggests a regex.
It was my first thought.
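
For what it's worth, here is a minimal sketch of the regex idea that skips
zero counts rather than matching a literal '1'. It assumes each input line
is a sequence of G/A/T/C letters, some whitespace, and a single integer
count; the real file layout was not shown in this message, so that is a
guess. The names LINE_RE, nonzero_sequences and format_records are made up
for illustration, and numbering lines from 1 is also an assumption.

```python
import re

# Assumed line layout (not confirmed in the thread): a run of G/A/T/C
# letters, whitespace, then one integer count at the end of the line.
LINE_RE = re.compile(r'(?P<sequence>[GATC]+)\s+(?P<count>\d+)\s*$')


def nonzero_sequences(lines):
    """Yield (line_number, sequence) for lines whose count is not zero."""
    # Line numbers start at 1 here; that is an assumption, not the OP's spec.
    for number, line in enumerate(lines, 1):
        match = LINE_RE.match(line)
        if match and int(match.group('count')) != 0:
            yield number, match.group('sequence')


def format_records(lines):
    """Return the output text: 'seqN', the sequence, then a blank line."""
    return ''.join('seq%d\n%s\n\n' % pair for pair in nonzero_sequences(lines))
```

To use it on a file, something like
output.write(format_records(open(sys.argv[1]))) should do, after importing
sys and opening the output file as in the quoted code. Lines that do not fit
the assumed layout are silently skipped.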

Steven



