[Tutor] Concatenating multiple lines into one

Mark Lawrence breamoreboy at yahoo.co.uk
Fri Feb 10 18:30:27 CET 2012


On 10/02/2012 17:08, Peter Otten wrote:
> Spyros Charonis wrote:
>
>> Dear python community,
>>
>> I have a file where I store sequences that each have a header. The
>> structure of the file is as such:
>>
>> >sp|(some code) => 1st header
>> ATTTTGGCGG
>> MNKPLOI
>> .....
>> .....
>>
>> >sp|(some code) => 2nd header
>> AAAAAA
>> GGGG ...
>> .........
>>
>> ......
>>
>> I am looking to implement a logical structure that would allow me to group
>> each of the sequences (spread on multiple lines) into a single string. So
>> instead of having the letters spread on multiple lines I would be able to
>> have 'ATTTTGGCGGMNKP....' as a single string that could be indexed.
>>
>> This snippet is good for isolating the sequences (=stripping headers and
>> skipping blank lines) but how could I concatenate each sequence in order
>> to get one string per sequence?
>>
>> >>> for line in align_file:
>> ...     if line.startswith('>sp'):
>> ...             continue
>> ...     elif not line.strip():
>> ...             continue
>> ...     else:
>> ...             print line
>>
>> (... is just OS X terminal notation, nothing programmatic)
>>
>> Many thanks in advance.
>
> Instead of printing the line directly collect it in a list (without trailing
> "\n"). When you encounter a line starting with ">sp" check if that list is
> non-empty, and if so print "".join(parts), assuming the list is called
> parts, and start with a fresh list. Don't forget to print any leftover data
> in the list once the for loop has terminated.

Peter's advice is sound if the strings could grow very large, but if
they will not, you can simply concatenate the parts as you read them.
For the indexing, simply store your data in a dict.
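
As a rough sketch of how the two suggestions might fit together (not
necessarily what either poster had in mind): collect the lines of each
sequence in a list, join them when the next header turns up, and keep the
result in a dict keyed by the header line. The function name
read_sequences, the sample headers and the choice of the full header line
as the key are all made up for illustration. The sketch also assumes
headers are unique; a repeated header would overwrite the earlier entry.

```python
def read_sequences(lines):
    """Map each header line to its sequence joined into one string.

    Hypothetical helper: the name and the use of the whole header
    line as the dict key are assumptions, not from the thread.
    """
    sequences = {}
    header = None   # header of the record currently being read
    parts = []      # lines collected for that record
    for line in lines:
        line = line.strip()          # drops the trailing "\n"
        if not line:
            continue                 # skip blank lines
        if line.startswith('>sp'):
            if header is not None:
                # A new header closes the previous record.
                sequences[header] = ''.join(parts)
            header = line
            parts = []               # start with a fresh list
        elif header is not None:
            parts.append(line)       # data before any header is ignored
    if header is not None:
        # Don't forget the leftover data after the loop ends.
        sequences[header] = ''.join(parts)
    return sequences


# Made-up sample data in the shape described in the question.
sample = """>sp|P1 first header
ATTTTGGCGG
MNKPLOI

>sp|P2 second header
AAAAAA
GGGG
""".splitlines()

seqs = read_sequences(sample)
print(seqs['>sp|P1 first header'][:5])   # each value is an ordinary string
```

With a real file you would pass the open file object (align_file in the
original snippet) instead of sample. For short sequences, replacing the
list with a plain string and += would work too; the list plus join
version just avoids repeated copying when the strings get long.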

-- 
Cheers.

Mark Lawrence.


