[Tutor] Concatenating multiple lines into one

Peter Otten __peter__ at web.de
Fri Feb 10 18:08:28 CET 2012


Spyros Charonis wrote:

> Dear python community,
> 
> I have a file where I store sequences that each have a header. The
> structure of the file is as such:
> 
> >sp|(some code) => 1st header
> ATTTTGGCGG
> MNKPLOI
> .....
> .....
> 
> >sp|(some code) => 2nd header
> AAAAAA
> GGGG ...
> .........
> 
> ......
> 
> I am looking to implement a logical structure that would allow me to group
> each of the sequences (spread on multiple lines) into a single string. So
> instead of having the letters spread on multiple lines I would be able to
> have 'ATTTTGGCGGMNKP....' as a single string that could be indexed.
> 
> This snippet is good for isolating the sequences (= stripping headers and
> skipping blank lines), but how could I concatenate each sequence in order
> to get one string per sequence?
> 
> >>> for line in align_file:
> ...     if line.startswith('>sp'):
> ...             continue
> ...     elif not line.strip():
> ...             continue
> ...     else:
> ...             print line
> 
> (... is just OS X terminal notation, nothing programmatic)
> 
> Many thanks in advance.

Instead of printing the line directly, collect it in a list (without the
trailing "\n"). When you encounter a line starting with ">sp", check whether
that list is non-empty; if it is, print "".join(parts), assuming the list is
called parts, and start over with a fresh list. Don't forget to print any
leftover data in the list once the for loop has terminated.
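
The steps above could be sketched roughly as follows. This is only one
possible reading of the advice, not tested against your real file: the
function name join_sequences and the sample lines are made up for
illustration, and the generator yields each joined string instead of
printing it so that the same code runs under both Python 2 and Python 3.

```python
def join_sequences(lines):
    """Yield each multi-line sequence as one string (hypothetical helper)."""
    parts = []
    for line in lines:
        if line.startswith(">sp"):
            # A header starts a new record: emit the previous one, if any.
            if parts:
                yield "".join(parts)
            parts = []
        elif line.strip():
            # Sequence line: keep it without the trailing newline.
            parts.append(line.strip())
        # Blank lines are skipped.
    # Leftover data after the loop belongs to the last record.
    if parts:
        yield "".join(parts)


# Illustrative input modelled on the file layout described in the question.
sample = [
    ">sp|(some code)\n",
    "ATTTTGGCGG\n",
    "MNKPLOI\n",
    "\n",
    ">sp|(some code)\n",
    "AAAAAA\n",
    "GGGG\n",
]
sequences = list(join_sequences(sample))
```

With a real file you would pass the open file object (align_file in your
snippet) instead of the sample list; every item in sequences is then an
ordinary string that can be indexed or sliced.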


