Regular expression

bearophileHUGS at lycos.com
Wed Jul 16 12:10:44 EDT 2008


On Jul 16, 4:14 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> Beema shafreen wrote:
> > How do I write a regular expression for this kind of sequences
>
> >  >gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]
> > MGNVFANLFKGLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVE
>
> line.split("|") ?
>
> it's a bit hard to come up with a working RE with only a single sample;
> what are the constraints for the different fields?  is the last part
> free form text or something else, etc.
>
> have you googled for existing implementations of the format you're using?

That's a FASTA file, so for the header line this is enough:
[part.strip() for part in line.split("|")]
But it's better to use the Biopython libraries, which already handle
all of this.
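
For illustration, here is a minimal sketch of that split applied to the
sample header from the original question. The function name
parse_fasta_header is hypothetical, and stripping the leading ">" is an
added assumption (the one-liner above leaves it attached to the first
field):

```python
def parse_fasta_header(line):
    """Split a pipe-delimited FASTA header line into stripped fields.

    Hypothetical helper: assumes the line starts with '>' and that the
    fields are separated by '|', as in the sample from the question.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % line)
    # Drop the leading '>' (an assumption), then split and strip each field.
    return [part.strip() for part in line[1:].split("|")]


header = ">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]"
fields = parse_fasta_header(header)
print(fields)
```

With this sample the last field is the free-form description, and the
earlier ones are the database tags and identifiers. For whole files,
Biopython's Bio.SeqIO.parse(handle, "fasta") is probably the safer
route, since it also takes care of the sequence lines.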

Bye,
bearophile


