What strategy for random accession of records in massive FASTA file?

David E. Konerding DSD staff dek at pabst.lbl.gov
Wed Jan 12 19:19:47 EST 2005


In article <1105569967.129284.85470 at c13g2000cwb.googlegroups.com>, Chris Lasher wrote:
> Hello,
> I have a rather large (100+ MB) FASTA file from which I need to
> access records in a random order. The FASTA format is a standard format
> for storing molecular biological sequences. Each record contains a
> header line for describing the sequence that begins with a '>'
> (right-angle bracket) followed by lines that contain the actual
> sequence data. Three example FASTA records are below:

Use Biopython.  It has dictionary-style classes that wrap a FASTA file with an index, so you can look up a record by its ID without reading the whole file.

http://www.biopython.org
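
To make the idea concrete, here is a minimal standard-library sketch of what "wrapping a FASTA file with an index" can mean: scan the file once, remember the byte offset of each header line, then seek straight to a record on demand. This is not Biopython's implementation. The function names (build_fasta_index, fetch_record), the choice of the first word after '>' as the key, and the sample records are assumptions made for illustration only.

```python
import os
import tempfile


def build_fasta_index(path):
    """Scan the file once; map each record ID to the byte offset of its header.

    Assumption: the record ID is the first whitespace-delimited word after '>'.
    If two records share an ID, the later one wins.
    """
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                record_id = fields[0].decode("ascii") if fields else ""
                index[record_id] = offset
    return index


def fetch_record(path, index, record_id):
    """Seek to one record and return (header, sequence) without a full scan."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        header = handle.readline()[1:].decode("ascii").strip()
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.decode("ascii").strip())
    return header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records, written to a temporary file for the demo.
    sample = b">seq1 first example\nACGTAC\nGGTA\n>seq2 second example\nTTGACA\n"
    with tempfile.NamedTemporaryFile(suffix=".fasta", delete=False) as tmp:
        tmp.write(sample)
    try:
        idx = build_fasta_index(tmp.name)
        print(fetch_record(tmp.name, idx, "seq2"))
        print(fetch_record(tmp.name, idx, "seq1"))
    finally:
        os.remove(tmp.name)
```

The index holds only IDs and offsets, so it stays small even for a 100+ MB file, and it could be pickled to disk to avoid rescanning on every run. Biopython's own classes handle parsing details that this sketch ignores.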

Dave


