What strategy for random accession of records in massive FASTA file?

Chris Lasher chris.lasher at gmail.com
Thu Jan 13 10:52:48 EST 2005


>Before you get too carried away, how often do you want to do this and
>how grunty is the box you will be running on?

Oops, I should have specified this. The script will only need to be run
once every three or four months, when the sequences are updated. I'll
be running it on boxes that are 3GHz/100GB RAM, but others may not be
so fortunate, and so I'd like to keep that in mind.

>BTW, you need to clarify "don't have access to an RDBMS" ... surely
>this can only be due to someone stopping them from installing good
>free software freely available on the Internet.

I understand your and others' sentiment on this, and I agree that the
open-source database systems are wonderful. However, keeping it
Python-only saves me hassle: the only help others might need is with
downloading and installing Python. If I keep it in Python, I can even
use py2exe to generate an executable, so they wouldn't need to install
Python at all. A solution that talks to a database is much sexier but,
for the purposes of this script, seems unnecessary. I certainly
appreciate the suggestions, though.

>My guess is that you don't need anything much fancier than the
>effbot's index method -- which by now you have probably found works
>straight out of the box and is more than fast enough for your needs.

I think Mr. Lundh's code will work very well for these purposes. Many
thanks to him for posting it; he'll have full credit for that part of
the code. Thanks very much to all who replied!
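
For anyone reading this in the archive, the index idea can be sketched
roughly as follows. This is only a minimal illustration of the general
technique, not Mr. Lundh's actual code. The function names build_index
and fetch_record are hypothetical, and the sketch assumes that each
record ID is the first whitespace-delimited word after ">" and that IDs
are unique within the file.

```python
def build_index(path):
    """Scan the FASTA file once and map each record ID to the byte
    offset of its header line. (Hypothetical helper, for illustration.)"""
    index = {}
    offset = 0
    # Binary mode keeps offsets exact regardless of newline conventions.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    # Assumption: the ID is the first word after '>'.
                    index[fields[0].decode("utf-8")] = offset
            offset += len(line)
    return index


def fetch_record(path, index, record_id):
    """Seek straight to a record and read until the next header or EOF."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        lines = [handle.readline()]  # the header line itself
        for line in handle:
            if line.startswith(b">"):
                break  # reached the start of the next record
            lines.append(line)
    return b"".join(lines).decode("utf-8")
```

Only the dictionary of IDs and offsets has to stay in memory, so the
same approach should also work on machines with far less RAM than mine.
The index would need rebuilding whenever the sequence file is updated.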

Chris



