[Baypiggies] reading files quickly and efficiently

Mark Voorhies mvoorhie at yahoo.com
Wed Nov 17 22:30:08 CET 2010


On Wednesday, November 17, 2010 01:18:34 pm Tony Cappellini wrote:
> Don't read the entire file into memory.
> 
> readlines() does that.
> 
> Take a look at Dave Beazley's slides on generators and how he
> processes multi-GB sized files.
> http://www.dabeaz.com/generators/
> 
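The generator advice quoted above applies directly to FASTA files like NR. Below is a minimal sketch, written in the spirit of generator-based processing rather than taken from the linked slides. The function name `read_fasta` is hypothetical. It yields one `(header, sequence)` record at a time, so only the current record is ever held in memory.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time (hypothetical helper).

    Pass an open file object: it is consumed line by line, so the whole
    file is never loaded the way readlines() would load it.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip()  # drop trailing newline / carriage return
        if line.startswith(">"):
            # A new header ends the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined once the record is complete.
            chunks.append(line.strip())
    # Emit the final record at end of input.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    demo = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2 second\nMKV\n")
    for header, seq in read_fasta(demo):
        print(header, len(seq))
```

With a real file you would write `with open(path) as handle: for header, seq in read_fasta(handle): ...`, where `path` is whatever FASTA file you are processing.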

For NR (NCBI's non-redundant protein database), it can also be
convenient to convert the FASTA to BLAST database format (via
formatdb or downloading the pre-generated databases from NCBI) and
extract sequences with fastacmd (formatdb and fastacmd are both
included in the NCBI BLAST package).
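The advantage of an indexed database is that you can pull out individual sequences by ID without rescanning the file. The pure-Python sketch below illustrates that general idea with a dict of byte offsets. It is not the BLAST database format and does not reflect how fastacmd works internally. Both function names are hypothetical. Note that for a file the size of NR, the offset dict itself would hold millions of entries.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(handle: BinaryIO) -> Dict[str, int]:
    """Map each record ID (first word of its header) to the byte offset of its '>' line."""
    index: Dict[str, int] = {}
    offset = 0
    # Read in binary mode and count bytes so offsets are valid for seek().
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
        offset += len(line)
    return index


def fetch_sequence(handle: BinaryIO, index: Dict[str, int], record_id: str) -> str:
    """Seek straight to one record and read only its sequence lines."""
    handle.seek(index[record_id])
    handle.readline()  # skip the header line itself
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):  # next record reached
            break
        chunks.append(line.strip())
    return b"".join(chunks).decode()


if __name__ == "__main__":
    data = io.BytesIO(b">a x\nAC\nGT\n>b\nMK\n")
    idx = build_offset_index(data)
    print(idx, fetch_sequence(data, idx, "b"))
```

On disk you would open the FASTA file with `open(path, "rb")` so that byte offsets line up with `seek()`.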

--Mark

