"Python for Bioinformatics" available and in stock

Mon Oct 19 04:43:59 EDT 2009

Sebastian Bassi, this is a piece from #5:

ProtSeq = raw_input("Protein sequence: ").upper()
ProtDeg = {"A":4,"C":2,"D":2,"E":2,"F":2,"G":4,"H":2,
           "I":3,"K":2,"L":6,"M":1,"N":2,"P":4,"Q":2,
           "R":6,"S":6,"T":4,"V":4,"W":1,"Y":2}
SegsValues = []
for aa in range(len(ProtSeq)):

A more Pythonic version is:

prot_seq = raw_input("Protein sequence: ").upper()
prot_deg = {...
segs_values = []
for aa in xrange(len(prot_seq)):

Note the use of xrange (in Python 2 it yields the indices one at a
time instead of building a whole list) and names_with_underscores. In
Python, names are usually lowercase and their parts are separated by
underscores.

From #6:

segsvalues=[]; segsseqs=[]; segment=protseq[:15]; a=0
==>
segs_values = []
segs_seqs = []
segment = prot_seq[:15]
a = 0

If you want to limit the space used in the book, then you can pack
those lines into a single line, but it's better to keep the underscores.

From #18:
prop = 100.*cp/len(AAseq)
return (charge,prop)
==>
prop = 100.0 * cp / len(aa_seq)
return (charge, prop)

Adding spaces around operators and after a comma, and a zero after
the decimal point, improves readability.

From #35:
import re
pattern = "[LIVM]{2}.RL[DE].{4}RLE"
...
rgx = re.compile(pattern)

When the pattern gets more complex, it's better to show readers how to
use a re.VERBOSE pattern: split it over several lines, indent those
lines like a program, and add #comments to them.
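
For example, the pattern from #35 could be written roughly like this.
It's only a sketch: the comments are my reading of each piece, and
test_seq is a made-up sequence used just to exercise the pattern, not
something taken from the book.

```python
import re

# Same pattern as "[LIVM]{2}.RL[DE].{4}RLE", split over several lines.
# With re.VERBOSE, whitespace outside character classes is ignored and
# everything after a # is a comment.
pattern = r"""
    [LIVM]{2}   # two residues, each one of L, I, V or M
    .           # any single residue
    RL          # R followed by L
    [DE]        # D or E
    .{4}        # any four residues
    RLE         # R, L, E
"""
rgx = re.compile(pattern, re.VERBOSE)

# Hypothetical sequence, invented only to show a match.
test_seq = "AAALVKRLDAAAARLEGG"
match = rgx.search(test_seq)
if match:
    print(match.group())
```

The compiled regex behaves like the one-line version; only the source
of the pattern becomes easier to read and to modify.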

#51 is missing.

I like Python and I think it is well suited to bioinformatics, but 3/4
of the purpose of a book like this is to teach bioinformatics first,
and computer science and Python second. And sometimes a dynamic
language isn't fast enough for bioinformatics purposes, so a book on
this topic probably has to contain some pieces of C/D/Java code too,
to show and discuss implementations of algorithms that require heavier
computations (these are often already implemented inside biopython,
etc., but someone has to write those libs too).

The purpose here is not to teach how to write industrial-strength C
libraries to perform those heavier computations, but to give the
reader an idea (in a real lower-level language) of how those libraries
are actually implemented. Science abhors black boxes, so a scientist
must have an idea of how all the subsystems she/he/hir is using work
inside (that's why using Mathematica can be bad for a scientist: such
a person has to write "and here magic happens" in the resulting
paper).

Bye,
bearophile