[Tutor] make code less ugly/more pythonic?

Wed Dec 10 13:47:18 EST 2003

On 10 Dec 2003, ashleigh smythe <- absmythe at ucdavis.edu wrote:

> Greetings!  I've written a little script to process a text file.  It

Daniel has already commented on your script and there's nothing more to add,
so let me just show you an alternative way.

[...]
> The program simply parses a file that has 10,000 iterations of the
> following:

> Strict consensus of 1 tree:

> Statistics derived from consensus tree:

>      Component information (consensus fork) = 185 (normalized = 0.964)
>      Nelson-Platnick term information = 3855
>      Nelson-Platnick total information = 4040
>      Mickevich's consensus information = 0.183
>      Colless weighted consensus fork (proportion max. information) = 0.216
>      Schuh-Farris levels sum = 0 (normalized = 0.000)
>      Rohlf's CI(1) = 0.989
>      Rohlf's -ln CI(2) = 948.111 (CI(2) = 0.00)

> I want to extract that Component information (consensus fork) value (185
> in this instance) from each of the 10,000 "paragraphs" of data in the
> file and get an average of that value.

[Code]

To extract the information I would use a regexp.  If a line matches, the
number is returned and added to an accumulator.  You added the numbers to
a list; I see no need for that, since you don't need the individual
numbers if all you want is the average (arithmetic mean).  You only have
to know how many numbers were found.
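
As a small illustration of why the list isn't needed, here is a sketch
(Python 3, with made-up values rather than real data from the file) that
computes the mean both ways.  The running total and the counter give the
same result, but they use constant memory no matter how many paragraphs
the file has.

```python
# Hypothetical values, for illustration only.
values = [185, 190, 178]

# Approach from the question: keep every number in a list.
stored = list(values)
mean_from_list = sum(stored) / len(stored)

# Approach suggested here: keep only a running total and a count.
acc = 0
freq = 0
for v in values:
    acc += v
    freq += 1
mean_from_counters = acc / freq

print(mean_from_list, mean_from_counters)
```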

I'll show you two solutions.  They are similar but the second one is
nicer IMO.

********************************************************************
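import re

# The pattern is compiled once here and passed in to the function.
reg = re.compile(r'Component information \(consensus fork\) = (\d+)')
fil = 'file_to_search'   # placeholder: path of your data file

def extract_number(fil, reg):
    """Return (sum, count) of all values matched by the compiled regexp reg."""
    acc = 0
    freq = 0
    f = open(fil)
    for line in f:
        m = reg.search(line)
        if m:
            acc += int(m.group(1))
            freq += 1
    f.close()
    return (acc, freq)

summ, freq = extract_number(fil, reg)

# Guard against division by zero if no line matched.
if freq:
    print('Average is %s' % (float(summ) / freq))
else:
    print('No matching lines found')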
********************************************************************
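
If you are on a current Python 3, the first solution might look like the
following sketch.  The function name average_component_info and the
sample text are my own, made up for the demo.  The differences are that
"with" closes the file even if an exception occurs, "/" already does true
division, and an empty result raises an error instead of dividing by zero.

```python
import os
import re
import tempfile

PATTERN = re.compile(r'Component information \(consensus fork\) = (\d+)')

def average_component_info(path, pattern=PATTERN):
    """Return the mean of all captured values in the file at path.

    Hypothetical helper name; raises ValueError if nothing matches.
    """
    acc = 0
    freq = 0
    with open(path) as f:  # file is closed even if an exception occurs
        for line in f:
            m = pattern.search(line)
            if m:
                acc += int(m.group(1))
                freq += 1
    if freq == 0:
        raise ValueError("no matching lines in %s" % path)
    return acc / freq  # true division in Python 3, no float() needed

# Made-up sample data written to a temporary file for the demo.
text = (
    "Component information (consensus fork) = 185 (normalized = 0.964)\n"
    "Rohlf's CI(1) = 0.989\n"
    "Component information (consensus fork) = 195 (normalized = 0.970)\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name
try:
    result = average_component_info(path)
finally:
    os.remove(path)

print(result)  # 190.0
```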

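Now the second one; it uses a feature called a generator.  I'm not sure
whether you already know about them; they are closely related to the
iterators you use every time you loop over a file with "for line in f".
Here the generator yields the running total after each match, and
enumerate() counts how many matches there were.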
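
To see what the generator does step by step, here is a small Python 3
sketch.  It uses io.StringIO as a stand-in for an open file, and the
second value (191) is made up.  Each call to next() runs the function body
until the next "yield", hands back the running total, and pauses there.

```python
import io
import re

PATTERN = re.compile(r'Component information \(consensus fork\) = (\d+)')

def running_totals(lines, pattern=PATTERN):
    """Generator: after each matching line, yield the sum so far."""
    acc = 0
    for line in lines:          # a file object is iterated the same way
        m = pattern.search(line)
        if m:
            acc += int(m.group(1))
            yield acc           # pause here until the caller asks for more

# Stand-in for an open file; the second value is invented.
fake_file = io.StringIO(
    "Component information (consensus fork) = 185 (normalized = 0.964)\n"
    "Nelson-Platnick term information = 3855\n"
    "Component information (consensus fork) = 191 (normalized = 0.970)\n"
)
gen = running_totals(fake_file)
first = next(gen)    # 185
second = next(gen)   # 185 + 191 = 376
print(first, second)
```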

********************************************************************
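import re

reg = re.compile(r'Component information \(consensus fork\) = (\d+)')
fil = 'file_to_search'   # placeholder: path of your data file

def extract_number(fil, reg):
    """Yield the running total after each value matched by reg."""
    f = open(fil)
    acc = 0
    for line in f:
        m = reg.search(line)
        if m:
            acc += int(m.group(1))
            yield acc
    f.close()

summ = freq = 0

# enumerate() numbers the matches from 0; the last running total is the sum.
for frq, num in enumerate(extract_number(fil, reg)):
    summ = num
    freq = frq + 1

if freq:
    print('Average is %s' % (float(summ) / freq))
else:
    print('No matching lines found')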
********************************************************************
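
A variation on the second solution, sketched here in Python 3 as a design
alternative rather than as a correction: let the generator yield each
individual value instead of the running total, and do the counting in a
separate one-pass mean() helper.  Both function names are hypothetical.
This keeps the generator reusable for other statistics (max, min, ...).

```python
import io
import re

PATTERN = re.compile(r'Component information \(consensus fork\) = (\d+)')

def component_values(lines, pattern=PATTERN):
    """Yield each captured value as an int (not a running total)."""
    for line in lines:
        m = pattern.search(line)
        if m:
            yield int(m.group(1))

def mean(values):
    """Arithmetic mean in one pass; works on any iterable, even a generator."""
    total = 0
    count = 0
    for count, value in enumerate(values, start=1):
        total += value
    if count == 0:
        raise ValueError("mean of empty sequence")
    return total / count

# Stand-in for an open file; the values are invented for the demo.
data = io.StringIO(
    "Component information (consensus fork) = 180 (normalized = 0.950)\n"
    "Mickevich's consensus information = 0.183\n"
    "Component information (consensus fork) = 186 (normalized = 0.964)\n"
)
avg = mean(component_values(data))
print(avg)  # 183.0
```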

   Karl
-- 
Please do *not* send copies of replies to me.
I read the list