[Tutor] code improvement for beginner ?

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Sun Oct 9 08:00:40 CEST 2005



On Sat, 8 Oct 2005, lmac wrote:

> Ok. Here we go. Wanted to start my page long ago. Now is the right time.
>
> http://daderoid.freewebspace24.de/python/python1.html

Hi lmac,

I'll pick out some stuff that I see; I'm sure others will be happy to give
comments too.  I'll try to make sure that all the criticism I give is
constructive in nature, and if you have questions on any of it, please
feel free to ask about it.


I'll concentrate on the imgreg() function first.  The declaration of
'images' as a global variable looks a little weird.  I do see that
'imgreg' feeds into 'images'.  Not a major issue so far, but you might
want to see if it's possible to do without the global, and explicitly pass
in 'images' as another parameter to imgreg.  Globals just bother me on
principle.  *grin*


You may want to document what 'patt' and 'search' are meant to be.  A
comment at the top of imgreg, like:

     """imgreg searches for a pattern 'patt' within the text 'search'.  If
     a match exists, adds it to the set of images, and returns 1.  Else,
     returns 0."""

will help a lot.  Documenting the intent of a function is important,
because people are forgetful.  No programming language can prevent memory
loss:  what we should try to do is to compensate for our forgetfulness.
*grin*


Looking at pageimgs(): I'm not sure what 't' means in the open statement:

	f = open(filename, "rt")

and I think that 't' might be a typo: I'm surprised that Python doesn't
complain.  Can anyone confirm this?  I think you may have tried to do "r+"
mode, but even then, you probably don't: you're just reading from the
file, and don't need to write back to it.


Looking further into pageimgs(): again, I get nervous about globals.  The
use of the 'r1' global variable is mysterious.  I had to hunt around to
figure out what it was it near the middle of the program.

If anything, I'd recommend naming your global variables with more meaning.
A name like 'image_regex_patterns' will work better than 'r1'.  Also, it
looks like pageimgs() is hardcoded to assume 'r1' has three regular
expressions in it, as it calls imgreg three times for each pattern in r1.

		if imgreg(r1[0],a) == 1:
			continue
		if imgreg(r1[1],a) == 1:
			continue
		imgreg(r1[2],a)

and that looks peculiar.  Because this snippet of code is also
copy-and-pasted around line 106, it appears to be a specific kind of
conceptual task that you're doing to register images.

I think that the use of 'r1' and 'imgreg' should be intrinsically tied.
I'd recommend revising imgreg() so that when we register images, we don't
have to worry that we've called it on all the regular expressions in r1.
That is, let imgreg worry about it, not clients: have imgreg go through
r1[0:3] by itself.


If we incorporate these changes, the result might look something like
this:

###################################################
image_regex_patterns = map(re.compile,
                           [r'http://\w+.\w+.\w+/i.+.gif',
                            r'http://\w+.\w+.\w+/i.+.jpg',
                            r'http://\w+.\w+.\w+/i.+.png'])
def imgreg(search):
    """Given a search text, looks for image urls for registration.  If
a new one can be found, returns 1.  Otherwise, returns 0.

Possible bug: does not register all images in the search text, but only
the first new one it can find.
"""
    for r in image_regex_patterns:
        z = r.search(search)
        if z != None:
            x = z.group(0)
            if x not in images:
                images.append(x)
                return 1
    return 0
###################################################

Does this make sense?

The point of this restructuring is to allow you to add more image types
without too much pain, since there's no more hardcoded array indexing
against r1.  It also simplifies to calls to imgreg from:

    if imgreg(r1[0],a) == 1:
        continue
    if imgreg(r1[1],a) == 1:
        continue
    imgreg(r1[2],a)

to the simpler:

    imgreg(a)


I think I'll stop there and look at the program again later.  *grin* Hope
this helps!



More information about the Tutor mailing list