timeit module for comparing the performance of two scripts

Tue Jul 11 18:14:50 EDT 2006

On 12/07/2006 1:33 AM, Phoe6 wrote:
> Hi,

Hi,

I'm a little astonished that anyone would worry too much (if at all!) 
about how long it took to read a config file. Generally, one would 
concentrate on correctness, and legibility of source code.  There's not 
much point IMHO in timing your pyConfig.py in its current form. Please 
consider the following interspersed comments.

Also, the functionality of the two modules that you are comparing is 
somewhat different ;-)

Cheers,
John

>        Following are my files.
> ----
> 
> pyConfig.py
> ----
> import re
> 
> def pyConfig():
>     fhandle = open( 'config1.txt', 'r' )
>     arr = fhandle.readlines()
>     fhandle.close()
> 
>     hash = {}
> 
>     for item in arr:

There is no need of readlines(). Use:

     for item in fhandle:

>         txt = item.strip()

str.strip() removes leading and trailing whitespace. Hence if the line 
is visually empty, txt will be "" i.e. a zero-length string.

>         if re.search( '^\s*$', txt ):

For a start, your regex is constrained by the ^ and $ to matching a 
whole string, so you should use re.match, not re.search. Read the 
section on this topic in the re manual. Note that re.search is *not* 
smart enough to give up if the test at the beginning fails. Second 
problem: you don't need re at all! Your regex says "whole string is 
whitespace", but the strip() will have reduced that to an empty string. 
All you need do is:

     if not txt: # empty line

>             continue
>         if re.search( '^#.*$', txt ):

Similar to the above. All you need is:

     if txt.startswith('#'): # line is comment only
or (faster but less explicit):
     if txt[0] == '#': # line is comment only

>             continue
>         if not re.search( '^\s*(\w|\W)+\s*=\s*(\w|\W)+\s*$', txt ):

(1) search -> match, lose the ^
(2) lose the first and last \s* -- strip() means they are redundant.
(3) What are you trying to achieve with (\w|\W) ??? Where I come from, 
"select things that are X or not X" means "select everything". So the 
regex matches 'anything optional_whitespace = optional_whitespace 
anything'. However 'anything' includes whitespace. You probably intended 
something like 'word optional_whitespace = optional_whitespace 
at_least_1_non-whitespace':

     if not re.match('\w+\s*=\s*\S+.*$'):
but once you've found a non-whitespace after the =, it's pointless 
continuing, so:
     if not re.match('\w+\s*=\s*\S'):

>             continue

Are you sure such lines can be silently ignored? Might they be errors?

>         hash[txt.split( '=' )[0].strip().lower()] = txt.split( '='
> )[1].strip()

Best to avoid splitting twice. Also it's a little convoluted. Also 
beware of multiple '=' in the line.

    left, right = txt.split('=', 1)
    key = left.strip().lower()
    if key in hash:
         # does this matter?
    value = right.strip()
    hash[key] = value

> 
>     print hash['dbrepository']
> ----

Oh, yeah, that hash thing, regexes everywhere ... it's left me wondering 
could this possibly be a translation of a script from another language :-)