is for reliable?

John Machin sjmachin at lexicon.net
Mon May 7 18:48:41 EDT 2007


On May 8, 5:46 am, "pablo... at giochinternet.com"
<pablo... at giochinternet.com> wrote:
> Hi to all I have a question about the for statement of python. I have the
> following piece of code where cachefilesSet is a set that contains the
> names of 1398 html files cached on my hard disk
>
> for fn in cachefilesSet:
>
>     fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
>     u = fObj.read()
>
>     v = u.lower()
>     rows = v.split('\x0a')
>
>     contentType = ''
>
>     for r in rows:
>         if r.find('content-type') != -1:
>             y = r.find(':')
>             if y != -1:
>                 z = r.find(';', y)
>                 if z != -1:

u, v, r, y, z .... are you serious?

>                     contentType = r[y+1:z].strip()
>                     cE = r[z+1:].strip()
>                     characterEncoding = cE.strip('charset = ')

Read the manual ... strip('charset = ') is NOT doing what you think it
is.
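Concretely: str.strip() interprets its argument as a *set of characters* to remove from both ends, not as a literal prefix string. A small sketch (the header value here is made up for illustration):

```python
header_value = 'charset=ascii'

# strip('charset = ') removes ANY of the characters
# 'c','h','a','r','s','e','t',' ','=' from both ends --
# it does NOT remove the literal prefix "charset = ".
print(header_value.strip('charset = '))         # prints 'ii', not 'ascii'

# A safer way to pull out the value after '=' is to split on it:
encoding = header_value.split('=', 1)[1].strip()
print(encoding)                                 # prints 'ascii'
```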

>                 else:
>                     contenType = r[y+1:].strip()

Do you mean contentType ?
Consider using pychecker and/or pylint.

>                     characterEncoding = ''
>             break
>
>     if contentType == 'text/html':
>         processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )


We don't have crystal balls -- what does processHTMLfile() do? Where
is "cardinalita" bound to a value?

>
>     fileCnt += 1
>     if fileCnt % 100 == 0: print fileCnt
>
> this code stops at the 473th file instead of reaching 1398

Sets are not ordered. There is no such thing as the 473rd element. I
presume you mean that you believe your code processes only 473
elements.
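For what it's worth, a plain for loop over a set does visit every element exactly once, in no particular order; a minimal sketch in Python 3 syntax:

```python
names = {'a.html', 'b.html', 'c.html', 'd.html'}

count = 0
for name in names:       # iteration order is arbitrary...
    count += 1

assert count == len(names)   # ...but every element is visited once
print(count)                 # prints 4
```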

>
> however I changed the for and substituted it with a while in this way
>
> while cachefilesSet:
>     fn = cachefilesSet.pop()
>     .......
>     .......
>
> the while loop reaches the 1398th file and is some 3-4 times faster than
> the for loop
>
> How is this possible?

Given that you open each file and read it, the notion that processing
3 times as many files is 3-4 times faster is very hard to swallow
(even if you mean 3-4 times faster *per file*). Show us *all* of your
code (with appropriate counting and timing) and its output.

Something daft is happening in the part of the code that you haven't
shown us. E.g. you are deleting elements from cachefilesSet, or
perhaps even adding new entries. As a general rule, don't fiddle with
a container over which you are iterating. Using the
    while container:
        item = container.pop()
trick is not "iterating over" the container -- pop() removes each
element as it goes.
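In current Python, mutating a set while a for loop is iterating over it raises a RuntimeError rather than silently skipping elements; a sketch (the set contents are arbitrary):

```python
files = set(range(10))

try:
    for fn in files:
        files.discard(fn)      # mutating the set we are iterating over
except RuntimeError as exc:
    print('caught:', exc)      # e.g. "Set changed size during iteration"
```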

Interestingly, 1398 is rather close to 3 times 473. Coincidence?

Have you tried your code on subsets of your 1398 element set? [This
concept is called "testing"] A subset of size 10 plus copious relevant
print statements in your code might show you what is happening.
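One way to grab a small test subset of a set is itertools.islice, which works on any iterable; cachefilesSet below is a stand-in for the poster's actual data:

```python
import itertools

# Stand-in for the poster's set of 1398 cached file names.
cachefilesSet = set('file%04d' % i for i in range(1398))

# Take an arbitrary 10-element subset for a quick test run.
subset = set(itertools.islice(cachefilesSet, 10))
print(len(subset))   # prints 10
```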

HTH,
John



