newbie - infinite loop

Mon Mar 19 11:03:26 EST 2001

"Paul Brian" <pbrian at demon.net> wrote in message
news:985012513.18971.0.nnrp-12.c1c3e154 at news.demon.co.uk...

You have diagnosed problems modifying the very list you are looping on.
Right: such modifications would indeed cause problems.  But, anyway,
your code is not an implementation of the algorithm you intend:

> I devcided to create a (neaarly) empty list, compare each name in the file
> with that list. If the name was not in, append to list, otherwise leave it
> alone.

That would be...:

    uniqueList = []

    line = 'q'

    for item in uniqueList:
        if line == item: break
    else:
        uniqueList.append(line)

Note we append in the *else* clause of the loop, that is,
if and only if the loop has NOT been interrupted by a
'break'.  Appending what IS already in the list (i.e.,
having the .append call instead of the break statement
as the guarded code of the if statement) is the other
way around from what you thought of (and your thoughts
were indeed correct).

If we did need to modify a list as well as iterate on
it, we'd do our iteration on a _copy_ of the list.  For
example, say that your target was to get a list made
up of the first 3 lines of a fine *and any further line
identical to one of the first 3, in order*.  Then, one
approach (perhaps not optimal, but simple indeed) -- if
we didn't know about the 'in' operator, etc -- might be:

def peculiarLines(afile):
    result = []
    for i in range(3):
        result.append(afile.readline())
    for line in afile.readlines():
        for existing in result[:]:
            if line==existing:
                result.append(line)
                break
    return result

OK, _way_ suboptimal, but still, should work.  The
key to making it work is the specific statement:
        for existing in result[:]:
whereby we iterate on a *copy* (a full slice) of
the result-list -- so, our result.append (as it
modifies the real result-list, not the copy) does
not break anything.

Here, of course, we can do much better, but the
idea is worth considering since at _other_ times
we may indeed need to loop on a list AND alter
it in the body of said loop as well.

For this specific case, the 'in' operator is not
really very fast either -- it still has to scan
linearly the whole result list, after all.  A
small auxiliary dictionary would make the whole
procedure O(N) rather than O(N squared):

def peculiarLines(afile):
    result = []
    seen = {}
    for i in range(3):
        line = afile.readline()
        result.append(line)
        seen[line] = 1
    for line in afile.readlines():
        if seen.has_key(line):
            result.append(line)
    return result

The auxiliary-dictionary approach is equally
helpful for your original problem:

def uniqueLines(afile):
    result = []
    seen = {}
    for line in afile.readlines():
        if not seen.has_key(line):
            result.append(line)
            seen[line] = 1
    return result

or, if you really wanted lines *which only
appear once* (in the same order in which
they appear in the file):

def reallyUniqueLines(afile):
    result = []
    seen = {}
    for line in afile.readlines():
        if seen.has_key(line):
            try: result.remove(line)
            except ValueError: pass
        else:
            result.append(line)
            seen[line] = 1
    return result

This is, again, sub-optimal, since repeated
lines cause calls to possibly-slow method
remove -- ideal might be to keep a list of
all lines seen _at least once_ and use the
auxiliary dictionary to count how many times
each line was seen, then filter appropriately
at the end.  For example, generalizing:

def seenExactlyNtimes(afile, N=2):
    result = []
    seen = {}
    for line in afile.readlines():
        count = seen.get(line)
        if count:
            seen[line] = count+1
        else:
            result.append(line)
            seen[line] = 1
    return [line for line in result if seen[line]==N]

This returns the lines seen exactly N times
_in the order in which they were first seen_.

An interesting variation is to return them
in the order of their M-th appearance, where
M is another parameter (1<=M<=N, of course)...
but I'll leave that as an exercise for the
reader!

Alex