Something keeps nibbling on my list

Wed Apr 18 10:27:58 EDT 2001

"Steve Purcell" <stephen_purcell at yahoo.com> wrote in message
news:mailman.987537806.6571.python-list at python.org...
    [snip]
>     FILENAME = '/etc/hosts.deny'
>     OUTPUT = 'hosts.deny.test'
>
>     orig = open(FILENAME)
>     lines = orig.readlines()
>     orig.close()
>
>     def skip(line):
>        return line[:1] == '#'
>
>     filtered = open(OUTPUT,'w')
>     for linenum in range(len(lines)):
>        line = lines[linenum]
>        if skip(line):
>           continue
>        if line in lines[:linenum]:   # duplicate
>           continue
>        filtered.write(line)
>
>     filtered.close()

This will work fine.  The following also works, and it
may be faster on a large file: instead of rescanning
lines[:linenum] for every line, it detects duplicates
with an auxiliary dictionary:

    lines_seen = {}
    def skip(line):
        # skip comment lines and any line already seen once
        if line in lines_seen or line.startswith('#'):
            return 1
        lines_seen[line] = 1
        return 0

    filtered = open(OUTPUT, 'w')
    for line in open(FILENAME).readlines():
        if not skip(line):
            filtered.write(line)
    filtered.close()
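
For anyone reading this on a current Python, here is a rough
sketch of the same idea with a set in place of the dictionary.
The name dedupe_lines and the sample data are made up for
illustration; they are not from the original script.

```python
def dedupe_lines(lines):
    """Return lines minus '#' comments and repeats, keeping first-seen order."""
    seen = set()   # lines that have already been kept
    kept = []
    for line in lines:
        if line.startswith('#') or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return kept

# Hypothetical sample standing in for the contents of hosts.deny
sample = ['# comment\n', 'ALL: 10.0.0.1\n', 'ALL: 10.0.0.2\n', 'ALL: 10.0.0.1\n']
result = dedupe_lines(sample)
```

To apply it to real files, one could read with
"with open(FILENAME) as f: lines = f.readlines()" and pass the
result to writelines() on the output file; the with statement
closes both files even if something goes wrong.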

Some would enjoy coding the last block as

    open(OUTPUT, 'w').writelines(
        [line for line in open(FILENAME).readlines()
            if not skip(line)
        ]
    )

but I personally think that's the kind of thing
that gives list comprehensions a bad name, just as:

    open(OUTPUT, 'w').writelines(
        filter(keep, open(FILENAME).readlines())
    )

(with keep defined like skip, but negated) would
be slightly too much of a good (functional) thing.
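
A possible shape for keep, as a sketch only: the make_keep
factory and the sample list are assumptions added here so the
example runs on its own, and it is written for a current Python,
where filter() returns an iterator rather than a list.

```python
def make_keep():
    """Build a keep() predicate with its own private record of seen lines."""
    seen = set()
    def keep(line):
        # Reject comments and repeats; remember everything that is accepted.
        if line.startswith('#') or line in seen:
            return False
        seen.add(line)
        return True
    return keep

# Hypothetical sample standing in for open(FILENAME).readlines()
sample = ['# comment\n', 'a\n', 'b\n', 'a\n', '# other\n', 'b\n']
kept = list(filter(make_keep(), sample))
```

Note that keep changes its own state on every call, so the
predicate handed to filter() is not a pure function.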

Alex