IMAP2IMAP Synchronization -- iis [Was: Re: iis: comparsion of two lists]

Alex Martelli aleax at aleax.it
Sun Apr 28 03:22:06 EDT 2002


Matej Cepl wrote:

> On Sat, Apr 27, 2002 at 06:51:21AM +0000, Alex Martelli wrote:
>> It would, but the first assignment is useless and the approach
>> needlessly complicated IMHO.
>> 
>> erase = [ m for m in proc_old if m not in loc_old ]
> 
> Concerning the first assignement, rather be (slightly) slower
> than sorry and your solution seems to work only with Python 2.*.
> I am still on 1.5.2. Do I have to _really_ download multimegabyte
> beast just to get above command?

Of course not, you can keep using a 4-years-old language version for
another 4 years (or 44).

Even in 1.5.2, in your use of:

    erase = ()
    erase = something else

the first assignment is useless.  Not an issue of performance, but of
Occam's razor: entities must not be multiplied needlessly, and a
statement that is totally useless is is a good candidate for pruning.

In 1.5.2, you didn't have list comprehensions, so the idiomatic approach
would use filter and (yech) a lambda with an injected name:

erase = filter(lambda m, loc_old=loc_old: m not in loc_old, proc_old)

My favorite would be to forego the hard-to-read lambda for a named
function (the name injection is inevitable in 1.5.2 or 2.0):

def not_in_loc_old(m, loc_old=loc_old): return m not in loc_old
erase = filter(not_in_loc_old, proc_old)

> Therefore, could I ask somebody for review of the below script,
> especially concerning the safety of messages and speed, please?

Code that claims it must be as fast as possible should NOT use
completely and obviously unnecessary assignments, as yours does.
Take out either the unneeded assignments or the speed claims.

It's silly to use two operators in expressions such as
        not (x in y)
when a single operator called not in suffices:
        x not in y

The in and not in operators with a list RHS are O(N) [take time
roughly proportional to the number of items in the RHS].  This is
likely to produce the speed bottleneck unless the lists involved
are very, VERY small.  For membership checking, prepare, not
lists, but dictionaries, and (in 1.5.2 and up to 2.1) use has_key.

Access to global names is quite a bit slower than access to
local names.

So the code you consider "ugly but as fast as possible" should become:

    def makeSet(msgDict):
        result = {}
        for m in msgDict.values():
            result[m[0]] = 1
        return result

    proc_new = makeSet(processed.new_msg)
    proc_old =  makeSet(processed.old_msg)
    loc_new = makeSet(local.new_msg)
    loc_old =  makeSet(local.old_msg)

    def filterSet(aSet, avoidSet):
        result = {}
        for m in aSet.keys():
            if not avoidSet.has_key(m):
                result[m]=1
        return result

    e = filterSet(proc_old, loc_old)
    d = filterSet(proc_new, loc_new)

    def valueInE(x, e=e, p=processed.old_msg):
        return e.has_key(p[x][0])
    erase = filter(valueInE, processed.old_msg.keys())

    def valueInD(x, d=d, p=processed.new_msg):
        return d.has_key(p[x][0])
    download = filter(valueInE, processed.new_msg.keys())

    return erase, download

By modestly increasing elegance, we should also get substantial
increases in performance unless your sets are all tiny (in which
case performance doesn't matter all that much).  Try -- performance
issues are always to be checked by measurement.


Alex




More information about the Python-list mailing list