Help me out with bugs in my list ?

Alex Martelli aleax at aleax.it
Fri Nov 8 10:57:43 EST 2002


Mindy wrote:

> Hey, I have two lists like:
> binding_list =
> ["ls","ls","ls","netscape","netscape","lib","lib"]
> 
> to_list =
> ["ls","lib","ls","netscape","ls","lib","lib"]
> 
> I want to get non-duplicated pairs of strings from
> them like
> (binding_list[i],to_list[i]) by eliminating the
> duplicated pairs. For the above two lists, I want to

Is the *ordering* of the resulting set of "non-duplicated
pairs of strings" important to you?  If not, easy -- two
statements to build a list of non-duplicated pairs:

auxlist = zip(binding_list, to_list)
nondups = dict(zip(auxlist, auxlist)).keys()

This gives you a list of non-duplicated pairs in
arbitrary order -- no relation to the order of those
pairs in binding_list and to_list.

> get two new lists:
> binding_list = ['ls', 'ls', 'netscape', 'netscape',
> 'lib']
> to_list = ['ls', 'lib', 'netscape', 'ls', 'lib']

>From the above nondups list of pairs, you can rebuild
a pair of lists most easily:

binding_list = [ bind for bind, tol in nondups ]
to_list = [ tol for bind, tol in nondups ]


If you need to preserve order in the lists, the
problem becomes substantially harder.  E.g.:

auxlist = zip(binding_list, to_list)
auxdict = dict(zip(auxlist, xrange(len(auxlist))))
nondups = [ auxlist[i] for i in xrange(len(auxlist))
    if auxdict[auxlisti]] == i ]

this recovers order but keeps the LAST occurrence
of each duplicated pair.  So one obvious way to get
to keep the FIRST is through a couple of reversals:

auxlist = zip(binding_list, to_list)
auxlist.reverse()
auxdict = dict(zip(auxlist, xrange(len(auxlist))))
nondups = [ auxlist[i] for i in xrange(len(auxlist))
    if auxdict[auxlisti]] == i ]
nondups.reverse()

There are other approaches, of course.  E.g.:

auxlist = zip(binding_list, to_list)
dondict = {}
nondups = []
for pair in auxlist:
    if pair in dondict: continue
    nondups.append(pair)
    dondict.pair = 1

this way, you naturally end up with the FIRST
occurrence of each duplicated pair.


Your approach was tainted because you were trying
to modify lists on which you were iterating.  Don't
do that: that's almost invariably a mistake, as
things "shift under you" as you remove things from
the list you're iterating on, or insert things into
it; it's very unlikely that the end effect is what
you need.  When "modifying a list while iterating on
it" appears to be the only course (not in this case
of course, since the above approaches provide much
preferable solutions), iterate on a COPY of the list
while you're modifying the original -- taking the
copy, a "snapshot" of the list (e.g. by thelist[:]),
guards you against the "shift under you" effect.


Alex




More information about the Python-list mailing list