remove duplicates from list *preserving order*

Francis Girard francis.girard at free.fr
Sun Feb 6 14:50:40 EST 2005


Hi,

I think your last solution is not good unless your "list" is sorted (in which 
case the solution is trivial) since you certainly do have to see all the 
elements in the list before deciding that a given element is not a duplicate. 
You have to exhaust the iteratable before yielding anything.

Besides that, I was thinking that the best solution presented here proceed in 
two steps :

1- Check that an element is not already in some ordered data type of already 
seen element

2- If not, put the element in that structure

That's probably twice the work.

There might be some way to do both at the same time, i.e.

- Put the element in the data structure only if not already there and tell me, 
with some boolean, if you did put the element.

Then you have to do the work of finding the right place where to insert 
(fetch) the element only once. I don't see any easy way to do this in python, 
other than rolling your sleeves and code your own data structure in Python, 
which would be slowlier than the provided dict or set C implementation.

I think this a hole into the Pythin libs.

Regards

Francis Girard

Le jeudi 3 Février 2005 21:39, Steven Bethard a écrit :
> I'm sorry, I assume this has been discussed somewhere already, but I
> found only a few hits in Google Groups...  If you know where there's a
> good summary, please feel free to direct me there.
>
>
> I have a list[1] of objects from which I need to remove duplicates.  I
> have to maintain the list order though, so solutions like set(lst), etc.
> will not work for me.  What are my options?  So far, I can see:
>
> def filterdups(iterable):
>      result = []
>      for item in iterable:
>          if item not in result:
>              result.append(item)
>      return result
>
> def filterdups(iterable):
>      result = []
>      seen = set()
>      for item in iterable:
>          if item not in seen:
>              result.append(item)
>              seen.add(item)
>      return result
>
> def filterdups(iterable):
>      seen = set()
>      for item in iterable:
>          if item not in seen:
>              seen.add(item)
>              yield item
>
> Does anyone have a better[2] solution?
>
> STeve
>
> [1] Well, actually it's an iterable of objects, but I can convert it to
> a list if that's helpful.
>
> [2] Yes I know, "better" is ambiguous.  If it helps any, for my
> particular situation, speed is probably more important than memory, so
> I'm leaning towards the second or third implementation.




More information about the Python-list mailing list