Algorithm help per favore
Steven Taschuk
staschuk at telusplanet.net
Wed Jun 18 12:41:23 EDT 2003
Quoth Larry:
> I need to take a list (probably 20k to 40k elements) of numbers and
> remove consecutive duplicates. Non consecutive duplicates are ok.
>
> Example: [6,3,3,3,5,7,6,3,4,4,3] => [6,3,5,7,6,3,4,3]
>
> The 3 and 6 can appear more than once in the result set because
> they're separated by another value. Obviously this is trivial to
> accomplish by walking thru the list and building a new one (or yanking
> elements out of the existing one) but I'm curious if anyone knows of a
> more clever way, with speed being a goal more than memory usage.
If speed is the goal, removing elements from the existing list is
not indicated; that would take O(n**2) time, since each removal
has to shift all subsequent elements down one spot.
Building a new list seems not only like the obvious approach, but
probably the fastest. I think it quite unlikely that there's any
faster clever algorithm.
def unique(iterable):
result = []
append = result.append # avoid attribute lookup in loop
next = iter(iterable).next # ditto
try:
previous = next()
append(previous)
while True:
value = next()
if value != previous:
append(value)
previous = value
except StopIteration:
pass
return result
(If latency is important, this could be changed into a generator
easily; that would also decrease memory usage. I'm not sure what
it would do to throughput -- try it and see.)
I don't see any way to use built-ins or whatnot to move the loop
into C. Of course, you could recode the whole thing in C.
--
Steven Taschuk "[W]e must be very careful when we give advice
staschuk at telusplanet.net to younger people: sometimes they follow it!"
-- "The Humble Programmer", Edsger Dijkstra
More information about the Python-list
mailing list