[issue37723] important performance regression on regular expression parsing

Matthew Barnett reported at bugs.python.org
Wed Jul 31 13:23:32 EDT 2019


Matthew Barnett <python at mrabarnett.plus.com> added the comment:

I've just had a look at _uniq, and the code surprises me.

The obvious way to detect duplicates is with a set, but that requires the items to be hashable. Are they?

Well, the first line of the function uses 'set', so they are.

Why, then, isn't it using a set to detect the duplicates?

How about this:

def _uniq(items):
    newitems = []
    seen = set()
    for item in items:
        if item not in seen:
            # First occurrence: keep it and remember it.
            newitems.append(item)
            seen.add(item)
    return newitems
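As an aside (my addition, not part of the original report): since the items are hashable, an even shorter order-preserving variant is possible in Python 3.7+, where dicts preserve insertion order:

```python
def _uniq(items):
    # dict keys are unique and, since Python 3.7, preserve insertion order,
    # so this drops duplicates while keeping the first occurrence of each item
    return list(dict.fromkeys(items))

print(_uniq([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Both versions run in O(n) time, unlike a list-membership check, which would be O(n^2).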

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37723>
_______________________________________