More elegant solution for diffing two sequences

Lie Ryan lie.1296 at gmail.com
Fri Dec 4 14:20:05 EST 2009


On 12/5/2009 4:20 AM, Ulrich Eckhardt wrote:
> Thinking about it, I perhaps should store the glyphs in a set from the
> beginning. Question is, can I (perhaps by providing the right hash function)
> sort them by their codepoint? I'll have to look at the docs...

Python does not guarantee that any particular property of the hash
function will lead to a particular ordering of the set. AFAICT, the
current set ordering is determined by the hash modulo the real size of
the set's hash table, but if you rely on this you're on your own. It's
better to call sorted() on them when you want a sorted view (or convert
to a set just before finding the differences).
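As a rough sketch of the sorted() suggestion, assuming the glyphs are
single-character strings (the sample data and the helper name
sorted_glyphs are made up for illustration):

```python
def sorted_glyphs(glyphs):
    """Return the glyphs as a list ordered by code point.

    Hypothetical helper; assumes each glyph is a one-character string.
    """
    # key=ord makes the "sort by code point" intent explicit; for
    # one-character strings plain sorted(glyphs) gives the same order.
    return sorted(glyphs, key=ord)


glyphs = set("hello world")    # iteration order of a set is unspecified
ordered = sorted_glyphs(glyphs)
print(ordered)
```

The set stays unordered; only the list returned by sorted() has a
defined order, so it has to be rebuilt whenever the set changes.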

You can reduce the cost of creating new data structures with something
like:

a = [...]
b = [...]
s_a = set(a)
s_a -= set(b)

This creates only two new sets (instead of three) and is probably
faster too (though you'd need to profile to be sure).
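A possible variation on the same idea, sketched here as an assumption
rather than something from the thread: set.difference_update() accepts
any iterable, so the second sequence does not need to be turned into a
set first. The function name diff_glyphs and the sample inputs are
hypothetical, and whether this is actually faster still needs profiling.

```python
def diff_glyphs(a, b):
    """Return (items only in a, items only in b) as two sets.

    Hypothetical helper; a and b may be any iterables of hashable
    items that can be iterated more than once (lists, strings, ...).
    """
    only_a = set(a)
    only_a.difference_update(b)   # in place, no temporary set(b)
    only_b = set(b)
    only_b.difference_update(a)   # in place, no temporary set(a)
    return only_a, only_b


only_a, only_b = diff_glyphs("abcd", "cdef")
print(sorted(only_a), sorted(only_b))
```

This builds one new set per direction of the diff, and sorted() is only
applied at the point where an ordered view is needed.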


