No trees in the stdlib?

João Valverde backup95 at netcabo.pt
Sat Jun 27 01:06:53 EDT 2009


João Valverde wrote:
> greg wrote:
>> João Valverde wrote:
>>
>>> What's lacking is an associative array that preserves ordering, 
>>> doesn't require a hash function and has fast insertions and 
>>> deletions in O(log(n)).
>>
>> Careful here -- you can't get away from the need for
>> hashability just by using a tree. Even if you don't
>> need to actually hash the values, it's still important
>> that the criterion for ordering between objects doesn't
>> change while they're in the tree, otherwise they'll
>> be in the wrong place and won't be found by subsequent
>> lookups.
>
> I'm aware :) Technically it's necessary to define a total ordering on 
> the set of keys.
>>
>> > I'm genuinely surprised to know
>>> there are no data structures that efficiently support such a common 
>>> need in Python.
>>
>> Is it really all that common? If it truly were common,
>> there probably *would* be something for it in the
>> stdlib by now.
> Obviously my experience differs, but those were my expectations.
>>
>> What sort of things are you doing that you want such
>> a structure for? Maybe we can suggest a way of using
>> the existing data structures to achieve the same
>> goal.
>>
> To answer the question of what I need the BSTs for, without getting 
> into too many boring details it is to merge and sort IP blocklists, 
> that is, large datasets of ranges in the form of (IP address, IP 
> address, string). Originally I was also serializing them in a binary 
> format (but no more after a redesign). I kept the "merge and sort" 
> part as a helper script, but that is considerably simpler to implement.
Crap, this sentence is totally confusing. I meant kept the merge code as 
a helper script and moved the rest to C, see next paragraph.

> Please note that I'm happy with my code, it works well. I intended to 
> implement it in C all along (for a system daemon), even before trying 
> Python. The python code was a side thing for testing/curiosity/fun. It 
> prompted my original question but that was really about Python and the 
> standard library itself, and I don't wish to waste anyone's time.
>
> As an anecdotal data point (honestly not trying to raise the "Python 
> is slow" strawman), I implemented the same algorithm in C and Python, 
> using pyavl. Round numbers were 4 mins vs 4 seconds, against Python 
> (plus pyavl). Even considering I'm a worse Python programmer than C 
> programmer, it's a lot. I know many will probably think I tried to do 
> "C in Python" but that's not the case, at least I don' t think so. 
> Anyway like I said, not really relevant to this discussion.
>




More information about the Python-list mailing list