[Numpy-discussion] efficient way to manage a set of floats?

Anne Archibald aarchiba at physics.mcgill.ca
Mon May 10 19:43:08 EDT 2010


On 10 May 2010 18:53, Dr. Phillip M. Feldman <pfeldman at verizon.net> wrote:
>
> I have an application that involves managing sets of floats.  I can use
> Python's built-in set type, but a data structure that is optimized for
> fixed-size objects that can be compared without hashing should be more
> efficient than a more general set construct.  Is something like this
> available?

You might not find this as useful as you think - on a 32-bit machine,
the space overhead is roughly a 32-bit object pointer or two for each
float, plus about twice the number of floats times 32-bit pointers for
the table. And hashing might be worthwhile anyway - you could easily
have a series of floats with related bit patterns you'd want to
scatter all over hash space. Plus python's set object has seen a fair
amount of performance tweaing.

That said, there's support in numpy for many operations which use a
sorted 1D array to represent a set of floats. There's searchsorted for
lookup, plus IIRC union and intersection operators; I'm not sure about
set difference. The big thing missing is updates, though if you can
batch them, concatenate followed by sorting should be reasonable.
Removal can be done with fancy indexing, though again batching is
recommended. Maybe these should be regarded as analogous to python's
frozensets.

In terms of speed, the numpy functions are obviously not as
asymptotically efficient as hash tables, though I suspect memory
coherence is more of an issue than O(1) versus O(log(n)). The numpy
functions allow vectorized lookups and vector operations on sets,
which could be handy.


Anne

P.S. if you want sets with fuzzy queries, it occurs to me that scipy's
kdtrees will actually do an okay job, and in compiled code. No updates
there either, though. -A

> View this message in context: http://old.nabble.com/efficient-way-to-manage-a-set-of-floats--tp28518014p28518014.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



More information about the NumPy-Discussion mailing list