Mapping with continguous ranges of keys

Thu Dec 15 12:54:48 EST 2016

Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
> 
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }
> 
> 
> So in this case, the keys 1 through 100, plus 105, all have the same
> value.
> 
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.
> 
> Has anyone dealt with data like this and could give a recommendation of
> the right data structure to use?
> 
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?

Use a list, either directly if there are no big holes

>>> r = range(10**5)
>>> sys.getsizeof(list(r))/sys.getsizeof(dict(zip(r, r)))
0.14306676635590074

or indirectly:

ranges_list = [
    1,
    101,
    103,
    104,
    105,
    106,
]

index_to_value = {
    1: "foo",
    2: "bar",
    3: "foobar",
    4: "bar",
    5: "foo",
}

def get(key, default="missing"):
    index = bisect.bisect(ranges_list, key)
    return index_to_value.get(index, default)