Data-structure for multiway associativity in Python

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jan 28 09:42:43 EST 2018


On Sat, 27 Jan 2018 10:01:47 -0800, qrious wrote:

> I need a data structure and a corresponding (hopefully fast) mechanism
> associated with it to do the following. While I am looking for the
> concept first, my preference for implementation of this will be in
> Python.
> 
> [c1, c2,..., cn] is a list of strings (for my own implementation, but
> could be any other type for the generic problem). There will be many
> hundreds, if not thousands, of such lists with no shared member.
> 
> The method getAssocList(e) will return lists of the lists for which e is
> an element.

Since you specified that there are no lists with shared members, why 
bother returning a list of lists? There will only ever be a single 
matching list.

data = [
    [c1, c2, ..., cn],
    [d1, d2, ..., dn], 
    # hundreds more...
    [z1, z2, ..., zn],
    ]

def getAssocList(element):
    for L in data:
        if element in L:
            return L
    raise ValueError('not found')


For faster results, use sets {c1, c2, ..., cn} rather than lists. The 
outer list still has to be a list though.

To speed it up more, we'd need to know more information about how you are 
using this. For example, if the values c1, ... d1, ... etc have some sort 
of relationship, you might be able to generate some kind of multiway tree 
that avoids having to search all of the thousands of lists before giving 
up.

Are searches going to typically hit the same set c1... over and over 
again? If so, then after matching, bring it to the front of the master 
list. (You might want to use a deque for that.)



> Here a hash may be a way to go, but need help in figuring it out. Also,
> can there be a faster and more memory efficient solution other than
> hashes?

Probably not, not unless you have some sort of internal structure you can 
take advantage of. For example, if all the strings in any group start 
with the same letter, then you can dispatch directly to the right list:

data = {'c': {c1, c2, c3, ..., cn},
        'd': {d1, ... } # and so on...
        }

def getAssocList(element):
    if element in data[element[0]]:
        return L
    raise ValueError('not found')


But if there is no structure at all, and the elements in each list are 
completely arbitrary, then there is nothing better than a linear search 
through the entire collection of lists, looking at every single element 
until you've matched what you're after.

But your description is pretty vague and for all I know what you actually 
want is already a solved problem. Can you give more detail and cut-down 
example of your data set? Say, two or three values per list, and two or 
three lists.


-- 
Steve




More information about the Python-list mailing list