How to create a binary tree hierarchy given a list of elements as its leaves

marc nicole mk1853387 at gmail.com
Sun Jan 28 13:16:10 EST 2024


So I am trying to build a binary tree hierarchy given numerical elements
serving for its leaves (last level of the tree to build). From the leaves I
want to randomly create a name for the higher level of the hierarchy and
assign it to the children elements. For example: if the elements inputted
are `0,1,2,3` then I would like to create firstly 4 elements (say by random
giving them a label composed of a letter and a number) then for the second
level (iteration) I assign each of 0,1 to a random name label (e.g. `b1`)
and `2,3` to another label (`b2`) then for the third level I assign a
parent label to each of `b1` and `b2` as `c1`.

An illustation of the example is the following tree:


[image: tree_exp.PNG]

For this I use numpy's `array_split()` to get the chunks of arrays based on
the iteration needs.
for example to get the first iteration arrays I use `np.array_split(input,
(input.size // k))` where `k` is an even number. In order to assign a
parent node to the children the array range should enclose the children's.
For example to assign the parent node with label `a1` to children `b1` and
`b2` with range respectively [0,1] and [2,3], the parent should have the
range [0,3].

All is fine until a certain iteration (k=4) returns parent with range [0,8]
which is overlapping to children ranges and therefore cannot be their
parent.

My question is how to evenly partition such arrays in a binary way and
create such binary tree so that to obtain for k=4 the first range to be
[0,7] instead of [0,8]?

My code is the following:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import string
import random
import numpy as np


def generate_numbers_list_until_number(stop_number):
    if str(stop_number).isnumeric():
        return np.arange(stop_number)
    else:
        raise TypeError('Input should be a number!')


def generate_node_label():
    return random.choice(string.ascii_lowercase) \
        + str(random.randint(0, 10))


def main():
    data = generate_numbers_list_until_number(100)
    k = 1
    hierarchies = []
    cells_arrays = np.array_split(data, data.size // k)
    print cells_arrays
    used_node_hierarchy_name = []
    node_hierarchy_name = [generate_node_label() for _ in range(0,
                           len(cells_arrays))]
    used_node_hierarchy_name.extend(node_hierarchy_name)
    while len(node_hierarchy_name) > 1:
        k = k * 2

# bug here in the following line

        cells_arrays = list(map(lambda x: [x[0], x[-1]],
                            np.array_split(data, data.size // k)))
        print cells_arrays
        node_hierarchy_name = []

# node hierarchy names should not be redundant in another level

        for _ in range(0, len(cells_arrays)):
            node_name = generate_node_label()
            while node_name in used_node_hierarchy_name:
                node_name = generate_node_label()
            node_hierarchy_name.append(node_name)
            used_node_hierarchy_name.extend(node_hierarchy_name)
        print used_node_hierarchy_name
        hierarchies.append(list(zip(node_hierarchy_name, cells_arrays)))


More information about the Python-list mailing list