[Tutor] extracting information from a complex dict.

Peter Otten __peter__ at web.de
Sun Jan 31 06:23:43 EST 2021


On 31/01/2021 07:16, mhysnm1964 at gmail.com wrote:

> I need to convert a complex dict into a list. Then I am saving the list as a
> CSV. I know there is more than likely a library which can convert the tree
> (dict) into a CSV file. But I want to learn recursive functions. I have
> learnt the basics when using maths functions. Thus I understand the basic
> logic of recursive functions. Now I am trying to do something more
> difficult.
> 
>   
> 
> The basic dict structure is:

I'd like to suggest an approach where the dict structure doesn't matter 
much. It's sufficient that you may have arbitrarily nested dicts and 
lists, and scalar values.

When you encounter a list you want

[first-item]
[second-item]
[third-item]
...

and for a dict you want

[first-key, first-value]
[second-key, second-value]
...

For any other value you want just that

[value]

Now, how to combine these parts in such a way that e. g.

tree = {1: [2, 3, 4]}

gives

[1, 2]
[1, 3]
[1, 4]

? If the structure were fixed you'd use

for key, value in tree.items():
     for item in value:
         print([key, item])

but since you don't know the actual nesting you need to check

if isinstance(tree, dict):
     for key, value in tree.items():
         print([key, value])
elif isinstance(tree, list):
     for value in tree:
         print([value])
else:
     print([tree])

That works not just for a dict but for an arbitrary node in our tree -- 
let's turn it into a function

def process(tree, path=()):

(As we need a way to remember the upwards part of the tree we provide 
the path argument. I used a tuple to appease the linters, but a list 
would work here, too, as long as you do not actually mutate the path 
variable.)

     if isinstance(tree, dict):
         for key, value in tree.items():
             process(value, path + (key,))
     elif isinstance(tree, list):
         for value in tree:
             process(value, path)
             # the above line flattens nested lists
             # you might instead want
     else:
         print(path + (tree,))


This is classical recursion, and as written it's not very flexible. If 
you want to filter out None values and dump to CSV instead of printing 
the tuples you need to rewrite the process() function.
One way to make the function more flexible is to have it accept a 
callback function:

def process_cb(tree, path=(), process_leaf=print):
     if isinstance(tree, dict):
         for key, value in tree.items():
             process_cb(value, path + (key,), process_leaf)
     elif isinstance(tree, list):
         for value in tree:
             process_cb(value, path, process_leaf)
     else:
         process_leaf(path + (tree,))


# example usage
writer = csv.writer(sys.stdout)
def dump(path):
     if path[-1] is None:
         print("suppressed", path, file=sys.stderr)
     else:
         writer.writerow(path)

process_cb(sample, process_leaf=dump)

The cleaner option is to return the tuples upwards. You may try your 
hands at rewriting process() in such a way. However, the disadvantage is 
that you build the complete list before you start printing anything.

A structurally identical but more elegant solution is to use a generator 
instead of a regular function:

def process_gen(tree):
     if isinstance(tree, dict):
         for key, value in tree.items():
             for rest in process_gen(value):
                 yield (key,) + rest
     elif isinstance(tree, list):
         for value in tree:
             yield from  process_gen(value)
     else:
         yield((tree,))

writer = csv.writer(sys.stdout)
writer.writerows(
     path for path in process_gen(sample) if path[-1] is not None
)

This "returns" a row as it is seen so that i can be processed 
immediately on the top level of the script. As the call stack is 
preserved execution of the generator code can then continue where it paused.

This is the technique that you have already seen in os.walk().



More information about the Tutor mailing list