[Tutor] extracting information from a complex dict.

Cameron Simpson cs at cskk.id.au
Sun Jan 31 01:54:15 EST 2021


On 31Jan2021 17:16, Sean Murphy <mhysnm1964 at gmail.com> wrote:
>Using windows and python 3.8.

Thanks for this context.

>I need to convert a complex dict into a list. Then I am saving the list 
>as a CSV.

That part is easy. Prepare the list first. Then later see the builtin 
'csv" module.

>I know there is more than likely a library which can convert the tree
>(dict) into a CSV file.

Less likely - your nested dict structure seems quite special purpose.

>But I want to learn recursive functions. I have
>learnt the basics when using maths functions. Thus I understand the basic
>logic of recursive functions. Now I am trying to do something more
>difficult.
>
>The basic dict structure is:
>
>{
>    "A": {
>        "A A Attanasio": {
>            "A A Attanasio": [
>                "Wyvern plot.txt",
>                "Wyvern.mp3"
>            ]
>        }
>    }
>}
>
>I extracted the above from a JSON file which the Dict is based upon. 

Good. This means it will only contain dicts and lists (and strings and 
numbers and None). No funny classes, just the basics.

>Basic
>structure:
>
>*	The first dict has a maximum of 26 entereies. A to z. Calling this
>level 1.

Ok, a mapping of starting letter to dicts mapping author names to 
something.

>*	The first child can have an unknown number of dict's. Only dicts
>exist at this level. For ease of reading I will call this level 2 dict.

A mapping of author names to something.

>*	Level 2 dicts can have a value of None. Easily tested for.

Yes.

>*	The child of level 2 dict, ease of reading level 3 dict:
>
>*	Level 3 dict's contains a list or
>*	Another dict containing another level 3 dict structure using my
>convention is Level 4 dict. Nothing is deeper than this.

So a list of book titles (filenames?), or possibly another nested 
author->list mapping? Do you know why the extra depth?

>Outcome: I want to build a list that contains the following list 
>structure:
>
>[
>              ['a', 'author name', author name', bookt title],
>              ['a', 'author name', author name', bookt title]
>]

I'm going to assume you mistyped something there. Do you mean:

    ['a', 'author name', 'book title']

such as:

    ['A', 'A A Attanasio', 'Wyvern plot.txt']

?

>This might be easier if there was a parent key / value. But there is 
>not.
>The file is to large to manually change this. I would have to write some
>code to insert the parent key. I am hoping there is a way not to do that.
>
>
>My logic (thinking) is (starting from the top of the tree and working your
>way down):
>
>1.	The following variables need to be passed into the function (I
>think):
>
>a.	List containing the above structure that I want.

Right. Pass this same list to every recursive call, that way your 
function can add rows to the list as they are found.

>b.	A list of all the keys used to get to the value which contains the
>list.

Ah, so it might be:

    ['A', 'A A Attanasio', 'A A Attanasio', 'Wyvern plot.txt', 'Wyvern.mp3']

for the Wyvern stuff 3 levels deep? Or even:

    ['A', 'A A Attanasio', 'A A Attanasio', 'Wyvern plot.txt']
    ['A', 'A A Attanasio', 'A A Attanasio', 'Wyvern.mp3']

>c.	The current key which either contains another dict or the list.

Sounds good.

>d.	The First call is the full dict and each call after this point is
>the dict for the next level down. Basically popping the dict.
>
>2.	Test the dict value to see if it contains a None. If does, then only
>store the first three elements of the array as there is no books titles.
>3.	Test the dict value to see if it contains a list. If so, then append
>to the above list structure with all values.
>4.	If a dict, then add to the key list and only return the dict
>contained in the value.
>
>What I am not sure if the above logic will work or if I should be using a
>loops within the recursive function.

If your structure was fixed depth (eg 'A' => 'Author Name' => 
list-of-filenames) a couple of nested loops would be easier. However, it 
varies. Even though only slightly, you're still well off with a 
recursive function.

>I don't have any code to show as I am
>not sure how to start. Done searching and seen examples. Still confused.

Fortunately, you have pretty much written out all the logic correctly 
above.

Your plan is to call your function whenever you have a dict, but 
otherwise stop there and add to your list if it isn't None.

Your plan to collect the keys is also good.

Start by writing the function itself _first_, handing it the state 
you've identified as required at any given point:

    def gather_titles(big_list, previous_keys, subitem):

i.e. the list you're adding things to, the ancestral keys, and the item 
you're examining at this point in the tree of dicts: it might be a 
deeper dict, or a list or None per your description above.

When you start, the parameters have obvious values:


    author_list = []
    gather_titles(author_list, [], the_dict_from_the_json)

i.e. your intially empty list, no previous keys, the top level dict.

So what do you do inside the function? You wrote it out above:

- if subitem is None, do nothing
- if subitem is a list, add it along with the ancestral keys to big_list
- if subitem is a dict, recurse for each element in the dict
- otherwise something unexpected! raise an exception!

That's an if-statement:

    if subitem is None:
        pass
    elif isinstance(subitem, list):
        add to big_list here
    elif isinstance(subitem, dict):
        recurse for each item in the dict
    else:
        raise RuntimeError(
            "at keys %r got an unexpected type %s:%r"
            % (previous_keys, type(subitem), subitem))

Plenty of people wouldn't bother with the exception, but it is good 
defensive programming: handle the known cases, then explode with a 
informative exception if something is somehow unhandled. Much better 
than having the unhandled type silently ignored.

The if-statement is the entire body of your function.

Now fill in the bodies of the second and third if-statement choices. If 
you get stuck, put in print statements to see what values you're working 
against.

Cheers,
Cameron Simpson <cs at cskk.id.au>


More information about the Tutor mailing list