[Tutor] python: extracting nested json object from multiple files, write to separate text files

Gary LaRose garylarose at outlook.com
Fri Oct 4 14:32:36 EDT 2019


Thank you Cameron, this works nicely - and thanks for pointing me to the os.path.splitext and repr functions.
The 'with open...as' ran faster on my local machine.

Best regards


-----Original Message-----
From: Cameron Simpson <cs at cskk.id.au> 
Sent: October 3, 2019 7:27 PM
To: Gary LaRose <garylarose at outlook.com>
Cc: tutor at python.org
Subject: Re: [Tutor] python: extracting nested json object from multiple files, write to separate text files

On 03Oct2019 22:57, Gary LaRose <garylarose at outlook.com> wrote:
>Thank you for your guidance.
>I am attempting to extract nested json object in multiple json files and write to individual text files.
>I have been able to get a non-nested element ['text'] from the json files and write to text files using:
>
>import os, json
>import glob
>
>filelist = glob.glob('./*.json')

No need for the leading "./" here. "*.json" will do.

>for fname in filelist:
>     FI = open(fname, 'r', encoding = 'UTF-8')
>     FO = open(fname.replace('json', 'txt'), 'w', encoding = 'UTF-8')

Minor remark: this is not robust; consider the filename "some-json-in-here.json", where every occurrence of "json" gets replaced, not just the extension. Have a glance at the os.path.splitext function.
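
A minimal sketch of the difference, assuming you only want to swap the final extension. The helper name txt_name is made up for illustration; os.path.splitext is the standard library function:

```python
import os.path

def txt_name(json_name):
    # Hypothetical helper: split off only the final extension,
    # so "json" elsewhere in the name is left alone.
    base, ext = os.path.splitext(json_name)
    return base + '.txt'

print(txt_name('some-json-in-here.json'))                # some-json-in-here.txt
print('some-json-in-here.json'.replace('json', 'txt'))   # some-txt-in-here.txt
```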

>     json_object = json.load(FI)
>     FO.write(json_object['text'])
>
>FI.close()
>FO.close()

Second minor remark: these are better written:

    with open(fname, 'r', encoding = 'UTF-8') as FI:
        json_object = json.load(FI)
    with open(fname.replace('json', 'txt'), 'w', encoding = 'UTF-8') as FO:
        FO.write(json_object['text'])

which do the closes for you (even if an exception happens).
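
Putting the two remarks together, the whole loop might look something like the sketch below. The function name extract_text and its directory parameter are assumptions added so the loop can be called and tested; the rest follows your original code:

```python
import glob
import json
import os.path

def extract_text(directory='.'):
    """Write the 'text' field of each .json file in directory to a matching .txt file."""
    written = []
    for fname in glob.glob(os.path.join(directory, '*.json')):
        # The file is closed when the with-block exits, even on an exception.
        with open(fname, 'r', encoding='UTF-8') as FI:
            json_object = json.load(FI)
        # Replace only the extension, not every "json" in the name.
        base, ext = os.path.splitext(fname)
        outname = base + '.txt'
        with open(outname, 'w', encoding='UTF-8') as FO:
            FO.write(json_object['text'])
        written.append(outname)
    return written
```

Calling extract_text() with no argument works on the current directory, which matches having set the working directory to the folder of JSON files.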

>I have set the working directory to the folder that contains the json files.
>Below is example json file. For each file (2,900), I need to extract 'entities' and write to a separate text file:
>
>{'author': 'Reuters Editorial',
>'crawled': '2018-02-02T12:58:39.000+02:00',
>'entities': {'locations': [{'name': 'sweden', 'sentiment': 'none'},
>                            {'name': 'sweden', 'sentiment': 'none'},
>                            {'name': 'gothenburg', 'sentiment': 'none'}],
>              'organizations': [{'name': 'reuters', 'sentiment': 'negative'},
>                                {'name': 'skanska ab', 'sentiment': 'negative'},
>                                {'name': 'eikon', 'sentiment': 'none'}],
>              'persons': [{'name': 'anna ringstrom', 'sentiment': 'none'}]},
[...]

Well, the entities come in from the JSON as a dictionary mapping str to list. Thus:

    entities = json_object['entities']

For example, with the example data above, the expression entities['locations'] has the value:

    [
        {'name': 'sweden', 'sentiment': 'none'},
        {'name': 'sweden', 'sentiment': 'none'},
        {'name': 'gothenburg', 'sentiment': 'none'}
    ]

Which is just a list of dictionaries. You just need to access whatever you need as required. When you went:

    FO.write(json_object['text'])

that has the advantage that json_object['text'] is a simple string. If you need to write out the values from entities then you _likely_ want to print it in some more meaningful way. However, just to get off the ground you would go:

    FO.write(repr(entities))

as a proof of concept. When happy, write something more elaborate to get the actual output format you desire.
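
As one possible "something more elaborate", here is a sketch that writes one tab-separated line per entity. The output layout (kind, name, sentiment) and the function name format_entities are assumptions for illustration, not a format you asked for; adjust to taste:

```python
def format_entities(entities):
    """Return one 'kind<TAB>name<TAB>sentiment' line per entity (hypothetical layout)."""
    lines = []
    # entities maps a kind such as 'locations' to a list of dictionaries.
    for kind, items in entities.items():
        for item in items:
            lines.append(f"{kind}\t{item['name']}\t{item['sentiment']}\n")
    return ''.join(lines)

# A cut-down version of the example data from the question.
entities = {
    'locations': [{'name': 'sweden', 'sentiment': 'none'},
                  {'name': 'gothenburg', 'sentiment': 'none'}],
    'persons': [{'name': 'anna ringstrom', 'sentiment': 'none'}],
}
print(format_entities(entities), end='')
```

You would then use FO.write(format_entities(entities)) in place of the repr call.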

Cheers,
Cameron Simpson <cs at cskk.id.au>

