[Tutor] python: extracting nested json object from multiple files, write to separate text files

Gary LaRose garylarose at outlook.com
Thu Oct 3 18:57:52 EDT 2019


Thank you for your guidance.
I am attempting to extract a nested JSON object from multiple JSON files and write it to individual text files.
I have been able to get a non-nested element ['text'] from the JSON files and write it to text files using:

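import glob
import json
import os

filelist = glob.glob('./*.json')

for fname in filelist:
    # Swap only the extension; fname.replace('json', 'txt') would also
    # change any other 'json' that appears in the path.
    out_name = os.path.splitext(fname)[0] + '.txt'
    # The with blocks close each file on every pass through the loop,
    # not just the last pair after the loop ends.
    with open(fname, 'r', encoding='UTF-8') as FI:
        json_object = json.load(FI)
    with open(out_name, 'w', encoding='UTF-8') as FO:
        FO.write(json_object['text'])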

I have set the working directory to the folder that contains the JSON files.
Below is an example JSON file. For each of the 2,900 files, I need to extract 'entities' and write it to a separate text file:

{'author': 'Reuters Editorial',
'crawled': '2018-02-02T12:58:39.000+02:00',
'entities': {'locations': [{'name': 'sweden', 'sentiment': 'none'},
                            {'name': 'sweden', 'sentiment': 'none'},
                            {'name': 'gothenburg', 'sentiment': 'none'}],
              'organizations': [{'name': 'reuters', 'sentiment': 'negative'},
                                {'name': 'skanska ab', 'sentiment': 'negative'},
                                {'name': 'eikon', 'sentiment': 'none'}],
              'persons': [{'name': 'anna ringstrom', 'sentiment': 'none'}]},
'external_links': ['http://thomsonreuters.com/en/about-us/trust-principles.html'],
'highlightText': '',
'highlightTitle': '',
'language': 'english',
'locations': [],
'ord_in_thread': 0,
'organizations': [],
'persons': [],
'published': '2018-02-01T15:02:00.000+02:00',
'text': 'Feb 1 (Reuters) - Skanska Ab:\n'
         '* SKANSKA DIVEST OFFICE BUILDINGS IN GOTHENBURG, SWEDEN, FOR ABOUT '
         'SEK 1 BILLION Source text for Eikon: Further company coverage: '
         '(Reporting By Anna Ringstrom)\n'
         ' ',
'thread': {'country': 'US',
            'domain_rank': 408,
            'main_image': 'https://s4.reutersmedia.net/resources_v2/images/rcom-default.png',
            'participants_count': 1,
            'performance_score': 0,
            'published': '2018-02-01T15:02:00.000+02:00',
            'replies_count': 0,
            'section_title': 'Archive News & Video for Thursday, 01 Feb '
                             '2018 | Reuters.com',
            'site': 'reuters.com',
            'site_full': 'www.reuters.com',
            'site_section': 'http://www.reuters.com/resources/archive/us/20180201.html',
            'site_type': 'news',
            'social': {'facebook': {'comments': 0, 'likes': 0, 'shares': 0},
                       'gplus': {'shares': 0},
                       'linkedin': {'shares': 0},
                       'pinterest': {'shares': 0},
                       'stumbledupon': {'shares': 0},
                       'vk': {'shares': 0}},
            'spam_score': 0.21,
            'title': 'BRIEF-Skanska sells office buildings in Sweden for '
                     'around 1 bln SEK',
            'title_full': '',
            'url': 'https://www.reuters.com/article/brief-skanska-sells-office-buildings-in/brief-skanska-sells-office-buildings-in-sweden-for-around-1-bln-sek-idUSASM000IRO',
            'uuid': 'c83c8bf46fdb8d597e6c10ad16f221379c1c0705'},
'title': 'BRIEF-Skanska sells office buildings in Sweden for around 1 bln SEK',
'url': 'https://www.reuters.com/article/brief-skanska-sells-office-buildings-in/brief-skanska-sells-office-buildings-in-sweden-for-around-1-bln-sek-idUSASM000IRO',
'uuid': 'c83c8bf46fdb8d597e6c10ad16f221379c1c0705'}
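
One possible way to handle the nested 'entities' object is sketched below. It is only a sketch under assumptions: the helper names entities_to_text and extract_entities, the '_entities.txt' output suffix, and the one-line-per-entity text layout are all hypothetical choices made for illustration. It also assumes every file holds a single JSON object shaped like the example above, and it writes an empty section for files that have no 'entities' key.

```python
import json
from pathlib import Path


def entities_to_text(entities):
    """Format the nested 'entities' dict as plain text (hypothetical layout).

    Each category ('locations', 'organizations', 'persons') becomes a
    heading line, followed by one indented line per entity.
    """
    lines = []
    for category, items in entities.items():
        lines.append(category + ':')
        for item in items:
            name = item.get('name', '')
            sentiment = item.get('sentiment', '')
            lines.append('  {} ({})'.format(name, sentiment))
    return '\n'.join(lines) + '\n'


def extract_entities(folder='.'):
    """Write <name>_entities.txt next to every <name>.json in folder.

    Returns the list of output paths that were written.
    """
    written = []
    for path in sorted(Path(folder).glob('*.json')):
        with open(path, 'r', encoding='UTF-8') as fi:
            json_object = json.load(fi)
        # Assumption: fall back to an empty dict if 'entities' is missing.
        entities = json_object.get('entities', {})
        out_path = path.with_name(path.stem + '_entities.txt')
        with open(out_path, 'w', encoding='UTF-8') as fo:
            fo.write(entities_to_text(entities))
        written.append(out_path)
    return written
```

Calling extract_entities('.') from the folder that holds the JSON files would produce one text file per input. If the raw nested structure is preferred over a formatted listing, json.dumps(entities, indent=2) could be written instead of entities_to_text(entities).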

