[BangPypers] JSON PARSER

lokesh bobby lokeshbobbys at yahoo.com
Fri Mar 21 11:41:30 CET 2014


Hi Prashant,

I have 2 concerns here

1. As I mentioned earlier, my file has more than 2 lakh lines, so it is not practical to pass that many lines to json.loads() as a string. We need to pass the file itself, which has the limitation explained earlier.

2. Creating a list of the values of duplicate keys is not an ideal solution for me, because my values are not just single-word strings; they are themselves big dicts.
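
One possible way to address both points, as a rough sketch only: json.load() accepts the same object_pairs_hook argument as json.loads(), so a hook can record the names of repeated keys while the file is read, without turning the values into lists. This assumes Python 3 and the standard-library json module; the function name find_duplicate_keys and the sample data are made up for illustration. Note that json.load() still reads the whole file into memory, so this is not a true stream parser.

```python
import io
import json
from collections import Counter


def find_duplicate_keys(fp):
    """Return (data, duplicates) for the JSON document read from fp.

    duplicates is a sorted list of key names that occur more than once
    within a single JSON object, at any nesting depth.
    """
    duplicates = set()

    def hook(pairs):
        # pairs is the list of (key, value) tuples for one JSON object.
        counts = Counter(key for key, _ in pairs)
        duplicates.update(key for key, n in counts.items() if n > 1)
        # Same result as the default decoder: the last value wins.
        return dict(pairs)

    data = json.load(fp, object_pairs_hook=hook)
    return data, sorted(duplicates)


# Demo on an in-memory file; a real file opened with open() works the same way.
sample = io.StringIO('{"test": {"a": 1, "a": 2}, "test": "bye", "test1": "tata"}')
data, dupes = find_duplicate_keys(sample)
print(dupes)
```

The values themselves are left untouched, so it does not matter whether they are short strings or big dicts; only the key names are collected.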

Thanks,
Lokesh




On Friday, 21 March 2014 3:35 PM, Prashant Gaur <91prashantgaur at gmail.com> wrote:
 
EDIT  :

import simplejson as json
from collections import defaultdict

def duplicate_key_lookup(ordered_pairs):
    """Convert duplicate keys values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

>>> json.loads('''{"test":"hi","test":"bye", "test1": "tata"}''', object_pairs_hook=duplicate_key_lookup)
{'test': ['hi', 'bye'], 'test1': 'tata'}
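
If folding duplicates into lists is awkward (for instance when the values are themselves large dicts, or when a value is legitimately a list), one alternative is to keep every (key, value) pair exactly as parsed. The following is a minimal sketch, assuming Python 3 and the standard-library json module; the hook name keep_all_pairs is hypothetical.

```python
import json


def keep_all_pairs(ordered_pairs):
    """Keep each JSON object as a list of (key, value) tuples.

    Nothing is merged or dropped, so repeated keys simply appear
    more than once in the list, in their original order.
    """
    return list(ordered_pairs)


text = '{"test": "hi", "test": "bye", "test1": {"x": 1, "x": 2}}'
result = json.loads(text, object_pairs_hook=keep_all_pairs)
print(result)
```

The trade-off is that every object, nested ones included, comes back as a list of tuples rather than a dict, so lookups by key need an extra step.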




On Fri, Mar 21, 2014 at 3:34 PM, Prashant Gaur <91prashantgaur at gmail.com> wrote:

Hello Lokesh ,
>
>
>As we know, json.loads returns data in the form of a dict, and a dict can never have duplicate keys.
>
>
>So one thing we can do is return a list of all the values that share the same name.
>
>import simplejson as json
>from collections import defaultdict
>
>def duplicate_key_lookup(ordered_pairs):
>    """Convert duplicate keys values to lists."""
>    # read all values into lists
>    d = defaultdict(list)
>    for k, v in ordered_pairs:
>        d[k].append(v)
>    # unpack lists that have only 1 item
>    for k, v in d.items():
>        if len(v) == 1:
>            d[k] = v[0]
>    return dict(d)
>
>>>> json.loads('''{"test":"hi","test":"bye", "test1": "tata"}''', object_pairs_hook=duplicate_key_lookup)
>{'test': ['hi', 'bye'], 'test1': 'tata'}
>
>
>
>
>On Fri, Mar 21, 2014 at 3:22 PM, lokesh bobby <lokeshbobbys at yahoo.com> wrote:
>
>Hi Jayanth/Prashant,
>>
>>
>>Either "the duplicates to be loaded" or "an output of all the duplicate key names in a JSON file" should be fine for me :-)
>>
>>
>>NOTE: The JSON file consists of more than 2 lakh lines.
>>
>>
>>Lokesh
>>
>>
>>
>>On Friday, 21 March 2014 3:17 PM, Jayanth Koushik <jnkoushik at gmail.com> wrote:
>> 
>>Hi Prashant
>>
>>I think he wants the duplicates to be loaded, not cause an exception.
>>
>>Jayanth
>>
>>
>>
>>
>>On Fri, Mar 21, 2014 at 3:15 PM, Prashant Gaur <91prashantgaur at gmail.com> wrote:
>>
>>Hi Lokesh,
>>>
>>>
>>>We can pass a hook while parsing your JSON file that checks whether any name is repeated.
>>>
>>>
>>>import json
>>>
>>>
>>>def duplicate_checking_hook(pairs):
>>>    '''Look up duplicate names.'''
>>>    result = dict()
>>>    for key, val in pairs:
>>>        if key in result:
>>>            raise KeyError('Duplicate key specified: %s' % key)
>>>        result[key] = val
>>>    return result
>>>
>>>>>> json.loads('''{"test":"hi","test":"bye"}''', object_pairs_hook=duplicate_checking_hook)
>>>Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>>  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
>>>    return cls(encoding=encoding, **kw).decode(s)
>>>  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
>>>    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
>>>    obj, end = self.scan_once(s, idx)
>>>  File "<stdin>", line 5, in duplicate_checking_hook
>>>KeyError: u'Duplicate key specified: a'
>>>
>>>
>>>While
>>>
>>>>>> json.loads('''{"test":"hi","test1":"bye"}''', object_pairs_hook=duplicate_checking_hook)
>>>{u'test': u'hi', u'test1': u'bye'}
>>>>>>
>>>
>>>
>>>On Fri, Mar 21, 2014 at 2:59 PM, lokesh bobby <lokeshbobbys at yahoo.com> wrote:
>>>
>>>Hi Jayanth,
>>>>
>>>>Ideally speaking, a JSON file shouldn't be created with repeated key names. But when a proper JSON file is edited manually, a duplicate key can get appended to it, and we need to catch that duplicate key. If we use json.load(), the repeated keys of the JSON file won't all get loaded; only the last occurrence of the key will be present in the loaded data.
>>>>
>>>>Instead, I want the entire data in the JSON file to be loaded. I hope you get the point of what I want :-)
>>>>
>>>>Thanks,
>>>>Lokesh.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Friday, 21 March 2014 2:45 PM, Jayanth Koushik <jnkoushik at gmail.com> wrote:
>>>>
>>>>Hi Lokesh
>>>>
>>>>
>>>>The 'problem' that you talk about isn't really a problem. Since the JSON specification does not say what is to be done for repeated names, it is up to the implementation to decide. What is your requirement for handling repeated names?
>>>>
>>>>
>>>>Jayanth
>>>>
>>>>
>>>>
>>>>
>>>>On Fri, Mar 21, 2014 at 2:30 PM, lokesh bobby <lokeshbobbys at yahoo.com> wrote:
>>>>
>>>>Hi Noufal,
>>>>>
>>>>>Thanks for your reply. I am not looking to simply load the JSON file; there is a limitation in doing so. Please go through these links:
>>>>>
>>>>>http://docs.python.org/2/library/json.html#repeated-names-within-an-object
>>>>>http://docs.python.org/3.2/library/json.html#repeated-names-within-an-object
>>>>>
>>>>>To get around that problem, I am looking for some JSON stream parsers.
>>>>>
>>>>>Thanks,
>>>>>Lokesh.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Friday, 21 March 2014 2:09 PM, Noufal Ibrahim KV <noufal at nibrahim.net.in> wrote:
>>>>>
>>>>>On Fri, Mar 21 2014, lokesh bobby wrote:
>>>>>
>>>>>
>>>>>> Hi ALL,
>>>>>>
>>>>>> Can you share your thoughts on how to parse a JSON file by using
>>>>>> python?
>>>>>
>>>>>import json
>>>>>
>>>>>with open("data.json") as f:
>>>>>     json.load(f)
>>>>>
>>>>>
>>>>>[...]
>>>>>
>>>>>
>>>>>--
>>>>>Cordially,
>>>>>Noufal
>>>>>http://nibrahim.net.in
>>>>>_______________________________________________
>>>>>BangPypers mailing list
>>>>>BangPypers at python.org
>>>>>https://mail.python.org/mailman/listinfo/bangpypers
>>>>>
>>>>
>>>
>>>
>>>
>>>-- 
>>>
>>>Prashant Gaur 
>>>
>>>Mobile : +91 9717353657
>>>http://gaurprashant.blogspot.in/ 
>>>http://stackoverflow.com/users/1850358/prashant-gaur
>>>http://www.about.me/prashantgaur/
>>>
>>>
>>
>>
>>
>
>
>
>



