[Pandas-dev] Unexpected error using pd.read_json with chunksize

Vicki Brown vlb at cfcl.com
Sun Jun 23 22:54:23 EDT 2019


TL;DR

I am getting an error I don't understand from my pd.read_json reader:

    ValueError: Unexpected character found when decoding 'false'


Details

I am working through a set of code exercises that use a Yelp reviews dataset. At this point in the exercises I am supposed to read in review.json, which has one JSON record per line (and 6 million lines! ;-).
   
Because the complete dataset file is enormous, the tutorial recommends using chunksize and building a json reader. When I try this, I get an error, even with my test input.
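
For reference, here is a minimal sketch of the chunked-reading pattern as I understand it from the tutorial. It assumes a small temporary file with made-up records (the names `records`, `sample_path`, and `chunk_sizes` are hypothetical and not part of my actual data) and passes a plain local path straight to pd.read_json:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical stand-in records; the real file has one review per line.
records = [
    {"review_id": "a", "stars": 5.0, "useful": 1},
    {"review_id": "b", "stars": 3.0, "useful": 0},
    {"review_id": "c", "stars": 4.0, "useful": 2},
]

with tempfile.TemporaryDirectory() as tmpdir:
    # Write one JSON object per line (line-delimited JSON).
    sample_path = os.path.join(tmpdir, "review_sample.json")
    with open(sample_path, "w") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")

    # With chunksize set, read_json returns an iterator of DataFrames
    # instead of a single DataFrame.
    reader = pd.read_json(sample_path, lines=True, chunksize=2)

    # Each chunk holds at most `chunksize` rows.
    chunk_sizes = [len(chunk) for chunk in reader]
```

With three records and chunksize=2, the reader should yield two chunks, the second holding the leftover record.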

I am befuddled.

TESTS:

I have smaller versions of the JSON file, with 100, 3, and 1 records, for testing. I can read all of the test files into a pandas dataframe and examine them, e.g.:

    path = 'file://localhost/Users/vlb/Learn/DSC_Intro/'
    filename = path + 'yelp_dataset/review_test.json'

    # read the entire file
    reviews = pd.read_json(filename, lines=True)

I cannot read any of the test files in chunks; each one fails with the same error.

CODE: 

My code currently looks like this:

    import pandas as pd
    from io import StringIO

    path = 'file://localhost/Users/.../DSC_Intro/'
    filename = path + 'yelp_dataset/review_test.json'

    # create a reader to read in chunks
    review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=1)


`type(review_reader)` returns

    pandas.io.json.json.JsonReader

as expected. However,

    for chunk in review_reader:
        print(chunk)

raises an error on all test files:

    /Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
        869     if orient == "columns":
        870         self.obj = DataFrame(
    --> 871             loads(json, precise_float=self.precise_float), dtype=None)
        872     elif orient == "split":
        873         decoded = {str(k): v for k, v in compat.iteritems(

    ValueError: Unexpected character found when decoding 'false'
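
One thing that may be worth checking, sketched here as a guess and not a confirmed diagnosis: StringIO(filename) creates an in-memory buffer containing the characters of the path string itself, not the contents of the file at that path. If that is what happens in my code, the parser would be handed text starting with 'f' (from 'file://'), which could explain an attempt to decode 'false'. The path below is hypothetical, and `buffer_contents` and `error_message` are names introduced only for this illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical path string; no file needs to exist for this demonstration.
path_text = "file://localhost/some/dir/review_test.json"

# The buffer holds the path text, not the data stored at that path.
buffer_contents = StringIO(path_text).read()

error_message = None
try:
    # Same call shape as in the post: the path is wrapped in StringIO.
    reader = pd.read_json(StringIO(path_text), lines=True, chunksize=1)
    for chunk in reader:
        print(chunk)
except ValueError as exc:
    # The reader is lazy, so the failure surfaces during iteration.
    error_message = str(exc)

print(buffer_contents)
print(error_message)
```

If this is the cause, passing `filename` directly to pd.read_json (as in the whole-file read that works) would be the thing to try with chunksize.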

INPUT: Sample JSON

    {"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}

REFERENCES:
    https://www.yelp.com/dataset/documentation/main
    https://courses.springboard.com/courses/448797



