[Pandas-dev] Unexpected error using pd.read_json with chunksize

William Ayd william.ayd at icloud.com
Mon Jun 24 09:11:01 EDT 2019


Can you determine which line in the file is throwing the error?
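
One thing that may be worth checking in the quoted code below: StringIO(filename) builds an in-memory buffer whose contents are the path string itself, not the contents of the file at that path. If that is what is happening here, pandas would be trying to parse the text 'file://localhost/...' as JSON, and a leading 'f' could plausibly be read as the start of the literal false, which would match the error message. A minimal sketch of the difference, assuming a small two-record file written to a temporary directory (the records and the temp path are hypothetical stand-ins, not the Yelp data):

```python
import json
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical two-record stand-in for review_test.json (one JSON record per line).
records = [
    {"review_id": "a1", "stars": 5.0, "useful": 1},
    {"review_id": "b2", "stars": 3.0, "useful": 0},
]
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "review_test.json")
with open(filename, "w") as fh:
    for rec in records:
        fh.write(json.dumps(rec) + "\n")

# StringIO(filename) holds the characters of the path, so the JSON parser
# is handed the path text rather than the file contents.
wrapped_error = None
try:
    for chunk in pd.read_json(StringIO(filename), lines=True, chunksize=1):
        pass
except ValueError as exc:
    wrapped_error = exc
print("StringIO(path) raised:", type(wrapped_error).__name__)

# Passing the path itself lets pandas open the file and yield one
# DataFrame per chunk.
chunks = []
for chunk in pd.read_json(filename, lines=True, chunksize=1):
    chunks.append(chunk)
print("chunks read from path:", len(chunks))
```

If the first call fails and the second works on your test files too, dropping the StringIO wrapper is probably all that is needed; the exact error text may differ from yours depending on the first character of the path.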

> On Jun 23, 2019, at 10:54 PM, Vicki Brown <vlb at cfcl.com> wrote:
> 
> TL;DR
> 
> I am getting an error I don't understand from my pd.read_json reader:
> 
>    ValueError: Unexpected character found when decoding 'false'
> 
> 
> Details
> 
> I am working through a set of code exercises that use a Yelp reviews dataset. At this point in the exercises I am supposed to read in review.json, which has one JSON record per line (and 6 million lines! ;-).
> 
> Because the complete dataset file is enormous, the tutorial recommends using chunksize and building a json reader. When I try this, I get an error, even with my test input.
> 
> I am befuddled.
> 
> TESTS:
> 
> I have smaller versions of the JSON file, with 100, 3, and 1 records, for testing. I can read all of the test files into a pandas dataframe and examine them, e.g.:
> 
>    path = 'file://localhost/Users/vlb/Learn/DSC_Intro/'
>    filename = path + 'yelp_dataset/review_test.json'
> 
>    # read the entire file
>    reviews = pd.read_json(filename, lines=True)
> 
> I cannot read any of the test files in chunks. I get the same error.
> 
> CODE: 
> 
> My code currently looks like this:
> 
>    path = 'file://localhost/Users/.../DSC_Intro/'
>    filename = path + 'yelp_dataset/review_test.json'
> 
>    # create a reader to read in chunks
> 
>    review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=1)
> 
> 
> `type(review_reader)` returns
> 
>    pandas.io.json.json.JsonReader
> 
> as expected. However,
> 
>    for chunk in review_reader:
>        print(chunk)
> 
> throws an error on all test files.
> 
>    /Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
>        869     if orient == "columns":
>        870         self.obj = DataFrame(
>    --> 871             loads(json, precise_float=self.precise_float), dtype=None)
>        872     elif orient == "split":
>        873         decoded = {str(k): v for k, v in compat.iteritems(
> 
>    ValueError: Unexpected character found when decoding 'false'
> 
> INPUT: Sample JSON
> 
> {"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}
> 
> references: 
>    https://www.yelp.com/dataset/documentation/main
>    https://courses.springboard.com/courses/448797
> 
> 
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev


