[Pandas-dev] Unexpected error using pd.read_json with chunksize
William Ayd
william.ayd at icloud.com
Mon Jun 24 09:11:01 EDT 2019
Can you determine which line in the file is throwing the error?
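
One way to check, as a minimal sketch rather than anything pandas-specific: run each line through the standard-library json module and report the ones that fail to parse. The helper name find_bad_json_lines and the sample strings below are made up for illustration; only json.loads and io.StringIO are real APIs here.

```python
import json
from io import StringIO


def find_bad_json_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON.

    `lines` is any iterable of strings, e.g. an open file or a StringIO.
    Hypothetical helper for illustration; not part of pandas.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            bad.append((number, str(exc)))
    return bad


# Two well-formed records, one per line: nothing is reported.
good = StringIO('{"stars": 5.0, "useful": 1}\n{"stars": 4.0, "useful": 0}\n')
print(find_bad_json_lines(good))

# StringIO wraps the string it is given. If that string is a path,
# the "file" being parsed is the path text itself, not the file's contents.
path_as_text = StringIO('yelp_dataset/review_test.json')
print(find_bad_json_lines(path_as_text))
```

For the real data, the same helper could be called on an open file handle for a local copy of review_test.json. The second call is there because the quoted code passes StringIO(filename) to pd.read_json: that may be handing the parser the path string (which starts with "f") instead of the records, which would be consistent with an error on every test file.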
> On Jun 23, 2019, at 10:54 PM, Vicki Brown <vlb at cfcl.com> wrote:
>
> TL;DR
>
> I am getting an error I don't understand from my pd.read_json reader:
>
> ValueError: Unexpected character found when decoding 'false'
>
>
> Details
>
> I am working through a set of code exercises that use a Yelp reviews dataset. At this point in the exercises I am supposed to read in review.json, which has one JSON record per line (and 6 million lines! ;-).
>
> Because the complete dataset file is enormous, the tutorial recommends using chunksize and building a json reader. When I try this, I get an error, even with my test input.
>
> I am befuddled.
>
> TESTS:
>
> I have smaller versions of the JSON file, with 100, 3, and 1 records, for testing. I can read all of the test files into a pandas dataframe and examine them, e.g.:
>
> path = 'file://localhost/Users/vlb/Learn/DSC_Intro/'
> filename = path + 'yelp_dataset/review_test.json'
>
> # read the entire file
> reviews = pd.read_json(filename, lines=True)
>
> I cannot read any of the test files in chunks. I get the same error.
>
> CODE:
>
> My code currently looks like this
>
> path = 'file://localhost/Users/.../DSC_Intro/'
> filename = path + 'yelp_dataset/review_test.json'
>
> # create a reader to read in chunks
>
> review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=1)
>
>
> `type(review_reader)` returns
>
> pandas.io.json.json.JsonReader
>
> as expected. However
>
> for chunk in review_reader:
>     print(chunk)
>
> throws an error on all test files.
>
> /Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
> 869 if orient == "columns":
> 870 self.obj = DataFrame(
> --> 871 loads(json, precise_float=self.precise_float), dtype=None)
> 872 elif orient == "split":
> 873 decoded = {str(k): v for k, v in compat.iteritems(
>
> ValueError: Unexpected character found when decoding 'false'
>
> INPUT: Sample JSON
>
> {"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}
>
> references:
> https://www.yelp.com/dataset/documentation/main
> https://courses.springboard.com/courses/448797
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev