[Pandas-dev] Unexpected error using pd.read_json with chunksize
Vicki Brown
vlb at cfcl.com
Sun Jun 23 22:54:23 EDT 2019
TL;DR
I am getting an error I don't understand from my pd.read_json reader:
ValueError: Unexpected character found when decoding 'false'
Details
I am working through a set of code exercises that use a Yelp reviews dataset. At this point in the exercises I am supposed to read in review.json, which has one JSON record per line (and 6 million lines! ;-).
Because the complete dataset file is enormous, the tutorial recommends using chunksize and building a JSON reader. When I try this, I get an error, even with my test input.
I am befuddled.
TESTS:
I have smaller versions of the JSON file, with 100, 3, and 1 records, for testing. I can read all of the test files into a pandas dataframe and examine them, e.g.:
path = 'file://localhost/Users/vlb/Learn/DSC_Intro/'
filename = path + 'yelp_dataset/review_test.json'
# read the entire file
reviews = pd.read_json(filename, lines=True)
I cannot read any of the test files in chunks; every one fails with the same error.
CODE:
My code currently looks like this:
import pandas as pd
from io import StringIO

path = 'file://localhost/Users/.../DSC_Intro/'
filename = path + 'yelp_dataset/review_test.json'
# create a reader to read in chunks
review_reader = pd.read_json(StringIO(filename), lines=True, chunksize=1)
`type(review_reader)` returns
pandas.io.json.json.JsonReader
as expected. However, iterating over it

for chunk in review_reader:
    print(chunk)

throws an error on every test file:
/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
869 if orient == "columns":
870 self.obj = DataFrame(
--> 871 loads(json, precise_float=self.precise_float), dtype=None)
872 elif orient == "split":
873 decoded = {str(k): v for k, v in compat.iteritems(
ValueError: Unexpected character found when decoding 'false'
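A likely explanation, sketched with the standard library only: `StringIO(filename)` does not open the file. It wraps the path string itself as in-memory text, so the reader is handed `file://localhost/...` as if that were the JSON. The first character the decoder meets is `f`, so it tries to read the literal `false` and fails. The snippet below reproduces this with `io.StringIO` and `json.loads`; the path is only illustrative (the elided part is left as "..."), and the standard-library `json` module words its error differently from the decoder pandas uses, so the message will not match the traceback exactly.

```python
import json
from io import StringIO

# Illustrative path shaped like the one in the post (elided part kept as "...")
filename = 'file://localhost/Users/.../DSC_Intro/' + 'yelp_dataset/review_test.json'

# StringIO does not open anything: it wraps the given string as in-memory text.
buffer = StringIO(filename)
contents = buffer.read()
print(contents == filename)  # the "file" content is just the path itself

# A JSON decoder now sees text starting with "f" and tries to read the literal
# false, which fails. json.JSONDecodeError is a subclass of ValueError.
try:
    json.loads(contents)
    decoded_ok = True
except ValueError as exc:
    decoded_ok = False
    print('decode failed:', exc)
```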
INPUT: Sample JSON
{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","user_id":"DzZ7piLBF-WsJxqosfJgtA","business_id":"qx6WhZ42eDKmBchZDax4dQ","stars":5.0,"useful":1,"funny":0,"cool":0,"text":"Our family LOVES the food here. Quick, friendly, delicious, and a great restaurant to take kids to. 5 stars!","date":"2017-03-27 01:14:37"}
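For comparison, a minimal sketch of chunked reading, assuming a reasonably recent pandas: `StringIO` fits when the JSON text is already in memory, whereas a file on disk would be passed to `pd.read_json` by its path. The input is the sample record above, trimmed to a few fields and repeated twice (an assumption made only so that there is more than one chunk).

```python
from io import StringIO

import pandas as pd

# The sample record, trimmed to a few fields and repeated so there are two lines
record = '{"review_id":"Amo5gZBvCuPc_tZNpHwtsA","stars":5.0,"useful":1,"funny":0,"cool":0}'
json_lines = record + '\n' + record + '\n'

# Here StringIO wraps the JSON *content*, not a path, so the reader parses records.
reader = pd.read_json(StringIO(json_lines), lines=True, chunksize=1)

chunks = []
for chunk in reader:
    chunks.append(chunk)  # each chunk is a DataFrame with at most `chunksize` rows
    print(chunk)
```

With a file on disk, the equivalent call would be along the lines of `pd.read_json(filename, lines=True, chunksize=...)`, with the chunk size chosen so each DataFrame fits comfortably in memory.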
references:
https://www.yelp.com/dataset/documentation/main
https://courses.springboard.com/courses/448797