how to handle response data that is streaming and chunked?

Thu Mar 21 17:35:24 EDT 2019

On 2019-03-21 21:09, Artie Ziff wrote:
> Some more info....
> 
> I wanted to add this relevant stack overflow post:
> https://stackoverflow.com/questions/17822342/understanding-python-http-streaming
> 
> The code splitting on text produces the following sample output:
> 
> {"status":"connected"}{"quote":{"ask":"195.95","asksz":"1000","bid":"195.93","bidsz":"1000","datetime":"2019-03-21T14:31:50-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193110"
> {"quote":{"ask":"195.95","asksz":"100","bid":"195.94","bidsz":"300","datetime":"2019-03-21T14:31:50-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193110"
> {"trade":{"cvol":"33872593","datetime":"2019-03-21T14:31:51-04:00","exch":{},"last":"195.9383","symbol":"AAPL","timestamp":"1553193111","vl":"100","vwap":"193.7212"
> {"quote":{"ask":"195.95","asksz":"900","bid":"195.93","bidsz":"1000","datetime":"2019-03-21T14:31:51-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193111"
> {"quote":{"ask":"195.95","asksz":"100","bid":"195.94","bidsz":"200","datetime":"2019-03-21T14:31:52-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193112"
> {"quote":{"ask":"195.95","asksz":"100","bid":"195.93","bidsz":"1000","datetime":"2019-03-21T14:31:53-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193113"
> {"trade":{"cvol":"33880369","datetime":"2019-03-21T14:31:53-04:00","exch":{},"last":"195.9449","symbol":"AAPL","timestamp":"1553193113","vl":"130","vwap":"193.7217"
> {"quote":{"ask":"195.95","asksz":"100","bid":"195.93","bidsz":"1000","datetime":"2019-03-21T14:31:53-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193113"
> {"trade":{"cvol":"33886479","datetime":"2019-03-21T14:31:54-04:00","exch":{},"last":"195.94","symbol":"AAPL","timestamp":"1553193114","vl":"100","vwap":"193.7221"
> {"trade":{"cvol":"33886650","datetime":"2019-03-21T14:31:55-04:00","exch":{},"last":"195.95","symbol":"AAPL","timestamp":"1553193115","vl":"20","vwap":"193.7221"
> {"trade":{"cvol":"33886779","datetime":"2019-03-21T14:31:55-04:00","exch":{},"last":"195.95","symbol":"AAPL","timestamp":"1553193115","vl":"100","vwap":"193.7221"
> {"trade":{"cvol":"33888294","datetime":"2019-03-21T14:31:56-04:00","exch":{},"last":"195.9489","symbol":"AAPL","timestamp":"1553193116","vl":"1464","vwap":"193.7222"
> {"quote":{"ask":"195.97","asksz":"200","bid":"195.95","bidsz":"300","datetime":"2019-03-21T14:31:56-04:00","exch":{},"qcond":"REGULAR","symbol":"AAPL","timestamp":"1553193116"
> 
> This seems like a tedious way to move forward. That is why I am trying
> to understand what my options are for chunked data and hopefully running
> the response data through a JSON decoder ring. I want well-known data
> structure objects. I believe it to be possible. I just do not know how to
> get there from here.
> 
> Everybody may agree that doing it this way, and further, extracting this
> data from text with regex is the cornerstone of a poor implementation. My
> current journey is to learn a better way. :-)
> 
I see that you lost the final '}}'!
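
That is presumably because the text was split on '}}' (an assumption on my part, since the splitting code is not shown here). str.split() throws the separator away, so the pieces come back without it. A small sketch with made-up sample data:

```python
import json

# Hypothetical sample: two concatenated JSON objects, as in the stream.
text = '{"quote":{"ask":"195.95"}}{"trade":{"last":"195.94"}}'

# split() drops the separator, so the pieces are no longer valid JSON.
# The text ends with the separator, so the empty last piece is filtered out.
pieces = [p for p in text.split('}}') if p]
print(pieces)

# Putting the separator back makes each piece decodable again.
records = [json.loads(p + '}}') for p in pieces]
print(records)
```

Note that this only works when every record ends in '}}', which the {"status":"connected"} message in the sample output does not.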

OK, try cutting the text at the first '}' and decoding that as JSON; if
that fails (because of nested dicts), extend it to the next '}', and so
on, until it decodes successfully:

import json

# 'lines' is assumed to be a str holding the text received so far.

# Start at the beginning.
pos = 0

while True:
    pos = lines.find('}', pos)

    if pos < 0:
        # No (more) '}'.
        break

    # Keep the '}'.
    pos += 1

    try:
        # Try decoding it.
        line = json.loads(lines[:pos])
    except ValueError:
        # Probably truncated; try again with the next '}'.
        pass
    else:
        # Success!
        print(line)

        # The remainder.
        lines = lines[pos:]

        # Reset.
        pos = 0
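
An alternative that avoids the retry loop is json.JSONDecoder.raw_decode() from the standard library, which decodes one JSON value from the start of a string and reports where it stopped. The sketch below is only one way to use it: the helper name decode_available and the sample chunks are made up for illustration, and it assumes the chunks have already been decoded to str.

```python
import json

_decoder = json.JSONDecoder()

def decode_available(buffer):
    """Return (objects, remainder) for a string of concatenated JSON values."""
    objects = []
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos == len(buffer):
            break
        try:
            obj, pos = _decoder.raw_decode(buffer, pos)
        except ValueError:
            # Incomplete (or invalid) value: keep it for the next chunk.
            break
        objects.append(obj)
    return objects, buffer[pos:]

# Hypothetical chunks, deliberately split in the middle of a record.
chunks = [
    '{"status":"connected"}{"quote":{"ask":"195',
    '.95"}}{"trade":',
    '{"last":"195.94"}}',
]

buffer = ''
for chunk in chunks:
    buffer += chunk
    objects, buffer = decode_available(buffer)
    for obj in objects:
        print(obj)
```

Whatever cannot be decoded yet stays in buffer until the next chunk arrives. One limitation: data that is genuinely invalid looks the same as data that is merely incomplete, so it would sit in the buffer indefinitely.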