Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()

Bryan Britten britten.bryan at gmail.com
Mon May 27 16:47:40 EDT 2013


Hey, everyone! 

I'm very new to Python and have only been using it for a couple of days, but I have some programming experience (albeit mostly statistical programming in SAS or R), so I'm hoping someone can answer this question in a technical way without an abundance of jargon.

The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:

<code>
import json
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

# parse each line of the response as a JSON object and collect them in a list
twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
</code>

I run into a memory issue. I'm running 32-bit Python 2.7 on a machine with 4 GB of RAM, if that helps at all.

If I use the following code:

<code>
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

fileHandle = urllib.urlopen(urlStr)

# read the entire response into a list of lines
twtrText = fileHandle.readlines()
</code>

The last line takes hours (upwards of 6 or 7, if not more) to finish.

That said, my question is whether there is a more efficient way to do this. I'm worried that if .readlines() alone takes this long, actually working with the data is going to be a computational nightmare.
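
For example, would something along these lines be a better approach: parsing each line as it arrives and stopping after a fixed number of tweets, rather than collecting the whole response first? The MAX_TWEETS cap is just an arbitrary number I picked for illustration.

<code>
import json
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

MAX_TWEETS = 1000  # arbitrary cap, just for illustration

twtrDicts = []
fileHandle = urllib.urlopen(urlStr)
for line in fileHandle:
    line = line.strip()
    if not line:  # skip any blank keep-alive lines
        continue
    twtrDicts.append(json.loads(line))
    if len(twtrDicts) >= MAX_TWEETS:
        break
fileHandle.close()
</code>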

Thanks in advance for any insights or advice!


