Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()

Bryan Britten britten.bryan at gmail.com
Mon May 27 23:11:34 EDT 2013


On Monday, May 27, 2013 7:58:05 PM UTC-4, Dave Angel wrote:
> On 05/27/2013 04:47 PM, Bryan Britten wrote:
> 
> > Hey, everyone!
> 
> >
> 
> > I'm very new to Python and have only been using it for a couple of days, but have some experience in programming (albeit mostly statistical programming in SAS or R) so I'm hoping someone can answer this question in a technical way, but without using an abundant amount of jargon.
> 
> >
> 
> > The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:
> 
> >
> 
> > <code>
> 
> > import json
> 
> > import urllib
> 
> >
> 
> > urlStr = "https://stream.twitter.com/1/statuses/sample.json"
> 
> >
> 
> > twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
> 
> > </code>
> 
> >
> 
> > I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if that helps at all.
> 
> 
> 
> Which OS?

I'm operating on Windows 7.

> 
> The first question I'd ask is how big this file is.  I can't tell, since 
> 
> it needs a user name & password to actually get the file.  

If you have Twitter, you can just use your log-in information to access the file.

> But it's not unusual to need at least double that space in memory, and in Windoze 
> 
> you're limited to two gig max, regardless of how big your hardware might be.
> 
> 
> 
> If you separately fetch the file, then you can experiment with it, 
> 
> including cutting it down to a dozen lines, and see if you can deal with 
> 
> that much.
> 
> 
> 
> How could you fetch it?  With wget, with a browser (and saveAs), with a 
> 
> simple loop which uses read(4096) repeatedly and writes each block to a 
> 
> local file.  Don't forget to use 'wb', as you don't know yet what line 
> 
> endings it might use.
> 
I'm not familiar with using read(4096), I'll have to look into that. When I tried to just save the file, my computer just sat in limbo for some time and didn't seem to want to process the command. 
> 
> Once you have an idea what the data looks like, you can answer such 
> 
> questions as whether it's json at all, whether the lines each contain a 
> 
> single json record, or what.
> 
Based on my *extremely* limited knowledge of JSON, that's definitely the type of file this is. Here is a snippet of what is seen when you log in:

{"created_at":"Tue May 28 03:09:23 +0000 2013","id":339216806461972481,"id_str":"339216806461972481","text":"RT @aleon_11: Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"\u003ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003eTwitter for BlackBerry\u00ae\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":310910123,"id_str":"310910123","name":"\u2661","screen_name":"LaMarielita_","location":"","url":null,"description":"MERCADOLOGA & PUBLICISTA EN PROCESO, AMO A MI DIOS & MI FAMILIA\u2665 ME ENCANTA REIRME , MOLESTAR & HABLAR :D BFF, pancho, ale & china :) LY\u2661","protected":false,"followers_count":506,"friends_count":606,"listed_count":1,"created_at":"Sat Jun 04 15:24:19 +0000 2011","favourites_count":207,"utc_offset":-25200,"time_zone":"Mountain Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":17241,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"FF6699","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_link_color":"B40B43","profile_sidebar_border_color":"CC3366","profile_sidebar_fill_color":"E5507E","profile_text_color":"362720","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue May 28 02:57:40 +0000 2013","id":339213856922537984,"id_str":"339213856922537984","text":"Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":105252134,"id_str":"105252134","name":"Alejandra Le\u00f3n","screen_name":"aleon_11","location":"Guatemala","url":null,"description":"La vida se disfruta m\u00e1s, cuando no se le pone tanta importancia.","protected":false,"followers_count":143,"friends_count":251,"listed_count":0,"created_at":"Fri Jan 15 20:49:38 +0000 2010","favourites_count":83,"utc_offset":-28800,"time_zone":"Pacific Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":1863,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"F8F2FC","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/105252134\/1364957374","profile_link_color":"F01A1A","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"7AC3EE","profile_text_color":"3D1957","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":2,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":false,"lang":"es"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[{"screen_name":"aleon_11","name":"Alejandra Le\u00f3n","id":105252134,"id_str":"105252134","indices":[3,12]}]},"favorited":false,"retweeted":false,"filter_level":"low"}

> 
> For all we know, the file might be a few terabytes in size.
> 
> 
> 
> 
> 
> -- 
> 
> DaveA



More information about the Python-list mailing list