[Tutor] Downloading S3 Logs

Randy Raymond randyeraymond at mchsi.com
Thu Feb 11 17:54:14 CET 2010


Mr. Mao, I am not a professional programmer.  However, I do think you want to reduce the amount of work done on each pass through the loop over the transactions (log entries in your case).  Since you are already converting to a datetime, you could drop the subtraction and instead compare the datetime directly to yesterday (yesterday = now minus one day, e.g. now - timedelta(days=1)) in the "if" statement.  In other words, replace "time_difference < 1" with a comparison of the record's date (recordDate) against yesterday, and get rid of time_difference altogether.
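A minimal sketch of that idea, assuming the last_modified strings look like "2010-02-11T10:23:00.000Z", which is what the original split(".") implies. The cutoff is computed once, outside the loop, and each timestamp is compared against it instead of subtracting and taking .days. Because timedelta.days rounds down, "(now - t).days < 1" selects exactly the same keys as "t > now - timedelta(days=1)". The helper names parse_s3_time and is_recent are hypothetical. The sketch sticks to time.strptime because datetime.strptime only arrived in Python 2.5.

```python
import time
from datetime import datetime, timedelta

def parse_s3_time(stamp):
    # "2010-02-11T10:23:00.000Z" -> datetime(2010, 2, 11, 10, 23, 0)
    # Drop the fractional seconds, then parse with time.strptime (available in 2.4).
    return datetime(*time.strptime(stamp.split(".")[0], "%Y-%m-%dT%H:%M:%S")[0:6])

def is_recent(stamp, cutoff):
    # A plain comparison replaces the (now - modified).days < 1 arithmetic.
    return parse_s3_time(stamp) > cutoff

now = datetime(2010, 2, 11, 17, 0, 0)  # fixed "now" so the example is repeatable
yesterday = now - timedelta(days=1)    # computed once, outside any loop
```

With these values, is_recent("2010-02-11T10:23:00.000Z", yesterday) is true and is_recent("2010-02-09T10:23:00.000Z", yesterday) is false.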

You can try changing things a bit to see if it speeds up.  For example, if a lot of keys are failing the "key.name.startswith("log/access_log")" test, then maybe use a nested if that tests "key.name.startswith("log/access_log")" before you actually test the date.  Or switch it around and test the date first, then "key.name.startswith("log/access_log")".
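Python's "and" short-circuits: when the left-hand test is false, the right-hand side is never evaluated. So a single "if" with the cheap string test on the left already behaves like the nested if described above. In the original loop, however, the timestamp is parsed on its own line before the "if", so it runs for every key no matter how the tests are ordered; the parse has to move into the condition for the ordering to matter. The sketch counts how many timestamps get parsed. FakeKey, parse_s3_time and the key names are hypothetical stand-ins, not part of boto.

```python
import time
from datetime import datetime, timedelta

class FakeKey(object):
    # Hypothetical stand-in for a boto key: only the two attributes the loop uses.
    def __init__(self, name, last_modified):
        self.name = name
        self.last_modified = last_modified

parse_count = [0]  # a list so the function can update it without "global"

def parse_s3_time(stamp):
    parse_count[0] += 1
    return datetime(*time.strptime(stamp.split(".")[0], "%Y-%m-%dT%H:%M:%S")[0:6])

def select(keys, cutoff):
    selected = []
    for key in keys:
        # Cheap string test first: "and" stops here for non-log keys,
        # so their timestamps are never parsed.
        if key.name.startswith("log/access_log") and parse_s3_time(key.last_modified) > cutoff:
            selected.append(key)
    return selected

keys = [
    FakeKey("log/access_log-2010-02-11-10", "2010-02-11T10:23:00.000Z"),
    FakeKey("downloads/setup.exe", "2010-02-11T09:00:00.000Z"),
    FakeKey("log/access_log-2010-02-08-10", "2010-02-08T10:23:00.000Z"),
]
cutoff = datetime(2010, 2, 11, 17, 0, 0) - timedelta(days=1)
```

Calling select(keys, cutoff) keeps only the first key and parses just two of the three timestamps. Whether the name test or the date test should go first depends on which one rejects more keys in your bucket.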

Sincerely,
Randy Raymond


From: Lao Mao 
Sent: Thursday, February 11, 2010 10:23 AM
To: tutor at python.org 
Subject: [Tutor] Downloading S3 Logs


Hello,

I've written the below to get the previous day's logs from an Amazon S3 bucket.


#!/usr/bin/python 
import time 
from datetime import datetime 
import boto 


daily_s3_log = open("/tmp/s3logs", "w+") 
now = datetime.now() 
connection = boto.connect_s3() 
bucket = connection.get_bucket("downloads.sekrit.co.uk") 
todays_keys = [] 


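for key in bucket: 
  # last_modified looks like "2010-02-11T10:23:00.000Z"; drop the fractional
  # seconds and parse with time.strptime (Python 2.4 has no datetime.strptime)
  time_difference = (now - datetime(*time.strptime(
      key.last_modified.split(".")[0], "%Y-%m-%dT%H:%M:%S")[0:6])).days 
  if time_difference < 1 and key.name.startswith("log/access_log"): 
    todays_keys.append(key) 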


for key in todays_keys: 
  key.get_file(daily_s3_log) 


daily_s3_log.close() 


This takes about 2 minutes to download a day's logs (about 25 MB). 


I'd appreciate any improvements or feedback on the above. 


For example, would it make sense to turn the first loop into a generator function that yields the interesting keys?  Also, is there a cleaner way to do the date comparison in Python 2.4?
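A generator fits well here, and generators are available in Python 2.4. One cleaner way to do the date test is to skip parsing altogether. S3's timestamps are fixed-width ISO 8601 strings, so they sort in chronological order as plain strings, and a cutoff formatted once with strftime can be compared directly against key.last_modified. This is a sketch under the assumption that every last_modified value has the "YYYY-MM-DDTHH:MM:SS.000Z" form seen in the original code. The generator name interesting_keys and the FakeKey class are hypothetical.

```python
from datetime import datetime, timedelta

class FakeKey(object):
    # Hypothetical stand-in for a boto key (only .name and .last_modified are used).
    def __init__(self, name, last_modified):
        self.name = name
        self.last_modified = last_modified

def interesting_keys(keys, prefix, cutoff):
    """Yield keys whose name starts with prefix and that were modified after cutoff."""
    # Same layout as the S3 timestamps, e.g. "2010-02-10T17:00:00".
    # Fixed-width ISO 8601 strings compare chronologically, so no per-key parsing.
    cutoff_str = cutoff.strftime("%Y-%m-%dT%H:%M:%S")
    for key in keys:
        if key.name.startswith(prefix) and key.last_modified > cutoff_str:
            yield key
```

In the original script the two loops could then collapse into one: `for key in interesting_keys(bucket, "log/access_log", datetime.utcnow() - timedelta(days=1)): key.get_file(daily_s3_log)`. utcnow() is used because the trailing "Z" marks S3's timestamps as UTC; the original datetime.now() only agrees with them when the machine's local time is UTC. If your boto version's Bucket.list() accepts a prefix argument, passing bucket.list(prefix="log/access_log") instead of bucket should also let S3 leave out the non-log keys before they are sent.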

Thanks,

Laomao
