[Tutor] short url processor

Sat May 14 01:49:34 CEST 2011

Hey folks,

I'm rewriting a short url processor for my job. I originally wrote it 
as a multi-threaded Perl script, which works but has socket problems 
that cause memory leaks. Since I'm rebuilding it to use memcache, and 
since I was learning Python outside of work anyway, I figured I'd 
rewrite it in Python.

I'm using BaseHTTPServer, overriding do_GET and do_POST, and want to set 
up a custom logging mechanism so I don't have to rewrite a separate log 
parser, which I'll eventually rewrite in Python as well.

The problem I'm having is that BaseHTTPServer is writing what appears 
to be an Apache-style access log to STDOUT, and the logging.debug or 
logging.info calls I make in the code are also going to STDOUT, despite 
my attempt to override the defaults with logging.basicConfig() by 
setting a filename, etc.

Here are the basics of what I'm doing. Forgive my code; I've already 
been told it's "ugly". I'm new to Python and come from a Perl/PHP 
background.

import struct
import string,cgi,time
import psycopg
import logging
import re
import memcache
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from time import strftime,localtime

class clientThread(BaseHTTPRequestHandler):
    def log_my_request(self, method, request, short_url, http_code,
                       long_url, cached, notes):
        logging.debug("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s",
            self.client_address[0],
            time.strftime("%Y-%m-%d %H:%M:%S", localtime()),
            method,     # get or post
            request,    # requested entity
            short_url,  # matching short_url based on entity, if any
            http_code,  # 200, 301, 302, 404, etc
            long_url,   # url to redirect to, if there was one
            cached,     # 'hit', 'miss', 'miss-db', 'error'
            notes       # extra notes for the log file only
            )
        return

    def do_GET(self):
        # logic goes here for finding a short url from memcache, then
        # writing the appropriate output data to the socket, then
        # logging happens:
        self.log_my_request(getpost, orig_short_url, short_url, '302',
                            long_url, 'hit', '')
        return

def main():
    if mc.get('dbcheck'): # memcache already has some data
        print("memcache already primed with data")
    else: # nothing in memcache, so load it up from database
        print('Connecting to PG')
        cur.execute("SELECT count(*) FROM short_urls")
        mycount = cur.fetchone()[0] # first column of the single row
        print("fetching %s entries" % mycount)
        cur.execute("SELECT short_url,long_url FROM short_urls")
        giant_list = cur.fetchall()

        # cache a marker that tells us we've already initialized
        # memcache with db data
        mc.set('dbcheck', 'databasetest', 0)

        # I'm sure there's a MUCH more efficient way of doing this ...
        # multi-set of some sort?
        for i in giant_list:
            if i[0] and i[1]:
                mc.set(i[0], i[1])

        print("finished retrieving %s entries plus set up a new "
              "dictionary with all values" % mycount)

    # set up the socket, bind to port, and wait for incoming connections
    try:
        server = HTTPServer(('', 8083), clientThread)
        print 'short url processing has begun'

        # this is where I try to tell Python that I only want my
        # message in my log: no INFO:username prefix, etc., and also
        # to write it to a file
        logging.basicConfig(level=logging.DEBUG)
        logging.basicConfig(format='%(message)s', filename='/tmp/ian.txt')

        server.serve_forever()
    except KeyboardInterrupt:
        print '^C received, shutting down server'
        server.socket.close()
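
On the "multi-set of some sort?" comment in main(): a minimal sketch of 
one way to batch the load, assuming mc is a python-memcached 
memcache.Client, which provides set_multi(mapping) and, as far as I 
know, returns the keys it failed to store. The prime_cache name is 
hypothetical and not part of the script above.

```python
def prime_cache(client, rows):
    """Store (short_url, long_url) rows in one batched call.

    `client` is assumed to behave like python-memcached's Client;
    `rows` is what cur.fetchall() returned in main().
    """
    # Skip rows where either column is empty, as the per-row loop does.
    mapping = dict((short, long_url) for short, long_url in rows
                   if short and long_url)
    # set_multi sends the whole mapping at once and returns the keys
    # that could not be stored.
    failed = client.set_multi(mapping)
    return len(mapping) - len(failed)
```

In main() this would replace the for loop with something like 
prime_cache(mc, giant_list).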

My code runs without any errors, though I have left out of this email 
some code that I didn't feel was relevant, such as the logic for seeing 
if a short url exists in memcache, trying to fetch from the db if there 
was no match, and, if the db lookup also fails, force-deleting short 
urls from memcache based on other instructions. None of it deals with 
logging or the BaseHTTPServer code.

To recap: the code runs and redirects are working, but ALL output goes 
to STDOUT. I can understand that print statements would go to STDOUT, 
but BaseHTTPServer seems to want to write its Apache-style log to 
STDOUT, and my logging.debug() calls also print to STDOUT instead of 
going to my file.
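
A minimal sketch of one likely explanation and fix, under two 
assumptions drawn from the standard library documentation rather than 
from the script itself. First, logging.basicConfig() does nothing once 
the root logger already has a handler, so the second call in main() 
(the one carrying format and filename) would be ignored and the first 
call's default stream handler would stay in place. Second, 
BaseHTTPRequestHandler.log_message() is the method that emits the 
Apache-style line, and by default it writes to sys.stderr, which looks 
the same as STDOUT in a terminal. The names setup_logging and 
QuietHandler are hypothetical.

```python
import logging

try:                    # Python 2 module name, as in the script above
    from BaseHTTPServer import BaseHTTPRequestHandler
except ImportError:     # Python 3 renamed the module
    from http.server import BaseHTTPRequestHandler

LOG_PATH = '/tmp/ian.txt'   # same file as in the script above


def setup_logging(path=LOG_PATH):
    # A single call carrying every option; a later basicConfig() call
    # would be ignored because the root logger then has a handler.
    logging.basicConfig(level=logging.DEBUG,
                        format='%(message)s',
                        filename=path)


class QuietHandler(BaseHTTPRequestHandler):    # hypothetical name
    def log_message(self, format, *args):
        # Replace the default stderr write: send the access line to the
        # logging module instead (a bare `pass` would drop it entirely).
        logging.debug("%s\t" + format, self.client_address[0], *args)
```

With this, setup_logging() would be called once before 
server.serve_forever(), and clientThread would define log_message the 
same way QuietHandler does.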

I'd love to hear any thoughts from people that have had to deal with 
this. The logging is the last piece of the puzzle for me.

Thanks,
Ian