[Tutor] short url processor
ian douglas
ian.douglas at iandouglas.com
Sat May 14 01:49:34 CEST 2011
Hey folks,
I'm rewriting a short url processor for my job. I had originally written
it as a multi-threaded Perl script, which works, but has socket problems
causing memory leaks. Since I'm rebuilding it to use memcache, and since
I was learning Python outside of work anyway, I figured I'd rewrite it in
Python.
I'm using BaseHTTPServer, overriding do_GET and do_POST, and want to set
up a custom logging mechanism so I don't have to rewrite a separate log
parser, which I'll eventually rewrite in Python as well.
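One way to get a log file in exactly the tab-separated layout the existing parser expects is to give the request log its own named logger with a FileHandler and a message-only Formatter. Nothing configured on the root logger can then change where those lines go or add a prefix to them. This is a minimal sketch; the logger name "shorturl.access" and the helpers make_access_logger() and log_request_line() are hypothetical, not part of the original code.

```python
import logging


def make_access_logger(path):
    """Return a logger that writes bare tab-separated lines to `path`.

    Uses its own named logger rather than the root logger, so
    logging.basicConfig() calls elsewhere do not affect these lines.
    """
    logger = logging.getLogger("shorturl.access")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also hand records to the root logger's handlers
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(path)
        # message only: no "INFO:root:" style prefix in the file
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_request_line(logger, *fields):
    # Join every field with a tab, like the "%s\t%s..." format in log_my_request
    logger.info("\t".join(str(f) for f in fields))
```

For example, access_log = make_access_logger("/tmp/ian.txt") once at startup, then log_request_line(access_log, ip, timestamp, method, request, ...) from the handler.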
The problem I'm having, though, is that the BaseHTTPServer setup is
writing what appears to be an Apache-style access log to the terminal
(BaseHTTPRequestHandler.log_message() actually writes it to sys.stderr,
not STDOUT), and the logging.debug() and logging.info() calls I make in
the code are also going to the terminal, despite my attempts to override
this with logging.basicConfig() and a filename.
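The Apache-style lines come from BaseHTTPRequestHandler.log_message(), which the handler calls for every request (via log_request()) and for errors (via log_error()), and which by default writes to sys.stderr. Below is a minimal sketch of overriding it in the handler subclass, assuming you want those lines routed through the logging module instead. QuietRequestHandler and the logger name "shorturl.http" are hypothetical; the same method could be added to clientThread directly. The import fallback lets it load under Python 2 (BaseHTTPServer) or Python 3 (http.server).

```python
import logging

try:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2
except ImportError:
    from http.server import BaseHTTPRequestHandler  # Python 3 name for the same class

access_logger = logging.getLogger("shorturl.http")  # hypothetical logger name


class QuietRequestHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        # The default implementation writes "host - - [date] message" to
        # sys.stderr. Send the formatted message to a logger instead.
        access_logger.info("%s - %s", self.client_address[0], format % args)
```

To drop the access log entirely, the body of log_message() can simply be pass.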
Here are the basics of what I'm doing. Forgive my code; I've already been
told it's "ugly". I'm new to Python and come from a Perl/PHP background.
import struct
import string,cgi,time
import psycopg
import logging
import re
import memcache
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from time import strftime,localtime
class clientThread(BaseHTTPRequestHandler):

    def log_my_request(self, method, request, short_url, http_code, long_url, cached, notes):
        logging.debug("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s",
                      self.client_address[0],
                      time.strftime("%Y-%m-%d %H:%M:%S", localtime()),
                      method,     # get or post
                      request,    # requested entity
                      short_url,  # matching short_url based on entity, if any
                      http_code,  # 200, 301, 302, 404, etc
                      long_url,   # url to redirect to, if there was one
                      cached,     # 'hit', 'miss', 'miss-db', 'error'
                      notes       # extra notes for the log file only
                      )
        return

    def do_GET(self):
        # logic goes here for finding a short url from memcache, then writing
        # the appropriate output data to the socket (getpost, orig_short_url,
        # short_url and long_url are set there; omitted), then logging happens:
        self.log_my_request(getpost, orig_short_url, short_url, '302', long_url, 'hit', '')
        return
def main():
    # mc (memcache client) and cur (psycopg cursor) are set up in code omitted here
    if mc.get('dbcheck'):  # memcache already has some data
        print("memcache already primed with data")
    else:  # nothing in memcache, so load it up from database
        print('Connecting to PG')
        cur.execute("SELECT count(*) FROM short_urls")
        mycount = cur.fetchone()
        print("fetching %s entries" % mycount)
        cur.execute("SELECT short_url,long_url FROM short_urls")
        giant_list = cur.fetchall()
        # cache a marker that tells us we've already initialized memcache with db data
        mc.set('dbcheck', 'databasetest', 0)
        # I'm sure there's a MUCH more efficient way of doing this ... multi-set of some sort?
        for i in giant_list:
            if i[0]:
                if i[1]:
                    mc.set(i[0], i[1])
        print("finished retrieving %s entries plus set up a new dictionary with all values" % mycount)

    #{{ set up the socket, bind to port, and wait for incoming connections
    try:
        server = HTTPServer(('', 8083), clientThread)
        print 'short url processing has begun'
        # this is where I try to tell Python that I only want my message in my log:
        # no INFO:username prefix, etc., and also to write it to a file
        logging.basicConfig(level=logging.DEBUG)
        logging.basicConfig(format='%(message)s', filename='/tmp/ian.txt')
        server.serve_forever()
    except KeyboardInterrupt:
        print '^C received, shutting down server'
        server.socket.close()
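On the "multi-set" question in the code comment: assuming the memcache module here is python-memcached, its Client.set_multi() takes a dict of key to value and returns a list of the keys it failed to store, so the priming loop could send rows in batches instead of one set() per row. A rough sketch; prime_cache() and the default batch size of 1000 are assumptions, not measured values.

```python
def prime_cache(mc, rows, batch_size=1000):
    """Load (short_url, long_url) rows into memcache in batches.

    Assumes `mc` is a python-memcached Client, whose set_multi() takes a
    dict of key -> value and returns the list of keys it could not store.
    Returns every key that failed across all batches.
    """
    failed = []
    batch = {}
    for short_url, long_url in rows:
        if short_url and long_url:  # same filter as the nested ifs in the original loop
            batch[short_url] = long_url
            if len(batch) >= batch_size:
                failed.extend(mc.set_multi(batch))
                batch = {}
    if batch:  # flush the final, partially filled batch
        failed.extend(mc.set_multi(batch))
    return failed
```

In main() this would replace the for loop, e.g. failed = prime_cache(mc, giant_list).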
My code runs without any errors, though I have left some code out of
this email that I didn't feel was relevant, such as the logic for checking
whether a short url exists in memcache, fetching it from the db if there
was no match, and, if the db lookup also fails, force-deleting short urls
from memcache based on other instructions. None of it deals with logging
or the BaseHTTPServer code.
To recap, the code runs and redirects are working, but ALL output goes to
the terminal. I can understand that print statements would go to STDOUT,
but BaseHTTPServer writes its Apache-style log to the terminal as well,
and my logging.info() calls also end up there instead of in my file.
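A likely reason the file is never written: logging.basicConfig() does nothing if the root logger already has handlers. The first call, basicConfig(level=logging.DEBUG), installs a handler that writes to sys.stderr with the default "LEVEL:logger:message" format, so the second call with format= and filename= is silently ignored. A minimal sketch passing every option in a single call; configure_logging() is a hypothetical helper name.

```python
import logging


def configure_logging(path):
    # All options go into ONE basicConfig() call. basicConfig() only has an
    # effect while the root logger has no handlers, so a second call made
    # after basicConfig(level=logging.DEBUG) would be ignored.
    logging.basicConfig(level=logging.DEBUG,
                        format="%(message)s",  # just the message, no "DEBUG:root:" prefix
                        filename=path)         # log to this file instead of sys.stderr
```

It should run once, early in main() and before any logging.debug() or logging.info() call, because those module-level functions call basicConfig() themselves if the root logger has no handlers yet, e.g. configure_logging('/tmp/ian.txt').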
I'd love to hear any thoughts from people that have had to deal with
this. The logging is the last piece of the puzzle for me.
Thanks,
Ian