Script using generators produces different results when invoked as a CGI
Barclay, Ken
Ken.Barclay at schwab.com
Sun May 4 23:31:45 EDT 2008
Hello,
I attended David Beazley's awe-inspiring tutorial on the use of
generators in systems programming:
http://www.dabeaz.com/generators/
I used his approach to write a web tool that can display search results
from different log files. But the resulting script produced fewer
results when invoked as CGI than it did when run from the command line,
and I can't figure out why.
He showed how to 'pipeline' generators together in a form of declarative
programming. For example, put the following generators together to grep
lines out of a set of files:
# Generate a list of files
lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
# Yield a sequence of file objects that have been suitably opened
logfiles = gen_open(lognames)
# Concatenate multiple generators into a single sequence
loglines = gen_cat(logfiles)
# Grep a sequence of lines that match a re pattern
searchlines = gen_grep(r'fried',loglines)
The functions are only a few lines each:
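import bz2
import re

# Open each .bz2 file; names with any other extension are skipped
def gen_open(filenames):
    for name in filenames:
        if name.endswith(".bz2"):
            yield bz2.BZ2File(name)

# Chain several iterables into a single sequence of items
def gen_cat(sources):
    for s in sources:
        for item in s:
            yield item

# Yield only the lines that match the regular expression pat
def gen_grep(pat, lines):
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line):
            yield line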
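The post imports gen_find but does not show it. A minimal sketch of what it might look like, assuming it is built on os.walk and fnmatch as in the tutorial (the body below is an assumption, not the poster's actual module; it runs on Python 2 and 3):

```python
import fnmatch
import os

def gen_find(filepat, top):
    """Yield paths under `top` whose file name matches the shell pattern `filepat`."""
    # os.walk visits `top` and every directory below it.
    for path, dirnames, filenames in os.walk(top):
        # fnmatch.filter keeps only the names matching the pattern, e.g. "*.bz2".
        for name in fnmatch.filter(filenames, filepat):
            yield os.path.join(path, name)
```

Note that os.walk ignores errors from unreadable directories unless an onerror callback is supplied, so a version like this one would skip such directories without raising.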
Since they're generators, processing the data doesn't start until you
kick off the iteration on the final generator.
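A small illustration of that laziness, using made-up data rather than the log files. The helper names traced and grep_lines are hypothetical and exist only for this sketch:

```python
import re

def traced(items, log):
    """Yield each item, recording in `log` the moment it is actually produced."""
    for item in items:
        log.append(item)
        yield item

def grep_lines(pat, lines):
    """Same shape as gen_grep: yield the lines that match the pattern."""
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line):
            yield line

log = []
source = ["fried egg", "boiled egg", "fried rice"]

# Building the pipeline runs none of the generator bodies.
pipeline = grep_lines(r"fried", traced(source, log))
produced_before = list(log)

# Iterating the last stage pulls items through every earlier stage.
matches = list(pipeline)
produced_after = list(log)
```

Here produced_before is empty, and the source is read only once matches is built.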
Problem: For small sets of files this works great. But when I had 19 MB
of log files in a test directory, the script returned the correct number
of matching lines (288) only when it was invoked directly from the
command line. When invoked from a CGI script, it returns 220 lines
instead (written to the temporary file in the script below). I don't
know where that limit is coming from. If more logs are added to the test
directory, the result is always the same 220 lines.
I'm using Python 2.5.1 on Red Hat Linux 3.2.3-47. Below is the whole
script I was testing with. It's using hard-coded values in place of ones
I'll be getting from an HTML form (generated with HTMLgen) presented to
the user.
There are no exceptions or errors of any kind. Any pointers on what
might be happening here would be welcome!
Thanks
Ken
#!/usr/local/bin/python
import tempfile
from genfind import gen_find
from genopen import gen_open
from gencat import gen_cat
from gengrep import gen_grep
tf = tempfile.mkstemp()
tmpfile = open(tf[1],'w')
lognames = gen_find("*.bz2","/tmp/bz2_alldata/pa_new")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
searchlines = gen_grep(r'fried',loglines)
for line in searchlines:
    print >> tmpfile, line,
tmpfile.close()
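The post does not establish what causes the difference. One way to narrow it down would be to run the same diagnostic from the shell and from the CGI and compare the output. The sketch below is only that, a diagnostic under stated assumptions: it is written for Python 3 (bz2.open is not available in 2.5), all function names are hypothetical, and it assumes the files are found with os.walk. It reports per-file match counts, any directories os.walk could not read, any files that failed to decompress, and the user ID and working directory of the process.

```python
import bz2
import fnmatch
import os
import re
import sys

def find_with_errors(filepat, top, errors):
    """Like gen_find, but collect the errors os.walk would otherwise ignore."""
    for path, dirnames, filenames in os.walk(top, onerror=errors.append):
        for name in fnmatch.filter(filenames, filepat):
            yield os.path.join(path, name)

def count_matches(pattern, filenames):
    """Yield (filename, match_count, error) for each file; one of the last two is None."""
    patc = re.compile(pattern)
    for name in filenames:
        try:
            with bz2.open(name, "rt", errors="replace") as f:
                count = sum(1 for line in f if patc.search(line))
        except (OSError, EOFError) as exc:
            # Unreadable, corrupt, or truncated file.
            yield name, None, exc
        else:
            yield name, count, None

def report(pattern, filepat, top, out=sys.stderr):
    """Write a per-file summary to `out` and return the total number of matches."""
    errors = []
    total = 0
    files = find_with_errors(filepat, top, errors)
    for name, count, exc in count_matches(pattern, files):
        if exc is not None:
            out.write("FAILED %s: %r\n" % (name, exc))
        else:
            total += count
            out.write("%6d %s\n" % (count, name))
    for exc in errors:
        out.write("walk error: %r\n" % (exc,))
    # os.getuid is Unix-only; the post describes a Linux host.
    out.write("uid=%s cwd=%s total=%d\n" % (os.getuid(), os.getcwd(), total))
    return total
```

Calling report(r'fried', "*.bz2", "/tmp/bz2_alldata/pa_new") in both environments and comparing the two listings would show whether the missing lines come from files that are never found, files that fail to open, or files that yield fewer lines.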