sys.stdin and re.search

Jimmie Fulton jimmie-dated-1043474834.7879b0 at illumid.com
Mon Jan 20 21:35:35 EST 2003


On Sun, 19 Jan 2003 23:26:59 -0800, Erik Max Francis wrote:

> By the way, a more idiomatic way to iterate over every line in a file
> is:
> 
> 	while 1:
> 	    line = F.readline()
> 	    if not line:
> 	        break
> 	    ...
> 
> In modern versions, you can just iterate over the file object itself:
> 
> 	for line in F:
> 	    ...

Yep.  That's what I started with, but I tried different looping methods to
see if that was part of the problem.

I guess I should provide more info, although I think I know the cause
already...

Python: 2.2.1
System: Gentoo Linux 1.2

Here is my code... with a few changes from the suggestions I've received.
:)


<code>
#!/usr/bin/env python
# rblLogger.py

import os       # only needed for the os.popen alternative below
import re
import sys
import psycopg

pattern = re.compile(r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)")

dsn = "host=localhost user=jimmie"

con = psycopg.connect(dsn)
cur = con.cursor()

def addEntry(datetime,ipaddress,reason):
        # Pass the values separately so the driver does the quoting.
        cur.execute("INSERT INTO rblblocks (datetime,ipaddress,reason) VALUES (%s,%s,%s)", (datetime,ipaddress,reason))
        con.commit()

# Does not work
f = sys.stdin
# OR
# Does work
# f = os.popen("tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd","r")

for line in f:
        print line     #Testing, delete later
        match = pattern.search(line)
        if match:
                datetime,ipaddress,reason = match.groups()
                #addEntry(datetime,ipaddress,reason)
                print match    #Testing, delete later

</code>

This is a snippet of the file I'm trying to process...

<file snippet>
@400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
@400000003e2a4dd6310dda8c tcpserver: end 1559 status 0
@400000003e2a4dd6310e2c94 tcpserver: status: 0/40
@400000003e2a4f60352143d4 tcpserver: status: 1/40
@400000003e2a4f60352e9dcc tcpserver: pid 1563 from 204.126.2.42
@400000003e2a4f60382aab74 tcpserver: ok 1563 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :204.126.2.42::60601
@400000003e2a4f612932593c tcpserver: end 1563 status 0
@400000003e2a4f612932af2c tcpserver: status: 0/40
@400000003e2a502e3005348c tcpserver: status: 1/40
@400000003e2a502e301253ec tcpserver: pid 1569 from 66.129.124.180
@400000003e2a502e3301de74 tcpserver: ok 1569 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :66.129.124.180::21995
@400000003e2a502f102f0b24 rblsmtpd: 66.129.124.180 pid 1569: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
</file snippet>

The command I'm issuing on the command line:
./rblLogger.py < /var/log/qmail/qmail-smtpd/current | tai64nlocal | grep rblsmtpd

tai64nlocal turns @400000003e2a502f102f0b24 into an ISO timestamp format.
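
To see what the pattern captures, here is a small check against two versions
of the same log entry.  It is only a sketch: it is written for a current
Python 3 rather than the 2.2.1 above, and the converted timestamp is a made-up
value in the shape tai64nlocal prints, not the real conversion of that label.

```python
import re

# Same pattern as in rblLogger.py.
pattern = re.compile(
    r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)"
)

rest = ("rblsmtpd: 66.129.124.181 pid 1559: 451 Listed on SBL - see "
        "http://spamhaus.org/SBL/sbl.lasso?query=SBL5304")

# Raw line, as stored in the log file.
raw_line = "@400000003e2a4dd61e2e95dc " + rest
# Hypothetical converted line: the timestamp is invented for illustration.
converted_line = "2003-01-19 02:05:32.506369500 " + rest

raw_groups = pattern.search(raw_line).groups()
converted_groups = pattern.search(converted_line).groups()

print(raw_groups[0])        # first group on the raw line
print(converted_groups[0])  # first group on the converted line
```

On the converted line the first group is the date and time.  On the raw line
it is the TAI64N label plus the word rblsmtpd, so the pattern only yields a
usable datetime when the input has already been through tai64nlocal.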

After running this, I see output due to the "print line" line.  The output
matches exactly what would have been output if I used a popen with the
same command line.

I've figured out that any time the file goes through grep AND sys.stdin, it doesn't
work.  If I preprocess the file through grep to a new file, and still pass it through
tai64nlocal, it works fine.  If I use popen to process the original file
the same way, it works!
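
One possible reading of the command line above, offered as a guess rather than
a diagnosis: the `<` redirection belongs to rblLogger.py, so the script reads
the raw log, and it is the script's output that flows through tai64nlocal and
grep.  The sketch below imitates that ordering in Python 3.  The names
script_stage and grep_stage and the "<match object>" text are stand-ins
invented for the illustration, and the tai64nlocal step is left out because it
would only rewrite the timestamps.

```python
import re

pattern = re.compile(
    r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)"
)

# Two raw lines taken from the log snippet above.
raw_log = [
    "@400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 "
    "Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304",
    "@400000003e2a4dd6310dda8c tcpserver: end 1559 status 0",
]

def script_stage(lines):
    """Stand-in for rblLogger.py: echo each line, then report any match."""
    for line in lines:
        yield line                    # the "print line" output
        if pattern.search(line):
            yield "<match object>"    # placeholder for the "print match" output

def grep_stage(lines, needle):
    """Stand-in for grep: keep only the lines that contain the needle."""
    return [line for line in lines if needle in line]

# Ordering as typed on the command line: script first, grep afterwards.
survivors = grep_stage(script_stage(raw_log), "rblsmtpd")
for line in survivors:
    print(line)
```

In this model the only thing that reaches the terminal is the echoed rblsmtpd
line.  The match report contains no "rblsmtpd" and is dropped by the filter,
which would look the same as the pattern never matching.  If that is what is
happening, putting the script at the end of the pipeline
(tai64nlocal < /var/log/qmail/qmail-smtpd/current | grep rblsmtpd | ./rblLogger.py)
should feed it the converted, filtered lines instead.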

Now that I know grepping is the problem, I'll just adjust my regex to make
sure 'rblsmtpd' is part of the match and only process lines that contain
this.
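
A rough sketch of that workaround, in Python 3 rather than 2.2.1.  The pattern
is one guess at a tightened version and has not been run against the real log.
The names rbl_pattern and parse_line are made up for the example, and the
timestamps in the sample lines are invented.

```python
import re

# Hypothetical tightened pattern: "rblsmtpd:" is now part of the match,
# so lines from tcpserver and other programs are skipped without grep.
rbl_pattern = re.compile(
    r"^(\S+ \S+) rblsmtpd: (\d{1,3}(?:\.\d{1,3}){3})\b.*(http://\S+)"
)

def parse_line(line):
    """Return (datetime, ipaddress, reason) for an rblsmtpd line, else None."""
    match = rbl_pattern.search(line)
    if match is None:
        return None
    return match.groups()

sample = [
    # Timestamps are invented; they only mimic tai64nlocal's output format.
    "2003-01-19 02:05:32.506369500 rblsmtpd: 66.129.124.181 pid 1559: 451 "
    "Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304",
    "2003-01-19 02:05:32.823000000 tcpserver: end 1559 status 0",
]

entries = [parse_line(line) for line in sample]
print(entries)
```

This version expects lines that have already been through tai64nlocal: a raw
line has only one token before "rblsmtpd:", so it would not match.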

Any comments welcome, but I have my workaround, anyway.

Thanks,

Jimmie





