sys.stdin and re.search
Jimmie Fulton
jimmie-dated-1043474834.7879b0 at illumid.com
Mon Jan 20 21:35:35 EST 2003
On Sun, 19 Jan 2003 23:26:59 -0800, Erik Max Francis wrote:
> By the way, a more idiomatic way to iterate over every line in a file
> is:
>
> while 1:
>     line = F.readline()
>     if not line:
>         break
>     ...
>
> In modern versions, you can just iterate over the file object itself:
>
> for line in F:
>     ...
Yep. That's what I started with, but I tried different looping methods to
see if that was part of the problem.
I guess I should provide more info, although I think I know the cause
already...
Python: Python 2.2.1
System is Gentoo Linux 1.2.
Here is my code... with a few changes from the suggestions I've received. :)
<code>
#!/usr/bin/env python
# rblLogger.py
import os  # only needed for the os.popen variant below
import re
import sys
import psycopg

# Groups: (date and time) (IPv4 address) (URL giving the reason)
pattern = re.compile(r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)")
dsn = "host=localhost user=jimmie"
con = psycopg.connect(dsn)
cur = con.cursor()

def addEntry(datetime, ipaddress, reason):
    # Pass the values separately so the driver quotes them,
    # rather than formatting them into the SQL string.
    cur.execute("INSERT INTO rblblocks (datetime,ipaddress,reason) VALUES (%s,%s,%s)",
                (datetime, ipaddress, reason))
    con.commit()

# Does not work
f = sys.stdin
# OR
# Does work
# f = os.popen("tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd","r")
for line in f:
    print line  # Testing, delete later
    match = pattern.search(line)
    if match:
        datetime, ipaddress, reason = match.groups()
        #addEntry(datetime,ipaddress,reason)
        print match  # Testing, delete later
</code>
This is a snippet of the file I'm trying to process...
<file snippet>
@400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
@400000003e2a4dd6310dda8c tcpserver: end 1559 status 0
@400000003e2a4dd6310e2c94 tcpserver: status: 0/40
@400000003e2a4f60352143d4 tcpserver: status: 1/40
@400000003e2a4f60352e9dcc tcpserver: pid 1563 from 204.126.2.42
@400000003e2a4f60382aab74 tcpserver: ok 1563 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :204.126.2.42::60601
@400000003e2a4f612932593c tcpserver: end 1563 status 0
@400000003e2a4f612932af2c tcpserver: status: 0/40
@400000003e2a502e3005348c tcpserver: status: 1/40
@400000003e2a502e301253ec tcpserver: pid 1569 from 66.129.124.180
@400000003e2a502e3301de74 tcpserver: ok 1569 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :66.129.124.180::21995
@400000003e2a502f102f0b24 rblsmtpd: 66.129.124.180 pid 1569: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
</file snippet>
The command I'm issuing on the command line:
./rblLogger.py < /var/log/qmail/qmail-smtpd/current | tai64nlocal | grep rblsmtpd
tai64nlocal converts a TAI64N label such as @400000003e2a502f102f0b24 into a
human-readable, ISO-style local timestamp.
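For anyone who would rather decode the label inside the script, here is a rough sketch in Python 3 syntax. The names parse_tai64n and TAI64_OFFSET are made up for illustration. The fixed 10-second adjustment is an assumption about how TAI maps to Unix time (no leap-second table is consulted), so the result may differ from what tai64nlocal prints by some seconds, and it is shown in UTC rather than local time.

```python
from datetime import datetime, timezone

# Assumption: a TAI64 label stores 2**62 + seconds, and TAI is treated as a
# fixed 10 seconds ahead of Unix time (leap seconds are ignored here).
TAI64_OFFSET = 2**62 + 10

def parse_tai64n(label):
    """Split '@' + 16 hex digits (seconds) + 8 hex digits (nanoseconds)."""
    hexpart = label.lstrip("@")
    if len(hexpart) != 24:
        raise ValueError("expected 24 hex digits, got %r" % label)
    seconds = int(hexpart[:16], 16) - TAI64_OFFSET
    nanoseconds = int(hexpart[16:], 16)
    return seconds, nanoseconds

# First label from the log snippet above.
seconds, nanoseconds = parse_tai64n("@400000003e2a4dd61e2e95dc")
stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
print("%s.%09d UTC" % (stamp.strftime("%Y-%m-%d %H:%M:%S"), nanoseconds))
```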
After running this, I see output due to the "print line" line. The output
matches exactly what would have been output if I used a popen with the
same command line.
I've figured out that any time the file goes through grep AND sys.stdin, it doesn't
work. If I preprocess the file through grep to a new file, and still pass it through
tai64nlocal, it works fine. If I use popen to process the original file
the same way, it works!
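One possible explanation, offered tentatively: in the command line above, the shell attaches the log file to the script's stdin and sends the script's output to tai64nlocal and then grep. If that reading is right, grep never filters what the script reads. It filters what the script prints, and the line produced by "print match" contains no "rblsmtpd", so it would be dropped while the echoed log lines survive. The Python 3 sketch below simulates this. MATCH_REPR is a stand-in for what Python 2.2 prints for a match object, and grep() is a toy substring filter, not the real tool.

```python
import re

# Same pattern as in rblLogger.py.
pattern = re.compile(r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)")

# Two raw lines from the log snippet, as the script would read them from stdin.
RAW_LINES = [
    "@400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 "
    "Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304",
    "@400000003e2a4dd6310dda8c tcpserver: end 1559 status 0",
]

# Placeholder for the text "print match" would emit (illustrative only).
MATCH_REPR = "<_sre.SRE_Match object at 0x...>"

def script_output(lines):
    """Mimic the two print statements in the loop of rblLogger.py."""
    out = []
    for line in lines:
        out.append(line)            # print line
        if pattern.search(line):
            out.append(MATCH_REPR)  # print match
    return out

def grep(lines, needle):
    """Toy stand-in for grep: keep lines containing the substring."""
    return [text for text in lines if needle in text]

printed = script_output(RAW_LINES)
visible = grep(printed, "rblsmtpd")
for text in visible:
    print(text)
```

If this is indeed the cause, putting the filters ahead of the script, for example "tai64nlocal < /var/log/qmail/qmail-smtpd/current | grep rblsmtpd | ./rblLogger.py", would deliver the converted and filtered lines to sys.stdin.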
Now that I know grepping is the problem, I'll just adjust my regex to make
sure 'rblsmtpd' is part of the match and only process lines that contain
this.
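A minimal sketch of what such an adjusted pattern might look like, in Python 3 syntax. It assumes each line has already been through tai64nlocal, so that it starts with a date and a time separated by a space. The timestamps in the sample lines are invented for illustration, and rbl_pattern is a hypothetical name.

```python
import re

# Require the literal "rblsmtpd:" tag right after the timestamp, so lines
# from tcpserver (or anything else) never match.
rbl_pattern = re.compile(
    r"^(\S+ \S+) rblsmtpd: "        # date and time
    r"(\d{1,3}(?:\.\d{1,3}){3})\b"  # IPv4 address
    r".*?(http://\S+)"              # URL explaining the listing
)

# Illustrative lines; the timestamps are made up, not real tai64nlocal output.
sample = [
    "2003-01-19 07:03:40.506369500 rblsmtpd: 66.129.124.181 pid 1559: 451 "
    "Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304",
    "2003-01-19 07:03:40.822991500 tcpserver: end 1559 status 0",
]

entries = []
for line in sample:
    m = rbl_pattern.search(line)
    if m:
        entries.append(m.groups())  # (datetime, ipaddress, reason)

for entry in entries:
    print(entry)
```

With a pattern like this the external grep becomes unnecessary, since non-rblsmtpd lines simply fail to match.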
Any comments welcome, but I have my workaround, anyway.
Thanks,
Jimmie