sys.stdin and re.search
Jimmie Fulton
jimmie-dated-1043474834.7879b0 at illumid.com
Mon Jan 20 22:12:02 EST 2003
On Mon, 20 Jan 2003 20:35:35 -0600, Jimmie Fulton wrote:
> On Sun, 19 Jan 2003 23:26:59 -0800, Erik Max Francis wrote:
>
>> By the way, a more idiomatic way to iterate over every line in a file
>> is:
>>
>> while 1:
>>     line = F.readline()
>>     if not line:
>>         break
>>     ...
>>
>> In modern versions, you can just iterate over the file object itself:
>>
>> for line in F:
>>     ...
>
> Yep. That's what I started with, but I tried different looping methods to
> see if that was part of the problem.
>
> I guess I should provide more info, although I think I know the cause
> already...
>
> Python: Python 2.2.1
> System is Gentoo Linux 1.2.
>
> Here is my code... with a few changes from the suggestions I've received.
> :)
>
>
> <code>
> #!/usr/bin/env python
> # rblLogger.py
>
> import os  # needed only for the os.popen alternative below
> import re
> import sys
> import psycopg
>
> pattern = re.compile(r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)")
>
> dsn = "host=localhost user=jimmie"
>
> con = psycopg.connect(dsn)
> cur = con.cursor()
>
> def addEntry(datetime,ipaddress,reason):
>     # Let the driver quote the values instead of interpolating them into the SQL
>     cur.execute("INSERT INTO rblblocks (datetime,ipaddress,reason) VALUES (%s,%s,%s)", (datetime,ipaddress,reason))
>     con.commit()
>
> # Does not work
> f = sys.stdin
> # OR
> # Does work
> # f = os.popen("tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd","r")
>
> for line in f:
>     print line #Testing, delete later
>     match = pattern.search(line)
>     if match:
>         datetime,ipaddress,reason = match.groups()
>         #addEntry(datetime,ipaddress,reason)
>         print match #Testing, delete later
>
> </code>
>
> This is a snippet of the file I'm trying to process...
>
> <file snippet>
> @400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
> @400000003e2a4dd6310dda8c tcpserver: end 1559 status 0
> @400000003e2a4dd6310e2c94 tcpserver: status: 0/40
> @400000003e2a4f60352143d4 tcpserver: status: 1/40
> @400000003e2a4f60352e9dcc tcpserver: pid 1563 from 204.126.2.42
> @400000003e2a4f60382aab74 tcpserver: ok 1563 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :204.126.2.42::60601
> @400000003e2a4f612932593c tcpserver: end 1563 status 0
> @400000003e2a4f612932af2c tcpserver: status: 0/40
> @400000003e2a502e3005348c tcpserver: status: 1/40
> @400000003e2a502e301253ec tcpserver: pid 1569 from 66.129.124.180
> @400000003e2a502e3301de74 tcpserver: ok 1569 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :66.129.124.180::21995
> @400000003e2a502f102f0b24 rblsmtpd: 66.129.124.180 pid 1569: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
> </file snippet>
>
> The command I'm issuing on the command line:
> ./rblLogger.py < /var/log/qmail/qmail-smtpd/current | tai64nlocal | grep rblsmtpd
>
> tai64nlocal turns @400000003e2a502f102f0b24 into an ISO timestamp format.
>
> After running this, I see output due to the "print line" line. The output
> matches exactly what would have been output if I used a popen with the
> same command line.
>
> I've figured out that any time the file goes through grep AND sys.stdin, it doesn't
> work. If I preprocess the file through grep to a new file, and still pass it through
> tai64nlocal, it works fine. If I use popen to process the original file
> the same way, it works!
>
> Now that I know grepping is the problem, I'll just adjust my regex to make
> sure 'rblsmtpd' is part of the match and only process lines that contain
> it.
>
> Any comments welcome, but I have my workaround, anyway.
>
> Thanks,
>
> Jimmie
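
The workaround mentioned in the quoted message (making 'rblsmtpd' part of the
match) might look roughly like the sketch below. It is written for a current
Python 3 rather than the 2.2.1 used above. The names RBL_PATTERN and
parse_rbl_line are made up for illustration, and the timestamps in the sample
lines are invented, standing in for whatever tai64nlocal would actually print.

```python
import re

# Hypothetical variant of the original pattern: the literal "rblsmtpd:" must
# follow the two timestamp fields, so no separate grep step is needed.
RBL_PATTERN = re.compile(
    r"^(\S+ \S+) rblsmtpd: (\d{1,3}(?:\.\d{1,3}){3})\b.*(http://.*)"
)


def parse_rbl_line(line):
    """Return (datetime, ipaddress, reason) for an rblsmtpd line, else None."""
    match = RBL_PATTERN.search(line)
    if match:
        return match.groups()
    return None


# Invented timestamps; the rest is taken from the log snippet above.
rbl_line = ("2003-01-19 01:05:00.506369500 rblsmtpd: 66.129.124.181 pid 1559: "
            "451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304")
other_line = "2003-01-19 01:05:00.823974500 tcpserver: end 1559 status 0"

print(parse_rbl_line(rbl_line))    # tuple of the three captured fields
print(parse_rbl_line(other_line))  # None, because it is not an rblsmtpd line
```

Anchoring on the literal 'rblsmtpd:' means tcpserver lines are skipped by the
regex itself, whichever way the input reaches the script.
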
Grrrr!

Just figured out that

tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd | rblLogger.py

works fine. Maybe I'm just having some shell funkiness...

Jimmie
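
One possible reading of the "shell funkiness": in the earlier command
(./rblLogger.py < file | tai64nlocal | grep rblsmtpd) the shell attaches the
log file to the script's stdin and pipes the script's output into tai64nlocal
and grep. The script would then see the raw, unfiltered lines, and grep would
drop any printed line lacking 'rblsmtpd', which includes the output of
"print match". With the script last in the pipeline, it receives the converted
and filtered lines instead. The sketch below, in current Python 3 rather than
2.2.1, keeps the parsing in a function that accepts any file-like object, so
the same code can be fed sys.stdin or a test buffer. PATTERN is copied from
the quoted script; parse_entries, SAMPLE and the timestamps in the sample data
are assumptions made for illustration.

```python
import io
import re

# Pattern copied unchanged from the quoted script.
PATTERN = re.compile(
    r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)"
)


def parse_entries(stream):
    """Yield (datetime, ipaddress, reason) for each matching line.

    `stream` can be any iterable of lines: sys.stdin, an open file,
    or an in-memory buffer as used below.
    """
    for line in stream:
        match = PATTERN.search(line)
        if match:
            yield match.groups()


# Stand-in for what the script would read on stdin when it is the last
# stage of the pipeline. The timestamps are invented.
SAMPLE = io.StringIO(
    "2003-01-19 01:05:00.506369500 rblsmtpd: 66.129.124.181 pid 1559: "
    "451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304\n"
    "2003-01-19 01:05:00.823974500 tcpserver: end 1559 status 0\n"
)

entries = list(parse_entries(SAMPLE))
for entry in entries:
    print(entry)
```

In the real script the call would be parse_entries(sys.stdin), with the script
placed at the end of the pipeline as in the command above.
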