sys.stdin and re.search

Jimmie Fulton jimmie-dated-1043474834.7879b0 at illumid.com
Mon Jan 20 22:12:02 EST 2003


On Mon, 20 Jan 2003 20:35:35 -0600, Jimmie Fulton wrote:

> On Sun, 19 Jan 2003 23:26:59 -0800, Erik Max Francis wrote:
> 
>> By the way, a more idiomatic way to iterate over every line in a file
>> is:
>> 
>> 	while 1:
>> 	    line = F.readline()
>> 	    if not line:
>> 	        break
>> 	    ...
>> 
>> In modern versions (Python 2.2 and later), you can just iterate over the
>> file object itself:
>> 
>> 	for line in F:
>> 	    ...
> 
> Yep.  That's what I started with, but I tried different looping methods to
> see if that was part of the problem.
> 
> I guess I should provide more info, although I think I know the cause
> already...
> 
> Python: Python 2.2.1
> System is Gentoo Linux 1.2.
> 
> Here is my code... with a few changes from the suggestions I've received.
> :)
> 
> 
> <code>
> #!/usr/bin/env python
> # rblLogger.py
> 
> import os      # needed only for the os.popen alternative below
> import re
> import sys
> import psycopg
> 
> pattern = re.compile(r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)")
> 
> dsn = "host=localhost user=jimmie"
> 
> con = psycopg.connect(dsn)
> cur = con.cursor()
> 
> def addEntry(datetime,ipaddress,reason):
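>         # Pass the values as query parameters so the driver quotes them;
>         # a reason containing a single quote would break the %-formatted SQL.
>         cur.execute("INSERT INTO rblblocks (datetime,ipaddress,reason) VALUES (%s,%s,%s)", (datetime,ipaddress,reason))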
>         con.commit()
> 
> # Does not work
> f = sys.stdin
> # OR
> # Does work
> # f = os.popen("tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd","r")
> 
> for line in f:
>         print line     #Testing, delete later
>         match = pattern.search(line)
>         if match:
>                 datetime,ipaddress,reason = match.groups()
>                 #addEntry(datetime,ipaddress,reason)
>                 print match    #Testing, delete later
> 
> </code>
> 
> This is a snippet of the file I'm trying to process...
> 
> <file snippet>
> @400000003e2a4dd61e2e95dc rblsmtpd: 66.129.124.181 pid 1559: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
> @400000003e2a4dd6310dda8c tcpserver: end 1559 status 0
> @400000003e2a4dd6310e2c94 tcpserver: status: 0/40
> @400000003e2a4f60352143d4 tcpserver: status: 1/40
> @400000003e2a4f60352e9dcc tcpserver: pid 1563 from 204.126.2.42
> @400000003e2a4f60382aab74 tcpserver: ok 1563 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :204.126.2.42::60601
> @400000003e2a4f612932593c tcpserver: end 1563 status 0
> @400000003e2a4f612932af2c tcpserver: status: 0/40
> @400000003e2a502e3005348c tcpserver: status: 1/40
> @400000003e2a502e301253ec tcpserver: pid 1569 from 66.129.124.180
> @400000003e2a502e3301de74 tcpserver: ok 1569 dsl093-024-243.hou1.dsl.speakeasy.net:66.93.24.243:25 :66.129.124.180::21995
> @400000003e2a502f102f0b24 rblsmtpd: 66.129.124.180 pid 1569: 451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304
> </file snippet>
> 
> The command I'm issuing on the command line:
> ./rblLogger.py < /var/log/qmail/qmail-smtpd/current | tai64nlocal | grep rblsmtpd
> 
> tai64nlocal converts a TAI64N label such as @400000003e2a502f102f0b24 into
> a human-readable local timestamp (date, then time).
> 
> After running this, I see output due to the "print line" line.  The output
> matches exactly what would have been output if I used a popen with the
> same command line.
> 
> I've figured out that any time the file goes through grep AND sys.stdin, it doesn't
> work.  If I preprocess the file through grep to a new file, and still pass it through
> tai64nlocal, it works fine.  If I use popen to process the original file
> the same way, it works!
> 
> Now that I know grepping is the problem, I'll just adjust my regex to make
> sure 'rblsmtpd' is part of the match and only process lines that contain
> this.
> 
> Any comments welcome, but I have my workaround, anyway.
> 
> Thanks,
> 
> Jimmie
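
A minimal sketch of the workaround described above (putting 'rblsmtpd' into the regex so no grep is needed), assuming each line has already been through tai64nlocal and so starts with a date and a time. The names RBL_PATTERN and parse_rbl_line are hypothetical, the sample timestamps are invented, and the syntax is Python 3 rather than the 2.2 used in the script.

```python
import re

# Hypothetical variant of the original pattern: the literal "rblsmtpd:" must
# follow the timestamp, so tcpserver lines are rejected without grep.
RBL_PATTERN = re.compile(
    r"^(\S+ \S+) rblsmtpd: "          # date and time, then the program name
    r"(\d{1,3}(?:\.\d{1,3}){3})\b"    # IPv4 address of the blocked client
    r".*(http://\S+)"                 # URL explaining the listing
)


def parse_rbl_line(line):
    """Return (datetime, ipaddress, reason) for an rblsmtpd line, else None."""
    match = RBL_PATTERN.search(line)
    if match is None:
        return None
    return match.groups()


if __name__ == "__main__":
    # Invented timestamps; the rest of each line follows the log snippet.
    samples = [
        "2003-01-19 01:05:00.506369500 rblsmtpd: 66.129.124.181 pid 1559: "
        "451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304",
        "2003-01-19 01:05:00.823974500 tcpserver: end 1559 status 0",
    ]
    for sample in samples:
        print(parse_rbl_line(sample))
```

With this kind of pattern the pipeline would only need tai64nlocal in front of the script; whether the stricter anchoring suits every rblsmtpd message is not something the snippet above can confirm.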

Grrrr!

Just figured out that

tail -n+1 --follow=name | tai64nlocal | grep rblsmtpd | rblLogger.py

works fine.  Maybe I'm just having some shell funkiness...

Jimmie
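
For reference, a small sketch of the arrangement that works: the script sits at the end of the pipeline, so what it reads on stdin has already been converted and filtered. In the earlier command the script came first, so it read the raw log and it was the script's output that went through tai64nlocal and grep. The function name extract_entries and the sample timestamps are made up for illustration, and the syntax is Python 3 rather than 2.2; the pattern is the one from the script.

```python
import io
import re

# The pattern from the script above, unchanged.
pattern = re.compile(
    r"(^\S* \S*\b).*(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b).*(http://.*)"
)


def extract_entries(stream):
    """Yield (datetime, ipaddress, reason) for every matching line.

    stream is any iterable of lines; at the end of a shell pipeline it
    would be sys.stdin.
    """
    for line in stream:
        match = pattern.search(line)
        if match:
            yield match.groups()


if __name__ == "__main__":
    # Stand-in for sys.stdin so the sketch runs without a pipeline.
    # The timestamps are invented; the rest follows the log snippet.
    fake_stdin = io.StringIO(
        "2003-01-19 01:05:00.506369500 rblsmtpd: 66.129.124.181 pid 1559: "
        "451 Listed on SBL - see http://spamhaus.org/SBL/sbl.lasso?query=SBL5304\n"
        "2003-01-19 01:05:00.823974500 tcpserver: end 1559 status 0\n"
    )
    for datetime, ipaddress, reason in extract_entries(fake_stdin):
        print(datetime, ipaddress, reason)
```

Taking the stream as an argument keeps the parsing testable without a shell; in the real script the call would presumably be extract_entries(sys.stdin).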



