regex line by line over file

Rustom Mody rustompmody at gmail.com
Thu Mar 27 00:37:35 EDT 2014


On Thursday, March 27, 2014 8:53:29 AM UTC+5:30, James Smith wrote:
> I can't get this to work.
> It runs but there is no output when I try it on a file.

> #!/usr/bin/python

> import os
> import sys
> import re
> from datetime import datetime

> #logDir = '/nfs/projects/equinox/platformTools/RTLG/RTLG_logs';
> #os.chdir( logDir );

> programName = sys.argv[0]
> fileName = sys.argv[1]

> #pattern = re.compile('\s*\\"SHELF-.*,SC,.*,:\\"Log Collection In Progress\\"')
> re.M
> p = re.compile('^\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"')
> l = '    "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'

> # this works :-)
> m = p.match( l )
> if m:
>     print( l )

> # this doesn't match anything (or the if doesn't work) :-(
> with open(fileName) as f:
>     for line in f:
>         # debug code (print the line without adding a linefeed)
>         # sys.stdout.write( line )
>         if p.match(line):
>             print(line)

> The test file just has one line:
>     "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"

Some suggestions (I'm far from an re expert!)
1. Use raw strings for re's
2. You probably need the non-greedy '*?' (among other things)
3. Better to hack out your re in the interpreter
For that:
4. Avoid compile (at least while hacking)
5. findall will show you what's happening better than match
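
To make suggestion 1 concrete, here is a minimal sketch of how a non-raw literal and a raw literal differ. It assumes the test file really contains the backslashes shown in the question; the names plain, raw, bare_quote and backslash_quote are made up for illustration. In a non-raw literal Python itself consumes the backslash in \", so a hard-coded test string may not be the same text as the line read from the file.

```python
import re

# Non-raw literal: Python consumes the backslash, leaving a bare quote.
plain = ',,,:\"Log Collection In Progress\",NONE'

# Raw literal: the backslashes are kept, which is what a line read from the
# file would contain (assumption: the file holds the backslashes as shown).
raw = r',,,:\"Log Collection In Progress\",NONE'

# One regex expecting a bare quote, one expecting backslash + quote.
bare_quote = r',:"Log Collection In Progress"'
backslash_quote = r',:\\"Log Collection In Progress\\"'

results = {
    'bare_quote on plain': bool(re.search(bare_quote, plain)),
    'bare_quote on raw': bool(re.search(bare_quote, raw)),
    'backslash_quote on plain': bool(re.search(backslash_quote, plain)),
    'backslash_quote on raw': bool(re.search(backslash_quote, raw)),
}

# The raw text is two characters longer: one backslash per escaped quote.
print(len(raw) - len(plain))
for name, matched in results.items():
    print(name, matched)
```

If that assumption holds, it may explain why the hard-coded line matches while the line from the file does not.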

Here's a 'hack-session'

>>> from re import findall
>>> l = '    "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'


# Start simple
>>> findall(r'^\s',l)
[' ']
>>> findall(r'^\s*',l)
['    ']
>>> findall(r'^\s*"',l)
['    "']
>>> findall(r'^\s*"SHELF-',l)
['    "SHELF-']
>>> findall(r'^\s*"SHELF-.*',l)
['    "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"Log Collection In Progress",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"']
>>> findall('^\s*"SHELF-.*',l)
['    "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"Log Collection In Progress",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"']
>>> findall('^\s*"SHELF-.SC*',l)
[]
>>> findall('^\s*"SHELF-.*SC',l)
['    "SHELF-17:LOG_COLN_IP,SC']
>>> findall('^\s*"SHELF-.*?SC',l)
['    "SHELF-17:LOG_COLN_IP,SC']
>>> findall('^\s*"SHELF-.*?,SC',l)
['    "SHELF-17:LOG_COLN_IP,SC']
>>> findall('(^\s*)"SHELF-.*?,SC',l)
['    ']
>>> findall('\(^\s*\)"SHELF-.*?,SC',l)
[]
>>> findall('(^\s*)"SHELF-.*?,SC',l)
['    ']
>>> findall('(^\s*)("SHELF-.*?,SC)',l)
[('    ', '"SHELF-17:LOG_COLN_IP,SC')]
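
Once the pieces work in the interpreter, they can be put back into a line-by-line loop. The following is only a sketch under assumptions: PATTERN is one possible way to finish the regex (not necessarily what the original script intended), the \\? parts tolerate an optional literal backslash before each quote, and matching_lines is a hypothetical helper name.

```python
import re

# Hypothetical finished pattern, written as a raw string.
# \\? allows an optional literal backslash before each quote.
PATTERN = re.compile(
    r'^\s*"SHELF-.*?,SC,.*?,:\\?"Log Collection In Progress\\?"'
)

def matching_lines(lines):
    """Return the lines that match PATTERN, without trailing newlines."""
    return [line.rstrip('\n') for line in lines if PATTERN.match(line)]

# Stand-in for the file contents: the sample line with literal backslashes,
# the same line without them, and an unrelated line.
with_backslashes = (
    r'    "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,'
    r':\"Log Collection In Progress\",NONE:1700000035-6364-1048,'
    r':YEAR=2014,MODE=NONE"'
)
without_backslashes = with_backslashes.replace('\\', '')
sample = [with_backslashes + '\n', without_backslashes + '\n', 'other line\n']

for hit in matching_lines(sample):
    print(hit)
```

Since a file object yields its lines when iterated, the same helper can be called as matching_lines(f) inside "with open(fileName) as f:".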


