regex line by line over file
Steven D'Aprano
steve at pearwood.info
Thu Mar 27 01:32:03 EDT 2014
On Wed, 26 Mar 2014 20:23:29 -0700, James Smith wrote:
> I can't get this to work.
> It runs but there is no output when I try it on a file.
Simplify, simplify, simplify. Either you will find the problem, or you
will find the simplest example that demonstrates the problem.
In this case, the problem is that your regex is not matching what you
expect it to match. So eliminate all the irrelevant cruft that is
just noise, complicating the problem. Start with the simplest thing that
works and add complexity until the problem returns.
Eliminate the file. You can embed your data in a string, and try to match
the regex against the string. Eliminate all the old commented-out code,
that's just irrelevant. Eliminate reading from sys.argv, that has nothing
to do with the problem.
So we get down to this:
import re
pat = re.compile('^\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"')
line1 = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'
print(pat.match(line1))
which matches.
Now let's get rid of those leaning toothpicks. We can use print
to see the actual characters in the pattern (print shows the string
itself, not its repr()), and a raw string to clean it up.
At the interactive interpreter:
py> print(pat.pattern)
^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"
Similarly for line1. I'll also use implicit concatenation to split
it over multiple source lines. Raw strings, r'' or r"", don't need
to escape the backslashes. Implicit concatenation means that two
string literals with no operator between them are implicitly
concatenated into a single string:
'abc' "def"
becomes 'abcdef'. By putting the pieces inside parentheses, I can
put each piece on a separate line, which makes it easier to read
compared to one giant long line.
pat = re.compile(
r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"'
)
line1 = (
' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
'Log Collection In Progress",NONE:1700000035-6364-1048,:'
'YEAR=2014,MODE=NONE"'
)
And at the interactive interpreter, I get a match:
py> pat.match(line1)
<_sre.SRE_Match object at 0xb721ad78>
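If you want to convince yourself that the cleaned-up versions really are the same strings as the originals, a quick check along these lines should do it. This is only a sketch: the names escaped, raw, one_line and split are mine, not from your script, and I have written \\s rather than \s in the non-raw pattern only to avoid the invalid-escape warning newer Pythons give.

```python
import re

# The pattern as an ordinary string, with the backslashes escaped by hand.
escaped = '^\\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"'
# The same pattern as a raw string: backslashes are kept as written.
raw = r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"'

# The sample line as one long literal...
one_line = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'
# ...and split over several source lines with implicit concatenation.
split = (
    ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
    'Log Collection In Progress",NONE:1700000035-6364-1048,:'
    'YEAR=2014,MODE=NONE"'
)

print(escaped == raw)        # both spellings give the same pattern text
print(one_line == split)     # both spellings give the same sample line
print(re.match(raw, split) is not None)
```

All three comparisons should print True if the rewrite did not change anything.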
So now we move on to the content of the one-line file. I don't have
access to the file, so all I have to go by is what you state it
contains:
[quote]
The test file just has one line:
"SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"
[end quote]
which I interpret like this:
line2 = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"\n'
(note the newline at the end), or if you prefer:
line2 = (
' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
'Log Collection In Progress",NONE:1700000035-6364-1048,:'
'YEAR=2014,MODE=NONE"\n'
)
Except for the newline, it equals line1, and it also matches the
pattern:
py> pat.match(line2)
<_sre.SRE_Match object at 0xb721ab48>
So now we know that the regex matches the data you think you have.
The next questions are:
- are you reading the right file?
- are you mistaken about the content of the file?
I can't help you with the first. But the second: try running this:
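import sys

# line2 and pat as defined above
filename = sys.argv[1]
with open(filename) as f:
    for line in f:
        # Show the length, whether the line equals line2, and its repr.
        print(len(line), line == line2, repr(line))
        # Show the match object, or None if the pattern does not match.
        print(repr(pat.match(line)))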
which will show you what the file actually contains and whether or not
it matches what you think it contains. I expect that the file contents
are not what you think they are, because the regex is matching the
sample line.
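If the line from the file turns out not to equal the expected line, it can help to find where the two first differ. Here is a minimal sketch of that. The helper first_difference is my own invention, and the "actual" line is purely hypothetical: I made it up by putting literal backslashes in front of two of the quotes, just to have something that differs. I am not claiming that is what your file contains.

```python
import re

pat = re.compile(r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"')

# The line we expect the file to contain (same as line2 in this post).
expected = (
    ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
    'Log Collection In Progress",NONE:1700000035-6364-1048,:'
    'YEAR=2014,MODE=NONE"\n'
)

def first_difference(a, b):
    """Return the index where a and b first differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return None

# HYPOTHETICAL file content: literal backslashes before two of the quotes.
actual = expected.replace(':"Log', ':\\"Log').replace('Progress"', 'Progress\\"')

i = first_difference(expected, actual)
if i is not None:
    # Show a few characters either side of the first mismatch.
    print(i, repr(expected[i:i + 10]), repr(actual[i:i + 10]))
print(pat.match(expected))
print(pat.match(actual))
```

In your own script you would replace the made-up actual with each line read from the file.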
Good luck!
--
Steven