more python3 regex?

Peter Otten __peter__ at web.de
Sun Sep 11 14:21:25 EDT 2016


Doug OLeary wrote:

> Hey
> 
> This one seems like it should be easy but I'm not getting the expected
> results.
> 
> I have a chunk of data over which I can iterate line by line and print out
> the expected results:
> 
>   for l in q.findall(data):
> #   if re.match(r'(Name|")', l):
> #     continue
>     print(l)
> 
> $ ./testies.py | wc -l
> 197
> 
> I would like to skip any line that starts with 'Name' or a double quote:
> 
> $ ./testies.py | perl -ne 'print if (m{^Name} || m{^"})'
> Name IP Address,Site,
> "",,7 of 64
> Name,IP Address,Site,
> "",,,8 of 64
> Name,IP Address,Site,
> "",,,9 of 64
> Name,IP Address,Site,
> "",,,10 of 64
> Name,IP Address,Site,
> "",,,11 of 64
> Name IP Address,Site,
> 
> $ ./testies.py | perl -ne 'print unless (m{^Name} || m{^"})' | wc -l
> 186
> 
> 
> When I run with the two lines uncommented, *everything* gets skipped:
> 
> $ ./testies.py
> $
> 
> Same thing when I use a pre-defined pattern object:
> 
> skippers = re.compile(r'Name|"')
>   for l in q.findall(data):
>     if skippers.match(l):
>       continue
>     print(l)
> 
> Like I said, this seems like it should be pretty straight forward so I'm
> obviously missing something basic.
> 
> Any hints/tips/suggestions gratefully accepted.

Add a print() to the matching case to see where you messed up, e.g.:

for line in q.findall(data):
    if re.match(r'(Name|")', line):
        print("SKIPPING", repr(line))
        continue
    print(line)

In the future try to provide small self-contained scripts that others can 
run. That will also make your own debugging experience more pleasant ;)

Had you tried

$ cat oleary.py
import re

lines = """\
foo show
"bar" skip
Name skip
 Name show
 "baz" show
baz show
""".splitlines()

for line in lines:
    if re.match(r'^(Name|")', line):
        continue
    print(line)
$ python3 oleary.py 
foo show
 Name show
 "baz" show
baz show

you could easily have convinced yourself that both the loop and the regex 
work as expected.
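
As an aside, re.match() only ever matches at the beginning of the string, so
the leading ^ in the pattern above is redundant; without it, as in the first
snippet, the result should be the same. Here is a minimal sketch to check
that, reusing the sample lines from oleary.py. The helper name kept() is made
up for illustration only.

```python
import re

lines = """\
foo show
"bar" skip
Name skip
 Name show
 "baz" show
baz show
""".splitlines()


def kept(pattern):
    # Hypothetical helper: return the lines that re.match() does NOT match.
    # re.match() anchors at the start of each string by itself.
    return [line for line in lines if not re.match(pattern, line)]


with_anchor = kept(r'^(Name|")')
without_anchor = kept(r'(Name|")')

# Both patterns keep the same lines.
print(with_anchor == without_anchor)
for line in without_anchor:
    print(line)
```

If you want to match anywhere in the line, that is what re.search() is for.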

By the way, many simple text-processing problems can be solved without 
regular expressions. Most Python users will spell a simple filter like the 
one above as

for line in lines:
    if not line.startswith(("Name", '"')):
        print(line)

or similar.
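
For completeness, here is a self-contained version of that filter that can be
run as is, again assuming the sample lines from oleary.py. str.startswith()
accepts a tuple of prefixes and is true if the string starts with any of
them. The name shown is made up for the example.

```python
lines = """\
foo show
"bar" skip
Name skip
 Name show
 "baz" show
baz show
""".splitlines()

# Keep every line that starts with neither 'Name' nor a double quote.
# Lines with leading whitespace are kept, just as with re.match().
shown = [line for line in lines if not line.startswith(("Name", '"'))]

for line in shown:
    print(line)
```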




