python3 regex?

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Sat Sep 10 02:18:53 EDT 2016


dkoleary at olearycomputers.com writes:

> Hey;
>
> Long term perl adherent finally making the leap to python.  From my
> reading, python, for the most part, uses perl regex.. except, I can't
> seem to make it work...
>
> I have a txt file from which I can grab specific titles via a perl
> one-liner:
>
> $ perl -ne 'print if (m{^("?)[1-9]*\.})' tables
> 1. ${title1}
> 2. ${title2}
> "3. ${title3}",,,
> 4. one more title
> 5. nuther title
> 6. and so on...,,
> ...
> 25. last title
>
> I can't seem to get the same titles to appear using python:
>
>
> $ python -V
> Python 3.5.2
> $ python
> Python 3.5.2 (default, Jul  5 2016, 12:43:10) 
> [GCC 5.4.0 20160609] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import os
>>>> import re
>>>> with open('tables','rt') as f:
> 	data = f.read()
>
> printing data results in the output I would expect..
>
> Trying to compile a regex and display the results does not show the
> same results I get from perl.
>
>>>> regex = r'^("?)[1-9]*\.'
>>>> re.search(regex, data)
>>>>
>
>>>> p = re.compile(r'^("?)[1-9]*\.')
>>>> p
> re.compile('^("?)[1-9]*\\.')
>>>> p.findall(data)
>
> I've tried any number of options shown on the net all with the same
> result.  Can someone point out what I'm messing up?

You need to compile (or use) your regex in multiline mode for the anchor
to match the beginnings of lines. But it's probably better to process
each line separately, as you do in Perl, so skip this point.
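
For completeness, here is a rough sketch of what multiline mode changes. The sample string is made up to stand in for the contents of 'tables', and the trailing .*$ is my addition so that findall returns whole lines.

```python
import re

# Hypothetical stand-in for the contents of 'tables'.
data = 'Header line\n1. first title\n"3. third title",,,\nnot a title\n'

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so nothing is found here.
first_try = re.search(r'^"?[1-9]*\.', data)
print(first_try)

# With re.MULTILINE, ^ also matches right after each newline and $ matches
# right before each newline.
pattern = re.compile(r'^"?[1-9]*\..*$', re.MULTILINE)
titles = pattern.findall(data)
print(titles)
```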

You have a capturing group in your regex, so findall returns what that
group matched. Typically that is an empty string (no doublequote). But
your capturing group seems unintended, so maybe skip this point.
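
A small sketch of that behaviour, on made-up sample data rather than the real file: with exactly one capturing group in the pattern, findall returns only what the group matched.

```python
import re

# Hypothetical sample: one plain title and one quoted title.
data = '1. first title\n"3. third title",,,\n'

# The single group ("?) captures either nothing or a doublequote,
# and that is all findall reports for each match.
with_group = re.findall(r'^("?)[1-9]*\.', data, re.MULTILINE)
print(with_group)
```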

Without that group, findall would return what the whole pattern matched,
which is still not the whole line. It's surely better to process each
line separately, as you do in Perl. So skip this point, too.
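
To make the difference concrete, a sketch on made-up sample data: findall without the group gives just the matched prefixes, while filtering line by line keeps whole lines.

```python
import re

# Hypothetical sample standing in for the contents of 'tables'.
data = '1. first title\nnot a title\n"3. third title",,,\n'
p = re.compile(r'"?[1-9]*\.')

# findall over the whole text returns only what the pattern matched.
prefixes = re.findall(r'^"?[1-9]*\.', data, re.MULTILINE)
print(prefixes)

# Testing each line separately keeps the whole line, much like perl -ne.
whole_lines = [line for line in data.splitlines() if p.match(line)]
print(whole_lines)
```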

Then you don't even need the anchor. Instead, use re.match to match at
the start of the string. (The newer re.fullmatch attempts to match the
whole string.)

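import re
# Raw string, so the backslash reaches the regex engine unchanged.
p = re.compile(r'"?[1-9]*\.')
with open('tables', 'rt') as f:
    for line in f:
        # match() only tries at the start of the line.
        if p.match(line):
            print(line, end='')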
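
If it helps, here is how match, search and fullmatch differ on a couple of made-up strings (the strings are mine, not from your file).

```python
import re

p = re.compile(r'"?[1-9]*\.')
line = '1. first title\n'        # hypothetical title line
other = 'see 1. below'           # hypothetical non-title line

at_start = p.match(line)         # matches '1.' at the start of the string
whole = p.fullmatch(line)        # None: the pattern does not cover the whole line
not_at_start = p.match(other)    # None: match() never looks past position 0
anywhere = p.search(other)       # search() scans forward and finds '1.'

print(at_start, whole, not_at_start, anywhere, sep='\n')
```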

Note that 'rt' is the default mode, and you won't catch line 10 with
your regex, because [1-9] does not match the digit 0.
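
One possible fix, assuming the titles are numbered with plain decimal integers, is to allow all ten digits. Using + instead of * is my own change; it also stops a line that starts with a bare dot from matching. The sample lines are made up.

```python
import re

old = re.compile(r'"?[1-9]*\.')   # the pattern from the question, minus the group
new = re.compile(r'"?[0-9]+\.')   # assumed fix: any digit, at least one

# Hypothetical sample lines.
samples = ['9. ninth', '10. tenth', '"20. twentieth",,,', '. no number']

# For each line, record whether the old and the new pattern match at the start.
results = {s: (bool(old.match(s)), bool(new.match(s))) for s in samples}
for s, (old_hit, new_hit) in results.items():
    print(repr(s), 'old:', old_hit, 'new:', new_hit)
```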

More importantly, note that you can iterate over the lines from f
directly. There's also a method, f.readlines, to read the lines into a
list, if you need them all at once, but that's not what you do in Perl
either.
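
A small sketch of the two ways of reading, using a throwaway temporary file so that it runs anywhere. The file contents are made up.

```python
import os
import tempfile

# Create a small throwaway file with made-up contents.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1. first title\n2. second title\n')
    path = tmp.name

# Iterating over the file object yields one line at a time.
streamed = []
with open(path) as f:
    for line in f:
        streamed.append(line)

# readlines() reads every line into a list at once.
with open(path) as f:
    all_lines = f.readlines()

os.remove(path)
print(streamed == all_lines)
```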


