Code improvement question

Peter J. Holzer hjp-python at hjp.at
Fri Nov 17 06:17:44 EST 2023


On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
> > > Why don't you use re.findall?
> > > 
> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
> > 
> > I think I can see what you did there but it won't make sense to me - or
> > whoever looks at the code - in future.
> > 
> > That answers your specific question. However, I am in awe of people who
> > can just "do" regular expressions and I thank you very much for what
> > would have been a monumental effort had I tried it.
> 
> I feel the same way about regex. If I can find a way to write something
> without regex I very much prefer to as regex usually adds complexity and
> hurts readability.

I find "straight" regexps very easy to write. There are only a handful
of constructs which are all very simple and you just string them
together. But then I've used regexps for 30+ years, so of course they
feel natural to me.

(Reading regexps may be a bit harder, exactly because they are to
simple: There is no abstraction, so a complicated pattern results in a
long regexp.)

There are some extensions to regexps which are conceptually harder, like
lookahead and lookbehind or nested contexts in Perl. I may need the
manual for those (especially because they are new(ish) and every
language uses a different syntax for them) or avoid them altogether.

Oh, and Python (just like Perl) allows you to embed whitespace and
comments into Regexps, which helps readability a lot if you have to
write long regexps.


> You might find https://regex101.com/ to be useful for testing your regex.
> You can enter in sample data and see if it matches.
> 
> If I understood what your regex was trying to do I might be able to suggest
> some python to do the same thing. Is it just removing numbers from text?

Not "removing" them (as I understood it), but extracting them (i.e. find
and collect them).

> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)

\b         - a word boundary.
[0-9]{2,7} - 2 to 7 digits
-          - a hyphen-minus
[0-9]{2}   - exactly 2 digits
-          - a hyphen-minus
[0-9]{2}   - exactly 2 digits
\b         - a word boundary.

Seems quite straightforward to me. I'll be impressed if you can write
that in Python in a way which is easier to read.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/python-list/attachments/20231117/740f2f3e/attachment.sig>


More information about the Python-list mailing list