re.search - Pattern matching review

Mon May 30 12:38:29 EDT 2016

On 30 May 2016 at 10:03, Ganesh Pal <ganesh1pal at gmail.com> wrote:
> Thanks Matt for the reply  and lovely analysis . I was trying to complicate
> the simple task :(
>
> Here is how the code looks now , the whole idea was just to  match the
> pattern and return it
>
>
> def get_block(block):
>
>     cmd = "get_block_info -l"
>     stdout, stderr, exitcode = subprocess_run(cmd)
>     #Grab the block from the stdout
>     block = re.search('(?<=\(block )[^(]*(?=\))', stdout).group()
>     # check the pattern
>     matched = re.search(r'(\d+),(\d+),(\d+):(\d+)', block)

This is still overcomplicated actually (sorry, my fault for giving you
the silly regex with lookbehind/lookahead). You don't need 2 regex
lookups, and you don't need all those groups in your 2nd regex if
you're not going to split it up:

matched = re.search("\(block (\d+,\d+,\d+:\d+)\)")
# just 1 group, containing the bit you actually want to return
if matched:
    return block.group(1)
logging.info('block not found')

Even that's potentially overcomplicated, depending on your input. Is
it necessary to check the format of the block so strictly? Perhaps a
class like [\d,:] would be fine?

>     if matched:
>        logging.info('block found")
>        return block
>     else:
>        logging.info('block not found")
>
>
> I had one final question. I was thinking if we included a try -expect block
> to catch the failures of re.search  as shown below.

You're not catching a failure of re.search, it's behaving as expected
and returning None, which has no group method.

> what kind of specific exception can we add ( just the  AttributeError
> Exception or any thing else )

Think about the reason you're catching exceptions.

You should catch exceptions if code you're calling sometimes raises an
exception where you're actually planning to handle it or where the
exception doesn't matter.

If you're not expecting something to raise an exception then it's
nearly always better to let it through, not least because it'll help
you or anyone else debugging your program later on.

> Example :
>
> try:
>     block = re.search('(?<=\(block )[^(]*(?=\))', stdout).group()

This is the only line which could cause the AttributeError

>     matched = re.search(r'(\d+),(\d+),(\d+):(\d+)', block)
> except AttributeError
>      logging.error(' Error: while determining the block ")

But what's the benefit here to trying to call .group() on `block` vs.
the `if matched` alternative above? re.search will always return a
match object (which always has a group method) or None.

Attempting to protect against other possibilities will at best provide
no benefit, and is likely to make your job harder when you're
debugging later on.

Another thing to note for future reference is if you want to use a
try:except block to inject logging you can re-raise the original
exception:

try:
    hello
except NameError:
    logging.error('ohno!')
    raise

-- 
Matt Wheeler
http://funkyh.at