Regex Group case change

Cameron Simpson cs at cskk.id.au
Thu Oct 1 21:24:08 EDT 2020


It is good to see a nice small piece of code which we can run. Thank 
you.

So there are a number of things to comment about in the code below; 
comments inline under the relevant piece of code (we prefer the "inline 
reply" style here, it reads like a conversation):

On 01Oct2020 15:15, Raju <ch.nagaraju008 at gmail.com> wrote:
>import re
>import os
>import sys
>
>#word = "7 the world" # 7 The world
>#word = "Brian'S" # Brian's
>#word = "O'biran"# O'Brian
>#word = "Stoke-On-Trent" # Stoke-on-Trent; here i need to lower the case of middle word(i.e -On-)

There is a commonly held opinion that regexps are overused. To lowercase 
the "on" I would reach for str.split, for example:

    left, middle, right = word.split('-', 2)
    middle = middle.lower()
    modified_word = '-'.join([left, middle, right])

I have broken that out for readability, and it is hardwired for a 3 part 
word. See the docs for str.split and str.join:

    https://docs.python.org/3/library/stdtypes.html#str.join
    https://docs.python.org/3/library/stdtypes.html#str.split

So: no regexps, which I'm sure you now realise can be tricky to get 
correct, and are hard to read.
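
If the word might have more than three parts, the same idea generalises. 
A minimal sketch, assuming every part between the first and the last 
should be lowercased (the function name lower_middle_parts is made up 
for this example):

```python
def lower_middle_parts(word, sep='-'):
    """Lowercase every part of word except the first and the last."""
    parts = word.split(sep)
    if len(parts) < 3:
        # No middle part to change.
        return word
    middle = [part.lower() for part in parts[1:-1]]
    return sep.join([parts[0]] + middle + [parts[-1]])

print(lower_middle_parts("Stoke-On-Trent"))  # Stoke-on-Trent
```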

>def wordpattern(word):
>     output = ''
>     if re.match("^\d+|w*$",word):
>            output = word.upper()
>     elif re.match("\w+\'\w{1}$",word):
>            output = word.capitalize()
>     elif re.match("(\d+\w* )(Hello)( \w+)",word))
>            group(1)group(2).title()group(3)
>     else:
>            output.title()

First off, please try to use raw strings for regular expressions; doing 
so avoids many potential accidents to do with backslash treatment by 
Python and regexps. So rewritten:

>def wordpattern(word):
>     output = ''
>     if re.match(r"^\d+|w*$",word):
>            output = word.upper()
>     elif re.match(r"\w+\'\w{1}$",word):
>            output = word.capitalize()
>     elif re.match(r"(\d+\w* )(Hello)( \w+)",word))
>            group(1)group(2).title()group(3)
>     else:
>            output.title()
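
As an aside, here is a small illustration of the kind of accident raw 
strings avoid. It uses "\b", which is not in your patterns; it is just a 
convenient example of an escape which Python and the regexp syntax treat 
differently:

```python
import re

plain = "\bcat\b"   # Python turns each "\b" into a backspace character
raw = r"\bcat\b"    # the backslashes survive; the regexp sees word boundaries

print(len(plain), len(raw))             # 5 7
print(re.search(plain, "a cat sat"))    # None
print(re.search(raw, "a cat sat"))      # a match object for "cat"
```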

Next, this function does not return a value - it has no return 
statement. You probably want:

    return output

at the end. Also, your default output seems to be ''; would it not be 
better to return word unchanged? So I'd start with:

    output = word

up the front.

Then there's a bunch of small issues in the main code:

>     if re.match(r"^\d+|w*$",word):
>            output = word.upper()

You probably want "\w", not "w" (missing backslash). A plain "w" matches 
the letter "w". Also, you probably want "\w+", not "\w*" - meaning "at 
least one" instead of "zero or more" aka "at least 0". With the "*" it 
can match zero characters (the empty string).
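
These points are easy to check at the interactive prompt. The last line 
is an extra observation, not something from your message: "|" splits the 
whole pattern, so it means "^\d+" or "\w+$", and a leading digit alone 
is enough for a match:

```python
import re

# A plain "w" only matches the letter w.
w_only_www = bool(re.match(r"w*$", "www"))        # True
w_only_hello = bool(re.match(r"w*$", "hello"))    # False

# "*" allows zero repetitions, so the empty string matches; "+" does not.
star_empty = bool(re.match(r"\w*$", ""))          # True
plus_empty = bool(re.match(r"\w+$", ""))          # False
plus_hello = bool(re.match(r"\w+$", "hello"))     # True

# The alternation covers the whole pattern: "^\d+" OR "\w+$".
leading_digit = re.match(r"^\d+|\w+$", "7 the world").group()

print(w_only_www, w_only_hello, star_empty, plus_empty, plus_hello)
print(leading_digit)  # 7
```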

>     elif re.match(r"\w+\'\w{1}$",word):

The "\w{1}" can just be written "\w" - the default repetition for a 
subpattern is "exactly once", which is what "{1}" means. So not 
incorrect, just more complicated than required.
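
A small check that the two spellings behave the same, using two of your 
example words plus "it's" (added here just for the comparison):

```python
import re

results = []
for word in ["Brian'S", "O'biran", "it's"]:
    with_count = bool(re.match(r"\w+\'\w{1}$", word))
    without_count = bool(re.match(r"\w+\'\w$", word))
    results.append((word, with_count, without_count))
    print(word, with_count, without_count)
```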

>            output = word.capitalize()
>     elif re.match(r"(\d+\w* )(Hello)( \w+)",word))

Typically people put the whitespace outside the group, because they 
usually want the word and not the spaces around it. Of course, the cost 
of that is that you would need to put the spaces back in later. So in 
fact this works for your use case.
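
To make the tradeoff concrete, here is a sketch comparing the two 
layouts on a made-up input ("7 Hello world" is only an example string):

```python
import re

text = "7 Hello world"

# Spaces inside the groups, as in your pattern.
inside = re.match(r"(\d+\w* )(Hello)( \w+)", text)
# Spaces outside the groups.
outside = re.match(r"(\d+\w*) (Hello) (\w+)", text)

print(inside.groups())    # ('7 ', 'Hello', ' world')
print(outside.groups())   # ('7', 'Hello', 'world')

# With the spaces inside, plain concatenation rebuilds the text;
# with them outside, the spaces have to be put back by hand.
rebuilt_inside = "".join(inside.groups())
rebuilt_outside = " ".join(outside.groups())
```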

>            group(1)group(2).title()group(3)

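You need to join these together, and assign the result to output. Note 
that group is a method of the match object returned by re.match, so you 
need to keep that object (say as m, assigned before the if) to call it:

             output = m.group(1) + m.group(2).title() + m.group(3)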

>     else:
>            output.title()

You need to assign the result to output (this gives a useful result once 
output starts out as word, as suggested above):

             output = output.title()
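
Putting the above together, here is a sketch of the function with only 
the fixes from this message applied. It assumes the match object for the 
"Hello" pattern is saved in a variable (m, a name chosen here) so that 
group() can be called on it. It makes no claim to produce all the 
outputs in your comments; the patterns themselves likely still need 
work:

```python
import re

def wordpattern(word):
    output = word   # default: return the word unchanged
    # Keep the match object so that its groups can be used below.
    m = re.match(r"(\d+\w* )(Hello)( \w+)", word)
    if re.match(r"^\d+|\w+$", word):
        output = word.upper()
    elif re.match(r"\w+\'\w$", word):
        output = word.capitalize()
    elif m:
        output = m.group(1) + m.group(2).title() + m.group(3)
    else:
        output = output.title()
    return output

print(wordpattern("Brian'S"))       # Brian's
print(wordpattern("O'biran"))       # O'Biran
print(wordpattern("7 the world"))   # 7 THE WORLD
```

Note the last result: any word starting with a digit is caught by the 
first test, so the "Hello" branch is never reached as written.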

Cheers,
Cameron Simpson <cs at cskk.id.au>

