Help needed to retrieve text from a text-file using RegEx

rdmurray at bitdance.com
Mon Feb 9 13:51:07 EST 2009


Oltmans <rolf.oltmans at gmail.com> wrote:
> Here is the scenario:
> 
> It's a command line program. I ask user for a input string. Based on
> that input string I retrieve text from a text file. My text file looks
> like following
> 
> Text-file:
> -------------
> AbcManager=C:\source\code\Modules\Code-AbcManager\
> AbcTest=C:\source\code\Modules\Code-AbcTest\
> DecConnector=C:\source\code\Modules\Code-DecConnector\
> GHIManager=C:\source\code\Modules\Code-GHIManager\
> JKLConnector=C:\source\code\Modules\Code-JKLConnector
> 
> -------------
> 
> So now if I run the program and user enters
> 
> DecConnector
> 
> Then I'm supposed to show them this text "C:\source\code\Modules\Code-
> DecConnector" from the text-file. Right now I'm retrieving using the
> following code which seems quite inefficient and inelegant at the same
> time
> 
>  with open('MyTextFile.txt') as file:
>      for line in file:
>          if mName in line: #mName is the string that contains user input
>              Path =str(line).strip('\n')
>              tempStr=Path
>              Path=tempStr.replace(mName+'=',"",1)

I've normalized your indentation and spacing, for clarity.

> I was wondering if using RegEx will make this look better. If so, can
> you please suggest a Regular Expression for this? Any help is highly
> appreciated. Thank you.

This smells like it might be homework, but I'm hoping you'll learn some
useful Python from what follows regardless of whether it is or not.

Since your complaint is that the above code is inelegant and inefficient,
let's clean it up.  The first three lines that open the file and set up
your loop are good, and I think you will agree that they are pretty clean.
So, I'm just going to help you clean up the loop body.

'line' is already a string, since it was read from a file.  No need to
wrap it in 'str':

              Path = line.strip('\n')
              tempStr=Path
              Path=tempStr.replace(mName+'=',"",1)

'strip' removes characters from _both_ ends of the string.  If you are
trying to make sure that you _only_ strip a trailing newline, then you
should be using rstrip.  If, on the other hand, you just want to get
rid of any leading or trailing whitespace, you could just call 'strip()'.
Since your goal is to print the text from after the '=', I'll assume
that stripping whitespace is desirable:

              Path = line.strip()
              tempStr=Path
              Path=tempStr.replace(mName+'=',"",1)
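To see the difference between the three calls, here is a small sketch.
The sample line is made up for illustration (it has extra spaces that
your real file may not have):

```python
# Hypothetical line, roughly as it might be read from the text file,
# with some stray whitespace added to make the difference visible.
line = "  AbcTest=C:\\source\\code\\Modules\\Code-AbcTest\\ \n"

only_newline = line.rstrip('\n')  # removes trailing newlines only
both_ends = line.strip()          # removes whitespace from both ends

print(repr(only_newline))
print(repr(both_ends))

# strip('\n') also eats leading newlines; rstrip('\n') does not.
sample = "\nabc\n"
print(repr(sample.strip('\n')))
print(repr(sample.rstrip('\n')))
```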

The statement 'tempStr=Path' probably doesn't do what you think it does.
It does not copy anything; it just creates an alternate name for the
string pointed to by Path.  Further, there is no need to have an
intermediate variable to hold a value during transformation.  The right
hand side is computed, using the current values of any variables
mentioned, and _then_ the left hand side is rebound to point to the
result of the computation.  So we can just drop that line entirely, and
use 'Path' in the 'replace' call:

              Path = line.strip()
              Path = Path.replace(mName+'=',"",1)
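If the name binding business is unclear, this short sketch may help.
The string value is invented for the example:

```python
Path = "DecConnector=C:\\source"   # made-up value for illustration
tempStr = Path                     # second name for the same object

same_object = tempStr is Path      # both names refer to one string
print(same_object)

# The right hand side is evaluated first, producing a new string;
# only then is the name Path rebound to that result.
Path = Path.replace("DecConnector=", "", 1)

print(Path)      # the new, shorter string
print(tempStr)   # still the original; strings are never changed in place
```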

However, you can also chain method calls, so really there's no need for
two statements here, since both calls are simple:

              Path = line.strip().replace(mName+'=',"",1)

To make things even simpler, Python strings have a 'split' method.  Given
the syntax of your input file I think we can assume that '=' never appears
in a variable name.  split returns a list of strings constructed by
breaking the input string at the split character, and it has an optional
argument that gives the maximum number of splits to make.  So by doing
'split('=', 1)', we will get back a list consisting of the variable name
and the remainder of the line.  The remainder of the line is exactly
what you are looking for, and that will be the second element of the
returned list.  So now your loop body is:

              Path = line.strip().split('=', 1)[1]

and your whole loop looks like this:

    with open('MyTextFile.txt') as file:
        for line in file:
            if mName in line:
                Path = line.strip().split('=', 1)[1]
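Here is what the split call returns for one line of your file, as a
quick sketch you can paste into the interpreter.  The second example
uses a made-up string to show what the maximum-splits argument buys you:

```python
line = "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n"

parts = line.strip().split('=', 1)
print(parts)          # a two-element list: [name, rest of line]

Path = parts[1]
print(Path)

# With a limit of 1, any later '=' stays in the remainder.
limited = "a=b=c".split('=', 1)
unlimited = "a=b=c".split('=')
print(limited, unlimited)
```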

I think that looks pretty elegant.  Oh, and you might want to add a
'break' statement to the loop, and also an 'else:' clause (to the for
loop) so you can issue a 'not found' message to the user if they type
in a name that does not appear in the input file.
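A rough sketch of the loop with 'break' and 'else:' added might look
like the following.  The function name 'find_path' and the SAMPLE data
are my own inventions so that the example runs without a file on disk.
I have also swapped the 'mName in line' test for a comparison against
the whole name, since the substring test would happily match an input
like 'Manager'; treat that as one possible variation, not a requirement.

```python
import io

# Stand-in for MyTextFile.txt (hypothetical, for illustration only).
SAMPLE = (
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n"
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n"
)

def find_path(lines, mName):
    """Return the text after '=' for the entry named mName, or None."""
    for line in lines:
        parts = line.strip().split('=', 1)
        # Compare the whole name; skip lines that have no '=' at all.
        if len(parts) == 2 and parts[0] == mName:
            Path = parts[1]
            break                      # stop at the first match
    else:
        # Reached only when the loop finished without hitting 'break'.
        print('%s not found' % mName)
        return None
    return Path

print(find_path(io.StringIO(SAMPLE), 'DecConnector'))
find_path(io.StringIO(SAMPLE), 'Manager')   # reports 'not found'
```

With the real file you would pass the open file object instead, for
example 'Path = find_path(file, mName)' inside your 'with' block.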

--RDM



