[Baypiggies] regex puzzle

Shannon -jj Behrens jjinux at gmail.com
Wed Mar 12 10:07:19 CET 2008


On Tue, Mar 11, 2008 at 7:05 PM, Matt Good <matt at matt-good.net> wrote:
> On Mar 11, 2008, at 10:47 AM, Aaron Maxwell wrote:
>
> > I'm writing a tool with a command line interface, using the cmd
> > module.  It has a "log search" command, which accepts a regex
> > pattern as an argument.
> >
> > After tokenizing the input, I finally get what the user typed as
> > the regex pattern into a variable search_str:
> >
> >  regex = re.compile(search_str)
> >  return [line for line in loglines if regex.search(line)]
> >
> > The problem is that search_str is a variable of type str, not a raw
> > string.  So the user will have to escape many characters: e.g.,
> > "\\bREPO" instead of "\bREPO" as the pattern.
>
>  I think what you're seeing is due to how the cmd module parses the
>  command input.  It tries to behave like a Unix shell and treats \ as
>  an escape character as described here:
>  http://trac.edgewall.org/ticket/994
>
>  It's easy to work around this by overriding Cmd.onecmd and replacing
>  the backslashes:
>  http://trac.edgewall.org/changeset/3513/trunk/trac/scripts/admin.py

Yes, Matt's onto something.

I wanted to clarify your statement, "The problem is that search_str is
a variable of type str, not a raw string."

Look at the following:

>>> r'foo'
'foo'
>>> 'foo'
'foo'
>>> raw_input() # going to type foo
foo
'foo'

Whether I use a raw string, a normal string, or raw_input(), what I
get is a str object.
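
Here's a minimal sketch of the same check as a script rather than an
interactive session.  It's written for Python 3, where raw_input() is
called input().  The read_line() helper is a hypothetical stand-in for
raw_input() that reads from an in-memory stream, so the script runs
without a terminal:

```python
import io


def read_line(stream):
    """Hypothetical stand-in for raw_input(): read one line, drop the newline."""
    return stream.readline().rstrip("\n")


from_raw_literal = r'foo'
from_normal_literal = 'foo'
from_input = read_line(io.StringIO("foo\n"))

# All three are plain str objects with identical contents.
for value in (from_raw_literal, from_normal_literal, from_input):
    assert type(value) is str
assert from_raw_literal == from_normal_literal == from_input
```

There is no separate "raw string" type to convert to; the r prefix only
changes how the literal is read from source code.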

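Now, when I *embed a string into Python source code*, Python processes
the backslash escape sequences in the literal.  If I use a raw string,
backslashes are left in the string as-is.  If I use raw_input(), no
escape processing happens at all, because that text never goes through
the Python parser.  In all cases, what I end up with is a str.  Look at
how the escaping works: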

>>> print 'tab: (\t)'  # \t -> tab
tab: (  )
>>> print r'tab: (\t)'  # \t not converted
tab: (\t)
>>> print raw_input()  # raw_input() != typing a string in Python source code
tab: (\t)
tab: (\t)
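
To tie this back to the regex in the original question, here's a rough
sketch (Python 3) of what happens when the backslash really does reach
the program.  The sample loglines are made up, and the pattern is built
with chr(92) (a backslash) so that no source-code escaping is involved,
which is the situation you're in when the text comes from raw_input():

```python
import re

# The six characters a user would type: backslash, b, R, E, P, O.
typed = chr(92) + "bREPO"

# Same contents as either way of spelling it in source code.
assert typed == r"\bREPO"
assert typed == "\\bREPO"
assert len(typed) == 6

# Made-up log lines, for illustration only.
loglines = ["REPO updated", "myREPO updated", "nothing here"]

regex = re.compile(typed)
matches = [line for line in loglines if regex.search(line)]
print(matches)
```

If the string in search_str holds a single backslash followed by b,
re.compile() sees \b as a word boundary and no extra escaping is needed.
So if the user has to type "\\bREPO", something between the prompt and
re.compile() is eating a backslash.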

What Matt seems to be saying is that the cmd module treats lines per
the UNIX shell convention and uses shell quoting rules.  That's very
different from Python quoting.
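
Here's a small sketch of what shell quoting rules do to a backslash.  I'm
assuming the tokenizing is done with something like shlex.split(); the
original post doesn't say what the tokenizer is, so treat that as a
guess.  The command line is the hypothetical "log search" example
(Python 3):

```python
import shlex

# What the user types at the prompt, spelled as raw literals so that
# Python itself doesn't touch the backslashes.
unquoted = r"log search \bREPO"
doubled = r"log search \\bREPO"
single_quoted = r"log search '\bREPO'"

# Shell rules: an unquoted backslash escapes the next character and is
# then dropped, so the regex would never see it.
assert shlex.split(unquoted) == ["log", "search", "bREPO"]

# Doubling the backslash, or single-quoting the argument, preserves it.
assert shlex.split(doubled) == ["log", "search", r"\bREPO"]
assert shlex.split(single_quoted) == ["log", "search", r"\bREPO"]

# A plain whitespace split applies no quoting rules at all.
assert unquoted.split() == ["log", "search", r"\bREPO"]
```

Under that assumption, the doubled backslash is a shell-quoting
requirement, not a Python one.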

You still have a problem that you'll need to figure out, but hopefully
I've cleared up at least one thing ;)

Happy Hacking!
-jj

-- 
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/

