N00b question: matching stuff with variables.

Stephen Hansen me+list/python at ixokai.io
Mon Jun 28 13:49:59 EDT 2010


On 6/28/10 10:29 AM, Ken D'Ambrosio wrote:
> Hi, all.  I've got a file which, in turn, contains a couple thousand
> filenames.  I'm writing a web front-end, and I want to return all the
> filenames that match a user-input value.  In Perl, this would be something
> like,
>
> if (/$value/){print "$_ matches\n";}
>
> But trying to put a variable into regex in Python is challenging me --
> and, indeed, I've seen a bit of scorn cast upon those who would do so in
> my Google searches ("You come from Perl, don't you?").

First of all, if you're doing this, you have to be aware that it is
*very* possible to write a pathological regular expression which can
kill your app and maybe your web server.

So if you're letting them write regular expressions and they aren't,
like, people you really trust, be wary.
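
To make that warning concrete, here is a minimal sketch (Python 3) of a
classic pathological pattern. The pattern "(a+)+$" and the input sizes
are illustrative choices, not something from your code; the point is only
that a failed match gets dramatically slower as the input grows by a few
characters.

```python
import re
import time

# Nested quantifiers: when the match fails, the engine retries
# exponentially many ways of splitting up the run of "a"s.
EVIL = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Return the seconds spent failing to match n 'a's followed by 'b'."""
    text = "a" * n + "b"   # the trailing "b" guarantees the match fails
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed

fast = time_failed_match(14)
slow = time_failed_match(20)
print("14 chars: %.4fs, 20 chars: %.4fs" % (fast, slow))
```

Each extra character roughly doubles the work, so an input of a few
dozen characters is enough to tie up a worker more or less indefinitely.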

> Here's what I've got (in ugly, prototype-type code):
>
> file=open('/tmp/event_logs_listing.txt' 'r')   # List of filenames
> seek = form["serial"].value                    # Value from web form
> for line in file:
>     match = re.search((seek)",(.*),(.*)", line) # Stuck here

Now, if you don't need the full power of regular expressions, then what 
about:

     name, foo, bar = line.split(",")
     if seek in name:
         # do something with foo and bar

That test is True if the value of 'seek' appears anywhere in the first
field of what appears to be a comma-delimited line. (The unpacking
assumes every line has exactly three comma-separated fields.)

Or maybe, if you're worried about case sensitivity:

     if seek.lower() in name.lower():
         # la la la

You can do a lot without ever bothering to mess around with regular 
expressions. A lot.

It's also faster if you're doing simpler things :)
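
Putting the substring approach together, a minimal sketch might look
like the following. The function name find_matches, the ignore_case
flag, and the sample lines are all made up for illustration; it also
assumes each line is "name,foo,bar" and skips anything else.

```python
def find_matches(lines, seek, ignore_case=True):
    """Yield (name, foo, bar) for lines whose first field contains seek."""
    needle = seek.lower() if ignore_case else seek
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) != 3:
            continue  # not a name,foo,bar line; skip it
        name, foo, bar = fields
        haystack = name.lower() if ignore_case else name
        if needle in haystack:
            yield name, foo, bar

# Hypothetical sample data standing in for the real listing file.
sample = [
    "ABC123.log,2010-06-01,ok\n",
    "xyz999.log,2010-06-02,fail\n",
    "malformed line\n",
]
for name, foo, bar in find_matches(sample, "abc"):
    print(name, foo, bar)
```

Since it only needs an iterable of lines, you could pass the open file
object straight in instead of the sample list.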

If they don't need the full power of regular expressions, but just
simple globbing, then maybe change it to:

    seeking = re.escape(seek)
    seeking = seeking.replace("\\*", ".*")
    seeking = seeking.replace("\\?", ".")

    match = re.search(seeking + ",(.*),(.*)", line)

First, we escape the user input so they can't put in any crazy regular
expression characters. Then we go and *unescape* "\\*" and turn it into
a ".*" -- because when a user enters "*", they really mean ".*" in
traditional glob-ese. Then we do the same with the question mark,
turning it into a dot.

Then! We run our highly restricted regular expression through re.search,
basically just as you were doing before -- you just hadn't concatenated
the 'seek' string to the rest of your expression.
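
Wrapped up as a small self-contained sketch, the escape-then-unescape
steps could look like this. The helper names glob_to_regex and
search_line and the sample line are hypothetical; the regex itself is
the same one as above.

```python
import re

def glob_to_regex(seek):
    """Turn a user-supplied glob such as 'AB*3' into a restricted regex."""
    pattern = re.escape(seek)               # neutralise every regex metacharacter
    pattern = pattern.replace("\\*", ".*")  # glob "*" means "any run of characters"
    pattern = pattern.replace("\\?", ".")   # glob "?" means "any one character"
    return pattern

def search_line(seek, line):
    """Match the glob against the line, capturing the two trailing fields."""
    return re.search(glob_to_regex(seek) + ",(.*),(.*)", line)

m = search_line("AB*3", "ABC123,first,second")
if m:
    print(m.group(1), m.group(2))
```

One thing to be aware of: ".*" will happily match across commas, so a
glob like "AB*" can reach into the later fields. If that matters,
replacing with "[^,]*" instead of ".*" keeps the wildcard inside the
first field.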

If you must give them full regex power and you know they won't try to 
bomb you, just leave out the 'seeking = ' lines and cross your fingers.
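
If you do go the full-regex route, keep in mind that a mistyped pattern
raises re.error when it is compiled. A minimal sketch of catching that,
with compile_user_pattern being a made-up helper name, might be:

```python
import re

def compile_user_pattern(seek):
    """Return a compiled pattern, or None if the user's regex is invalid."""
    try:
        return re.compile(seek + ",(.*),(.*)")
    except re.error:
        return None  # e.g. an unbalanced "(" or an unterminated "["

pattern = compile_user_pattern("AB.*")
if pattern is not None:
    match = pattern.search("ABC123,first,second")
    if match:
        print(match.groups())
```

This only guards against invalid patterns; it does nothing about valid
but pathological ones.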


-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/



