f python?

Alex Mizrahi alex.mizrahi at gmail.com
Mon Apr 9 02:51:33 EDT 2012


>> Ok no problem. My sloppiness. After all, my implementation wasn't
>> portable. So, let's fix it. After a while, I discovered there's
>> os.sep. Ok, replace "/" with os.sep, done. Then, bang, all hell
>> broke loose. Because the backslash is used as an escape in strings, any
>> regex that manipulates paths got fucked majorly. So, now you need to
>> find a quoting mechanism.
>
> 	if os.altsep is not None:
> 	    sep_re = '[%s%s]' % (os.sep, os.altsep)
> 	else:
> 	    sep_re = '[%s]' % os.sep
>
> But really, you should be ranting about regexps rather than Python.
> They're convenient if you know exactly what you want to match, but a
> nuisance if you need to generate the expression based upon data which is
> only available at run-time (and re.escape() only solves one very specific
> problem).

The problem isn't regular expressions as such; it's the syntax used to
specify them (i.e. the fact that they are written as strings).
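
To make the string-syntax problem concrete, here is a minimal Python sketch. The helper name separator_class is hypothetical, and the Windows-style separators are hard-coded (instead of read from os.sep and os.altsep) so the example behaves the same on every platform. It escapes each separator before placing it in a character class, and shows what the unescaped interpolation does with a backslash.

```python
import re

def separator_class(sep, altsep=None):
    """Build a character class matching any of the given path separators."""
    chars = sep if altsep is None else sep + altsep
    # re.escape neutralises the backslash (and ']', '^', '-') inside the class.
    return "[" + "".join(re.escape(c) for c in chars) + "]"

# Windows-style separators, hard-coded for the sake of the example.
WIN_SEP_RE = separator_class("\\", "/")
parts = re.split(WIN_SEP_RE, "foo\\bar/baz")

# Unescaped interpolation builds the class [\/]: the backslash is read as
# an escape for "/", so the class matches a forward slash only.
naive = "[%s%s]" % ("\\", "/")
naive_match = re.search(naive, "foo\\bar")
```

Here parts splits on both separators, while naive_match finds nothing in a backslash-separated path. Escaping works for this case, but it has to be remembered at every place where data is spliced into a pattern string.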

The Common Lisp regex library cl-ppcre lets you specify a regex as a parse
tree. E.g. "(foo[/\\]bar)" becomes

(:REGISTER (:SEQUENCE "foo" (:CHAR-CLASS #\/ #\\) "bar"))

This is more verbose, but totally unambiguous and requires no escaping.

So this definitely is a problem of Python's regex library, and of its lack
of support for a nice parse tree representation in code.
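
For comparison, a parse-tree notation can be imitated in Python with nested tuples. This is only a sketch under stated assumptions: the tuple format and the render function are hypothetical, they mirror just the three cl-ppcre node types used in this message, and they are not part of Python's re module.

```python
import re

def render(node):
    """Render a tiny, hypothetical parse tree into a Python regex string."""
    if isinstance(node, str):
        # Plain strings are literals; the renderer does the escaping.
        return re.escape(node)
    tag, *args = node
    if tag == "sequence":
        return "".join(render(arg) for arg in args)
    if tag == "char-class":
        # Each argument is a single literal character of the class.
        return "[" + "".join(re.escape(ch) for ch in args) + "]"
    if tag == "register":
        return "(" + "".join(render(arg) for arg in args) + ")"
    raise ValueError("unknown node type: %r" % (tag,))

# Same structure as the cl-ppcre tree for "(foo[/\\]bar)".
tree = ("register", ("sequence", "foo", ("char-class", "/", "\\"), "bar"))
pattern = render(tree)
match = re.fullmatch(pattern, "foo\\bar")
```

The backslash appears in the tree as an ordinary one-character string. Escaping happens once, inside render, rather than in every pattern written by hand.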

cl-ppcre supports both the textual Perl-compatible regex syntax and parse
trees. I would start with a simple string specification; then, when
shit hits the fan, I can call cl-ppcre::parse-string to get the parse tree
and replace the forward-slash class with one that matches either slash.
Moreover, I can convert regexes automatically:

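;; Parse REGEX, widen every (:char-class #\/) so it also accepts #\\,
;; then scan TARGET-STRING with the rewritten parse tree.
(defun scan-auto/ (regex target-string)
  (let ((fixed-parse-tree (subst '(:char-class #\/ #\\) '(:char-class #\/)
                                 (cl-ppcre::parse-string regex)
                                 :test 'equal)))
    (cl-ppcre:scan-to-strings fixed-parse-tree target-string)))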


CL-USER> (scan-auto/ "foo[/]bar" "foo\\bar")
"foo\\bar"
#()
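
The tree rewrite itself is easy to imitate in Python. The sketch below assumes a parse tree is already available as nested tuples (a hypothetical format chosen for illustration), since the standard re module does not document a public counterpart to parse-string. The subst function here is a rough analogue of Common Lisp's SUBST with :test 'equal.

```python
def subst(new, old, tree):
    """Return a copy of tree with every subtree equal to old replaced by new."""
    if tree == old:
        return new
    if isinstance(tree, tuple):
        return tuple(subst(new, old, child) for child in tree)
    return tree

# Hand-written tree standing in for the parse of "foo[/]bar".
original = ("sequence", "foo", ("char-class", "/"), "bar")

# Widen the slash-only class so it accepts a backslash as well.
fixed = subst(("char-class", "/", "\\"), ("char-class", "/"), original)
```

Because the rewrite operates on structure rather than on pattern text, it cannot accidentally touch a "/" that appears inside a literal such as "foo".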


