When not to use an RE -- an example

John Machin sjmachin at lexicon.net
Sat Apr 19 21:53:50 EDT 2003


On 19 Apr 2003 18:15:56 -0700, imcmeans at telus.net (Ian McMeans) wrote:

>How about using a dictionary? I wish we had dictionary comprehensions
>:(
>
>>>> user_input = " ZZZZZZZ\n\n"
>>>> d = dict( [(x, None) for x in list(user_input.strip())] )
>>>> len(d.keys())
>1
>

I don't understand what your dictionary solution is doing. My need is
to detect a large list of possible junk strings -- say 95 printable
ASCII characters times, say, 30 characters of text-field width, or
roughly 2,850 strings. Loading all of those into a dictionary is not
exactly an elegant solution.
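Read another way, the quoted snippet never loads a catalogue of junk strings. Its dictionary is keyed by the distinct characters of the stripped input, so `len(d.keys()) == 1` already says "only one distinct character". Below is a minimal sketch of that idea using the built-in `set`. It assumes "junk" means two or more copies of a single character once surrounding whitespace is stripped. The helper name `is_junk` and its `min_len` parameter are hypothetical and do not come from the thread.

```python
def is_junk(user_input, min_len=2):
    """Return True if the stripped input is one character repeated.

    Hypothetical helper: 'junk' is assumed to mean at least `min_len`
    copies of a single character after surrounding whitespace is removed.
    """
    stripped = user_input.strip()
    # The set holds only the distinct characters actually present, so its
    # size is bounded by the input, not by every possible junk string.
    return len(stripped) >= min_len and len(set(stripped)) == 1


print(is_junk(" ZZZZZZZ\n\n"))  # True
print(is_junk("ZZZY"))          # False
print(is_junk("Z"))             # False
```

The work grows with the length of the input, not with the 95 by 30 possibilities, and no RE is involved.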

>I don't really understand your regular expression...
>"^(?:(.)(?=\1))+\1\Z"
>Why the lookahead assertion following the first capture group,
>followed by the capture group outside the parentheses? It seems crazy
>to me.

Crazily overcomplicated, as already pointed out by Alexander, yes. But
it worked. Whether the groups are capture groups or not is irrelevant
to the meaning.
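For the record, the quoted pattern does what was intended. The lookahead `(?=\1)` requires each captured character to be followed by a copy of itself, and the trailing `\1` consumes the last copy. The whole string must therefore be two or more repetitions of one character. The check below uses Python's `re` module and compares the original with a shorter pattern, `(.)\1+` used with `fullmatch`. The shorter pattern is offered as an illustration and is not necessarily what Alexander suggested. The sample strings are arbitrary.

```python
import re

# The pattern from the thread: each character must be followed by a copy of
# itself (the lookahead), and the final \1 consumes that last copy.
ORIGINAL = re.compile(r"^(?:(.)(?=\1))+\1\Z")

# A shorter pattern with the same effect: one character, then one or more
# copies of it. fullmatch() supplies the start and end anchors.
SIMPLER = re.compile(r"(.)\1+")

for s in ["ZZZZZZZ", "ZZ", "Z", "ZZY", "AABB", ""]:
    orig_hit = ORIGINAL.match(s) is not None
    simple_hit = SIMPLER.fullmatch(s) is not None
    print(repr(s), orig_hit, simple_hit)
# 'ZZZZZZZ' True True
# 'ZZ' True True
# 'Z' False False
# 'ZZY' False False
# 'AABB' False False
# '' False False
```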

>
>".{2,}" is repeated characters, ".{2,}\s*$" is repeated characters
>followed by optional whitespace, with nothing following it.

".{2,}" does *not* match repeated characters. It matches *any* string
of length two or more (any characters except a newline, unless re.DOTALL
is set).
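This is easy to confirm. `.{2,}` accepts `ab`, which repeats nothing, and appending `\s*$` only permits trailing whitespace. The sample strings are illustrative.

```python
import re

# ".{2,}" means "any two or more characters"; nothing forces them to be equal.
ANY_TWO_PLUS = re.compile(r".{2,}")
# Appending \s*$ merely allows trailing whitespace; still no repetition test.
WITH_TRAILING_WS = re.compile(r".{2,}\s*$")

for s in ["ab", "abc  ", "a"]:
    print(repr(s),
          ANY_TWO_PLUS.fullmatch(s) is not None,
          WITH_TRAILING_WS.match(s) is not None)
# 'ab' True True
# 'abc  ' True True
# 'a' False False
```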

All of the above strengthens my original implicit proposition: don't
*try* to use an RE when you don't need to.

Of course there will always be the folks who think that Friedl's big
e-mail address matcher is a shining example of software engineering
good practice :-)

More information about the Python-list mailing list