Validating regexp

Cameron Simpson cs at cskk.id.au
Tue Aug 8 22:03:08 EDT 2017


On 08Aug2017 17:31, Jon Ribbens <jon+usenet at unequivocal.eu> wrote:
>On 2017-08-08, Chris Angelico <rosuav at gmail.com> wrote:
>> On Wed, Aug 9, 2017 at 2:57 AM, Larry Martell <larry.martell at gmail.com> wrote:
>>> Yeah, it does not throw for 'A|B|' - but mysql chokes on it with 'empty
>>> subexpression for regexp'. I'd like to flag it before it gets to SQL.
>>
>> Okay, so your definition of validity is "what MySQL will accept". In
>> that case, I'd feed it to MySQL and see if it accepts it. Regexps are
>> sufficiently varied that you really need to use the same engine for
>> validation as for execution.
>
>... but bear in mind, there have been ways of doing denial-of-service
>attacks with valid-but-nasty regexps in the past, and I wouldn't want
>to rely on there not being any now.

The ones I've seen still require some input length (I'm thinking of the
exponential regexp backtracking stuff here). I suspect that if your test query
matches the RE against a fixed empty string it is hard to exploit, i.e. I
think most of this stuff is expensive not in compiling the regexp but in
executing it against text.
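
To illustrate the point, here is a rough sketch using Python's own re module
as a stand-in; MySQL's engine is a different implementation and may behave
differently, so treat this as an analogy rather than a statement about MySQL.
The pattern and the helper name time_match are just ones I picked for the
demonstration. It times a well-known backtracking-prone pattern against an
empty string and against progressively longer non-matching input.

```python
import re
import time

# A classic backtracking-prone pattern, chosen purely for illustration.
NASTY = r"(a+)+$"


def time_match(pattern, text):
    """Return the seconds spent matching `text`, excluding compile time."""
    compiled = re.compile(pattern)  # compiling is cheap, even for this pattern
    start = time.perf_counter()
    compiled.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Against the empty string there is nothing to backtrack over.
    print("empty string:", time_match(NASTY, ""))
    # Against a run of 'a' followed by a non-matching character, the
    # work grows rapidly with the length of the run.
    for n in (10, 15, 20):
        print(n, "chars:", time_match(NASTY, "a" * n + "b"))
```

With Python's engine the empty-string match returns almost immediately, while
the time for the longer inputs climbs steeply with each extra character.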

Happy to hear falsifications of my beliefs here.

Cheers,
Cameron Simpson <cs at cskk.id.au> (formerly cs at zip.com.au)


