[Mailman-Developers] Topic regexps

Mark Sapiro msapiro at value.net
Fri May 26 05:29:33 CEST 2006


Barry Warsaw wrote:
>
>On Tue, 2006-05-23 at 08:00 -0700, Mark Sapiro wrote:
>
>> The regexp entry is compiled in re.VERBOSE mode. This can be good, but
>> also causes problems as the 'help' doesn't mention this. Further, the
>> description says 'Topic keywords, one per line, to match against each
>> message.' The implication is if I put
>>
>>  one
>>  two
>>  three
>>
>> in the regexp box, that this topic will match keywords 'one', 'two' or
>> 'three', but actually it matches only 'onetwothree'.
>>
>> I see several ways to address this.
>>
>> 1) change the processing of this field to effectively join the lines
>> with '|' - presumably this will break existing multiline entries, but
>> possibly they (at least ones which already contain '|') can be
>> converted.
>>
<snip>
>
>I have a hard time imagining that anyone would enter
>
>one
>two
>three
>
>and not expect it to match 'one|two|three', so I think I'd opt for 1.
>I'm not in favor of yet another configuration variable to control this.
>OTOH, I've never really received much feedback on the whole topics
>features (thus the dearth of responses to your question ;) so I don't
>really have a good sense of how people are using this, if they are at
>all.


I've thought about this some more. What I'm currently thinking is that
if the topic regexp is multiline, we leave it as is in topics, but
before compiling it for use, we split it into lines, rejoin them with
"|", and compile without VERBOSE mode.

I think this is the natural interpretation of the existing
description.
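
To make the idea concrete, here is a minimal sketch in Python. The
helper name compile_topic_pattern is hypothetical (it is not an existing
Mailman function), and stripping each line and dropping blank lines are
my own assumptions, made so that an empty alternative can't match
everything.

```python
import re


def compile_topic_pattern(pattern):
    """Compile a topic regexp, treating each line as an alternative.

    Hypothetical helper: the name and the blank-line handling are
    assumptions, not existing Mailman code.
    """
    # Split into lines and drop surrounding whitespace on each one.
    lines = [line.strip() for line in pattern.splitlines()]
    # Skip blank lines; an empty alternative would match any message.
    alternatives = [line for line in lines if line]
    # Join with '|' and compile WITHOUT re.VERBOSE.
    return re.compile('|'.join(alternatives))


entry = 'one\ntwo\nthree'

# Current behaviour: in VERBOSE mode the newlines are ignored, so the
# pattern is effectively 'onetwothree'.
verbose = re.compile(entry, re.VERBOSE)
assert verbose.search('onetwothree') is not None
assert verbose.search('two') is None

# Proposed behaviour: each line is a separate keyword.
joined = compile_topic_pattern(entry)
assert joined.search('two') is not None
```

Because '|' has the lowest precedence in a regexp, a line that already
contains '|' still works as a set of alternatives after the join.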

Then, in order to handle existing multiline topic regexps without
breaking them, add code to versions.py to test
stored_state.data_version and if it's less than the appropriate value,
go through topics and convert any multiline regexp to a single line by
deleting unescaped whitespace and comments.
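
A rough sketch of that conversion follows. Both function names are
hypothetical, and the layout of a topics entry as a
(name, pattern, description, emptyflag) tuple is an assumption on my
part for illustration. The data_version test in versions.py is not
shown. It also doesn't try to cover every corner of VERBOSE syntax
(e.g. a backslash followed by a newline is kept as is).

```python
def verbose_to_single_line(pattern):
    """Delete unescaped whitespace and comments from a VERBOSE regexp.

    Hypothetical helper, a sketch only. Whitespace and '#' inside a
    character class, or preceded by a backslash, are kept.
    """
    out = []
    i = 0
    n = len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == '\\' and i + 1 < n:
            # Escaped character: keep the backslash and what follows.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            # Inside [...] everything is literal until the closing ']'.
            if c == ']':
                in_class = False
            out.append(c)
        elif c == '[':
            in_class = True
            out.append(c)
            i += 1
            # A leading '^' and a ']' in first position are literal.
            if i < n and pattern[i] == '^':
                out.append('^')
                i += 1
            if i < n and pattern[i] == ']':
                out.append(']')
                i += 1
            continue
        elif c == '#':
            # Comment: skip to the end of the line.
            while i < n and pattern[i] != '\n':
                i += 1
            continue
        elif c.isspace():
            pass  # Unescaped whitespace is ignored in VERBOSE mode.
        else:
            out.append(c)
        i += 1
    return ''.join(out)


def convert_topics(topics):
    """Convert multiline patterns in an assumed list of topic tuples."""
    converted = []
    for name, pattern, description, emptyflag in topics:
        if '\n' in pattern.strip():
            pattern = verbose_to_single_line(pattern)
        converted.append((name, pattern, description, emptyflag))
    return converted
```

Single-line patterns are left alone here, since the conversion is only
meant to keep existing multiline entries from turning into a list of
alternatives.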


>I'm not sure the verbose interpretation of the text box is the most
>useful.  The other option is to use some special prefix character at the
>front of the regexp to indicate whether it should be verbose or not.  It
>would have to be something that is impossible in the first position, and
>it seems like | would be a good choice.  Thus if | were in the first
>position, you'd interpret that to mean each line should be joined with |
>but if not, then you interpret the entire regexp as a verbose pattern.


This seems to me to be more of a kludge than my idea of converting the
old multiline regexps and just dropping verbose mode altogether.

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


