[Baypiggies] quick question: regex to stop naughty control characters

Shannon -jj Behrens jjinux at gmail.com
Thu Apr 26 20:38:06 CEST 2007


On 4/25/07, Dennis Reinhardt <DennisR at dair.com> wrote:
> At 12:44 PM 4/25/2007, Shannon -jj Behrens wrote:
> >using FormEncode with a regex that looks like r".{1,128}$".
>
> First we have to re-write this.  This regex matches on any string having
> between 1 and 128 characters immediately preceding the end of string.  So,
> a string 10 characters long would match because a length 10 string has at
> least 1 character preceding the end.  A string 1000 characters long would
> match as well.

"re.match" implies "^", whereas "re.search" doesn't ;)
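
A minimal sketch of that difference, using the pattern from the thread.
The helper names are made up for illustration, and which of the two calls
FormEncode actually makes is not stated here, so check before relying on it.

```python
import re

# The pattern quoted in the thread, without a leading "^".
LENGTH_RE = re.compile(r".{1,128}$")


def accepts_with_match(text):
    """match() only tries position 0, so it behaves as if "^" were present."""
    return LENGTH_RE.match(text) is not None


def accepts_with_search(text):
    """search() may start anywhere, so a long string's tail can satisfy it."""
    return LENGTH_RE.search(text) is not None


if __name__ == "__main__":
    long_text = "x" * 1000
    print(accepts_with_match(long_text))
    print(accepts_with_search(long_text))
```

With match(), the 1000-character string is rejected; with search(), its
last 128 characters are enough for the pattern to succeed.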

> A regex of r"^.{1,128}$" would match only when there are between 1 and 128
> characters, no more and no less, between start and end.
>
> Now, we can define exceptions which prevent matching.  A regex of
> r"^[^\r\n]{1,128}$" will match the same size strings but fail if either an
> embedded CR or LF is included.  The extension to other control characters
> is easy.
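
One way that extension might look, as a sketch only. The character class
below covers the ASCII control range (0x00-0x1f plus 0x7f), which is an
assumption about what "other control characters" should include. It also
ends with "\Z" rather than "$", because in Python "$" still matches just
before a single trailing newline. The names are made up for illustration.

```python
import re

# The pattern as quoted: "$" lets one trailing "\n" slip through.
DOLLAR_RE = re.compile(r"^[^\r\n]{1,128}$")

# Assumed extension: reject all ASCII control characters, and use "\Z"
# so the match must end at the very end of the string.
STRICT_RE = re.compile(r"^[^\x00-\x1f\x7f]{1,128}\Z")


def is_clean(text):
    """True for 1 to 128 characters with no ASCII control characters."""
    return STRICT_RE.match(text) is not None


if __name__ == "__main__":
    print(DOLLAR_RE.match("abc\n") is not None)
    print(is_clean("abc\n"))
    print(is_clean("abc"))
```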

Yeah, but I don't have a perfectly authoritative set of control
characters.  However, I know that Unicode defines one.
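
A rough sketch of leaning on Unicode's own classification through the
standard unicodedata module. It assumes that "control character" means
general category "Cc" and that the input is already a text (Unicode)
string; other categories such as "Cf" (format characters) may also deserve
a look. The function names and the max_len parameter are made up for
illustration.

```python
import unicodedata


def has_control_chars(text):
    """True if any character has Unicode general category "Cc" (control)."""
    return any(unicodedata.category(ch) == "Cc" for ch in text)


def is_valid_field(text, max_len=128):
    """Accept 1..max_len characters, none of them control characters."""
    return 1 <= len(text) <= max_len and not has_control_chars(text)


if __name__ == "__main__":
    print(is_valid_field("hello"))
    print(is_valid_field("he\rllo"))
```

This does the length check in plain Python instead of in the regex, which
keeps the character test independent of any hand-written range.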

> Yeah, I know I used "^" twice to mean two different things (start of string
> and not, respectively).  That is just regex language, not a mistake on my part.

I know ;)

Thanks,
-jj

-- 
http://jjinux.blogspot.com/

