rejecting newlines with re.match

MRAB google at mrabarnett.plus.com
Thu Nov 27 08:52:42 EST 2008


r0g wrote:
> Hi,
> 
> I want to use a regex to match a string "poo" but not "poo\n" or
> "poo"+chr(13) or "poo"+chr(10) or "poo"+chr(10)+chr(13)
> 
"\n" is the same as chr(10).

> According to http://docs.python.org/library/re.html
> 
> '.' (Dot.) In the default mode, this matches any character except a
> newline. If the DOTALL flag has been specified, this matches any
> character including a newline.
> 
> 
> So I tried
> a = re.compile(r'^.{1,50}$')
> print a.match("poo\n")
> <_sre.SRE_Match object at 0xb7767988>
> 
> :-(
> 
> The library says...
> 
> '$' Matches the end of the string or just before the newline at the end
> of the string, and in MULTILINE mode also matches before a newline. foo
> matches both ‘foo’ and ‘foobar’, while the regular expression foo$
> matches only ‘foo’. More interestingly, searching for foo.$ in
> 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
> searching for a single $ in 'foo\n' will find two (empty) matches: one
> just before the newline, and one at the end of the string.
> 
> 
> So that explains it but what am I to do then? I assume it isn't matching
> the newline itself as the returned string does not contain one but is
> there a switch that can stop $ matching 'just before the newline at the
> end of the string' or is there another character class I could use here?
> Any ideas greatly appreciated!
> 
There is also "\Z" which matches only at the end of the string:

 >>> a = re.compile(r'^.{1,50}\Z')
 >>> print a.match("poo\n")
None
 >>>
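For a side-by-side comparison, here is a minimal sketch in Python 3 syntax (the names `dollar` and `end_only` are only illustrative). It also shows a caveat: "." excludes only "\n", so a trailing chr(13) still gets through with either anchor.

```python
import re

# '$' may match just before a trailing newline; '\Z' matches only at the very end.
dollar = re.compile(r'^.{1,50}$')
end_only = re.compile(r'^.{1,50}\Z')

for text in ("poo", "poo\n", "poo\r"):
    # Print whether each pattern accepts the string.
    print(repr(text), bool(dollar.match(text)), bool(end_only.match(text)))

# 'poo'   True True
# 'poo\n' True False
# 'poo\r' True True   (the dot happily matches a carriage return)
```

If chr(13) must be rejected as well, a character class such as `[^\r\n]` in place of "." is one option.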

I don't know what your use case is, but do you actually need to use a
regex? Sometimes it is simpler and faster if you don't.
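As a rough sketch of the non-regex route, assuming the goal is "1 to 50 characters with no line-break characters" (my reading of the question, not something stated outright), plain string tests may be enough. The helper name `is_single_line` and its `max_len` parameter are hypothetical.

```python
def is_single_line(text, max_len=50):
    """Hypothetical helper: True if text has 1..max_len characters
    and contains neither chr(10) nor chr(13)."""
    return 1 <= len(text) <= max_len and "\n" not in text and "\r" not in text


print(is_single_line("poo"))       # True
print(is_single_line("poo\n"))     # False
print(is_single_line("poo\r"))     # False
print(is_single_line("poo\n\r"))   # False
```

Unlike the "." patterns, this rejects a trailing chr(13) too, which the original question asked for.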


