Weird problem matching with REs

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun May 29 09:09:47 EDT 2011


On Sun, 29 May 2011 06:45:30 -0500, Andrew Berg wrote:

> I have an RE that should work (it even works in Kodos [1], but not in my
> code), but it keeps failing to match characters after a newline.

Not all regexes are the same. Different regex engines accept different 
symbols, and sometimes behave differently, or have different default 
behavior. That your regex works in Kodos but not in Python might mean you're 
writing a Kodos regex instead of a Python regex.
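What Kodos does by default isn't shown in your post, so I won't guess at it. In Python's `re` module, though, one default matters for your symptom. `.` matches any character *except* a newline unless you pass `re.DOTALL`. Separately, `re.MULTILINE` changes what `^` and `$` anchor to, but not what `.` matches. Here is a minimal sketch with made-up sample text, not your page:

```python
import re

# Hypothetical sample text with a line break in the middle.
text = "version\n1234"

# Default: '.' does NOT match '\n', so this search fails.
default_match = re.search(r"version.1234", text)

# re.DOTALL (alias re.S) lets '.' match '\n' too.
dotall_match = re.search(r"version.1234", text, re.DOTALL)

print(default_match)         # None
print(repr(dotall_match.group()))

# re.MULTILINE only changes '^' and '$' (they also match at line breaks);
# it has no effect on what '.' matches.
plain_lines = re.findall(r"^\d+$", "abc\n1234")
multi_lines = re.findall(r"^\d+$", "abc\n1234", re.MULTILINE)
print(plain_lines)           # []
print(multi_lines)           # ['1234']
```

If a pattern "fails to match characters after a newline", checking whether it relies on `.` crossing a line break is a cheap first test.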

> I'm writing a little program that scans the webpage of an arbitrary
> application and gets the newest version advertised on the page.

Firstly, most of the code you show is irrelevant to the problem. Please 
simplify it to the shortest, simplest example you can give. That would 
be a simplified piece of text (not the entire web page!), the regex, and 
the failed attempt to use it. The rest of your code is just noise for the 
purposes of solving this problem.

Secondly, you probably should use a proper HTML parser, rather than a 
regex. Resist the temptation to use regexes to rip out bits of text from 
HTML; it almost always goes wrong eventually.
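As a rough idea of what the parser route looks like, here is a sketch using only the standard library's `html.parser.HTMLParser`. The page markup is invented, since the real page isn't shown. The class name `ExeLinkCollector` and the "href ends in .exe" rule are hypothetical stand-ins for whatever actually identifies the download link:

```python
from html.parser import HTMLParser


class ExeLinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags ending in '.exe'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value)
        # pairs, and value is None for attributes given without a value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.endswith(".exe"):
                    self.links.append(value)


# Invented page fragment; note the line breaks don't matter to the parser.
page = """
<p>Downloads:
<a href="http://example.com/app/revision1234/app.exe">app</a>
<a href="http://example.com/changelog.txt">changelog</a>
</p>
"""

collector = ExeLinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['http://example.com/app/revision1234/app.exe']
```

Once you have the href as a plain string, pulling the revision number out of it with a small regex is far less fragile than matching across the raw HTML.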


> I was able to make a regex that matches in my code, but it shouldn't:
> http://x264.nl/x264/64bit/8bit_depth/revision.\n{1,3}[0-9]{4}.\n{1,3}/x264.\n{1,3}.\n{1,3}.exe

What makes you think it shouldn't match?

By the way, you probably should escape the dots, otherwise it will match 
strings containing any arbitrary character, rather than *just* dots:

http://x264Znl ...blah blah blah
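To make that concrete, here is a small sketch comparing an unescaped dot, a backslash-escaped dot, and `re.escape()`, which escapes regex metacharacters in a literal string for you. Only the host part of your URL is used, and the "bad" string is invented:

```python
import re

unescaped = r"http://x264.nl/"            # '.' here means "any character"
escaped = r"http://x264\.nl/"             # '\.' means a literal dot
literal = re.escape("http://x264.nl/")    # escapes the '.' automatically

good = "http://x264.nl/"
bad = "http://x264Znl/"                   # invented: 'Z' where the dot should be

print(bool(re.match(unescaped, bad)))     # True, the '.' happily matched 'Z'
print(bool(re.match(escaped, bad)))       # False
print(bool(re.match(escaped, good)))      # True
print(bool(re.match(literal, bad)))       # False
```

`re.escape()` is handy when most of the pattern is a fixed URL and only a small part, such as the revision number, needs real regex syntax.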



-- 
Steven
