I think for a quick hack, this is as good as a parser. A simple parser would miss some cases as well. Regular expressions are hardly extensible, though, so your criticism is valid. The real question is what George wants to do. A mixture would also be possible: grab all <a ...> tags with a regex, then extract the URL from each one with something like a parser.
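To make the mixed approach concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the names `A_TAG`, `_HrefGrabber` and `extract_links` are made up for this example, and I am assuming George wants the `href` values. The regex only finds the opening tags; the standard library's `html.parser` handles quoting and attribute case.

```python
import re
from html.parser import HTMLParser

# Step 1 (regex): find every opening <a ...> tag.
# \b keeps this from matching <abbr>, <area>, etc.
# Caveat: a literal ">" inside a quoted attribute value would cut the tag short.
A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)


class _HrefGrabber(HTMLParser):
    """Hypothetical helper: remembers the href of the first <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def extract_links(html):
    """Return the href values of all <a> tags in html, in document order."""
    links = []
    for tag in A_TAG.findall(html):
        # Step 2 (parser): let HTMLParser deal with quotes and attribute order.
        grabber = _HrefGrabber()
        grabber.feed(tag)
        grabber.close()
        if grabber.href:
            links.append(grabber.href)
    return links
```

Called on a string of HTML, `extract_links` returns a list of URLs and skips anchors that have no `href`. It is still a quick hack: if the requirements grow, feeding the whole document to a real parser is probably the cleaner route.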