[Tutor] simple regex question

Danny Yoo dyoo at hashcollision.org
Mon May 2 21:54:22 EDT 2016


On Sun, May 1, 2016 at 9:49 AM, bruce <badouglas at gmail.com> wrote:

> I've created a test regex. However, after spending time/google.. can't
> quite figure out how to then get the "complete" line containing the
> returned regex/pattern.
>
> Pretty sure this is simple, and i'm just missing something.


A few people have mentioned Beautiful Soup, and I agree: you should look
into using it instead of regular expressions alone.

The docs for Beautiful Soup are pretty good, and should help you on your way:

    https://www.crummy.com/software/BeautifulSoup/bs4/doc/

This is not to say that regular expressions are useless.  Far from it!
You can tell Beautiful Soup to search with regexes:

    https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-regular-expression
    https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all

For your case, you can probably say something like:

    soup.find_all(id=pattern)

where pattern is the regex from your original program, compiled with
re.compile().  (A plain string would be matched exactly, not as a regex.)
You can then get the results back as structured portions of the HTML
tree.  The point is that if you use a parser that understands HTML,
you can do table-row-oriented things without having to worry about the
actual string lines.
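
To make that concrete, here is a minimal sketch of the idea.  The HTML
snippet, the id values, and the regex are all made up for illustration,
since the original page and pattern aren't shown in this thread; only
BeautifulSoup, find_all, and get_text are real bs4 names.  It assumes
the bs4 package is installed.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page.
html = """
<table>
  <tr id="row_101"><td>Alice</td><td>Math</td></tr>
  <tr id="row_102"><td>Bob</td><td>Physics</td></tr>
  <tr id="header"><td>Name</td><td>Subject</td></tr>
</table>
"""

# Parse with the parser that ships with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

# Hypothetical pattern: ids that look like "row_<digits>".
pattern = re.compile(r"^row_\d+$")

# Each result is a Tag for a whole <tr> element, not a line of text.
rows = soup.find_all(id=pattern)

# The id attribute of every matching row, in document order.
matched_ids = [row["id"] for row in rows]

# The text of each cell, grouped per matching row.
cells = [[td.get_text() for td in row.find_all("td")] for row in rows]

# str(row) gives back the complete markup of a matching element.
for row in rows:
    print(str(row))
```

Because each result is the whole element, the "complete line" you were
after falls out for free: you get the full row, whether or not it sat
on a single line in the source.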


That's often a much better situation than trying to deal with a flat
string and trying to use regular expressions to parse tree structure.
You do not want to write code that contributes to the summoning of the
Nameless One.  (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454,
http://blog.codinghorror.com/parsing-html-the-cthulhu-way/)

