Regex single quotes in scraper script?

Terry Reedy tjreedy at udel.edu
Sat Jul 17 00:27:09 EDT 2004


"Unknown" <unknown at unknown.invalid> wrote in message
news:10fh3u3plns1n3b at corp.supernews.com...
> Hi, I started using a python based screen scraper called newsscraper I
> downloaded from sourceforge.
> http://sourceforge.net/projects/newsscraper/.   I have created many python
> templates that work just fine from their examples however I ran into a road
> block with sites that use single quotes instead of double quotes for
> specifying url in their web pages.
>
> For example:  <a href='http://www.foo/'>
>
> instead of the usual
>                      <a href="http://www.foo/">
>
> Being a real newbie with this I think I found the area of code that parses
> the href.  It is in a file called parsefns.py
> the full excerpt is listed below but here is the regex line that I believe
> is not dealing with single quote.
>
> m = re.search(r'href\s*=\s*"?([^>" ]+)["> ]', text, re.I)
>
> I have tried many different variations but no luck and no luck getting hold
> of the author.  Any ideas?  Thx.

Did you try reversing all single and double quotes, i.e. r"...'...'...'..."?
If that doesn't work, you need someone else to answer.
A list of the variations not working might also help someone to answer.
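
Another option, rather than swapping one quote character for the other, is
to let the pattern accept either. What follows is only a minimal sketch: the
names HREF_RE and extract_href are hypothetical and not part of newsscraper's
parsefns.py, and it has not been tried against newsscraper itself. Only the
general shape of the re.search call comes from the quoted post.

```python
import re

# Hypothetical variant of the quoted pattern: the optional opening quote,
# the excluded characters, and the terminator all allow ' as well as ".
# A triple-quoted raw string lets both quote characters appear unescaped.
HREF_RE = re.compile(r"""href\s*=\s*["']?([^>"' ]+)["'> ]""", re.I)


def extract_href(text):
    """Return the first href value found in text, or None if there is none."""
    m = HREF_RE.search(text)
    return m.group(1) if m else None


print(extract_href("<a href='http://www.foo/'>"))
print(extract_href('<a href="http://www.foo/">'))
```

Note that the original pattern does match the single-quoted tag, but since '
is not in its excluded set, the captured group keeps the surrounding single
quotes as part of the URL.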

TJR





