parsing "&A" in a string..

Tino Wildenhain tino at wildenhain.de
Mon Sep 1 00:45:56 EDT 2008


Tim Roberts wrote:
> "bruce" <bedouglas at earthlink.net> wrote:
>> it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...
> 
> Right, as it should.  "A&E" is not valid HTML, and beautifulsoup expects
> valid HTML.
> 
> This can be difficult to fix in the general case, because your page might
> already contain "&amp;".  If it is possible that some of them might be
> wrong while some are right, you can do something like:
> 
>     s = s.replace( '&amp;', '&' ).replace( '&', '&amp;' )

Yeah, but what about &auml; and friends then? As you said, it's not really
easy to fix.
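
A minimal sketch of one way around that, assuming it is acceptable to treat
anything that merely looks like a character reference (named, decimal or
hex, terminated by ";") as already correct. The names BARE_AMP and
escape_bare_ampersands are made up for this example; only the standard
re module is used. It escapes an "&" only when no reference follows it:

```python
import re

# An "&" that does NOT start something shaped like a character reference:
# named (&auml;), decimal (&#228;) or hexadecimal (&#xE4;).
# Assumption: the shape is checked, not whether the entity name really exists.
BARE_AMP = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)

def escape_bare_ampersands(s):
    """Replace only the ampersands that are not part of a reference."""
    return BARE_AMP.sub('&amp;', s)

def naive_fix(s):
    """The two-step replace from the quoted mail, for comparison."""
    return s.replace('&amp;', '&').replace('&', '&amp;')

sample = 'A&E, R&amp;D and &auml;'
print(naive_fix(sample))               # A&amp;E, R&amp;D and &amp;auml;
print(escape_bare_ampersands(sample))  # A&amp;E, R&amp;D and &auml;
```

This is still only a heuristic: text such as "&E;" passes as a reference
because just the shape is tested, so pages containing that kind of input
would need a lookup against the real entity names as well.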

Regards
Tino