Replacing utf-8 characters

Steve Holden steve at holdenweb.com
Wed Oct 5 15:08:39 EDT 2005


Unknown wrote:
> For example this is what I am trying to do that is not working.
> 
> The contents of link is the reuters web page, containing
> 
> "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml"
> 
> link = link.replace('&amp;','&')
> 
> But if I now view the contents of link, it shows the same as when it 
> was assigned.
> 
> 
> 
> 
> Richard Brodie wrote:
> 
>>"Mike" <no at spam> wrote in message news:1128522921.72009 at nntp.acecape.com...
>>
>>
>>
>>>However when I pull it into python the URL ends up looking like this
>>>(notice the &amp; instead of just & in the URL)
>>>
>>>Any ideas?
>>
>>
>>Some code would be helpful: the "&amp;" is in the page source to start
>>with (which is as it ought to be). What are you using to parse the HTML?
>>
>>
You must be doing *something* wrong:

  >>> link = "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml"
  >>> link = link.replace('&amp;','&')
  >>> link
'/news/newsArticle.aspx?type=businessNews&storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml'
  >>>
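
If the page may contain entities other than &amp;, a more general route is to decode all of them in one pass rather than replacing each by hand. The following is only a sketch, assuming a Python recent enough to provide html.unescape (3.4 or later, which postdates this thread); the helper name clean_link is hypothetical and not something from the original poster's code:

```python
import html


def clean_link(raw_link):
    """Return raw_link with HTML character references decoded.

    clean_link is a hypothetical helper name used for illustration.
    """
    # html.unescape decodes &amp;, &lt;, &#38; and other references in one pass.
    return html.unescape(raw_link)


link = ("/news/newsArticle.aspx?type=businessNews&amp;"
        "storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml")

# Strings are immutable, so the result has to be bound to a name;
# calling the function (or str.replace) without assigning changes nothing.
link = clean_link(link)
print(link)
```

As with str.replace, the original string is left untouched and a new one is returned, so forgetting the assignment is one possible way to end up with an unchanged link.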

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/



