Translating unicode data

Mon Mar 23 19:30:31 EDT 2009

CaptainMcCrank wrote:
> Hi list,
> 
> I'm struggling with a problem analyzing large amounts of unicode data
> in an http wireshark capture.
> I've solved the problem with the interpreter, but I'm not sure how to
> do this in an automated fashion.
> 
> I'd like to grab a line from a text file & translate the unicode
> sections of it to ascii.  So, for example
> I'd like to take
> "\u003cb\u003eMar 17\u003c/b\u003e"
> 
> and turn it into
> 
> "<b>Mar 17</b>"
> 
> I can handle this from the interpreter as follows:
> 
>>>> import unicodedata
>>>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
>>>> print mystring
> <b>Mar 17</b>
> 
> But I don't know what I need to do to automate this!  The data that is
> in the quotes from line 2 will have to come from a variable.  I am
> unable to figure out how to do this using a variable rather than a
> literal string.
> 
> Please help!
> 

You really need to say what version of Python you are working with,
show the code you tried, and show the results you got.
Using Python 3.1, I get:
     >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
     True

--Scott David Daniels
Scott.Daniels at Acm.Org