exhaustive mapping from html entities to unicode ?

Steven Taschuk staschuk at telusplanet.net
Fri Mar 7 14:53:04 EST 2003


Quoth shagshag13:
  [...]
> for example : why could i write these @ dos prompt while idle send me a
> "UnicodeError: ASCII encoding error: ordinal not in range(128)" ?
> 
> dos = {'Œ' : 'O', 'œ' : 'o', 'Š' : 'S', 'š' : 's',
  [...]
> and if i write this as unicode string, i won't be able to print them @ dos
> prompt...

I'm not sure what you're trying to do.

What does "write these @ dos prompt" mean?  You can type them in
at your DOS prompt?  You can print them from a script invoked from
the command line?

What do you mean by "write this as unicode string"?  "This" is
presumably the dictionary you've shown; do you convert it somehow
to a Unicode string and try to print it?

If you show us code which is failing, we will be better able to
help you.

> also is there a way other that checking <?xml version="1.0"
> encoding="'iso-8859-1'"?> to know, from a dowloaded web page, its encoding ?

Lots of HTML is not XML and won't have an XML declaration.

The (pre-XML) HTML 4.01 spec says the encoding should be taken
from the charset parameter of the HTTP Content-Type header, or,
failing that, from a <meta http-equiv='Content-Type' ...> element
in the document itself.

See <http://www.w3.org/TR/html4/charset.html>.
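
A minimal sketch of that lookup order, written for present-day
Python 3 rather than the Python of this thread. The function names
(charset_from_content_type, detect_charset) are made up for
illustration, and scanning the first 4096 bytes with a regex is a
simplifying assumption, not a real HTML parser:

```python
import re
from email.message import Message

# Hypothetical helper: pull the charset parameter out of a
# Content-Type value such as "text/html; charset=ISO-8859-1".
def charset_from_content_type(value):
    msg = Message()
    msg["Content-Type"] = value
    # Returns the charset lowercased, or None if there is none.
    return msg.get_content_charset()

# Rough patterns for <meta http-equiv="Content-Type" content="...">.
META_RE = re.compile(
    rb"""<meta[^>]+http-equiv\s*=\s*["']?content-type["']?[^>]*>""",
    re.IGNORECASE,
)
CONTENT_RE = re.compile(
    rb"""content\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)

# Hypothetical helper: HTTP header first, <meta> element second.
def detect_charset(http_content_type, body):
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    # Assumption: the <meta> element appears near the top of the page.
    meta = META_RE.search(body[:4096])
    if meta:
        content = CONTENT_RE.search(meta.group(0))
        if content:
            value = content.group(1).decode("ascii", "replace")
            return charset_from_content_type(value)
    return None
```

Here http_content_type is whatever the server sent in its
Content-Type header (or None), and body is the raw page as bytes.
A None result means neither source named an encoding, so the
caller has to fall back on a default or a guess.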

-- 
Steven Taschuk                                     staschuk at telusplanet.net
Receive them ignorant; dispatch them confused.  (Weschler's Teaching Motto)




