Exhaustive mapping from HTML entities to Unicode?
Steven Taschuk
staschuk at telusplanet.net
Fri Mar 7 14:53:04 EST 2003
Quoth shagshag13:
[...]
> for example : why could i write these @ dos prompt while idle send me a
> "UnicodeError: ASCII encoding error: ordinal not in range(128)" ?
>
> dos = {'Œ' : 'O', 'œ' : 'o', 'Š' : 'S', 'š' : 's',
[...]
> and if i write this as unicode string, i won't be able to print them @ dos
> prompt...
I'm not sure what you're trying to do.
What does "write these @ dos prompt" mean? You can type them in
at your DOS prompt? You can print them from a script invoked from
the command line?
What do you mean by "write this as unicode string"? "This" is
presumably the dictionary you've shown; do you convert it somehow
to a Unicode string and try to print it?
If you show us code which is failing, we will be better able to
help you.
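The `dos` dict suggests one possible goal: replacing characters that the DOS console cannot display. The following minimal Python 3 sketch assumes that goal. The ASCII codec rejects every character above 127. Code page 850, assumed here to be the console's code page, has no Œ, œ, Š or š either. `DOS_FALLBACKS` and `to_console` are hypothetical names, not code from the thread.

```python
# Hypothetical fallback table modelled on the first entries of the quoted
# `dos` dict: characters the console cannot show are folded to ASCII.
DOS_FALLBACKS = {
    "\u0152": "O",  # LATIN CAPITAL LIGATURE OE
    "\u0153": "o",  # LATIN SMALL LIGATURE OE
    "\u0160": "S",  # LATIN CAPITAL LETTER S WITH CARON
    "\u0161": "s",  # LATIN SMALL LETTER S WITH CARON
}
FALLBACK_TABLE = str.maketrans(DOS_FALLBACKS)


def to_console(text, encoding="cp850"):
    """Fold known characters to ASCII, then encode for the console.

    `encoding` defaults to cp850 on the assumption that the DOS prompt
    uses the Western European OEM code page; anything still unencodable
    becomes '?' instead of raising.
    """
    folded = text.translate(FALLBACK_TABLE)
    return folded.encode(encoding, errors="replace")


# Encoding to plain ASCII fails on the first non-ASCII character; in
# Python 3 this raises UnicodeEncodeError, a subclass of UnicodeError.
try:
    "\u0160koda".encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)

print(to_console("\u0160koda"))  # b'Skoda'
```

With `errors="replace"`, the output is always printable, at the cost of fidelity. A table like `DOS_FALLBACKS` still has to be written by hand for the characters that matter.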
> also is there a way other than checking <?xml version="1.0"
> encoding="iso-8859-1"?> to know, from a downloaded web page, its encoding ?
Lots of HTML is not XML and won't have an XML declaration.
The HTML 4.01 spec (SGML-based, not XML) mandates the use of the charset
parameter of the HTTP Content-Type header for this purpose, or, failing
that, the <meta http-equiv='Content-Type' ...> element.
See <http://www.w3.org/TR/html4/charset.html>.
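The priority order described above can be sketched in Python 3 as follows. This is a minimal sketch, not a complete implementation of the HTML 4.01 rules. `detect_charset` and its helpers are hypothetical names. The header value and page bytes are assumed to have been fetched already, for example with urllib. The META scan assumes an ASCII-compatible encoding.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first <meta http-equiv="Content-Type"> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if self.charset is not None or tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_from_content_type(attributes.get("content"))


def detect_charset(content_type_header, body):
    """HTTP header first, then the META element, as HTML 4.01 orders them."""
    charset = charset_from_content_type(content_type_header)
    if charset:
        return charset
    finder = MetaCharsetFinder()
    # Latin-1 maps every byte to a character, so decoding cannot fail; the
    # scan assumes an ASCII-compatible encoding (UTF-16 pages are missed).
    finder.feed(body.decode("latin-1"))
    finder.close()
    return finder.charset
```

For XHTML pages, the XML declaration mentioned in the question could be checked as a further fallback. It is left out here.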
--
Steven Taschuk staschuk at telusplanet.net
Receive them ignorant; dispatch them confused. (Weschler's Teaching Motto)
More information about the Python-list mailing list