xml processing and sys.setdefaultencoding (more info)

christof hoeke csad7 at yahoo.com
Sun Jul 20 18:15:53 EDT 2003


hi,
first, thanks for the info. i still need to try the encoding declaration in
the python module.

some more details about the problem i had (regarding the posts by Alan and
Martin):

the original problem with the app was that the Pyana transformation
complained about the string "xml" when it was passed in as unicode. so i used
str(xml), but that gave the usual "ordinal not in range" error when the xslt
contained e.g. german umlauts. i had not tried that before...
setting the default encoding to utf-8 fixed that. the reason is not entirely
clear to me yet, though.
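
a minimal sketch of what probably happens here, and of an alternative to changing the default encoding: str() on a unicode string encodes it with the default encoding (ascii unless changed), which fails on umlauts, while an explicit .encode("utf-8") does not depend on that setting. the helper name to_utf8_bytes is made up for illustration and is not part of Pyana; whether Pyana accepts utf-8 byte strings in every call is an assumption here.

```python
# -*- coding: utf-8 -*-
# Sketch: encode explicitly instead of relying on the default encoding.
# `to_utf8_bytes` is a hypothetical helper, not part of Pyana.

def to_utf8_bytes(text):
    """Return UTF-8 bytes for a unicode string; pass byte strings through."""
    if isinstance(text, bytes):
        return text
    return text.encode("utf-8")

# A document containing the umlauts o-umlaut, a-umlaut, u-umlaut.
xml = u"<doc>\u00f6\u00e4\u00fc</doc>"

# This is roughly what str(xml) did implicitly under the ascii default:
try:
    xml.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    # "ordinal not in range(128)"
    ascii_failed = True

# The explicit encoding works regardless of the default encoding.
data = to_utf8_bytes(xml)
```

with something like this, the byte string handed to the transformation is always utf-8, and sys.setdefaultencoding does not need to be touched.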

- the xslt stylesheets i used should have been in utf-8, as i did not state
an encoding explicitly
- xslt with latin-1 (iso-8859-1) encoding should work too, though
- the xslt contains german umlauts öäü etc.
- i did extract parts of the xslt into python strings, yes
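
the points above could be handled by decoding each stylesheet according to its own xml declaration, falling back to utf-8 when none is given. this is only a rough sketch: the function names are invented, and it ignores byte order marks and utf-16, which a real xml parser would handle.

```python
import re

# Matches an encoding pseudo-attribute in a leading XML declaration.
_DECL = re.compile(
    br'''^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']'''
)

def declared_encoding(raw, default="utf-8"):
    """Return the encoding named in the XML declaration of `raw` (bytes).

    Falls back to `default` when there is no declaration or no encoding
    pseudo-attribute. Hypothetical helper; BOMs and UTF-16 are not handled.
    """
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii")
    return default

def decode_stylesheet(raw):
    """Decode stylesheet bytes to a unicode string using the declared encoding."""
    return raw.decode(declared_encoding(raw))

# A latin-1 stylesheet that says so, and a utf-8 one without a declaration.
latin1_xslt = u'<?xml version="1.0" encoding="iso-8859-1"?><a>\u00f6</a>'.encode("iso-8859-1")
utf8_xslt = u"<a>\u00f6</a>".encode("utf-8")

latin1_text = decode_stylesheet(latin1_xslt)
utf8_text = decode_stylesheet(utf8_xslt)
```

once the parts extracted from a stylesheet are unicode strings, it no longer matters which encoding the file on disk used.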

i read the other threads about unicode and also about PEP 0263. i have not
tried to set the encoding of the python file yet, but it sounds promising.
i am wondering though: if i set the python file encoding to e.g. utf-8 and
then use a stylesheet with, let's say, latin-1 encoding, i still have a
mismatch, don't i?
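
as far as PEP 0263 goes, the source encoding only tells python how to read the literals in the .py file itself; it says nothing about data read at runtime. a small sketch, under the assumption that every stylesheet is decoded with its own encoding before it is compared or combined with strings from the module (the sample strings are made up):

```python
# -*- coding: utf-8 -*-
# The declaration above only affects how the literals in THIS file are read.

TITLE = u"Übersicht"  # decoded by the compiler using the source encoding

# The same stylesheet fragment as it might sit on disk in two encodings.
as_latin1 = u"<h1>Übersicht</h1>".encode("iso-8859-1")
as_utf8 = u"<h1>Übersicht</h1>".encode("utf-8")

# The raw bytes differ between the two files...
bytes_differ = as_latin1 != as_utf8

# ...but decoding each with its own encoding gives identical unicode text.
text_a = as_latin1.decode("iso-8859-1")
text_b = as_utf8.decode("utf-8")
same_text = text_a == text_b

# A unicode literal from the module matches either one.
contains_title = TITLE in text_a and TITLE in text_b
```

so a mismatch should only show up when byte strings in different encodings are mixed without decoding them first.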

if you are interested in the code, download it from
http://cthedot.de/pyxsldoc/
it is my first "bigger" python project, so the code is not the best, i guess,
and the version which does not work is still online. i still need to upload
the version with the changed default encoding.

chris



christof hoeke wrote:
> hi,
> i wrote a small application which extracts a javadoc similar
> documentation for xslt stylesheets using python, xslt and pyana.
> using non-ascii characters was a problem. so i set the default encoding
> to UTF-8 and now everything works (at least it seems so, need to do
> more testing though).
>
> it may not be the most elegant solution (according to python in a
> nutshell) but it almost seems when doing xml processing it is
> mandatory to set the default encoding. xml processing should almost
> only work with unicode strings and this seems the easiest solution.
>
> any comments on this? better ways to work?
>
> thanks
> chris





