error when parsing xml

Diez B. Roggisch deets at nospam.web.de
Mon Sep 5 09:06:44 EDT 2005


> I have found that some people refuse to stick to standards, so whenever I
> parse XML files I remove any characters that fall in the range 
> <= 0x1f
> 
> >= 0xf0

Now what help is that supposed to be? Getting rid of all accented
characters? Sorry, but that surely is the dumbest thing to do here - and
it has _nothing_ to do with standards! Character sets with code points
above 127 are pretty common and well standardized, they are just not
"ascii". I suggest you read up on the topic of unicode & encodings a
bit - and then fix some code of yours...
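
To illustrate the point, here is a minimal sketch in Python 3 using the
standard library's xml.etree.ElementTree. The sample document and the
variable names are made up for this example; the filter is only my reading
of the range described in the quoted message (drop bytes <= 0x1f or
>= 0xf0). It assumes the file declares its encoding as iso-8859-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a Latin-1 encoded document containing "Mueller"
# spelled with u-umlaut (U+00FC, which is the single byte 0xFC in Latin-1).
xml_bytes = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<name>M\u00fcller</name>'
).encode("iso-8859-1")

# Hand the raw bytes to the parser and let it decode them according to
# the encoding declared in the XML prolog.
parsed_text = ET.fromstring(xml_bytes).text

# The byte filter as described in the quoted message (an assumption about
# what the poster actually does): keep only bytes in 0x20..0xef.
stripped = bytes(b for b in xml_bytes if 0x1F < b < 0xF0)
damaged_text = ET.fromstring(stripped).text

print(repr(parsed_text))   # umlaut intact
print(repr(damaged_text))  # umlaut silently dropped
```

The document still parses after filtering, so nothing complains, but the
name has lost a letter. The same filter also throws away tabs and newlines,
which are perfectly legal in XML.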

Diez


