[I18n-sig] UTF-8 decoder in CVS still buggy

François Pinard pinard@iro.umontreal.ca
02 Sep 2000 09:34:51 -0400


[mal@lemburg.com]

> Please keep us informed of any quirks you may experience during this
> conversion.  We can use some real life reports for the new Unicode
> support in Python to polish up the implementation and design.

Hi, people.  I just recently subscribed to i18n-sig, and have started to
read the archives.  I hope you will tolerate my jumping into some
conversations without having absorbed all the background.

On the above topic, I did not check exactly what Python does, but I wanted to
share that my `recode' program is not perfect in that area.  In particular,
for a UTF-8 sequence to be valid it must be minimal, which `recode' currently
does not check on input.  Roughly speaking, a UTF-8 sequence is not valid if
the same code point could have been expressed in fewer bytes.
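
To make the minimality rule concrete, here is a small sketch in present-day
Python 3.  It is only an illustration, not what `recode' or Python's own
decoder actually does; the name `decode_one' is made up for the example, and
other validity checks (surrogates, code points above U+10FFFF) are left out
on purpose.

```python
def decode_one(seq):
    """Decode a single UTF-8 sequence (bytes) into a code point.

    Hypothetical helper for illustration: raises ValueError when the
    sequence is malformed or not minimal (overlong).
    """
    if not seq:
        raise ValueError("empty sequence")
    first = seq[0]
    # For each length: payload bits of the lead byte, and the smallest
    # code point that genuinely needs that many bytes.
    if first < 0x80:
        length, value, minimum = 1, first, 0x00
    elif 0xC0 <= first < 0xE0:
        length, value, minimum = 2, first & 0x1F, 0x80
    elif 0xE0 <= first < 0xF0:
        length, value, minimum = 3, first & 0x0F, 0x800
    elif 0xF0 <= first < 0xF8:
        length, value, minimum = 4, first & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for byte in seq[1:]:
        # Continuation bytes must look like 10xxxxxx.
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    # The minimality check: the value must not fit in a shorter form.
    if value < minimum:
        raise ValueError("overlong (non-minimal) sequence")
    return value
```

With this sketch, the one-byte form b"\x2f" decodes to 0x2F ("/"), while
the two-byte form b"\xc0\xaf" carries the same payload bits and is rejected
as overlong.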

I've nothing against Python beating me at it! :-)

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard