[XML-SIG] Re: Issues with Unicode type

Daniel Veillard veillard@redhat.com
Mon, 23 Sep 2002 17:26:26 -0400


On Mon, Sep 23, 2002 at 10:50:34PM +0200, Eric van der Vlist wrote:
> Except that it's not the only location where it's broken and that won't
> work with regular expressions. If I define a pattern such as ".{5}" I
> want to check that this is 5 unicode characters, not 5 words of 16
> bits...

  I don't know about Relax regexps, but for XML Schema I had to rewrite
an engine to cope with the beast's full regexp syntax.
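
  To make the ".{5}" concern concrete, here is a minimal sketch of the
difference between counting Unicode characters and counting 16-bit words.
It assumes a modern Python (3.4 or later), where strings are sequences of
code points; the helper name utf16_units and the sample string are only
illustrative. On the narrow 16-bit builds discussed in this thread,
len() would have reported 6 for the same string.

```python
import re

def utf16_units(text):
    """Count the 16-bit words needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

# Hypothetical sample: U+1D11E (musical G clef) lies outside the BMP,
# so UTF-16 needs a surrogate pair (two 16-bit words) to store it.
sample = "ab\U0001D11Ecd"

code_points = len(sample)         # characters as the pattern author sees them
units = utf16_units(sample)       # 16-bit words in a UTF-16 representation

# ".{5}" counts code points here, so it accepts the whole sample.
matches_five = re.fullmatch(r".{5}", sample) is not None

print(code_points, units, matches_five)
```

An engine that matches on 16-bit words would see six units in this sample
and reject it against ".{5}".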

> I am starting to think that compiling Python for 32 bits might be the
> safest way to solve this issue.

  You can't make that assumption. It's the safest option for you as the
developer, but it becomes a nightmare for users. If you develop a library,
I assume it's ultimately so that people use it; if they first need to
recompile Python and juggle multiple versions, it's a serious mess.

> Can you confirm that this is what RedHat does by default as mentioned by
> Uche and do you know the motivations (and possible downsides) for this
> decision?

  By default Red Hat compiles Python with Unicode support in UTF-16.
I'm not in charge of this; I assume it's the default compilation option.

IMHO it's wrong to assume that UTF-16 is a good compromise, because
you end up with a variable-length encoding anyway, and UCS-4 would seriously
bloat the app, I'm afraid.
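
As a rough illustration of that trade-off, the sketch below compares
encoded sizes. It assumes a modern Python 3 and uses the standard codecs
utf-8, utf-16-le and utf-32-le as stand-ins for the in-memory layouts; the
helper name encoded_sizes is hypothetical, and real interpreter memory use
will differ.

```python
def encoded_sizes(text):
    """Return the byte length of text in three Unicode encodings."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}

# Plain ASCII: the 32-bit form is twice the 16-bit form, four times UTF-8.
ascii_sizes = encoded_sizes("hello")

# One character outside the BMP: UTF-16 needs two 16-bit words for it,
# so it is variable length just like UTF-8.
clef_sizes = encoded_sizes("\U0001D11E")

print(ascii_sizes)
print(clef_sizes)
```

For "hello" the sizes are 5, 10 and 20 bytes; for the single character
U+1D11E all three encodings need 4 bytes.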

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/