From gabriel.becedillas at gmail.com  Fri Feb  5 03:39:19 2010
From: gabriel.becedillas at gmail.com (Gabriel Becedillas)
Date: Thu, 4 Feb 2010 23:39:19 -0300
Subject: [Expat-discuss] Expat on 64 bit Linux
Message-ID: <2D9F6906-A811-4567-B0D6-1C0CA53B2C35@gmail.com>

Hi,
I'm trying to use expat (the latest version) on a 64 bit Linux (Intel).
Both XML_UNICODE_WCHAR_T and XML_UNICODE are defined, and
sizeof(wchar_t) == sizeof(XML_Char) == 4 bytes.
I understand that when my callbacks get called, each XML_Char should
hold one character (at least that is the way it is working on a 32 bit
Windows).
Instead of that, what I'm getting is 2 characters in each XML_Char
element, as if it were treating a 4 byte XML_Char array as a 2 byte
XML_Char array.
Can anyone help me with this?
Thanks in advance.

From karl at waclawek.net  Fri Feb  5 14:16:23 2010
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 05 Feb 2010 08:16:23 -0500
Subject: [Expat-discuss] Expat on 64 bit Linux
In-Reply-To: <2D9F6906-A811-4567-B0D6-1C0CA53B2C35@gmail.com>
References: <2D9F6906-A811-4567-B0D6-1C0CA53B2C35@gmail.com>
Message-ID: <4B6C1A27.9070906@waclawek.net>

On 04/02/2010 9:39 PM, Gabriel Becedillas wrote:
> Hi,
> I'm trying to use expat (the latest version) on a 64 bit Linux (Intel).
> Both XML_UNICODE_WCHAR_T and XML_UNICODE are defined, and
> sizeof(wchar_t) == sizeof(XML_Char) == 4 bytes.
> I understand that when my callbacks get called, each XML_Char should
> hold one character (at least that is the way it is working on a 32 bit
> Windows).
> Instead of that, what I'm getting is 2 characters in each XML_Char
> element, as if it were treating a 4 byte XML_Char array as a 2 byte
> XML_Char array.
> Can anyone help me with this?
> Thanks in advance.

Expat reports back to the application in either UTF-8 or UTF-16 encoding.
The options above make Expat generate UTF-16 output:
- XML_UNICODE: defines XML_LChar as char and XML_Char as unsigned short,
  which means that even if the application itself works with 8-bit
  strings, Expat can still generate UTF-16 encoded XML output
- XML_UNICODE_WCHAR_T: defines XML_LChar and XML_Char as wchar_t

The second one requires wchar_t to be 16 bits wide, so that UTF-16 can be
generated. Expat does not support UTF-32 output.
You should check the -fshort-wchar option for compiling.

Karl

From jeremy.kloth at gmail.com  Fri Feb  5 15:28:04 2010
From: jeremy.kloth at gmail.com (Jeremy Kloth)
Date: Fri, 5 Feb 2010 07:28:04 -0700
Subject: [Expat-discuss] Expat on 64 bit Linux
In-Reply-To: <4B6C1A27.9070906@waclawek.net>
References: <2D9F6906-A811-4567-B0D6-1C0CA53B2C35@gmail.com>
	<4B6C1A27.9070906@waclawek.net>
Message-ID: <201002050728.04544.jeremy.kloth@gmail.com>

On Friday 05 February 2010 06:16:23 am Karl Waclawek wrote:
> The second one requires wchar_t to be 16 bits wide, so that UTF-16 can be
> generated. Expat does not support UTF-32 output.

Is there any interest in having Expat able to produce UCS-4/UTF-32? 4Suite
and now Amara 2 use a patched Expat that can produce 32-bit Unicode values.
It has been in use for quite some time now (as in many years) without issue.
It was done to allow Expat output to be mapped directly to Python's unicode
objects (which can be either UCS-2 or UCS-4).

If desired, I can produce the patches required to add that support to the
Expat mainline.

-- 
Jeremy Kloth
http://4suite.org
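
For anyone who finds this thread with the same symptom, here is a small C
sketch of the effect described in the first message: 16-bit UTF-16 code
units read through a 4-byte element type show up two per element. It does
not use Expat itself. The helper names view_as_32bit and split_element are
made up for illustration, and the build-time guard at the top is only an
assumption about how an application might protect itself, not something
Expat provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical build-time guard: refuse to build an application with
   XML_UNICODE_WCHAR_T when wchar_t is wider than 16 bits. */
#if defined(XML_UNICODE_WCHAR_T) && (WCHAR_MAX > 0xFFFF)
#error "XML_UNICODE_WCHAR_T needs a 16-bit wchar_t (see -fshort-wchar)"
#endif

/* Hypothetical helper: copy the bytes of n_units UTF-16 code units into an
   array of 32-bit elements, which is how code with a 4-byte XML_Char would
   index the same memory. Returns the number of 32-bit elements written. */
size_t view_as_32bit(const uint16_t *units, size_t n_units,
                     uint32_t *out, size_t out_cap)
{
    size_t n_out = n_units / 2;  /* two 16-bit units fit in one element */
    if (n_out > out_cap)
        n_out = out_cap;
    memcpy(out, units, n_out * sizeof *out);
    return n_out;
}

/* Hypothetical helper: recover the two 16-bit code units packed into one
   32-bit element, in memory order (independent of byte order). */
void split_element(uint32_t element, uint16_t pair[2])
{
    memcpy(pair, &element, sizeof element);
}
```

Passing the four code units for "Hi!?" to view_as_32bit gives two 32-bit
elements, and split_element on the first one returns 'H' and 'i', i.e. two
characters in a single element. With a 16-bit wchar_t (for example GCC
with -fshort-wchar, as suggested above) the element size matches the
UTF-16 code unit size and the mismatch does not arise.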