[Expat-discuss] Trouble building with XML_UNICODE_WCHAR_T

Karl Waclawek karl@waclawek.net
Wed, 11 Sep 2002 22:25:58 -0400


> Ah, there's the problem. The Linux box defines a 32-bit wchar_t... which
> seems excessive to me, but then again I don't speak any Asian languages!
> I'd prefer to stay 16-bit for internal purposes, but then none of my
> wide string handling functions would be available. 

Have you looked into the compiler option -fshort-wchar?
Should that not change your wide string handling functions to 16-bit?
 
> I'll probably stick to Windows-based development for a while until I
> have more time to figure this out.

Expat differentiates between two types of strings:
1) XML_LChar: application strings like error messages, version string,
   feature descriptions
2) XML_Char: XML output

There are two UTF-16 compile options in Expat, which affect these strings
differently: 

- XML_UNICODE: defines XML_LChar as char and XML_Char as unsigned short,
  which means that even if the application itself works with 8-bit strings,
  it can still generate UTF-16 encoded XML output
- XML_UNICODE_WCHAR_T: defines both XML_LChar and XML_Char as wchar_t

The second one requires wchar_t to be 16 bits wide - check the -fshort-wchar option.
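
To make the difference between the two options concrete, here is a
simplified sketch of how the character types get selected. The DEMO_
macros, the Demo_ typedefs and the helper functions are made up for
this example and are not Expat's actual header; they only mirror the
mapping described above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for XML_Char / XML_LChar (not Expat's header). */
#if defined(DEMO_UNICODE_WCHAR_T)
typedef wchar_t Demo_Char;         /* parsed XML strings */
typedef wchar_t Demo_LChar;        /* error messages, version string */
#elif defined(DEMO_UNICODE)
typedef unsigned short Demo_Char;  /* UTF-16 code units */
typedef char Demo_LChar;           /* application strings stay 8-bit */
#else
typedef char Demo_Char;            /* default build: UTF-8 in char */
typedef char Demo_LChar;
#endif

/* Width in bits of the type used for parsed XML strings. */
int demo_char_bits(void)
{
    return (int)(sizeof(Demo_Char) * CHAR_BIT);
}

/* Returns 1 if wchar_t is 16 bits wide with the current compiler flags. */
int demo_wchar_is_16bit(void)
{
    return sizeof(wchar_t) * CHAR_BIT == 16;
}

/* In all three configurations the application string type is never
   wider than the XML string type. */
void demo_check_layout(void)
{
    assert(sizeof(Demo_LChar) <= sizeof(Demo_Char));
}
```

On a typical Linux/gcc setup, demo_wchar_is_16bit() should return 0
unless the file is compiled with -fshort-wchar.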

> Is 32-bit wchar_t compatibility on the roadmap for expat? Is it anything
> I'd be able to help with?

I am not sure if compiling without the -fshort-wchar option would work.
If not, we would certainly appreciate it if you had a closer look.
Expat relies on volunteer contributions!!!

However, the main problem is that even if it works, output
would still be encoded as UTF-16, for which 32-bit characters
are not appropriate. So, to really get 32-bit wchar_t going,
one would need to add UTF-32 as another output encoding.
You are, of course, most welcome to make such a contribution. :-)
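
To illustrate why 32-bit characters are a poor fit for UTF-16: UTF-16
stores each code point as one or two 16-bit units, whereas UTF-32
would store it as a single 32-bit unit. The sketch below is a generic
UTF-16 encoder written for this message, not code taken from Expat;
the name demo_utf16_encode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2),
   or 0 if cp is not a valid Unicode scalar value. */
size_t demo_utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp > 0x10FFFF)
        return 0;                      /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* reserved for surrogates */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* fits in a single unit */
        return 1;
    }

    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+0041 comes out as the single unit 0x0041, while
U+1D11E comes out as the pair 0xD834 0xDD1E. With a 32-bit wchar_t,
each of those units would occupy a full 32-bit slot.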

Karl