languages with full unicode support

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Wed Jun 28 07:03:05 EDT 2006


Tim Roberts wrote:
> "Xah Lee" <xah at xahlee.org> wrote:
> 
>>Languages with Full Unicode Support
>>
>>As far as i know, Java and JavaScript are languages with full, complete
>>unicode support. That is, they allow names to be defined using unicode.
>>(the JavaScript engine used by FireFox support this)
>>
>>As far as i know, here's few other lang's status:
>>
>>C ? No.
> 
> This is implementation-defined in C.  A compiler is allowed to accept
> variable names with alphabetic Unicode characters outside of ASCII.

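It is not implementation-defined in C99 whether Unicode characters are
accepted in identifiers. The only implementation-defined part is which
characters may appear directly as multibyte characters in the source, and
how they correspond to universal character names.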

Characters written as universal character names, \uHHHH or \U00HHHHHH
(each H is a hex digit), that fall in the identifier ranges listed in
Annex D are required to be supported, and should be mangled in some
consistent way by a platform's linker. There are Unicode text editors that
encode and decode \u and \U escapes on the fly, so you can treat this
essentially as a Unicode transformation format (it would have been nicer
to require support for UTF-8, but never mind).
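A minimal sketch of the rule above, assuming a C99-or-later compiler that implements universal character names in identifiers (GCC since version 5 and Clang do this in their default C modes). The function name ucn_identifier_demo and the variable names are made up for illustration. The locals are spelled with \u escapes, so the source file itself stays pure ASCII:

```c
/* Sketch only: assumes a compiler that accepts UCNs in identifiers.
 * All names here are hypothetical.
 *
 * caf\u00E9 spells "cafe" with U+00E9 (LATIN SMALL LETTER E WITH ACUTE);
 * \u03C0 is U+03C0 (GREEK SMALL LETTER PI). Both code points fall in the
 * identifier ranges of C99 Annex D, so a conforming compiler must accept
 * them. */
int ucn_identifier_demo(void)
{
    int caf\u00E9 = 3;
    int \u03C0 = 4;   /* a UCN may start an identifier if it is not a digit */

    /* int \u0661x = 0;
     * would be invalid: U+0661 is ARABIC-INDIC DIGIT ONE, and an
     * identifier may not begin with a UCN designating a digit. */

    return caf\u00E9 * \u03C0;   /* 3 * 4 */
}
```

Called from any main, ucn_identifier_demo() returns 12. The function keeps an ASCII name so that only the locals use UCNs and no extended characters need to reach the linker.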


C99 6.4.2.1:

# 3 Each universal character name in an identifier shall designate a character
#   whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
#   annex D. 59) The initial character shall not be a universal character name
#   designating a digit. An implementation may allow multibyte characters that
#   are not part of the basic source character set to appear in identifiers;
#   which characters and their correspondence to universal character names is
#   implementation-defined.
#
# 59) On systems in which linkers cannot accept extended characters, an encoding
#     of the universal character name may be used in forming valid external
#     identifiers. For example, some otherwise unused character or sequence of
#     characters may be used to encode the \u in a universal character name.
#     Extended characters may produce a long external identifier.
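Footnote 59 leaves the external encoding up to the implementation. As a purely hypothetical illustration (not any real platform's scheme), the sketch below assumes '$' is otherwise unused in external names and writes each non-ASCII code point as $u or $U followed by its hex value, mirroring the UCN spelling. The function mangle_identifier and its signature are invented for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mangling scheme, for illustration only. Assumes '$' never
 * appears in source identifiers:
 *   - ASCII code points are copied unchanged;
 *   - code points up to U+FFFF become "$u" plus 4 hex digits;
 *   - larger code points become "$U" plus 8 hex digits. */

/* Mangles the n code points in cps into out (capacity cap bytes,
 * including the terminating NUL). Returns the mangled length, or -1
 * if out is too small. */
int mangle_identifier(const uint32_t *cps, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        char piece[20];                 /* room for "$U", hex digits, NUL */
        unsigned long cp = cps[i];

        if (cp < 0x80) {
            piece[0] = (char)cp;        /* plain ASCII passes through */
            piece[1] = '\0';
        } else if (cp <= 0xFFFF) {
            snprintf(piece, sizeof piece, "$u%04lX", cp);
        } else {
            snprintf(piece, sizeof piece, "$U%08lX", cp);
        }

        size_t k = strlen(piece);
        if (len + k + 1 > cap)          /* keep room for the final NUL */
            return -1;
        memcpy(out + len, piece, k);
        len += k;
    }

    out[len] = '\0';
    return (int)len;
}
```

With this scheme the code points of "café" mangle to caf$u00E9, nine characters for a four-character identifier, which is the kind of growth the footnote's last sentence warns about. Choosing a character that never occurs in ordinary names is what keeps mangled names from colliding with plain ASCII ones.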

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
