languages with full unicode support
David Hopwood
david.nospam.hopwood at blueyonder.co.uk
Wed Jun 28 07:03:05 EDT 2006
Tim Roberts wrote:
> "Xah Lee" <xah at xahlee.org> wrote:
>
>>Languages with Full Unicode Support
>>
>>As far as i know, Java and JavaScript are languages with full, complete
>>unicode support. That is, they allow names to be defined using unicode.
>>(the JavaScript engine used by FireFox support this)
>>
>>As far as i know, here's few other lang's status:
>>
>>C ? No.
>
> This is implementation-defined in C. A compiler is allowed to accept
> variable names with alphabetic Unicode characters outside of ASCII.

In C99, whether Unicode characters are accepted in identifiers is not
implementation-defined; only the way they may be written directly in the
source multibyte character set is.

Characters escaped as \uHHHH or \U00HHHHHH (H is a hex digit) that fall within
the ranges allowed in identifiers (C99 annex D) are required to be supported,
and should be mangled in some consistent way by a platform's linker. There are
Unicode text editors which encode/decode \u and \U on the fly, so you can treat
this essentially like a Unicode transformation format (it would have been nicer
to require support for UTF-8, but never mind).
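
As a minimal sketch of what this looks like in practice (the identifier names
here are made up for illustration, and it assumes a compiler that implements
universal character names in identifiers; some older compilers needed an
option to enable them):

```c
/* \u00E9 is U+00E9 (LATIN SMALL LETTER E WITH ACUTE), so this identifier
   is spelled "café". The name is hypothetical, chosen for illustration. */
static int caf\u00E9 = 2;

/* A universal character name may also start an identifier, provided it
   does not designate a digit. \u03C0 is U+03C0 (GREEK SMALL LETTER PI). */
static const double \u03C0 = 3.14159;

/* ASCII-named accessors, so callers need no extended characters. */
int total_cups(int extra) { return caf\u00E9 + extra; }

double circumference(double radius) { return 2.0 * \u03C0 * radius; }
```

Both identifiers have internal linkage here, so the question of how a linker
mangles extended characters does not arise; typing the same names directly in
UTF-8 instead of with \u escapes is the part that is implementation-defined.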

C99 6.4.2.1:
# 3 Each universal character name in an identifier shall designate a character
# whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
# annex D. 59) The initial character shall not be a universal character name
# designating a digit. An implementation may allow multibyte characters that
# are not part of the basic source character set to appear in identifiers;
# which characters and their correspondence to universal character names is
# implementation-defined.
#
# 59) On systems in which linkers cannot accept extended characters, an encoding
# of the universal character name may be used in forming valid external
# identifiers. For example, some otherwise unused character or sequence of
# characters may be used to encode the \u in a universal character name.
# Extended characters may produce a long external identifier.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>