Unicode normalisation [was Re: [beginner] What's wrong?]

Rustom Mody rustompmody at gmail.com
Fri Apr 8 14:20:12 EDT 2016


Adding link

On Friday, April 8, 2016 at 11:48:07 PM UTC+5:30, Rustom Mody wrote:
<Quote>
> 5.12 Deprecation
> 
> In the Unicode Standard, the term deprecation is used somewhat differently than it is in some other standards. Deprecation is used to mean that a character or other feature is strongly discouraged from use. This should not, however, be taken as indicating that anything has been removed from the standard, nor that anything is planned for removal from the standard. Any such change is constrained by the Unicode Consortium Stability Policies [Stability].
> 
> For the Unicode Character Database, there are two important types of deprecation to be noted. First, an encoded character may be deprecated. Second, a character property may be deprecated.
> 
> When an encoded character is strongly discouraged from use, it is given the property value Deprecated=True. The Deprecated property is a binary property defined specifically to carry this information about Unicode characters. Very few characters are ever formally deprecated this way; it is not enough that a character be uncommon, obsolete, disliked, or not preferred. Only those few characters which have been determined by the UTC to have serious architectural defects or which have been determined to cause significant implementation problems are ever deprecated. Even in the most severe cases, such as the deprecated format control characters (U+206A..U+206F), an encoded character is never removed from the standard. Furthermore, although deprecated characters are strongly discouraged from use, and should be avoided in favor of other, more appropriate mechanisms, they may occur in data. Conformant implementations of Unicode processes such as Unicode normalization must handle even deprecated characters correctly.

</Quote>

Link: http://unicode.org/reports/tr44/#Deprecation
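
To make the last sentence of the quote concrete, here is a rough sketch of how Python's standard unicodedata module treats a few deprecated characters under normalization. The sample list is hand-copied and not exhaustive (the authoritative Deprecated=True list is in the UCD's PropList.txt), and the helper name normalized_code_points is made up for illustration. As far as I know, unicodedata does not expose the Deprecated property itself, so the code points are hard-coded.

```python
import unicodedata

# Assumption: a small hand-picked sample of Deprecated=True code points,
# not the full list from PropList.txt.
DEPRECATED_SAMPLE = [0x0149, 0x0340, 0x0341] + list(range(0x206A, 0x2070))


def normalized_code_points(ch, form):
    """Return the code points of ch after normalization, as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch)]


for cp in DEPRECATED_SAMPLE:
    ch = chr(cp)
    # Deprecated characters are still assigned: they have names and are
    # normalized like any other character.
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        "NFC:", normalized_code_points(ch, "NFC"),
        "NFKC:", normalized_code_points(ch, "NFKC"),
    )
```

The deprecated format controls U+206A..U+206F have no decomposition, so they pass through every normalization form unchanged, while U+0149 has a compatibility decomposition and U+0340 a canonical one. Nothing is dropped or rejected just because it is deprecated.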


