[Python-Dev] Unicode 8.0 and 3.5

MRAB python at mrabarnett.plus.com
Fri Jun 19 02:55:07 CEST 2015


On 2015-06-19 00:56, Steven D'Aprano wrote:
> On Thu, Jun 18, 2015 at 08:34:14PM +0100, MRAB wrote:
>> On 2015-06-18 19:33, Larry Hastings wrote:
>> >On 06/18/2015 11:27 AM, Terry Reedy wrote:
>> >>Unicode 8.0 was just released.  Can we have unicodedata updated to
>> >>match in 3.5?
>> >>
>> >
>> >What does this entail?  Data changes, code changes, both?
>> >
>> It looks like just data changes.
>
> At the very least, there is a change to the casefolding algorithm.
> Cherokee was classified as unicameral but is now considered bicameral
> (two cases, like English). Unusually, case-folding Cherokee maps to
> uppercase rather than lowercase.
>
Doesn't the case-folding just depend on the data and the algorithm
remains the same?
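
A quick way to check this on a given interpreter is sketched below. It
assumes that str.casefold() is driven by the same Unicode data version
that unicodedata.unidata_version reports. The helper name
cherokee_casefold_report is made up for illustration. The code points
used are U+AB70 (CHEROKEE SMALL LETTER A, new in Unicode 8.0) and
U+13A0 (CHEROKEE LETTER A).

```python
import unicodedata


def cherokee_casefold_report():
    """Report how this interpreter case-folds Cherokee letter A.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    small_a = "\uab70"    # CHEROKEE SMALL LETTER A (added in Unicode 8.0)
    capital_a = "\u13a0"  # CHEROKEE LETTER A (predates Unicode 8.0)
    return {
        # Version of the Unicode database bundled with this Python.
        "unidata_version": unicodedata.unidata_version,
        # With 8.0+ data, the small letter should fold to the capital.
        "small_folds_to_capital": small_a.casefold() == capital_a,
        # The capital letter should fold to itself under either data set.
        "capital_unchanged": capital_a.casefold() == capital_a,
    }


if __name__ == "__main__":
    for key, value in cherokee_casefold_report().items():
        print(key, "=", value)
```

If the result differs between builds with pre-8.0 and 8.0 data while the
calling code stays the same, that would suggest only the data changed.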

> The full set of changes is listed here:
>
> http://unicode.org/versions/Unicode8.0.0/
>
> Apart from the addition of 7716 characters and changes to
> str.casefold(), I don't think any of the changes will make a big
> difference to Python's implementation. But it would be good to support
> Unicode 8 (to the degree that Python actually does support Unicode,
> rather than just the character set part of it).
>
>
>> There are additional codepoints and a renamed property (which the
>> standard library doesn't support anyway).
>
> Which one are you referring to, Indic_Matra_Category renamed to
> Indic_Positional_Category?
>
Yes.


