[New-bugs-announce] [issue30717] str.center() is not unicode aware

Guillaume Sanchez report at bugs.python.org
Tue Jun 20 15:15:22 EDT 2017


New submission from Guillaume Sanchez:

"a⃑".center(width=5, fillchar=".")
produces
'..a⃑.' instead of '..a⃑..'

The reason is that "a⃑" is composed of two code points (2 UCS4 chars): an 'a' and a combining "arrow above" code point. str.center() takes the length of the string and pads it on both sides with `fillchar` until the length reaches `width`. However, that length is surely intended to be the number of characters, not the number of code points.
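
To make the code point count concrete, here is a minimal sketch. It assumes U+20D7 (COMBINING RIGHT ARROW ABOVE) as a stand-in for the combining mark in the report; any nonspacing combining mark should behave the same way. The arguments are passed positionally because, as far as I know, CPython's str.center() does not accept keyword arguments.

```python
import unicodedata

# Assumption: U+20D7 stands in for the combining mark used in the report.
s = "a\u20d7"

# len() counts code points, so one visible character reports a length of 2.
print(len(s))

# The second code point is a nonspacing mark (general category "Mn").
print([unicodedata.category(ch) for ch in s])

# Padding is computed from len(s) == 2, so only 3 fill characters are added
# and the result is 5 code points long but renders as 4 visible characters.
padded = s.center(5, ".")
print(len(padded), padded)
```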

The correct way to count characters is to use the grapheme cluster segmentation algorithm from Unicode Standard Annex #29 (UAX #29, formerly TR29).
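
As a rough illustration of the idea, and not an implementation of UAX #29, the sketch below pads based on an approximate cluster count. The names approx_cluster_count and approx_center are hypothetical. The approximation only treats nonspacing (Mn) and enclosing (Me) marks as part of the preceding character; the real algorithm has many more rules (Hangul syllables, ZWJ emoji sequences, regional indicators, and so on).

```python
import unicodedata

def approx_cluster_count(text):
    """Approximate the number of user-perceived characters in text.

    Assumption: a code point in category Mn or Me attaches to the
    preceding character. This is NOT the full UAX #29 algorithm.
    """
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))

def approx_center(text, width, fillchar=" "):
    """Center text using the approximate cluster count instead of len()."""
    margin = width - approx_cluster_count(text)
    if margin <= 0:
        return text
    # When the padding is odd, the extra fill character goes on the left
    # only if width is odd too, which matches str.center() on plain ASCII.
    left = margin // 2 + (margin & width & 1)
    return fillchar * left + text + fillchar * (margin - left)

# Assumption: U+20D7 stands in for the combining mark used in the report.
print(approx_center("a\u20d7", 5, "."))
```

With this approximation the example from the report gets two fill characters on each side, and plain ASCII input is padded the same way as with str.center().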

I have already implemented this myself and could submit a PR if asked, given a little help with the C <-> Python glue.

Thanks for your time.

----------
components: Library (Lib)
messages: 296478
nosy: Guillaume Sanchez
priority: normal
severity: normal
status: open
title: str.center() is not unicode aware
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30717>
_______________________________________

