Instagram: 40% Py3 to 99% Py3 in 10 months (Posting On Python-List Prohibited)

Chris Angelico rosuav at gmail.com
Thu Jun 22 09:57:43 EDT 2017


On Thu, Jun 22, 2017 at 11:33 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> and besides some Unicode code points are not
> characters at all).
>
> http://www.unicode.org/faq/private_use.html#noncharacters

AIUI, "noncharacters" are like the IEEE floating point value
"not-a-number". If you ask for the type of NaN in Python, it's "float",
which is a numeric type. (It's funnier in JavaScript, where 'typeof
NaN' is "number".) Noncharacters are completely well-defined in terms of
pretty much everything you would use a string for, the sole exception
being displaying them to a human (at which point a boatload of other
complexities kick in too, e.g. directionality (LTR/RTL), combining
characters, fonts lacking certain glyphs, text wrapping, etc). So a
character count should normally *include* any noncharacters in the
string.
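
A quick sketch of the analogy, assuming Python 3. U+FFFF is just one
noncharacter picked for illustration, and the variable names are made up
for the example:

```python
import math
import unicodedata

# NaN is "not a number", yet its type is still the numeric type float.
nan = float("nan")
nan_is_float = type(nan) is float
nan_is_nan = math.isnan(nan)

# U+FFFF is a Unicode noncharacter (illustrative choice).
s = "ab\uffffcd"

# len() counts code points, so the noncharacter is included.
length = len(s)

# The Unicode database reports it as "Cn" (not an assigned character).
category = unicodedata.category(s[2])

# Ordinary string operations still work, e.g. a UTF-8 round trip.
round_trip = s.encode("utf-8").decode("utf-8") == s
```

Here the string has length 5 even though its middle code point is a
noncharacter, and slicing, encoding and comparison all behave normally.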

But honestly, I don't know where a character count is the right choice
of measurement. If you're limiting the size of user input, you
probably want to count codepoints (so people don't just put five
billion combining characters onto a single base), and if you're going
to count combined characters, you often want to be measuring in glyphs
(or maybe pixels) so it actually corresponds to the displayed text.
Got any examples of where you want to count characters? And if so, do
those situations govern the definition of "character"?
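
To make the codepoint-vs-combined distinction concrete, here is a rough
sketch, assuming Python 3. Both function names are hypothetical, and
skipping code points with a nonzero canonical combining class is only an
approximation of "combined characters" (it is not full grapheme cluster
segmentation, and it says nothing about glyphs or pixels):

```python
import unicodedata


def count_codepoints(text):
    """Number of code points, which is what len() gives for a str."""
    return len(text)


def count_base_characters(text):
    """Approximate count of combined characters.

    Skips code points whose canonical combining class is nonzero.
    This is a simplification, not real grapheme segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# One base letter with 1000 combining acute accents stacked on it.
stacked = "e" + "\u0301" * 1000

codepoints = count_codepoints(stacked)
bases = count_base_characters(stacked)
```

With this input, a limit expressed in code points sees 1001 units, while
the combined-character count sees only 1, which is why the latter is a
poor choice for bounding the size of user input.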

ChrisA


