Could you verify this, Oh Great Unicode Experts of the Python-List?
Joshua Landau
joshua at landau.ws
Sun Aug 11 02:17:42 EDT 2013
Basically, I think Twitter's broken.
For my full discussion on the matter, see:
http://www.reddit.com/r/learnpython/comments/1k2yrn/help_with_len_and_input_function_33/cbku5e8
Here's my first post, ineffectually edited for this list:
"""
<strikethrough>The obvious solution [to getting the length of a tweet]
is wrong. Like, slightly wrong¹.</strikethrough>
Given tweet = b"caf\x65\xCC\x81".decode():
>>> tweet
'café'
But:
>>> len(tweet)
5
So the solution is:
>>> import unicodedata
>>> len(unicodedata.normalize("NFC", tweet))
4
<strikethrough>Read twitter's commentary¹ for proof.</strikethrough>
<strikethrough>There are additional complications I'm trying to sort
out.</strikethrough>
________________________________
After further testing (I don't actually use Twitter), it seems the
whole thing was just smoke and mirrors. The linked article is a lie,
at least on the user's end.
On Linux, with xsel installed, you can prove this by running:
>>> import subprocess
>>> p = subprocess.Popen(['xsel', '-bi'], stdin=subprocess.PIPE)
>>> p.communicate(input=b"caf\x65\xCC\x81")
(None, None)
"café" will be in your clipboard, and you can paste it into the
tweet box. The tweet box counts it as 5 characters. So much for
testing ;).
________________________________
¹ https://dev.twitter.com/docs/counting-characters#Definition_of_a_Character
"""
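For anyone wanting to reproduce this, here is a minimal sketch of the two strings involved. The helper name `nfc_length` is hypothetical; it only assumes that "count code points after NFC normalization" is the rule the linked Twitter page describes, and it is not Twitter's own code. It also prints the UTF-8 bytes of each form: feeding `b"caf\xc3\xa9"` (the precomposed form) to the same xsel command would be the natural control case next to the decomposed bytes used above.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Hypothetical helper: count code points after NFC normalization.

    Assumption: this mirrors the counting rule in Twitter's
    "Counting characters" document; it is not Twitter's implementation.
    """
    return len(unicodedata.normalize("NFC", text))


# Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = b"caf\x65\xCC\x81".decode("utf-8")
# Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "caf\u00e9"

print(len(decomposed), nfc_length(decomposed))    # 5 4
print(len(precomposed), nfc_length(precomposed))  # 4 4

# Both strings render as "café" but encode to different bytes, so
# pasting one or the other is a meaningful test of a length counter.
print(decomposed.encode("utf-8"))   # b'cafe\xcc\x81'
print(precomposed.encode("utf-8"))  # b'caf\xc3\xa9'
```

If the tweet box really applied NFC before counting, both pastes would show 4; seeing 5 for the decomposed one is what suggests the counter is not normalizing.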
I know this isn't *really* Python-related, but there's Python involved
and you're the sort of people who'll be able to tell me what I've done
wrong, if anything.