[Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

Steven D'Aprano steve at pearwood.info
Thu Aug 10 09:01:18 EDT 2017


On Mon, Aug 07, 2017 at 10:04:21PM -0500, Zachary Ware wrote:

> Next, take a dive into the wonderful* world of Unicode:
> 
> https://nedbatchelder.com/text/unipain.html
> https://www.youtube.com/watch?v=7m5JA3XaZ4k

Another **Must Read** resource for Unicode is:

The Absolute Minimum Every Software Developer Absolutely Positively Must 
Know About Unicode (No Excuses!)

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

(By the way, it is nearly 14 years later, and PHP still believes that 
the world is ASCII.)


Python 3 makes Unicode about as easy as it can get. To include a Unicode
string in your source code, you just need to ensure your editor saves
the file as UTF-8 (the default source encoding Python 3 expects), and
then insert (by whatever input technology you have) the character you
want. You want a Greek pi?

pi = "π"

How about an Israeli sheqel?

money = "₪1000"

So long as your editor knows to save the file in UTF-8, it will Just 
Work.
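
To tie this back to the question in the subject line, here is a minimal
sketch, assuming Python 3 and a source file saved as UTF-8. The helper
name sizes() is made up for illustration and is not part of any library.
It compares what len() reports for the string itself (code points) with
the number of bytes the same string takes in each encoding. The explicit
little-endian codecs are used so that no byte order mark is added to the
counts.

```python
pi = "π"
money = "₪1000"

def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes).

    Hypothetical helper for illustration only. The "-le" codecs are
    chosen so that no byte order mark (BOM) is prepended.
    """
    return (
        len(s),                      # len() of a str counts code points
        len(s.encode("utf-8")),      # variable width: 1 to 4 bytes each
        len(s.encode("utf-16-le")),  # 2 bytes each here (4 outside the BMP)
        len(s.encode("utf-32-le")),  # always 4 bytes per code point
    )

print(sizes(pi))     # (1, 2, 2, 4)
print(sizes(money))  # (5, 7, 10, 20)
```

In other words, len() on a str does not depend on any encoding. The
encoding only matters once you call .encode() and count the resulting
bytes.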


-- 
Steve

