Python was designed (was Re: Multi-threading in Python vs Java)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Oct 26 00:46:12 EDT 2013


On Fri, 25 Oct 2013 19:05:09 +0100, Mark Lawrence wrote:

> On 25/10/2013 07:14, wxjmfauth at gmail.com wrote:
> 
>> Use one of the coding schemes endorsed by Unicode.
> 
> As I personally know nothing about Unicode, for the unenlightened such
> as myself, please explain this statement with respect to the FSR.

Please don't encourage JMF. You know he'll just continue with his 
ridiculous vendetta against Python 3.3's Unicode handling.


>> If a dev is not able to see a non ascii char may use 10 bytes more than
>> an ascii char
> 
> Are you saying that an ascii char takes a byte but a non ascii char
> takes up to 11?  

He's talking about the fact that strings in Python are objects, and hence 
carry a certain amount of overhead. Just to prove it's not specific to 
Python 3.3, or Unicode, here's an empty byte-string in 2.6:

py> import sys
py> sys.getsizeof('')
24

On the other hand, this overhead becomes trivial as the string gets 
bigger:

py> sys.getsizeof('x'*10**6)
1000024


Unicode is no different. Here is the hated 3.3 again:

py> sys.getsizeof('')  # Unicode, not byte-string
25
py> sys.getsizeof('ó'*10**6)
1000037


Again, a totally trivial amount of overhead. If you aren't willing to pay 
that overhead for the convenience of an OOP language like Python, you 
shouldn't be using an OOP language like Python.
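
To check the figures on your own build, here is a minimal sketch. It
assumes CPython 3.3 or later, where sys.getsizeof() works on strings and
the flexible string representation is in use. The helper name
fixed_overhead is made up for illustration. It separates the fixed
per-object cost from the per-character cost by measuring two strings of
different lengths. The exact overhead depends on the version and on
whether the build is 32-bit or 64-bit, so no expected output is shown.

```python
import sys

def fixed_overhead(ch, n=10**6):
    """Estimate (fixed overhead in bytes, bytes per character) for ch * n.

    Hypothetical helper, CPython only: it relies on sys.getsizeof()
    reporting the size of the string object itself.
    """
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    # The size difference between the two strings is pure character data.
    per_char = (large - small) / n
    # Whatever is left over after the character data is the fixed cost.
    overhead = small - per_char * n
    return overhead, per_char

for ch in ('x', '\xf3', '\u20ac', '\U0001F600'):
    overhead, per_char = fixed_overhead(ch)
    print("%-12s overhead ~%d bytes, %g bytes per char"
          % (ascii(ch), overhead, per_char))
```

The overhead stays at a few dozen bytes per string however long the
string is, which is why it vanishes into the noise for a million
characters.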



-- 
Steven


