The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

BartC bc at freeuk.com
Sat Mar 12 08:18:00 EST 2016


On 12/03/2016 12:13, Marko Rauhamaa wrote:
> BartC <bc at freeuk.com>:
>
>> If you're looking at fast processing of language source code (in a
>> thread partly about efficiency), then you cannot ignore the fact that
>> the vast majority of characters being processed are going to have
>> ASCII codes.
>
> I don't know why you would optimize for inputting program source code.
> Text in general has left ASCII behind a long time ago. Just go to
> Wikipedia and click on any of the other languages.
>
> Why, look at the *English* page on Hillary Clinton:
>
>     Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
>     October 26, 1947) is an American politician.
>     <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>
>
> You couldn't get past the first sentence in ASCII.

I saved that page locally as a .htm file in UTF-8 encoding, ran a
modified version of my benchmark over it, and found that 99.7% of the
bytes had ASCII codes. The other 0.3% were presumably parts of
multi-byte sequences, so the actual proportion of non-ASCII characters
would be even lower.
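
For anyone who wants to repeat the measurement, here is a minimal sketch
in Python. It is not the benchmark I actually ran; the function name
ascii_stats and the sample string are made up for illustration. It relies
only on the fact that in UTF-8 every byte below 0x80 is an ASCII
character, while each non-ASCII character takes two to four bytes, all of
them 0x80 or above.

```python
def ascii_stats(data: bytes) -> dict:
    """Byte-level and character-level ASCII proportions of UTF-8 data.

    Hypothetical helper for illustration, not the original benchmark.
    """
    total_bytes = len(data)
    # In UTF-8, a byte below 0x80 always encodes a single ASCII character.
    ascii_bytes = sum(1 for b in data if b < 0x80)

    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    total_chars = len(text)
    non_ascii_chars = sum(1 for ch in text if ord(ch) >= 0x80)

    return {
        "ascii_byte_fraction": ascii_bytes / total_bytes if total_bytes else 0.0,
        "non_ascii_char_fraction": (
            non_ascii_chars / total_chars if total_chars else 0.0
        ),
    }


if __name__ == "__main__":
    # Stand-in for a saved page; a real run would read the .htm file in
    # binary mode, e.g. open(path, "rb").read().
    sample = "<p>Hillary Diane Rodham Clinton /ˈhɪləri/</p>".encode("utf-8")
    stats = ascii_stats(sample)
    print("ASCII bytes:          {:.1%}".format(stats["ascii_byte_fraction"]))
    print("non-ASCII characters: {:.1%}".format(stats["non_ascii_char_fraction"]))
```

Because each non-ASCII character contributes at least two bytes, the
non-ASCII share counted in characters can never exceed the non-ASCII
share counted in bytes.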

I then saved the Arabic version of the page, which, when rendered,
looks like about 99% Arabic script. But the .htm file was still 80%
ASCII!

So what were you saying about ASCII being practically obsolete ... ?

-- 
Bartc


