The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)
BartC
bc at freeuk.com
Sat Mar 12 08:18:00 EST 2016
On 12/03/2016 12:13, Marko Rauhamaa wrote:
> BartC <bc at freeuk.com>:
>
>> If you're looking at fast processing of language source code (in a
>> thread partly about efficiency), then you cannot ignore the fact that
>> the vast majority of characters being processed are going to have
>> ASCII codes.
>
> I don't know why you would optimize for inputting program source code.
> Text in general has left ASCII behind a long time ago. Just go to
> Wikipedia and click on any of the other languages.
>
> Why, look at the *English* page on Hillary Clinton:
>
> Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
> October 26, 1947) is an American politician.
> <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>
>
> You couldn't get past the first sentence in ASCII.
I saved that page locally as a .htm file in UTF-8 encoding. I ran a
modified version of my benchmark, and it appeared that 99.7% of the
bytes had ASCII codes. The other 0.3% presumably belonged to multi-byte
sequences, each encoding a single character in two to four bytes, so
the actual proportion of non-ASCII characters would be even lower.
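For anyone wanting to repeat the measurement, here is a minimal Python sketch. It is not the modified benchmark mentioned above, whose code isn't shown; the function name `ascii_fractions` and the command-line usage are hypothetical. It assumes the saved file is valid UTF-8 and reports both the share of ASCII bytes and the share of non-ASCII characters after decoding.

```python
import sys
from typing import Tuple


def ascii_fractions(data: bytes) -> Tuple[float, float]:
    """Return (fraction of bytes that are ASCII, fraction of characters that are non-ASCII).

    Assumes `data` is valid UTF-8; decoding raises UnicodeDecodeError otherwise.
    """
    if not data:
        return 1.0, 0.0
    # ASCII bytes are exactly those below 0x80; every byte of a UTF-8
    # multi-byte sequence is 0x80 or above.
    ascii_bytes = sum(1 for b in data if b < 0x80)
    # Decode so that each multi-byte sequence counts as one character.
    text = data.decode("utf-8")
    non_ascii_chars = sum(1 for ch in text if ord(ch) >= 0x80)
    return ascii_bytes / len(data), non_ascii_chars / len(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python ascii_count.py saved_page.htm
    with open(sys.argv[1], "rb") as f:
        byte_frac, char_frac = ascii_fractions(f.read())
    print(f"ASCII bytes: {byte_frac:.1%}; non-ASCII characters: {char_frac:.2%}")
```

Because each non-ASCII character occupies two to four bytes in UTF-8, the second figure comes out lower than the share of non-ASCII bytes whenever the file mixes ASCII and non-ASCII text.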
I then saved the Arabic version of the page, which, when rendered, is
about 99% Arabic script. But 80% of the bytes in the .htm file were
still ASCII!

So what were you saying about ASCII being practically obsolete...?
--
Bartc