The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Sat Mar 12 08:40:11 EST 2016

BartC <bc at freeuk.com>:

> On 12/03/2016 12:13, Marko Rauhamaa wrote:
>> BartC <bc at freeuk.com>:
>>
>>> If you're looking at fast processing of language source code (in a
>>> thread partly about efficiency), then you cannot ignore the fact
>>> that the vast majority of characters being processed are going to
>>> have ASCII codes.
>>
>> I don't know why you would optimize for inputting program source
>> code. Text in general has left ASCII behind a long time ago. Just go
>> to Wikipedia and click on any of the other languages.
>>
>> Why, look at the *English* page on Hillary Clinton:
>>
>>     Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/
>>     (born October 26, 1947) is an American politician. <URL:
>>     https://en.wikipedia.org/wiki/Hillary_Clinton>
>>
>> You couldn't get past the first sentence in ASCII.
>
> I saved that page locally as a .htm file in UTF-8 encoding. I ran a
> modified version of my benchmark, and it appeared that 99.7% of the
> bytes had ASCII codes. The other 0.3% presumably were multi-byte
> sequences, so that the actual proportion of Unicode characters would
> be even less.
>
> I then saved the Arabic version of the page, which visually, when
> rendered, consists of 99% Arabic script. But the .htm file was still
> 80% ASCII!
>
> So what were you saying about ASCII being practically obsolete ... ?

Yes, HTML markup is all ASCII. However, as you say, the text content is
often anything but.
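
To make the bytes-versus-characters distinction concrete, here is a
minimal sketch of how such a measurement could be made. It is not the
benchmark quoted above; the function name ascii_ratios and the sample
string are made up for illustration, and the input is assumed to be
valid UTF-8.

```python
from typing import Tuple


def ascii_ratios(data: bytes) -> Tuple[float, float]:
    """Return (ASCII share of bytes, ASCII share of characters).

    Hypothetical helper: assumes `data` is non-empty, valid UTF-8.
    """
    if not data:
        raise ValueError("empty input")
    text = data.decode("utf-8")
    # In UTF-8 every ASCII character is exactly one byte below 0x80,
    # and no byte of a multi-byte sequence is below 0x80.
    ascii_bytes = sum(1 for b in data if b < 0x80)
    ascii_chars = sum(1 for ch in text if ord(ch) < 0x80)
    return ascii_bytes / len(data), ascii_chars / len(text)


# Tiny made-up sample: ASCII markup around a five-letter Arabic word.
sample = "<p>مرحبا</p>".encode("utf-8")
byte_share, char_share = ascii_ratios(sample)
print(f"ASCII bytes: {byte_share:.0%}, ASCII characters: {char_share:.0%}")
```

In this sample the markup contributes 7 ASCII bytes and the Arabic
letters take two bytes each, so the ASCII share is 7 of 17 bytes but
7 of 12 characters. Neither figure separates markup from the text a
reader actually sees.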

What I'm saying is that if you are designing a new programming language
and its associated ecosystem, you are well advised to take Unicode into
account from the start. Take advantage of hindsight; Python, Linux,
C, Java and Windows were not so lucky.

Marko