The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Marko Rauhamaa marko at pacujo.net
Sat Mar 12 07:13:34 EST 2016


BartC <bc at freeuk.com>:

> If you're looking at fast processing of language source code (in a
> thread partly about efficiency), then you cannot ignore the fact that
> the vast majority of characters being processed are going to have
> ASCII codes.

I don't know why you would optimize for inputting program source code.
Text in general left ASCII behind a long time ago. Just go to
Wikipedia and click on any of the other languages.

Why, look at the *English* page on Hillary Clinton:

   Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
   October 26, 1947) is an American politician.
   <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>

You couldn't get past the first sentence in ASCII.
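
To make that concrete, here is a minimal Python 3 sketch that tries to
encode that opening sentence as ASCII. The helper name first_non_ascii
is made up for this illustration; it is not a standard-library function.

```python
# The opening sentence quoted above, as a Python 3 (Unicode) string.
sentence = (
    "Hillary Diane Rodham Clinton "
    "/ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ "
    "(born October 26, 1947) is an American politician."
)


def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None.

    Hypothetical helper, written only for this example.
    """
    for index, char in enumerate(text):
        if ord(char) > 127:
            return index, char
    return None


# str.encode("ascii") raises UnicodeEncodeError on any non-ASCII character.
try:
    sentence.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)                   # False
print(first_non_ascii(sentence))  # position and character of the first offender
```

The encoding attempt fails at the first character of the IPA
transcription, right after the opening slash.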

> Language syntax could anyway stipulate that certain tokens can only
> consist of characters within the ASCII range.

Many programming languages do stipulate that. Nowadays, the main reason
for the limitation is that all keyboards can produce ASCII and no
keyboard can produce all of Unicode.

Actually, when I was in college, not all keyboards could produce ASCII.
That's why the Pascal programming language offers digraphs:

   (* here is a comment *)

for:

   { here is a comment }

and:

   someArray(.7,3.)

for:

   someArray[7,3]

(The weird American symbols {}[]\|#$^~ were abandoned and replaced with
something more relevant on European keyboards. Even the Brits would have
£ instead of #.)

In fact, the current C standard (C11) supports trigraphs for the same reason:

   ??=   #
   ??/   \
   ??'   ^
   ??(   [
   ??)   ]
   ??!   |
   ??<   {
   ??>   }
   ??-   ~

   [...]

   To safely place two consecutive question marks within a string
   literal, the programmer can use string concatenation "...?""?..."

   <URL: https://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C>
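
To see why the two question marks are a hazard, here is a rough Python
sketch of the replacement a C preprocessor performs on the source text,
using the table above. The names TRIGRAPHS and replace_trigraphs are
invented for this example, and the function is only an approximation of
what a real compiler does: a single left-to-right pass over the text.

```python
# The nine trigraphs from the table above, mapped to their replacements.
TRIGRAPHS = {
    "??=": "#",
    "??/": "\\",
    "??'": "^",
    "??(": "[",
    "??)": "]",
    "??!": "|",
    "??<": "{",
    "??>": "}",
    "??-": "~",
}


def replace_trigraphs(text):
    """Replace every trigraph in text, scanning left to right.

    Hypothetical helper for illustration; note that it does not care
    whether the trigraph sits inside a string literal.
    """
    out = []
    i = 0
    while i < len(text):
        chunk = text[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# The trigraph is replaced even inside the string literal:
print(replace_trigraphs('puts("What??!");'))     # puts("What|");

# Splitting the literal keeps the question marks apart, so nothing changes:
print(replace_trigraphs('puts("What?" "?!");'))  # puts("What?" "?!");
```

The second call shows the string-concatenation workaround from the
quoted passage: once the literal is split, no three-character window
forms a trigraph.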

So be careful out there...


Marko


