Indentation and optional delimiters

Tue Feb 26 18:22:16 EST 2008

Steven D'Aprano:
> Usability for beginners is a good thing, but not at the expense of
> teaching them the right way to do things. Insisting on explicit requests
> before copying data is a *good* thing. If it's a gotcha for newbies,
> that's just a sign that newbies don't know the Right Way from the Wrong
> Way yet. The solution is to teach them, not to compromise on the Wrong
> Way. I don't want to write code where the following is possible:
> ...
> ... suddenly my code hits an unexpected performance drop
> ... as gigabytes of data get duplicated

I understand your point of view, and I tend to agree.
But let me express another point of view. Computer languages are a
way to ask a machine to do some job. As time passes, computers become
faster, and people find that it becomes possible to create languages
that are higher level, that is, often more distant from how the CPU
actually performs the job, allowing the human to express the job in a
way closer to how less trained humans talk to each other and perform
jobs. Many years ago a language like Python was probably too costly
in terms of CPU, making it of little use for most non-toy purposes.
But there's a need for higher-level computer languages.
Today Ruby is a bit higher-level than Python (despite being rather
close). So my (mostly alternative) answers to your problem are:
1) The code goes slow if you try to perform that operation? It means
the JIT is "broken", and we have to find a smarter JIT (and the user
will look for a better language). A higher level language means that
the user is freer to ignore what's under the hood: the user just
cares that the machine will perform the job, regardless of how. The
user focuses the mind on what job to do; the low-level details of
how to do it are left to the machine. It's the job of the JIT writers
to let the user do such a job anyway. So the JIT must be even smarter;
for example, it partitions the 1 GB of data into blocks, each one of
them managed with copy-on-write, so maybe it copies just a few
megabytes of memory. Such a language may need to be that smart.
Despite that, I think that today a lot of people who have a 3 GHz CPU
may accept using a language 5 times slower than Python that, for
example, uses base-10 floating point numbers (they are different from
Python Decimal numbers). Almost every day on the Python newsgroup a
newbie asks if round() is broken after seeing this:
>>> round(1/3.0, 2)
0.33000000000000002
A higher level language (like Mathematica) must be designed to give
more numerically correct answers, even if it may require more CPU. But
such a language isn't just for newbies: if I write a 10-line program
that has to print 100 lines of numbers I want it to reduce my coding
time, sparing me from thinking about base-2 floating point numbers. If
the language uses higher-level numbers by default I can ignore that
problem, my coding becomes faster, and the bugs decrease. The same
happens with Python integers: they don't overflow, so I may ignore a
lot of details (like taking care of possible overflows) that I have to
think about when I use the C language. C is faster, but such speed
isn't necessary if I just need to print 100 lines of output with a 3
GHz PC. What I need in such a situation is a language that allows me
to ignore how numbers are represented by the CPU, and that prints the
correct numbers to the file. This is just a silly example, but it may
show my point of view (another example is below).
2) You don't process gigabytes of data with this language; it's
designed to solve smaller problems with smaller datasets. If you want
to solve very big problems you have to use a lower level language,
like Python, or C, or assembly. Computers allow us to solve bigger and
bigger problems, but today life is full of little problems too,
like processing a single 50-line text file.
3) You buy an even faster computer, where even copying 1 GB of data is
fast enough.
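
To make the number examples in point 1 concrete, here is a minimal
sketch. It assumes today's Python 3, while the session quoted above is
from an older Python whose repr() showed 17 significant digits, so the
formatting is done by hand. The standard decimal module is used only as
a stand-in for the base-10 numbers a higher-level language might use by
default (as said above, they are not the same thing), and wrap_int32 is
a hypothetical helper that simulates the usual two's-complement
wrap-around of a 32-bit C int.

```python
from decimal import Decimal

# Base-2 floats: 0.33 has no exact binary representation.
x = round(1 / 3.0, 2)
shown = "%.17g" % x            # 17 significant digits, as old reprs did
print(shown)

# A sum that looks obvious in base 10 fails with binary floats...
binary_ok = (0.1 + 0.2 == 0.3)
# ...while base-10 arithmetic (Decimal as a stand-in) keeps it exact.
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(binary_ok, decimal_ok)

def wrap_int32(n):
    """Hypothetical helper: typical two's-complement 32-bit wrap-around."""
    return (n + 2**31) % 2**32 - 2**31

big = 2**31 - 1                    # largest signed 32-bit value
py_sum = big + 1                   # Python ints just grow
c_like_sum = wrap_int32(big + 1)   # simulated overflow goes negative
print(py_sum, c_like_sum)
```

The user of the higher-level numbers never has to know that the first
and third cases exist, which is the whole point.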

Wolfram:
>Have a look at Tools/Scripts/pindent.py

Oh, that's it, almost. Thank you.
Bye,
bearophile

-----------------------

Appendix:

Another example, this is a little problem from this page:
http://www.faqs.org/docs/abs/HTML/writingscripts.html

>Find the sum of all five-digit numbers (in the range 10000 - 99999) containing exactly two  out of the following set of digits: { 4, 5, 6 }. These may repeat within the same number, and if so, they count once for each occurrence.<

I can solve it in 3.3 seconds on my old PC with Python like this:

print sum(n for n in xrange(10000, 100000) if len(set(str(n)) &
set("456")) == 2)

[Note: that's the second version of the code, the first version was
buggy because it contained:
... & set([4, 5, 6])

So I used the Python shell to see what set(str(12345)) & set([4, 5, 6])
was; the result was an empty set. So it's a type bug. A statically
typed language like D often can't catch such bugs anyway, because
chars are seen as numbers.]
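
The type bug in the note can be reproduced in a few lines. This is a
small sketch assuming Python 3 (the code in this post is Python 2, but
sets behave the same way here): str(n) yields one-character strings,
so intersecting them with a set of integers never matches anything.

```python
n = 12345
chars = set(str(n))              # one-character strings: '1' ... '5'

buggy = chars & set([4, 5, 6])   # ints never equal strings: empty set
fixed = chars & set("456")       # set("456") holds '4', '5', '6'

print(len(buggy), sorted(fixed))
```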

In Python I can write low-level-style code like this, which requires
only 0.4 seconds with Psyco (it's backported from the D version,
because writing D allowed me to think at a lower level. I was NOT able
to reach such a low level and high speed writing a program just for
Psyco):

def main():
    digits = [0] * 10
    tot = 0
    for n in xrange(10000, 100000):
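        i = n
        # Only slots 4, 5 and 6 are tested, so only they need a reset.
        digits[4] = 0
        digits[5] = 0
        digits[6] = 0
        # Mark each of the five decimal digits of n.
        digits[i % 10] = 1; i //= 10
        digits[i % 10] = 1; i //= 10
        digits[i % 10] = 1; i //= 10
        digits[i % 10] = 1; i //= 10
        digits[i % 10] = 1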
        if (digits[4] + digits[5] + digits[6]) == 2:
            tot += n
    print tot
import psyco; psyco.bind(main)
main()
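
A quick way to gain confidence in the low-level version is to check it
against the one-liner. The sketch below is a Python 3 port of both.
Assumptions: Psyco is left out because it is not available there, and
the names sum_high_level and sum_low_level are made up for this
example. It only checks that the two versions agree; it says nothing
about speed.

```python
def sum_high_level():
    # The one-liner: exactly two distinct digits among 4, 5, 6 occur in n.
    return sum(n for n in range(10000, 100000)
               if len(set(str(n)) & set("456")) == 2)

def sum_low_level():
    digits = [0] * 10
    tot = 0
    for n in range(10000, 100000):
        i = n
        digits[4] = digits[5] = digits[6] = 0
        for _ in range(5):          # five decimal digits per number
            digits[i % 10] = 1
            i //= 10
        if digits[4] + digits[5] + digits[6] == 2:
            tot += n
    return tot

high = sum_high_level()
low = sum_low_level()
print(high == low)
```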

Or I can solve it in 0.07 seconds in the D language (and in about 0.05
seconds with very similar C code compiled with -O3 -fomit-frame-pointer):

void main() {
    int tot, i;  // D default-initializes ints to 0
    int[10] digits;
    for (uint n = 10_000; n < 100_000; n++) {
        digits[4] = 0;
        digits[5] = 0;
        digits[6] = 0;
        i = n;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1;
        if ((digits[4] + digits[5] + digits[6]) == 2)
            tot += n;
    }
    printf("%d\n", tot);
}

Assembly may suggest somewhat lower-level ways to solve the same
problem (using an instruction that computes div and mod at the same
time, with the results going in EAX and EDX?), etc.
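
Python has a small echo of that idea: the built-in divmod() returns the
quotient and the remainder from a single call. Here is a sketch, for
Python 3, of the digit-marking step written that way. The helper name
marked_digits is hypothetical, and whether this is any faster than
separate % and // operations is not measured here.

```python
def marked_digits(n, ndigits=5):
    """Return a 10-slot list with 1 where that digit occurs in n."""
    digits = [0] * 10
    for _ in range(ndigits):
        n, d = divmod(n, 10)    # quotient and remainder from one call
        digits[d] = 1
    return digits

marks = marked_digits(45123)
print(marks[4] + marks[5] + marks[6])
```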

But if I just need to solve that "little" problem once, I may want to
reduce the sum of programming time + running time, so in such a
situation the first Python version wins (despite the quickly fixed
bug). That's why today people often use Python instead of C for small
problems. Similar things can be said about a possible language that is
a little higher level than Python.

Bye,
bearophile