Indentation and optional delimiters

Tue Feb 26 20:43:09 EST 2008

On Feb 26, 5:22 pm, bearophileH... at lycos.com wrote:
> Steven D'Aprano:
>
> > Usability for beginners is a good thing, but not at the expense of
> > teaching them the right way to do things. Insisting on explicit requests
> > before copying data is a *good* thing. If it's a gotcha for newbies,
> > that's just a sign that newbies don't know the Right Way from the Wrong
> > Way yet. The solution is to teach them, not to compromise on the Wrong
> > Way. I don't want to write code where the following is possible:
> > ...
> > ... suddenly my code hits an unexpected performance drop
> > ... as gigabytes of data get duplicated
>
> I understand your point of view, and I tend to agree.
> But let me express my other point of view. Computer languages are a
> way to ask a machine to do some job. As time passes, computers become
> faster, and people find that it becomes possible to create languages
> that are higher level, that is, often more distant from how the CPU
> actually performs the job, allowing the human to express the job in a
> way closer to how less trained humans talk to each other and perform
> jobs. Probably many years ago a language like Python was too costly
> in terms of CPU, making it of little use for most non-toy purposes.
> But there's a need for higher level computer languages. Today Ruby is
> a bit higher-level than Python (despite being rather close). So my
> alternative answers to your problem are:
> 1) The code runs slowly if you try to perform that operation? Then
> the JIT is "broken", and we have to find a smarter JIT (and the user
> will look for a better language). A higher level language means that
> the user is freer to ignore what's under the hood: the user just
> cares that the machine performs the job, regardless of how, and
> focuses on what job to do, while the low-level details of how to do
> it are left to the machine. It's the job of the JIT writers to let
> the user work this way anyway. So the JIT must be even smarter: for
> example, it could partition the 1 GB of data into blocks, each one
> managed with copy-on-write, so that it may copy just a few megabytes
> of memory. Such a language needs to be smart enough. Even so, I
> think today a lot of people with a 3 GHz CPU may accept a language
> 5 times slower than Python that, for example, uses base-10 floating
> point numbers (they are different from Python Decimal numbers).
> Almost every day on the Python newsgroup a newbie asks if round() is
> broken after seeing this:
>
> >>> round(1/3.0, 2)
> 0.33000000000000002
>
> A higher level language (like Mathematica) must be designed to give
> more numerically correct answers, even if it may require more CPU.
> But such a language isn't just for newbies: if I write a 10-line
> program that has to print 100 lines of numbers, I want it to reduce
> my coding time, sparing me from thinking about base-2 floating point
> numbers. If the language uses higher-level numbers by default, I can
> ignore that problem, my coding becomes faster, and the bugs decrease.
> The same happens with Python integers: they don't overflow, so I can
> ignore a lot of details (like taking care of possible overflows) that
> I have to think about when I use the C language. C is faster, but
> such speed isn't necessary if I just need to print 100 lines of
> output on a 3 GHz PC. What I need in such a situation is a language
> that lets me ignore how numbers are represented by the CPU, and that
> prints the correct numbers to the file. This is just a silly example,
> but it may show my point of view (another example is below).
> 2) You don't process gigabytes of data with this language; it's
> designed to solve smaller problems with smaller datasets. If you want
> to solve very big problems you have to use a lower level language,
> like Python, or C, or assembly. Computers allow us to solve bigger and
> bigger problems, but today life is full of little problems too, like
> processing a single 50-line text file.
> 3) You buy an even faster computer, where even copying 1 GB of data is
> fast enough.
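Both numeric points in item 1 can be poked at with today's standard library. A minimal sketch, assuming Python 3: since Python 2.7/3.1 the float repr is the shortest string that round-trips, so the newbie's example now prints 0.33, although the stored double is still not exactly 0.33. `decimal.Decimal` is the standard base-10 type (not the faster base-10 float imagined in the post), and `c_int32_add` is a hypothetical helper that mimics the wraparound a 32-bit C `int` typically shows on two's-complement hardware (strictly, signed overflow is undefined behaviour in C).

```python
from decimal import Decimal

# Binary float: modern Python prints the shortest repr, but the stored
# value is still the nearest base-2 double, not exactly 0.33.
x = round(1 / 3, 2)
print(repr(x))                        # 0.33
print(Decimal(x) == Decimal("0.33"))  # False: Decimal(x) exposes the exact binary value

# Base-10 arithmetic: quantize 1/3 to two decimal places.
third = Decimal(1) / Decimal(3)
print(third.quantize(Decimal("0.01")))  # 0.33


def c_int32_add(a: int, b: int) -> int:
    """Hypothetical helper: add as a wrapping 32-bit two's-complement int."""
    r = (a + b) & 0xFFFFFFFF
    return r - (1 << 32) if r >= (1 << 31) else r


INT_MAX = 2**31 - 1
print(INT_MAX + 1)              # 2147483648: Python ints simply grow
print(c_int32_add(INT_MAX, 1))  # -2147483648: the wrapped C-style result
```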
>
> Wolfram:
>
> >Have a look at Tools/Scripts/pindent.py
>
> Oh, that's it, almost. Thank you.
> Bye,
> bearophile
>
> -----------------------
>
> Appendix:
>
> Another example: this is a little problem from this page: http://www.faqs.org/docs/abs/HTML/writingscripts.html
>
> >Find the sum of all five-digit numbers (in the range 10000 - 99999) containing exactly two out of the following set of digits: { 4, 5, 6 }. These may repeat within the same number, and if so, they count once for each occurrence.<
>
> I can solve it in 3.3 seconds on my old PC with Python like this:
>
> print sum(n for n in xrange(10000, 100000)
>           if len(set(str(n)) & set("456")) == 2)
>
> [Note: that's the second version of the code; the first version was
> buggy because it contained:
> ... & set([4, 5, 6])
>
> So I used the Python shell to see what set(str(12345)) & set([4, 5, 6])
> was, and the result was an empty set: the digits of str(12345) are
> characters, not integers. So it's a type bug. A statically typed
> language like D often can't catch such bugs anyway, because chars are
> seen as numbers.]
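The type bug in the note is easy to reproduce in Python 3: the characters produced by `str(n)` never compare equal to the integers 4, 5, 6. A small sketch with hypothetical function names contrasting the buggy and fixed predicates. It also shows a second possible reading of the puzzle, which is an assumption here: the set version counts *distinct* digits from {4, 5, 6}, while "they count once for each occurrence" could be read as counting occurrences, in which case e.g. 44123 qualifies.

```python
def hits_buggy(n: int) -> int:
    # Compares str digits with ints: '4' != 4, so the intersection is always empty.
    return len(set(str(n)) & set([4, 5, 6]))


def hits_distinct(n: int) -> int:
    # The fixed version: compare characters with characters.
    return len(set(str(n)) & set("456"))


def hits_occurrences(n: int) -> int:
    # Alternative reading (assumed): count every occurrence of 4, 5 or 6.
    return sum(ch in "456" for ch in str(n))


print(set(str(12345)) & set([4, 5, 6]))               # set()
print(sorted(set(str(12345)) & set("456")))           # ['4', '5']
print(hits_distinct(44123), hits_occurrences(44123))  # 1 2

total_distinct = sum(n for n in range(10_000, 100_000) if hits_distinct(n) == 2)
total_occurrences = sum(n for n in range(10_000, 100_000) if hits_occurrences(n) == 2)
print(total_distinct, total_occurrences)
```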
>
> In Python I can write low-level-style code like this, which takes
> only 0.4 seconds with Psyco (it's backported from the D version,
> because D allowed me to think at a lower level; I was NOT able to
> reach such a low level and such high speed writing a program just for Psyco):
>
> def main():
>     digits = [0] * 10
>     tot = 0
>     for n in xrange(10000, 100000):
>         i = n
>         digits[4] = 0
>         digits[5] = 0
>         digits[6] = 0
>         digits[i % 10] = 1; i /= 10
>         digits[i % 10] = 1; i /= 10
>         digits[i % 10] = 1; i /= 10
>         digits[i % 10] = 1; i /= 10
>         digits[i % 10] = 1
>         if (digits[4] + digits[5] + digits[6]) == 2:
>             tot += n
>     print tot
> import psyco; psyco.bind(main)
> main()
>
> Or I can solve it in 0.07 seconds in the D language (and about 0.05
> seconds in very similar C code with -O3 -fomit-frame-pointer):
>
> void main() {
>     int tot, d, i;
>     int[10] digits;
>     for (uint n = 10_000; n < 100_000; n++) {
>         digits[4] = 0;
>         digits[5] = 0;
>         digits[6] = 0;
>         i = n;
>         digits[i % 10] = 1; i /= 10;
>         digits[i % 10] = 1; i /= 10;
>         digits[i % 10] = 1; i /= 10;
>         digits[i % 10] = 1; i /= 10;
>         digits[i % 10] = 1;
>         if ((digits[4] + digits[5] + digits[6]) == 2)
>             tot += n;
>     }
>     printf("%d\n", tot);
>
> }
>
> Assembly may suggest slightly lower-level ways to solve the same
> problem (e.g. the x86 DIV instruction, which computes div and mod at
> the same time, leaving the quotient in EAX and the remainder in EDX), etc.
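The div/mod remark has a high-level counterpart: Python's `divmod()` returns quotient and remainder in one call (whether an interpreter maps that to a single DIV is an implementation detail not claimed here). Since Psyco only ever supported Python 2, here is a hedged sketch of the flag-array loop above ported to Python 3, using `divmod()` where Python 2 used integer `/` and `%`. The function name and its `lo`/`hi` parameters are hypothetical, and no timings are claimed.

```python
def digit_flag_sum(lo: int = 10_000, hi: int = 100_000) -> int:
    """Sum n in [lo, hi) whose digits include exactly two distinct values from {4, 5, 6}.

    Same technique as the Psyco version: a 10-slot flag array, with only
    the three slots that are tested reset on each iteration.
    """
    digits = [0] * 10
    total = 0
    for n in range(lo, hi):
        digits[4] = digits[5] = digits[6] = 0
        i = n
        while i:
            i, d = divmod(i, 10)  # quotient and remainder in one call
            digits[d] = 1         # mark this decimal digit as present
        if digits[4] + digits[5] + digits[6] == 2:
            total += n
    return total


if __name__ == "__main__":
    print(digit_flag_sum())
```

The `while i:` loop replaces the five unrolled steps, so the same function also works for ranges whose numbers do not all have five digits.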
>
> But if I just need to solve that "little" problem once, I may want to
> reduce the sum of programming time + running time, so in such a
> situation the first Python version wins (despite the quickly fixed
> bug). That's why today people often use Python instead of C for small
> problems. Similar things can be said about a possible language that is
> a little higher level than Python.
>
> Bye,
> bearophile

You're looking at a few variables:
1) Time to code, as a function of (person / personal characteristics,
program)
2) Time to run, as a function of (machine, program)
3) Bugs (distinct bugs), as a function of (person / personal
characteristics, program)
3a) Bug obviousness upon running (person, program): the program
screwed up, but the person can't tell until later (for a program with
exactly one bug; otherwise a function of (person, program, bug))
3b) Bug time to fix (person, program [, bug])
3c) Bug incidence: count of bugs the first time through (person,
program)

(3) assumes you have experts and you're measuring number of bugs &c.
compared to a bug-free ideal in a lab.  If no one knows if a program
(say, if it's large) has bugs, high values for (3a) might be
important.
(1)-(3) treat different solutions to the same problem as different
programs, i.e. the program states its precise implementation; but then
the only thing that can vary from data point to data point is variable
names, i.e. how precise the statement is, and only to a degree: you
might get bugs in a memory manager even if you reorder certain
("ideally reorderable") sequences of statements, and you might get
variations if you use parallel arrays vs. structures vs. parallel
variable names. Otherwise, you can specify an objective, deterministic,
not necessarily binary metric of similarity and identity of programs.
Otherwise yet, a program maps input to output (plus residue, the
difference in machine state from start to completion), so use
descriptive statistics (mean, variance, quartiles, outliers, extrema)
on the answers. E.g., for (2), the fastest C program sampled (that
maps I->O) way surpasses the fastest Perl program sampled, and it was
written by Steve C. Guru, and we couldn't find Steve Perl Guru; and
besides, the means across programs in C and Perl show no statistically
significant difference at the 96% confidence level. And besides, there
is no algorithm to generate even the fastest-running program (for a
problem/spec) for a machine in a language, much less one that
optimizes (1) and (3)! So you're looking at (coder with coder trait or
traitless, program problem, program solution, language implementation,
operating system, hardware, initial state) for variables in your
answers. That's one of the obstructions, anyway, to rigorous metrics
of languages: you never run the language. (Steve Traitless Coder is
very interesting: given nothing but the problem, the install and
machine, and the internet (throw documentation traits and internet
connection speed in!), how good is a simple random sample? Or Steve
Self-Proclaimed Non-Zero Experience and Familiarity Perl Coder, or
Steve Self-Proclaimed Non-Trivial Experience and Familiarity Perl
Coder.)
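The "descriptive statistics on the answers" idea can be sketched with Python's standard `statistics` and `timeit` modules. The helper name `summarize` and the 1.5 x IQR (Tukey) outlier rule are assumptions chosen for illustration, not anything prescribed above; the usage times the set-based one-liner from earlier in the thread, in Python 3 spelling, a few times.

```python
import statistics
import timeit


def summarize(samples):
    """Descriptive statistics for repeated measurements (e.g. run times).

    Needs at least two samples (quantiles and sample variance require it).
    """
    q1, q2, q3 = statistics.quantiles(samples, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr       # Tukey's fences (assumed rule)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "variance": statistics.variance(samples),    # sample variance
        "quartiles": [q1, q2, q3],
        "outliers": [x for x in samples if x < low or x > high],
    }


# Usage: time the set-based one-liner several times and summarize the runs.
stmt = 'sum(n for n in range(10000, 100000) if len(set(str(n)) & set("456")) == 2)'
times = timeit.repeat(stmt, number=1, repeat=5)
print(summarize(times))
```

Repeating runs on one machine only captures variable (2) for one program; comparing languages would need samples across programs and coders, as the paragraph above argues.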

And don't forget a bug identity metric too: if two sprout up while
fixing one, is that one, two, or three? Do the answers to (1) and (2)
vary with the count of bugs remaining? If a "program" maps input to
output, then Python has never been written.

That doesn't stop you from saying what you want, though, i.e. what your
priorities are:
1) Time to code.  Important.
2) Time to run.  Unimportant.
3a) Bug obviousness.  Important.
3b) Bug time to fix.  Important.
3c) Bug incidence.  Less important.

Ranked.
1) Time to code.
2) Bug obviousness.  It's ok if Steve Proposed Language Guru rarely
codes ten lines without a bug, so long as he can always catch them
right away.
3) Bug time to fix.
4) Bug incidence.
unranked) Time to run.

Are you wanting an interpreter that runs an Amazon Cloud A.I. to catch
bugs?  That's another $0.10, please, ma'am.

> b.append(1)
> ... suddenly my code hits an unexpected performance drop

Expect it, or use a different data structure.
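One "different data structure" is the block-wise copy-on-write scheme from bearophile's point 1: copies share fixed-size blocks, and the first write to a block duplicates only that block, so Steven's `b.append(1)` copies the tail block rather than the whole list. A minimal sketch, assuming a hypothetical `CowList` class (not a library API) with an arbitrary default block size and non-negative indexing only; the `copied` counter exists purely to make the copying visible.

```python
class CowList:
    """Hypothetical list-like container with block-level copy-on-write.

    Copies share block objects; the first write to a block through a given
    copy duplicates only that block. Non-negative indices only.
    """

    def __init__(self, items=(), block_size=4096):
        items = list(items)
        self.block_size = block_size
        self._blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        self._owned = set(range(len(self._blocks)))  # blocks no other copy references
        self.copied = 0  # elements duplicated by copy-on-write (for illustration)

    def copy(self):
        clone = CowList(block_size=self.block_size)
        clone._blocks = list(self._blocks)  # copies block references, not elements
        self._owned.clear()                 # every block is now shared
        return clone

    def _writable(self, k):
        if k not in self._owned:
            self._blocks[k] = list(self._blocks[k])  # duplicate just this block
            self.copied += len(self._blocks[k])
            self._owned.add(k)
        return self._blocks[k]

    def __len__(self):
        return sum(len(b) for b in self._blocks)

    def __iter__(self):
        for block in self._blocks:
            yield from block

    def __getitem__(self, index):
        k, j = divmod(index, self.block_size)  # all blocks but the last are full
        return self._blocks[k][j]

    def __setitem__(self, index, value):
        k, j = divmod(index, self.block_size)
        self._writable(k)[j] = value

    def append(self, value):
        if not self._blocks or len(self._blocks[-1]) == self.block_size:
            self._blocks.append([])                  # fresh block, owned by us
            self._owned.add(len(self._blocks) - 1)
        self._writable(len(self._blocks) - 1).append(value)


a = CowList(range(1_000_000))
b = a.copy()           # shares all 245 blocks; no elements copied
b.append(1)            # duplicates only the 576-element tail block
print(b.copied)        # 576
print(len(a), len(b))  # 1000000 1000001
```

The trade-off is that `copy()` is still linear in the number of blocks, and every write pays an ownership check; the performance drop is smaller and better bounded, not gone.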