[Python-Dev] Alternative Implementation for PEP 292: Simple String Substitutions

Mon Sep 13 14:32:00 CEST 2004

On Saturday 2004-09-11 08:35, Stephen J. Turnbull wrote:

>     >> But [efficiency], as such, is important only to efficiency
>     >> fanatics.
> 
>     Gareth> No, it's important to ... well, people to whom efficiency
>     Gareth> matters. There's no need for them to be fanatics.
> 
> If it matters just because they care, they're fanatics.  If it matters
> because they get some other benefit (response time less than the
> threshold of notice, twice as many searches per unit time, half as
> many boxes to serve a given load), they're not.  </F>'s talk of many
> ways to do things "and Python should account for most of them" strikes
> me as fanaticism by that definition; the vast majority of developers
> will never deal with the special cases, or write apps that anticipate
> dealing with huge ASCII strings.  Those costs should be borne by the
> developers who do, and their clients.

I am unconvinced that "the vast majority of developers"
will not have work to do that involves a large volume of
ASCII data ... but I'm not sure this is something either
of us is in a position to know. (If it turns out that
you're just completing a PhD thesis entitled "Use of
large-volume string data among software developers",
or something, then please accept my apologies for guessing
wrong and enlighten me!)

> I apologize for shoehorning that into my reply to you.

That's OK.

>     >> The question is, how often are people going to notice that when
>     >> they have pure ASCII they get a 100% speedup [...]?
> 
>     Gareth> Why is that the question, rather than "how often are
>     Gareth> people going to benefit from getting a 100% speedup when
>     Gareth> they have pure ASCII"?
> 
> Because "benefit" is very subjective for _one_ person, and I don't
> want to even think about putting coefficients on your benefit versus
> mine.  If the benefit is large enough, a single person will be willing
> to do the extra work.  The question is, should all Python users and
> developers bear some burden to make it easier for that person to do
> what he needs to do?

"Burden" is just as subjective as "benefit". But let's take
a look at these burdens and benefits.

  - Burden for a very small number of Python developers:
    having to write and maintain a larger body of code,
    with duplication (at least of purpose) between Unicode
    and ASCII strings.

      - Consequent burden on all Python users: more risk
        of those developers getting burned out and giving
        up, less time for them to work on other aspects of
        Python, more danger of bugs in code, larger executables.

        They won't notice this, of course.

  + Benefit for a small (but not nearly so small) number of
    Python users: important code runs twice as fast, and
    this makes a real difference to them.

      + Consequent benefit for all Python users: more
        use of Python means more people contributing
        code, bug reports, useful libraries, etc.

        They won't notice this, either.

  + Benefit for all Python users: some of their code runs
    a little faster.

    They won't notice this, either.

Perhaps I'm being obtuse, but it's far from clear to me that
this is a net loss for Python users at large. In any case,
the burdens seem less likely to be noticed than the benefits.
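
For anyone who would rather measure than argue, here is a
minimal sketch of how the size of such a difference could be
checked. It assumes a modern Python 3 interpreter, so it does
not reproduce the separate str and unicode code paths under
discussion; it only times string.Template.substitute on
pure-ASCII versus non-ASCII data. The template text, the data
and the helper name time_substitution are invented for
illustration, and the numbers will vary by machine.

```python
import timeit
from string import Template

# Hypothetical template and data, chosen only for illustration.
TEMPLATE = Template("Dear $name, your order $order has shipped to $city.")
ASCII_DATA = {"name": "Gareth", "order": "12345", "city": "Cambridge"}
NON_ASCII_DATA = {"name": "Gar\u00e9th", "order": "12345", "city": "K\u00f8benhavn"}


def time_substitution(mapping, repeats=5, number=20000):
    """Return the best per-call time in seconds for TEMPLATE.substitute(mapping)."""
    timer = timeit.Timer(lambda: TEMPLATE.substitute(mapping))
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeats, number=number)) / number


if __name__ == "__main__":
    ascii_time = time_substitution(ASCII_DATA)
    other_time = time_substitution(NON_ASCII_DATA)
    print("ASCII:     %.3f us per call" % (ascii_time * 1e6))
    print("non-ASCII: %.3f us per call" % (other_time * 1e6))
    print("ratio:     %.2f" % (other_time / ascii_time))
```

A ratio near 1 would suggest nobody could notice the
difference on that interpreter; it says nothing about whether
the difference would matter to them if it were larger.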

> I think "notice" is something you can get consensus on.  If a lot of
> people are _noticing_ the difference, I think that's a reasonable rule
> of thumb for when we might want to put "it", or facilities for making
> individual efforts to deal with "it" simpler, into "standard Python"
> at some level.  If only a few people are noticing, let them become
> expert at dealing with it.

But even if "noticing the difference" is the key point,
it is a mistake (I think) to make it specifically "noticing
that when they have pure ASCII they get a 100% speedup".
Hence my comment quoted below:

>     Gareth> Or even "how often are people going to try out Python on
>     Gareth> an application that uses pure-ASCII strings, and decide to
>     Gareth> use some other language that seems to do the job much
>     Gareth> faster"?
> 
> See?  You're now using a "notice" standard, too.  I don't think that's
> an accident.

It isn't. It's because I was replying to someone who apparently
took "notice" standards as the only relevant ones, in order to
point out that even with that assumption there are relevant
questions other than "will anyone notice getting a speedup when
their data are pure ASCII?".

And I, in turn, apologize for shoehorning all *that* into the
word "even". :-)

I still think, though, that a "notice" standard makes for
bad designs. Most people would not notice if all floating-point
operations gave results with the last couple of bits wrong,
but it is a good thing that they don't. Some people wouldn't
notice but would get badly unsatisfactory results. Some people
would notice but would find it impractical to work around the
problems because that would mean tons of code and major losses
in speed.

Most people would not notice if by inserting the magic word
"wibble" at the start of their programs they could make them
10 times faster, but if for some weird reason it were possible
to make that so (but not possible to provide the speedup for
programs without "wibble") then it should be done.

What people notice is easier to define and to measure
than what actually makes a difference to them. That is
not enough reason to treat it as the only criterion.

-- 
g