Speed revisited

John Machin sjmachin at lexicon.net
Sun Jan 9 15:39:32 EST 2005


Bulba! wrote:
> On 8 Jan 2005 18:25:56 -0800, "John Machin" <sjmachin at lexicon.net>
> wrote:
>
> >Secondly, you are calling cmp() up to THREE times when once is
> >enough. Didn't it occur to you that your last elif needed an else
> >to finish it off, and the only possible action for the else suite
> >was "assert False"?
>
> Sure, but I was trying to make it shorter and absolutely clear
> what I mean in this place (when people see the comparison in every
> place, they see immediately what it was, they don't have to
> recall the variable). Obviously I'd optimise it in practice. It
> was supposed to be "proof of concept" rather than any working
> code.

Three is shorter than one? See immediately? I have to disagree. People
would be more put off by looking at your overly-complicated comparison
three times and having to check character-by-character that it was
doing exactly the same thing each time (i.e. your copy/paste had
worked) than by "recalling" a sensibly-named variable like
"cmp_result".

>
> BTW, this particular example didn't work out, but I still found
> Python to be the most convenient language for prototyping, I
> wrote smth like five versions of this thing, just to learn
> dictionaries, lists, sets, etc. In no other language I'd do
> it so quickly.

NOW we're on the same bus.

>
> >It would appear after reading your "snippet 2" a couple of times
> >that you are trying to implement the old 3-tape update method.
>
> Well, ahem, I admit you got me curious just how it would work
> with tapes (never used them), so I was sort of  trying to simulate
> that - it's just a bit weird undertaking, I did it rather to explore
> the issue and try to learn smth rather than to get working code.

The 3-tape technique is worth understanding, for handling datasets that
won't fit into memory. More generally, when learning, you *need* to get
working code out of your practice exercises. Otherwise what are you
learning? You don't want to wait until you have two 10GB datasets to
"diff" before you start thinking through, implementing, and testing
what to do on end-of-file.
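For concreteness, the end-of-file handling that the 3-tape method forces you to think about can be sketched as a two-pointer merge over two sorted lists standing in for the tapes (the function and names below are my illustration, not code from this thread; a None sentinel plays the role of end-of-file):

```python
# Sketch of the 3-tape update idea over two sorted lists, "old" and
# "new". A None sentinel stands in for end-of-file on either "tape".

def merge_diff(old, new):
    """Yield ('deleted', key), ('added', key), or ('kept', key)."""
    i = j = 0
    while i < len(old) or j < len(new):
        o = old[i] if i < len(old) else None   # None == EOF on "old"
        n = new[j] if j < len(new) else None   # None == EOF on "new"
        if n is None or (o is not None and o < n):
            yield ('deleted', o)               # present in old only
            i += 1
        elif o is None or n < o:
            yield ('added', n)                 # present in new only
            j += 1
        else:                                  # o == n: key on both tapes
            yield ('kept', o)
            i += 1
            j += 1
```

Note that both "a key runs out on one side" and "one whole list is exhausted" fall out of the same two sentinel checks, which is exactly the end-of-file thinking the tape method forces on you.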

>
> Deleting the first item from a list was to be a convenient
> equivalent of forwarding the  reading head on the tape one
> record ahead, because I assumed that any deletion from the
> list was to take more or less the same time, just like reading
> heads on tapes were probably reading the records with similar
> speeds regardless of what length of tape was already read.

A reasonable assumption; however, if the reading stopped and restarted,
an inordinate amount of time was taken accelerating the drive up to
reading speed again. The idea was to keep the tape moving at all times.
This required techniques like double-buffering, plus neat, clean, brief,
fast processing routines.

... but you mixed in moving the indexes (with e.g. o+=1) with confusing
results.

Deleting the first item of a list in this circumstance reminds me of
the old joke: "Q: How many members of military organisation X does it
take to paint the barracks? A: 201, 1 to hold the paint-brush and 200
to lift the barracks and rotate it."

Tip 1: Once you have data in memory, don't move it, move a pointer or
index over the parts you are inspecting.

Tip 2: Develop an abhorrence of deleting data.
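To put numbers behind the barracks joke: del lst[0] shifts every remaining element left, so draining a list that way is O(n**2), while moving an index over the same data is O(n). A toy illustration (names and example are mine):

```python
def consume_by_delete(data):
    """The 'rotate the barracks' version: pop(0) shifts the whole
    remaining list on every step, so this is O(n**2) overall."""
    out = []
    data = list(data)          # work on a copy; we're about to destroy it
    while data:
        out.append(data.pop(0))
    return out

def consume_by_index(data):
    """Tip 1 and Tip 2 combined: the data never moves and is never
    deleted; only an index walks over it. O(n) overall."""
    out = []
    i = 0
    while i < len(data):
        out.append(data[i])
        i += 1
    return out
```

Both produce identical results; only the second one scales.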

>
> >It would also appear that you are assuming/hoping that there is
> >never more than one instance of a phrase in either list.
>
> Sure. Because using advice of Skip Montanaro I initially used sets
> to eliminate duplicates.

I see two problems with your implementation of Skip's not-unreasonable
suggestion. One: You've made the exercise into a multi-pass algorithm.
Two: As I posted earlier, you need to get all the instances of the same
key together at the same time. Otherwise you get problems. Suppose you
have in each list, two translations apple -> polish1 and apple ->
polish2. How can you guarantee that you remove the same duplicate from
each list, if you remove duplicates independently from each list?
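One single-pass way to get all instances of the same key together is to bucket each list into a dict mapping phrase -> list of translations before comparing; then the apple -> polish1 / apple -> polish2 ambiguity disappears, because you compare whole groups instead of removing duplicates independently. A sketch (the dict-of-lists approach and names are mine):

```python
from collections import defaultdict

def group_by_key(pairs):
    """Bucket (phrase, translation) pairs so that every instance of a
    phrase is seen together, duplicates included."""
    groups = defaultdict(list)
    for phrase, translation in pairs:
        groups[phrase].append(translation)
    return groups

# The problem case from above: two translations for the same phrase,
# listed in a different order in each file.
old = [('apple', 'polish1'), ('apple', 'polish2')]
new = [('apple', 'polish2'), ('apple', 'polish1')]

# Comparing sorted groups sidesteps "which duplicate did I remove?":
# order within a group no longer matters.
same = sorted(group_by_key(old)['apple']) == sorted(group_by_key(new)['apple'])
```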

> It's just not shown for brevity.

Then say so. On the evidence, brevity was not a plausible explanation
for the absence. :-)

> If the
> keys are guaranteed to be unique, it makes it easier to think
> about the algorithm.

Unfortunately in the real world you can't just imagine the problems
away. Like I said, you have to think about the requirements first -- 9
cases of (0, 1, many) x (0, 1, many); what do you need to do? Then
think about an implementation.
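Enumerating those 9 cases mechanically is cheap, which is another argument for doing the requirements pass first. A sketch of mapping each key to its (old, new) bucket with occurrence counts (this framing and the names are my illustration):

```python
from collections import Counter

def classify(count):
    # Collapse a raw occurrence count into the 0 / 1 / many buckets.
    return 'many' if count > 1 else count

def case_matrix(old_keys, new_keys):
    """Map each key to its (old, new) bucket, covering all 9 cases of
    (0, 1, many) x (0, 1, many)."""
    old_c, new_c = Counter(old_keys), Counter(new_keys)
    return {k: (classify(old_c[k]), classify(new_c[k]))
            for k in set(old_c) | set(new_c)}

# 'a' appears only in old, 'b' is duplicated in old, 'c' only in new.
cases = case_matrix(['a', 'b', 'b'], ['b', 'c'])
```

Once each key is labelled this way, deciding what to do per case is a table lookup rather than a tangle of elifs.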

>
> >Note that in your snippet2 it was not very obvious what you want to
> >do in the case where a phrase is in "new" but not in "old", and vice
> >versa -- under one circumstance (you haven't met "end of file") you
> >do nothing, but in the other circumstance you do something but seem
> >to have not only a polarity problem but also a copy-paste-edit
> >problem.
>
> What exactly is a "polarity problem"?

You appeared to be treating "old" like I would have expected you to
treat "new" and vice versa.

>
> I concede that the code is rather contrived, but I didn't bother
> to do smth like writing classes or functions that would simulate
> a tape reader,
> so the result is rather ugly. I only posted it
> because I could not explain why it were so slow.
>

Was what I posted ugly? Did you see in it any classes or functions
simulating tape drives?



