is there a better way?

Sat Feb 11 11:39:30 EST 2006

Charles Krug <cdkrug at aol.com> wrote:

> On 2006-02-11, Alex Martelli <aleaxit at yahoo.com> wrote:
> > markscala at gmail.com <markscala at gmail.com> wrote:
> >
> >> Problem:
> >> 
> >> You have a list of unknown length, such as this: list =
> >> [X,X,X,O,O,O,O].  You want to extract all and only the X's.  You know
> >> the X's are all up front and you know that the item after the last X is
> >> an O, or that the list ends with an X.  There are never O's between
> >> X's.
> >
> > If the list is incredibly long, you should use a bisection approach.
> > Standard module bisect in the Python library could help, but mostly as
> > an _example_, since it unfortunately relies on normal alphabetic order,
> > and alphabetically speaking X should come _after_ O, not _before_.
> 
> Isn't every call to list.index() an O(n) operation?  We certainly want
> to avoid multiple calls there if we can.

Why should we have ANY call to index whatsoever?

> What happens if your split occurs in the middle of your block of Xs?
> Then the split before/after fails --the stated algorithm says, "If the
> split is an X, choose the front half," so perhaps the statement was
> inprecise?

What statement?  What are you TALKING about?!  What I said, and you
don't quote, was:

> > 2. if it's an X, so are all previous items -- recurse to second half

how can you POSSIBLY read this as "choose the front half"?!  I say to
recurse (iteration works, too, but it's even trickier to code) to the
SECOND half, to find the first-if-any non-'X'.

> The only way you'll know if you have an X in a particular block is using
> a linear search method, either in Python or with list.index()

Reread markscala's problem statement: all the Xs are up front followed
by 0 or more Os.  So, the list is L = ['X']*N + ['O']*M for unknown N>=0
and M>=0.  All we have to do is find N and M (if we know either, the
other is immediately computed, since N+M==len(L) and len() is O(1)).

> If (as the OP seemed to state) we KNOW that there's only one block of
> X's in the list:
> 
> 1. Find the index of the first X

Why would we do that?  He stated, very clearly, and you and I have both
been quoting:

> >> You know
> >> the X's are all up front 

So why would we do any work to find out what we altrady know?

> 2. Find the index of the last X.

Yep, the only certain task, and it's O(log(N+M)).

> 3. Delete the block we've found.

And the deletion is definitely linear time (and trivial), of course.  I
was focusing on the only interesting part, (2).

> Judging from way the OP worded the question, I'd advise making something
> that works and that you understand how it works.
> 
> After that, s/he can worry about whether or not its performance is
> suboptimal.

And indeed, part of what I said (and again you're snipping it rather
than quoting it was:

> > If N is not too huge, O(N) might be OK, and is, of course, way simpler
> > to code!-)

However, even though the O(N) in the deletion subtask would appear to
justify this shortcut, I think the task is way too trivial to justify a
linear-time approach to point 2 -- the obvious N = L.count('X'), of
course.  It seems likely that the whole purpose of the exercise
(probably homework) is to have the student identify and develop a
bisection (a notoriously tricky-to-code thing).

Alex