is there a better way?

Sat Feb 11 08:48:55 EST 2006

On 2006-02-11, Alex Martelli <aleaxit at yahoo.com> wrote:
> markscala at gmail.com <markscala at gmail.com> wrote:
>
>> Problem:
>> 
>> You have a list of unknown length, such as this: list =
>> [X,X,X,O,O,O,O].  You want to extract all and only the X's.  You know
>> the X's are all up front and you know that the item after the last X is
>> an O, or that the list ends with an X.  There are never O's between
>> X's.
>
> If the list is incredibly long, you should use a bisection approach.
> Standard module bisect in the Python library could help, but mostly as
> an _example_, since it unfortunately relies on normal alphabetic order,
> and alphabetically speaking X should come _after_ O, not _before_.
>

Isn't every call to list.index() an O(n) operation?  We certainly want
to avoid multiple calls there if we can.

What happens if your split occurs in the middle of your block of Xs?
Then the split before/after fails --the stated algorithm says, "If the
split is an X, choose the front half," so perhaps the statement was
inprecise?

The only way you'll know if you have an X in a particular block is using
a linear search method, either in Python or with list.index()

If (as the OP seemed to state) we KNOW that there's only one block of
X's in the list:

1. Find the index of the first X
2. Find the index of the last X.
3. Delete the block we've found.

That relies on the underlying language, which means we're working in
"Linear Time in C", more or less.

If we make no such guarantee, then I can do the operation in linear
"Python Time" by scanning the list once, finding each instance and
calling list.del() as I find each block, keeping track of my current
position so I don't have to start over again.

Judging from way the OP worded the question, I'd advise making something
that works and that you understand how it works.

After that, s/he can worry about whether or not its performance is
suboptimal.

How large must the list be before "logarithmic Python algorithm" is
faster than "linear C algorithm"?  I've never measured, but it may be a
question worth exploring if one has a huge pile of data to chew on--like
US Census or UN budget-sized.