Moving list entries from one list to another

Sun Jul 14 03:09:26 EDT 2002

On Sat, 13 Jul 2002 22:32:32 +0200, JB <jblazi at hotmail.com> wrote:
[...]
>
>f will be *slow*. The list entries are tuples and the tuple 
>entries are strings. Then f is "in" for strings, that is it 
>checks, if one string contains another one. What I want to 
>implement, is a list view with incr4emental search. There
What kind of incremental search? Just matching leading characters?
Or matching additional words anywhere? Or arbitrary word fragments? Or?

>will be up to 250000 entrie in the list view at the moment 
>(for example all headers of the news group 
>alt.binaries.mp3.classical) and this number is going to 
>increase in the  future as news servers become bigger and 
>bigger. I thought that when f is slow, then at least the 
>merging should be as fast as possible.
This sounds like the data changes fairly infrequently compared
to how often it's searched. Would a costly indexing job run
infrequently be worth while if it paid off in search speed?

There are probably better ways to generate a selection than
brute scanning of python lists with strings, but just to get
the requirements down, does this do the kind of incremental
search you are interested in?
--
 >>> class IncSrch(object):
 ...     def __init__(self, master_list):
 ...         self.master = master_list
 ...         self.curr = []
 ...     def new_search(self, patt):
 ...         self.curr = [x for x in self.master if x[1].find(patt)!= -1]
 ...     def inc_search(self, patt):
 ...         self.curr = [x for x in self.curr if x[1].find(patt)!= -1]
 ...     def __repr__(self):
 ...         return '\n'.join([`x` for x in self.curr])
 ...

Here we just generate a list of example tuples with ids and random-suffix strings:
 >>> import random
 >>> r = random.Random()
 >>> master1 = [(i,'item_%s%s' % (r.choice('ABCD'),r.choice('XYZW'))) for i in xrange(10)]
 >>> master1
 [(0, 'item_AZ'), (1, 'item_BX'), (2, 'item_BY'), (3, 'item_BZ'), (4, 'item_AZ'), (5, 'item
 _BW'), (6, 'item_BW'), (7, 'item_DY'), (8, 'item_BY'), (9, 'item_CY')]

(Is this like your data? Or how does it differ?)

Here we initialize a search object with a master starting list and nothing selected
 >>> is1 = IncSrch(master1)
 >>> is1

 >>> `is1`
 ''
I.e., there is nothing selected to the curr list yet.

Here we do a first search based on containing substring 'D':
 >>> is1.new_search('D')
Just typing the name calls repr to print the current state of the search object:
 >>> is1
 (7, 'item_DY')

That wasn't much to do a secondary search on, so we do a new search on the master list for 'B':
 >>> is1.new_search('B')
 >>> is1
 (1, 'item_BX')
 (2, 'item_BY')
 (3, 'item_BZ')
 (5, 'item_BW')
 (6, 'item_BW')
 (8, 'item_BY')

That got a few, so we incrementally search for 'Y':
 >>> is1.inc_search('Y')
 >>> is1
 (2, 'item_BY')
 (8, 'item_BY')

And so forth. Is that the kind of thing you want to do?
I presume you have to pass a list of items in some form to the
display widget, and that you will not want to pass more than
say MAX_FOR_WIDGET items for a reasonable scrolling display,
and you may(??) need the items as a list of strings, not tuples,
so you might want to add a first_for_widget() and next_for_widget()
method to the class as part of the definition (but I'll do it separately
here since I have some state above to use):

 >>> def first_for_widget(self):
 ...     self.first = 0
 ...     return ['id:%s -- %s' % (x[0],`x[1]`) for x in self.curr[0:MAX_FOR_WIDGET]]
 ...
 >>> def next_for_widget(self):
 ...     self.first += MAX_FOR_WIDGET
 ...     return ['id:%s -- %s' % (x[0],`x[1]`) for x in self.curr[self.first:self.first+MAX_FOR_WIDGET]]
 ...
 >>> MAX_FOR_WIDGET = 4

Here we'll dynamically add in the methods above that normally would be defined in the class:
 >>> IncSrch.first_for_widget = first_for_widget
 >>> IncSrch.next_for_widget = next_for_widget

Let's get the bigger first search results again:
 >>> is1.new_search('B')
 >>> is1
 (1, 'item_BX')
 (2, 'item_BY')
 (3, 'item_BZ')
 (5, 'item_BW')
 (6, 'item_BW')
 (8, 'item_BY')

Now the first items for a widget:
 >>> is1.first_for_widget()
 ["id:1 -- 'item_BX'", "id:2 -- 'item_BY'", "id:3 -- 'item_BZ'", "id:5 -- 'item_BW'"]

That was MAX_FOR_WIDGET (4) items, now get some more:
 >>> is1.next_for_widget()
 ["id:6 -- 'item_BW'", "id:8 -- 'item_BY'"]

That used them up without making a full four. Try for next:
 >>> is1.next_for_widget()
 []

Nothing left.
Now do the incremental search again, but let's go for the W's:
 >>> is1.inc_search('W')
 >>> is1
 (5, 'item_BW')
 (6, 'item_BW')

Get the first part of the list for the widget
 >>> is1.first_for_widget()
 ["id:5 -- 'item_BW'", "id:6 -- 'item_BW'"]

That was all of it:
 >>> is1.next_for_widget()
 []

If you will comment on what this simple class does and doesn't do that
you need functionally, that will establish unambiguous requirements. Then
we can make it fast one way or another.

Regards,
Bengt Richter