regexp search on infinite string?

Paddy paddy3118 at googlemail.com
Sat Sep 15 11:58:59 EDT 2007


On Sep 15, 2:07 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Sep 15, 10:56 pm, Paddy <paddy3... at googlemail.com> wrote:
>
>
>
> > On Sep 14, 9:49 pm, Paddy <paddy3... at googlemail.com> wrote:
>
> > > Let's say I have a generator running that generates successive
> > > characters of a 'string'.
> > > From what I know, if I want to do a regexp search for a pattern of
> > > characters then I would have to 'freeze' the generator and pass the
> > > characters so far to re.search.
> > > It is expensive to create successive characters, but caching could be
> > > used for past characters. Is it possible to wrap the generator in a
> > > class, possibly inheriting from string, that would allow the regexp
> > > searching of the string but without terminating the generator? In
> > > other words duck typing for the usual string object needed by
> > > re.search?
>
> > > - Paddy.
>
> > There seems to be no way of hooking into how the re library accesses
> > characters from the string:
>
> > >>> class S(str):
> > ...     def __getitem__(self, *a):
> > ...             print "getitem:",a
> > ...             return str.__getitem__(self, *a)
> > ...     def __get__(self, *a):
> > ...             print "get:",a
> > ...             return str.__get__(self, *a)
> > ...
> > >>> s = S('sdasd')
> > >>> m = re.search('as', s); m.span()
> > (2, 4)
> > >>> m = sre.search('as', s); m.span()
> > (2, 4)
> > >>> class A(array.array):
> > ...     def __getitem__(self, *a):
> > ...             print "getitem:",a
> > ...             return str.__getitem__(self, *a)
> > ...     def __get__(self, *a):
> > ...             print "get:",a
> > ...             return str.__get__(self, *a)
> > ...
>
> > >>> s = A('c','sdasd')
> > >>> m = re.search('as', s); m.span()
> > (2, 4)
> > >>> m = sre.search('as', s); m.span()
> > (2, 4)
>
> > - Paddy.
>
> That would no doubt be because it either copies the input [we hope
> not] or more likely because it hands off the grunt work to a C module
> (_sre).

Yes, it seems to need a buffer/string, so it probably accesses a
contiguous area of memory from C.
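
A quick way to check that guess, sketched in Python 3 syntax (the
session quoted above is Python 2): re accepts bytes-like objects that
expose a contiguous buffer, provided the pattern is bytes as well. The
names `raw` and `view` are only for illustration.

```python
import re

# A mutable bytes-like object; it exposes its memory via the buffer protocol.
raw = bytearray(b"sdasd")
span_bytearray = re.search(b"as", raw).span()

# A memoryview over the same memory is searched without copying it into a str.
view = memoryview(raw)
span_memoryview = re.search(b"as", view).span()

print(span_bytearray, span_memoryview)
```

Both searches report the same span as the str case, which fits the idea
that the C matcher reads the memory directly and never calls back into
Python-level __getitem__.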
>
> Why do you want to "break into" it, anyway?

A simulation generates a stream of data that could run to gigabytes,
in which I'd like to find interesting bits by doing a regexp search. I
could use megabyte-length sliding buffers, and probably will have to.
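
A minimal sketch of the sliding-buffer idea, in Python 3 syntax. The
function name `search_stream` and the parameter `max_match_len` are
made up for illustration. It assumes that no match is longer than
`max_match_len` characters, that the pattern never matches the empty
string, and that the pattern does not rely on anchors or lookbehind,
since text before the buffer is discarded.

```python
import re

def search_stream(pattern, chunks, max_match_len):
    """Yield (absolute_start, text) for each match in a stream of str chunks.

    Hypothetical helper: assumes matches are at most max_match_len long.
    """
    regex = re.compile(pattern)
    chunks = iter(chunks)
    buf = ""      # the part of the stream still held in memory
    offset = 0    # absolute stream position of buf[0]
    done = False
    while not done:
        try:
            buf += next(chunks)
        except StopIteration:
            done = True  # final pass: nothing more can extend a match
        consumed = 0
        for m in regex.finditer(buf):
            # A match starting this close to the end might still change
            # once more data arrives, so defer it to the next pass.
            if not done and m.start() + max_match_len > len(buf):
                break
            yield offset + m.start(), m.group()
            consumed = m.end()
        # Keep only the tail in which a match could still begin or grow.
        keep_from = max(consumed, len(buf) - max_match_len + 1)
        buf = buf[keep_from:]
        offset += keep_from

# Usage: chunks may be any iterable, including a character generator.
hits = list(search_stream("abc", ["xxab", "cxx", "abc", "x"], 3))
print(hits)
```

Memory use stays near one chunk plus `max_match_len` characters, at the
price of rescanning a short tail on every pass.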

- Paddy.



