[Python-Dev] sre.c and sre_match()

Jack Diederich jack@performancedrivers.com
Wed, 16 Apr 2003 13:33:58 -0400


> [Jack Diederich]
> > ...
> > I was actually poking around to see how hard it would be to allow
> > pure-python string classes to work with the re modules.
[Tim Peters]
> Sorry, no idea.  Note that sre works on any object supporting the ill-fated
> buffer interface.  You may have a hard time figuring out that too.  But,
> e.g., it implies that re can search directly over an mmap'ed file (you don't
> need to read the file into a string first).

Poking around some more in _sre.c

It looks like user-defined strings could be supported via the same #include
hack that is used for unicode, with some extra defines.

// ascii/unicode
#define STATE_NEXT_CHAR(state) state->ptr++
// user strings
#define STATE_NEXT_CHAR(state) PyEval_CallObject(state->string_nextmethod)

similar for STATE_PREV_CHAR

and something to ask if we're at the end

// ascii
#define STATE_ISEND(state) (state->ptr == state->end)
// user strings
#define STATE_ISEND(state) PyEval_CallObject(state->string_endmethod)
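
To make the idea concrete, here is a minimal standalone sketch, not code from
_sre.c. It assumes hypothetical names throughout (ptr_state, cb_state,
user_str, DEFINE_COUNT, count_ptr, count_cb) and uses plain C function
pointers as a stand-in for calling methods on a Python object. A generator
macro stands in for the #include trick so that everything fits in one file:
the same loop body is stamped out once per set of cursor macros.

```c
#include <assert.h>
#include <stddef.h>

/* Back end 1: raw pointer over a byte buffer (the existing ascii case). */
typedef struct {
    const char *ptr;
    const char *end;
} ptr_state;

#define PTR_CUR(s)    (*(s)->ptr)
#define PTR_NEXT(s)   ((s)->ptr++)
#define PTR_ISEND(s)  ((s)->ptr == (s)->end)

/* Back end 2: callbacks standing in for methods on a user string object. */
typedef struct {
    void *obj;
    int  (*current)(void *obj);
    void (*next)(void *obj);
    int  (*at_end)(void *obj);
} cb_state;

#define CB_CUR(s)    ((s)->current((s)->obj))
#define CB_NEXT(s)   ((s)->next((s)->obj))
#define CB_ISEND(s)  ((s)->at_end((s)->obj))

/* One loop body, generated once per back end (hypothetical helper). */
#define DEFINE_COUNT(NAME, STATE, CUR, NEXT, ISEND)       \
    static size_t NAME(STATE *state, int wanted)          \
    {                                                     \
        size_t hits = 0;                                  \
        assert(state != NULL);                            \
        while (!ISEND(state)) {                           \
            if (CUR(state) == wanted)                     \
                hits++;                                   \
            NEXT(state);                                  \
        }                                                 \
        return hits;                                      \
    }

DEFINE_COUNT(count_ptr, ptr_state, PTR_CUR, PTR_NEXT, PTR_ISEND)
DEFINE_COUNT(count_cb,  cb_state,  CB_CUR,  CB_NEXT,  CB_ISEND)

/* Hypothetical "user string": an int array walked by index. */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
} user_str;

static int  us_current(void *o) { user_str *u = o; return u->items[u->pos]; }
static void us_next(void *o)    { user_str *u = o; u->pos++; }
static int  us_at_end(void *o)  { user_str *u = o; return u->pos >= u->len; }
```

The pointer version compiles down to the same loop as before, so in this
sketch the extra indirection is only paid by the callback back end. Whether
that holds for the real matcher would have to be measured.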

Is there a speed reason why all the SRE_MATCH type functions do
  ptr = state->ptr;
  ptr++;
  ptr--;
  // lots more stuff with ptr
  state->ptr = ptr;

or is it just convenience?  If it is just convenience, working on state->ptr
directly would make writing the #defines easier.
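
For reference, a minimal sketch of the two styles side by side. The names
(scan_state, skip_run_direct, skip_run_cached) are made up for illustration
and are not from _sre.c. Both functions advance past a run of one character
and return its length; the second uses the copy-in / copy-out pattern
described above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *ptr;
    const char *end;
} scan_state;

/* Works through the struct on every step. */
static size_t skip_run_direct(scan_state *state, char ch)
{
    size_t n = 0;
    assert(state != NULL);
    while (state->ptr < state->end && *state->ptr == ch) {
        state->ptr++;
        n++;
    }
    return n;
}

/* Copies the cursor into locals, works on them, then writes it back. */
static size_t skip_run_cached(scan_state *state, char ch)
{
    assert(state != NULL);
    const char *ptr = state->ptr;   /* copy in */
    const char *end = state->end;
    while (ptr < end && *ptr == ch)
        ptr++;
    size_t n = (size_t)(ptr - state->ptr);
    state->ptr = ptr;               /* copy out */
    return n;
}
```

A local copy is generally easier for a C compiler to keep in a register,
since it does not have to assume that a store or a function call in the loop
might have changed state->ptr. Whether that is the actual reason in the
SRE_MATCH functions is exactly the open question here.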

The PyEval_CallObjects are just pseudocode; they would be wrapped in something
that checked the appropriateness of the return value and did other bookkeeping.

Could this be done without hurting the speed of regular regexps?

-jackdied