[Python-Dev] [Python-checkins] python/dist/src/Lib sre_compile.py, 1.57, 1.58

A.M. Kuchling amk at amk.ca
Thu Jun 2 21:51:40 CEST 2005


On Thu, Jun 02, 2005 at 03:34:17PM -0400, Raymond Hettinger wrote:
> The times two operation also occurs twice in nearby code.  Are those
> also incorrect?

I believe they're correct.  

EXPN: The regex engine refers to both 'groups', where group #N means
the corresponding group in the pattern, and 'marks', which is a table
of indexes into the string; group #0 -> mark 0 and 1, group #1 -> mark
2 and 3, etc.  The compiler was storing the mark value, but then the
code for the GROUPREF_EXISTS opcode was doubling it, converting group
to mark:

        case SRE_OP_GROUPREF_EXISTS:
            TRACE(("|%p|%p|GROUPREF_EXISTS %d\n", ctx->pattern,
                   ctx->ptr, ctx->pattern[0]));
            /* <GROUPREF_EXISTS> <group> <skip> codeyes <JUMP> codeno ... */
            i = ctx->pattern[0];
            {
                int groupref = i+i;
                if (groupref >= state->lastmark) {
                    ctx->pattern += ctx->pattern[1];
                    break;
                } else {
                ...
             }

The two other uses of *2 are for the MARK opcode, which stores the
current position in the string in the corresponding entry in the
table; it really does want the mark number, so the doubling is
correct.

The 'groupref = i+i' in the above code is a bit odd -- why not write 2*i?  I
thought it might be a typo for i+1, but that wouldn't make sense
either; eventually I decided this was just a stylistic thing.

--am"Only the effbot knows for sure"k





More information about the Python-Dev mailing list