PEP 358 and operations on bytes

John Machin sjmachin at lexicon.net
Wed Oct 4 18:51:22 EDT 2006


Fredrik Lundh wrote:
> John Machin wrote:
>
> > But not on other integer subtypes. If regexps should not be restricted
> > to text, they should work on domains whose number of symbols is greater
> > than 256, shouldn't they?
>
> they do:
>
> import re, array
>
> data = [0, 1, 1, 2]
>
> array_type = "IH"[re.sre_compile.MAXCODE == 0xffff]
>
> a = array.array(array_type, data)
>
> m = re.search(r"\x01+", a)
>
> if m:
>      print m.span()
>      print m.group()

Very minor nit: re.sre_compile doesn't exist before Python 2.5.
Presumably sys.maxunicode can substitute for re.sre_compile.MAXCODE.
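
A minimal sketch of that substitution, on the untested assumption that
sys.maxunicode tracks the regex engine's code size the way
re.sre_compile.MAXCODE does (the sample data is Fredrik's):

```python
import sys
import array

# Assumption: a narrow build reports sys.maxunicode == 0xffff, which is
# taken here to imply 16-bit regex codes; any other value selects "I".
array_type = "IH"[sys.maxunicode == 0xffff]

a = array.array(array_type, [0, 1, 1, 2])
```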

That aside, I'd like to nominate myself as UGPOTM (utterly gobsmacked
poster of the month). Not only does that work, but so does this, all
the way back to 2.1 at least:

import re, array
data = [0, 1, 1, 2, 257, 257, 258]
# array_type = "IH"[re.sre_compile.MAXCODE == 0xffff] # Python 2.5
array_type = "H"
a = array.array(array_type, data)
for q in (r"\x01+", ur"\u0101+"):
    m = re.search(q, a)
    if m:
        print m.span()
        print m.group()

produces:

(1, 3)
array('H', [1, 1])
(4, 6)
array('H', [257, 257])

Now, scurrying back towards Gerrit's original point: this feature is
not documented, even for array.array('B', ...). Should it be left as a
happy accident of duck-typing, accessible only to those who stumble
over it, or should it be supported? Should it be included in Python 3?

Cheers,
John



