Regex'ing null bytes

Fredrik Lundh fredrik at pythonware.com
Wed Apr 21 11:53:24 EDT 2004


Jonas Galvez wrote:

> I'm trying to parse some binary data with regexes. It works well in
> the latest Python build, but I need to run this on Python 1.5.2. The
> binary data has a pattern like this:
>
>     keyName1\002..(.*)\000.*keyName2\002..(.*)\000
>     (I'm using regex syntax to illustrate)
>
> So I wrote the following script:
>
>     def amfKey(str):
>         return "%s\002..([^\000]*)" % str
>
>     keys = re.compile(amfKey("key"), re.DOTALL).findall(amfStr)
>
> Works on 2.3.3, but produces the following error on 1.5.2:
>
>     Traceback (innermost last):
>        File "test.py", line 26, in ?
>          keys = re.compile(amfKey("key"), re.DOTALL).findall(amfStr)
>        File "C:\Python152\Lib\re.py", line 79, in compile
>          code=pcre_compile(pattern, flags, groupindex)
>      TypeError: argument 1: expected string without null bytes, string found
>
> Does anyone know a workaround?

the "pre" (pcre) engine doesn't support null bytes in pattern strings, but
it does understand octal escapes.

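in other words, make sure the escape reaches the regex engine as text
instead of being turned into a real null byte by the python string
literal: use double backslashes, or an "r" string: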

    "\\000"
    r"\0"
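
applied to the helper from the question, that could look roughly like the sketch below. the name amf_key and the sample data are made up for illustration, and the sketch was written against a current python rather than tested on 1.5.2:

```python
import re

def amf_key(name):
    # raw string: python leaves \002 and \000 alone, so the regex engine
    # receives them as octal escapes and the pattern has no real null byte
    return r"%s\002..([^\000]*)" % name

pattern = amf_key("key")

# hypothetical sample data: key name, \002, two filler bytes, value, \000
data = "key\002ab" + "value" + "\000junk"

keys = re.compile(pattern, re.DOTALL).findall(data)
print(keys)
```

the data string can still contain real null bytes; only the pattern string has to be free of them.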

</F>






