[Tutor] escape character regex
Dave Angel
davea at davea.name
Sun Mar 29 01:21:09 CET 2015
On 03/28/2015 03:37 PM, Ian D wrote:
> Hi
>
>
> I run a regex like this:
>
>> pchars = re.compile('\x00\x00\x00') #with or without 'r' for raw
Which one did you actually want? The 3 byte sequence consisting of
nulls, or the 12 byte one containing zeroes and backslashes? I'm going
to assume the former, in which case you cannot use 'r' for raw. Unless
you've got a null key on your keyboard.
>
> on a string like this:
>
>> data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"
>
>> print "found pchars :",pchars.findall(data)
>
> which returns:
>
>> found pchars : ['\x00\x00\x00']
>
>
> But if I try to match the extra digits at the end like this:
>
>> pchars = re.compile('\x00\x00\x00\x\d+')
>
> I get an error:
>
>> ValueError: invalid \x escape
The \x escape sequence must be followed by exactly two hex digits, and
forms a single byte from them. What did you want that byte to be, and
why didn't you specify it?
>
> Or if I use another ide than idle it actually flags it as an "illegal hexadecimal escape sequence"
>
The question is not what the various IDE's produce, but what the Python
compiler produces. So once you started getting errors, you really
should have just run it in the interactive interpreter, without IDE's
second-guessing you. Anyway, in 2.7.6's interactive interpreter, I get:
>>> a = '\x00\x00\x00\x\d+'
ValueError: invalid \x escape
>>>
So it has nothing to do with re, and is simply the result of trying an
invalid string literal.
What string were you hoping to get? You mention you wanted to match
digits at the end (end of what?). Perhaps you wanted a real backslash
followed by the letter d. In that case, since you cannot use a raw
string (see my first response paragraph), you need to double the backslash.
>>> a = '\x00\x00\x00\\d+'
>>> print a
\d+
Your data is funny, too, since it almost looks like it might be a string
representation of a Python list. But assuming you meant it exactly like
it is, there is a funny control character following the nulls.
>
> How could I match the \x00\x00\x00\x11 portion of the string?
>
There are no digits in that portion of the string, so I'm not sure why
you were earlier trying to match digits.
Perhaps you meant you were trying to match the single control character
x'11'. In that case, you'd want
a = '\x00\x00\x00\x11'
pchars = re.compile(a)
But if you wanted to match an arbitrary character following the nulls,
you'd want something different.
I think you'd better supply several strings to match against, and show
which ones you'd expect a match for.
--
DaveA
More information about the Tutor
mailing list