[Tutor] escape character regex

Dave Angel davea at davea.name
Sun Mar 29 01:21:09 CET 2015


On 03/28/2015 03:37 PM, Ian D wrote:
> Hi
>
>
> I  run a regex like this:
>
>> pchars = re.compile('\x00\x00\x00') #with or without 'r' for raw

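Which one did you actually want?  The 3 byte sequence consisting of
nulls, or the 12 byte one containing backslashes, x's and zeroes?  I'm
going to assume the former, in which case the string literal cannot use
'r' for raw.  Unless you've got a null key on your keyboard.  (The re
module does interpret \x00 in a pattern on its own, so the raw version
happens to match the same nulls, but it is a different string.)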
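To make the difference concrete, here is a small sketch comparing the two literals. It is written to run on both Python 2.7 and Python 3 (where "byte" should be read as "character"); the names plain, raw, plain_hits and raw_hits are made up for illustration.

```python
import re

plain = '\x00\x00\x00'    # escapes processed by Python: three NUL characters
raw = r'\x00\x00\x00'     # raw literal: backslash, 'x', '0', '0', three times over

print(len(plain))   # 3
print(len(raw))     # 12

data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"

# The re module also interprets \x00 inside a pattern, so both patterns
# find the same three NULs even though the two strings differ.
plain_hits = re.compile(plain).findall(data)
raw_hits = re.compile(raw).findall(data)
print(plain_hits == raw_hits)   # True
```

So the two spellings behave alike only once re gets hold of them; as plain strings they are not interchangeable.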

>
> on a string like this:
>
>> data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"
>
>> print "found pchars :",pchars.findall(data)
>
> which returns:
>
>> found pchars : ['\x00\x00\x00']
>
>
> But if I try to match the extra digits at the end like this:
>
>> pchars = re.compile('\x00\x00\x00\x\d+')
>
> I get an error:
>
>> ValueError: invalid \x escape

The \x escape sequence must be followed by exactly two hex digits, and 
forms a single byte from them.  What did you want that byte to be, and 
why didn't you specify it?
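A few valid \x escapes, as a quick illustration of the two-hex-digit rule. The variable names are hypothetical, and the snippet is meant to behave the same on Python 2.7 and Python 3.

```python
ctrl = '\x11'         # one character, code 0x11 (17 decimal)
letter = '\x41'       # the same thing as 'A'
two_chars = '\x414'   # \x41 consumes two digits; the '4' is an ordinary character

print(len(ctrl))            # 1
print(ord(ctrl))            # 17
print(letter == 'A')        # True
print(two_chars == 'A4')    # True
```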

>
> Or if I use another ide than idle it actually flags it as an "illegal hexadecimal escape sequence"
>

The question is not what the various IDEs report, but what the Python
compiler produces.  So once you started getting errors, you really
should have just run it in the interactive interpreter, without an IDE
second-guessing you.  Anyway, in 2.7.6's interactive interpreter, I get:

 >>> a = '\x00\x00\x00\x\d+'
ValueError: invalid \x escape
 >>>

So it has nothing to do with re, and is simply the result of writing an
invalid string literal.
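One way to check this without touching re at all is to hand the offending line to the built-in compile() as text. This is only a sketch: the raw string named source holds the bad assignment so that the example file itself stays valid, and the flag named failed is made up for the demonstration. Python 2.7 raises ValueError at this point, while Python 3 reports the same problem as a SyntaxError, so both are caught.

```python
# Source text of the failing assignment.  Note that re is never imported.
source = r"a = '\x00\x00\x00\x\d+'"

try:
    compile(source, '<demo>', 'exec')
    failed = False
except (ValueError, SyntaxError) as exc:
    # The literal is rejected while the source is being compiled,
    # before any regular expression code could run.
    failed = True
    print(type(exc).__name__)

print(failed)   # True
```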

What string were you hoping to get?  You mention you wanted to match 
digits at the end (end of what?).  Perhaps you wanted a real backslash 
followed by the letter d.  In that case, since you cannot use a raw 
string (see my first response paragraph), you need to double the backslash.

 >>> a = '\x00\x00\x00\\d+'
 >>> print a
\d+
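The print above only shows \d+ because the three nulls in front of it are invisible; repr() makes them visible. The sketch below also tries the pattern out, assuming the goal really was "three nulls followed by digits". The string named sample is invented for illustration; data is the string from the original question.

```python
import re

a = '\x00\x00\x00\\d+'
print(repr(a))   # '\x00\x00\x00\\d+'
print(len(a))    # 6: three NULs, a backslash, 'd' and '+'

# Hypothetical input that has real digits after the nulls.
sample = 'on\x00\x00\x00123off'
print(re.findall(a, sample) == ['\x00\x00\x00123'])   # True

# In the posted data the nulls are followed by \x11, which is not a
# digit, so this pattern finds nothing there.
data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"
print(re.search(a, data) is None)   # True
```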


Your data is funny, too, since it almost looks like it might be a string 
representation of a Python list.  But assuming you meant it exactly like 
it is, there is a funny control character following the nulls.
>
> How could I match the \x00\x00\x00\x11 portion of the string?
>

There are no digits in that portion of the string, so I'm not sure why 
you were earlier trying to match digits.

Perhaps you meant you were trying to match the single control character 
x'11'.  In that case, you'd want

import re
a = '\x00\x00\x00\x11'
pchars = re.compile(a)
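Run against the data string from the original question, that pattern picks out exactly the four-character portion asked about. A minimal check, using the names from the thread plus a made-up name, found:

```python
import re

data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"

a = '\x00\x00\x00\x11'
pchars = re.compile(a)

found = pchars.findall(data)
print(found == ['\x00\x00\x00\x11'])   # True
print(len(found[0]))                   # 4
```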


But if you wanted to match an arbitrary character following the nulls, 
you'd want something different.
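As one possible reading of "an arbitrary character following the nulls", a dot after the three nulls would do it. This is only a sketch under that assumption: re.DOTALL is added so that a newline also counts as "any character", and the names any_after and other are hypothetical.

```python
import re

data = "['broadcast', 'd8on\x00\x00\x00\x11broadcast', 'd11on']"

# Three NULs, then any single character (including a newline).
any_after = re.compile('\x00\x00\x00.', re.DOTALL)

print(any_after.findall(data) == ['\x00\x00\x00\x11'])   # True

# Hypothetical second input where a newline follows the nulls.
other = 'd8on\x00\x00\x00\nbroadcast'
print(any_after.findall(other) == ['\x00\x00\x00\n'])    # True
```

Whether a wildcard, a character class, or a fixed set of control characters is right depends on which strings are supposed to match.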

I think you'd better supply several strings to match against, and show 
which ones you'd expect a match for.

-- 
DaveA

