what would be the regular expression for null byte present in a string

Denis McMahon denismfmcmahon at gmail.com
Tue Jan 13 12:25:11 EST 2015


On Tue, 13 Jan 2015 13:40:52 +0000, Shambhu Rajak wrote:

> I have a string that I get as an output of a command as:
> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
> 
> I want to fetch '10232ae8944a' from the above string.
> 
> I want to find a re pattern that would replace all the \x01..\x0z with
> the empty string '', so that I can get the desired portion of the
> string.
> 
> Can anyone help me with a working regex for it?

What have you tried, and what was the result?

A regex may not be the simplest tool for pulling text out of a byte 
string like this; ordinary string methods can do it.

>>> str = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'

>>> str.replace('\x00','').replace('\x0c','').replace('\x01','').replace('\x02','').replace('\n','')
'10232ae8944a'

This works for the specific example you gave. Will your "string" ever 
contain unwanted characters apart from \x00, \x01, \x02, \x0c and \n, and 
is it ever possible for one of those to be in the wanted set?
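
If the set of unwanted characters might grow, chaining replace() calls 
gets unwieldy. A minimal sketch of the same idea with the unwanted 
characters kept in one place; remove_chars is a hypothetical helper name, 
not something from the original post, and it still assumes none of those 
characters is ever wanted:

```python
def remove_chars(text, unwanted):
    """Return text with every character that appears in unwanted removed."""
    return ''.join(ch for ch in text if ch not in unwanted)

# The example string from the question, split only to keep lines short.
raw = ('\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00'
       '10232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n')

cleaned = remove_chars(raw, '\x00\x01\x02\x0c\n')
print(cleaned)
```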

>>> str[12:24]
'10232ae8944a'

This also works for the specific example you gave. Is the data you want 
to extract always going to be at the same offset in the string, and of 
the same length?
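
If the length can vary, the string itself may say how long the wanted 
part is. This is only a guess from the single example: the four bytes 
\x0c\x00\x00\x00 just before the wanted text happen to equal 12 when read 
as a little-endian 32-bit integer, which is the length of '10232ae8944a'. 
A sketch under that assumption (the field layout is not confirmed by 
anything in the question):

```python
import struct

# The example output as a byte string.
data = (b'\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00'
        b'10232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n')

# ASSUMPTION: bytes 8..11 hold the payload length as a little-endian
# unsigned 32-bit integer, and the payload starts at offset 12.
(length,) = struct.unpack_from('<I', data, 8)
payload = data[12:12 + length]
print(payload)
```

If the command's output format is documented somewhere, check the layout 
there before relying on this.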

>>> ''.join([c for c in str if ' ' <= c <= '~'])
'10232ae8944a'

This also works for the specific example you gave, and is a way to remove 
non-printing and 8-bit characters from a string. Is this what you 
actually want to do?

>>> str.strip('\x00\x0c\x01\x02\n')
'10232ae8944a'

This also works for the specific example that you gave. It uses the strip 
function with a string of characters to be stripped. Note that strip only 
removes characters from the two ends of the string, so this will work as 
long as you can predefine all the characters to strip, the result never 
needs to start or end with one of them, and no unwanted character ever 
appears in the middle of the wanted text.
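
To see the limits of strip, here is a small sketch with two made-up 
strings (they are illustrations, not output from the real command). The 
second one has a null byte in the middle of the text, and strip leaves it 
alone:

```python
unwanted = '\x00\x0c\x01\x02\n'

# Unwanted characters only at the two ends: strip removes all of them.
ends_only = '\x00\x01' + '10232ae8944a' + '\x02\n'
stripped_ends = ends_only.strip(unwanted)

# A null byte in the middle of the text: strip does not touch it.
embedded = '\x00\x01' + '1023' + '\x00' + '2ae8944a' + '\x02\n'
stripped_embedded = embedded.strip(unwanted)

print(repr(stripped_ends))
print(repr(stripped_embedded))
```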

So that is four different methods, each of which does exactly what you 
want in the case of the specific example you gave.

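However, although I tried a few patterns with match(), none of them did 
the job. Note that match() only matches at the very start of the string, 
and this string starts with '\x01', so each attempt returns None.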

eg:

>>> import re
>>> patt = re.compile(r'[0-9a-zA-Z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>

>>> patt = re.compile(r'[0-z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
>>> 

>>> patt = re.compile(r'[ -~]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
>>> 
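
For comparison, match() only tries the pattern at the start of the 
string, while search() scans forward for the first place it fits. A 
sketch using search() and sub() on the same example; it assumes the 
wanted text is the only run of letters and digits in the output and that 
everything to discard is an ASCII control character:

```python
import re

raw = ('\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00'
       '10232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n')

alnum = re.compile(r'[0-9a-zA-Z]+')

# match() anchors at position 0, where the string has '\x01'.
at_start = alnum.match(raw)

# search() finds the first run of letters and digits anywhere.
found = alnum.search(raw)
token = found.group(0) if found else None

# Closer to the original question: delete every control character
# (\x00 through \x1f, which includes the null byte and '\n').
no_controls = re.sub(r'[\x00-\x1f]+', '', raw)

print(token)
print(no_controls)
```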

-- 
Denis McMahon, denismfmcmahon at gmail.com


