what would be the regular expression for null byte present in a string
Denis McMahon
denismfmcmahon at gmail.com
Tue Jan 13 12:25:11 EST 2015
On Tue, 13 Jan 2015 13:40:52 +0000, Shambhu Rajak wrote:
> I have a string that I get as an output of a command as:
> '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
>
> I want to fetch '10232ae8944a' from the above string.
>
> I want to find a re pattern that replaces all the \x01..\x0z characters
> with the empty string '', so that I can get the desired portion of the
> string.
>
> Can anyone help me with a working regex for it?
What have you tried, and what was the result?
A regex may not be the simplest tool for data like this; plain string
methods can do the job.
>>> str = '\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x0010232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n'
>>> str.replace('\x00','').replace('\x0c','').replace('\x01','').replace('\x02','').replace('\n','')
'10232ae8944a'
This works for the specific example you gave. Will your "string" ever
contain unwanted characters apart from \x00, \x01, \x02, \x0c and \n, and
is it ever possible for one of those to be in the wanted set?
>>> str[12:24]
'10232ae8944a'
This also works for the specific example you gave. Is the data you want
to extract always going to be at the same offset in the string, and of
the same length?
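If the offset and length are not fixed, the sample itself hints at a
possible layout: the four bytes at offsets 8 to 11 are \x0c\x00\x00\x00,
which read as a little-endian 32-bit integer give 12, exactly the length
of the wanted text. The following is only a sketch under that assumption.
The layout is guessed from a single sample, the name extract_payload is
made up for illustration, and the data is written as a Python 3 bytes
literal:

```python
import struct

data = (b'\x01\x00\x00\x00\x00\x00\x00\x00'   # assumed 8-byte header field
        b'\x0c\x00\x00\x00'                   # assumed little-endian length (12)
        b'10232ae8944a'                       # the wanted payload
        b'\x02\x00\x00\x00\x00\x00\x00\x00\n')

def extract_payload(buf, length_offset=8):
    """Read a 4-byte little-endian length at length_offset and return
    that many bytes following it. The layout is an assumption."""
    (length,) = struct.unpack_from('<I', buf, length_offset)
    start = length_offset + 4
    return buf[start:start + length]

payload = extract_payload(data)
print(payload.decode('ascii'))
```

Whether this holds for other outputs of the command would need checking
against more samples or the command's documentation.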
>>> ''.join(c for c in str if ' ' <= c <= '~')
'10232ae8944a'
This also works for the specific example you gave, and is a way to remove
non-printing and 8-bit characters from a string. Is this what you actually
want to do?
>>> str.strip('\x00\x0c\x01\x02\n')
'10232ae8944a'
This also works for the specific example that you gave. It uses the strip
method with a string of characters to be stripped. Note that strip only
removes characters from the two ends of the string, so this will work as
long as you can predefine all the characters to strip, the wanted portion
never begins or ends with one of them, and no unwanted character sits in
the middle of it.
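To illustrate the limits of strip: it trims from each end and stops at the
first character that is not in the given set, so an unwanted character
sitting between two wanted ones survives. A small sketch, where the second
sample is made up and the variable names are purely illustrative:

```python
junk = '\x00\x0c\x01\x02\n'

sample = '\x01\x00\x0c\x0010232ae8944a\x02\x00\n'
embedded = '\x01\x001023\x002ae8944a\x02\n'   # made-up: a null inside the payload

clean = sample.strip(junk)       # junk only at the ends: fully cleaned
partial = embedded.strip(junk)   # the inner '\x00' is left in place

print(repr(clean))
print(repr(partial))
```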
So that is four different methods, each of which seems to do exactly what
you want in the case of the specific example you gave.
However, although I tried a few patterns with match, I could not get a
regex to do the job.
For example:
>>> patt = re.compile(r'[0-9a-zA-Z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
>>> patt = re.compile(r'[0-z]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
>>>
>>> patt = re.compile(r'[ -~]+')
>>> res = patt.match(str)
>>> res
>>> print res
None
>>> type(res)
<type 'NoneType'>
>>>
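For what it is worth, the attempts above probably fail because of match
rather than because of the patterns: re.match only succeeds if the pattern
matches at position 0, and the string starts with '\x01'. re.search scans
the whole string instead. A sketch in Python 3 syntax (the name s is used
in place of str to avoid shadowing the builtin), which also shows re.sub
deleting control characters, as originally asked:

```python
import re

s = ('\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x00\x00'
     '10232ae8944a\x02\x00\x00\x00\x00\x00\x00\x00\n')

patt = re.compile(r'[0-9a-zA-Z]+')

# match() is anchored at the start of the string, which is '\x01' here.
anchored = patt.match(s)

# search() finds the first run of alphanumerics anywhere in the string.
found = patt.search(s).group(0)

# Delete every control character in the range \x00-\x1f (this includes '\n').
stripped = re.sub(r'[\x00-\x1f]', '', s)

print(anchored, found, stripped)
```

The same caveats apply as for the other methods: the search version
returns only the first alphanumeric run, and the sub version assumes no
control character is ever part of the wanted text.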
--
Denis McMahon, denismfmcmahon at gmail.com