sscanf ?
Bruce Edge
bedge at troikanetworks.com
Tue Nov 20 18:47:17 EST 2001
In <Xns915FEF352160Dcliechtigmxnet at 62.2.16.82>, Chris Liechti wrote:
> [posted and mailed]
>
> "Bruce Edge" <bedge at troikanetworks.com> wrote in
> news:2SzK7.605$px5.156976 at newsfeed.slurp.net:
>>> It wouldn't be hard to write a simple wrapping script that takes a
>>> printf format string, converts it to a regular expression, does a
>>> match against a string, then pulls out the arguments and converts them
>>> to types as appropriate. But one might just as well be using regular
>>> expressions directly in the first place.
>>>
>>>
>> Here's what I did to convert the format string to a regex. It's not
>> pretty, and many types will break it, but it serves the purpose, for
>> now:
>>
>> def fmtstr2regex( str ):
>>     regex = ""
>>     while len(str):
>>         if str[0] == '%':
>>             x = re.match("%(?P<len>\d*)(?P<type>\w)(?P<rest>.*)$",str)
>
> you don't match "%6.3f" and similar, nor "%-4f"
I knew it was a bad idea to post this :) You're right of course, it's far
from complete.
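To cover the cases Chris mentions, one option is a fuller pattern for a single conversion spec. This is a sketch, not code from the thread. The name SPEC_RE is hypothetical, and the accepted flag set [-+ #0] is an assumption based on the usual C printf flags.

```python
import re

# Hypothetical, broader pattern for ONE printf conversion spec:
# optional flags, optional width, optional ".precision", then the type letter.
SPEC_RE = re.compile(
    r"%(?P<flags>[-+ #0]*)"     # flags, e.g. '-' in "%-4f" (assumed flag set)
    r"(?P<width>\d*)"           # optional field width, e.g. 6 in "%6.3f"
    r"(?:\.(?P<prec>\d+))?"     # optional precision, e.g. 3 in "%6.3f"
    r"(?P<type>[a-zA-Z])"       # conversion letter
)

for spec in ("%d", "%6.3f", "%-4f", "%08x"):
    m = SPEC_RE.match(spec)
    print(spec, repr(m.group("flags")), repr(m.group("width")),
          m.group("prec"), m.group("type"))
```

Width and precision come back as separate groups, so a converter can decide for itself whether a width should become a `{n}` repeat count.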
>>             if not x:
>>                 exc = CommandException()
>>                 exc.reason = "Invalid print format specifier %s" % str
>>                 raise exc
>
> When you define an exception like this:
>     class CommandException(Exception): pass
>
> you can simply write: raise CommandException("reason %s" % str)
Good to know.
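A minimal sketch of that idiom, using a hypothetical check_spec() helper: the message passed to the constructor becomes str(e) (and e.args[0]) when the exception is caught, so the separate .reason attribute is not needed.

```python
class CommandException(Exception):
    pass

def check_spec(spec):
    # Hypothetical helper: reject anything that does not start with '%'.
    if not spec.startswith("%"):
        # The message goes straight into the constructor.
        raise CommandException("Invalid print format specifier %s" % spec)
    return spec

try:
    check_spec("d")
except CommandException as e:
    print("error:", e)   # str(e) is the message given to the constructor
```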
>
>>             length = int( x.group("len") )
>
> needs an if here...
Maybe it _was_ a good idea to post this... Thanks!
>             if x.group("len"):
>                 length = int( x.group("len") )
>             else:
>                 length = None
>
>
>>             type = x.group("type")
>>             # Some types need to be changed from printf to regexp world
>>             if type == 'x':
>>                 type = '['+string.hexdigits+']'
>>             else:
>>                 type = "\\%s" % type
>
> Type 's' should also be transformed to 'w' ('\s' matches whitespace).
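A quick check of the difference on an arbitrary sample string. C's scanf %s reads a run of non-whitespace characters, so '\S' may be a closer fit than '\w', which stops at punctuation such as '='. Which one is right depends on the input you expect.

```python
import re

text = "name=bruce_edge x"

# '\s' matches whitespace, so a naive '%s' -> '\s' translation finds nothing here.
print(re.match(r"(\s+)", text))              # None
# '\w' (Chris's suggestion) matches letters, digits and '_': it stops at '='.
print(re.match(r"(\w+)", text).group(1))     # name
# '\S' matches any run of non-whitespace, closer to C scanf's %s.
print(re.match(r"(\S+)", text).group(1))     # name=bruce_edge
```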
>
>>             regex += "(%s" % type
>>             if length:
>>                 regex += "{%d,%d})" % ( length, length )
>>             else:
>>                 regex += "+)"
>>             str = x.group('rest')
>>         else:
>>             regex += "\%s" % str[0]
>
> What's the intent of '\' here? If you want to escape the character you
> should write two backslashes (OK, "\%" gives '\\%' anyway, but writing it
> explicitly is better, as you did above). But escaping every character
> produces wrong regexes here:
You're right. For my test case I had dots in my string. I should escape
only the reserved chars.
>>>> print fmtstr2regex("1.st %d")
> \1\.\s\t\ (\d+)
>
> '\s', '\t' and '\1' are all special sequences in regexes; what you really
> want is r'1\.st (\d+)'
>
> Only reserved regex characters should be escaped: [](){}.+?*$^\
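A sketch of escaping only that reserved set, using a hypothetical escape_reserved() helper. The standard library's re.escape() is the ready-made alternative. It also escapes a few more characters, for example '|' (alternation, which the list above leaves out) and the space; escaping those is harmless.

```python
import re

RESERVED = "[](){}.+?*$^\\"   # the reserved set listed above

def escape_reserved(literal):
    # Backslash-escape only the reserved characters; copy everything else as-is.
    return "".join("\\" + c if c in RESERVED else c for c in literal)

pattern = escape_reserved("1.st ") + r"(\d+)"
print(pattern)                                  # 1\.st (\d+)
print(re.match(pattern, "1.st 42").group(1))    # 42

# Same match using the standard library's escaping instead.
print(re.match(re.escape("1.st ") + r"(\d+)", "1.st 42").group(1))  # 42
```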
>
>
>>             str = str[1:]
>>     return regex
>>
>>
> Just a side note: "type" is a built-in function, which your variable
> shadows.
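Here is what the shadowing does in practice, in a small hypothetical example. Once a local variable is named type, the built-in type() is unreachable inside that function. The parameter name str in fmtstr2regex hides the built-in str() in the same way.

```python
def describe(value):
    type = "x"                   # local name hides the built-in type()
    try:
        return type(value)       # calls the string "x", not the built-in
    except TypeError as e:
        return "TypeError: %s" % e

def describe_fixed(value):
    data_type = "x"              # a distinct name leaves type() usable
    return data_type, type(value).__name__

print(describe(3))        # TypeError: 'str' object is not callable
print(describe_fixed(3))  # ('x', 'int')
```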
>
> But overall, a nice idea.
> chris
>
OK, I think this is a bit better, though I haven't tested it yet:
import re
import string

class CommandException(Exception):
    pass

def fmtstr2regex(fmt):
    # 'fmt' instead of 'str' so the built-in str() is not shadowed either
    regex = ""
    while fmt:
        if fmt[0] == '%':
            # "%6.3f" and similar, and "%-4f", are not handled!!
            x = re.match(r"%(?P<len>\d*)(?P<type>\w)(?P<rest>.*)$", fmt)
            if not x:
                raise CommandException("Invalid print format specifier %s" % fmt)
            if x.group("len"):
                length = int(x.group("len"))
            else:
                length = None
            data_type = x.group("type")
            # Some types need to be changed from printf to regexp world
            if data_type == 'x':
                data_type = '[' + string.hexdigits + ']'
            elif data_type == 's':
                # '\s' would mean whitespace; use '\w' as suggested
                data_type = r'\w'
            else:
                data_type = "\\%s" % data_type
            regex += "(%s" % data_type
            if length:
                regex += "{%d,%d})" % (length, length)
            else:
                regex += "+)"
            fmt = x.group('rest')
        else:
            # Escape only the characters that are reserved in regexes
            if fmt[0] in "[](){}.+?*$^\\":
                regex += "\\%s" % fmt[0]
            else:
                regex += fmt[0]
            fmt = fmt[1:]
    return regex
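The wrapper described at the top of the thread (convert the format string, match it, then convert the captured fields to types) can be sketched end to end. This is a hypothetical sscanf() helper, not the code posted above. It replaces the character-by-character loop with re.escape() on the literal text and a small lookup table of conversions. The table's regexes and converters are assumptions, and widths, precisions, flags and %% are not handled.

```python
import re

# Hypothetical table: conversion letter -> (regex fragment, converter).
CONVERSIONS = {
    "d": (r"[-+]?\d+", int),
    "x": (r"[0-9a-fA-F]+", lambda s: int(s, 16)),
    "f": (r"[-+]?(?:\d+\.?\d*|\.\d+)", float),
    "s": (r"\S+", str),
}

def sscanf(text, fmt):
    """Match text against a printf-style fmt and return the converted values,
    or None if it does not match. Raises ValueError for unsupported specs."""
    regex, converters, pos = "", [], 0
    for m in re.finditer(r"%(.)", fmt):
        if m.group(1) not in CONVERSIONS:
            raise ValueError("unsupported conversion %%%s" % m.group(1))
        regex += re.escape(fmt[pos:m.start()])   # literal text between specs
        pattern, convert = CONVERSIONS[m.group(1)]
        regex += "(%s)" % pattern                # one capture group per spec
        converters.append(convert)
        pos = m.end()
    regex += re.escape(fmt[pos:])                # trailing literal text
    match = re.fullmatch(regex, text)
    if match is None:
        return None
    return [convert(g) for convert, g in zip(converters, match.groups())]

print(sscanf("port 8080 mask ff", "port %d mask %x"))   # [8080, 255]
print(sscanf("1.st 42", "1.st %d"))                     # [42]
```

Keeping the regex fragment and the converter side by side in one table means each supported type is added in one place.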
More information about the Python-list mailing list