sscanf ?

Bruce Edge bedge at troikanetworks.com
Tue Nov 20 18:47:17 EST 2001


In <Xns915FEF352160Dcliechtigmxnet at 62.2.16.82>, Chris Liechti wrote:

> [posted and mailed]
> 
> "Bruce Edge" <bedge at troikanetworks.com> wrote in
> news:2SzK7.605$px5.156976 at newsfeed.slurp.net:
>>> It wouldn't be hard to write a simple wrapping script that takes a
>>> printf format string, converts it to a regular expression, does a
>>> match against a string, then pulls out the arguments and converts them
>>> to types as appropriate.  But one might just as well be using regular
>>> expressions directly in the first place.
>>> 
>>> 
>> Here's what I did to convert the format string to a regex. It's not
>> pretty, and many types will break it, but it serves the purpose, for
>> now:
>> 
>> def fmtstr2regex( str ):
>>      regex = ""
>>      while len(str):
>>           if str[0] == '%':
>>                x = re.match("%(?P<len>\d*)(?P<type>\w)(?P<rest>.*)$",str)
> 
> you don't match "%6.3f" and similar, nor "%-4f"

I knew it was a bad idea to post this :) You're right of course, it's far
from complete.
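
For what it's worth, here is a rough sketch of how the specifier match might be widened to cover flags and precision. The names SPEC_RE and parse_spec are made up for illustration, and the flag set is only an assumption based on the usual C printf flags:

```python
import re

# Hypothetical pattern: optional flags, optional width, optional ".precision",
# then a single conversion letter.
SPEC_RE = re.compile(
    r"%(?P<flags>[-+ #0]*)(?P<width>\d*)(?:\.(?P<prec>\d+))?(?P<type>[a-zA-Z])"
)


def parse_spec(spec):
    """Return the parts of one printf-style specifier, or None if it does not parse."""
    m = SPEC_RE.match(spec)
    if m is None:
        return None
    return m.groupdict()


print(parse_spec("%6.3f"))
print(parse_spec("%-4f"))
```

How each part should then be turned into a regex fragment is a separate question that this sketch does not answer.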


>>                if not x:
>>                     exc = CommandException()
>>                     exc.reason = "Invalid print format specifier %s" % str
>>                     raise exc
> 
> when you define an exception like this:
>     class CommandException(Exception): pass
> 
> you can simply call: raise CommandException("reason %s" % str)

good to know.
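
A minimal sketch of what that looks like in practice. The helper check_spec is hypothetical and only exists to have something that raises:

```python
class CommandException(Exception):
    pass


def check_spec(spec):
    # Hypothetical check: anything not starting with '%' is rejected.
    if not spec.startswith("%"):
        raise CommandException("Invalid print format specifier %s" % spec)
    return spec


try:
    check_spec("d")
except CommandException as exc:
    # The message passed to the constructor comes back via str(exc).
    print(str(exc))
```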


> 
>>                length = int( x.group("len") )
> 
> needs an if here...

maybe it _was_ a good idea to post this.... Thanks...


>                if x.group("len"):
>                     length = int( x.group("len") )
>                else:
>                     length = None
> 
> 
>>                type = x.group("type")
>>                # Some types need to be changed from printf to regexp world
>>                if type == 'x':
>>                     type = '['+string.hexdigits+']'
>>                else:
>>                     type = "\\%s" % type
> 
> type 's' should also be transformed to 'w' ('\s' are whitespaces)
> 
>>                regex += "(%s" % type
>>                if length:
>>                     regex += "{%d,%d})" % ( length, length )
>>                else:
>>                     regex += "+)"
>>                str = x.group('rest')
>>           else:
>>                regex += "\%s" % str[0]
> 
> what's the intent of '\' here? if you want to escape the character you
> should write two (ok, "\%" -> '\\%', but writing it explicitly is better,
> like you did above). but escaping every character produces wrong regexes here.

You're right. For my test case I had dots in my string. I should just do
reserved chars.
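
One possible way to handle the literal part is to lean on re.escape from the standard library rather than a hand-kept list of reserved characters. A minimal sketch, where literal_to_regex is a hypothetical helper name:

```python
import re


def literal_to_regex(text):
    # re.escape backslash-escapes the characters that are special in a
    # regex, so the returned pattern matches `text` literally.
    return re.escape(text)


pattern = literal_to_regex("1.st ") + r"(\d+)"
print(re.match(pattern, "1.st 42").group(1))  # prints 42
```

The exact escaped text differs between Python versions, so it is safer to rely on what the pattern matches than on how it looks.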



> >>> print fmtstr2regex("1.st %d")
> \1\.\s\t\ (\d+)
> 
> '\s', '\t', '\1' are all special sequences in regexes; what you really want is
> r'1\.st (\d+)'
> 
> only reserved regex characters should be escaped. [](){}.+?*$^\
> 
> 
>>                str = str[1:]
>>      return regex
>> 
>> 
> just a sidenote: "type" is a builtin function, which you shadow with
> your variable
> 
> but overall a nice idea..
> chris
> 

OK, I think this is a bit better, haven't tested yet:


import re
import string


class CommandException(Exception):
    pass


def fmtstr2regex(fmt):
    """Convert a simple printf-style format string to a regex pattern."""
    regex = ""
    while fmt:
        if fmt[0] == '%':
            # Precision ("%6.3f") and flags ("%-4f") are still not handled!!
            x = re.match(r"%(?P<len>\d*)(?P<type>\w)(?P<rest>.*)$", fmt)
            if not x:
                raise CommandException("Invalid print format specifier %s" % fmt)
            if x.group("len"):
                length = int(x.group("len"))
            else:
                length = None
            data_type = x.group("type")
            # Some types need to be changed from printf to regexp world
            if data_type == 'x':
                data_type = '[' + string.hexdigits + ']'
            elif data_type == 's':
                # '\s' would match whitespace, so use '\w' as Chris suggested
                data_type = r"\w"
            else:
                data_type = "\\%s" % data_type
            regex += "(%s" % data_type
            if length:
                regex += "{%d,%d})" % (length, length)
            else:
                regex += "+)"
            fmt = x.group('rest')
        else:
            # Escape only the reserved regex characters
            if fmt[0] in "[](){}.+?*$^\\":
                regex += "\\%s" % fmt[0]
            else:
                regex += fmt[0]
            fmt = fmt[1:]
    return regex
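
The wrapper idea quoted at the top of the thread (format string to regex, match, then convert the captured text to types) could look roughly like the sketch below. It is not the function above: simple_sscanf, SPEC_RE and the CONVERSIONS table are all hypothetical names, only bare %d, %x, %f and %s are supported, and widths, flags and "%%" are deliberately ignored.

```python
import re

# Hypothetical table: conversion letter -> (regex fragment, converter).
# Fragments use only non-capturing groups so that capture groups line up
# one-to-one with the specifiers in the format string.
CONVERSIONS = {
    "d": (r"[-+]?\d+", int),
    "x": (r"[0-9a-fA-F]+", lambda s: int(s, 16)),
    "f": (r"[-+]?(?:\d+\.?\d*|\.\d+)", float),
    "s": (r"\S+", str),
}

SPEC_RE = re.compile(r"%([a-zA-Z])")


def simple_sscanf(text, fmt):
    """Match text against fmt; return a tuple of converted values or None."""
    parts = []
    converters = []
    pos = 0
    for m in SPEC_RE.finditer(fmt):
        # Literal text between specifiers is matched verbatim.
        parts.append(re.escape(fmt[pos:m.start()]))
        letter = m.group(1)
        if letter not in CONVERSIONS:
            raise ValueError("Unsupported format specifier %%%s" % letter)
        fragment, converter = CONVERSIONS[letter]
        parts.append("(" + fragment + ")")
        converters.append(converter)
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    match = re.match("".join(parts), text)
    if match is None:
        return None
    return tuple(conv(grp) for conv, grp in zip(converters, match.groups()))


print(simple_sscanf("1.st 42", "1.st %d"))
print(simple_sscanf("id ff name bob 2.5", "id %x name %s %f"))
```

Keeping the converter next to the regex fragment means the type conversion cannot drift out of step with what was matched, which is the part the plain regex version leaves to the caller.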


