sscanf ?

Chris Liechti cliechti at gmx.net
Tue Nov 20 17:29:13 EST 2001


[posted and mailed]

"Bruce Edge" <bedge at troikanetworks.com> wrote in
news:2SzK7.605$px5.156976 at newsfeed.slurp.net: 
>> It wouldn't be hard to write a simple wrapping script that takes a
>> printf format string, converts it to a regular expression, does a
>> match against a string, then pulls out the arguments and converts them
>> to types as appropriate.  But one might just as well be using regular
>> expressions directly in the first place.
>> 
> 
> Here's what I did to convert the format string to a regex.
> It's not pretty, and many types will break it, but it serves the
> purpose, for now:
> 
> def fmtstr2regex( str ):
>      regex = ""
>      while len(str):
>           if str[0] == '%':
>                x = re.match("%(?P<len>\d*)(?P<type>\w)(?P<rest>.*)$",str)

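this pattern doesn't match "%6.3f" and similar, nor "%-4f": it only allows
digits between the '%' and the type letter, so flags and a precision are
rejected.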
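
A rough sketch of a pattern that also accepts flags and a precision. The group names "flags" and "prec" and the helper parse_spec are made up for illustration ("len" and "type" are kept from the quoted code), and the sketch assumes a current Python 3 interpreter:

```python
import re

# Hypothetical pattern: optional flags, optional width, optional ".precision",
# then a single conversion letter.
SPEC = re.compile(
    r"%(?P<flags>[-+ #0]*)(?P<len>\d*)(?:\.(?P<prec>\d+))?(?P<type>[a-zA-Z])"
)


def parse_spec(text):
    """Return the parts of a printf spec at the start of text, or None."""
    m = SPEC.match(text)
    if m is None:
        return None
    return m.groupdict()


print(parse_spec("%6.3f"))
print(parse_spec("%-4f"))
```

A missing width comes back as an empty string and a missing precision as None, so the caller still has to check both before calling int().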

>                if not x:
>                     exc = CommandException()
>                     exc.reason = "Invalid print format specifier %s" % str
>                     raise exc

when you define an exception like this:

    class CommandException(Exception): pass

you can simply write:

    raise CommandException("reason %s" % str)
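
A minimal sketch of that pattern, assuming a current Python 3 interpreter. The helper check_spec is hypothetical and only exists to have something that raises:

```python
class CommandException(Exception):
    """Error in a command; the message is stored by the Exception base class."""


def check_spec(spec):
    """Raise CommandException unless spec starts with '%'."""
    if not spec.startswith("%"):
        # The message is passed straight to the constructor; no separate
        # "reason" attribute has to be set by hand.
        raise CommandException("Invalid print format specifier %s" % spec)
    return spec


try:
    check_spec("x")
except CommandException as exc:
    print(exc)  # prints: Invalid print format specifier x
```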


>                length = int( x.group("len") )

needs an if here, because int("") raises a ValueError when no length is
given:

               if x.group("len"):
                    length = int( x.group("len") )
               else:
                    length = None


>                type = x.group("type")
>                # Some types need to be changed from printf to regexp world
>                if type == 'x':
>                     type = '['+string.hexdigits+']'
>                else:
>                     type = "\\%s" % type

type 's' should also be transformed to 'w' (in a regex '\s' matches
whitespace, so "%s" would otherwise capture blanks instead of a word)

>                regex += "(%s" % type
>                if length:
>                     regex += "{%d,%d})" % ( length, length )
>                else:
>                     regex += "+)"
>                str = x.group('rest')
>           else:
>                regex += "\%s" % str[0]

what's the intent of '\' here? if you want to escape the character you should
write two backslashes (ok, "\%" -> '\\%', but writing it explicitly is better,
like you did above). but escaping every character produces wrong regexes here.

>>> print fmtstr2regex("1.st %d")
\1\.\s\t\ (\d+)

'\s', '\t' and '\1' all have special meanings in regexes; what you really
want is
r'1\.st (\d+)'

only reserved regex characters should be escaped: [](){}.+?*$^\|
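
One way to do that is sketched here, assuming a current Python 3 interpreter. The names RESERVED and escape_literal are made up for illustration, and '|' is included in the set because it is also a regex metacharacter:

```python
# Characters that carry special meaning in a regex and need a backslash
# when they should be matched literally.
RESERVED = set("[](){}.+?*$^\\|")


def escape_literal(text):
    """Backslash-escape only the reserved regex characters in text."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_literal("1.st ") + r"(\d+)")  # prints: 1\.st (\d+)
```

Letters, digits and blanks pass through untouched, so no accidental '\s', '\t' or '\1' sequences are produced.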


>                str = str[1:]
>      return regex
> 

just a sidenote: "type" is a builtin function, which you shadow with your
variable (the same goes for "str")
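
Taken together, the remarks above could be sketched roughly as follows. This is not a drop-in replacement: it assumes a current Python 3 interpreter, handles only the d, x and s conversions with an optional width, and the names CONVERSIONS, SPEC, RESERVED and sscanf are hypothetical. Returning the converters next to the regex is a design choice of this sketch, so that the matched text can be turned into typed values as suggested at the top of the thread.

```python
import re
import string

# Conversion letter -> (regex character class, converter for the match).
# Only these three letters are supported in this sketch.
CONVERSIONS = {
    "d": (r"\d", int),
    "x": ("[" + string.hexdigits + "]", lambda s: int(s, 16)),
    "s": (r"\w", str),
}
SPEC = re.compile(r"%(?P<len>\d*)(?P<type>[a-zA-Z])")
RESERVED = "[](){}.+?*$^\\|"


def fmtstr2regex(fmt):
    """Return (regex, converters) for a printf-style format string."""
    regex, converters, pos = "", [], 0
    while pos < len(fmt):
        ch = fmt[pos]
        if ch == "%":
            m = SPEC.match(fmt, pos)
            if m is None or m.group("type") not in CONVERSIONS:
                raise ValueError("Invalid print format specifier %s" % fmt[pos:])
            char_class, convert = CONVERSIONS[m.group("type")]
            width = m.group("len")
            # A fixed width becomes {n}; no width means one or more characters.
            regex += "(%s%s)" % (char_class, "{%s}" % width if width else "+")
            converters.append(convert)
            pos = m.end()
        else:
            # Escape only reserved regex characters, copy everything else.
            regex += "\\" + ch if ch in RESERVED else ch
            pos += 1
    return regex, converters


def sscanf(fmt, text):
    """Match text against fmt; return a tuple of converted values or None."""
    regex, converters = fmtstr2regex(fmt)
    m = re.match(regex, text)
    if m is None:
        return None
    return tuple(conv(group) for conv, group in zip(converters, m.groups()))


print(fmtstr2regex("1.st %d")[0])       # prints: 1\.st (\d+)
print(sscanf("%4x %s", "00ff abc"))     # prints: (255, 'abc')
```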

but overall a nice idea..
chris

-- 
Chris <cliechti at gmx.net>



