String manipulation advice needed.

Bengt Richter bokr at oz.net
Wed Oct 13 19:59:01 EDT 2004


On Wed, 13 Oct 2004 22:26:23 +0200, "Fredrik Lundh" <fredrik at pythonware.com> wrote:

>"Raaijmakers, Vincent (GE Infrastructure)" wrote:
>
>> What is the easiest way of getting this information out of a string:
>
>making some reasonable assumptions, and generalising to any number
>of information instances:
>
>> foo = "My number 70 is what I want to parse"  => 70
>
>print re.findall("\d+", foo)
>
>> foo = "My info {info} between the curly brackets is what I want to parse" => {info}
>
>print re.findall("{[^}]+}", foo)
>
>> foo = "My  info between [hello world] is what I want to parse" => [hello world]
>
>print re.findall("\[[^]]+\]", foo)
>
>or, as a one-size-fits-all pattern:
>
>print re.findall("\d+|{[^}]+}|\[[^]]+\]", foo)
>
You can also tag the alternatives with names, so you know which one matched, e.g.,

 >>> foo = 'garbage {curly stuff} 123 [bracket stuff] 456'
 >>> import re
 >>> for m in re.finditer(r"(?P<dec>\d+)|(?P<curl>{[^}]+})|(?P<sqbk>\[[^]]+\])", foo):
 ...    for k,v in m.groupdict().items():
 ...       if v is None: continue   # skip the alternatives that did not match
 ...       print('%6s: %r' % (k,v))
 ...
   curl: '{curly stuff}'
    dec: '123'
   sqbk: '[bracket stuff]'
    dec: '456'

Probably not the fastest approach ;-) (I think I once did this a better way, but I can't recall it ;-)
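
One way to avoid looping over groupdict() is the match object's lastgroup
attribute, which holds the name of the last named group that matched. Since
exactly one alternative matches each time, that name is the tag. This is only
a sketch of one option, not necessarily the "better way" mentioned above; the
helper name tagged_matches is made up for illustration, and the code assumes
Python 3.

```python
import re

# Same three alternatives as above, written as a raw string.
PATTERN = re.compile(r"(?P<dec>\d+)|(?P<curl>\{[^}]+\})|(?P<sqbk>\[[^]]+\])")

def tagged_matches(text):
    """Return (group_name, matched_text) pairs in order of appearance.

    tagged_matches is a hypothetical helper, not part of the re module.
    """
    # Only one named alternative can match per hit, so m.lastgroup is its name.
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]

foo = 'garbage {curly stuff} 123 [bracket stuff] 456'
for name, value in tagged_matches(foo):
    print('%6s: %r' % (name, value))
```

This visits each match once and does not build a dictionary per match.
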
BTW, also note that nested {}'s or []'s would cause problems: [^}]+ stops at
the first closing brace, so only part of a nested group would be matched.
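
If nesting does matter, one possible workaround is a small hand-written
scanner that tracks the open brackets on a stack and reports only the
outermost balanced groups. The following is a minimal sketch assuming
Python 3; the names PAIRS and outermost_groups, and the choice to discard a
group on a mismatched closer, are illustrative assumptions and not from the
thread.

```python
# Opening bracket -> the closing bracket it expects (hypothetical helper table).
PAIRS = {'{': '}', '[': ']'}
CLOSERS = set(PAIRS.values())

def outermost_groups(text):
    """Return the outermost balanced bracketed spans in text, left to right."""
    stack = []     # closing characters expected for the brackets currently open
    start = None   # index where the current outermost group began
    found = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            if not stack:
                start = i              # a new outermost group begins here
            stack.append(PAIRS[ch])
        elif ch in CLOSERS and stack:
            if ch != stack.pop():
                # Mismatched closer: drop the current group (arbitrary policy).
                stack = []
                start = None
            elif not stack:
                found.append(text[start:i + 1])   # outermost group just closed
    return found

print(outermost_groups('a {b {c} d} e [f [g] h]'))
```

Unbalanced input (an opener that is never closed) simply yields no group for
that span; a real parser would probably want to report it instead.
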

Regards,
Bengt Richter


