Noob Parsing question

kai.peters at gmail.com
Wed Feb 18 11:57:38 EST 2015


> >> > Given
> >> >
> >> > data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
> >> >
> >> > How can I efficiently get dictionaries for each of the data blocks framed by <> ?
> >> >
> >> > Thanks for any help
> >>
> >> The question here is: What _can't_ happen? For instance, what happens
> >> if Fred's name contains a greater-than symbol, or a caret?
> >>
> >> If those absolutely cannot happen, your parser can be fairly
> >> straightforward. Just put together some basic splitting (maybe a
> >> regex), and then split on the caret inside that. Otherwise, you may
> >> need a more stateful parser.
> >>
> >> ChrisA
> >
> > The data string is guaranteed to be clean - no such irregularities occur.
> 
> Okay!
> 
> (Side point: You've stripped off all citations, here, so it's not
> clear who said what. My shorthand signature isn't as useful as the
> full line identifying date, time, and person. It's polite to keep
> those lines, at least for the first level of quoting.)
> 
> What you want can be done with a regular expression. (Yes, yes, I
> know; now you have two problems.)
> 
> >>> import re
> >>> data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
> >>> re.findall("<.*?>",data)
> ['<a=14^b=Fred^c=45.22^>', '<a=22^b=Joe^>', '<a=17^c=3.20^>', '<a=72^b=Soup^>']
> 
> From there, you can crack open the different pieces:
> 
> >>> for piece in re.findall("<.*?>",data):
> ...     d = {}
> ...     for elem in piece[1:-2].split("^"):
> ...         key, value = elem.split("=",1)
> ...         d[key] = value
> ...     print(d)
> ...
> {'c': '45.22', 'b': 'Fred', 'a': '14'}
> {'b': 'Joe', 'a': '22'}
> {'c': '3.20', 'a': '17'}
> {'b': 'Soup', 'a': '72'}
> 
> If you need some of those to be integers or floats, you'll need to do
> some post-processing on it, but this guarantees that you get the data
> out reliably. It depends on not having any of the special characters
> "=^<>" inside the elements, but other than that, it should be safe.
> 
> ChrisA

Thanks for your help - much appreciated!

KP
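
For the post-processing step mentioned above (turning some values into
integers or floats), something along these lines might work. This is only
a sketch: the helper names convert() and parse_blocks() are made up for
illustration, and it assumes, as stated in the thread, that none of the
characters "=^<>" appear inside the values.

```python
import re


def convert(value):
    """Return value as an int or float if it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_blocks(data):
    """Return one dict per <...> block, converting numeric-looking values."""
    records = []
    # The group captures the text between '<' and '>' without the brackets.
    for body in re.findall(r"<(.*?)>", data):
        record = {}
        for elem in body.split("^"):
            if elem:  # skip the empty string left by the trailing caret
                key, value = elem.split("=", 1)
                record[key] = convert(value)
        records.append(record)
    return records


data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
records = parse_blocks(data)
for record in records:
    print(record)
```

One caveat with guessing types this way: float() also accepts strings
such as "nan" and "inf", so a name spelled like that would be converted
too. If the keys have known types, an explicit mapping from key to type
would be safer than guessing.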


