Help needed with nested parsing of file into objects

Eelco hoogendoorn.eelco at gmail.com
Tue Jun 5 15:50:19 EDT 2012


> thank you both for your replies. Unfortunately it is a pre-existing
> file format imposed by an external system that I can't
> change. Thank you for the code snippet.

Hi Richard,

Despite the fact that it is a preexisting format, it is very close
indeed to valid YAML code.

Writing your own whitespace-aware parser can be a bit of a pain, but
since YAML does this for you, I would argue the cleanest solution
would be to bootstrap that functionality, rather than roll your own
solution or resort to hard-to-maintain regex voodoo.

Here is my solution. As a bonus, it directly constructs a custom
object hierarchy (obviously you would want to expand on this, but the
essentials are there). One caveat: at the moment, the conversion to
YAML relies on the apparent convention that instances never directly
contain other instances, and lists never directly contain lists. This
means all instances are list entries and get a '-' prepended, and this
just works. If this is not a general rule, you'd have to keep track of
an enclosing scope stack and emit dashes based on that. Anyway, the
idea is there, and I believe it to be one worth looking at.

<code>
import yaml

class A(yaml.YAMLObject):
    yaml_tag = u'!A'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'A' + str(self.__dict__)

class B(yaml.YAMLObject):
    yaml_tag = u'!B'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'B' + str(self.__dict__)

class C(yaml.YAMLObject):
    yaml_tag = u'!C'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'C' + str(self.__dict__)

class TestArray(yaml.YAMLObject):
    yaml_tag = u'!TestArray'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'TestArray' + str(self.__dict__)

class myList(yaml.YAMLObject):
    yaml_tag = u'!myList'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'myList' + str(self.__dict__)


data = \
"""
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3
An instance of TestArray
 a=1
 b=2
 c=3
""".strip()

#remove trailing whitespace and the seemingly erroneous colon in line 5 of
#the data; the two-space prefix gives transform() indentation to work with
lines = ['  '+line.rstrip().rstrip(':') for line in data.split('\n')]


def transform(lines):
    """transform text line by line"""
    for line in lines:
        #regular mapping lines: 'a=1' becomes 'a: 1'
        if line.find('=') > 0:
            yield line.replace('=', ': ')
        #instance lines: the tag is the word following 'nstance of '
        p = line.find('nstance of')
        if p > 0:
            s = p + 11    #len('nstance of ') == 11
            e = line[s:].find(' ')
            if e == -1: e = len(line[s:])
            tag = line[s:s+e]
            whitespace= line.partition(line.lstrip())[0]
            #replace the last two indent characters with ' -' to get a
            #tagged list entry
            yield whitespace[:-2]+' -'+ ' !'+tag
        #list lines: indentation is kept, and every list gets the same
        #key 'myList'
        p = line.find('List of')
        if p > 0:
            whitespace= line.partition(line.lstrip())[0]
            yield whitespace[:-2]+'  '+ 'myList:'

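##transformed = (transform( lines))
##for i,t in enumerate(transformed):
##    print('{:>3}{}'.format(i,t))

transformed = '\n'.join(transform( lines))
print(transformed)

#recent PyYAML versions require an explicit Loader argument; yaml.Loader
#is one the YAMLObject subclasses above register their tags with
res = yaml.load(transformed, Loader=yaml.Loader)
print(res)
print(yaml.dump(res))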
</code>
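
The caveat above mentions keeping track of an enclosing scope stack if
instances can directly contain instances, or lists can directly contain
lists. What follows is only a rough sketch of that idea, not something
tested against the real files. The function name to_yaml and the
'<tag>_instance' key used for an instance nested directly in another
instance are my own assumptions; the source format gives no name for
such a member. As in the code above, every list inside an instance is
emitted under the key 'myList'.

```python
def to_yaml(text):
    """Convert the indented format to YAML text using a scope stack."""
    out = []
    stack = []  # entries are (source_indent, kind); kind is 'list' or 'instance'
    for raw in text.splitlines():
        line = raw.rstrip().rstrip(':')
        body = line.lstrip()
        if not body:
            continue
        indent = len(line) - len(body)
        # leave every scope that this line is not nested inside
        while stack and stack[-1][0] >= indent:
            stack.pop()
        # the document root behaves like a list of instances
        parent = stack[-1][1] if stack else 'list'
        # output indentation comes from the nesting depth, not the source
        pad = '  ' * len(stack)
        if '=' in body:
            key, value = body.split('=', 1)
            out.append('{}{}: {}'.format(pad, key.strip(), value.strip()))
        elif body.startswith('List of'):
            # a list needs a key inside an instance, a bare dash inside a list
            out.append(pad + ('-' if parent == 'list' else 'myList:'))
            stack.append((indent, 'list'))
        elif 'nstance of' in body:
            tag = body.split('nstance of', 1)[1].split()[0]
            if parent == 'list':
                out.append('{}- !{}'.format(pad, tag))
            else:
                # hypothetical key name; the format does not provide one
                out.append('{}{}_instance: !{}'.format(pad, tag.lower(), tag))
            stack.append((indent, 'instance'))
    return '\n'.join(out)


sample = """
An instance of TestArray
 a=a
 List of 1 A elements:
  Instance of A element
   b=2
 Instance of B element
  c=3
An instance of TestArray
 d=4
"""

print(to_yaml(sample))
```

Dashes are emitted only when the enclosing scope is a list, so the
B instance sitting directly inside the first TestArray becomes a keyed
member rather than a list entry. The result can be passed to yaml.load
in the same way as the transformed text above.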


