Help needed with nested parsing of file into objects

richard pullenjenna10 at gmail.com
Tue Jun 5 16:18:13 EDT 2012


On Jun 5, 8:50 pm, Eelco <hoogendoorn.ee... at gmail.com> wrote:
> > Thank you both for your replies. Unfortunately it is a pre-existing
> > file format imposed by an external system that I can't
> > change. Thank you for the code snippet.
>
> Hi Richard,
>
> Despite the fact that it is a preexisting format, it is very close
> indeed to valid YAML code.
>
> Writing your own whitespace-aware parser can be a bit of a pain, but
> since YAML does this for you, I would argue the cleanest solution
> would be to build on that functionality rather than roll your own
> solution or resort to hard-to-maintain regex voodoo.
>
> Here is my solution. As a bonus, it directly constructs a custom
> object hierarchy (obviously you would want to expand on this, but the
> essentials are there). One caveat: at the moment, the conversion to
> YAML relies on the apparent convention that instances never directly
> contain other instances, and lists never directly contain lists. This
> means all instances are list entries and get a '-' prepended, and this
> just works. If this is not a general rule, you'd have to keep track of
> an enclosing scope stack and emit dashes based on that. Anyway, the
> idea is there, and I believe it is worth looking at.
>
> <code>
> import yaml
>
> class A(yaml.YAMLObject):
>     yaml_tag = u'!A'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'A' + str(self.__dict__)
>
> class B(yaml.YAMLObject):
>     yaml_tag = u'!B'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'B' + str(self.__dict__)
>
> class C(yaml.YAMLObject):
>     yaml_tag = u'!C'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'C' + str(self.__dict__)
>
> class TestArray(yaml.YAMLObject):
>     yaml_tag = u'!TestArray'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'TestArray' + str(self.__dict__)
>
> class myList(yaml.YAMLObject):
>     yaml_tag = u'!myList'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'myList' + str(self.__dict__)
>
> data = \
> """
> An instance of TestArray
>  a=a
>  b=b
>  c=c
>  List of 2 A elements:
>   Instance of A element
>    a=1
>    b=2
>    c=3
>   Instance of A element
>    d=1
>    e=2
>    f=3
>  List of 1 B elements
>   Instance of B element
>    a=1
>    b=2
>    c=3
>    List of 2 C elements
>     Instance of C element
>      a=1
>      b=2
>      c=3
>     Instance of C element
>      a=1
>      b=2
>      c=3
> An instance of TestArray
>  a=1
>  b=2
>  c=3
> """.strip()
>
> # remove trailing whitespace and the seemingly erroneous colon in line 5,
> # and indent every line by two extra spaces
> lines = ['  ' + line.rstrip().rstrip(':') for line in data.split('\n')]
>
> def transform(lines):
>     """transform text line by line"""
>     for line in lines:
>         #regular mapping lines
>         if line.find('=') > 0:
>             yield line.replace('=', ': ')
>         #instance lines
>         p = line.find('nstance of')
>         if p > 0:
>             s = p + len('nstance of ')  # start of the type name
>             e = line[s:].find(' ')
>             if e == -1: e = len(line[s:])
>             tag = line[s:s+e]
>             whitespace= line.partition(line.lstrip())[0]
>             yield whitespace[:-2]+' -'+ ' !'+tag
>         #list lines
>         p = line.find('List of')
>         if p > 0:
>             whitespace= line.partition(line.lstrip())[0]
>             yield whitespace[:-2]+'  '+ 'myList:'
>
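> ##transformed = (transform(lines))
> ##for i, t in enumerate(transformed):
> ##    print('{:>3}{}'.format(i, t))
>
> transformed = '\n'.join(transform(lines))
> print(transformed)
>
> # the full Loader is needed to construct the custom !A, !B, ... tags,
> # and PyYAML 6 requires the Loader argument to be passed explicitly
> res = yaml.load(transformed, Loader=yaml.Loader)
> print(res)
> print(yaml.dump(res))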
> </code>
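Eelco's remark about tracking an enclosing scope stack also suggests a way to recover the nesting with the standard library alone. The following is a minimal sketch, not code from the thread, and it rests on assumptions: indentation alone determines nesting (consistent spaces, no mixed tabs), attribute lines always contain '=', and every other line is a header (an instance or a list). The Node class and build_tree function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A header line (an instance or a list) with its attributes and children."""
    header: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build_tree(text):
    """Nest lines by indentation using a stack of (indent, node) scopes."""
    root = Node('root')
    stack = [(-1, root)]  # sentinel scope that every real line nests under
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        line = raw.strip().rstrip(':')  # also drop stray trailing colons
        # Close every scope that is not shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if '=' in line:
            key, _, value = line.partition('=')
            parent.attrs[key.strip()] = value.strip()
        else:
            node = Node(line)
            parent.children.append(node)
            stack.append((indent, node))  # open a new scope for its body
    return root
```

Because a dedented line pops every scope at its own level or deeper, a single line can close several scopes at once, as the second "An instance of TestArray" does. A later pass could then map a header such as "Instance of A element" to a class.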

Hi Eelco, many thanks for the reply/solution; it definitely looks like
a clean way to go about it. However, I don't think installing third-party
libraries like yaml on the server is on the cards at the moment.


