OrderedDict

silver0346 at gmail.com silver0346 at gmail.com
Fri May 20 01:15:19 EDT 2016


On Wednesday, May 18, 2016 at 2:25:16 PM UTC+2, Peter Otten wrote:
> Chris Angelico wrote:
> 
> > On Wed, May 18, 2016 at 7:28 PM, Peter Otten <__peter__ at web.de> wrote:
> >> I don't see an official way to pass a custom dict type to the library,
> >> but if you are not afraid to change its source code the following patch
> >> will allow you to access the value of dictionaries with a single entry as
> >> d[0]:
> >>
> >> $ diff -u py2b_xmltodict/local/lib/python2.7/site-packages/xmltodict.py py2_xmltodict/local/lib/python2.7/site-packages/xmltodict.py
> >> --- py2b_xmltodict/local/lib/python2.7/site-packages/xmltodict.py      2016-05-18 11:18:44.000000000 +0200
> >> +++ py2_xmltodict/local/lib/python2.7/site-packages/xmltodict.py       2016-05-18 11:11:13.417665697 +0200
> >> @@ -35,6 +35,13 @@
> >>  __version__ = '0.10.1'
> >>  __license__ = 'MIT'
> >>
> >> +_OrderedDict = OrderedDict
> >> +class OrderedDict(_OrderedDict):
> >> +    def __getitem__(self, key):
> >> +        if key == 0:
> >> +            [result] = self.values()
> >> +            return result
> >> +        return _OrderedDict.__getitem__(self, key)
> >>
> >>  class ParsingInterrupted(Exception):
> >>      pass
> > 
> > Easier than patching might be monkeypatching.
> > 
> > class OrderedDict(OrderedDict):
> >     ... getitem code as above ...
> > xmltodict.OrderedDict = OrderedDict
> > 
> > Try it, see if it works.
> 
> It turns out I was wrong on (at least) two accounts: 
> 
> - xmltodict does offer a way to specify the dict type
> - the proposed dict implementation will not solve the OP's problem
> 
> Here is an improved fix which should work:
> 
> 
> $ cat sample.xml 
> <?xml version="1.0" encoding="utf-8" ?>
> <profiles>
>   <profile id='visio02' revision='2015051501' >
>   <package package-id='0964-gpg4win' />
>   </profile>
> </profiles>
> $ cat sample2.xml 
> <?xml version="1.0" encoding="utf-8" ?>
> <profiles>
>   <profile id='visio02' revision='2015051501' >
>   <package package-id='0964-gpg4win' />
>   <package package-id='0965-gpg4win' />
>   </profile>
> </profiles>
> $ cat demo.py
> import collections
> import sys
> import xmltodict
> 
> 
> class MyOrderedDict(collections.OrderedDict):
>     def __getitem__(self, key):
>         if key == 0 and len(self) == 1:
>             return self
>         return super(MyOrderedDict, self).__getitem__(key)
> 
> 
> def main():
>     filename = sys.argv[1]
>     with open(filename) as f:
>         doc = xmltodict.parse(f.read(), dict_constructor=MyOrderedDict)
> 
>     print "doc:\n{}\n".format(doc)
>     print "package-id: {}".format(
>         doc['profiles']['profile']['package'][0]['@package-id'])
> 
> 
> if __name__ == "__main__":
>     main()
> $ python demo.py sample.xml 
> doc:
> MyOrderedDict([(u'profiles', MyOrderedDict([(u'profile', 
> MyOrderedDict([(u'@id', u'visio02'), (u'@revision', u'2015051501'), 
> (u'package', MyOrderedDict([(u'@package-id', u'0964-gpg4win')]))]))]))])
> 
> package-id: 0964-gpg4win
> $ python demo.py sample2.xml 
> doc:
> MyOrderedDict([(u'profiles', MyOrderedDict([(u'profile', 
> MyOrderedDict([(u'@id', u'visio02'), (u'@revision', u'2015051501'), 
> (u'package', [MyOrderedDict([(u'@package-id', u'0964-gpg4win')]), 
> MyOrderedDict([(u'@package-id', u'0965-gpg4win')])])]))]))])
> 
> package-id: 0964-gpg4win

I have tested the first solution. It works nicely. Before, I used xml.etree to parse 2000 XML files.

Execution time decreased from more than 5 min to 20 sec. Great. On the weekend I will test the solution with the custom dict class.
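
For reference, here is a minimal Python 3 sketch of the custom-class idea from the quoted demo.py. So that it runs without xmltodict installed, the nested data is built by hand to mimic the shape shown in the quoted output; the names `single`, `multiple` and `first_package_id` are my own, for illustration only. With xmltodict itself, the class would be passed as dict_constructor=MyOrderedDict, as in the quoted demo.

```python
import collections


class MyOrderedDict(collections.OrderedDict):
    """OrderedDict whose single-entry instances also answer to index 0."""

    def __getitem__(self, key):
        # A lone <package> element is parsed to a dict, not a list, so let
        # d[0] return the dict itself when it holds exactly one entry.
        if key == 0 and len(self) == 1:
            return self
        return super().__getitem__(key)


# Hand-built stand-ins for the parsed <profile> element (assumption: same
# shape as the xmltodict output quoted above).
single = MyOrderedDict([
    ("package", MyOrderedDict([("@package-id", "0964-gpg4win")])),
])
multiple = MyOrderedDict([
    ("package", [
        MyOrderedDict([("@package-id", "0964-gpg4win")]),
        MyOrderedDict([("@package-id", "0965-gpg4win")]),
    ]),
])


def first_package_id(profile):
    # The same expression works for one package (dict) or many (list).
    return profile["package"][0]["@package-id"]


print(first_package_id(single))
print(first_package_id(multiple))
```

One caveat: len(self) == 1 counts the keys of the dict, so a lone element carrying more than one attribute would still raise KeyError on [0].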

Many thanks.


