Parsing Nested List

Sun Feb 4 23:06:27 EST 2018

On Sunday, February 4, 2018 at 5:32:51 PM UTC-6, Stanley Denman wrote:
> On Sunday, February 4, 2018 at 4:26:24 PM UTC-6, Stanley Denman wrote:
> > I am trying to parse a Python nested list, the result of the getOutlines() function of the PyPDF2 module, using the pyparsing module. This is the result I get. What in the world is 'expandtabs', and why is it making a difference to my parse attempt?
> > 
> > Python Code
> > import PyPDF2, pyparsing
> > from pyparsing import Word, alphas, nums
> > pdfFileObj=open('x.pdf','rb')
> > pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
> > List=pdfReader.getOutlines()
> > myparser = Word(alphas) + Word(nums, exact=2) + "of" + Word(nums, exact=2)
> > myparser.parseString(List)
> > 
> > This is the error I get:
> > 
> > Traceback (most recent call last):
> >   File "<pyshell#23>", line 1, in <module>
> >     myparser.parseString(List)
> >   File "C:\python\lib\site-packages\pyparsing.py", line 1620, in parseString
> >     instring = instring.expandtabs()
> > AttributeError: 'list' object has no attribute 'expandtabs'
> > 
> > Thanks so much; I'm not getting any helpful responses from https://python-forum.io.
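The traceback shows pyparsing calling expandtabs() on its input. That is a str method, so parseString() needs a string, not the list that getOutlines() returns. A minimal sketch of one way around this, assuming (as in PyPDF2) that each outline entry exposes its text through a .title attribute and that a nested list holds the children of the entry before it: flatten the outline into plain title strings first. The Bookmark class and the sample titles below are hypothetical stand-ins so the sketch runs without a PDF.

```python
# Hypothetical stand-in for a PyPDF2 outline entry; the real entries are
# assumed to expose the bookmark text through a .title attribute.
class Bookmark:
    def __init__(self, title):
        self.title = title


def flatten_titles(outline):
    """Yield every bookmark title in a nested outline, depth first."""
    for item in outline:
        if isinstance(item, list):
            # A nested list holds the children of the preceding bookmark.
            yield from flatten_titles(item)
        else:
            yield item.title


# Hypothetical outline shaped like a getOutlines() result.
outline = [
    Bookmark("Section A"),
    [Bookmark("Exhibit 1A")],
    Bookmark("Section F"),
    [Bookmark("Exhibit 1F"), Bookmark("Exhibit 2F")],
]

titles = list(flatten_titles(outline))
print(titles)
# Each title is now a str, which is what parseString() expects, e.g.
#   for t in titles: myparser.parseString(t)
```

Titles that don't match the grammar would still raise a ParseException, so in practice each parseString() call may need a try/except.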

I have found that I can use the index values in the list to print out the section I need. So print(MyList[7]) gets me to Section F, which is the one I want, and print(MyList[9][1]), for example, gives me a string that is the bookmark entry for Exhibit 1F. But these index values would presumably be different for each PDF file: there may not always be Sections A-E, but there will always be a Section F. In other words, the index values that get me to the right section would differ from one PDF file to the next.
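One way to avoid hard-coded indexes is to search for the section by its title and take the nested list that follows it. This is a minimal sketch, assuming that Section F is a top-level bookmark whose title starts with "Section F" and that its children (the exhibits) appear as a list immediately after it, as PyPDF2 outlines are assumed to be laid out. The Bookmark class, children_of() helper, and both sample outlines are hypothetical.

```python
# Hypothetical stand-in for a PyPDF2 outline entry with a .title attribute.
class Bookmark:
    def __init__(self, title):
        self.title = title


def children_of(outline, prefix):
    """Return the child list of the first top-level bookmark whose title
    starts with prefix; [] if it has no children, None if not found."""
    for i, item in enumerate(outline):
        if isinstance(item, list):
            continue  # skip child lists; only match top-level bookmarks
        if item.title.startswith(prefix):
            # Children, if any, follow their parent as a nested list.
            if i + 1 < len(outline) and isinstance(outline[i + 1], list):
                return outline[i + 1]
            return []
    return None


# Two hypothetical files: the second has no Sections A-E, so Section F
# sits at a different index than in the first.
file_one = [
    Bookmark("Section A"), [Bookmark("Exhibit 1A")],
    Bookmark("Section F"), [Bookmark("Exhibit 1F"), Bookmark("Exhibit 2F")],
]
file_two = [
    Bookmark("Section F"), [Bookmark("Exhibit 1F")],
]

for outline in (file_one, file_two):
    exhibits = children_of(outline, "Section F")
    print([b.title for b in exhibits])
```

Because the lookup goes by title rather than position, the same code finds Section F in both outlines even though its index differs.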