[Tutor] need help generating table of contents

Mon Aug 27 14:43:50 EDT 2018

Albert-Jan Roskam wrote:

> 
> From: Tutor <tutor-bounces+sjeik_appie=hotmail.com at python.org> on behalf
> of Peter Otten <__peter__ at web.de> Sent: Friday, August 24, 2018 3:55 PM
> To: tutor at python.org
> <snip>
>> The following reshuffle of your code seems to work:
>> 
>> print('\r\n** Table of contents\r\n')
>> pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
>> 
>> def process(triples, limit=None, indent=0):
>>     for index, (title, page, count) in enumerate(triples, 1):
>>         title = indent * 4 * ' ' + title
>>         print(title.ljust(79, ".") + page.zfill(2))
>>         if count:
>>             process(triples, limit=int(count), indent=indent+1)
>>         if limit is not None and limit == index:
>>             break
>> 
>> process(iter(re.findall(pattern, toc, re.DOTALL)))
> 
> Hi Peter, Cameron,
> 
> Thanks for your replies! The code above indeed works as intended, but I
> don't really understand *why*. If I were to assign a name to the condition
> "if limit is not None and limit == index", what would be the most
> descriptive name? I often use "is_*" names for boolean variables. Would
> "is_deepest_nesting_level" be a good name?

No, it's not necessarily the deepest level. Every subsection eventually ends 
at this point; so you might call it

reached_end_of_current_section

Or just 'limit' ;) 

The None is only there for the outermost level, where no /Count is provided. 
In that case the loop simply runs until the iterator is exhausted.

If you find it easier to understand, you can calculate the outer count (aka 
limit) as the number of matches minus the sum of all counts:

def process(triples, section_length, indent=0):
    # triples must be an iterator shared by all nested calls
    for index, (title, page, count) in enumerate(triples, 1):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, section_length=int(count), indent=indent+1)
        if section_length == index:
            break  # reached the end of the current section

triples = re.findall(pattern, toc, re.DOTALL)
# every entry below the top level is counted once in its parent's count
toplevel_section_length = (
    len(triples)
    - sum(int(c or 0) for t, p, c in triples)
)
process(iter(triples), toplevel_section_length)
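
To check the arithmetic with concrete numbers, here is a small sketch. The 
sample triples are made up for illustration; they only mimic the shape of 
what findall() returns, and the calculation assumes (as the code above does) 
that each count is the number of direct subsections:

```python
# Hypothetical sample in the shape re.findall() returns for the pattern:
# (title, page, count) tuples of strings, count == '' where /Count is absent.
sample_triples = [
    ("Chapter 1", "1", "2"),      # two direct subsections
    ("Section 1.1", "2", ""),
    ("Section 1.2", "5", "1"),    # one direct subsection
    ("Section 1.2.1", "6", ""),
    ("Chapter 2", "9", ""),
]

# Every entry below the top level is counted exactly once in its parent's
# count, so subtracting the sum of counts leaves the top-level entries.
toplevel_section_length = (
    len(sample_triples)
    - sum(int(c or 0) for t, p, c in sample_triples)
)
print(toplevel_section_length)
```

Here five matches minus a count total of three leaves the two chapters.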

Just for fun here's one last variant that does away with the break -- and 
thus the naming issue -- completely:

import itertools

def process(triples, limit=None, indent=0):
    # islice() stops after `limit` items; the shared iterator keeps its place
    for title, page, count in itertools.islice(triples, limit):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, limit=int(count), indent=indent+1)
Note that islice(items, None) does the right thing and imposes no limit:

>>> from itertools import islice
>>> list(islice("abc", None))
['a', 'b', 'c']

> Also, I don't understand why iter() is required here, and why finditer()
> is not an alternative.

finditer() would actually work -- I didn't use it because I wanted to make 
as few changes as possible to your code. What does not work is a list like 
the result of findall(). This is because the inner for loops (i.e. the ones 
in the nested calls of process()) are supposed to continue the iteration 
instead of restarting it. Looping over a list starts from the first item 
every time, whereas an iterator remembers its position. A simple example to 
illustrate the difference:

>>> s = "abcdefg"
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
... 
=== 0 ===
a
b
c
=== 1 ===
a
b
c
=== 2 ===
a
b
c
>>> s = iter("abcdefg")
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
... 
=== 0 ===
a
b
c
=== 1 ===
d
e
f
=== 2 ===
g
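
The same reasoning explains why finditer() can replace iter(findall()): it 
already returns an iterator, so nested calls pick up where the caller left 
off. Below is a minimal sketch of the islice variant on top of finditer(). 
The toc string is invented sample input that merely fits the pattern (your 
real data will differ), and collecting the output in a `lines` list is my 
own addition to make the result easy to inspect:

```python
import re
from itertools import islice

# Invented sample input; the real toc string is not shown in this thread.
toc = """\
[/Title (Chapter 1) /Page 1 /Count 2 /OUT pdfmark
[/Title (Section 1.1) /Page 2 /OUT pdfmark
[/Title (Section 1.2) /Page 5 /OUT pdfmark
[/Title (Chapter 2) /Page 9 /OUT pdfmark
"""
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'

def process(matches, lines, limit=None, indent=0):
    # islice() takes at most `limit` items from the shared iterator;
    # limit=None means "until the iterator is exhausted".
    for match in islice(matches, limit):
        title, page, count = match.groups()
        title = indent * 4 * ' ' + title
        lines.append(title.ljust(79, ".") + page.zfill(2))
        if count:  # None when the optional /Count group did not match
            process(matches, lines, limit=int(count), indent=indent + 1)

lines = []
# finditer() already returns an iterator, so no iter() call is needed.
process(re.finditer(pattern, toc, re.DOTALL), lines)
print("\n".join(lines))
```

One small difference from findall(): a group that did not take part in the 
match is None in match.groups() rather than an empty string. Both are falsy, 
so the "if count:" test behaves the same.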