[Tutor] need help generating table of contents
Peter Otten
__peter__ at web.de
Mon Aug 27 14:43:50 EDT 2018
Albert-Jan Roskam wrote:
>
> From: Tutor <tutor-bounces+sjeik_appie=hotmail.com at python.org> on behalf
> of Peter Otten <__peter__ at web.de> Sent: Friday, August 24, 2018 3:55 PM
> To: tutor at python.org
> <snip>
>> The following reshuffle of your code seems to work:
>>
>> print('\r\n** Table of contents\r\n')
>> pattern = '/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
>>
>> def process(triples, limit=None, indent=0):
>>     for index, (title, page, count) in enumerate(triples, 1):
>>         title = indent * 4 * ' ' + title
>>         print(title.ljust(79, ".") + page.zfill(2))
>>         if count:
>>             process(triples, limit=int(count), indent=indent+1)
>>         if limit is not None and limit == index:
>>             break
>>
>> process(iter(re.findall(pattern, toc, re.DOTALL)))
>
> Hi Peter, Cameron,
>
> Thanks for your replies! The code above indeed works as intended, but: I
> don't really understand *why*. I would assign a name to the following line
> "if limit is not None and limit == index", what would be the most
> descriptive name? I often use "is_*" names for boolean variables. Would
> "is_deepest_nesting_level" be a good name?

No, it's not necessarily the deepest level. Every subsection eventually ends
at this point, so you might call it

    reached_end_of_current_section

Or just 'limit' ;)

The None is only there for the outermost level, where no /Count is provided.
In that case the loop simply runs until the iterator is exhausted.

If you find it easier to understand, you can calculate the outer count, aka
limit, as the number of matches minus the sum of counts:

def process(triples, section_length, indent=0):
    for index, (title, page, count) in enumerate(triples, 1):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, section_length=int(count), indent=indent+1)
        if section_length == index:
            break

triples = re.findall(pattern, toc, re.DOTALL)
toplevel_section_length = (
    len(triples)
    - sum(int(c or 0) for t, p, c in triples)
)
process(iter(triples), toplevel_section_length)
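
To check the arithmetic without a real PDF, here is a minimal sketch under a few assumptions: the triples are made up for illustration, render() is a hypothetical stand-in for process() that collects lines in a list instead of printing them, and each count is taken to be the number of direct subsections, as in the code above.

```python
def render(triples, section_length, indent=0, out=None):
    """Hypothetical variant of process() that returns the lines it builds."""
    out = [] if out is None else out
    for index, (title, page, count) in enumerate(triples, 1):
        out.append(indent * 4 * " " + title + " " + page.zfill(2))
        if count:
            # The nested call keeps consuming the same iterator.
            render(triples, int(count), indent + 1, out)
        if section_length == index:
            break
    return out

# Made-up (title, page, count) triples shaped like re.findall() results;
# an empty count string means "no subsections".
triples = [
    ("Intro", "1", ""),
    ("Basics", "2", "2"),
    ("Syntax", "3", ""),
    ("Types", "5", "1"),
    ("Numbers", "6", ""),
    ("Outro", "9", ""),
]

# Six matches minus three nested entries leaves three top-level entries.
toplevel_section_length = len(triples) - sum(int(c or 0) for t, p, c in triples)
toc_lines = render(iter(triples), toplevel_section_length)
```

Here toc_lines holds "Intro", "Basics" and "Outro" at the top level, with "Syntax" and "Types" nested under "Basics" and "Numbers" nested under "Types".
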
Just for fun here's one last variant that does away with the break -- and
thus the naming issue -- completely:
import itertools

def process(triples, limit=None, indent=0):
    for title, page, count in itertools.islice(triples, limit):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, limit=int(count), indent=indent+1)
Note that islice(items, None) does the right thing:
>>> from itertools import islice
>>> list(islice("abc", None))
['a', 'b', 'c']
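
The variant relies on a second property of islice(): when several slices are taken from the same iterator, each one picks up where the previous one stopped. A small sketch, with variable names chosen only for this illustration:

```python
from itertools import islice

letters = iter("abcdefg")

first = list(islice(letters, 3))    # consumes a, b, c
second = list(islice(letters, 3))   # continues with d, e, f
rest = list(islice(letters, None))  # no limit: whatever is left
```

That is why the nested process() calls can each take their own slice of the shared iterator without skipping or repeating entries.
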
> Also, I don't understand why iter() is required here, and why finditer()
> is not an alternative.
finditer() would actually work -- I didn't use it because I wanted to make
as few changes as possible to your code. What does not work is a list, like
the result of findall(). This is because the inner for loops (i.e. the ones
in the nested calls of process()) are supposed to continue the iteration
instead of restarting it. A simple example to illustrate the difference:
>>> s = "abcdefg"
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
...
=== 0 ===
a
b
c
=== 1 ===
a
b
c
=== 2 ===
a
b
c
>>> s = iter("abcdefg")
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
...
=== 0 ===
a
b
c
=== 1 ===
d
e
f
=== 2 ===
g
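
For completeness, a sketch of what the finditer() route might look like. The toc text is made up for this example, collect() is a hypothetical stand-in for process() that returns tuples instead of printing, and the pattern is the one from earlier in the thread written as a raw string.

```python
import re
from itertools import islice

# Raw-string form of the pattern quoted earlier in the thread.
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'

# Hypothetical outline text, invented for this illustration.
toc = """\
/Title (Intro) /Page 1
/Title (Basics) /Page 2 /Count 1
/Title (Syntax) /Page 3
"""

def collect(triples, limit=None, indent=0, out=None):
    """Hypothetical variant of process() that returns (indent, title, page)."""
    out = [] if out is None else out
    for title, page, count in islice(triples, limit):
        out.append((indent, title, page))
        if count:  # '' from findall() and None from groups() are both falsy
            collect(triples, int(count), indent + 1, out)
    return out

# finditer() already returns an iterator, so no iter() call is needed.
matches = (m.groups() for m in re.finditer(pattern, toc, re.DOTALL))
entries = collect(matches)
```

One difference to keep in mind: for a missing /Count, findall() gives an empty string while match.groups() gives None. Both are falsy, so the `if count:` test behaves the same either way.
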