Block comments

Tue Dec 11 13:55:16 EST 2007

MartinRinehart at gmail.com wrote:
> Tomorrow is block comment day. I want them to nest. I think the reason
> that they don't routinely nest is that it's a lot of trouble to code.

Indeed.

> Two questions:
> 
> 1) Given a start and end location (line position and char index) in an
> array of lines of text, how do you Pythonly extract the whole block
> comment? (Goal: not to have Bruno accusing me - correctly - of writing
> C in Python.)

Is an array of lines the appropriate data structure here?
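
If you do keep the list of lines, here is a minimal sketch of the
extraction using slices instead of index-juggling loops. It assumes
0-based (line, column) pairs, an exclusive end column, and lines that
still carry their trailing newlines; `extract_block` is a hypothetical
name, not something from your tokenizer.

```python
def extract_block(lines, start, end):
    """Return the text between start and end, both (line, column) pairs.

    Assumptions: indices are 0-based, the end column is exclusive, and
    each line keeps its trailing newline.
    """
    (start_line, start_col), (end_line, end_col) = start, end
    if start_line == end_line:
        # The whole block sits on one line: a single slice is enough.
        return lines[start_line][start_col:end_col]
    first = lines[start_line][start_col:]      # tail of the first line
    middle = lines[start_line + 1:end_line]    # complete lines in between
    last = lines[end_line][:end_col]           # head of the last line
    return "".join([first, *middle, last])


lines = ["a /* one\n", "two\n", "three */ b\n"]
comment = extract_block(lines, (0, 2), (2, 8))
```

If the source is kept as one string instead, the same extraction is
just `text[start:end]`, which is part of why I ask about the data
structure.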

> 2) My tokenizer has a bunch of module-level constants including ones
> that define block comment starts/ends. Suppose I comment that code
> out. This is the situation:
> 
> /* start of block comment
> ...
> BLOCK_COMMENT_END_CHARS = '*/'
> ...
> end of block comment */
> 
> Is this the reason for """?

Triple-quoted strings are not comments; they are a way to build 
multiline string literals. They are indeed commonly used for 
docstrings - for obvious reasons - but it's the position of the 
string literal that makes it a docstring, not the fact that it's 
triple-quoted.

With respect to your example above, making it a legal construct implies 
that you should not treat the block start/end markers as comment 
markers when they are enclosed in string-literal markers.
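
A rough sketch of that rule, assuming '/*' and '*/' as the markers,
single- or double-quoted one-line literals, and backslash escapes (all
assumptions about your language; `find_comment_end` is a hypothetical
helper):

```python
def find_comment_end(text, start, open_marker="/*", close_marker="*/"):
    """Return the index just past the end of the comment opening at start.

    Close markers found inside '...' or "..." literals are ignored.
    Raises ValueError if the comment is never closed.
    """
    if not text.startswith(open_marker, start):
        raise ValueError("no block comment starts at index %d" % start)
    i = start + len(open_marker)
    quote = None  # quote character of the literal we are inside, if any
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the backslash and the escaped character
                continue
            if ch == quote:
                quote = None  # the literal ends here
        elif ch in "'\"":
            quote = ch  # a literal starts here
        elif text.startswith(close_marker, i):
            return i + len(close_marker)
        i += 1
    raise ValueError("unterminated block comment")


src = "/* start\nBLOCK_COMMENT_END_CHARS = '*/'\nend */ x = 1"
end = find_comment_end(src, 0)
```

The price is that the comment body must now contain balanced quotes:
an apostrophe in plain prose (as in "don't") opens a literal that
never closes, and this sketch then reports the comment as unterminated.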

Now this doesn't solve the problem of nested block comments. Here, I 
guess the solution would be to only allow fully nested block comments - 
that is, the nested block *must* be opened *and* closed within the 
parent block. In which case it should not be harder to parse than any 
other nested construct.
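
For illustration, the usual way to handle such a fully nested
construct is a depth counter. This is only a sketch under the same
assumed '/*' and '*/' markers; `find_nested_comment_end` is a
hypothetical name, and string literals are deliberately left out to
keep it short.

```python
def find_nested_comment_end(text, start, open_marker="/*", close_marker="*/"):
    """Return the index just past the close marker matching the open at start.

    Every nested open marker must be closed before the outer comment
    ends. Raises ValueError if the nesting never returns to depth zero.
    """
    if not text.startswith(open_marker, start):
        raise ValueError("no block comment starts at index %d" % start)
    depth = 0
    i = start
    while i < len(text):
        if text.startswith(open_marker, i):
            depth += 1  # entering one more level
            i += len(open_marker)
        elif text.startswith(close_marker, i):
            depth -= 1  # leaving one level
            i += len(close_marker)
            if depth == 0:
                return i  # the outermost comment is closed
        else:
            i += 1
    raise ValueError("unterminated block comment (depth %d)" % depth)


src = "a /* outer /* inner */ still outer */ b"
begin = src.index("/*")
end = find_nested_comment_end(src, begin)
```

Here `src[begin:end]` covers the whole outer comment, inner one
included.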

While we're at it, you may not know it, but there are already a couple 
of Python packages for building tokenizers/parsers - could it be the 
case that you're guilty of ReinventingTheSquaredWheel(tm) ?-)

My 2 cents...