syntax difference

Mon Jun 18 08:07:50 EDT 2018

On 18/06/2018 12:33, Chris Angelico wrote:
> On Mon, Jun 18, 2018 at 9:16 PM, Bart <bc at freeuk.com> wrote:

>> What will those look like? If copyright/licence comments have their own
>> specific syntax, then they just become another token which has to be
>> recognised.
> 
> If they have specific syntax, they're not comments, are they?

So how is it possible for ANY program to determine what kind of comments 
they are?

I've used 'smart' comments myself, which contain special information, 
but are also designed to be very easily detected by the simplest of 
programs which scan the source code. For that purpose, they might start 
with a special prefix so that it is not necessary to parse the special 
information, but just to detect the prefix.

For example, I have used comments that start with #T# (and which, in my 
case, begin at the start of a line). Funnily enough, these also provided 
type information (although for different purposes than what is discussed here).
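
As a rough sketch of how simple such a scanner can be: the only detail taken from the description above is the #T# prefix at the start of a line. The function name, the sample source and the payload format are illustrative assumptions, not the actual tool.

```python
PREFIX = "#T#"  # marker from the post; everything else here is assumed


def smart_comments(source):
    """Yield (line_number, payload) for each line that begins with PREFIX.

    No parsing of the payload is attempted: the scanner only detects the
    prefix and hands back whatever follows it.
    """
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith(PREFIX):
            yield lineno, line[len(PREFIX):].strip()


sample = """\
#T# x: int
x = 1  # ordinary comment
    #T# indented, so ignored
#T# y: str
"""

found = list(smart_comments(sample))
for lineno, payload in found:
    print(lineno, payload)
```

Because the check is a plain startswith() on each line, the scanner needs no knowledge of the language's grammar. The cost is that a line inside a multi-line string which happens to begin with #T# would also be picked up.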

>> The main complication I can see is that, if this is really a one-time
>> source-to-source translator so that you will be working with the result,
>> then usually you will want to keep the comments.
>>
>> Then it is a question of more precisely defining the task that such a
>> translator is to perform.
> 
> Right, exactly. So you need to do an actual smart parse, which - as
> mentioned - is functionally equivalent whether you're stripping
> comments or some lexical token.

The subject is type annotation. Presumably there is some way to 
distinguish such a type annotation within a comment from a regular 
comment? Such as the marker I suggested above.

Then the tokeniser just needs to detect that kind of comment rather than 
needing to understand the contents.

The tokeniser will need to work a little differently, though: it must 
maintain the positions of all tokens within the line, information 
that is usually discarded.
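
One way to illustrate this, as a sketch only, is with Python's standard tokenize module, which reports a (row, column) position for every token, comments included. The #T# marker is the one suggested above; the helper name and the sample source are assumptions made for the example.

```python
import io
import tokenize

MARKER = "#T#"  # the marker suggested above; the rest is illustrative


def marked_comments(source):
    """Return [((row, col), payload), ...] for comments starting with MARKER.

    The tokeniser decides what is a comment, so the marker inside a string
    literal is not mistaken for one. The payload itself is not parsed.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith(MARKER):
            result.append((tok.start, tok.string[len(MARKER):].strip()))
    return result


src = 'x = 1  #T# int\ns = "#T# not a comment"  # plain\n'
positions = marked_comments(src)
print(positions)
```

Here tok.start gives the row (1-based) and column (0-based) of each comment, which is the positional information a source-to-source translator would need in order to put the comment back in the right place.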

-- 
bart