Quotation Ugliness

Tue Nov 25 19:44:59 EST 2014

On Wed, Nov 26, 2014 at 11:18 AM, Tim Daneliuk <tundra at tundraware.com> wrote:
> A problem for your consideration:
>
> We are given a tuple of delimiter string pairs to quote or comment text,
> possibly over multiple lines.   Something like this:
>
>     delims = (('"', '"'), ("'", "'"), ('#', '\n'), ("\*", "*\), ('\\', '\n')
> ...)
>
> These may be nested.
>
> Here's the problem:  Determine if the string S appears *outside* or *inside*
> any such quotation.

Be aware that, according to the usual definitions, some of those don't
nest... and others may or may not. If you're talking about comment
characters, for instance, /* /* */ */ sometimes nests and sometimes
doesn't; but # # \n \n definitely doesn't nest (two hashes on a line
don't extend the comment to another line). Also, block comment
delimiters are usually written with slashes (/* and */), not
backslashes.

But if your definitions are simple and strict (any of these may be
nested and to any level) and correct (the tuple is exactly the pairs
you're after), then your best bet would be to step through the data,
building up a stack (probably in a Python list) of currently-open
delimiters. As you find open-quote markers, you add to the stack; any
time you find the top-of-stack closing delimiter, you pop it off. When
you find the string S, if the stack is empty, it's outside; otherwise,
it's inside.
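
A minimal sketch of that stack approach, assuming the strict reading
(every pair may nest inside every other, to any depth) and non-empty
delimiters and search string. The name occurrences_nested and the
sample delims are mine, not from the original post; I've used /* */
rather than the backslash forms.

```python
def occurrences_nested(text, target, delims):
    """Yield (index, inside) for each occurrence of target in text.

    Assumes every delimiter pair may nest inside any other, and that
    target and all delimiters are non-empty strings.
    """
    stack = []  # closers of the currently open delimiters, innermost last
    i = 0
    while i < len(text):
        if text.startswith(target, i):
            # An empty stack means no delimiter is open at this point.
            yield i, bool(stack)
        # Check the innermost closer first, so that pairs such as
        # ('"', '"') close instead of opening a second level.
        if stack and text.startswith(stack[-1], i):
            i += len(stack.pop())
            continue
        for opener, closer in delims:
            if text.startswith(opener, i):
                stack.append(closer)
                i += len(opener)
                break
        else:
            i += 1


delims = (('"', '"'), ("'", "'"), ('/*', '*/'))
print(list(occurrences_nested('foo "bar foo" /* foo */ foo', 'foo', delims)))
```

Note that under this strict reading an apostrophe inside a
double-quoted string opens a new level, which is probably not what
you want for real source text.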

You may have issues with your definition of nesting, though. For
instance, what does it mean if you have double-quotes, then a hash? In
normal programming, the hash isn't significant. If that's your
definition, then the only nesting you need worry about is /* and */,
so your parser is quite simple: when you find any opener, you seek its
corresponding closer, and then special-case /* to count any additional
/* and look for a */ for each one.
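
Here's a rough sketch of that simpler parser, under the assumption
that only /* */ nests and everything else inside an open region is
ignored until its closer turns up. The function names and the
nestable parameter are made up for illustration, and an unterminated
region is treated as running to the end of the text, which is a
choice of mine rather than anything specified.

```python
def quoted_spans(text, delims, nestable=("/*", "*/")):
    """Return (start, end) index pairs, one per quoted/commented region."""
    spans = []
    i = 0
    while i < len(text):
        for opener, closer in delims:
            if text.startswith(opener, i):
                j = i + len(opener)
                depth = 1
                while j < len(text) and depth:
                    if text.startswith(closer, j):
                        depth -= 1
                        j += len(closer)
                    elif (opener, closer) == nestable and text.startswith(opener, j):
                        # Only the nestable pair counts extra openers.
                        depth += 1
                        j += len(opener)
                    else:
                        j += 1
                # Assumption: an unterminated region runs to end of text.
                spans.append((i, j))
                i = j
                break
        else:
            i += 1
    return spans


def occurrences_flat(text, target, delims):
    """Return [(index, inside), ...] for each occurrence of target."""
    spans = quoted_spans(text, delims)
    result = []
    i = text.find(target)
    while i != -1:
        result.append((i, any(start <= i < end for start, end in spans)))
        i = text.find(target, i + 1)
    return result


delims = (('"', '"'), ("'", "'"), ('#', '\n'), ('/*', '*/'))
print(occurrences_flat('x "don\'t # x" /* /* x */ x */ x # x\nx', 'x', delims))
```

With this reading, the apostrophe and the hash inside the
double-quoted string are just text, and the second /* bumps a counter
instead of being ignored.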

At the moment, the problem's a little underspecified.

ChrisA