Is this a bug?

David Bolen db3l at fitlinxx.com
Sat May 12 20:27:08 EDT 2001


costas at meezon.com (Costas Menico) writes:

> Actually my dream r-architecture is quite simple. Take anything that
> starts and ends with two quotes to be the literal string.
> 
> x=''abcdef'''  -> abcdef'
> x=''abc'xyz''  -> abc'xyz
> x=''abc\'xyz'' -> abc\'xyz

Of course, each of these examples is fine with existing
triple-quoted strings (raw for the third; the first two don't even
need to be raw), which essentially do precisely what you say: they go
until they find the same marker (three quotes).  But these examples
don't really highlight the sorts of special cases or boundary
conditions any such quoting mechanism is bound to have lying around.
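
To make that concrete, here is a small sketch of the three examples
written with the existing syntax.  I've used triple double quotes as
the marker, and the variable names are just for illustration:

```python
# The three examples from the quoted message, using existing syntax.
a = """abcdef'"""      # plain triple-quoted string
b = """abc'xyz"""      # plain triple-quoted string
c = r"""abc\'xyz"""    # raw, so the backslash stays in the string

for value in (a, b, c):
    print(value)
```

Without the r prefix, the third one would lose its backslash, since
\' is an ordinary escape in a non-raw string.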

For example, in your syntax above, how could I enclose two quotes
within the string itself?  If I tried something like:

    x = ''abc''xyz''

Would I only get the initial abc and then a parse error on the xyz''?

Presuming you say I'd have to quote the internal '' somehow, you've
introduced a quoting scheme, and then how will you handle the case
where the quoting character occurs just prior to the final terminator
(i.e., the same issue as the final backslash in a raw string)?  Before
long you'll likely be making tradeoffs similar to Python's current
methods.  (And sure, triple-quoted strings also need to quote an
internal use of triple quotes.)
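
For comparison, this is roughly what quoting an internal triple quote
looks like today.  It's only a sketch with made-up variable names, but
it shows that the raw form pays for the quoting by keeping the
backslash:

```python
# Non-raw: the backslash escapes one quote mark, breaking up the run of three.
plain = """abc\"""xyz"""

# Raw: the same trick keeps the literal from ending early, but the
# backslash itself remains part of the value.
raw = r"""abc\"""xyz"""

print(plain, len(plain))
print(raw, len(raw))
```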

You mentioned in an earlier post that your main problem was in
automatically generated code, such as in HTML pages.  Automatically
generated code should be the simplest because you can easily and
algorithmically enforce quoting rules.  And raw strings are pretty
simple - just make a string raw (triple quoted if you want newlines)
and then special case a trailing backslash, which should be the only
special case.
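
As a rough sketch of what a generator could do, here is a helper that
emits a triple-quoted raw literal and handles the awkward endings
separately.  The name raw_literal is hypothetical, and I'm assuming
the text uses plain \n line endings; anything it can't handle falls
back to repr():

```python
def raw_literal(text):
    """Return Python source for an expression that evaluates to text."""
    # Peel off trailing backslashes and double quotes: either one would
    # interfere with the closing triple quote of a raw string.
    body = text.rstrip('\\"')
    tail = text[len(body):]
    # Fall back to repr() if the body contains the terminator itself or
    # characters that do not survive being pasted into source code.
    if '"""' in body or "\r" in body or "\0" in body:
        return repr(text)
    literal = 'r"""' + body + '"""'
    if tail:
        # Adjacent string literals are concatenated by the compiler.
        literal += " " + repr(tail)
    return literal


for sample in ['<a href="x">', "C:\\temp\\", 'say "hi"']:
    source = raw_literal(sample)
    print(source, eval(source) == sample)
```

Putting the tail in an ordinary literal next to the raw one is just
one way to special-case the ending; the generator could equally use
repr() for everything and skip raw strings altogether.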

Triple-quoted raw strings seem to have awfully simple rules to me -
everything is included, and a backslash is still parsed as a "quote"
character, but it and the following character both remain in the
string.  The two special cases I see are the official one (the
trailing backslash case), and the need to check if your text contains
the terminator itself (triple quotes - unlikely unless it's Python
code itself that you're quoting).  Section 2.4.1 in the language
reference covers this pretty well.
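
The trailing backslash case is easy to check mechanically.  This is
only a sketch (the parses helper is my own, not anything from the
library); it builds the literals piece by piece so the backslashes are
easy to count:

```python
def parses(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


backslash = chr(92)
quotes = '"' * 3

# One trailing backslash "quotes" the first closing quote mark, so the
# literal is never terminated.
bad = "r" + quotes + "abc" + backslash + quotes
# Two trailing backslashes are consumed as a pair and the literal
# closes normally, with both backslashes kept in the value.
ok = "r" + quotes + "abc" + backslash * 2 + quotes

print(bad, parses(bad))
print(ok, parses(ok), len(eval(ok)))
```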

This explanation covers your prior example as well:

    >>> x=r"""\
    ... """
    >>> x
    '\\\012'
    >>> 

In this case the raw triple-quoted string was parsed as a backslash
and a newline, both of which remain in the final string.  Note: don't
be confused by the extra \ in the output.  The interpreter uses repr()
by default, so the backslash was quoted in the presentation, but not
in the memory representation, e.g.:

    >>> len(x)
    2
    >>> print x
    \

    >>>
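
The same check can be written as a script.  This sketch assumes a
current Python 3 interpreter, where print is a function and repr()
shows the newline as \n rather than the octal \012; the string itself
is the same two characters:

```python
x = r"""\
"""

print(len(x))                      # number of characters actually stored
print(repr(x))                     # what the interactive prompt would echo
print(x == chr(92) + chr(10))      # one backslash followed by one newline
```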

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


