How to split string literals that will span two or more lines?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sat Nov 19 23:43:02 EST 2005


On Sat, 19 Nov 2005 22:51:04 -0500, Mike Meyer wrote:

> Steven D'Aprano <steve at REMOVETHIScyber.com.au> writes:
>> Brackets include:
>>
>> parentheses or round brackets ( )
>> square brackets [ ]
>> braces or curly brackets { }
>> chevrons or angle brackets 〈 〉
>>
>> The symbols for chevrons are not available on common keyboards, are not
>> available in ordinary ASCII, and may not show up correctly in many
>> typefaces, so a common alternative is to substitute less than and greater
>> than signs < > as brackets. HTML and XML use that convention.
> 
> Hmm. I'm used to seeing "angle brackets" - aka brokets - used to refer
> to </>. That may be the convention you mention leaking across, though.
> 
> You imply that HTML/XML might use chevrons. I don't think that's the
> case. They inherit their start/end tag characters from SGML's
> default.

No! That's not what I said.

SGML-derived languages use greater-than and less-than symbols < > as if
they were brackets. That's a case of practicality beats purity: true
chevrons are not available in ASCII or on common keyboards, making them
difficult to use. People commonly call < and > "angle brackets" in the
context of HTML etc., but they aren't really: they are mathematical
comparison operator signs.

Proper chevrons are narrower and taller. The difference between 〈 〉
and < > is *very* obvious in the font I'm using, although of course not
all fonts use the proper glyphs. If it helps, the angle on the inside of
the chevron is about 120 degrees, compared to maybe 60 degrees for the
comparison operators. Again, this depends on the precise glyph being used.

True angle brackets are available in Unicode at code points 9001 and 9002
(0x2329 and 0x232A). The less-than and greater-than symbols can be found
in both Unicode and ASCII at code points 60 and 62 (0x003C and 0x003E).
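
For anyone who wants to check those numbers, here is a small sketch using
the standard unicodedata module. It assumes Python 3 string literals, and
the helper name describe() is made up purely for illustration.

```python
import unicodedata

def describe(ch):
    """Return (decimal code point, hex code point, Unicode name) for one character.

    `describe` is a hypothetical helper name, used only for this illustration.
    """
    return ord(ch), "0x%04X" % ord(ch), unicodedata.name(ch)

# The angle brackets at 0x2329/0x232A, then the ASCII comparison signs.
for ch in ("\u2329", "\u232a", "<", ">"):
    print(describe(ch))
```

Whether the two pairs actually look different on screen still depends on
the font, as noted above.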


I did warn in my earlier post that I was being pedantic. In common usage,
I'll describe <tag> as using angle brackets, just as I'll describe "quote"
as using quote marks. They're not actually: they are double-prime marks,
and ' is a prime mark not an apostrophe or quote mark. Prime and
double-prime marks are also known as foot and inch marks, although *real*
pedants would argue that there is a difference between them too. (I think
they get separate Unicode points.)
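
On the question of separate code points, a quick look with unicodedata
(Python 3 assumed) suggests that Unicode does list prime and double prime
separately from the two ASCII marks. Bear in mind that the names Unicode
assigns to the ASCII characters follow its own naming conventions, so they
don't settle the typographic argument either way.

```python
import unicodedata

# The two ASCII typewriter marks, then the dedicated prime and
# double-prime characters at U+2032 and U+2033.
marks = ["'", '"', "\u2032", "\u2033"]
names = {ch: unicodedata.name(ch) for ch in marks}

for ch in marks:
    print("U+%04X %s" % (ord(ch), names[ch]))
```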


It is easy to get confused when it comes to characters, because there are
three separate but related things to keep in mind. Firstly, there is the
glyph or picture used, which differs according to the font and type-style.
Two different glyphs can represent the same symbol, and two identical
glyphs can represent different symbols.

Secondly, there is the semantic meaning of the character: a dash and a
hyphen are both horizontal lines, but they have very different meanings.
Dashes -- sometimes faked with two hyphens in a row like this -- are used
as separators, and hyphens are used to join compound words like
fire-fighting or anti-matter.

Thirdly, there is the specific implementation of the character. ASCII
defines only 128 characters, thirty-three of them invisible control
characters. Eight-bit extensions to ASCII unfortunately differ from one
another: the byte 176 (0xB0) is the degree symbol in the Latin-1
encoding (ISO 8859-1) but the infinity symbol in the MacRoman encoding.
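
The 0xB0 example is easy to reproduce. This is only a sketch, assuming
Python 3 and its bundled "latin-1" and "mac_roman" codecs; the variable
names are mine.

```python
raw = bytes([176])  # the single byte 0xB0

as_latin1 = raw.decode("latin-1")      # ISO 8859-1
as_macroman = raw.decode("mac_roman")  # Python's codec name for MacRoman

# Same byte, two different characters depending on the encoding assumed.
print(repr(as_latin1), repr(as_macroman))
```

The byte is the same in both cases; only the encoding used to interpret it
changes which character comes out.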



-- 
Steven.




More information about the Python-list mailing list