[Doc-SIG] which characters to use for docstring markup

Edward D. Loper edloper@gradient.cis.upenn.edu
Fri, 06 Apr 2001 14:52:21 EDT


I've been a bit busy lately, but I'm still working on coming up with
a good markup language for docstrings...

I was trying to figure out which characters should be used for
markup (e.g., to delimit colored regions).  So I wrote a script
to see how often different characters are used in docstrings,
using all the docstrings in the standard library (well, actually,
in /usr/local/lib/python2.0/*.py) as a "representative" sample.
Here are the results:

Character Count        Module Count     Character
--------------------------------------------------
   1                            1           ^H
  10                            3           ^M
  11                            4           ^ 
  12                            5           ~ 
  13                           10           { 
  13                           10           } 
  16                            6           % 
  28                            7           $ 
  48                           12           ? 
  50                           20           ! 
  70                           16           ` 
  75                            8           & 
  87                           12           \ 
 108                           18           + 
 130                           12           | 
 197                            7           @ 
 222                           22           * 
 229                           20           # 
 269                           35           ] 
 277                           36           [ 
 313                           44           = 
 331                           53           ; 
 421                           48           / 
 441                           46           " 
 514                           23           < 
 663                           67           : 
 779                           54           _ 
 875                           28           > 
1302                           75           ' 
1858                           94           ( 
1874                           94           ) 
2145                           97           , 
2277                           92           - 
3413                          110           . 
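
A count like the one in the table can be reproduced with a short
script.  What follows is only a sketch, not the script that produced
the numbers above: it assumes a modern Python 3 with the `ast` module,
and the names `iter_docstrings`, `is_candidate`, `count_markup_chars`
and `format_table` are made up for illustration.  It also assumes that
"Module Count" means the number of modules in which the character
appears at least once.

```python
import ast
from collections import Counter


def iter_docstrings(source):
    """Yield each module, class, and function docstring found in source."""
    tree = ast.parse(source)
    kinds = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, kinds):
            doc = ast.get_docstring(node, clean=False)
            if doc:
                yield doc


def is_candidate(ch):
    """Assumption: count everything except letters, digits, and
    ordinary whitespace (space, tab, newline)."""
    return not ch.isalnum() and ch not in " \t\n"


def count_markup_chars(sources):
    """sources maps a module name to its source text.

    Returns (char_counts, module_counts): total occurrences of each
    character, and the number of modules it occurs in.
    """
    char_counts = Counter()
    module_counts = Counter()
    for name, source in sources.items():
        try:
            docs = list(iter_docstrings(source))
        except (SyntaxError, ValueError):
            continue  # skip files this Python version cannot parse
        seen = set()
        for doc in docs:
            for ch in doc:
                if is_candidate(ch):
                    char_counts[ch] += 1
                    seen.add(ch)
        module_counts.update(seen)
    return char_counts, module_counts


def format_table(char_counts, module_counts):
    """Render the counts in ascending order, rarest character first."""
    lines = ["Character Count        Module Count     Character", "-" * 50]
    for ch, n in sorted(char_counts.items(), key=lambda kv: kv[1]):
        # Show control characters in caret notation, e.g. backspace as ^H.
        shown = "^" + chr(ord(ch) + 64) if ord(ch) < 32 else ch
        lines.append(f"{n:4d} {module_counts[ch]:28d}           {shown}")
    return "\n".join(lines)
```

To run it over a directory, build `sources` by reading each `*.py`
file into a dict keyed by filename, then print
`format_table(*count_markup_chars(sources))`.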

1. Any character(s) that are used for markup will have to be either
   backslashed/quoted whenever they are used, or will have to be
   only allowed in literal blocks.  Clearly, we want to keep either
   of these to a minimum.  
2. These results suggest that using perldoc-style coloring, like
   B<this>, may not be the best idea, given that '<' and '>' are
   used so often.  This is because people often talk about orderings
   between elements, like x>y.  We might be better off using B{this}
   instead.  Together, '<' and '>' occur about 53 times as often as
   '{' and '}' (1389 vs. 26 occurrences).
3. It makes much more sense to use "`" rather than "'" for
   literals, since "'" occurs about 18 times as often (1302 vs. 70
   occurrences).  Of course, we would probably want to use *either*
   "`" for literals *or* something like L{literal} or C{code} or
   whatever.
4. You should keep in mind that any of these characters will be used
   in the docstring for *something* (well, actually, I was surprised
   to see a backspace in a docstring).  So, for the most part, it's
   a matter of inconveniencing the fewest people for the least
   amount of time.
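
The two ratios quoted in points 2 and 3 follow directly from the
table.  As a quick check, with the counts copied from the rows above
(the variable names are just for illustration):

```python
# Character counts taken from the table above.
counts = {"<": 514, ">": 875, "{": 13, "}": 13, "'": 1302, "`": 70}

# Angle brackets versus curly braces (point 2).
angle_vs_brace = (counts["<"] + counts[">"]) / (counts["{"] + counts["}"])

# Single quote versus backquote (point 3).
quote_vs_backquote = counts["'"] / counts["`"]

print(f"<> vs {{}}: {angle_vs_brace:.1f}x")
print(f"' vs `: {quote_vs_backquote:.1f}x")
```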

I'm leaning towards using either::

    C{code}, E{emph} etc.

or::

    `literal` and *one* *word* *emph* (and that's it)

to color code in my markup.  Any comments?
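
To make the two options concrete, here is a rough sketch of how each
might be recognized with regular expressions.  Neither syntax is
specified yet, so everything here is an assumption: that the brace
form is a single capital letter followed by an unnested {...} group,
that literals are delimited by single backquotes, and that emphasis
covers exactly one word between asterisks.  The function names are
hypothetical.

```python
import re

# Option 1: a tag letter and a brace group, e.g. C{code} or E{emph}.
BRACE_MARKUP = re.compile(r"([A-Z])\{([^{}]*)\}")

# Option 2: `literal` and one-word *emph*.
LITERAL = re.compile(r"`([^`]+)`")
EMPH = re.compile(r"\*(\w+)\*")


def find_brace_markup(text):
    """Return (tag, body) pairs for every X{...} region in text."""
    return [(m.group(1), m.group(2)) for m in BRACE_MARKUP.finditer(text)]


def find_inline_markup(text):
    """Return (kind, body) pairs for `literal` and *emph* regions,
    in the order they appear.  Asterisks inside a literal are not
    treated specially in this sketch."""
    spans = [("literal", m.start(), m.group(1)) for m in LITERAL.finditer(text)]
    spans += [("emph", m.start(), m.group(1)) for m in EMPH.finditer(text)]
    return [(kind, body) for kind, _, body in sorted(spans, key=lambda s: s[1])]
```

With these patterns, "Call C{f(x)} with E{care}, even if x>y." yields
the regions ("C", "f(x)") and ("E", "care"), and the "x>y" needs no
quoting, which it would under B<this>.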


-Edward

p.s., I'll probably have a preliminary description of my proposed
markup language in about 2 weeks, I hope. :)