[Python-3000-checkins] r57824 - in python/branches/py3k/Doc: documenting/index.rst reference/introduction.rst reference/lexical_analysis.rst

georg.brandl python-3000-checkins at python.org
Fri Aug 31 10:07:46 CEST 2007


Author: georg.brandl
Date: Fri Aug 31 10:07:45 2007
New Revision: 57824

Modified:
   python/branches/py3k/Doc/documenting/index.rst
   python/branches/py3k/Doc/reference/introduction.rst
   python/branches/py3k/Doc/reference/lexical_analysis.rst
Log:
Update the first two parts of the reference manual for Py3k,
mainly concerning PEPs 3131 and 3120.
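
For orientation, here is a minimal sketch of what the two PEPs mean in
practice, assuming a released Python 3 interpreter. The helper name
source_encoding is hypothetical and not part of this change. Note that the
text in this revision describes NFC normalization of identifiers, while
released Python 3 documents NFKC; the sample name below normalizes to the
same result under both forms.

```python
import io
import tokenize
import unicodedata


def source_encoding(data: bytes) -> str:
    """Return the encoding that tokenize detects for raw source bytes."""
    # detect_encoding() reads at most two lines, mirroring the rule that a
    # declaration must appear in the first or second line of the file.
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# PEP 3120: with no declaration and no byte-order mark, the default is UTF-8.
default_enc = source_encoding(b"x = 1\n")
# An explicit encoding declaration overrides the default.
declared_enc = source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")
# A leading UTF-8 byte-order mark also selects UTF-8.
bom_enc = source_encoding(b"\xef\xbb\xbfx = 1\n")

# PEP 3131: identifiers may contain non-ASCII letters and combining marks.
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

namespace = {}
# Bind a name that is spelled in decomposed form in the source text.
exec(decomposed + " = 1", namespace)
# The parser normalizes the identifier, so the stored key is the composed form.
stored_as_composed = composed in namespace
```

The detected encoding names are reported in tokenize's normalized spelling,
so the declared name and the returned name may differ in form.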


Modified: python/branches/py3k/Doc/documenting/index.rst
==============================================================================
--- python/branches/py3k/Doc/documenting/index.rst	(original)
+++ python/branches/py3k/Doc/documenting/index.rst	Fri Aug 31 10:07:45 2007
@@ -27,6 +27,7 @@
    style.rst
    rest.rst
    markup.rst
+   fromlatex.rst
    sphinx.rst
 
 .. XXX add credits, thanks etc.

Modified: python/branches/py3k/Doc/reference/introduction.rst
==============================================================================
--- python/branches/py3k/Doc/reference/introduction.rst	(original)
+++ python/branches/py3k/Doc/reference/introduction.rst	Fri Aug 31 10:07:45 2007
@@ -22,11 +22,12 @@
 
 It is dangerous to add too many implementation details to a language reference
 document --- the implementation may change, and other implementations of the
-same language may work differently.  On the other hand, there is currently only
-one Python implementation in widespread use (although alternate implementations
-exist), and its particular quirks are sometimes worth being mentioned,
-especially where the implementation imposes additional limitations.  Therefore,
-you'll find short "implementation notes" sprinkled throughout the text.
+same language may work differently.  On the other hand, CPython is the one
+Python implementation in widespread use (although alternate implementations
+continue to gain support), and its particular quirks are sometimes worth being
+mentioned, especially where the implementation imposes additional limitations.
+Therefore, you'll find short "implementation notes" sprinkled throughout the
+text.
 
 Every Python implementation comes with a number of built-in and standard
 modules.  These are documented in :ref:`library-index`.  A few built-in modules
@@ -88,11 +89,7 @@
 Notation
 ========
 
-.. index::
-   single: BNF
-   single: grammar
-   single: syntax
-   single: notation
+.. index:: BNF, grammar, syntax, notation
 
 The descriptions of lexical analysis and syntax use a modified BNF grammar
 notation.  This uses the following style of definition:
@@ -118,9 +115,7 @@
 rules with many alternatives may be formatted alternatively with each line after
 the first beginning with a vertical bar.
 
-.. index::
-   single: lexical definitions
-   single: ASCII at ASCII
+.. index:: lexical definitions, ASCII
 
 In lexical definitions (as the example above), two more conventions are used:
 Two literal characters separated by three dots mean a choice of any single

Modified: python/branches/py3k/Doc/reference/lexical_analysis.rst
==============================================================================
--- python/branches/py3k/Doc/reference/lexical_analysis.rst	(original)
+++ python/branches/py3k/Doc/reference/lexical_analysis.rst	Fri Aug 31 10:07:45 2007
@@ -5,38 +5,16 @@
 Lexical analysis
 ****************
 
-.. index::
-   single: lexical analysis
-   single: parser
-   single: token
+.. index:: lexical analysis, parser, token
 
 A Python program is read by a *parser*.  Input to the parser is a stream of
 *tokens*, generated by the *lexical analyzer*.  This chapter describes how the
 lexical analyzer breaks a file into tokens.
 
-Python uses the 7-bit ASCII character set for program text.
-
-.. versionadded:: 2.3
-   An encoding declaration can be used to indicate that  string literals and
-   comments use an encoding different from ASCII.
-
-For compatibility with older versions, Python only warns if it finds 8-bit
-characters; those warnings should be corrected by either declaring an explicit
-encoding, or using escape sequences if those bytes are binary data, instead of
-characters.
-
-The run-time character set depends on the I/O devices connected to the program
-but is generally a superset of ASCII.
-
-**Future compatibility note:** It may be tempting to assume that the character
-set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
-western languages that use the Latin alphabet), but it is possible that in the
-future Unicode text editors will become common.  These generally use the UTF-8
-encoding, which is also an ASCII superset, but with very different use for the
-characters with ordinals 128-255.  While there is no consensus on this subject
-yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
-implementation appears to favor Latin-1.  This applies both to the source
-character set and the run-time character set.
+Python reads program text as Unicode code points; the encoding of a source file
+can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
+for details.  If the source file cannot be decoded, a :exc:`SyntaxError` is
+raised.
 
 
 .. _line-structure:
@@ -44,21 +22,17 @@
 Line structure
 ==============
 
-.. index:: single: line structure
+.. index:: line structure
 
 A Python program is divided into a number of *logical lines*.
 
 
-.. _logical:
+.. _logical-lines:
 
 Logical lines
 -------------
 
-.. index::
-   single: logical line
-   single: physical line
-   single: line joining
-   single: NEWLINE token
+.. index:: logical line, physical line, line joining, NEWLINE token
 
 The end of a logical line is represented by the token NEWLINE.  Statements
 cannot cross logical line boundaries except where NEWLINE is allowed by the
@@ -67,7 +41,7 @@
 implicit *line joining* rules.
 
 
-.. _physical:
+.. _physical-lines:
 
 Physical lines
 --------------
@@ -89,9 +63,7 @@
 Comments
 --------
 
-.. index::
-   single: comment
-   single: hash character
+.. index:: comment, hash character
 
 A comment starts with a hash character (``#``) that is not part of a string
 literal, and ends at the end of the physical line.  A comment signifies the end
@@ -104,9 +76,7 @@
 Encoding declarations
 ---------------------
 
-.. index::
-   single: source character set
-   single: encodings
+.. index:: source character set, encodings
 
 If a comment in the first or second line of the Python script matches the
 regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
@@ -119,19 +89,19 @@
 
    # vim:fileencoding=<encoding-name>
 
-which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
-the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
-encoding is UTF-8 (this is supported, among others, by Microsoft's
-:program:`notepad`).
+which is recognized by Bram Moolenaar's VIM.
+
+If no encoding declaration is found, the default encoding is UTF-8.  In
+addition, if the first bytes of the file are the UTF-8 byte-order mark
+(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
+among others, by Microsoft's :program:`notepad`).
 
 If an encoding is declared, the encoding name must be recognized by Python. The
-encoding is used for all lexical analysis, in particular to find the end of a
-string, and to interpret the contents of Unicode literals. String literals are
-converted to Unicode for syntactical analysis, then converted back to their
-original encoding before interpretation starts. The encoding declaration must
-appear on a line of its own.
+encoding is used for all lexical analysis, including string literals, comments
+and identifiers. The encoding declaration must appear on a line of its own.
 
-.. % XXX there should be a list of supported encodings.
+A list of standard encodings can be found in the section
+:ref:`standard-encodings`.
 
 
 .. _explicit-joining:
@@ -139,21 +109,13 @@
 Explicit line joining
 ---------------------
 
-.. index::
-   single: physical line
-   single: line joining
-   single: line continuation
-   single: backslash character
+.. index:: physical line, line joining, line continuation, backslash character
 
 Two or more physical lines may be joined into logical lines using backslash
 characters (``\``), as follows: when a physical line ends in a backslash that is
 not part of a string literal or comment, it is joined with the following forming
 a single logical line, deleting the backslash and the following end-of-line
-character.  For example:
-
-.. % 
-
-::
+character.  For example::
 
    if 1900 < year < 2100 and 1 <= month <= 12 \
       and 1 <= day <= 31 and 0 <= hour < 24 \
@@ -197,9 +159,9 @@
 A logical line that contains only spaces, tabs, formfeeds and possibly a
 comment, is ignored (i.e., no NEWLINE token is generated).  During interactive
 input of statements, handling of a blank line may differ depending on the
-implementation of the read-eval-print loop.  In the standard implementation, an
-entirely blank logical line (i.e. one containing not even whitespace or a
-comment) terminates a multi-line statement.
+implementation of the read-eval-print loop.  In the standard interactive
+interpreter, an entirely blank logical line (i.e. one containing not even
+whitespace or a comment) terminates a multi-line statement.
 
 
 .. _indentation:
@@ -207,14 +169,7 @@
 Indentation
 -----------
 
-.. index::
-   single: indentation
-   single: whitespace
-   single: leading whitespace
-   single: space
-   single: tab
-   single: grouping
-   single: statement grouping
+.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping
 
 Leading whitespace (spaces and tabs) at the beginning of a logical line is used
 to compute the indentation level of the line, which in turn is used to determine
@@ -238,9 +193,7 @@
 in the leading whitespace have an undefined effect (for instance, they may reset
 the space count to zero).
 
-.. index::
-   single: INDENT token
-   single: DEDENT token
+.. index:: INDENT token, DEDENT token
 
 The indentation levels of consecutive lines are used to generate INDENT and
 DEDENT tokens, using a stack, as follows.
@@ -315,22 +268,48 @@
 Identifiers and keywords
 ========================
 
-.. index::
-   single: identifier
-   single: name
+.. index:: identifier, name
 
 Identifiers (also referred to as *names*) are described by the following lexical
 definitions:
 
-.. productionlist::
-   identifier: (`letter`|"_") (`letter` | `digit` | "_")*
-   letter: `lowercase` | `uppercase`
-   lowercase: "a"..."z"
-   uppercase: "A"..."Z"
-   digit: "0"..."9"
+The syntax of identifiers in Python is based on the Unicode standard annex
+UAX-31, with elaboration and changes as defined below.
+
+Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
+are the same as in Python 2.5; Python 3.0 introduces additional
+characters from outside the ASCII range (see :pep:`3131`).  For other
+characters, the classification uses the version of the Unicode Character
+Database as included in the :mod:`unicodedata` module.
 
 Identifiers are unlimited in length.  Case is significant.
 
+.. productionlist::
+   identifier: `id_start` `id_continue`*
+   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl,
+              the underscore, and characters with the Other_ID_Start property>
+   id_continue: <all characters in `id_start`, plus characters in the categories
+                 Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
+
+The Unicode category codes mentioned above stand for:
+
+* *Lu* - uppercase letters
+* *Ll* - lowercase letters
+* *Lt* - titlecase letters
+* *Lm* - modifier letters
+* *Lo* - other letters
+* *Nl* - letter numbers
+* *Mn* - nonspacing marks
+* *Mc* - spacing combining marks
+* *Nd* - decimal numbers
+* *Pc* - connector punctuations
+
+All identifiers are converted into the normal form NFC while parsing; comparison
+of identifiers is based on NFC.
+
+A non-normative HTML file listing all valid identifier characters for Unicode
+4.1 can be found at
+http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
 
 .. _keywords:
 
@@ -345,25 +324,13 @@
 language, and cannot be used as ordinary identifiers.  They must be spelled
 exactly as written here::
 
-   and       def       for       is        raise
-   as        del       from      lambda    return
-   assert    elif      global    not       try
-   break     else      if        or        while
-   class     except    import    pass      with
-   continue  finally   in        print     yield
-
-.. versionchanged:: 2.4
-   :const:`None` became a constant and is now recognized by the compiler as a name
-   for the built-in object :const:`None`.  Although it is not a keyword, you cannot
-   assign a different object to it.
-
-.. versionchanged:: 2.5
-   Both :keyword:`as` and :keyword:`with` are only recognized when the
-   ``with_statement`` future feature has been enabled. It will always be enabled in
-   Python 2.6.  See section :ref:`with` for details.  Note that using :keyword:`as`
-   and :keyword:`with` as identifiers will always issue a warning, even when the
-   ``with_statement`` future directive is not in effect.
-
+   False      class      finally    is         return
+   None       continue   for        lambda     try
+   True       def        from       nonlocal   while
+   and        del        global     not        with
+   as         elif       if         or         yield
+   assert     else       import     pass
+   break      except     in         raise
 
 .. _id-classes:
 
@@ -405,71 +372,71 @@
 Literals
 ========
 
-.. index::
-   single: literal
-   single: constant
+.. index:: literal, constant
 
 Literals are notations for constant values of some built-in types.
 
 
 .. _strings:
 
-String literals
----------------
+String and Bytes literals
+-------------------------
 
-.. index:: single: string literal
+.. index:: string literal, bytes literal, ASCII
 
 String literals are described by the following lexical definitions:
 
-.. index:: single: ASCII at ASCII
-
 .. productionlist::
    stringliteral: [`stringprefix`](`shortstring` | `longstring`)
-   stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
+   stringprefix: "r" | "R"
    shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
-   longstring: ""'" `longstringitem`* ""'"
-             : | '"""' `longstringitem`* '"""'
-   shortstringitem: `shortstringchar` | `escapeseq`
-   longstringitem: `longstringchar` | `escapeseq`
+   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
+   shortstringitem: `shortstringchar` | `stringescapeseq`
+   longstringitem: `longstringchar` | `stringescapeseq`
    shortstringchar: <any source character except "\" or newline or the quote>
    longstringchar: <any source character except "\">
-   escapeseq: "\" <any ASCII character>
+   stringescapeseq: "\" <any source character>
+
+.. productionlist::
+   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
+   bytesprefix: "b" | "B"
+   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
+   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
+   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
+   longbytesitem: `longbyteschar` | `bytesescapeseq`
+   shortbyteschar: <any ASCII character except "\" or newline or the quote>
+   longbyteschar: <any ASCII character except "\">
+   bytesescapeseq: "\" <any ASCII character>
 
 One syntactic restriction not indicated by these productions is that whitespace
-is not allowed between the :token:`stringprefix` and the rest of the string
-literal. The source character set is defined by the encoding declaration; it is
-ASCII if no encoding declaration is given in the source file; see section
-:ref:`encodings`.
+is not allowed between the :token:`stringprefix` or :token:`bytesprefix` and the
+rest of the literal. The source character set is defined by the encoding
+declaration; it is UTF-8 if no encoding declaration is given in the source file;
+see section :ref:`encodings`.
 
-.. index::
-   single: triple-quoted string
-   single: Unicode Consortium
-   single: string; Unicode
-   single: raw string
+.. index:: triple-quoted string, Unicode Consortium, raw string
 
-In plain English: String literals can be enclosed in matching single quotes
+In plain English: Both types of literals can be enclosed in matching single quotes
 (``'``) or double quotes (``"``).  They can also be enclosed in matching groups
 of three single or double quotes (these are generally referred to as
 *triple-quoted strings*).  The backslash (``\``) character is used to escape
 characters that otherwise have a special meaning, such as newline, backslash
-itself, or the quote character.  String literals may optionally be prefixed with
-a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
-different rules for interpreting backslash escape sequences.  A prefix of
-``'u'`` or ``'U'`` makes the string a Unicode string.  Unicode strings use the
-Unicode character set as defined by the Unicode Consortium and ISO 10646.  Some
-additional escape sequences, described below, are available in Unicode strings.
-The two prefix characters may be combined; in this case, ``'u'`` must appear
-before ``'r'``.
+itself, or the quote character.
+
+String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
+such strings are called :dfn:`raw strings` and use different rules for
+interpreting backslash escape sequences.
+
+Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type.  They
+may only contain ASCII characters; bytes with a numeric value of 128 or greater
+must be expressed with escapes.
 
 In triple-quoted strings, unescaped newlines and quotes are allowed (and are
 retained), except that three unescaped quotes in a row terminate the string.  (A
 "quote" is the character used to open the string, i.e. either ``'`` or ``"``.)
 
-.. index::
-   single: physical line
-   single: escape sequence
-   single: Standard C
-   single: C
+.. index:: physical line, escape sequence, Standard C, C
 
 Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
 interpreted according to rules similar to those used by Standard C.  The
@@ -478,7 +445,7 @@
 +-----------------+---------------------------------+-------+
 | Escape Sequence | Meaning                         | Notes |
 +=================+=================================+=======+
-| ``\newline``    | Ignored                         |       |
+| ``\newline``    | Backslash and newline ignored   |       |
 +-----------------+---------------------------------+-------+
 | ``\\``          | Backslash (``\``)               |       |
 +-----------------+---------------------------------+-------+
@@ -494,83 +461,83 @@
 +-----------------+---------------------------------+-------+
 | ``\n``          | ASCII Linefeed (LF)             |       |
 +-----------------+---------------------------------+-------+
-| ``\N{name}``    | Character named *name* in the   |       |
-|                 | Unicode database (Unicode only) |       |
-+-----------------+---------------------------------+-------+
 | ``\r``          | ASCII Carriage Return (CR)      |       |
 +-----------------+---------------------------------+-------+
 | ``\t``          | ASCII Horizontal Tab (TAB)      |       |
 +-----------------+---------------------------------+-------+
-| ``\uxxxx``      | Character with 16-bit hex value | \(1)  |
-|                 | *xxxx* (Unicode only)           |       |
-+-----------------+---------------------------------+-------+
-| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(2)  |
-|                 | *xxxxxxxx* (Unicode only)       |       |
-+-----------------+---------------------------------+-------+
 | ``\v``          | ASCII Vertical Tab (VT)         |       |
 +-----------------+---------------------------------+-------+
-| ``\ooo``        | Character with octal value      | (3,5) |
+| ``\ooo``        | Character with octal value      | (1,3) |
 |                 | *ooo*                           |       |
 +-----------------+---------------------------------+-------+
-| ``\xhh``        | Character with hex value *hh*   | (4,5) |
+| ``\xhh``        | Character with hex value *hh*   | (2,3) |
 +-----------------+---------------------------------+-------+
 
-.. index:: single: ASCII at ASCII
+Escape sequences only recognized in string literals are:
+
++-----------------+---------------------------------+-------+
+| Escape Sequence | Meaning                         | Notes |
++=================+=================================+=======+
+| ``\N{name}``    | Character named *name* in the   |       |
+|                 | Unicode database                |       |
++-----------------+---------------------------------+-------+
+| ``\uxxxx``      | Character with 16-bit hex value | \(4)  |
+|                 | *xxxx*                          |       |
++-----------------+---------------------------------+-------+
+| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(5)  |
+|                 | *xxxxxxxx*                      |       |
++-----------------+---------------------------------+-------+
 
 Notes:
 
 (1)
-   Individual code units which form parts of a surrogate pair can be encoded using
-   this escape sequence.
+   As in Standard C, up to three octal digits are accepted.
 
 (2)
-   Any Unicode character can be encoded this way, but characters outside the Basic
-   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
-   compiled to use 16-bit code units (the default).  Individual code units which
-   form parts of a surrogate pair can be encoded using this escape sequence.
+   Unlike in Standard C, at most two hex digits are accepted.
 
 (3)
-   As in Standard C, up to three octal digits are accepted.
+   In a bytes literal, hexadecimal and octal escapes denote the byte with the
+   given value. In a string literal, these escapes denote a Unicode character
+   with the given value.
 
 (4)
-   Unlike in Standard C, at most two hex digits are accepted.
+   Individual code units which form parts of a surrogate pair can be encoded using
+   this escape sequence.
 
 (5)
-   In a string literal, hexadecimal and octal escapes denote the byte with the
-   given value; it is not necessary that the byte encodes a character in the source
-   character set. In a Unicode literal, these escapes denote a Unicode character
-   with the given value.
+   Any Unicode character can be encoded this way, but characters outside the Basic
+   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
+   compiled to use 16-bit code units (the default).  Individual code units which
+   form parts of a surrogate pair can be encoded using this escape sequence.
+
 
-.. index:: single: unrecognized escape sequence
+.. index:: unrecognized escape sequence
 
 Unlike Standard C, all unrecognized escape sequences are left in the string
 unchanged, i.e., *the backslash is left in the string*.  (This behavior is
 useful when debugging: if an escape sequence is mistyped, the resulting output
 is more easily recognized as broken.)  It is also important to note that the
-escape sequences marked as "(Unicode only)" in the table above fall into the
-category of unrecognized escapes for non-Unicode string literals.
+escape sequences only recognized in string literals fall into the category of
+unrecognized escapes for bytes literals.
 
-When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
-is included in the string without change, and *all backslashes are left in the
-string*.  For example, the string literal ``r"\n"`` consists of two characters:
-a backslash and a lowercase ``'n'``.  String quotes can be escaped with a
-backslash, but the backslash remains in the string; for example, ``r"\""`` is a
-valid string literal consisting of two characters: a backslash and a double
-quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
-an odd number of backslashes).  Specifically, *a raw string cannot end in a
-single backslash* (since the backslash would escape the following quote
-character).  Note also that a single backslash followed by a newline is
-interpreted as those two characters as part of the string, *not* as a line
-continuation.
-
-When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
-``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
-processed while  *all other backslashes are left in the string*. For example,
-the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
-SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
-be escaped with a preceding backslash; however, both remain in the string.  As a
-result, ``\uXXXX`` escape sequences are only recognized when there are an odd
-number of backslashes.
+When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
+``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
+backslashes are left in the string*. For example, the string literal
+``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
+'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
+preceding backslash; however, both remain in the string.  As a result,
+``\uXXXX`` escape sequences are only recognized when there is an odd number of
+backslashes.
+
+Even in a raw string, string quotes can be escaped with a backslash, but the
+backslash remains in the string; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes).  Specifically, *a raw string cannot end in a single backslash*
+(since the backslash would escape the following quote character).  Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the string, *not* as a line continuation.
 
 
 .. _string-catenation:
@@ -600,19 +567,9 @@
 Numeric literals
 ----------------
 
-.. index::
-   single: number
-   single: numeric literal
-   single: integer literal
-   single: plain integer literal
-   single: long integer literal
-   single: floating point literal
-   single: hexadecimal literal
-   single: octal literal
-   single: binary literal
-   single: decimal literal
-   single: imaginary literal
-   single: complex; literal
+.. index:: number, numeric literal, integer literal, plain integer literal
+   long integer literal, floating point literal, hexadecimal literal
+   octal literal, binary literal, decimal literal, imaginary literal, complex literal
 
 There are four types of numeric literals: plain integers, long integers,
 floating point numbers, and imaginary numbers.  There are no complex literals
@@ -633,18 +590,17 @@
 .. productionlist::
    integer: `decimalinteger` | `octinteger` | `hexinteger`
    decimalinteger: `nonzerodigit` `digit`* | "0"+
+   nonzerodigit: "1"..."9"
+   digit: "0"..."9"
    octinteger: "0" ("o" | "O") `octdigit`+
    hexinteger: "0" ("x" | "X") `hexdigit`+
    bininteger: "0" ("b" | "B") `bindigit`+
-   nonzerodigit: "1"..."9"
    octdigit: "0"..."7"
    hexdigit: `digit` | "a"..."f" | "A"..."F"
-   bindigit: "0"..."1"
+   bindigit: "0" | "1"
 
-Plain integer literals that are above the largest representable plain integer
-(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
-long integers instead. [#]_  There is no limit for long integer literals apart
-from what can be stored in available memory.
+There is no limit for the length of integer literals apart from what can be
+stored in available memory.
 
 Note that leading zeros in a non-zero decimal number are not allowed. This is
 for disambiguation with C-style octal literals, which Python used before version
@@ -732,7 +688,7 @@
    &=      |=      ^=      >>=     <<=     **=
 
 The period can also occur in floating-point and imaginary literals.  A sequence
-of three periods has a special meaning as an ellipsis in slices. The second half
+of three periods has a special meaning as an ellipsis literal. The second half
 of the list, the augmented assignment operators, serve lexically as delimiters,
 but also perform an operation.
 
@@ -741,18 +697,7 @@
 
    '       "       #       \
 
-.. index:: single: ASCII at ASCII
-
 The following printing ASCII characters are not used in Python.  Their
 occurrence outside string literals and comments is an unconditional error::
 
    $       ?
-
-.. rubric:: Footnotes
-
-.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
-   just above the largest representable plain integer but below the largest
-   unsigned 32-bit number (on a machine using 32-bit arithmetic), 4294967296, were
-   taken as the negative plain integer obtained by subtracting 4294967296 from
-   their unsigned value.
-
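
The literal rules touched by the lexical_analysis.rst hunks above can be
sketched as follows, assuming a released Python 3 interpreter. The helper
is_valid_source is hypothetical and exists only for this illustration. Raw
string behavior is left out on purpose, because it may differ from the text
in this revision.

```python
# String and bytes literals: the same escape denotes different things.
text = "\x41\u00e9"    # str: two Unicode characters, 'A' and U+00E9
data = b"\x41\xe9"     # bytes: two bytes, 0x41 and 0xE9
octal_byte = b"\101"   # octal escape with three digits: the byte 0x41

# Integer literals: decimal, octal (0o), hexadecimal (0x) and binary (0b).
values = (15, 0o17, 0xF, 0b1111)
# There is no separate long integer type; this literal equals 2**64.
big = 18446744073709551616


def is_valid_source(src: str) -> bool:
    """Report whether src compiles as a Python expression."""
    try:
        compile(src, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Bytes literals may only contain ASCII characters in the source text.
non_ascii_bytes_ok = is_valid_source('b"\u00e9"')
# Leading zeros in a non-zero decimal literal are rejected.
c_style_octal_ok = is_valid_source("012")
# The 0o prefix is the accepted spelling for octal literals.
prefixed_octal_ok = is_valid_source("0o12")
```

Bytes with a value of 128 or greater are written with escapes, as in the
data literal above, rather than as non-ASCII characters.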

