Why not 3.__class__ ?

Alex Martelli aleax at aleax.it
Mon Sep 17 04:02:25 EDT 2001


"Kirby Urner" <urner at alumni.princeton.edu> wrote in message
news:3v9bqtogrl53sfqor0f81oojaa94tr0qt5 at 4ax.com...
>
> Since I can invoke string methods using string.method() notation,
> or go
>
>   >>> 'a'.__class__
>   <type 'str'>
>
> why is it that 3.__class__ or 3.__add__(4) are syntax errors.
> It looks ugly maybe, but no one says you have to use it.
> Why not include for consistency?
>
> Probably good reasons why not.  I'm here to learn.

The semantics are there in Python 2.2 alpha 3:

C:\Python22>python
Python 2.2a3 (#23, Sep  7 2001, 01:43:22) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> (3).__class__
<type 'int'>
>>> (3).__add__(4)
7
>>>

You cannot omit the parentheses around the literal here because
"3." is a valid token in and of itself (a floating-point literal).
Lexical analysis takes place first, and it uses a "maximal munch"
strategy (the simplest one, after all): the tokenizer always takes
the longest run of characters that forms a valid token.  As long
as you work around the tokenization issue, the rest works fine, e.g.:

>>> 3 .__class__
<type 'int'>
>>> 3..__class__
<type 'float'>
>>> 3L.__class__
<type 'long'>
>>> 0x3.__class__
<type 'int'>
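
The same spellings can be checked from a script rather than at the
prompt.  This is only a minimal sketch, assuming a current Python 3
interpreter (where the 3L long literal no longer exists and types
print as <class 'int'> rather than <type 'int'>); the helper name
parses() is made up for illustration:

```python
def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The bare form is rejected; every workaround separates the dot
# from the literal and is accepted.
spellings = [
    "3.__class__",     # "3." is lexed as one token: syntax error
    "(3).__class__",   # parentheses end the literal
    "3 .__class__",    # whitespace ends the literal
    "3..__class__",    # first dot belongs to the float literal "3."
    "0x3.__class__",   # a hex literal cannot contain a dot
]
results = {src: parses(src) for src in spellings}

for src, ok in results.items():
    print(f"{src!r:18} {'ok' if ok else 'syntax error'}")
```

Only the first spelling should be reported as a syntax error.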

The whitespace, or the trailing . or L, or the leading 0x,
all suffice to tell the tokenizer that the dot before the
attribute name is a separate token from the literal, just
as do the parentheses used earlier.  But tokenizing
"3.whatever" as anything different from a "3." token
followed by a "whatever" token would break backwards
compatibility -- and it's definitely NOT worth the BIG
hassle to make the tokenizer way more complicated to
deal with this single, specific case, rare and marginal,
by making the tokenization of "3.whatever" depend on
the "whatever" or even wider syntax issues.  E.g.,

>>> 3.and 4
4

this MUST keep lexing with 3. first -- and having "3."
tokenize differently depending on what FOLLOWS it would
make everything horribly more convoluted.
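
To make the "maximal munch" rule concrete, here is a toy tokenizer
that, at each position, tries every pattern and keeps the longest
match.  It is a sketch under loud assumptions: the four token kinds
and the name munch() are hypothetical and vastly simpler than
Python's real lexer, but the longest-match rule is the same idea:

```python
import re

# Hypothetical, heavily simplified token patterns (not Python's grammar).
PATTERNS = [
    ("INT", re.compile(r"\d+")),
    ("FLOAT", re.compile(r"\d+\.\d*|\.\d+")),   # "3.", "3.14", ".5"
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("DOT", re.compile(r"\.")),
]

def munch(source):
    """Split `source` into (kind, text) pairs, longest match first."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():       # whitespace only separates tokens
            pos += 1
            continue
        best = None
        for kind, pattern in PATTERNS:
            m = pattern.match(source, pos)
            # Keep whichever pattern consumes the most characters.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        kind, m = best
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

print(munch("3.__class__"))   # [('FLOAT', '3.'), ('NAME', '__class__')]
print(munch("3 .__class__"))  # [('INT', '3'), ('DOT', '.'), ('NAME', '__class__')]
print(munch("3.and 4"))       # [('FLOAT', '3.'), ('NAME', 'and'), ('INT', '4')]
```

Note that munch() never looks at what follows a token when deciding
where that token ends: "3." wins over "3" simply because it is
longer, whether the next thing is __class__ or and.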

C++ has a similar issue, which comes up MUCH more
often, since it also uses maximal-munch tokenizing
(one rare case where C++ has chosen simplicity:-):
">>" is a single token (shift-right), but ">" as a
token is widely overloaded and in particular "<"
and ">" are used as 'parentheses' for templates,
so a VERY common beginner's mistake is to write:

std::vector<std::list<int>> myvar;

which gives a syntax error because the ">>" tokenizes
as "shift-right" which is incorrect here; so one
soon learns to write, e.g.:

std::vector< std::list<int> > myvar;

where the space in "< std" is irrelevant, but is
generally used for symmetry, while the space in
"> >" is absolutely crucial (it ensures each ">"
is seen as a separate token).


Python tries to avoid overloading characters, but a
few cases are inevitable: to have both the nice
"3." notation for floating-point literals AND the
nice "x.feep" notation for attribute access, the
dot must be overloaded lexically.


Alex





