Default scope of variables

Thu Jul 4 22:27:18 EDT 2013

On 5 July 2013 03:03, Dave Angel <davea at davea.name> wrote:
> On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
>> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
>>> It's perhaps worth mentioning that some non-ASCII characters are allowed
>>> in identifiers in Python 3, though I don't know which ones.
>>
>> PEP 3131 describes the rules:
>>
>> http://www.python.org/dev/peps/pep-3131/
>
> The isidentifier() method will let you weed out the characters that cannot
> start an identifier.  But there are other groups of characters that can
> appear after the starting "letter".  So a more reasonable sample might be
> something like:
...
> In particular,
>     http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers
>
> has a definition for id_continue that includes several interesting
> categories.  I expected the non-ASCII digits, but there's other stuff there,
> like "nonspacing marks" that are surprising.
>
> I'm pretty much speculating here, so please correct me if I'm way off.
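
A small sketch of the start/continue distinction Dave describes. It assumes only the built-in str.isidentifier(), which applies the start rule to the first character and the continue rule to the rest, so prefixing a character with an ASCII letter isolates the continue rule. The helper name identifier_roles and the three sample characters are illustrative choices, not anything from the thread.

```python
import unicodedata


def identifier_roles(ch):
    """Return (can_start, can_continue) for the single character ch."""
    # On its own, ch must satisfy the identifier *start* rule.
    can_start = ch.isidentifier()
    # After a valid ASCII start letter, only the *continue* rule applies to ch.
    can_continue = ("a" + ch).isidentifier()
    return can_start, can_continue


# A non-ASCII letter (Ll), a non-ASCII digit (Nd) and a nonspacing mark (Mn).
for ch in ["\u00e9", "\u0663", "\u0301"]:
    start, cont = identifier_roles(ch)
    print("U+{:04X} {} ({}): start={}, continue={}".format(
        ord(ch), unicodedata.name(ch), unicodedata.category(ch), start, cont))
```

The Arabic-Indic digit and the combining accent should both come out as start=False, continue=True, in line with the Nd and Mn entries in id_continue.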

For my calculation above, I used this code I quickly mocked up:

    import unicodedata as unidata
    from sys import maxunicode
    from itertools import chain

    def get():
        xid_starts = set()
        xid_continues = set()

        # General categories PEP 3131 gives for id_start, and the extra ones
        # id_continue adds on top. The underscore and the Other_ID_Start /
        # Other_ID_Continue properties are not handled here.
        id_start_categories = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
        id_continue_categories = {"Mn", "Mc", "Nd", "Pc"}

        characters = (chr(n) for n in range(maxunicode + 1))

        print("Making normalized characters")

        # NFKC-normalize every code point, then flatten the results
        # into a single set of individual characters.
        normalized = (unidata.normalize("NFKC", character) for character in characters)
        normalized = set(chain.from_iterable(normalized))

        print("Assigning to categories")

        for character in normalized:
            category = unidata.category(character)

            if category in id_start_categories:
                xid_starts.add(character)
            elif category in id_continue_categories:
                xid_continues.add(character)

        return xid_starts, xid_continues

Please note that "xid_continues" actually represents "xid_continue - xid_start": because of the elif, a character that already qualifies as a start character is never added to it.
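
As a cross-check on the category-based sets, one option is to let the running interpreter decide, again assuming str.isidentifier() as the reference; the function name identifier_sets is hypothetical. Unlike get(), it returns the full xid_continue set, so len(continues - starts) is the number to compare with xid_continues. Some differences are expected: get() works from general categories only, so it files the underscore (category Pc) under continuation characters, misses characters that carry only the Other_ID_Start or Other_ID_Continue properties, and NFKC-normalizes first, which str.isidentifier() does not.

```python
from sys import maxunicode


def identifier_sets():
    """Return (xid_start, xid_continue) as sets of single characters,
    as judged by this interpreter's str.isidentifier()."""
    xid_start = set()
    xid_continue = set()
    for n in range(maxunicode + 1):
        ch = chr(n)
        # A one-character string is an identifier iff ch may start one.
        if ch.isidentifier():
            xid_start.add(ch)
        # "a" is a valid start, so this tests ch purely as a continuation.
        if ("a" + ch).isidentifier():
            xid_continue.add(ch)
    return xid_start, xid_continue


starts, continues = identifier_sets()
print(len(starts), len(continues), len(continues - starts))
```

The counts depend on the Unicode version bundled with the interpreter, so they will drift between Python releases.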