I need an idea for practise!

Terry Reedy tjreedy at udel.edu
Thu Jul 17 17:46:51 EDT 2014


On 7/17/2014 1:20 PM, Chris Angelico wrote:

> By the way, one specific point about RR's advice: A colorizer should
> *not* be written using regexps. It'd make for an absolute nightmare of
> impossible-to-debug regexp strings, plus there are fundamental
> limitations on what you can accomplish with them. You need to use a
> lexer - a lexical analyzer. Basically, to correctly colorize code, you
> need to have something equivalent to the first part of the language
> interpreter, but with a lot more tolerance for errors. That's a pretty
> big thing to write as regexps.

It depends on how deeply one wants to colorize. idlelib.ColorDelegator 
colors comments, strings, keywords, builtin names, and names following 
'def' and 'class' with the following regexes.

import builtins
import keyword
import re

def any(name, alternates):
     "Return a named group pattern matching list of alternates."
     return "(?P<%s>" % name + "|".join(alternates) + ")"

def make_pat():
     kw = r"\b" + any("KEYWORD", keyword.kwlist) + r"\b"
     builtinlist = [str(name) for name in dir(builtins)
                                         if not name.startswith('_') and \
                                         name not in keyword.kwlist]
     # self.file = open("file") :
     # 1st 'file' colorized normal, 2nd as builtin, 3rd as string
     builtin = r"([^.'\"\\#]\b|^)" + any("BUILTIN", builtinlist) + r"\b"
     comment = any("COMMENT", [r"#[^\n]*"])
     stringprefix = r"(\br|u|ur|R|U|UR|Ur|uR|b|B|br|Br|bR|BR|rb|rB|Rb|RB)?"
     sqstring = stringprefix + r"'[^'\\\n]*(\\.[^'\\\n]*)*'?"
     dqstring = stringprefix + r'"[^"\\\n]*(\\.[^"\\\n]*)*"?'
     sq3string = stringprefix + r"'''[^'\\]*((\\.|'(?!''))[^'\\]*)*(''')?"
     dq3string = stringprefix + r'"""[^"\\]*((\\.|"(?!""))[^"\\]*)*(""")?'
     string = any("STRING", [sq3string, dq3string, sqstring, dqstring])
     return kw + "|" + builtin + "|" + comment + "|" + string +\
            "|" + any("SYNC", [r"\n"])

prog = re.compile(make_pat(), re.S)
idprog = re.compile(r"\s+(\w+)", re.S)
asprog = re.compile(r".*?\b(as)\b")

I am not sure whether the separate pattern for 'as' (asprog) is still
needed, or whether it is a holdover from when 'as' was a keyword only in
certain contexts.
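
To make the mechanism concrete, here is a minimal sketch of how a pattern
built from named groups can be turned into (tag, text) pairs. It is not the
ColorDelegator code itself: named_group, make_simple_pat and tag_spans are
hypothetical names chosen for illustration, and the pattern is a cut-down
variant (single-line strings only, no string prefixes, no SYNC group).

```python
import builtins
import keyword
import re


def named_group(name, alternates):
    """Return a named group pattern matching any of the alternates."""
    return "(?P<%s>" % name + "|".join(alternates) + ")"


def make_simple_pat():
    # Cut-down variant for illustration: no string prefixes,
    # no triple-quoted strings, no SYNC group.
    kw = r"\b" + named_group("KEYWORD", keyword.kwlist) + r"\b"
    builtin_names = [name for name in dir(builtins)
                     if not name.startswith("_")
                     and name not in keyword.kwlist]
    builtin = (r"([^.'\"\\#]\b|^)"
               + named_group("BUILTIN", builtin_names) + r"\b")
    comment = named_group("COMMENT", [r"#[^\n]*"])
    sqstring = r"'[^'\\\n]*(\\.[^'\\\n]*)*'?"
    dqstring = r'"[^"\\\n]*(\\.[^"\\\n]*)*"?'
    string = named_group("STRING", [sqstring, dqstring])
    return kw + "|" + builtin + "|" + comment + "|" + string


simple_prog = re.compile(make_simple_pat(), re.S)


def tag_spans(source):
    """Return (tag, text) pairs for each span the pattern recognizes."""
    spans = []
    for m in simple_prog.finditer(source):
        # Only the named group that took part in the match is non-None.
        for tag, text in m.groupdict().items():
            if text:
                spans.append((tag, text))
    return spans


print(tag_spans("x = len('abc')  # size"))
```

Each match sets exactly one of the named groups, so the group name doubles
as the tag to apply to that span. A real colorizer would use m.span(tag) to
get text offsets rather than the matched text.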

-- 
Terry Jan Reedy



