string-based tokenizer?

Tim Peters tim_one at email.msn.com
Mon May 3 23:36:00 EDT 1999


[The Blue Wizard]
> I was looking for a quick way to tokenize Python and Python-like
> expressions, and I found tokenize.py.  It is quite nice, but there
> is one little ugly fact:  it depends on a data stream.

Nope:  it depends on something (anything!) that "acts like a stream".  With
enough Pythonistic cleverness, you can feed it a disguised integer <wink>.
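
For instance (just a quick sketch, using nothing beyond the standard
StringIO module and the same tokeneater-style tokenize API used below;
"eater" is a throwaway name), a string wrapped in a StringIO object
already acts stream-like enough:

import tokenize, StringIO

def eater(type, token, start, end, line):
    # tokenize.tokenize calls this once for each token it finds
    print token

tokenize.tokenize(StringIO.StringIO("3+4\n").readline, eater)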

> I want to be able to just type  tokenize('a cute string known as 3+4')
> and get a series of tokens directly from it.

Well, that's not what it does-- so you write a class to take what it does do
and turn that into what you want it to do:

import tokenize

class TokenWrapper:
    def __init__(self, lines):
        self.lines = lines
        self.i = 0
        self.tokens = []

    def run(self):
        # tokenize pulls lines via self.readline and hands each
        # token it finds to self.tokeneater
        tokenize.tokenize(self.readline, self.tokeneater)
        return self.tokens

    def tokeneater(self, type, token, (srow, scol),
                   (erow, ecol), line):
        # called once per token; only the token string matters here
        self.tokens.append(token)

    def readline(self):
        # acts like a file's readline; returning "" signals EOF
        if self.i >= len(self.lines):
            return ""
        line = self.lines[self.i]
        self.i = self.i + 1
        return line

def blue(astring):
    return TokenWrapper([astring]).run()

print blue('a cute string known as 3+4')

If you write a different TokenWrapper class for each string, you're missing
the point <wink>.
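
For instance (making up a few strings on the spot), the same blue is
happy to chew on as many of them as you care to feed it:

for expr in ('3+4', 'x = y * 2', 'spam(eggs)'):
    print blue(expr)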

write-once-use-twice-ly y'rs  - tim
