string-based tokenizer?
Tim Peters
tim_one at email.msn.com
Mon May 3 23:36:00 EDT 1999
[The Blue Wizard]
> I was looking for a quick way to tokenize Python and Python-like
> expressions, and I found tokenize.py. It is quite nice, but there
> is one ugly little fact: it depends on a data stream.
Nope: it depends on something (anything!) that "acts like a stream". With
enough Pythonistic cleverness, you can feed it a disguised integer <wink>.
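For instance, the StringIO module can dress a plain string up as a
stream (a quick sketch; with no tokeneater argument, tokenize's default
just prints each token it finds):

import tokenize, StringIO

# tokenize only ever calls the readline you hand it, so any callable
# that returns successive lines (and then "") will do.
tokenize.tokenize(StringIO.StringIO("3+4\n").readline)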
> I want to be able to just type tokenize('a cute string known as 3+4')
> and get a series of tokens directly from it.
Well, that's not what it does-- so you write a class to take what it does do
and turn that into what you want it to do:
import tokenize

class TokenWrapper:

    def __init__(self, lines):
        self.lines = lines
        self.i = 0
        self.tokens = []

    def run(self):
        tokenize.tokenize(self.readline, self.tokeneater)
        return self.tokens

    def tokeneater(self, type, token, (srow, scol),
                   (erow, ecol), line):
        self.tokens.append(token)

    def readline(self):
        if self.i >= len(self.lines):
            return ""
        line = self.lines[self.i]
        self.i = self.i + 1
        return line

def blue(astring):
    return TokenWrapper([astring]).run()

print blue('a cute string known as 3+4')
If you write a different TokenWrapper class for each string, you're missing
the point <wink>.
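The same class swallows a real multi-line program just as happily;
hand it a list of lines (a hypothetical second call, just to show the
reuse):

print TokenWrapper(['if x:\n', '    y = 2\n']).run()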
write-once-use-twice-ly y'rs - tim