Tokenizing a string

Fredrik Lundh effbot at telia.com
Sat Mar 18 10:41:42 EST 2000


Michael Dartt <mad96 at hampshire.edu> wrote:
> I've got a string I'd like to tokenize, but it's not in a file, and it'd
> be rather inefficient to write it to a file just to tokenize it.  Is
> there any function I can use to pass this string to
> tokenize.tokenize()?

the "tokenize" function takes any callable (typically a file's
readline method) which returns the next line of source code each
time it is called, and an empty string when it runs out of data.
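
The contract is simple enough to check by hand. The sketch below is an illustration added for modern readers, assuming Python 3. It drains any readline-style callable until it sees the empty string. `read_all` and `my_readline` are hypothetical names, not part of the tokenize module.

```python
def read_all(readline):
    """Collect lines from a readline-style callable until it returns ""."""
    lines = []
    while True:
        line = readline()
        if not line:  # "" means exhausted; a blank source line is "\n", not ""
            break
        lines.append(line)
    return lines

# Hypothetical hand-rolled readline: hands out one line per call,
# then "" once the list is empty.
pending = ["x = 1\n", "print(x)\n"]

def my_readline():
    return pending.pop(0) if pending else ""

print(read_all(my_readline))
```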

the easiest way to use this on a string is to wrap the
string in a StringIO object, and pass the readline method
to the tokenizer:

import tokenize
import StringIO

prog = "print 'hello'\n"

tokenize.tokenize(StringIO.StringIO(prog).readline)

## this prints:
##
## 1,0-1,5:     NAME    'print'
## 1,6-1,13:    STRING  "'hello'"
## 1,13-1,14:   NEWLINE '\012'
## 2,0-2,0:     ENDMARKER       ''
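
On Python 3 the StringIO module is gone and `tokenize.tokenize()` expects a readline that returns bytes. It also returns a generator rather than printing. A rough Python 3 equivalent of the example above, offered as a sketch rather than part of the original reply, uses `io.StringIO` with `tokenize.generate_tokens()`, which accepts a str-returning readline. The source string is changed to `print('hello')` only so that it is valid Python 3, and the printing format imitates the old output by hand.

```python
import io
import token
import tokenize

prog = "print('hello')\n"

# generate_tokens() takes a readline that returns str; in Python 3,
# tokenize.tokenize() wants one that returns bytes instead.
tokens = list(tokenize.generate_tokens(io.StringIO(prog).readline))

# Print each token roughly in the old "row,col-row,col: TYPE 'text'" style.
for tok in tokens:
    (srow, scol), (erow, ecol) = tok.start, tok.end
    print("%d,%d-%d,%d:\t%s\t%r" % (srow, scol, erow, ecol,
                                    token.tok_name[tok.type], tok.string))
```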

alternatively, you can use your own wrapper, such as:

import string

class Wrapper:
    def __init__(self, program):
        self.prog = string.split(program, "\n")
        if program[-1:] == "\n":
            del self.prog[-1] # trim tail
    def __call__(self):
        try:
            return self.prog.pop(0) + "\n"
        except IndexError:
            return "" # end of list

tokenize.tokenize(Wrapper(prog))
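
In Python 3 the wrapper can be shorter, because any zero-argument callable will do. A minimal sketch, assuming Python 3 (the variable names are illustrative), binds `next()` to an iterator over the lines and passes `""` as the default. When the lines run out, the callable then returns the end-of-input marker instead of raising StopIteration.

```python
import functools
import tokenize

prog = "x = 1\nprint(x)\n"

# splitlines(keepends=True) keeps each "\n", so no trailing-newline
# bookkeeping is needed, unlike the split()-based Wrapper class.
lines = iter(prog.splitlines(keepends=True))

# Each call returns the next line, or "" once the iterator is exhausted.
readline = functools.partial(next, lines, "")

tokens = list(tokenize.generate_tokens(readline))
for tok in tokens:
    print(tok)
```

One caveat: `str.splitlines()` also breaks on `\r`, form feed and a few other separators, whereas `readline` splits only on `\n`. For ordinary source text this makes no difference.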

hope this helps!

</F>
