[Python-Dev] "".tokenize() ?

Fredrik Lundh fredrik@pythonware.com
Fri, 4 May 2001 12:50:06 +0200


mal wrote:

> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in?
> >
> > -1.  method bloat.  not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > import re
> >
> > def tokenize(string, ignore):
> >     return [word for word in re.findall(r"\w+", string) if word not in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than splitting out words as in
> your example
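
(concretely, assuming the one-liner above with its import: on the
example string it gives

    >>> tokenize("one, two and three", [",", "and"])
    ['one', 'two', 'three']

i.e. the whitespace around the words is gone, while the proposed
method keeps it.)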

oh, I didn't see the spaces.  splitting on any of the given
substrings is even easier (but perhaps a bit more obscure, at
least when written on one line):

import re

def tokenize(string, seps):
    # split on any separator; escape each in case it contains regex syntax
    return re.split("|".join(map(re.escape, seps)), string)
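
(untested sketch of how it behaves on mal's example:

    >>> tokenize("one, two and three", [",", "and"])
    ['one', ' two ', ' three']

re.split keeps the surrounding whitespace, so this matches the
proposed output except for the leading space on the last element.)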

Cheers /F