[Tutor] str.split and quotes
Marilyn Davis
marilyn at deliberate.com
Fri Apr 8 02:36:32 CEST 2005
On Thu, 7 Apr 2005, Danny Yoo wrote:
>
>
> On Wed, 6 Apr 2005, Kent Johnson wrote:
>
> > > >>> s = 'Hi "Python Tutors" please help'
> > > >>> s.split()
> > >
> > > ['Hi', '"Python', 'Tutors"', 'please', 'help']
> > >
> > >
> > > I wish it would leave the stuff in quotes intact:
> > >
> > > ['Hi', '"Python Tutors"', 'please', 'help']
> >
> > You can do this easily with the csv module. The only complication is
> > that the string has to be wrapped in a StringIO to turn it into a
> > file-like object.
>
>
> Hello!
>
> A variation of Kent's approach might be to use the 'tokenize' module:
>
> http://www.python.org/doc/lib/module-tokenize.html
>
> which takes advantage of Python's tokenizer itself to break lines into
> chunks of tokens. If you intend your input to be broken up just like
> Python tokens, the 'tokenize' module might be ok:
>
> ######
> >>> import tokenize
> >>> from StringIO import StringIO
> >>> def getListOfTokens(s):
> ...     results = []
> ...     for tokenTuple in tokenize.generate_tokens(StringIO(s).readline):
> ...         results.append(tokenTuple[1])
> ...     return results
> ...
> >>> getListOfTokens('Hi "Python Tutors" please help')
> ['Hi', '"Python Tutors"', 'please', 'help', '']
> ######
>
> (The last token, the empty string, is EOF, which can be filtered out if we
> use the token.ISEOF() function.)
>
In my context, I expect exactly 8 tokens so the extra '' wouldn't be
noticed.
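
For cases where the trailing '' would matter, here is a minimal sketch of the filtering Danny mentions. It assumes Python 3, where io.StringIO replaces the StringIO module used above; the function name tokens_without_eof is made up for illustration. Python 3 can also emit an implicit NEWLINE token at the end of the input, so the sketch skips that as well.

```python
import token
import tokenize
from io import StringIO


def tokens_without_eof(s):
    """Return the token strings for s, skipping end-of-input markers."""
    results = []
    for tok in tokenize.generate_tokens(StringIO(s).readline):
        # token.ISEOF() is true only for the ENDMARKER token type.
        if token.ISEOF(tok.type):
            continue
        # Python 3 may add an implicit NEWLINE token at the end; skip it too.
        if tok.type == token.NEWLINE:
            continue
        results.append(tok.string)
    return results


print(tokens_without_eof('Hi "Python Tutors" please help'))
```

Because the input is treated as Python source, this only suits lines that tokenize cleanly as Python.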
>
> I'm not sure whether this is appropriate for Marilyn's purposes, but I
> thought I might just toss it out. *grin*
Thank you, Danny. Very interesting. Both approaches are perfect for
me.
Is there a reason to prefer one over the other? Is one faster? I
compiled my regular expression to make it quicker.
What a rich language! So many choices.
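
One way to look into the speed question is to put the two suggestions side by side and time them with timeit. The following is only a sketch under assumptions: it uses Python 3 (io.StringIO), the names split_with_csv, split_with_tokenize and SAMPLE are invented here, and passing delimiter=' ' to csv.reader is a guess at what Kent had in mind rather than his actual code. No timings are quoted because they depend on the machine and the Python version.

```python
import csv
import timeit
import tokenize
from io import StringIO

SAMPLE = 'Hi "Python Tutors" please help'


def split_with_csv(s):
    """Split on single spaces; csv removes the surrounding quotes."""
    return next(csv.reader(StringIO(s), delimiter=' '))


def split_with_tokenize(s):
    """Split into Python tokens; string tokens keep their quotes."""
    tokens = tokenize.generate_tokens(StringIO(s).readline)
    # Drop the empty or whitespace-only end-of-input tokens.
    return [t.string for t in tokens if t.string.strip()]


if __name__ == '__main__':
    for func in (split_with_csv, split_with_tokenize):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=10000)
        print(func.__name__, func(SAMPLE), seconds)
```

Beyond speed, the two differ in behavior: the csv version returns 'Python Tutors' without the quotes and treats each space as a delimiter, while the tokenize version keeps the quotes but accepts only input that is valid as Python tokens.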
Marilyn