[Tutor] str.split and quotes

Marilyn Davis marilyn at deliberate.com
Fri Apr 8 02:36:32 CEST 2005


On Thu, 7 Apr 2005, Danny Yoo wrote:

> 
> 
> On Wed, 6 Apr 2005, Kent Johnson wrote:
> 
> > > >>> s = 'Hi "Python Tutors" please help'
> > > >>> s.split()
> > >
> > > ['Hi', '"Python', 'Tutors"', 'please', 'help']
> > >
> > >
> > > I wish it would leave the stuff in quotes intact:
> > >
> > > ['Hi', '"Python Tutors"', 'please', 'help']
> >
> > You can do this easily with the csv module. The only complication is
> > that the string has to be wrapped in a StringIO to turn it into a
> > file-like object.
> 
> 
> Hello!
> 
> A variation of Kent's approach might be to use the 'tokenize' module:
> 
>     http://www.python.org/doc/lib/module-tokenize.html
> 
> which takes advantage of Python's tokenizer itself to break lines into
> chunks of tokens.  If you intend your input to be broken up just like
> Python tokens, the 'tokenize' module might be ok:
> 
> ######
> >>> import tokenize
> >>> from StringIO import StringIO
> >>> def getListOfTokens(s):
> ...     results = []
> ...     for tokenTuple in tokenize.generate_tokens(StringIO(s).readline):
> ...         results.append(tokenTuple[1])
> ...     return results
> ...
> >>> getListOfTokens('Hi "Python Tutors" please help')
> ['Hi', '"Python Tutors"', 'please', 'help', '']
> ######
> 
> (The last token, the empty string, is the EOF marker, which can be
> filtered out by testing each token's type with the token.ISEOF()
> function.)
> 

In my context, I expect exactly 8 tokens so the extra '' wouldn't be
noticed.
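
If the trailing '' ever did matter, it could be dropped by looking at
each token's type. Here is a minimal sketch, assuming Python 3 (io.StringIO
instead of the StringIO module, and named-tuple tokens); the name
get_list_of_tokens and the set SKIPPED_TYPES are only illustrative, and
newer Pythons also emit an empty NEWLINE token, so that is skipped too:

```python
import io
import tokenize

# Token types that carry no text of interest here (assumption: only the
# words and the quoted strings are wanted).
SKIPPED_TYPES = {tokenize.ENDMARKER, tokenize.NEWLINE, tokenize.NL}


def get_list_of_tokens(s):
    """Return the token strings in s, without end-of-input markers."""
    readline = io.StringIO(s).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in SKIPPED_TYPES
    ]


print(get_list_of_tokens('Hi "Python Tutors" please help'))
# ['Hi', '"Python Tutors"', 'please', 'help']
```

Filtering on the type rather than on the empty string keeps a genuinely
empty quoted token such as "" in the result.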

> 
> I'm not sure if this is appropriate for Marilyn's purposes though, but I
> thought I might just toss it out.  *grin*

Thank you Danny.  Very interesting.  Both approaches are perfect for
me.

Is there a reason to prefer one over the other?  Is one faster?  I
compiled my regular expression to make it quicker.
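
One visible difference, sketched below under the assumption of Python 3:
the csv approach strips the quote characters, while tokenize keeps them.
The helper name split_with_csv and the SAMPLE string are only
illustrative, and the timing at the end is a rough way to measure speed
on one's own data, not a benchmark result:

```python
import csv
import io
import timeit

SAMPLE = 'Hi "Python Tutors" please help'


def split_with_csv(s):
    """Split s on single spaces, treating double-quoted text as one field."""
    # csv.reader wants an iterable of lines, so wrap the string in StringIO.
    reader = csv.reader(io.StringIO(s), delimiter=' ')
    # One line of input gives one row; an empty string would raise
    # StopIteration here.
    return next(reader)


print(split_with_csv(SAMPLE))
# ['Hi', 'Python Tutors', 'please', 'help']

if __name__ == "__main__":
    # Rough timing only; numbers will vary by machine and Python version.
    seconds = timeit.timeit(lambda: split_with_csv(SAMPLE), number=10000)
    print("csv: %.4f s for 10000 calls" % seconds)
```

The same timeit call can be pointed at the tokenize version or at a
compiled regular expression to compare them on real input.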

What a rich language!  So many choices.

Marilyn
