[issue1521950] shlex.split() does not tokenize like the shell

Vinay Sajip report at bugs.python.org
Tue Feb 21 18:27:36 CET 2012


Vinay Sajip <vinay_sajip at yahoo.co.uk> added the comment:

I updated the patch to reflect Éric's comments on Rietveld, but there are also some other changes:

Previously, when punctuation chars were set, wordchars was augmented only with '-'. That was incomplete, so wordchars is now augmented with '~-./*?=', which allows for wildcards, filename characters and argument flags.
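
To illustrate the effect of the wider wordchars set, here is a minimal sketch. It assumes the punctuation_chars keyword as it appears in released versions of shlex (Python 3.6 and later), which may differ in detail from the patch under review here. The tokenize() helper and the sample command are purely illustrative.

```python
import shlex

def tokenize(command):
    # punctuation_chars=True treats runs of ();<>|& as separate tokens and
    # adds '~-./*?=' to wordchars, so paths, globs and flags stay intact.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)

tokens = tokenize("ls -l ~/src/*.py && echo done")
print(tokens)
```

Without the extra wordchars, characters such as '~', '/' and '*' would each end the current word, and a path like ~/src/*.py would come back in pieces.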

I added a token_type attribute whose value is 'a' for alphanumeric tokens and 'c' for punctuation tokens. The token type was already tracked internally; it is now simply exposed. It is needed to disambiguate punctuation tokens, because two logically separate punctuation tokens may be returned as one if they are not separated by whitespace in the source being tokenised.
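
As a rough sketch of the disambiguation problem: a run such as ');' can come back as a single token, and the caller then has to split it. The code below does not use token_type, since that attribute belongs to the patch under review and may not be available elsewhere. Instead it classifies a token by its characters, which is only an approximation because it cannot tell a quoted "&&" from a real operator. OPERATORS, is_punctuation_token() and split_operators() are hypothetical names introduced for illustration.

```python
# Hypothetical operator table; longer operators come first so that the
# greedy match prefers '&&' over '&' followed by '&'.
OPERATORS = ("&&", "||", ">>", "<<", ";;", "&", "|", ";", "<", ">", "(", ")")
PUNCTUATION = set("();<>|&")

def is_punctuation_token(token):
    # Approximates token_type == 'c' by looking only at the characters.
    return bool(token) and all(ch in PUNCTUATION for ch in token)

def split_operators(token):
    # Split a fused punctuation run into individual operators.
    parts = []
    i = 0
    while i < len(token):
        for op in OPERATORS:
            if token.startswith(op, i):
                parts.append(op)
                i += len(op)
                break
        else:
            raise ValueError("not a punctuation token: %r" % token)
    return parts

print(split_operators(");"))
print(split_operators("&&"))
```

With a token type reported by the lexer itself, the character test would be unnecessary, and quoted text that happens to look like an operator would not be misclassified.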

New attributes and the changes to wordchars have been documented, and a test added for token_type return values.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1521950>
_______________________________________

