Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Ian Kelly ian.g.kelly at gmail.com
Tue Nov 1 15:25:42 EDT 2011


On Mon, Oct 31, 2011 at 6:32 PM, Patrick Maupin <pmaupin at gmail.com> wrote:
> On Oct 31, 5:52 pm, Ian Kelly <ian.g.ke... at gmail.com> wrote:
>> For instance, split() will split on vertical tab,
>> which is not one of the characters the OP wanted.
>
> That's just the default behavior.  You can explicitly specify the
> separator to split on.  But it's probably more efficient to just use
> translate with deletechars.
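The deletechars idea can be sketched as follows. This is a minimal sketch assuming Python 3. There, the two-argument "delete" form lives on `bytes.translate`, while `str.translate` takes a table built with `str.maketrans`. In Python 2.6+ the equivalent is `s.translate(None, deletechars)` on a plain `str`. The allowed set follows the thread subject (bytes 32 through 127 inclusive, plus CR, LF and Tab). `ALLOWED`, `has_disallowed` and `has_disallowed_text` are hypothetical names chosen for illustration.

```python
# Hypothetical allowed set, taken from the thread subject:
# ASCII 32-127 inclusive, plus CR, LF and Tab.
ALLOWED = bytes(range(32, 128)) + b"\r\n\t"

# For str input: a translation table that maps every allowed character to None
# (i.e. deletes it). The third argument of str.maketrans lists characters to delete.
_DELETE_ALLOWED = str.maketrans("", "", ALLOWED.decode("ascii"))


def has_disallowed(data: bytes) -> bool:
    """Return True if data contains any byte outside ALLOWED."""
    # translate(None, delete) only deletes; whatever survives is not allowed.
    return len(data.translate(None, ALLOWED)) > 0


def has_disallowed_text(text: str) -> bool:
    """Return True if text contains any character outside ALLOWED."""
    # Delete every allowed character; any leftover character is disallowed.
    return text.translate(_DELETE_ALLOWED) != ""


print(has_disallowed(b"hello\r\n\tworld"))  # False
print(has_disallowed_text("caf\u00e9"))     # True
```

Deleting every allowed character and checking whether anything survives keeps the loop in C. Unlike a loop with an early exit, though, it always scans the whole input.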

As I understood it, the point of using the default behavior was to
merge whitespace, which cannot be done when the separator is
explicitly specified.  For example:

>>> "     ".split()
[]
>>> "     ".split(" ")
['', '', '', '', '', '']

It is easy to check that the first result is empty.  The second is more
annoying: you have to check that every element is an empty string,
which is O(n).  Your point about deletechars is a good one, though;
that is definitely better than a regular expression.
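The following small sketch makes the split() comparison concrete. It assumes the question being asked is "does this string contain nothing but whitespace (or nothing but spaces)?", and the helper names are hypothetical. It also reproduces the vertical-tab caveat raised earlier in the thread: default `split()` treats `\x0b` as whitespace.

```python
def only_whitespace_default(s: str) -> bool:
    # Default split() merges runs of whitespace and drops leading/trailing
    # whitespace, so an all-whitespace (or empty) string yields [].
    return not s.split()


def only_spaces_explicit(s: str) -> bool:
    # With an explicit separator, runs are not merged: n spaces give
    # n + 1 empty strings, so every piece must be checked, which is O(n).
    return all(piece == "" for piece in s.split(" "))


print(len("     ".split(" ")))            # 6
print(only_whitespace_default("\x0b"))    # True: vertical tab counts as whitespace
print(only_spaces_explicit("\x0b"))       # False
```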



More information about the Python-list mailing list