[issue18814] Add utilities to "clean" surrogate code points from strings
Steven D'Aprano
report at bugs.python.org
Mon Sep 28 04:23:56 CEST 2015
Steven D'Aprano added the comment:
On Sun, Sep 27, 2015 at 04:17:45PM +0000, R. David Murray wrote:
>
> I also want "detect if there are any surrogates".
I think that's useful enough it should be a str method. Here's a
pure-Python implementation:
def is_surrogate(s):
return any(0xD800 <= ord(c) <= 0xDFFF for c in s)
The actual Flexible String Representation implementation can be even
more efficient. All-ASCII and all-Latin1 strings cannot possibly include
surrogates, so is_surrogate will be a constant-time operation.
(If we care about making this a constant-time operation for all
strings, the constructor could set a flag on the string object. Since
the constructor has to walk the string anyway, I don't think that will
cost much time-wise.)
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
More information about the Python-bugs-list
mailing list