[issue18814] Add utilities to "clean" surrogate code points from strings

Steven D'Aprano report at bugs.python.org
Mon Sep 28 04:23:56 CEST 2015


Steven D'Aprano added the comment:

On Sun, Sep 27, 2015 at 04:17:45PM +0000, R. David Murray wrote:
> 
> I also want "detect if there are any surrogates".

I think that's useful enough it should be a str method. Here's a 
pure-Python implementation:

def is_surrogate(s):
    return any(0xD800 <= ord(c) <= 0xDFFF for c in s)

The actual Flexible String Representation implementation can be even 
more efficient. All-ASCII and all-Latin1 strings cannot possibly include 
surrogates, so is_surrogate will be a constant-time operation.

(If we care about making this a constant-time operation for all 
strings, the constructor could set a flag on the string object. Since 
the constructor has to walk the string anyway, I don't think that will 
cost much time-wise.)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________


More information about the Python-bugs-list mailing list