[XML-SIG] Re: Issues with Unicode type

Uche Ogbuji uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 15:16:08 -0600


> On Mon, Sep 23, 2002 at 07:21:41PM +0200, Eric van der Vlist wrote:
> > Yep, and that's what James Clark is doing in his Java implementation:
> > 
> >   public int getLength(Object obj) {
> >     String str = (String)obj;
> >     int len = str.length();
> >     int nSurrogatePairs = 0;
> >     for (int i = 0; i < len; i++)
> >       if (Utf16.isSurrogate1(str.charAt(i)))
> >         nSurrogatePairs++;
> >     return len - nSurrogatePairs;
> >   }
> > 
> > And I need to do the same in Python...
> 
>   yep, that simple,

Oh, but then Python is so much simpler:

    
import re

# A high surrogate followed by a low surrogate, i.e. one surrogate pair.
SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
    sp_count = len(SP_PAT.findall(u))
    return len(u) - sp_count
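
A quick way to check the idea is sketched below. It assumes the interpreter keeps the two UTF-16 code units of a surrogate pair as separate items in the string, as a narrow build does; the `sample` string is made up for illustration, and the pattern and function are repeated so the snippet stands alone.

```python
import re

# One high surrogate followed by one low surrogate (a UTF-16 surrogate pair).
SP_PAT = re.compile(u"[\ud800-\udbff][\udc00-\udfff]")

def smart_len(u):
    """Length of u with each surrogate pair counted as a single character."""
    return len(u) - len(SP_PAT.findall(u))

# Hypothetical sample: "a", then U+10400 spelled as its two UTF-16
# code units (D801 DC00), then "b".
sample = u"a\ud801\udc00b"

print(len(sample))        # counts code units
print(smart_len(sample))  # counts each pair once
```

Under that assumption `len(sample)` counts four code units while `smart_len(sample)` counts three characters. Since the pattern only matches a high surrogate immediately followed by a low one, a stray unpaired surrogate is still counted as one character.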


Problem solved.

The great thing about Python is that even when it frustrates you one moment, it
quickly finds a way to make up for it.


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html