Message from exception raised in generator disappears

Scott David Daniels Scott.Daniels at Acm.Org
Sat Oct 16 21:17:54 EDT 2004


Mike Brown wrote:
> I thought I was being pretty clever with my first attempt at using generators, 
> but I seem to be missing some crucial concept, for even though this seems to 
> work as intended, the text of the exception message does not bubble up with 
> either of the ValueErrors when one of them is raised.
> 
> 
> # This helps iterate over a unicode string. When python is built with
> # 16-bit chars (as is the default on Windows), it returns surrogate
> # pairs together (unlike 'for c in s'), and detects illegal surrogate
> # pairs. Byte strings are unaffected.
> def chars(s):
>     surrogate = None
>     for c in s:
>         cp = ord(c)
>         if surrogate is not None:
>             if cp > 56319 and cp < 57344:
>                 pair = surrogate + c
>                 surrogate = None
>                 yield pair
>             else:
>                 raise ValueError("Bad surrogate pair in %s" % s)
>         else:
>             if cp > 55295 and cp < 57344:
>                 if cp < 56320:
>                     surrogate = c
>                 else:
>                     raise ValueError("Bad surrogate pair in %s" % s)
>             else:
>                 surrogate = None
>                 yield c
>     if surrogate is not None:
>         raise ValueError("Bad surrogate pair at end of %s" % s)
> 
> 
> # as expected, returns u'example \xe9...\u2022...\U00010000...\U0010fffd'
> ''.join([c for c in chars(u'example \xe9...\u2022...\ud800\udc00...\U0010fffd')])
> 
> # now test the 3 exception conditions. Each produces a ValueError
> ''.join([c for c in chars(u'2nd half bad: \ud800bogus')])
> ''.join([c for c in chars(u'no 1st half: \udc00')])
> ''.join([c for c in chars(u'no 2nd half: \ud800')])
> 
> 
> All 3 of the exception tests result in a bare ValueError; there's no 
> "Bad surrogate pair in" message shown. Why is that? What am I doing wrong?

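The problem is that type('abc%s' % u'\udc00') is unicode, not str, so the
exception message is a unicode object containing non-ASCII characters, and
it cannot be converted to a plain str when the traceback is printed.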
Change your raises to something like:
          raise ValueError("Bad surrogate pair at end of %r" % s)
and then you can relax.
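
As a rough illustration of the difference between %s and %r here, the
sketch below formats the same string both ways and checks whether the
resulting message can be encoded as ASCII. The helper name is_ascii_safe
is made up for this example and is not part of the original code.

```python
def is_ascii_safe(message):
    # Hypothetical helper: True if the message survives an ASCII encode.
    try:
        message.encode("ascii")
    except UnicodeError:
        return False
    return True

s = u"no 2nd half: \ud800"

# %s embeds the raw characters, including the lone surrogate.
with_s = "Bad surrogate pair at end of %s" % s
# %r embeds the repr of s, in which the surrogate appears as an escape.
with_r = "Bad surrogate pair at end of %r" % s

print(is_ascii_safe(with_s))  # False
print(is_ascii_safe(with_r))  # True
```

With %r the message stays pure ASCII for a string like this one, so it
can be displayed along with the exception.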

-Scott David Daniels
Scott.Daniels at Acm.Org



