Message from exception raised in generator disappears

Sun Oct 17 03:33:01 EDT 2004

Jeff Epler wrote:
> PS Here's my stab at writing your function, which seems like a useful
> one.
> 
> def chars(s):
>     s = iter(s)
>     for i in s:
>         if u'\ud800' <= i < u'\udc00':
>             try:
>                 j = s.next()
>             except StopIteration:
>                 raise ValueError("Bad pair: string ends after %r" % i)
>             if u'\udc00' <= j < u'\ue000':
>                 yield i + j
>             else:
>                 raise ValueError("Bad pair: %r (no second half)" % (i+j))
>         elif u'\udc00' <= i < u'\ude00':
>                 raise ValueError("Bad pair: %r (no first half)" % i)
>         else:
>             yield i

Ah, nice, thanks, and very close. You needed to account for byte strings,
though, and the upper bound in the elif needs to be u'\ue000', not u'\ude00'
(the second halves run all the way up to u'\udfff').
Here we go:

def chars(s):
    # Byte strings cannot contain surrogates, so pass them through as-is.
    if isinstance(s, str):
        for i in s:
            yield i
        return
    s = iter(s)
    for i in s:
        if u'\ud800' <= i < u'\udc00':
            # First half of a surrogate pair: the second half must follow.
            try:
                j = s.next()
            except StopIteration:
                raise ValueError("Bad pair: string ends after %r" % i)
            if u'\udc00' <= j < u'\ue000':
                yield i + j
            else:
                raise ValueError("Bad pair: %r (bad second half)" % (i+j))
        elif u'\udc00' <= i < u'\ue000':
            # Second half with no first half before it.
            raise ValueError("Bad pair: %r (no first half)" % i)
        else:
            yield i
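
A quick sanity check on the corrected bound in the elif, as a small sketch
for Python 3 (the names LOW_START, LOW_END and missed are illustrative only
and not part of chars()). It lists the second-half code points that a test
against u'\ude00' would have failed to catch:

```python
# Low (second-half) surrogates occupy U+DC00..U+DFFF, i.e. the
# half-open range [0xDC00, 0xE000).
LOW_START, LOW_END = 0xDC00, 0xE000

# Code points that a test of the form 0xDC00 <= cp < 0xDE00 would not catch.
missed = [cp for cp in range(LOW_START, LOW_END)
          if not (0xDC00 <= cp < 0xDE00)]

# Show how many were missed, and the first and last of them.
print(len(missed), hex(missed[0]), hex(missed[-1]))
```

So with the old bound, a lone second half anywhere in U+DE00..U+DFFF would
have fallen through to the final else and been yielded as an ordinary
character.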

# tests of good strings
[c for c in chars('test')]
[c for c in chars('test \xe9')]
[c for c in chars(u'test \xe9 \u2022 \ud800\udc00 \U00010000')]
[c for c in chars(u'test \xe9 \u2022 \udbff\udffd \U0010fffd')]

# tests of bad strings (each one should raise ValueError)
[c for c in chars(u'test \ud800')]        # string ends before 2nd half
[c for c in chars(u'test \ud800 test')]   # bad 2nd half
[c for c in chars(u'test \ud800\ud800')]  # bad 2nd half
[c for c in chars(u'test \udc00 test')]   # no 1st half
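
For anyone trying this on Python 3, here is a rough port, not the code above
verbatim. It assumes bytes plays the role of the old byte string and uses
next(it) in place of s.next(). The helper check_bad() is hypothetical, added
only so that all four bad-string tests can run in one go instead of stopping
at the first ValueError:

```python
def chars(s):
    """Yield the characters of s, joining each surrogate pair into one item."""
    if isinstance(s, bytes):
        # Byte strings have no surrogates; yield each byte as length-1 bytes.
        for k in range(len(s)):
            yield s[k:k + 1]
        return
    it = iter(s)
    for i in it:
        if '\ud800' <= i < '\udc00':
            # First half of a pair: the second half must follow.
            try:
                j = next(it)
            except StopIteration:
                raise ValueError("Bad pair: string ends after %r" % i)
            if '\udc00' <= j < '\ue000':
                yield i + j
            else:
                raise ValueError("Bad pair: %r (bad second half)" % (i + j))
        elif '\udc00' <= i < '\ue000':
            # Second half with no first half before it.
            raise ValueError("Bad pair: %r (no first half)" % i)
        else:
            yield i


def check_bad(s):
    """Return the message of the ValueError chars() raises for s, or None."""
    try:
        list(chars(s))
    except ValueError as e:
        return str(e)
    return None


# The four bad strings from the tests, run without stopping at the first error.
for bad in ['test \ud800', 'test \ud800 test',
            'test \ud800\ud800', 'test \udc00 test']:
    print(ascii(bad), '->', check_bad(bad))
```

Note that a Python 3 str is a sequence of code points, so surrogate pairs
only show up there if they are written explicitly as in these tests; a
character such as '\U00010000' is already a single item.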

Awesome!

Thanks also to Scott David Daniels for pointing out that the problem
with the exceptions was just that the messages needed to be str instead
of unicode.

-Mike