Message from exception raised in generator disappears
Mike Brown
mike at skew.org
Sun Oct 17 03:33:01 EDT 2004
Jeff Epler wrote:
> PS Here's my stab at writing your function, which seems like a useful
> one.
>
> def chars(s):
>     s = iter(s)
>     for i in s:
>         if u'\ud800' <= i < u'\udc00':
>             try:
>                 j = s.next()
>             except StopIteration:
>                 raise ValueError("Bad pair: string ends after %r" % i)
>             if u'\udc00' <= j < u'\ue000':
>                 yield i + j
>             else:
>                 raise ValueError("Bad pair: %r (no second half)" % (i+j))
>         elif u'\udc00' <= i < u'\ude00':
>             raise ValueError("Bad pair: %r (no first half)" % i)
>         else:
>             yield i
Ah, nice, thanks, and very close. You needed to account for byte strings,
though, and the last value in the elif needs to be u'\ue000', not u'\ude00'.
Here we go:
def chars(s):
    if isinstance(s, str):
        for i in s:
            yield i
        return
    s = iter(s)
    for i in s:
        if u'\ud800' <= i < u'\udc00':
            try:
                j = s.next()
            except StopIteration:
                raise ValueError("Bad pair: string ends after %r" % i)
            if u'\udc00' <= j < u'\ue000':
                yield i + j
            else:
                raise ValueError("Bad pair: %r (bad second half)" % (i+j))
        elif u'\udc00' <= i < u'\ue000':
            raise ValueError("Bad pair: %r (no first half)" % i)
        else:
            yield i
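
To make the bound fix concrete, here is a small check of which code points the shorter range would have let through. It only does arithmetic on code point numbers; the names HIGH, LOW, TOO_SHORT and missed are made up for this illustration and are not part of chars().

```python
# UTF-16 surrogate ranges as code point numbers (end-exclusive).
HIGH = range(0xD800, 0xDC00)   # first half of a pair
LOW = range(0xDC00, 0xE000)    # second half of a pair

# The range implied by the u'\ude00' upper bound in the quoted version.
TOO_SHORT = range(0xDC00, 0xDE00)

# Low surrogates that the shorter range would not have flagged as unpaired.
missed = [cp for cp in LOW if cp not in TOO_SHORT]
print(len(missed), hex(missed[0]), hex(missed[-1]))  # 512 0xde00 0xdfff
```

So with the u'\ude00' bound, a lone second half anywhere in U+DE00..U+DFFF would have fallen through to the final else and been yielded as if it were an ordinary character.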
# tests of good strings
[c for c in chars('test')]
[c for c in chars('test \xe9')]
[c for c in chars(u'test \xe9 \u2022 \ud800\udc00 \U00010000')]
[c for c in chars(u'test \xe9 \u2022 \udbff\udffd \U0010fffd')]
# tests of bad strings
[c for c in chars(u'test \ud800')] # string ends before 2nd half
[c for c in chars(u'test \ud800 test')] # bad 2nd half
[c for c in chars(u'test \ud800\ud800')] # bad 2nd half
[c for c in chars(u'test \udc00 test')] # no 1st half
Awesome!
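
For anyone trying this on Python 3, the following is a rough sketch of how the same generator might look there. It is an adaptation under stated assumptions, not the code posted above: s.next() becomes next(it), the u'' prefixes are dropped, and the byte-string branch checks for bytes and yields one-byte slices (my guess at the closest equivalent, since iterating bytes in Python 3 gives integers).

```python
def chars(s):
    """Yield the characters of s, keeping each surrogate pair together."""
    if isinstance(s, bytes):
        # Assumption: for byte strings, yield length-1 bytes objects.
        for k in range(len(s)):
            yield s[k:k + 1]
        return
    it = iter(s)
    for i in it:
        if '\ud800' <= i < '\udc00':        # first half of a pair
            try:
                j = next(it)
            except StopIteration:
                raise ValueError("Bad pair: string ends after %r" % i) from None
            if '\udc00' <= j < '\ue000':    # second half of a pair
                yield i + j
            else:
                raise ValueError("Bad pair: %r (bad second half)" % (i + j))
        elif '\udc00' <= i < '\ue000':      # second half with no first half
            raise ValueError("Bad pair: %r (no first half)" % i)
        else:
            yield i


# A good string: the pair comes back as a single two-character item.
print(list(chars('a\ud800\udc00b')))

# A bad string: the error message survives the trip out of the generator.
try:
    list(chars('test \udc00 test'))
except ValueError as e:
    print(e)
```

Note that a Python 3 str is already a sequence of code points, so surrogates only show up there when a string was built with explicit escapes or decoded with a handler such as surrogatepass; a character like '\U00010000' is passed through unchanged as one item.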
Thanks also to Scott David Daniels for pointing out that the problem
with the exceptions was just that the messages needed to be str instead
of unicode.
-Mike