Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Sun Mar 8 22:18:38 EDT 2015


On Mon, Mar 9, 2015 at 1:09 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
>
>> '\udd00' should be a SyntaxError.
>
> I find your argument convincing, that attempting to construct a Unicode
> string of a lone surrogate should be an error.
>
> Shouldn't the error type be a ValueError, though? The statement is not,
> to my mind, erroneous syntax.

For the string literal, I would say SyntaxError is more appropriate
than ValueError, as a string object has to be constructed at
compilation time.
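
For reference, a minimal sketch of what happens today, assuming CPython 3.3 or later (the helper names encode_error and compile_error are mine, purely for illustration). As far as I can tell, the lone-surrogate literal compiles without complaint and only fails when you try to encode it, whereas a malformed escape is already rejected at compile time with SyntaxError:

```python
# Sketch: how a lone surrogate literal is treated today
# (behaviour assumed for CPython 3.3+; helper names are illustrative only).
lone = '\udd00'  # accepted by the compiler as a one-character str


def encode_error(text):
    """Return the exception raised when encoding text as UTF-8, or None."""
    try:
        text.encode('utf-8')
    except UnicodeEncodeError as exc:
        return exc
    return None


def compile_error(source):
    """Return the SyntaxError raised when compiling source, or None."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError as exc:
        return exc
    return None


# The lone surrogate is only rejected later, at encode time.
# UnicodeEncodeError is a subclass of ValueError.
print(type(encode_error(lone)).__name__)

# A truncated \u escape, by contrast, is rejected while compiling the literal.
print(type(compile_error(r"'\ud'")).__name__)

# The lone-surrogate literal itself currently compiles cleanly.
print(compile_error(r"'\udd00'"))
```

So there is already precedent for a bad escape in a string literal being a SyntaxError; the proposal would just extend that to lone surrogates.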

I'd still like to see a report from someone who has used a language
that specifically disallows all surrogates in strings. Does it help?
Is it more hassle than it's worth? Are there weird edge cases that it
breaks?

ChrisA


