[Python-Dev] Python-3.0, unicode, and os.environ

Glenn Linderman v+python at g.nevcal.com
Mon Dec 8 05:45:12 CET 2008


On approximately 12/7/2008 8:13 PM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
> 
>  > But if you are interested in checking for security issues, shouldn't
>  > you _first_ decode into some canonical form,
> 
> Yes.  That's all that is being asked for: that Python do strict
> decoding to a canonical form by default.  That's a lot to ask, as it
> turns out, but that is what we (the minority of strict Unicode
> adherents, that is) want.


I have no problem with having strict validation available.  But doesn't 
validation take significantly longer than decoding?  So I think the two 
should be logically decoupled: do validation when and where it is needed 
for security reasons, and allow internal [de]coding to be faster.
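
To make the decoupling concrete, here is a minimal sketch, not a proposal 
for the actual codec machinery.  The names decode_fast and 
is_strictly_valid are hypothetical, and it assumes the "surrogateescape" 
error handler, which only exists in Python 3 releases later than this 
message.  It shows the shape of the idea (decode leniently everywhere, 
validate only at a security boundary) and makes no claim about speed.

```python
def decode_fast(raw: bytes) -> str:
    """Lenient decode (hypothetical helper): never raises.

    Bytes that are not valid UTF-8 are smuggled through as lone
    surrogate code points instead of triggering an error.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def is_strictly_valid(text: str) -> bool:
    """Separate validation step (hypothetical helper).

    The text is considered clean if it can be re-encoded as strict
    UTF-8, which fails when smuggled surrogates are present.
    """
    try:
        text.encode("utf-8", errors="strict")
    except UnicodeEncodeError:
        return False
    return True


# Internal code path: decode without validating.
name = decode_fast(b"caf\xc3\xa9")
junk = decode_fast(b"\xff")

# Security boundary: validate only where it matters.
for value in (name, junk):
    print(repr(value), is_strictly_valid(value))
```

Only the caller at the boundary pays for the check; everything else just 
passes the decoded string along.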

I'm mostly indifferent about which should be the default... maybe there 
shouldn't be a default!  Use the "vUTF-8" decoder for strict validation, 
and the "fUTF-8" decoder for the faster, non-validating version.  Or 
something like that.  With appropriate documentation.  Of course, 
"UTF-8" already exists and behaves like "fUTF-8", so for compatibility 
I guess it shouldn't change... but it could be deprecated.
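
As a rough sketch of what two explicitly named decoders could look like, 
the following registers "vUTF-8" and "fUTF-8" through codecs.register.  
Both codec names are hypothetical (they exist in no Python release), and 
the lenient one is assumed here to mean "replace bad bytes with U+FFFD", 
which is only one possible reading of "non-validating".

```python
import codecs


def _strict_decode(data, errors="strict"):
    # "vUTF-8" (hypothetical): always strict, whatever handler was requested.
    return codecs.utf_8_decode(data, "strict", True)


def _lenient_decode(data, errors="strict"):
    # "fUTF-8" (hypothetical): never raises; bad bytes become U+FFFD.
    return codecs.utf_8_decode(data, "replace", True)


def _encode(text, errors="strict"):
    # Encoding is plain UTF-8 for both names.
    return codecs.utf_8_encode(text, errors)


def _search(name):
    # Lookup names arrive lowercased; newer Pythons also turn "-" into "_".
    key = name.lower().replace("-", "_")
    if key == "vutf_8":
        return codecs.CodecInfo(name="vutf-8", encode=_encode,
                                decode=_strict_decode)
    if key == "futf_8":
        return codecs.CodecInfo(name="futf-8", encode=_encode,
                                decode=_lenient_decode)
    return None  # not ours; let other search functions try


codecs.register(_search)

print(b"a\xffb".decode("fUTF-8"))   # lenient: bad byte is replaced
try:
    b"a\xffb".decode("vUTF-8")      # strict: bad byte is an error
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

With explicit names like these, the caller states at each call site 
whether validation is wanted, instead of relying on a default.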


You didn't address my other point: if decoding to a canonical form is 
done first, many of the insecurities just go away, so why throw errors?


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

