[Python-Dev] just say no...

M.-A. Lemburg mal@lemburg.com
Fri, 12 Nov 1999 16:24:33 +0100


"Fred L. Drake, Jr." wrote:
> 
> M.-A. Lemburg writes:
>  > Such a buffer is needed to implement "s" and "s#" argument
>  > parsing. It's a simple requirement to support those two
>  > parsing markers -- there's not much to argue about, really...
>  > unless, of course, you want to give up Unicode object support
>  > for all APIs using these parsers.
> 
>   Perhaps I missed the agreement that these should always receive
> UTF-8 from Unicode strings.  Was this agreed upon, or has it simply
> not been argued over in favor of other topics?

It's been in the proposal since version 0.1. The idea is to
provide a decent way of making existing scripts Unicode-aware.

>   If this has indeed been agreed upon... at least it can be computed
> on demand rather than at initialization!

This is what I intended to implement. The <defencbuf> buffer
will be filled upon the first request for the UTF-8 encoding.
"s" and "s#" are examples of such requests. The buffer will
remain intact until the object is destroyed (since other code
could store the pointer received via e.g. "s").
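
To illustrate the fill-on-first-request idea, here is a rough sketch in
Python rather than C. It is only a toy model under stated assumptions:
the class name, the as_utf8() method and the encode_calls counter are
made up for this example and are not part of the proposal; only the
name defencbuf is taken from it.

```python
class LazyUTF8Text:
    """Toy stand-in for a Unicode object with a lazily filled buffer.

    Hypothetical illustration only; not the proposed C implementation.
    """

    def __init__(self, text):
        self._text = text
        # Nothing is encoded at initialization time.
        self._defencbuf = None
        # Counter used only to show how often encoding really happens.
        self.encode_calls = 0

    def as_utf8(self):
        """Return the cached UTF-8 bytes, encoding on the first request."""
        if self._defencbuf is None:
            self._defencbuf = self._text.encode("utf-8")
            self.encode_calls += 1
        # The same bytes object is handed out for the object's lifetime,
        # so callers that keep a reference see stable data.
        return self._defencbuf


u = LazyUTF8Text("gr\u00fc\u00df")
first = u.as_utf8()    # fills the buffer
second = u.as_utf8()   # reuses it, no second encoding pass
```

Objects that are never asked for their UTF-8 form never pay for the
extra buffer; those that are asked pay once and keep it until they are
destroyed.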

> Perhaps there should be two
> pointers: one to the UTF-8 buffer and one to a PyObject; if the
> PyObject is there it's a "old-style" string that's actually providing
> the buffer.  This may or may not be a good idea; there's a lot of
> memory expense for long Unicode strings converted from UTF-8 that
> aren't ever converted back to UTF-8 or accessed using "s" or "s#".
> Ok, I've talked myself out of that.  ;-)

Note that Unicode objects are a completely different beast ;-)
String objects are not touched in any way by the proposal.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    49 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/