[Patches] [ python-Patches-1159501 ] Improve %s support for unicode

SourceForge.net noreply at sourceforge.net
Mon Aug 22 22:57:36 CEST 2005


Patches item #1159501, was opened at 2005-03-09 01:43
Message generated for change (Comment added) made by nascheme
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1159501&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Fredrik Lundh (effbot)
Summary: Improve %s support for unicode

Initial Comment:
"'%s' % unicode_string" produces a unicode result.  I
think the following code should also return a unicode
string:

class Wrapper:
    def __str__(self):
        return unicode_string
'%s' % Wrapper()

That behavior would make it easier to write library
code that can work with either str objects or unicode
objects.

The fix is pretty simple (see the attached patch).
Perhaps the PyObject_Text function should be called
_PyObject_Text instead.  Alternatively, if the function
is made public, then we should document it and perhaps
also provide a builtin function called 'text' that uses it.
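To make the proposal concrete, here is a rough model of the two code paths. It is a sketch under stated assumptions, not the attached patch. It is written in Python 3 so it runs today: `bytes` stands in for Python 2's 8-bit `str`, and `str` stands in for `unicode`. The functions `object_str` and `object_text` are hypothetical Python stand-ins for the C functions PyObject_Str and the proposed PyObject_Text. The coercion step assumes Python 2's default encoding is ASCII.

```python
# Python 3 model of the Python 2 behavior discussed in this patch.
# Assumption: bytes stands in for Python 2's 8-bit str, and
# str stands in for Python 2's unicode type.


def object_str(obj):
    """Rough model of Python 2's PyObject_Str: a unicode result is coerced to 8-bit."""
    if type(obj) is bytes:
        return obj  # an exact 8-bit string is returned as-is
    result = type(obj).__str__(obj)  # call the __str__ slot directly (like tp_str)
    if isinstance(result, str):
        # Assumption: the default encoding is ASCII, so non-ASCII text fails here.
        result = result.encode("ascii")
    return result


def object_text(obj):
    """Hypothetical model of the proposed PyObject_Text: no coercion step."""
    if isinstance(obj, (bytes, str)):
        return obj  # already a string of either kind: hand it back unchanged
    return type(obj).__str__(obj)  # keep whatever string type __str__ produced


class Wrapper:
    def __str__(self):
        return "caf\u00e9"  # "unicode" text containing a non-ASCII character


print(ascii(object_text(Wrapper())))  # prints 'caf\xe9'

try:
    object_str(Wrapper())
except UnicodeEncodeError:
    print("object_str cannot coerce non-ASCII text to an 8-bit string")
```

Under this model, the coercing path fails on a Wrapper whose __str__ returns non-ASCII text. The non-coercing path hands the text back unchanged, and that is the result the patch wants '%s' formatting to use.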




----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2005-08-22 20:57

Message:
Logged In: YES 
user_id=35752

Closing in favor of patch 1266570.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2005-04-20 21:46

Message:
Logged In: YES 
user_id=35752

Here's a quote from Fredrik (effbot):
> I'm beginning to think that we need an extra method (__text__), that
> can return any kind of string that's compatible with Python's text model.
>
> (in today's CPython, that's an 8-bit string with ASCII only, or a Unicode
> string.  future Python's may support more string types, at least at
> the C implementation level).
>
> I'm not sure we can change __str__ or __unicode__ without breaking
> code in really obscure ways (but I'd be happy to be proven wrong).

My idea is that we can change __str__ without breaking code.
The reason is that no one should be calling tp_str
directly.  Instead they use PyObject_Str.

I don't know what he meant by "string that's compatible with
Python's text model".  With my change, Python can only deal
with str or unicode instances.  I have no idea how we could
support other string implementations.

I don't want to introduce a text() builtin that calls
__str__ and then later realize that __text__ would be
useful.  Perhaps this change is big enough to require a PEP.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-04-20 21:27

Message:
Logged In: YES 
user_id=38388

Looks OK to me; not sure what you mean by __text__:
__str__ took on that role long ago.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2005-04-20 21:00

Message:
Logged In: YES 
user_id=35752

Assigning to effbot for review.  He had mentioned something
about __text__ at one point.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2005-03-10 21:13

Message:
Logged In: YES 
user_id=35752

attempt to attach patch again

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2005-03-10 21:12

Message:
Logged In: YES 
user_id=35752

Attaching a better patch.  Add a builtin function called
"text".  Change PyObject_Text to check the return types as
suggested by Marc.  Update the documentation and the tests.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-03-09 10:10

Message:
Logged In: YES 
user_id=38388

Nice patch. 

Only nit: PyObject_Text() should check that the result of
tp_str() is indeed either a string or unicode instance
(possibly from a subclass). Otherwise, the function wouldn't
be able to guarantee this feature - which is what it's all
about.
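The requested check could look like the following sketch. Like any illustration here, it is a hypothetical Python 3 model rather than the C code in the patch. The name `object_text` is made up, and `bytes` and `str` stand in for Python 2's `str` and `unicode`. Using isinstance() instead of an exact type comparison is what lets subclasses through.

```python
# Python 3 model; assumption: bytes and str stand in for Python 2's str and unicode.
TEXT_TYPES = (bytes, str)


def object_text(obj):
    """Hypothetical PyObject_Text model that validates the __str__ result."""
    if isinstance(obj, TEXT_TYPES):
        return obj  # already a string: nothing to check
    result = type(obj).__str__(obj)  # call the slot directly, bypassing str()
    # isinstance() also accepts subclasses, covering "possibly from a subclass".
    if not isinstance(result, TEXT_TYPES):
        raise TypeError(
            "__str__ returned non-string (type %s)" % type(result).__name__
        )
    return result


class MyText(str):
    """A str subclass; the check should let it through."""


class Good:
    def __str__(self):
        return MyText("ok")


class Bad:
    def __str__(self):
        return 42  # not a string of either kind


print(type(object_text(Good())).__name__)  # prints MyText

try:
    object_text(Bad())
except TypeError as exc:
    print(exc)  # prints __str__ returned non-string (type int)
```

Without the check, a caller of the function could receive an int from a misbehaving __str__. With the check, any successful call is guaranteed to return one of the two string types.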


----------------------------------------------------------------------


