"\n" in unicode vs "regular" string: is this normal?
Michael Hudson
mwh at python.net
Wed Jan 29 07:47:05 EST 2003
Eric Brunel <eric.brunel at pragmadev.com> writes:
> Hi all,
>
> Consider the following class:
>
> class C:
>     def __init__(self, s):
>         self.s = s
>     def __repr__(self):
>         return self.s
>
> Now run:
>
> >>> o1 = C("foo\nbar")
> >>> o1
> foo
> bar
> >>> o2 = C(unicode("foo\nbar"))
> >>> o2
> foo\nbar
That's pretty odd.
> Is there a good reason why the result is different? The unicode stuff
> should take by default the standard encoding which is for me just
> plain ASCII. So why do the two objects have different
> representations?
Well, you're not returning a string (in the strict sense) from repr.
Hmm, it'd be this code, then (from Objects/object.c:PyObject_Repr):
#ifdef Py_USING_UNICODE
    if (PyUnicode_Check(res)) {
        PyObject* str;
        str = PyUnicode_AsUnicodeEscapeString(res);
        Py_DECREF(res);
        if (str)
            res = str;
        else
            return NULL;
    }
#endif
It's possible that that should be PyUnicode_AsEncodedString(res, NULL,
NULL); instead.
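To see what that escape step does to a newline, here is a rough sketch
from the Python side. It assumes the "unicode_escape" codec behaves like
PyUnicode_AsUnicodeEscapeString, and it uses a plain "ascii" encode as a
stand-in for what encoding with the default encoding would give on the
original poster's setup; the variable names are just for illustration.

```python
# The same text, encoded two ways.
text = u"foo\nbar"

# Roughly what PyObject_Repr does to a unicode result:
# the newline becomes a backslash followed by the letter n.
escaped = text.encode("unicode_escape")

# Stand-in for encoding with an ASCII default encoding:
# the newline stays a real line feed.
plain = text.encode("ascii")

print(len(escaped))  # one byte longer, because of the added backslash
print(len(plain))
```

The escaped form is one byte longer than the plain one, which is the
literal backslash that ends up in the output.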
It might be better to ensure your __repr__ methods always return
real strings, e.g. by calling str() on the result.
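Something along these lines, as a sketch only. The class is the one
from the original post; encoding to ASCII is my assumption (it matches
the default encoding mentioned above) and it will raise
UnicodeEncodeError on non-ASCII text, which may or may not be what you
want.

```python
class C(object):
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        s = self.s
        # On Python 2 a unicode value is not an instance of str, so it
        # gets encoded to a byte string here. Where str is already the
        # text type, text values skip this branch.
        if not isinstance(s, str):
            s = s.encode("ascii")  # assumption: ASCII-only text
        return s


o1 = C("foo\nbar")
o2 = C(u"foo\nbar")
print(repr(o1) == repr(o2))
```

With this, both objects give the same representation, with a real line
feed in it.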
Hmm, http://www.python.org/sf/400706 seems to indicate this has been a
point of discussion in the past. You could dig, if you felt like it.
> I just stepped on this problem because of Tkinter returning plain
> strings if the text is only ASCII, but unicode strings otherwise. The
> text returned from Tkinter was wrapped in XML code by specific objects
> using __repr__ (which may not be a good idea indeed...).
Given that, AIUI, XML documents are strictly Unicode, indeed it may
not...
> And it took me ages to figure out why on earth the text I stored in
> my file contained the string "\n" instead of actual line feeds...
I'm not surprised!
Cheers,
M.
--
Good? Bad? Strap him into the IETF-approved witch-dunking
apparatus immediately! -- NTK now, 21/07/2000