"\n" in unicode vs "regular" string: is this normal?

Wed Jan 29 07:47:05 EST 2003

Eric Brunel <eric.brunel at pragmadev.com> writes:

> Hi all,
> 
> Consider the following class:
> 
> class C:
>    def __init__(self, s):
>      self.s = s
>    def __repr__(self):
>      return self.s
> 
> Now run:
> 
>  >>> o1 = C("foo\nbar")
>  >>> o1
> foo
> bar
>  >>> o2 = C(unicode("foo\nbar"))
>  >>> o2
> foo\nbar

That's pretty odd.

> Is there a good reason why the result is different? The unicode stuff
> should take by default the standard encoding which is for me just
> plain ASCII. So why do the two objects have different
> representations?

Well, for o2 you're not returning a string (in the strict sense) from
__repr__; you're returning a unicode object.

Hmm, it'd be this code, then (from Objects/object.c:PyObject_Repr):

#ifdef Py_USING_UNICODE
		if (PyUnicode_Check(res)) {
			PyObject* str;
			str = PyUnicode_AsUnicodeEscapeString(res);
			Py_DECREF(res);
			if (str)
				res = str;
			else
				return NULL;
		}
#endif

It's possible that the call should be PyUnicode_AsEncodedString(res, NULL,
NULL) instead, which would encode with the default encoding rather than
backslash-escaping the result.
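
For illustration, the difference between the two conversions can be
reproduced from Python itself. This is only a sketch: it uses the
"unicode_escape" and "ascii" codecs as stand-ins for what
PyUnicode_AsUnicodeEscapeString and a default-encoding
PyUnicode_AsEncodedString would produce, it assumes ASCII is the default
encoding (as in the question), and the variable names are mine.

```python
text = u"foo\nbar"

# Escaping replaces the line feed with two characters: a backslash and "n".
escaped = text.encode("unicode_escape")

# Plain encoding (ASCII assumed as the default) keeps the line feed as-is.
encoded = text.encode("ascii")

print(repr(escaped))
print(repr(encoded))
```

The escaped form is one byte longer than the encoded one, which is the
literal backslash-n that ends up in the output.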

It might be better to ensure your __repr__ methods always return
real strings, for example by passing the value through str().
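
A minimal sketch of that suggestion, reusing the class C from the
question. The str() call is my reading of the advice, not something
checked against the interpreter source.

```python
class C:
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        # Coerce to the plain string type before returning, so repr()
        # never has to convert (and escape) a unicode object itself.
        return str(self.s)


o2 = C(u"foo\nbar")
shown = repr(o2)
print(shown)
```

Note that with an ASCII default encoding, str() raises UnicodeEncodeError
on Python 2 as soon as the text contains non-ASCII characters, which is
exactly the case where Tkinter hands back unicode. An explicit
.encode(...) with a chosen encoding would be needed there.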

Hmm, http://www.python.org/sf/400706 seems to indicate this has been a
point of discussion in the past.  You could dig, if you felt like it.

> I just stepped on this problem because of Tkinter returning plain
> strings if the text is only ASCII, but unicode strings otherwise. The
> text returned from Tkinter was wrapped in XML code by specific objects
> using __repr__ (which may not be a good idea indeed...).

Given that, as I understand it, XML documents are strictly Unicode,
indeed it may not...

> And it took me ages to figure out why on earth the text I stored in
> my file contained the string "\n" instead of actual line feeds...

I'm not surprised!

Cheers,
M.

-- 
  Good? Bad? Strap him into the IETF-approved witch-dunking
  apparatus immediately!                        -- NTK now, 21/07/2000