Py3.3 unicode literal and input()

Tue Jun 19 19:21:23 EDT 2012

On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:

> On 18 June, 12:11, Steven D'Aprano <steve
> +comp.lang.pyt... at pearwood.info> wrote:
>> On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
>> > On 18 June, 10:28, Benjamin Kaplan <benjamin.kap... at case.edu> wrote:
>> >> The u prefix is only there to
>> >> make it easier to port a codebase from Python 2 to Python 3. It
>> >> doesn't actually do anything.
>>
>> > It does. I shew it!
>>
>> Incorrect. You are assuming that Python 3 input eval's the input like
>> Python 2 does. That is wrong. All you show is that the one-character
>> string "a" is not equal to the four-character string "u'a'", which is
>> hardly a surprise. You wouldn't expect the string "3" to equal the
>> string "int('3')" would you?
>>
>> --
>> Steven
> 
> 
> A string is a string, a "piece of text", period.
> 
> I do not see why a unicode literal and (well, I do not know what to
> call it) a "normal class <str>" should behave differently in source code
> or as an answer to an input().

They do not. As you showed earlier, in Python 3.3 the literal strings 
u'a' and 'a' have the same meaning: both create a one-character string 
containing the Unicode character LATIN SMALL LETTER A (U+0061).
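
A quick check of that claim, as a minimal sketch assuming Python 3.3 or 
later (the variable names are only for illustration):

```python
# In Python 3.3+ the u prefix is accepted but changes nothing.
plain = 'a'
prefixed = u'a'

print(plain == prefixed)              # True: same contents
print(type(plain) is type(prefixed))  # True: both are str
print(len(prefixed))                  # 1: the u and the quotes are not in the string
```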

Note carefully that the quotation marks are not part of the string. They 
are delimiters. Python 3.3 allows you to create a string by using 
delimiters:

' '
" "
u' '
u" "

plus triple-quoted versions of the same. The delimiters are not part of 
the string. They are only there to mark the start and end of the string in 
source code so that Python can tell the difference between the string "a" 
and the variable named "a".

Note carefully that quotation marks can exist inside strings:

my_string = "This string has 'quotation marks'."

The " characters at the start and end of the string literal are 
delimiters, not part of the string, but the internal ' characters *are* 
part of the string.
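
To see the difference between delimiters and contents, here is a small 
sketch that inspects the string from the example above:

```python
my_string = "This string has 'quotation marks'."

# The string starts with T and ends with a full stop, not with a quote.
print(my_string[0], my_string[-1])   # T .

# The internal single quotes are part of the string...
print(my_string.count("'"))          # 2

# ...but the double-quote delimiters are not.
print('"' in my_string)              # False
```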

When you read data from a file, or from the keyboard using input(), 
Python takes the data and returns a string. You don't need to enter 
delimiters, because there is no confusion between a string (all data you 
read) and other programming tokens.

For example:

py> s = input("Enter a string: ")
Enter a string: 42
py> print(s, type(s))
42 <class 'str'>

Because what I type is automatically a string, I don't need to enclose it 
in quotation marks to distinguish it from the integer 42.

py> s = input("Enter a string: ")
Enter a string: This string has 'quotation marks'.
py> print(s, type(s))
This string has 'quotation marks'. <class 'str'>

What you type is exactly what you get, no more, no less.

If you type 42, you get the two-character string "42" and not the int 42.

If you type [1, 2, 3], then you get the nine-character string "[1, 2, 3]" 
and not a list containing the integers 1, 2 and 3.

If you type 3**0.5 then you get the six-character string "3**0.5" and not 
the float 1.7320508075688772.

If you type u'a' then you get the four-character string "u'a'" and not 
the single character 'a'.
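
These cases can be checked without typing them by hand. The sketch below 
temporarily swaps sys.stdin for an in-memory stream so that input() reads 
prepared text; simulate_input is a hypothetical helper written for this 
illustration, not part of the standard library.

```python
import io
import sys

def simulate_input(typed):
    """Return what input() gives back when the user types `typed` plus Enter."""
    saved = sys.stdin
    sys.stdin = io.StringIO(typed + "\n")
    try:
        return input()
    finally:
        sys.stdin = saved  # always restore the real stdin

for typed in ["42", "[1, 2, 3]", "3**0.5", "u'a'"]:
    s = simulate_input(typed)
    # Every result is a str whose length matches what was typed.
    print(repr(s), len(s), type(s).__name__)
```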

There is nothing new going on here. input() in Python 3 behaves the same 
way as raw_input() in Python 2: it returns what was typed as a string, 
without evaluating it.

> Should a user write two derived functions?
> 
> input_for_entering_text()
> and
> input_if_you_are_entering_a_text_as_litteral()

If you, the programmer, want to force the user to write input in Python 
syntax, then yes, you have to write a function to do so. input() is very 
simple: it just reads strings exactly as typed. It is up to you to 
process those strings however you wish.
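
As one possible way to do that processing, the sketch below uses 
ast.literal_eval from the standard library, which accepts only Python 
literals and does not run arbitrary code. The names parse_literal and 
input_literal are hypothetical, chosen just for this example.

```python
import ast

def parse_literal(text):
    """Interpret text as a Python literal; raise ValueError if it is not one."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise ValueError("not a Python literal: %r" % (text,))

def input_literal(prompt=""):
    """Read a line with input() and require it to be written in Python syntax."""
    return parse_literal(input(prompt))

print(parse_literal("u'a'") == 'a')      # True: the delimiters are now interpreted
print(parse_literal("42") + 1)           # 43: an int, not a string
print(parse_literal("[1, 2, 3]")[0])     # 1: a real list

try:
    parse_literal("3**0.5")              # an expression, not a literal
except ValueError as err:
    print("rejected:", err)
```

Whether to reject expressions such as 3**0.5, or to accept them by some 
other means, is a design decision for the programmer; input() itself 
takes no position on it.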

-- 
Steven