[Tutor] Why difference between printing string & typing its object reference at the prompt?

boB Stepp robertvstepp at gmail.com
Thu Oct 11 03:23:08 CEST 2012


On Tue, Oct 9, 2012 at 4:29 AM, eryksun <eryksun at gmail.com> wrote:
<snip>
> Python 3 lets you use any Unicode letter as an identifier, including
> letter modifiers ("Lm") and number letters ("Nl"). For example:
>
>     >>> aꘌꘌb = True
>     >>> aꘌꘌb
>     True
>
>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
>     (1, 2, 3, 4, 5)

Is doing this considered good programming practice? I recall there was
a recent discussion about using the actual characters in formulas
instead of descriptive names, where this would make more sense to
people knowledgeable in the field using the formulas; however,
descriptive names might be better for those who don't have that
specialty knowledge. Is there a Python community consensus on how and
when it is appropriate (if ever) to use Unicode characters as
identifiers?
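
One technical point that bears on this question: per PEP 3131, Python 3 normalizes identifiers to NFKC form while parsing, so two names that look different in the source can end up being the same name. The sketch below checks a candidate name with str.isidentifier() and shows its NFKC form. The helper name identifier_report is made up for illustration and is not part of any library.

```python
import unicodedata

def identifier_report(name):
    """Return (is_valid_identifier, NFKC-normalized form) for a candidate name.

    Hypothetical helper for illustration only.
    """
    return name.isidentifier(), unicodedata.normalize("NFKC", name)

# "\u2160" is ROMAN NUMERAL ONE, the first identifier in the quoted example.
print(identifier_report("\u2160"))
print(identifier_report("x1"))
print(identifier_report("1x"))   # starts with a digit, so not an identifier
```

If the normalized form differs from what was typed, as it does for the Roman numeral, that is the name Python actually binds, which is one more thing to weigh when deciding whether such identifiers are worth using.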

> A potential gotcha in Unicode is the design choice to have both
> [C]omposed and [D]ecomposed forms of characters. For example:
>
>     >>> from unicodedata import name, normalize
>
>     >>> s1 = "ü"
>     >>> name(s1)
>     'LATIN SMALL LETTER U WITH DIAERESIS'
>
>     >>> s2 = normalize("NFD", s1)
>     >>> list(map(name, s2))
>     ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']
>
> These combine as one glyph when printed:
>
>     >>> print(s2)
>     ü
>
> Different forms of the 'same' character won't compare as equal unless
> you first normalize them to the same form:
>
>     >>> s1 == s2
>     False
>     >>> normalize("NFC", s1) == normalize("NFC", s2)
>     True

This looks like it could make alphabetical sorting much more complex.
I will have to give this some thought once I know more.
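
A minimal sketch of how normalization interacts with sorting, assuming it is enough to normalize every string to NFC before comparing. The helper name nfc_key and the sample words are invented for illustration. Note that this only makes the composed and decomposed spellings sort together; the order is still by code point, not a true alphabetical (locale-aware) collation.

```python
from unicodedata import normalize

def nfc_key(s):
    """Sort key that compares strings by their NFC-normalized form.

    Hypothetical helper for illustration only.
    """
    return normalize("NFC", s)

# "u\u0308ber" is the decomposed spelling, "\u00fcber" the composed one.
words = ["zu", "u\u0308ber", "\u00fcber", "apple"]

plain = sorted(words)                    # the two spellings end up apart
normalized = sorted(words, key=nfc_key)  # the two spellings end up adjacent

print(plain)
print(normalized)
```

With the plain sort, "zu" lands between the two spellings of the same word; with the NFC key they sit next to each other.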

>> I don't see byte strings mentioned in the index of my text. Are
>> these just the ASCII character set?

After seeing your explanation below, I was able to find the relevant
material in my book. It was under "bytes type" and "bytearray type".
For some reason these categories did not "click" in my head as what
Steve was addressing.

> A bytes object (and its mutable cousin bytearray) is a sequence of
> numbers, each in the range of a byte (0-255). bytes literals start
> with b, such as b'spam' and can only use ASCII characters, as does the
> repr of bytes. Slicing returns a new bytes object, but an index or
> iteration returns integer values:
>
>     >>> b'spam'[:3]
>     b'spa'
>     >>> b'spam'[0]
>     115
>     >>> list(b'spam')
>     [115, 112, 97, 109]
>
> bytes have string methods as a convenience, such as find, split, and
> partition. They also have the method decode(), which uses a specified
> encoding such as "utf-8" to create a string from an encoded bytes
> sequence.
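
A short sketch of the round trip described above, assuming UTF-8 as the encoding. The sample string is made up for illustration; it contains one non-ASCII character so that the difference between the number of characters and the number of bytes is visible.

```python
# "\u00e4" is LATIN SMALL LETTER A WITH DIAERESIS.
text = "sp\u00e4m"

# str.encode() produces a bytes object using the given encoding.
data = text.encode("utf-8")

print(data)                   # repr shows non-ASCII bytes as \x escapes
print(len(text), len(data))   # characters versus bytes
print(list(data))             # iterating bytes yields integers

# bytes.decode() reverses the process and returns a str.
back = data.decode("utf-8")
print(back == text)
```

The non-ASCII character takes two bytes in UTF-8, so the bytes object is one element longer than the string.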

What is the intended use of byte types?

Thanks! This continues to be quite informative and this thread is
greatly helping me to make better sense of the information that I am
self-studying.
-- 
Cheers!
boB

