python for everyday tasks

Mon Nov 25 11:17:34 EST 2013

On Monday, 25 November 2013 16:11:22 UTC+1, Michael Torrie wrote:
> I only respond here, as unicode in general is an important concept that
> the OP will want to make sure his students understand in Python, and I don't
> want you to dishonestly sow the seeds of uncertainty and doubt.
>
> On 11/25/2013 03:12 AM, wxjmfauth at gmail.com wrote:
> > Your paragraph is mixing different concepts.
>
> On the contrary, it appears you are the one mixing the concepts, and
> confusing a byte-encoding scheme with unicode.
>
> In an ideal world, the programmer should not need to know or care about
> what encoding scheme the language is using internally to store strings.
> And it does not matter whether the internal encoding scheme is endorsed
> by the unicode commission or not, provided it can handle all the valid
> unicode constructs.
>
> A string is unicode.  Period.  Hence you must concern yourself with
> encoding only when reading or writing a byte stream.
>
> Inside the language itself, the encoding is irrelevant.  Ideally.  In
> python 3.3+ anyway.  Of course reality is different in other languages,
> which is why programmers are used to worrying about things like exposed
> surrogate pairs (as Javascript does), or having to tweak their algorithms
> to deal with the fact that UTF-8 indexing is not O(1).  To claim that a
> programmer has to concern himself with internal language encoding in
> Python 3 is not only untrue, it's disingenuous at best, given the OP's
> mission.
>
> > When it comes to saving memory, utf-8 is the choice. It
> > largely beats the FSR on the side of memory and on
> > the side of performance.
>
> So you would condemn everyone to use an O(n) encoding for a string when
> FSR offers full unicode compliance that optimizes both speed and memory?
>
> No, D'Aprano is correct.  Python 3.3+ indeed does unicode right.  It
> offers O(1) slicing, is memory efficient, and never exposes things like
> surrogate pairs.
>
> > How and why? I suggest you have a deeper understanding
> > of unicode.
>
> Indeed I'd say D'Aprano does have a deeper understanding of unicode.
>
> > May I remind you, it is one of the coding schemes endorsed
> > by "Unicode.org" and it is intensively used. This is not
> > by chance.
>
> Yes, you keep saying this.  Have you encountered a real-world situation
> where you are impacted by Python's FSR? You keep posting silly
> benchmarks that prove nothing, and continue arguing, yet presumably you
> are still using Python.  Why haven't you switched to Google Go or
> another language that implements unicode strings in UTF-8?

------

Everybody has the right to an opinion. Please understand
that I respect Steven's opinion.

---

I'm aware of the utf-8 indexing "effect" (it is in fact the
answer I expected); that's why I proposed diving a little
deeper into "unicode".
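
For readers following along, here is a minimal sketch of that indexing
"effect". The helper utf8_index is hypothetical (it is not part of Python
or of any library); it only illustrates that finding the n-th code point
in raw UTF-8 bytes means scanning from the start, whereas a Python 3 str
can be indexed directly.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start.

    Hypothetical helper for illustration only; assumes `data` is valid UTF-8.
    """
    if n < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return data[start:].decode("utf-8")


# 'a', EURO SIGN, MUSICAL SYMBOL G CLEF, 'z': 1, 3, 4 and 1 bytes in UTF-8.
text = "a\u20ac\U0001D11Ez"
encoded = text.encode("utf-8")

# The byte scan and direct str indexing agree on the result;
# only the amount of work needed to get there differs.
print(utf8_index(encoded, 2) == text[2])
print(len(text), "code points,", len(encoded), "bytes")
```

The scan walks the bytes once, so its cost grows with the index, which is
the O(n) behaviour mentioned in the quoted message.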

Now something else.
I practically no longer program, in the sense of creating
applications, but am mainly interested in unicode. I have "toyed" with
many tools: C#, Go, ruby2 and my favorite, the TeX unicode engines.
I just happen to have a lot of experience with Python, and I find
this FSR fascinating.
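
For anyone who wants to poke at the FSR themselves, here is a rough
sketch, assuming CPython 3.3 or later (other implementations may store
strings differently). The helper per_char_bytes is hypothetical; it
estimates the cost of one extra character by comparing sys.getsizeof for
two strings built from the same character.

```python
import sys


def per_char_bytes(ch: str, n: int = 1000) -> int:
    """Estimate the storage cost of one more `ch` in a string made only of `ch`.

    Hypothetical helper; the numbers are CPython implementation details.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


samples = [
    ("ASCII", "a"),
    ("BMP (EURO SIGN)", "\u20ac"),
    ("astral (G CLEF)", "\U0001D11E"),
]
for label, ch in samples:
    print(label, per_char_bytes(ch), "byte(s) per character")

# An astral character is still a single item of the string:
# no surrogate pair is exposed to the programmer.
print(len("\U0001D11E"))
```

On CPython 3.3+ I would expect the three widths to come out as 1, 2 and 4
bytes, the width being chosen from the widest character in the string.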

jmf