tuples, index method, Python's design

Paul Rubin http
Sat Apr 14 13:59:42 EDT 2007


"Rhamphoryncus" <rhamph at gmail.com> writes:
> > > >   i = s.index(e) => s[i] = e
> > > > Then this algorithm is no longer guaranteed to work with strings.
> > > It never worked correctly on unicode strings anyway (which becomes the
> > > canonical string in python 3.0).
> >
> > What?!   Are you sure?  That sounds broken to me.
> 
> Nope, it's pretty fundamental to working with text, unicode only being
> an extreme example: there's a wide number of ways to break down a
> chunk of text, making the odds of "e" being any particular one fairly
> low.  Python's unicode type only makes this slightly worse, not
> promising any particular one is available.

I don't understand this.  I thought that Unicode was a character
coding system like ASCII, except with an enormous character set,
combined with a bunch of different algorithms for encoding Unicode
strings as byte sequences.  I've always thought of those algorithms
(UTF-8 and so forth) as basically kludgy data compression schemes;
Unicode strings themselves are still just sequences of code points.
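
To make that concrete, here is a minimal sketch, assuming Python 3
(where str is indexed by code point).  The sample string and the
variable names are made up for illustration and are not from the
thread.  It checks the s[s.index(e)] == e property for a single code
point, shows where it stops being the whole story once e spans more
than one code point (a base letter plus a combining accent), and
shows that the encodings only change the byte representation:

```python
import unicodedata

# Precomposed form: the accented letter is one code point (U+00E9).
s = "caf\u00e9"
e = "\u00e9"
i = s.index(e)
found_single = s[i] == e          # holds when e is one code point

# Decomposed form (NFD): the same text is now "e" + U+0301.
d = unicodedata.normalize("NFD", s)
e2 = "e\u0301"                    # two code points, one visible character
j = d.index(e2)
first_only = d[j]                 # just the base letter "e"
whole_match = d[j:j + len(e2)]    # a slice is needed to get e2 back

# The precomposed code point does not occur in the decomposed string.
precomposed_in_d = e in d

# Encodings give different byte sequences for the same code points,
# and decoding returns the original string.
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")
utf32 = s.encode("utf-32-le")
round_trip = utf8.decode("utf-8") == s
```

So for a single code point the index property holds as I expected;
it only needs the slice form, s[i:i+len(e)] == e, when e is longer
than one code point, which is the same as for any substring.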



More information about the Python-list mailing list