tuples, index method, Python's design

Rhamphoryncus rhamph at gmail.com
Sat Apr 14 13:46:15 EDT 2007


On Apr 13, 11:05 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> "Rhamphoryncus" <rha... at gmail.com> writes:
> > >   i = s.index(e) => s[i] = e
> > > Then this algorithm is no longer guaranteed to work with strings.
> > It never worked correctly on unicode strings anyway (which becomes the
> > canonical string in python 3.0).
>
> What?!   Are you sure?  That sounds broken to me.

Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there's a wide number of ways to break down a
chunk of text, making the odds of "e" being any particular one fairly
low.  Python's unicode type only makes this slightly worse, not
promising any particular one is available.

For example, if you had an algorithm designed for ascii that gathered
statistics on how common each "character" is, you'd want to redesign
it to use either grapheme clusters or scalar values, then improve it
to merge duplicate characters.  You'd need to roll your own iterator
though, Python doesn't provide a method that's specifically grapheme
clusters or scalar values (and if I'm wrong I'd love to hear it!).

--
Adam Olsen, aka Rhamphoryncus




More information about the Python-list mailing list