Unicode and Python - how often do you index strings?

Tue Jun 3 22:37:17 EDT 2014

On 2014-06-04 12:16, Chris Angelico wrote:
> On Wed, Jun 4, 2014 at 11:11 AM, Tim Chase
> <python.list at tim.thechases.com> wrote:
> > I then take row 2 and use it to make a mapping of header-name to a
> > slice-object for slicing the subsequent strings:
> >
> >       slice(i.start(), i.end())
> >
> >     print("EmpID = %s" % row[header_map["EMPID"]].strip())
> >     print("Name = %s" % row[header_map["NAME"]].strip())
> >
> > which I presume uses string indexing under the hood.
> 
> Yes, it's definitely going to be indexing. If strings were
> represented internally in UTF-8, each of those calls would need to
> scan from the beginning of the string, counting and discarding
> characters until it finds the place to start, then counting and
> retaining characters until it finds the place to stop. Definite
> example of what I'm looking for, thanks!
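
To make the cost described above concrete, here is a minimal sketch of slicing a UTF-8 byte string by code point position. The function name utf8_slice is hypothetical, it handles only non-negative indices, and it is not how CPython stores or indexes str objects; it only illustrates why a UTF-8 representation would force a scan from the start of the string.

```python
def utf8_slice(data: bytes, start: int, stop: int) -> bytes:
    """Return code points [start, stop) of UTF-8 data via a linear scan.

    Hypothetical helper for illustration; assumes valid UTF-8 and
    non-negative indices.
    """
    byte_start = byte_stop = None
    count = 0  # number of code points seen so far
    for offset, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes; any other
        # byte marks the beginning of a new code point.
        if byte & 0xC0 != 0x80:
            if count == start:
                byte_start = offset  # counting and discarding ends here
            if count == stop:
                byte_stop = offset   # counting and retaining ends here
                break
            count += 1
    # Indices past the end of the data clamp to its length, like str slicing.
    if byte_start is None:
        byte_start = len(data)
    if byte_stop is None:
        byte_stop = len(data)
    return data[byte_start:byte_stop]


encoded = "naïve café".encode("utf-8")
middle = utf8_slice(encoded, 2, 5).decode("utf-8")
tail = utf8_slice(encoded, 6, 10).decode("utf-8")
```

Every call walks the bytes from offset 0, so the work grows with the position of the slice rather than with its length.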

For what it's worth, most lines in each file run under about 2k, so
even O(N) or O(log N) indexing wouldn't be grievous.  Noticeable, but
not grievous.
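
For anyone following along, a minimal sketch of the header-map approach quoted above. The sample data and the assumption that row 2 is a ruler of dashes marking the column widths are made up for illustration; the real files are not shown in this thread.

```python
import re

# Hypothetical fixed-width report; the actual file layout is an assumption.
lines = [
    "EMPID  NAME          ",
    "-----  --------------",
    "00042  Alice Example ",
    "00107  Bob Sample    ",
]
header_row, ruler_row, *data_rows = lines

# Map each header name to a slice object covering its column, using the
# extent of each run of dashes in row 2.
header_map = {}
for i in re.finditer(r"-+", ruler_row):
    column = slice(i.start(), i.end())
    header_map[header_row[column].strip()] = column

# Slice every data row with the same slice objects.
records = [
    {name: row[column].strip() for name, column in header_map.items()}
    for row in data_rows
]

for record in records:
    print("EmpID = %s" % record["EMPID"])
    print("Name = %s" % record["NAME"])
```

Each row[column] lookup is an ordinary string slice, so with a couple of columns per row the indexing cost shows up once per field per line.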

Glad my example could give you some fodder.

-tkc