Unicode and Python - how often do you index strings?

Tue Jun 3 21:18:12 EDT 2014

In article <mailman.10656.1401842403.18130.python-list at python.org>,
 Chris Angelico <rosuav at gmail.com> wrote:

> A current discussion regarding Python's Unicode support centres (or
> centers, depending on how close you are to the cent[er]{2} of the
> universe)

<sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you?  The 
pattern you wrote also matches centee and centrr.</sarcasm>

> around one critical question: Is string indexing common?

Not in our code.  I've got 80008 non-blank lines of Python (2.7) source 
handy.  I tried a few heuristics to find patterns which might be string 
indexing.

$ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'

and then looked them over manually.  I see this pattern a bunch of times 
(in a single-use script):

data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]  

We do this once:

if tz_offset[0] == '-':

We do this somewhere in some command-line parsing:

process_match = args.process[:15]

There's this little gem:

return [dedup(x[1:-1].lower()) for x in 
re.findall('(\[[^\]\[]+\]|\([^\)\(]+\))',title)]

It appears I wrote this one, but I don't remember exactly what I had in 
mind at the time...

withhyphen = number if '-' in number else (number[:-2] + '-' + 
number[-2:]) # big assumption here

Anyway, there's a bunch more, but the bottom line is that in our code, 
indexing into a string (at least explicitly in application source code) 
is a pretty rare thing.