[Python-ideas] Add list.join() please

Chris Barker chris.barker at noaa.gov
Thu Jan 31 12:51:20 EST 2019


On Wed, Jan 30, 2019 at 10:14 PM Chris Angelico <rosuav at gmail.com> wrote:

>
> I didn't, but I don't know if Chris Barker did.
>

nope -- not me either :-)


> (Can't swing a cat without hitting someone named Steve or Chris, in
> some spelling or another!)
>

good thing there aren't a lot of cats being swung around, then.

One more note about this whole thread:

I do a lot of numerical programming, and used to use MATLAB and now numpy a
lot. So I am very used to "vectorization" -- i.e. having operations that
work on a whole collection of items at once.

Example:

a_numpy_array * 5

multiplies every item in the array by 5

In pure Python, you would do something like:

[i * 5 for i in a_regular_list]

You can imagine that for more complex expressions the "vectorized" approach
can make for much clearer, easier-to-parse code. It is also much faster,
which is what is usually talked about, but I think the readability is the
bigger deal.

So what does this have to do with the topic at hand?

I know that when I'm used to working with numpy and then need to do some
string processing or some such, I find myself missing this "vectorization"
-- if I want to do the same operation on a whole bunch of strings, why do I
need to write a loop or comprehension or map? That is:

[s.lower() for s in a_list_of_strings]

rather than:

a_list_of_strings.lower()

(NOTE: I prefer comprehension syntax to map, but map would work fine here,
too)
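
For reference, here are the two spellings side by side. The sample list
contents are made up purely for illustration:

```python
a_list_of_strings = ["Spam", "EGGS", "Ham"]

# Comprehension: call the method on each element.
lowered_comp = [s.lower() for s in a_list_of_strings]

# map: pass the unbound str method; list() is needed because map is lazy.
lowered_map = list(map(str.lower, a_list_of_strings))
```

Both produce the same new list; neither changes a_list_of_strings.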

It strikes me that that is the direction some folks want to go.

If so, then I think the way to do it is not to add a bunch of stuff to
Python's str or sequence types, but rather to make a new library that
provides quick and easy manipulation of sequences of strings: a kind of
"stringpy", analogous to numpy.

At the core of numpy is the ndarray: "a multidimensional, homogeneous array
of fixed-size items".

A strarray could be simpler -- I don't see any reason for more than 1-D,
nor more than one datatype. But it could be a "vector" of strings that was
guaranteed to be all strings, and provide operations that acted on the
entire collection in one fell swoop.
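
To make that concrete, here is a rough pure-Python sketch of what such a
thing might look like. The StrArray name and everything about its behavior
are hypothetical; nothing like this exists in the standard library, and a
real design would need more thought (what to do with join() or split(), for
instance):

```python
class StrArray:
    """Hypothetical 1-D "vector" of strings; not an existing library."""

    def __init__(self, items):
        items = list(items)
        # Guarantee up front that every element is a str.
        for item in items:
            if not isinstance(item, str):
                raise TypeError(
                    f"StrArray items must be str, got {type(item).__name__}"
                )
        self._items = items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, index):
        result = self._items[index]
        # Slices stay StrArrays; a single index gives back a plain str.
        return StrArray(result) if isinstance(index, slice) else result

    def __repr__(self):
        return f"StrArray({self._items!r})"

    def __getattr__(self, name):
        # Only reached when normal lookup fails: forward any public str
        # method (lower, strip, replace, ...) to every element.
        if name.startswith("_") or not callable(getattr(str, name, None)):
            raise AttributeError(name)

        def vectorized(*args, **kwargs):
            results = [getattr(s, name)(*args, **kwargs) for s in self._items]
            # Stay a StrArray if every result is a str, else return a list.
            if all(isinstance(r, str) for r in results):
                return StrArray(results)
            return results

        return vectorized


names = StrArray(["Alice", "BOB", "carol"])
lowered = names.lower()               # StrArray(['alice', 'bob', 'carol'])
starts_a = names.startswith("A")      # [True, False, False]
```

Forwarding through __getattr__ keeps the sketch short, but it means every
str method gets applied element-wise, whether or not that makes sense for a
collection. A real library would probably pick its methods deliberately.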

If it turned out to be useful, you could even make a version in C or Cython
that might give significant performance benefits.

I don't have a use case for this -- but if someone does, it's an idea.

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov