[Python-ideas] Vectorization [was Re: Add list.join() please]

Thu Feb 7 19:26:31 EST 2019

On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano <steve at pearwood.info> wrote:

> I'm sorry, I did not see your comment that you thought new syntax was a
> bad idea. If I had, I would have responded directly to that.
>

Well... I don't think it's the worst idea ever.  But in general adding more
operators is something I am generally wary about.  Plus there's the "grit
on Uncle Timmy's screen" test.

Actually, if I wanted an operator, I think that @ is more intuitive than
extra dots.  Vectorization isn't matrix multiplication, but they are sort
of in the same ballpark, so the iconography is not ruined.

> We can perform thought-experiments and
> we don't need anything but a text editor for that. As far as I'm
> concerned, the thought experiment of comparing these two snippets:
>
>     ((seq .* 2)..name)..upper()
>
> versus
>
>     map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2,
> seq)))
>

OK... now compare:

    (Vec(seq) * 2).name.upper()

Or:

    vec_seq = Vector(seq)
    (vec_seq * 2).name.upper()
    # ... bunch more stuff
    seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better*
than wrapping (and optionally unwrapping) to me.

If we were to take @ as "vectorize", it might be:

    (seq @* 2) @.name @.upper()

I don't hate that.

demonstrates conclusively that even with the ugly double dot syntax,
> infix syntax easily and conclusively beats map.
>

Agreed.

> If I recall correctly, the three maps here were originally proposed by
> you as examples of why map() alone was sufficient and there was no
> benefit to the Julia syntax.

Well... your maps are kinda deliberately ugly.  Even in that direction, I'd
write:

    map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote.  A
comprehension probably even better:

    [(s*2).name.upper() for s in seq]

Again, I apologise, I did not see where you said that this was intended
> as a proof-of-concept to experiment with the concept.
>

All happy.  Puppies and flowers.

> If the Vector class is only a proof of concept, then we surely don't
> need to care about moving things in and out of "vector mode". We can
> take it as a given that "the real thing" will work that way: the syntax
> will be duck-typed and work with any iterable,

Well... I at least moderately think that a wrapper class is BETTER than new
syntax. So I'd like the proof-of-concept to be at least moderately
functional.  In any case, there is ZERO code needed to move in/out of
"vector mode." The wrapped thing is simply an attribute of the object.
When we call vectorized methods, it's just `getattr(type(item), attr)` to
figure out the method in a duck-typed way.

one of the things which lead me to believe that you thought that a
> wrapper class was in and of itself a solution to the problem. If you had
> been proposing this Vector class as a viable working solution (or at
> least a first alpha version towards a viable solution) then worrying
> about round-tripping would be important.
>

Yes, I consider the Vector class a first alpha version of a viable
solution.  I haven't seen anything that makes me prefer new syntax.  I feel
like a wrapper makes it more clear that we are "living in vector land" for
a while.

The same is true for NumPy, in my mind.  Maybe it's just familiarity, but I
LIKE the fact that I know that when my object is an ndarray, operations are
going to be vectorized ones.  Maybe 15 years ago different decisions could
have been made, and some "vectorize this operation syntax" could have made
the ndarray structure just a behavior of lists instead.  But I think the
separation is nice.

> But as a proof-of-concept of the functionality, then:
>
>     set( Vector(set_of_stuff) + spam )
>     list( Vector(list_of_stuff) + spam )
>

That's fine.  But there's no harm in the class *remembering* what it wraps
either.  We might want to distinguish:

    set(Vector(some_collection) + spam)             # Make it a set after
the operations
    (Vector(some_collection) + spam).unwrap()  # Recover whatever type it
was before

> Why do you care about type uniformity or type-checking the contents of
> the iterable?
>

Because some people have said "I want my vector to be specifically a
*sequence of strings* not of other stuff"

And MAYBE there is some optimization to be had if we know we'll never have
a non-footype in the sequence (after all, NumPy is hella optimized).
That's why the `stringpy` name that someone suggested.  Maybe we'd bypass
most of the Python-land calls when we did the vectorized operations, but
ONLY if we assume type uniformity.

But yes, I generally care about duck-typing only.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20190207/95964e64/attachment.html>