[SciPy-User] Need clarification on paper "The NumPy array: a structure for efficient numerical computation": what does "vectorize" really mean?

Robert Kern robert.kern at gmail.com
Sat Jan 31 12:44:37 EST 2015


On Sat, Jan 31, 2015 at 4:49 PM, Brian Merchant <bhmerchant at gmail.com>
wrote:
>
> In "The NumPy array: a structure for efficient numerical computation"
> (2011, http://arxiv.org/abs/1102.1523), the authors use the verb/adjective
> "vectorize" in such a way that I need clarification.
>
> On page 3, in subsection "Numerical operations on arrays: vectorization",
> I get the (perhaps incorrect?) impression that "vectorization" refers to
> "an operation that can be run using C for loops over C arrays". Already,
> the vocabulary seems a little weird to me, since if "vectorization" really
> just means "for loops in a low level language rather than a high level
> language"...why create a word for the concept based on the root word
> "vector"? Perhaps there is some history there that I am missing, but I can
> accept that definition. The answer to the following StackOverflow seems to
> suggest that "vectorization" means "implemented in a lower level language":
> http://stackoverflow.com/questions/17483042/explain-the-speed-difference-between-numpys-vectorized-function-application-vs

Rather, it means that the loop over the array is implicit in the syntax of
the language rather than explicit; that is, the language deals with arrays
("vectors") as first-class objects with their own mathematical operations
rather than just containers of numbers that one must operate on
independently. For example, Fortran 90 has vectorized operations, but it is
not calling down to a lower level language to do it. It's just part of the
language. E.g.

  http://www.cs.uwm.edu/~cs151/Bacon/Lecture/HTML/ch11s12.html
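
To make the distinction concrete in Python terms, here is a minimal sketch, assuming only NumPy (the array names and values are arbitrary, chosen for illustration). It computes the same elementwise result twice: once with an explicit loop written by the programmer, and once with NumPy's array-level syntax, where the loop is implicit in the expression itself.

```python
import numpy as np

# Arbitrary example inputs, chosen only for illustration.
B = np.arange(5.0)      # [0, 1, 2, 3, 4]
C = np.full(5, 2.0)     # [2, 2, 2, 2, 2]
D = np.ones(5)          # [1, 1, 1, 1, 1]

# Explicit loop: the program iterates over the elements itself.
A_loop = np.empty_like(B)
for i in range(len(B)):
    A_loop[i] = B[i] * C[i] + D[i]

# "Vectorized": the arrays are operated on as whole objects and the
# loop over elements is implicit in the syntax.
A_vec = B * C + D

assert np.array_equal(A_loop, A_vec)
print(A_vec)  # [1. 3. 5. 7. 9.]
```

The point is the notation, not the speed: the second form treats the arrays as first-class values with their own arithmetic, which is what "vectorized" refers to here.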

The use of the term "vector" for this does have a long history in computing
to refer to things like this:

  http://en.wikipedia.org/wiki/Vector_processor#Early_work

> However, on page 5, I see the following sentence: "In a non-vectorized
> language, no temporary arrays need to be allocated when the output values
> are calculated in a nested for-loop, e.g. (in C)". Hold on -- how does
> "vectorized" make sense here? Should I interpret it as "in a low level
> language, no temporary arrays need to be allocated when the output values
> are calculated..."? If yes, well...why would temporary arrays need to be
> allocated in a higher level language?

This isn't true of all languages with vector operations (cf. Fortran 90),
just of some high-level languages. In Python, the temporaries come in
because each operation is evaluated independently of the other operations
in the expression, and the loops happen inside each operation. In numpy,

  A = B * C + D

(B*C) must be calculated first into a temporary before D can be added to
it. Each of those is a separate loop. Compare this to the typical
implementation in C, which would allocate the space for A and do a single
loop:

  for (i=0; i<length; i++) {
    A[i] = B[i] * C[i] + D[i];
  }
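
The NumPy evaluation order can be spelled out with the ufuncs behind the operators. This is a sketch assuming only NumPy: np.multiply and np.add are the ufuncs that * and + dispatch to for ndarrays, and their out= argument lets the caller supply the output buffer. Writing into a preallocated array avoids allocating the separate temporary, but it is still two passes over the data, unlike the single fused loop in the C version.

```python
import numpy as np

B = np.arange(5.0)
C = np.full(5, 2.0)
D = np.ones(5)

# What "A = B * C + D" amounts to: two separate ufunc calls, two loops,
# with the intermediate product held in a freshly allocated temporary.
tmp = np.multiply(B, C)   # loop 1: temporary array holding B*C
A = np.add(tmp, D)        # loop 2: new array holding the final result

# Same result using one preallocated output buffer instead of a temporary.
# Still two loops over the data; only the extra allocation is avoided.
A2 = np.empty_like(B)
np.multiply(B, C, out=A2)  # loop 1 writes straight into A2
np.add(A2, D, out=A2)      # loop 2 updates A2 in place

assert np.array_equal(A, B * C + D)
assert np.array_equal(A2, A)
```

Fusing both operations into one pass, as the C loop does, needs something that sees the whole expression at once, which plain operator dispatch does not provide.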

Since Python does not generate machine code, it does not analyze
expressions to see whether they can be optimized like this. It just
calls the function corresponding to the * operator and then the function
corresponding to the + operator.
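
A small tracing sketch makes this dispatch visible. The Traced class below is hypothetical, written only for illustration: it wraps a NumPy array, records each operator call, and forwards to NumPy. Evaluating B * C + D with it shows Python making two independent calls in order, each seeing only its own two operands, with the result of the first held as a full intermediate object.

```python
import numpy as np

calls = []  # records operator invocations in the order Python makes them


class Traced:
    """Hypothetical wrapper that logs each arithmetic operator call."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __mul__(self, other):
        calls.append("*")
        # Only the two operands are visible here; a new object is returned.
        return Traced(self.data * other.data)

    def __add__(self, other):
        calls.append("+")
        return Traced(self.data + other.data)


B = Traced(np.arange(5.0))
C = Traced(np.full(5, 2.0))
D = Traced(np.ones(5))

A = B * C + D      # (B * C) is built first, then + D is applied to it
print(calls)       # ['*', '+']
```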

--
Robert Kern

