C is it always faster than nump?

Avi Gross avigross at verizon.net
Fri Feb 25 23:37:34 EST 2022


Yes, Chris, C is real as a somewhat abstract concept. There is a whole slew of variations, since each new release of the language brings changes, and at various times people have built actual compilers that implement varying subsets of what is possible, and not necessarily in quite the same way.

As you gathered, I am saying that comparing languages is less effective than comparing implementations or, better still, specific programs on specific data. And yet, you can still get odd results if you cherry-pick what to test. Consider a sorting algorithm that rapidly checks if the data is already sorted, and if so, does not bother sorting it. It will quite possibly be the fastest one in a comparison if the data is chosen to be already in order! But on many other sets of data it will have wasted some time checking if it is in order while other algorithms have started sorting it!
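
Here is a rough sketch of that cherry-picking effect, not a serious benchmark. The names merge_sort, is_sorted, checking_sort and best_time are made up for illustration. I use a hand-written merge sort as the baseline on purpose, because CPython's built-in sort already takes advantage of input that is in order, which would hide the effect.

```python
import random
import timeit


def merge_sort(data):
    """Plain top-down merge sort; returns a new sorted list."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    left, right = merge_sort(data[:mid]), merge_sort(data[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def is_sorted(data):
    """True if data is already in non-decreasing order."""
    return all(a <= b for a, b in zip(data, data[1:]))


def checking_sort(data):
    """Hypothetical variant: skip the sort when the input is already ordered."""
    if is_sorted(data):
        return list(data)
    return merge_sort(data)


def best_time(func, data, repeat=3):
    """Best of a few single runs of func(data), in seconds."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


if __name__ == "__main__":
    ordered = list(range(20_000))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)
    for label, data in (("ordered", ordered), ("shuffled", shuffled)):
        print(f"{label:9s} merge_sort={best_time(merge_sort, data):.4f}s "
              f"checking_sort={best_time(checking_sort, data):.4f}s")
```

Which one "wins" depends entirely on which of the two inputs you decide to report, and the actual numbers will vary by machine and Python version.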

Bad example, maybe, but there are better ones. Consider an algorithm that does no checking for any of the many errors that can happen. It does not see if the arguments it gets are within expected ranges of types or values. It does not intercept attempts to divide by zero and much more. Another algorithm is quite bulletproof and thus has lots more code and maybe runs much slower. Is it shocking if it tests slower? But the other code may end up failing faster in the field and need a rewrite.
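
A small sketch of the two styles, with invented names (unchecked_average and checked_average). The first does the bare arithmetic, the second pays for validation on every call. Treat it as an illustration of the trade-off, not as a recommendation of how much checking is right.

```python
from numbers import Real


def unchecked_average(values):
    """No validation: an empty list fails with ZeroDivisionError."""
    return sum(values) / len(values)


def checked_average(values):
    """Same arithmetic, but arguments are validated first."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple")
    if not values:
        raise ValueError("values must not be empty")
    for v in values:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, Real):
            raise TypeError(f"not a real number: {v!r}")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(unchecked_average([1, 2, 3]), checked_average([1, 2, 3]))
```

On good data both return the same answer and the unchecked one does less work per call. On bad data only the checked one tells you clearly what went wrong.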

A really fair comparison is often really hard. Languages are abstract and sometimes a new implementation makes a huge change.

Take interpreted languages including Python and R that specify all kinds of functions that may be written within the language at first. Someone may implement a function like sum() (just an example) that defines the sum of a long list of items as the first item added to the sum of the slightly shorter list of remaining items. It stops when the final recursive sum is about to be called with no remaining arguments. Clearly this implementation may be a tad slow. But does Python require this version of sum() or will it allow any version that can be called the same way and returns the same results every time? Does it even matter if the function is written in C or C++ or FORTRAN or even assembler of some kind, as long as it is placed in an accessible library and there is some interface that allows you to make the call in Python notation and it is fed to the function in the way it requires, and similarly deals with returned values? A wrapper, sort of.
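
To make the sum() example concrete, here is a sketch of such a recursive version next to the built-in. The name recursive_sum is mine; the built-in sum() in CPython is implemented in C. Both are called the same way and return the same result for a list of integers, which is the only thing the caller sees.

```python
import timeit


def recursive_sum(items):
    """First item plus the sum of the remaining items, as described above."""
    if not items:
        return 0
    if len(items) == 1:
        return items[0]
    return items[0] + recursive_sum(items[1:])


if __name__ == "__main__":
    # Kept short so the recursion stays under CPython's default recursion limit.
    data = list(range(500))
    assert recursive_sum(data) == sum(data)
    for name, func in (("recursive_sum", recursive_sum), ("built-in sum", sum)):
        seconds = timeit.timeit(lambda: func(data), number=1000)
        print(f"{name:14s} {seconds:.4f}s for 1000 calls")
```

Note the recursive version also copies the rest of the list on every call and falls over on long lists, so it is slow for more than one reason.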

The use of such a shortcut is not against the spirit of the language. You can still specify that you want the sum() function from some module, or write your own. This is true most places. I remember how, way back when, early UNIX shells did silly things like call /bin/echo to do trivial things, or call an external program to do something as trivial as i=i+1; then they started building in such functionality and your shell scripts suddenly sped up a great deal. A non-programmer I once worked for wrote some truly humongous shell scripts that, run remotely during the day-time on machines in places like Japan, brought those machines to their knees. Collecting billing data from all over by running a pipeline with 9 processes per line/row was a bit much.

At first I sped it up quite a bit by using newer built-in features like I described, or doing more with fewer elements in pipelines. But I saw how much of the slowness was caused by using the wrong tools for the job, when there were programs designed to analyze data in various ways.

I replaced almost all of it with an AWK script that sped things up by many orders of magnitude. And, yes, AWK was not as fast as C but more trivial to program in for this need as it had so many needed aspects built-in or happening automagically.

Would we do the entire project differently today? Definitely. All the billing records would not be sitting in an assortment of flat files all over the place but rather be fed into some database that made retrieval of all kinds of reports straightforward without needing to write much code at all.

How many modules or "packages" were once written largely in the language itself and then gradually "improved" by replacing parts, especially the slower parts, with external content as we have been discussing? In a sense, some Python applications written for older versions of Python may now be running faster, because newer versions have improved some of the "same" code, while the user still sees them running on the same language, Python.

-----Original Message-----
From: Chris Angelico <rosuav at gmail.com>
To: python-list at python.org <python-list at python.org>
Sent: Fri, Feb 25, 2022 2:58 pm
Subject: Re: C is it always faster than nump?


On Sat, 26 Feb 2022 at 06:44, Avi Gross via Python-list
<python-list at python.org> wrote:
>
> I agree with Richard.
>
> Some people may be confused and think c is the speed of light and relativistically speaking, nothing can be faster. (OK, just joking. The uses of the same letter of the alphabet are not at all related. One is named for the language that came after the one named B, while the other may be short for celeritas meaning speed.)
>
> There is no such thing as C. C does nothing. It is a combination of a language specification and some pieces of software called compilers that implement it well or less well.
>

Uhh, that's taking it a little bit TOO far.... I agree with your
point, but saying that there's no such thing as C is slightly unfair
:)

> There is such a thing as a PROGRAM. A program completely written in C is a thing. It can run fast or slow based on a combination of how it was written and on what data it operates on, which hardware and OS and so on. AND some of it may likely be running code from libraries written in other languages like FORTRAN that get linked into it in some way at compile time or runtime, and hooks into the local OS and so on.
>
> So your program, written supposedly in pure C, may run faster or slower. If you program a "sort" algorithm in C, it may matter if it is an implementation of a merge sort or a bubble sort or ...
>

More specifically: You're benchmarking a particular *implementation*
of a particular *algorithm*. Depending on what you're trying to
demonstrate, either could be significant.

Performance testing between two things written in C is a huge job.
Performance testing across languages has a strong tendency to be
meaningless (like benchmarking Python's integers against JavaScript's
numbers).

> As noted, numpy is largely written in C. It may well be optimized in some places but there are constraints that may well make it hard to optimize compared to some other implementation without those constraints. In particular, it interfaces with standard Python data structures at times such as when initializing from a Python List, or List of Lists, or needing to hold on to various attributes so it can be converted back, or things I am not even aware of.
>

(Fortran)

In theory, summing a Numpy array should be incredibly fast, but in
practice, there's a lot of variation, and it can be quite surprising.
For instance, integers are faster than floats, everyone knows that.
And it's definitely faster to sum smaller integers than larger ones.

rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.float64)' 'numpy.sum(x)'
1000 loops, best of 5: 325 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.int64)' 'numpy.sum(x)'
500 loops, best of 5: 551 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'import numpy; x = numpy.array(range(1000000), dtype=numpy.int32)' 'numpy.sum(x)'
500 loops, best of 5: 680 usec per loop

... Or not.

Summing arrays isn't necessarily the best test of numpy anyway, but as
you can see, testing is an incredibly difficult thing to get right.
The easiest thing to prove is that you have no idea how to prove
anything usefully, and most of us achieve that every time :)

ChrisA



> So, I suspect it may well be possible to make a pure C library similar to numpy in many ways but that can only be used within a C program that only uses native C data structures. It also is possible to write such a program that is horribly slow. And it is possible to write a less complex version of numpy that does not support some current numpy functionality and overall runs much faster on what it does support.
>
> I do wonder at the reason numpy and pandas and lots of other modules have to exist. Other languages like R made design choices that built in ideas of vectorization from the start. Python has lots of object-oriented extensibility that can allow you to create interpreted code that may easily extend it in areas to have some similar features. You can create an array-like data structure that holds only one object type and is extended so adding two together (or multiplying) ends up doing it componentwise. But attempts to do some such things often run into problems as they tend to be slow. So numpy was not written in python, mostly, albeit it could have been even more impressive if it took advantage of more pythonic abilities, at a cost.
>
> But now that numpy is in C, pretty much, it is somewhat locked in when and if other things in Python change.
>
> The reality is that many paradigms carried too far end up falling short.
>
>
> -----Original Message-----
> From: Richard Damon <Richard at Damon-Family.org>
> To: python-list at python.org
> Sent: Fri, Feb 25, 2022 1:48 pm
> Subject: Re: C is it always faster than nump?
>
>
> On 2/25/22 4:12 AM, BELAHCENE Abdelkader wrote:
> > Hi,
> > a lot of people think that C (or C++) is faster than python, yes I agree,
> > but I think that's not the case with numpy, I believe numpy is faster than
> > C, at least in some cases.
> >
> My understanding is that numpy is written in C, so for it to be faster
> than C, you are saying that C is faster than C.
>
> The key point is that numpy was written by skilled programmers who
> carefully optimized their code to be as fast as possible for the major
> cases. Thus it is quite possible for the numpy code to be faster in C
> than code written by a person without that level of care and effort.
>
> There are similar packages available for many languages, including C/C++
> to let mere mortals get efficient numerical processing.
>
> --
> Richard Damon
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list

