[Pandas-dev] Faster .apply natively

Mon Nov 23 16:36:21 EST 2020

On 21/11/2020 06:31, Abdur-Rahmaan Janhangeer wrote:
> A normal NLP function of
> reducing a sentence to its essential lowercase version
> in 10 lines of list-comprehension processing takes an
> eternity for the tens of thousands of rows.

Calling .apply on 10k rows has an overhead of a few ms, as far as I can
tell. If it takes much longer than that, the bottleneck is in your
function.
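
One rough way to check this is to time the function on a plain Python
list of the same values, with pandas out of the picture. If that alone
takes about as long as the .apply call, the time is going into the
function. The normalize function and the sample sentence below are
made-up stand-ins, not the code from the original question.

```python
import time

def normalize(sentence):
    # Hypothetical stand-in for the per-row NLP function:
    # keep alphabetic words only and lowercase them.
    return " ".join(word.lower() for word in sentence.split() if word.isalpha())

def time_plain_loop(func, values):
    """Return (results, elapsed seconds) for calling func on each value."""
    start = time.perf_counter()
    results = [func(v) for v in values]
    return results, time.perf_counter() - start

rows = ["The Quick brown Fox jumps over 2 lazy dogs"] * 10_000
results, elapsed = time_plain_loop(normalize, rows)
print(f"{len(rows)} calls took {elapsed * 1000:.1f} ms outside pandas")
# Compare this with the time taken by series.apply(normalize) on the same data.
```

The difference between the two timings is roughly what .apply itself
adds.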

The question is then how to make that function faster. The typical
answers are optimizing it in Python, rewriting it in a lower-level
language (Cython, or maybe Numba), parallelizing over rows or, in this
case, possibly caching.
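
As a minimal sketch of the caching option, assuming the function is
pure and many rows contain the same sentence, the standard library's
functools.lru_cache can skip the repeated work. The normalize_cached
function and the sample rows are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def normalize_cached(sentence):
    # Hypothetical stand-in for an expensive per-row function.
    # Arguments must be hashable (strings are) for lru_cache to work.
    return " ".join(word.lower() for word in sentence.split() if word.isalpha())

rows = ["Hello World", "Foo Bar 42", "Hello World", "Hello World"]
results = [normalize_cached(r) for r in rows]

info = normalize_cached.cache_info()
print(info.hits, info.misses)  # 2 hits and 2 misses for the rows above
# With pandas, the same function could be passed as series.apply(normalize_cached).
```

This only pays off when inputs repeat; with all-unique rows every call
is a cache miss and the function runs as before.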

See https://pandas.pydata.org/docs/user_guide/enhancingperf.html for
more details. The .apply function cannot make an arbitrary Python
function faster, and even parallelization has its limits in pure
Python.

-- 
Roman