[Pandas-dev] Import time/size optimization - how much do people care?

Joris Van den Bossche jorisvandenbossche at gmail.com
Tue Oct 12 19:22:00 EDT 2021


On Tue, 12 Oct 2021 at 19:24, Marc Garcia <garcia.marc at gmail.com> wrote:

> +1 on all of them.
>
> I don't think 3 should be that complex, but I might be wrong.
>
> On Tue, Oct 12, 2021 at 12:21 PM Brock Mendel <jbrockmendel at gmail.com>
> wrote:
>
>> > For this, do you have in mind moving imports from the top of the file
>> into the functions that use them in our code base? Or would it be more about
>> not loading components of pandas until the user uses them (components like
>> plotting, timeseries, IO connectors...)?
>>
>> Some of each.  The main candidates I've looked at recently:
>>
>> 1) make pyarrow import lazy (~15%
>> https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050)
>>
>
You are linking to an issue that is explicitly about *not* having the
pyarrow import lazy (because we need to register extension types). For the
reasons mentioned in the issue, I would prefer to keep pyarrow as a
non-lazy import.

Joris
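
To illustrate the concern above, here is a minimal sketch of how a lazy
import postpones import-time side effects such as type registration. It uses
only the standard library. REGISTRY, _load_optional_dep, LazyModule and
fake_optional_dep are hypothetical names made up for this example; none of
them is pandas or pyarrow API, and this is not how either library implements
registration.

```python
import sys
import types

# Hypothetical stand-in for a registry of extension types.
REGISTRY = []


def _load_optional_dep():
    """Simulate importing a dependency whose import has a side effect."""
    mod = types.ModuleType("fake_optional_dep")
    mod.VERSION = "0.0"
    # The import-time side effect: registering an extension type.
    REGISTRY.append("fake_extension_type")
    sys.modules[mod.__name__] = mod
    return mod


class LazyModule:
    """Proxy that defers loading a module until the first attribute access."""

    def __init__(self, loader):
        self._loader = loader
        self._module = None

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if self._module is None:
            self._module = self._loader()
        return getattr(self._module, name)


lazy_dep = LazyModule(_load_optional_dep)
registered_before_use = list(REGISTRY)  # nothing registered yet
_ = lazy_dep.VERSION                    # first access triggers the load
registered_after_use = list(REGISTRY)   # registration has now happened
```

Under these assumptions, anything that relies on the registration having
happened sees an empty registry until some code path touches the lazily
loaded module, which is the kind of ordering problem a non-lazy import
avoids.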


> 2) make pandas.io.api imports (into pd namespace) lazy (4-5%)
>> 3) avoid @doc/@Appender/@Substitution at runtime (~4-5%, but a PITA; I
>> think not worth it)
>>
>> On Tue, Oct 12, 2021 at 9:46 AM Marc Garcia <garcia.marc at gmail.com>
>> wrote:
>>
>>> Hi Brock, thanks for having a look at this.
>>>
>>> Just a question. For this, do you have in mind moving imports from the
>>> top of the file into the functions that use them in our code base? Or would
>>> it be more about not loading components of pandas until the user uses them
>>> (components like plotting, timeseries, IO connectors...)? The main
>>> difference is that in the latter case, most Python files would keep the
>>> imports at the top, but we'd avoid loading pandas modules until needed.
>>>
>>> Feels like the latter, where it makes sense, could be a nice thing, and
>>> not only for the loading time and the base memory footprint.
>>>
>>> On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel <jbrockmendel at gmail.com>
>>> wrote:
>>>
>>>> I've spent some time looking at our import time and the memory
>>>> footprint at import and I _think_ we can cut another 20-30% by e.g.
>>>> lazifying imports.  The last 5-10% of that is pretty hairy though.
>>>>
>>>> My question for the community is: is this worth optimizing?  Is there
>>>> anyone (dask maybe?) for whom import time and memory footprint are a pain
>>>> point?
>>>> _______________________________________________
>>>> Pandas-dev mailing list
>>>> Pandas-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>>