[Chennaipy] Chennaipy - Monday Module - 15 May 2023

selvi dct selvi.dct at gmail.com
Mon May 15 15:30:09 EDT 2023


Date: 15 May 2023


Module : modin


Installation : pip install modin


About:

        Modin is a drop-in replacement for pandas. While pandas is single-threaded,
Modin lets you speed up your workflows by scaling pandas so that it uses
all of your cores. Modin works especially well on larger datasets, where
pandas becomes slow or can run out of memory.

        By simply replacing the import statement, Modin offers users
effortless speed and scale for their pandas workflows:

                import modin.pandas as pd
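
A minimal sketch of the import swap, assuming Modin may or may not be
installed on the machine: it falls back to plain pandas when the import
fails, and the rest of the code is identical either way. The column names
and values are hypothetical, and Modin also needs an execution engine
(such as Ray or Dask) to be installed before it can run.

```python
# Prefer Modin when it is available, otherwise fall back to pandas.
try:
    import modin.pandas as pd  # needs an execution engine such as Ray or Dask
    backend = "modin"
except ImportError:
    import pandas as pd
    backend = "pandas"

# From here on the code is ordinary pandas-style code for both backends.
# The data below is made up purely for illustration.
df = pd.DataFrame(
    {"city": ["Chennai", "Madurai", "Chennai"], "sales": [10, 20, 30]}
)

# Group by city and add up the sales column.
totals = df.groupby("city")["sales"].sum()

print("backend:", backend)
print(totals)
```

Because only the import line differs, existing pandas scripts can usually
be tried with Modin without touching the rest of the code.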


Sample:

        import modin.pandas as pd

        import numpy as np

        df = pd.read_csv("my_dataset.csv")  # example only: use the path to your own CSV file


        left_data = np.random.randint(0, 100, size=(2**8, 2**8))

        right_data = np.random.randint(0, 100, size=(2**12, 2**12))


        left_df = pd.DataFrame(left_data)

        right_df = pd.DataFrame(right_data)

        %timeit left_df.merge(right_df, how="inner", on=10)

        3.59 s ± 107 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


        %timeit right_df.merge(left_df, how="inner", on=10)

        1.22 s ± 40.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
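
%timeit is an IPython/Jupyter magic and does not work in a plain Python
script. The sketch below shows one way to make a similar comparison with
time.perf_counter from the standard library. The helper name time_merge
and the smaller array sizes are assumptions chosen so that the example
runs quickly. It falls back to pandas if Modin is not installed, and the
timings it prints will differ from the numbers above.

```python
import time

import numpy as np

try:
    import modin.pandas as pd  # needs an execution engine such as Ray or Dask
except ImportError:
    import pandas as pd


def time_merge(left, right, repeats=3):
    """Hypothetical helper: best wall-clock time of left.merge(right) on column 10."""
    best = float("inf")
    merged = None
    for _ in range(repeats):
        start = time.perf_counter()
        merged = left.merge(right, how="inner", on=10)
        best = min(best, time.perf_counter() - start)
    return best, merged


# Smaller frames than in the sample above, so the sketch finishes quickly.
rng = np.random.default_rng(0)
small_df = pd.DataFrame(rng.integers(0, 100, size=(2**6, 2**6)))
large_df = pd.DataFrame(rng.integers(0, 100, size=(2**8, 2**8)))

# Merge in both directions, as in the sample.
t_small_large, merged_a = time_merge(small_df, large_df)
t_large_small, merged_b = time_merge(large_df, small_df)

print(f"small.merge(large): {t_small_large:.4f} s, rows={len(merged_a)}")
print(f"large.merge(small): {t_large_small:.4f} s, rows={len(merged_b)}")
```

An inner join returns the same number of rows whichever frame is on the
left, so only the elapsed time is expected to differ between the two calls.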


Reference:

https://pypi.org/project/modin/