Python on Arm: 2025 Update

Written by Diego Russo, Arm Ltd

Why Python Matters to Arm

Python is one of the most widely used programming languages today, powering applications across Machine Learning (ML), automation, data science, DevOps, web development and developer tooling. At Arm, we see Python not just as a language to support, but as a strategic priority to enable a wide and growing community of developers.

Over the past several years, we have worked closely with the Python community to make Arm a first-class platform for Python development. Thanks to consistent upstream collaboration, targeted engineering, and ecosystem investment, it is now practical to develop, test, and deploy Python workloads on Arm across Linux, Windows, and the cloud.

In 2024, we shared how Arm had increased its engagement with the Python ecosystem. One year later, we are seeing the results of that investment, with new infrastructure, improved performance, and a growing number of real-world projects running on Arm.

This post highlights the key developments from the past year and what is ahead.

What's new in 2025

Easier development: Linux and Windows GitHub-hosted runners for Arm

As part of our collaboration with GitHub, Arm helped enable GitHub-hosted CI runners for Arm-based platforms. These runners are now available for both Linux and Windows.

Arm sponsored the underlying infrastructure and provided engineering support during the beta rollout. These runners offer open-source projects a fast, reliable way to run native CI workflows without emulation or self-hosting.

The CPython project was the first open-source user of the Windows on Arm runners and continues to use them in daily CI pipelines. This helps ensure first-class support for the platform.
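
A CI job on one of these runners can confirm that it is running natively rather than under emulation by checking the machine type that Python reports. The following is a minimal sketch, not an official check: the helper name `is_arm64` is our own, and the accepted strings are an assumption based on values commonly returned by `platform.machine()` on 64-bit Arm systems (`aarch64` on Linux, `ARM64` on Windows, `arm64` on macOS).

```python
import platform

# Machine strings commonly reported on 64-bit Arm systems (assumed set,
# compared case-insensitively): Linux "aarch64", Windows "ARM64", macOS "arm64".
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True if the given (or current) machine string names 64-bit Arm."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in ARM64_NAMES


if __name__ == "__main__":
    print(f"machine={platform.machine()!r} native_arm64={is_arm64()}")
```

An x86-64 interpreter running under emulation would typically report `AMD64` or `x86_64` here, so a test suite can fail fast if it was scheduled on the wrong kind of runner.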

Performance improvements

Python 3.13 introduced an experimental Just-in-Time (JIT) compiler, developed by the CPython team to improve performance in real-world applications. Arm has contributed directly to this effort by testing, tuning, and extending the JIT on Arm platforms, particularly for the AArch64 architecture (see section below).

Our work includes fixing architecture-specific issues, validating generated machine code, and improving the overall quality of JIT output on Arm. These efforts have resulted in:

Up to 4% speedup on Linux
17% reduction in generated header file size
Smarter jump handling and more efficient code generation
Lower memory overhead through trampoline reuse and targeted optimizations

The result is a faster, more reliable JIT experience for Python workloads running on Arm.
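
Because the JIT is experimental and depends on how the interpreter was built, it is worth checking whether it is present before measuring anything. The sketch below assumes the `sys._jit` namespace that CPython 3.14 exposes for this purpose. It is a CPython implementation detail and may be missing or behave differently in other versions, so the helper (`jit_status`, a name chosen for illustration) falls back to `None` when it cannot tell.

```python
import sys


def jit_status():
    """Report JIT state, using None where the interpreter cannot say."""
    jit = getattr(sys, "_jit", None)  # present on CPython 3.14+, absent earlier
    if jit is None:
        return {"available": None, "enabled": None}
    return {"available": jit.is_available(), "enabled": jit.is_enabled()}


if __name__ == "__main__":
    print(sys.version_info[:2], jit_status())
```

On builds that include the JIT, the `PYTHON_JIT` environment variable can be set to `1` or `0` at interpreter startup to turn it on or off, which makes it straightforward to compare the same workload both ways.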

Better Windows on Arm ecosystem support

Python support for Windows on Arm continues to mature. CPython itself, along with many essential packages, now builds and runs cleanly on the platform. This is thanks to upstream fixes, improved build systems, and expanded CI coverage.

We are working closely with Microsoft to improve the overall Python experience on Windows on Arm devices. This includes:

Enabling compatibility for popular libraries
Refining build and packaging workflows
Supporting key AI and ML tools

One area of major progress is PyTorch, where the collaboration between Arm and Microsoft has delivered native builds and improved acceleration support.

With the release of PyTorch 2.7 for Windows on Arm, developers can now access Arm-native builds for Windows, available for Python 3.12. This enables ML workflows to run natively on Arm64 Windows devices, including Copilot+ PCs, with full access to hardware capabilities.

These improvements support a wide range of ML use cases, from generative models like Stable Diffusion, to natural language processing, to traditional regression and classification. Windows on Arm is now a production-ready platform for modern AI development.

Arm's commitment to the Python community

We continue to support the Python community not only with code, but also with infrastructure, funding, and time, including:

Hosting the CPython Core Dev Sprint 2025 in our Cambridge office
Sponsoring EuroPython 2022, 2023, and 2025
Providing a dedicated benchmarking server integrated with speed.python.org
Funding a full-time CPython developer, now a core committer

Arm is committed to supporting the Python ecosystem through sustained upstream contributions and community investment. Over the past year, we have expanded our efforts to support the community both technically and organizationally.

CPython Core Dev Sprint 2025

We are proud to host and sponsor the upcoming CPython Core Dev Sprint 2025 this September at our Cambridge office. On track to be the largest sprint ever held, the event will bring together more than 55 core developers and contributors flying in from across Europe, the United States, South Korea, Singapore, and Australia.

These sprints are vital to Python’s evolution, enabling in-person design discussions, upstream planning, and concentrated development work that is difficult to coordinate remotely. By supporting this global event, Arm is helping to accelerate Python’s technical roadmap while strengthening our collaboration with the people who maintain and shape the language.

Supporting EuroPython

Arm also sponsored EuroPython 2025, our third sponsorship of the conference after 2022 and 2023. This helps foster collaboration between maintainers, contributors, and users across the globe.

EuroPython Arm Sponsor

EuroPython Arm Booth

EuroPython JIT talk

Arm presented a dedicated talk on the JIT at EuroPython 2025, titled Exploring the CPython JIT, offering a technical deep dive into the state and evolution of the JIT on Arm platforms. The talk shared insights from our work on the JIT compiler in Python 3.14.

Dedicated Arm benchmarking server

To assist Python’s core developers with performance tracking, Arm now provides a dedicated benchmarking server running on Arm hardware that publishes results on speed.python.org. This server integrates with Python’s performance monitoring tools, helping assess the impact of code changes on Arm platforms and enabling more data-driven optimization.

Arm benchmarks

Upstream-first contributions to CPython

Arm also continues to invest directly in upstream work. As a result of this work, I was recently promoted to CPython core developer, further strengthening our technical collaboration with the Python project.

Over the past year, Arm has contributed across multiple areas of the interpreter, including:

Improvements to the CPython JIT
Improvements in the micro-op optimizer
Enhancements to performance benchmarking infrastructure
Reviewing pull requests and triaging issues

These efforts reflect Arm’s broader goal: not just platform enablement, but sustained, upstream-first contributions to the Python language itself.

Engineering contributions

Arm’s recent contributions to CPython focus on improving the JIT and performance on AArch64 platforms. Each of the changes described below improves JIT-generated machine code for Arm platforms, leading to more efficient Python execution.

While still in its early stages, the JIT is already showing potential in targeted workloads. Our contributions help lay the groundwork for future speedups on Arm-based systems.

GH-123872: Generate and patch AArch64 trampolines

This pull request introduces the generation and patching of trampolines for AArch64 in the JIT compiler. By implementing these trampolines, the JIT can handle out-of-range jumps more effectively, improving the reliability and performance of JIT-compiled code on AArch64 platforms. On Linux, this patch makes the JIT 0.8% faster and reduces its memory use by 0.6%.

GH-121001: Replace AArch64 trampolines with LDR

Building on the previous work, this change replaces the existing trampoline mechanism with the LDR instruction on AArch64. This simplification reduces complexity and makes the JIT code generation process easier to maintain. It shrinks the generated stencil header file by 17% and reduces the number of patch functions from 4 to 1.

// Old trampoline  
d2800008 mov x8, #0x0  
f2a00008 movk x8, #0x0, lsl #16  
f2c00008 movk x8, #0x0, lsl #32  
f2e00008 movk x8, #0x0, lsl #48  
d61f0100 br x8

// New trampoline  
58000048 ldr x8, 8  
d61f0100 br x8  
00000000 # Used to patch the 64-bit address  
00000000 # Used to patch the 64-bit address
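
The difference between the two listings can be checked with a few lines of Python. This is an illustrative sketch only: the instruction words are copied from the listings above, while `build_new_trampoline` is a hypothetical helper and not part of CPython. It shows that the LDR form needs a single 64-bit write for the target address, where the MOV/MOVK form spreads the address across four 16-bit immediates.

```python
import struct

# Instruction words copied from the two listings above.
OLD_TRAMPOLINE = [0xD2800008, 0xF2A00008, 0xF2C00008, 0xF2E00008, 0xD61F0100]
NEW_TRAMPOLINE = [0x58000048, 0xD61F0100]  # ldr x8, 8 ; br x8


def build_new_trampoline(target):
    """Return the LDR-based trampoline with `target` stored in its literal slot."""
    code = b"".join(struct.pack("<I", word) for word in NEW_TRAMPOLINE)
    return code + struct.pack("<Q", target)  # one 64-bit patch instead of four


# The LDR (literal) offset is the 19-bit field at bits [23:5], counted in words.
ldr_offset = ((NEW_TRAMPOLINE[0] >> 5) & 0x7FFFF) * 4

old_size = 4 * len(OLD_TRAMPOLINE)
new_size = len(build_new_trampoline(0))
print(old_size, new_size, ldr_offset)  # 20 16 8
```

The decoded offset of 8 confirms that the load reads the two words that follow `br x8`, which is where the address is patched in.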

GH-120250: JIT: Re-use trampolines on AArch64

This enhancement enables the JIT compiler to reuse existing trampolines on AArch64, minimizing redundancy and optimizing memory usage. By reusing trampolines, the JIT reduces the overhead associated with generating new ones for each out-of-range jump.
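
The idea of reuse can be modelled as a lookup table keyed by jump target. The class below is a hypothetical model written for this post, not the data structure CPython uses. It assumes 16-byte slots, matching the LDR-based trampoline shown earlier (two instructions plus a 64-bit address).

```python
class TrampolineTable:
    """Hypothetical model: one trampoline slot per distinct jump target."""

    SLOT_SIZE = 16  # bytes per LDR-based trampoline (assumed from the listing)

    def __init__(self):
        self._slots = {}  # target address -> slot index

    def slot_for(self, target):
        """Return the slot index for `target`, allocating only on first use."""
        return self._slots.setdefault(target, len(self._slots))

    @property
    def size_in_bytes(self):
        return len(self._slots) * self.SLOT_SIZE


table = TrampolineTable()
slots = [table.slot_for(t) for t in (0x1000, 0x2000, 0x1000)]
print(slots, table.size_in_bytes)  # [0, 1, 0] 32
```

Three jumps to two distinct targets consume two slots rather than three, which is the saving that reuse provides.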

GH-131041: Emit AArch64 trampolines only for long jumps

This optimization ensures that trampolines are emitted only when necessary, specifically for long jumps that cannot be encoded directly. By avoiding the generation of superfluous trampolines, the JIT compiler streamlines the code generation process and reduces code size.
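
Whether a jump is "long" follows from the instruction encoding: an AArch64 `B` or `BL` instruction carries a signed 26-bit offset counted in 4-byte words, so it can reach roughly 128 MiB in either direction. The check below is a sketch of that rule. The function name `needs_trampoline` is illustrative and the code is not taken from CPython.

```python
# B/BL encode a signed 26-bit word offset, i.e. a byte range of [-2**27, 2**27).
BRANCH_MIN = -(1 << 27)
BRANCH_MAX = (1 << 27) - 4  # largest 4-byte-aligned forward displacement


def needs_trampoline(source, target):
    """Return True if a direct branch from `source` cannot reach `target`."""
    displacement = target - source
    return not (BRANCH_MIN <= displacement <= BRANCH_MAX)


print(needs_trampoline(0x1000, 0x2000))            # False
print(needs_trampoline(0x1000, 0x7FFF_0000_0000))  # True
```

Jumps between nearby regions of JIT code fall inside this range and can be encoded directly. Calls into distant code, such as functions in the interpreter binary or a shared library, are the ones that may need the indirection.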

GH-131042: Remove trailing jump in AArch64 JIT stencils

This refinement eliminates unnecessary trailing jumps in AArch64 JIT stencils. By removing these redundant instructions, the JIT produces more efficient machine code, leading to potential performance gains during execution, including a 1.4% speedup on Linux.

// With the trailing jump that needs to be patched  
0x08, 0x00, 0x00, 0x90, // adrp x8, _JIT_OPERAND0  
0x08, 0x01, 0x40, 0xf9, // ldr x8, [x8]  
0x88, 0x1e, 0x00, 0xf9, // str x8, [x20, #0x38]  
0x00, 0x00, 0x00, 0x14, // b _JIT_CONTINUE  

// The trailing jump has been replaced with a NOP  
0x08, 0x00, 0x00, 0x90, // adrp x8, _JIT_OPERAND0  
0x08, 0x01, 0x40, 0xf9, // ldr x8, [x8]  
0x88, 0x1e, 0x00, 0xf9, // str x8, [x20, #0x38]  
0x1f, 0x20, 0x03, 0xd5, // nop (branch was removed)
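
The byte sequences above are little-endian instruction words, so the final instruction of a stencil can be inspected directly. The sketch below uses hypothetical helper names and the last two instructions of each listing. It identifies an unconditional `b` by its top six opcode bits (`000101`) and compares against the A64 `nop` encoding.

```python
NOP = 0xD503201F


def last_instruction(stencil):
    """Decode the final 4 bytes of a stencil as a little-endian instruction word."""
    return int.from_bytes(stencil[-4:], "little")


def ends_with_branch(stencil):
    """Return True if the stencil ends in an unconditional `b` instruction."""
    return (last_instruction(stencil) & 0xFC000000) == 0x14000000


# Final two instructions from each listing above.
before = bytes([0x88, 0x1E, 0x00, 0xF9, 0x00, 0x00, 0x00, 0x14])
after = bytes([0x88, 0x1E, 0x00, 0xF9, 0x1F, 0x20, 0x03, 0xD5])

print(ends_with_branch(before), ends_with_branch(after))  # True False
print(last_instruction(after) == NOP)                     # True
```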

Looking ahead

Arm’s engagement with the Python ecosystem will continue through 2025 and beyond. We remain focused on performance, ecosystem enablement, and upstream collaboration. Whether through infrastructure, developer funding, or contributions to CPython internals, our goal is to make Python on Arm a fast, stable, and accessible platform for all users.

We will continue to contribute across key areas of the interpreter, including the JIT, SIMD execution, and the core evaluation loop. We also plan to explore integrating architecture-specific instructions, such as NEON, SVE, SVE2, SME, and Arm's new SME2, into the JIT compiler to accelerate targeted workloads.

We are also committed to strengthening the community side of Python by reviewing patches, mentoring new contributors, and supporting key events that bring maintainers together.

Get involved

We’re excited about the future of Python on Arm, and we want to hear from you.

Now is the perfect time to try running your Python workloads on Arm. It’s easier than ever:

GitHub-hosted runners for Linux and Windows on Arm are available
Most PyPI packages now work out of the box on Arm systems
Arm instances are available on every major public cloud

If you run into issues, raise them upstream, or reach out to us directly.

Whether you are working on Python itself, building tools and libraries, or porting your software to Arm-based systems, we invite you to connect with us through the Arm Developer Program. It is the best way to stay up to date, access technical resources, and collaborate with Arm engineers and other Python developers.

Let’s keep improving Python on Arm, together.