Bleeding Edge Dependency Testing Using Python
Dependency management in any language can be a challenge, and Python is no exception. Tools like pip and conda use dependency resolvers to try to honor the requirements given to them, but version conflicts often prevent installation; this problem became more apparent when pip introduced a new resolver in October 2020. New versions of an upstream package can break your code, and tracking down the culprit can be even more challenging if you have a long list of transitive dependencies.
edgetest is an open source plugin-based Python package designed to help developers test their code against new versions of their existing dependencies. edgetest helps alleviate some of the burden of dependency management by:
- creating a virtual environment;
- installing your local package into the environment;
- upgrading specified dependency package(s); and
- running your test command (e.g. pytest).
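The four steps above can be sketched as a plain list of commands. This is only an illustration under stated assumptions, not edgetest's implementation: plan_edge_test is a hypothetical helper, the environment path is made up, and the interpreter location assumes a POSIX-style virtual environment.

```python
from pathlib import PurePosixPath


def plan_edge_test(env_dir, upgrade, test_command=("pytest",)):
    """Return the commands for one bleeding-edge test run, in order.

    Hypothetical helper for illustration; it does not call edgetest.
    """
    env = PurePosixPath(env_dir)
    # POSIX virtualenv layout; on Windows the interpreter is Scripts/python.exe.
    python = str(env / "bin" / "python")
    return [
        # 1. create the virtual environment
        ["python", "-m", "venv", str(env)],
        # 2. install the local package into it
        [python, "-m", "pip", "install", "."],
        # 3. upgrade only the specified dependencies
        [python, "-m", "pip", "install", "--upgrade", *upgrade],
        # 4. run the test command (pytest here)
        [python, "-m", *test_command],
    ]


for command in plan_edge_test(".edgetest/pandas", ["pandas"]):
    print(" ".join(command))
```

Each entry could be passed to subprocess.run(command, check=True) to actually execute the step; returning the plan instead keeps the sketch free of side effects.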
Maintenance cost and environment management have become part of “running the engine” with the pip resolver. edgetest can help reduce the maintenance cost of packages by automating bleeding edge dependency testing. For example, if you depend on pandas>=0.25.1,<=1.0.0, edgetest will test your project against the most current pandas version (1.4.1 as of writing). With an effective test suite, you will know whether you can safely upgrade to pandas>=0.25.1,<=1.4.1. edgetest reports whether it is safe to upgrade based on the test results. It does this for each dependency individually before upgrading all upstream packages together to identify any potential interactions.
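The pin update in the pandas example amounts to a small string rewrite that only happens when the tests pass. The sketch below is a simplified stand-in, not how edgetest edits requirements: propose_requirement is a hypothetical function, and the regular expression only handles a single inclusive "<=" upper pin.

```python
import re

# Matches an inclusive upper pin such as "<=1.0.0".
UPPER_PIN = re.compile(r"<=\s*[^,;\s]+")


def propose_requirement(requirement, tested_version, tests_passed):
    """Raise the upper pin to tested_version, but only if the tests passed."""
    if not tests_passed:
        # A failing test suite means the existing pin stays as it is.
        return requirement
    return UPPER_PIN.sub(f"<={tested_version}", requirement, count=1)


print(propose_requirement("pandas>=0.25.1,<=1.0.0", "1.4.1", tests_passed=True))
print(propose_requirement("pandas>=0.25.1,<=1.0.0", "1.4.1", tests_passed=False))
```

The first call prints the widened requirement pandas>=0.25.1,<=1.4.1; the second leaves the original pin untouched.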
Why We Built edgetest
After pip introduced a dependency resolver in October 2020, we decided to take a more prescriptive approach to dependency pinning for internal projects at Capital One. Specifically, this involved adding both lower and upper pins to every direct dependency of every package. However, this decision added a new form of maintenance cost: updating the pins. We needed an automated, scalable way to help remediate security vulnerabilities identified in packages and to keep supporting the latest versions of our dependencies. edgetest was our solution to this problem, given the number of Python packages our team supported at the time. Machine learning packages often have complex dependency structures, and experimentation with new features is critical. While implementations of models should always pin packages to ensure deterministic behavior and auditability, we don't want the tools themselves to be unnecessarily restrictive; they should allow for some flexibility in their implementation.
We now have scheduled CI/CD jobs that automatically run edgetest against many internal libraries, running their unit tests and bumping dependency pins, which gives us some level of trust in the latest versions. Note that robust unit testing is critical to getting the most out of edgetest.
Is This Different from GitHub’s Dependabot?
edgetest is not tied to a particular version control platform like GitHub. It also prevents accidental updates by only upgrading package dependencies when unit tests are passing. Finally, some users want to focus on a subset of their dependency tree for updates. Sometimes these are dependencies that release often (e.g. boto3), and sometimes they are packages that are core to the library’s functionality. edgetest offers multiple configuration options to help users build a test-and-upgrade workflow that fits their use case.
edgetest in Action
For example, let’s imagine a simple toy_package like so:
To configure edgetest we can include the following within our setup.cfg:
And run with the following command line statement: edgetest -c setup.cfg
This tells edgetest to use Python 3.9 and to install the tests extra into each conda environment. edgetest will create three environments: pandas, numpy, and all-requirements. In the first two, it upgrades only the package the environment is named after; in all-requirements, it upgrades both pandas and numpy. Next, the test command (pytest is the default) is run in each environment and the results are reported back to the user:
Alternatively, you can provide the --export flag to have edgetest write the changes to setup.cfg for you.
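The environment layout described above (one environment per dependency plus a combined all-requirements environment) can be sketched with the standard library. This is an illustration under assumptions rather than edgetest's own logic: the setup.cfg fragment is hypothetical, the numpy pin is invented for the example, and dependency_names and build_environments are made-up helper names.

```python
import configparser
import re

# Hypothetical setup.cfg fragment; the numpy pin is invented for illustration.
SETUP_CFG = """\
[options]
install_requires =
    pandas>=0.25.1,<=1.0.0
    numpy<=1.17.5
"""


def dependency_names(setup_cfg_text):
    """Extract bare package names from the install_requires entry."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    names = []
    for line in parser["options"]["install_requires"].splitlines():
        # Keep the leading name and drop any version specifier.
        match = re.match(r"[A-Za-z0-9._-]+", line.strip())
        if match:
            names.append(match.group(0))
    return names


def build_environments(names):
    """Map each environment name to the packages upgraded inside it."""
    envs = {name: [name] for name in names}
    envs["all-requirements"] = list(names)
    return envs


for env, upgrades in build_environments(dependency_names(SETUP_CFG)).items():
    print(f"{env}: upgrade {', '.join(upgrades)}")
```

Testing each dependency in isolation first makes it easier to attribute a failure to one package; the combined environment then checks for interactions between the upgrades.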
Using a Plugin Architecture
One of the goals for edgetest was to ensure easy extensibility, which led us to use pluggy. Pluggy makes it possible to extend or modify the core package, so the Python community can build custom plugins that interact with edgetest. Currently there are three plugins: conda, pip-tools, and hub.
With pluggy we mapped out several hookspecs:
- addoption: This hook allows the user to add global or environment-level hooks to the configuration schema.
- path_to_python: This hook returns the path to the Python executable.
- create_environment: This hook creates the virtual environment.
These hooks allow a user to inject custom code that complements or overrides the base functionality. The create_environment hook, for example, enables the use of custom environment managers such as conda. Below is a comparison of the hookimpl between the base functionality and plugin override:
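As a rough illustration of the override mechanism (not edgetest's actual source), the following sketch uses pluggy directly. The project name "demo", the Spec, VenvPlugin and CondaPlugin classes, and the hook signature are all assumptions made for the example; the hook simply returns the command it would run instead of creating anything.

```python
import pluggy

# "demo" is a hypothetical project name, not edgetest's real marker name.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class Spec:
    # firstresult=True: the call stops at the first implementation that
    # returns something other than None.
    @hookspec(firstresult=True)
    def create_environment(self, basedir, envname):
        """Return the command that creates the environment."""


class VenvPlugin:
    """Stand-in for the base behaviour."""

    # trylast=True keeps this default behind any plugin implementation.
    @hookimpl(trylast=True)
    def create_environment(self, basedir, envname):
        return ["python", "-m", "venv", f"{basedir}/{envname}"]


class CondaPlugin:
    """Stand-in for a plugin that swaps in conda."""

    @hookimpl
    def create_environment(self, basedir, envname):
        return ["conda", "create", "-y", "-p", f"{basedir}/{envname}", "python"]


def build_manager(*plugins):
    manager = pluggy.PluginManager("demo")
    manager.add_hookspecs(Spec)
    for plugin in plugins:
        manager.register(plugin)
    return manager


base_only = build_manager(VenvPlugin())
with_conda = build_manager(VenvPlugin(), CondaPlugin())

print(base_only.hook.create_environment(basedir=".edgetest", envname="pandas"))
print(with_conda.hook.create_environment(basedir=".edgetest", envname="pandas"))
```

With only the base plugin registered, the venv command is returned; once the conda stand-in is registered, its implementation runs first and its result wins, which is the sense in which a plugin overrides the base functionality.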
One of the biggest benefits of edgetest is the ability to automate. A GitHub Action built on edgetest is available for users of the CI/CD platform and lets you automate dependency management as part of your build, test, and deploy workflow. Using the GitHub Action is relatively easy; below is an example of using the action in your YAML:
Run-edgetest-action can be found on the GitHub Marketplace, where users can search for tools that add functionality and improve workflows. More details can be found in the README of the run-edgetest-action.
In conclusion, edgetest is an example of Capital One’s commitment to an “open source first” approach to software development using Python. edgetest helps developers test their code against new versions of their existing dependencies and reduces the maintenance cost of packages by automating bleeding edge dependency testing. It creates a virtual environment, installs your library, upgrades specified dependencies, and runs test commands. Afterwards, edgetest will report whether or not it is safe to upgrade based on the test results. We encourage readers to check out edgetest on GitHub for more details on the project and how to contribute to edgetest.
Capital One is a proud member of the Python Software Foundation and our team presented on Data Profiler, an open source machine learning technology for data monitoring, at PyCon US earlier this year.