
PyBaMM GSoC 2024 Project Ideas


Migrate from unittest to pytest and improve PyBaMM’s testing infrastructure

PyBaMM’s inception predates the rise in popularity of pytest, so our tests have been written with the unittest framework. pytest is now the de facto standard for testing in Python, and it is time to migrate our test cases to it. pytest is more flexible and powerful than unittest and lets us write better tests with less code: tests are plain functions that use bare assert statements, fixtures replace setUp/tearDown boilerplate, and parametrization lets a single test run over many inputs. The migration may also involve adopting Hypothesis for property-based testing, in which a test states a property that must hold for all inputs and Hypothesis generates those inputs from strategies (descriptions of the data to draw, somewhat like generators for test cases).
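
A minimal sketch of what the conversion could look like, using a hypothetical helper c_rate_to_current (not a PyBaMM function). The unittest version needs a TestCase subclass and assert* methods; the pytest version is a plain function with a bare assert, collected by its test_ prefix. The last test checks a property over many generated inputs; here a seeded random generator is a hand-rolled stand-in for Hypothesis strategies, and pytest features such as fixtures, pytest.raises and @pytest.mark.parametrize are left out, so the sketch runs with the standard library alone.

```python
import random
import unittest


def c_rate_to_current(c_rate: float, capacity_ah: float) -> float:
    """Hypothetical helper: current in A drawn at a given C-rate."""
    return c_rate * capacity_ah


# Before: unittest style, a TestCase subclass with assert* methods.
class TestCRateToCurrent(unittest.TestCase):
    def test_one_c(self):
        self.assertEqual(c_rate_to_current(1.0, 5.0), 5.0)


# After: pytest style, a plain function with a bare assert.
# pytest's assertion rewriting reports both operands if this fails.
def test_one_c():
    assert c_rate_to_current(1.0, 5.0) == 5.0


# Property-based style: "doubling the C-rate doubles the current" must hold
# for every input, not just hand-picked ones. With Hypothesis the inputs would
# come from strategies, e.g. @given(st.floats(0, 10), st.floats(0.1, 100)).
# Here a seeded random generator stands in for those strategies.
N_EXAMPLES = 200


def test_current_scales_linearly_with_c_rate():
    rng = random.Random(0)
    for _ in range(N_EXAMPLES):
        c_rate = rng.uniform(0.0, 10.0)
        capacity = rng.uniform(0.1, 100.0)
        # Multiplying by 2 is exact in binary floating point, so equality holds.
        assert c_rate_to_current(2 * c_rate, capacity) == 2 * c_rate_to_current(c_rate, capacity)
```

pytest can also run the unittest class unchanged, which allows the migration to proceed one test module at a time.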

PyBaMM already uses pytest with the nbmake plugin to run our example notebooks as tests, but the rest of the test suite still uses unittest (see pybamm-team/PyBaMM #3617, which will be one of the precursors to this project). pytest-xdist is also used to run these tests in parallel; it should be adapted to the new framework and the migrated test cases, which should make the full suite faster than the current serial runs.

The student will be expected to study how other popular Python packages use pytest and Hypothesis to write tests, and then migrate our tests to these frameworks. This will involve both migrating existing tests and writing new ones, while exploring the pytest documentation for features worth adopting. The student will also be expected to improve the test coverage of the codebase and to write tests for any new code they write. pytest is also highly configurable through plugins, and relevant plugins can be adopted during the migration for unit and integration tests, coverage tests, doctests, and more.
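
Doctests are one of the test types listed above. As a sketch, assuming a hypothetical helper with runnable examples in its docstring: pytest collects such examples when invoked with --doctest-modules, and the standard-library doctest module can run them directly, which is what run_doctests (also a hypothetical name) does here.

```python
import doctest


def c_rate_to_current(c_rate: float, capacity_ah: float) -> float:
    """Return the current in A drawn from a cell at the given C-rate.

    Hypothetical helper for illustration: 1C discharges the nominal capacity
    in one hour, so the current is the C-rate times the capacity in Ah.

    >>> c_rate_to_current(2, 5.0)
    10.0
    >>> c_rate_to_current(0.5, 5.0)
    2.5
    """
    return c_rate * capacity_ah


def run_doctests(func) -> doctest.TestResults:
    """Run the docstring examples of func and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass the function itself in globs so the examples can call it by name.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

Keeping examples in docstrings means the documentation and the test suite cannot silently drift apart.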

As a stretch goal, PyBaMM’s packaging infrastructure should be updated to ship a pytest test suite and runner that users can invoke, e.g., through pybamm.test() or similar, to check that their installation works correctly. This is intended for users who do not have a source installation of PyBaMM. It will involve liaising with the developers and maintainers working on the underlying build infrastructure, which may change as PyBaMM migrates to a new build system.
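
A rough sketch of such a runner, in the spirit of numpy.test(). It assumes the tests are shipped inside the installed package under a tests/ subdirectory, which is an assumption about the package layout; installed_test_dir and run_installed_tests are hypothetical names. pytest is imported only when the runner is called, so it stays an optional dependency.

```python
import importlib.util
import pathlib
from typing import Optional, Sequence


def installed_test_dir(package: str, subdir: str = "tests") -> Optional[pathlib.Path]:
    """Return <package dir>/<subdir> for an installed package, or None if it is not a package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    package_dir = pathlib.Path(next(iter(spec.submodule_search_locations)))
    return package_dir / subdir


def run_installed_tests(package: str = "pybamm", extra_args: Sequence[str] = ()) -> int:
    """Hypothetical pybamm.test()-style entry point: run the bundled tests with pytest."""
    import pytest  # imported lazily so pytest is only needed when tests are run

    test_dir = installed_test_dir(package)
    if test_dir is None or not test_dir.is_dir():
        raise RuntimeError(f"no bundled tests found for {package!r}")
    # pytest.main returns an exit code; 0 means every collected test passed.
    return int(pytest.main([str(test_dir), *extra_args]))
```

For example, run_installed_tests(extra_args=["-x", "-q"]) would stop at the first failure and print compact output.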

Expected outcomes

Desired skills

Difficulty and suitable project length

Potential mentors

Migrate to a modern build-backend such as scikit-build-core or meson-python as a new build system for PyBaMM

Two new build systems are gaining popularity in the Scientific Python ecosystem: scikit-build-core (based on CMake) and meson-python (based on Meson). Both are designed to be flexible enough to support the needs of compiled Python packages, which are becoming more common as Python is used more widely in scientific computing. PyBaMM relies on a C++ solver (IDAKLU) built on SUNDIALS, SuiteSparse, and CasADi, and therefore requires substantial compilation prerequisites and per-platform build-time configuration for both regular and editable installations.

The goal of this project is to migrate PyBaMM’s build system to one of these new build systems and to deprecate the current one based on setuptools and wheel (see the pertinent issue: pybamm-team/PyBaMM #3564). This may involve writing new build scripts, following the new build system’s conventions for compilation and linkage, and configuring compilers and toolchains so that PyBaMM works correctly on all currently supported platforms and architectures. Note that PyBaMM is a compiled package when installed from the wheels published on PyPI, but compiling the IDAKLU solver is optional when building from source, owing to a two-stage build process. Care must therefore be taken, through build-time flags, to ensure that compiling this solver is not required to install or use PyBaMM.
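
On the runtime side, a common way to keep a compiled component optional is to import it defensively and fall back when it is missing. A minimal sketch, in which the module name pybamm_idaklu_ext and the helper names are hypothetical, and the fallback to a CasADi-based solver is an assumption for illustration:

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_extension(name: str) -> Optional[ModuleType]:
    """Import a compiled extension module if it was built, else return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Raised when the extension was skipped at build time or cannot be loaded.
        return None


# Hypothetical module name, used only for illustration.
_idaklu = load_optional_extension("pybamm_idaklu_ext")


def have_idaklu() -> bool:
    """Report whether the optional IDAKLU extension is importable."""
    return _idaklu is not None


def default_solver_name() -> str:
    # Assumption: fall back to a CasADi-based solver when IDAKLU was not compiled.
    return "IDAKLU" if have_idaklu() else "CasADi"
```

With this pattern, a source build that skips the solver still imports cleanly, and only code paths that need IDAKLU have to check have_idaklu().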

As a stretch goal, the student can explore further possibilities, depending on which of the two build systems is chosen and the features it offers: cross-compiling PyBaMM wheels for platforms and architectures that are not currently supported (such as ARM-based Linux systems), setting up build caching and testing different compiler configurations to simplify and speed up the Windows builds, using incremental rebuilds to speed up local development, and more.

Expected outcomes

Desired skills

Difficulty and suitable project length

Potential mentors

Build and publish pybamm-cookiecutter as a template for new PyBaMM-based projects

There is a cookiecutter template at https://github.com/pybamm-team/pybamm-cookiecutter/ that was started as part of GSoC 2023. The goal of this project is to finish the template and release it on PyPI so that the community can use it to create new PyBaMM-based projects. Work on the template had begun as of November 2023, but it is not yet ready for researchers and scientists who want to add their own parameter sets and models to PyBaMM. These users may not be familiar with setting up and managing Python development environments or repositories, and pybamm-cookiecutter would give them a standardised template and workflow for setting up their simulations and experiments. Please refer to pybamm-team/pybamm-cookiecutter #1 for a tentative roadmap for this project.

The student will have the opportunity to work on every aspect of software engineering with Python, including adding features, writing tests, writing user-facing documentation and usage examples, and setting up CI/CD pipelines for test automation and deployment. The template is intended to be opinionated, combining the best ideas from the main PyBaMM repository with newer practices from other templates for data science and scientific computing projects and their distribution, so that it offers both extensibility and ease of use for newcomers to the battery modeling community.

Expected outcomes

Desired skills

Difficulty and suitable project length

Potential mentors

Training a RAG-based machine learning model for chatbot assistance on the PyBaMM documentation

PyBaMM’s extensive documentation serves as a valuable resource for users, but accessing information efficiently can be challenging. This project aims to develop a chatbot based on machine learning models trained on the PyBaMM documentation. The chatbot will act as a virtual assistant, providing users with prompt and accurate responses to basic queries related to PyBaMM functionalities, installation instructions, usage guidelines, and troubleshooting tips.

The project will involve collecting and preprocessing a comprehensive dataset of PyBaMM documentation, including tutorials, API references, user guides, and FAQs. This data will then be used to train a machine learning model, with the architecture chosen from options such as sequence-to-sequence models or transformers. The model will also use Retrieval-Augmented Generation (RAG): passages relevant to a query are retrieved from the documentation for the user’s PyBaMM version and supplied to the model as context, so that responses stay consistent with recent features and updates (see NVIDIA’s blog on Demystifying Retrieval-Augmented Generation Pipelines).
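
The retrieval-and-augmentation step can be sketched with the standard library alone. This is a toy under stated assumptions: bag-of-words term counts stand in for a learned embedding model, a plain list stands in for a vector store, the Chunk type and version strings are hypothetical, and the final call to a generative model is omitted.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    version: str  # documentation version the passage was taken from


def embed(text: str) -> Counter:
    """Bag-of-words term counts: a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], version: str, k: int = 3) -> list[Chunk]:
    """Return the k passages for the given version that are most similar to the query."""
    q = embed(query)
    candidates = [c for c in chunks if c.version == version]  # version-specific docs only
    return sorted(candidates, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def build_prompt(query: str, context: list[Chunk]) -> str:
    """Augment the user's question with the retrieved passages."""
    passages = "\n".join(f"- {c.text}" for c in context)
    return (
        "Answer the question using only the PyBaMM documentation excerpts below.\n"
        f"{passages}\n\nQuestion: {query}"
    )
```

The prompt returned by build_prompt would then be passed to whichever generative model the project selects; filtering by version before ranking is what keeps answers tied to the documentation the user actually has installed.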

Natural language understanding techniques will be implemented to preprocess user queries and extract relevant features. The trained model will be integrated into an interactive chatbot interface that users can interact with in real time. Efficient hosting and storage of the model and embeddings will require knowledge of suitable platforms; priority will be given to free and open-source platforms that offer scalability, accessibility, and ease of maintenance. Finally, the chatbot’s performance will be evaluated using metrics such as accuracy and user satisfaction, and feedback will be used to refine its responses iteratively.
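
A minimal sketch of query preprocessing and an accuracy evaluation, assuming a crude keyword-match criterion (an answer counts as correct if it mentions every expected keyword) in place of whatever metric the project settles on. Here answer is any function mapping a query to a reply, the names are hypothetical, and user satisfaction, which needs real user feedback, is not modelled.

```python
import re
from typing import Callable, Iterable, Tuple

Case = Tuple[str, Tuple[str, ...]]  # (question, keywords a correct answer must mention)


def normalise(query: str) -> str:
    """Minimal preprocessing: lowercase, drop punctuation, keep dotted names and versions."""
    tokens = (t.strip(".") for t in re.findall(r"[a-z0-9_.]+", query.lower()))
    return " ".join(t for t in tokens if t)


def keyword_accuracy(answer: Callable[[str], str], cases: Iterable[Case]) -> float:
    """Fraction of questions whose answer mentions every expected keyword."""
    cases = list(cases)
    if not cases:
        return 0.0
    hits = 0
    for question, keywords in cases:
        reply = answer(normalise(question)).lower()  # query the bot once per question
        if all(k.lower() in reply for k in keywords):
            hits += 1
    return hits / len(cases)
```

Running the same labelled question set after each change to the model or retrieval step gives a simple regression check to complement user feedback.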

Expected outcomes

Desired skills

Difficulty and suitable project length

Potential mentors