Contributing

Algorithm

Coppafisher is built on one principle: an algorithm that performs well does not need to be changed. An algorithm is therefore only updated when there is evidence that the current version is performing poorly and that a change would perform better.

Installation

We use a protected staging branch called dev for the next release. Changes must be merged into dev through a pull request that passes the continuous integration tests. To publish a new release, dev is pushed into main. main is always the latest stable release, so users can easily install the software.

While changing code, install coppafisher as usual but keep the downloaded local source code directory.

Local Code Location

Avoid cloning coppafisher inside a subdirectory named coppafisher because this can cause strange errors.

Then install the development packages:

pip install -r requirements-dev.txt

Also, put coppafisher into editable mode while changing source code:

pip install -e .

Now all local code changes take effect immediately.

Pre-Commit

Pre-commit hooks run automatically on every git commit. They ensure that files are consistently formatted and checked, and they run linting rules through ruff. Enable the pre-commit hooks with:

pre-commit install

You can run pre-commit checks manually as well:

pre-commit run --all-files

Auto-update the pre-commit hooks (recommended):

pre-commit autoupdate

If a pushed commit fails a pre-commit hook, the GitHub integration workflow will catch it.

Tests

Tests are run via pytest. To unit test a script, place its test script inside a directory called test within the script's directory. All test script file names should start with test_, followed by the script's relative directory (or directories) and the script's file name, separated by underscores. For example, the test script for coppafisher/omp/coefs.py is named test_omp_coefs.py. Check existing tests for examples.
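The naming rule above can be sketched in a few lines of Python. This is only an illustration: expected_test_path is a hypothetical helper, not part of coppafisher, and it assumes the package root directory is named coppafisher and that paths use forward slashes.

```python
from pathlib import PurePosixPath


def expected_test_path(script: str, package: str = "coppafisher") -> PurePosixPath:
    """Return where the unit test for ``script`` would live under the naming rule."""
    path = PurePosixPath(script)
    # Directories between the package root and the script, e.g. ("omp",).
    relative_dirs = path.parent.relative_to(package).parts
    # Join "test", the relative directories and the script name with underscores.
    test_name = "_".join(("test", *relative_dirs, path.stem)) + ".py"
    # Test scripts sit in a "test" directory next to the script under test.
    return path.parent / "test" / test_name


print(expected_test_path("coppafisher/omp/coefs.py"))
# prints coppafisher/omp/test/test_omp_coefs.py
```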

Run Tests

Run unit tests (~10s)

pytest

Run integration tests (~90s)

pytest -m integration

Run unit tests requiring a notebook (~12s)

pytest -m notebook

View code coverage by appending --cov=coppafisher --cov-report term to each command.
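As a rough sketch, a test could be tagged for the marker-based commands above as follows. The function add and both tests are hypothetical and exist only for illustration. The example assumes that a marker named integration is registered in the project's pytest configuration; how unmarked and marked tests are split between the commands depends on that configuration.

```python
import pytest


def add(a: int, b: int) -> int:
    # Hypothetical function under test, for illustration only.
    return a + b


def test_add() -> None:
    # No marker: an ordinary unit test.
    assert add(1, 2) == 3


@pytest.mark.integration
def test_add_many() -> None:
    # Carries the "integration" marker, so `pytest -m integration` selects it.
    assert sum(add(i, i) for i in range(1000)) == 999_000
```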

Run Documentation Locally

mkdocs serve

Then go to http://127.0.0.1:8000/ in a modern browser.

Code Philosophy

We follow some basic rules when coding. Anyone can write code that works, but writing it in a scalable, maintainable way is another struggle altogether.

Here are some specific standards to follow:

Knowledge written down twice is bad code. Don't Repeat Yourself (DRY)!
If a bug is found, it must be detected automatically if it ever occurs again.
All code is black formatted.
Every time a function is modified or created, a new unit test must be created for the function. A pre-existing unit test can be drawn from to build a new unit test, but it should be clear in your mind that you are effectively building a new function.
Minimise if/else branching as much as possible. Exit if/else nesting as soon as possible through the use of keywords like raise, continue, break and return, whenever feasible.
Do not over-shorten a variable or function name.
Variables and functions are not capitalised, classes are.
In most cases, a line of code should do only one operation.
Every docstring for a function must be complete so a developer can re-create the function without seeing any of the existing source code.
Each parameter in a function must have an independent, clear functionality. If two parameters are derivable from one another, you are doing something wrong. This also applies to the function's return variables.
Minimise the number of data types a parameter can be and use common sense. For example, a parameter that can be int or None is reasonable. A parameter that can be bool or float is not reasonable.
The documentation should update in parallel with the code. Having the documentation as part of the GitHub repository makes this easier.
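A few of the standards above (early exits instead of nested if/else, descriptive names, one operation per line, and a unit test for each new function) can be illustrated with a minimal sketch. The function mean_of_positives and its test are hypothetical and are not taken from the coppafisher source.

```python
def mean_of_positives(values: list[float]) -> float:
    """
    Compute the mean of the strictly positive entries of values.

    Args:
        values (list of float): the numbers to average.

    Returns:
        (float): the mean of all entries greater than zero.

    Raises:
        ValueError: if values contains no positive entry.
    """
    total = 0.0
    count = 0
    for value in values:
        # Skip unwanted entries early instead of nesting the main logic in an if/else.
        if value <= 0:
            continue
        total += value
        count += 1
    # Fail fast so the final return stays at the top indentation level.
    if count == 0:
        raise ValueError("values contains no positive entries")
    return total / count


def test_mean_of_positives() -> None:
    # The new function gets its own unit test.
    assert mean_of_positives([1.0, -2.0, 3.0]) == 2.0
```

The early continue and raise keep the main path unindented, which tends to make the control flow easier to read than an equivalent nested if/else.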

Docstrings

While not all docstrings are consistent yet, future docstrings follow the rules below:

The code must be reproducible from the docstring alone.
Use Google's style.
`ndarray` represents a numpy ndarray and `zarray` represents a zarr Array.
`zgroup` represents a zarr Group.
Specify the datatype of an ndarray/zarray when applicable. For example, use `ndarray[float]` to represent any floating point datatype, or `ndarray[uint16]` for a uint16.
Specify the shape of an ndarray/zarray in brackets when applicable. For example, `(n_tiles x n_rounds x n_channels_use x 3) ndarray[int32]`
n_rounds refers to the number of rounds, including the sequencing rounds and the anchor round, so it is equal to n_seq_rounds + 1. We label the sequencing rounds 0, 1, 2, 3, ... and the anchor round is given the next unused integer. In contrast, n_rounds_use refers to len(use_rounds), which is the total number of sequencing rounds.
Channels are slightly different because use_channels in the notebook can contain any non-negative integer channel indices. These represent the sequencing channels. For example, use_channels = 0, 5, ..., 27. So, n_channels refers to size max(use_channels) + 1, i.e. the smallest size that can be indexed by use_channels, whereas n_channels_use means len(use_channels), such that index 0 represents use_channels[0], etc. Note that neither of these definitions includes the dapi channel or the anchor channel¹, which can be found at nb.basic_info.dapi_channel and nb.basic_info.anchor_channel respectively.
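The size conventions for rounds and channels can be checked with a small sketch. The helper describe_sizes and the example values are hypothetical. In particular, the channel list [0, 5, 9, 27] is made up for illustration, and the example assumes every sequencing round is used.

```python
def describe_sizes(
    n_seq_rounds: int, use_rounds: list[int], use_channels: list[int]
) -> dict[str, int]:
    """Compute the sizes named in the docstring rules from hypothetical inputs."""
    return {
        # All sequencing rounds plus the anchor round.
        "n_rounds": n_seq_rounds + 1,
        # Sequencing rounds are labelled 0, 1, 2, ..., so the anchor round is next.
        "anchor_round": n_seq_rounds,
        "n_rounds_use": len(use_rounds),
        # Smallest size that can be indexed by every entry of use_channels.
        "n_channels": max(use_channels) + 1,
        "n_channels_use": len(use_channels),
    }


sizes = describe_sizes(
    n_seq_rounds=7,
    use_rounds=[0, 1, 2, 3, 4, 5, 6],
    use_channels=[0, 5, 9, 27],
)
print(sizes)
# prints {'n_rounds': 8, 'anchor_round': 7, 'n_rounds_use': 7, 'n_channels': 28, 'n_channels_use': 4}
```

With these values, a dimension of size n_channels is indexed directly by channel number (for example index 27), while a dimension of size n_channels_use is indexed by position, so index 3 corresponds to use_channels[3], which is channel 27.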

Below is a docstring example that demonstrates most of the rules.

from typing import Tuple

import numpy as np
import torch
import zarr


def large_function(
    arr_0: np.ndarray[np.float32],
    arr_1: torch.Tensor,
    arr_2: zarr.Array,
    number: float | None = None,
) -> Tuple[zarr.Group, float, int]:
    """
    A description of exactly what the function does. This docstring must contain
    enough detail to make the exact function again, without looking at any code.

    Args:
        arr_0 (`(n_pixels x 3) ndarray[float32]`): a description of arr_0.
        arr_1 (`(n_pixels x n_rounds x n_channels) tensor[uint32]`): a description
            of arr_1.
        arr_2 (`(n_pixels x n_rounds x (n_channels + 1)) zarray[uint16]`): a
            description of arr_2.
        number (float or None, optional): a description of number. Default: None.

    Returns:
        A tuple containing:
            - (zgroup): zgroup_0. A zarr Group containing arrays named zarr_0,
                zarr_1, and zarr_2.
            - (float): variable_0. A description of variable_0.
            - (int): variable_1. A description of variable_1.
    """
    ...

¹ The anchor channel can be a sequencing channel, but this does not have to be the case.