Contributing
Algorithm
Coppafisher is built on the principle: An algorithm that performs well does not need to be changed. So, an algorithm is only updated when there is evidence that the current algorithm is underperforming and that the new algorithm performs better.
Installation
We use a protected staging branch called dev for a future release. Changes must be pull requested into dev and pass continuous integration tests. The main branch is pushed into from dev to publish a new release. main is always the latest stable release for users to easily install the software.
While changing code, install coppafisher as usual but keep the downloaded local source code directory.
Local Code Location
Avoid cloning coppafisher inside a subdirectory named coppafisher because this could cause strange errors to occur.
Then install the dev packages:
pip install -r requirements-dev.txt
Also, put coppafisher into editable mode while changing source code:
pip install -e .
Now all local code changes immediately take effect.
Pre-Commit
Pre-commit hooks run automatically on every git commit. They ensure files are consistently formatted and checked, and they run linting rules through ruff. Enable the pre-commit hooks by running:
pre-commit install
You can run pre-commit checks manually as well:
pre-commit run --all-files
Auto-update pre-commits (recommended):
pre-commit autoupdate
If a commit is pushed that fails a pre-commit hook, then the GitHub integration workflow will catch it.
Tests
Tests are run via pytest. Scripts are unit tested by placing the test scripts inside a directory called test within the script's directory. All test script file names should start with test_. The file names must end with the script's relative directory (or directories) and the script file name, separated by underscores. For example, the test script for coppafisher/omp/coefs.py is named test_omp_coefs.py. Check existing tests for examples.
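The naming rule can be sketched in a few lines of Python. This is only an illustration: the helper expected_test_path is hypothetical and not part of coppafisher, and it assumes that the "relative directory" is measured from the coppafisher package root.

```python
from pathlib import PurePosixPath


def expected_test_path(script_path: str, package_root: str = "coppafisher") -> str:
    """Build the test file path for a source script (hypothetical helper)."""
    script = PurePosixPath(script_path)
    # Path of the script relative to the package root, e.g. omp/coefs.py.
    relative = script.relative_to(package_root)
    # Join every relative directory and the script name with underscores.
    name_parts = list(relative.parent.parts) + [relative.stem]
    test_name = "test_" + "_".join(name_parts) + ".py"
    # The test lives in a directory called test next to the script.
    return str(script.parent / "test" / test_name)


print(expected_test_path("coppafisher/omp/coefs.py"))
# prints coppafisher/omp/test/test_omp_coefs.py
```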
Run Tests
Run unit tests (~10s)
pytest
Run integration tests (~90s)
pytest -m integration
Run unit tests requiring a notebook (~12s)
pytest -m notebook
View code coverage by appending --cov=coppafisher --cov-report term to each command.
Run Documentation Locally
mkdocs serve
Then go to http://127.0.0.1:8000/ in a modern browser.
Code Philosophy
We follow basic rules when coding. Anyone can code something that works, but coding it in a scalable, maintainable way is another struggle altogether.
Here are some specific standards to follow:
- Knowledge written down twice is bad code. Don't Repeat Yourself (DRY)!
- If a bug is found, it must be caught automatically if it ever occurs again.
- All code is black formatted.
- Every time a function is modified or created, a new unit test must be created for the function. A pre-existing unit test can be drawn from to build a new unit test, but it should be clear in your mind that you are effectively building a new function.
- Minimise if/else branching as much as possible. Exit if/else nesting as soon as possible through the use of keywords like raise, continue, break and return, whenever feasible.
- Do not over-shorten a variable or function name.
- Variables and functions are not capitalised, classes are.
- In most cases, a line of code should do only one operation.
- Every docstring for a function must be complete so a developer can re-create the function without seeing any of the existing source code.
- Each parameter in a function must have an independent, clear functionality. If two parameters are derivable from one another, you are doing something wrong. This also applies to the function's return variables.
- Minimise the number of data types a parameter can be and use common sense. For example, a parameter that can be int or None is reasonable. A parameter that can be bool or float is not reasonable.
- The documentation should update in parallel with the code. Having the documentation as part of the GitHub repository makes this easier.
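As an illustration of the early-exit rule above, here is a minimal sketch. The function mean_of_positive is a made-up example, not coppafisher code. It uses raise, continue and return to leave each branch as soon as possible instead of nesting if/else blocks.

```python
def mean_of_positive(values: list[float]) -> float:
    """Return the mean of the strictly positive entries of values.

    Returns 0.0 when there are no positive entries.
    """
    if len(values) == 0:
        # Guard clause: reject bad input immediately instead of nesting.
        raise ValueError("values must not be empty")

    total = 0.0
    count = 0
    for value in values:
        if value <= 0:
            # Skip unwanted entries early rather than wrapping the loop body in an else.
            continue
        total += value
        count += 1

    if count == 0:
        return 0.0
    return total / count
```

Each condition is handled once and left behind, so the main logic stays at a single indentation level.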
Docstrings
While not all docstrings are consistent yet, future docstrings follow the rules below:
- The code must be reproducible from the docstring alone.
- Use Google's style.
- `ndarray` represents a numpy ndarray, `zarray` represents a zarr Array, and `zgroup` represents a zarr Group.
- Specify the datatype of an ndarray/zarray when applicable. For example, write `ndarray[float]` to represent any floating point datatype, or `ndarray[uint16]` for a uint16.
- Specify the shape of an ndarray/zarray in brackets when applicable. For example, `(n_tiles x n_rounds x n_channels_use x 3) ndarray[int32]`.
- n_rounds refers to the number of rounds, including the sequencing rounds and the anchor round. So, it is equal to n_seq_rounds + 1. We label all sequencing rounds 0, 1, 2, 3, ... and the anchor round is given the next unused integer. In contrast, n_rounds_use refers to len(use_rounds), which is the total number of sequencing rounds.
- Channels are slightly different because use_channels in the notebook can have channel indices of any non-negative integer value. These represent the sequencing channels. For example, use_channels = 0, 5, ..., 27. So, n_channels refers to size max(use_channels) + 1, i.e. the smallest shape that can be indexed by use_channels. In contrast, n_channels_use means len(use_channels), such that 0 represents use_channels[0] etc. Note that neither of these definitions includes the dapi channel or the anchor channel [1], which can be found at nb.basic_info.dapi_channel and nb.basic_info.anchor_channel respectively.
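The round and channel naming conventions can be made concrete with a small sketch. The values of use_rounds and use_channels below are invented for illustration and are not taken from any real notebook. The helper channel_from_use_index is also hypothetical.

```python
# Hypothetical notebook values, chosen only for illustration.
use_rounds = [0, 1, 2, 3, 4, 5, 6]
use_channels = [0, 5, 9, 27]

n_rounds_use = len(use_rounds)  # 7 sequencing rounds
n_seq_rounds = n_rounds_use  # per the text, len(use_rounds) is the number of sequencing rounds
n_rounds = n_seq_rounds + 1  # 8, sequencing rounds plus the anchor round
anchor_round = n_seq_rounds  # 7, the next unused integer after the sequencing rounds

n_channels = max(use_channels) + 1  # 28, smallest size indexable by use_channels
n_channels_use = len(use_channels)  # 4


def channel_from_use_index(use_index: int) -> int:
    """Map an index along an n_channels_use axis back to the true channel index."""
    return use_channels[use_index]
```

An axis of size n_channels is indexed with the true channel index (for example 27), while an axis of size n_channels_use is indexed with the position inside use_channels (for example 3).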
Below is a docstring example that demonstrates most of the rules.
from typing import Tuple

import numpy as np
import torch
import zarr


def large_function(
    arr_0: np.ndarray[np.float32],
    arr_1: torch.Tensor,
    arr_2: zarr.Array,
    number: float | None = None,
) -> Tuple[zarr.Group, float, int]:
    """
    A description of exactly what the function does. This docstring must contain
    enough detail to make the exact function again, without looking at any code.

    Args:
        arr_0 (`(n_pixels x 3) ndarray[float32]`): a description of arr_0.
        arr_1 (`(n_pixels x n_rounds x n_channels) tensor[uint32]`): a description
            of arr_1.
        arr_2 (`(n_pixels x n_rounds x (n_channels + 1)) zarray[uint16]`): a
            description of arr_2.
        number (float or None, optional): a description of number. Default: None.

    Returns:
        A tuple containing:
            - (zgroup): zgroup_0. A zarr Group containing arrays named zarr_0,
                zarr_1, and zarr_2.
            - (float): variable_0. A description of variable_0.
            - (int): variable_1. A description of variable_1.
    """
    ...
[1] The anchor channel can be a sequencing channel, but this does not have to be the case.