Basic Usage

Input data

Coppafisher requires raw, uint16 microscope images, metadata, and a configuration file. We currently only support raw data in ND2, JOBs, or numpy format. If your data is not already in one of these formats, we recommend configuring your data into numpy format.

There must be an anchor round. There must be an anchor channel (this can be a sequencing channel). There must be a dapi channel in every sequencing round and the anchor round. The tiles must have at least four z planes. Use a number of z planes that is a multiple of two.

Tile Indexing Conventions

Input tiles can be indexed differently to coppafisher. You can use this diagnostic.

Numpy

Each round is separated between directories. Label sequencing round directories 0, 1, etc. We recommend using dask, this is installed in your coppafisher environment by default. The code to save data in the right format would look something like

import os
import dask.array

raw_path = "/path/to/raw/data"
dask_chunks = (1, n_total_channels, n_y, n_x, n_z)
for r in range(n_seq_rounds):
    save_path = os.path.join(raw_path, f"{r}")
    image_dask = dask.array.from_array(seq_image_tiles[r], chunks=dask_chunks)
    dask.array.to_npy_stack(save_path, image_dask)

# Anchor round
save_path = os.path.join(raw_path, "anchor")
image_dask = dask.array.from_array(anchor_image, chunks=dask_chunks)
dask.array.to_npy_stack(save_path, image_dask)

where n_... variables represent counts (integers), n_total_channels can include other channels other than the sequencing channel (e.g. a DAPI channel and anchor channel). seq_image_tiles is a numpy array of shape (n_seq_rounds, n_tiles, n_total_channels, n_y, n_x, n_z), while anchor_image is a numpy array of shape (n_tiles, n_total_channels, n_y, n_x, n_z). Note that n_y must equal n_x.

Metadata

The experimental metadata must be saved in the same location as the raw input files. This can be done using Python:

import json

metadata = {
    "n_tiles": n_tiles,
    "n_rounds": n_rounds,
    "n_channels": n_total_channels,
    "tile_sz": n_y, # or n_x
    "pixel_size_xy": 0.26,
    "pixel_size_z": 0.9,
    "tile_centre": [n_y / 2, n_x / 2, n_z / 2],
    "tilepos_yx": tile_origins_yx,
    "tilepos_yx_nd2": list(reversed(tile_origins_yx)),
    "channel_camera": [1] * n_total_channels,
    "channel_laser": [1] * n_total_channels,
    "xy_pos": tile_xy_pos,
    "nz": n_z,
}
file_path = os.path.join(raw_path, "metadata.json")
with open(file_path, "w") as f:
    json.dump(metadata, f, indent=4)

Code book

A code book is a .txt file that tells coppafisher the gene codes for each gene. Each digit is the dye index for each sequencing round. An example of a four gene code book is

gene_0 0123012
gene_1 1230123
gene_2 2301230
gene_3 3012301

the names (gene_0, gene_1, ...) can be changed. Do not assign any genes a constant gene code like 0000000. To learn how the codes can be generated, see advanced usage. For details on how the codes are best generated, see reed_solomon_codes in the source code. See Wikipedia for algorithmic details on how gene codes are best selected.

Configuration

There are configuration variables used throughout the coppafisher pipeline. Most of these have reasonable default values, but some must be set by the user and you may wish to tweak other values for better performance. Save the config text file, like dataset_name.ini. The config file should contain, at the minimum:

[file_names]
input_dir = /path/to/input/data
output_dir = /path/to/output/directory
tile_dir = /path/to/tile/directory
; Go up to the number of sequencing rounds used.
round = round0, round1, round2, round3, round4, round5, round6
; 'anchor' given here since the anchor file is called anchor.npy.
anchor = anchor
raw_extension = .npy
raw_metadata = /path/to/metadata.json

[basic_info]
dye_names = dye_0, dye_1, dye_2, dye_3
use_rounds = 0, 1, 2, 3, 4, 5, 6
use_z = 0, 1, 2, 3, 4
use_tiles = 0, 1
anchor_round = 7
use_channels = 1, 2, 3, 4
anchor_channel = 1
dapi_channel = 0

[stitch]
expected_overlap = 0.15

[call_spots]
target_values = 1, 1, 1, 1
d_max = 0, 1, 2, 3

where the dapi_channel is the index in the numpy arrays that the dapi channel is stored at. use_channels includes the anchor_channel in this case because the anchor channel can also be used as a sequencing channel in the sequencing rounds. dye_names does not have to be set explicitly if n_seq_channels == n_dyes. expected_overlap is the fraction of the tile in x (y) dimension that is overlapping between adjacent tiles, typically 0.1-0.15. use_z contains all selected z planes, they should all be adjacent planes. It is recommended to use microscopic images where the middle z plane is roughly the brightest for best performance; this can be configured by changing the selected z planes in use_z. The z direction can be treated differently to the y and x directions because typically a z pixel corresponds to a larger, real distance. tile_dir is the tile directory, where extract images are saved to. output_dir is where the notebook and PDF diagnostics are saved. More details about every config variable can be found at coppafisher/setup/default.ini in the source code.

target_values and d_max must both have n_seq_channels numbers, one for each channel. See call spots for details on how to set the values.

Unique anchor raw file indices

If your anchor raw file has unique channel locations compared to the sequencing raw files, set raw_anchor_channel_indices under the file_names section in the config. Go to coppafisher/setup/default.ini and search for raw_anchor_channel_indices for a description and usage.

Running

Coppafisher must be run with a configuration file. In the command line

python3 -m coppafisher /path/to/config.ini

Or programmatically, using a python script

from coppafisher import run_pipeline

run_pipeline("/path/to/config.ini")

which can then be run from the command line

python3 coppafisher_script_name.py

Runtime

For an estimate of your pipeline runtime¹, in the Python terminal:

from coppafisher.utils import estimate_runtime

estimate_runtime()

then type in the relevant information when prompted.

All time estimations are made using an Intel i9-13900K @ 5.500GHz, NVIDIA RTX 4070Ti Super (optional), and NVMe local SSD. Raw, ND2 input files were saved on a server with read speed of ~200 MB/s. ↩