Skip to content

Basic Usage

Input data

Coppafisher requires raw, uint16 microscope images, metadata, and a configuration file. We currently only support raw data in ND2, JOBs, numpy, or tif format. If your data is not already in one of these formats, we recommend configuring your data into numpy format. The tif file format is also explained below.

There must be an anchor round. There must be an anchor channel (this can be a sequencing channel). There must be a dapi channel in every sequencing round and the anchor round. The tiles must have at least four z planes. Use a number of z planes that is a multiple of two.

Tile Indexing Conventions

Input tiles can be indexed differently to coppafisher. You can use this diagnostic.

Numpy

Each round is separated between directories. Label sequencing round directories 0, 1, etc. We recommend using dask, this is installed in your coppafisher environment by default. The code to save input data:

import os
import dask.array

raw_path = "/path/to/raw/data"
dask_chunks = (1, n_total_channels, n_y, n_x, n_z)
for r in range(n_seq_rounds):
    save_path = os.path.join(raw_path, f"{r}")
    image_dask = dask.array.from_array(seq_image_tiles[r], chunks=dask_chunks)
    dask.array.to_npy_stack(save_path, image_dask)

# Anchor round
save_path = os.path.join(raw_path, "anchor")
image_dask = dask.array.from_array(anchor_image, chunks=dask_chunks)
dask.array.to_npy_stack(save_path, image_dask)

where n_... variables represent counts (integers), seq_image_tiles is a numpy array of shape (n_seq_rounds, n_tiles, n_total_channels, n_y, n_x, n_z), while anchor_image is a numpy array of shape (n_tiles, n_total_channels, n_y, n_x, n_z). Note that n_y must equal n_x.

Tif

Every round (anchor included) must be a .tif file located inside of the input_dir. They must have the shape (n_tiles * n_total_channels * n_z, n_y, n_x). The first axis is flattened such that the first n_total_channels are tile 0 and z plane 0 on each channel, then the next n_total_channels are tile 0 and z plane 1 on each channel. Then after n_z z planes the next n_total_channels are tile 1 and z plane 0 on each channel etc...

Metadata

The metadata file required for numpy and tif input formats. It must be saved in the same location as the raw input files. This can be done using Python:

import json

metadata = {
    "n_tiles": n_tiles,
    "n_rounds": n_rounds,
    "n_channels": n_total_channels,
    "tile_sz": n_y, # or n_x
    "pixel_size_xy": pixel_size_xy,
    "pixel_size_z": pixel_size_z,
    "tile_centre": [n_y / 2, n_y / 2, n_z / 2],
    "tilepos_yx": tile_origins_yx,
    "tilepos_yx_nd2": list(reversed(tile_origins_yx)),
    "channel_camera": [1] * n_total_channels,
    "channel_laser": [1] * n_total_channels,
}
file_path = os.path.join(raw_path, "metadata.json")
with open(file_path, "w") as f:
    json.dump(metadata, f, indent=4)

n_tiles must be the total number of tiles inside of the raw inputted files (even if you only plan on selecting a subset of them). Similarly, n_total_channels must be the total number of channels in the inputted raw files.

pixel_size_xy is the size of a pixel along the y/x axes in microns. pixel_size_z is the size of a pixel along the z axis in microns. n_y is the number of pixels along y/x for a single tile. n_z is the number of pixels along z for a single tile. tile_origins_yx is a list of lists which tells coppafisher where each tile is relative to one another. For example, a 2x2 of tiles going around clockwise starting from the top-left would be tile_origins_yx = [[0, 0], [0, 1], [1, 1], [1, 0]].

Code book

A code book is a .txt file that tells coppafisher the gene codes for each gene. Each digit is the dye index for each sequencing round. An example of a four gene code book is

gene_0 0123012
gene_1 1230123
gene_2 2301230
gene_3 3012301

the names (gene_0, gene_1, ...) can be changed. Do not assign any genes a constant gene code like 0000000. To learn how the codes can be generated, see advanced usage. For details on how the codes are best generated, see reed_solomon_codes in the source code. See Wikipedia for algorithmic details on how gene codes are best selected.

Configuration

There are configuration variables used throughout the coppafisher pipeline. Most of these have reasonable default values, but some must be set by the user and you may wish to tweak other values for better performance. Save the config text file, like dataset_name.ini. The config file should contain, at the minimum:

[file_names]
; MUST SPECIFY input_dir, output_dir, tile_dir, code_book.
input_dir =
output_dir =
tile_dir =
code_book =
; This can be .npy, .tif, .nd2 or jobs.
raw_extension = .npy
; The names of the ND2 files (excluding the file extension above).
round = round0, round1, round2, round3, round4, round5, round6
anchor = anchor
; Optional, leave blank if you do not have a fluorescent bead file.
fluorescent_bead_path =

[basic_info]
; The names of the dyes given, must match the number of dyes used in the gene codebook.
dye_names = dye_0, dye_1, dye_2, dye_3
; Optional, leave blank to run on all tiles.
use_tiles =
; Round indices (starting from 0) located in the input files.
use_rounds = 0, 1, 2, 3, 4, 5, 6
; Channel indices (starting from 0) located in the input files.
use_channels = 5, 9, 10, 14, 15, 18, 19, 23, 27
; Optional, leave blank to run on all z planes.
use_z =
; The index of the anchor round.
anchor_round = 7
; The index of the anchor channel.
anchor_channel = 1
; The index of the dapi channel.
dapi_channel = 0

[stitch]
; The percentage overlap between adjacent tiles.
expected_overlap = 0.1

[call_spots]
target_values = 1, 1, 1, 1
d_max = 0, 1, 2, 3

raw_extension is .npy for numpy input, .tif for tif input, .nd2 for nd2 input, and jobs for JOBs input.

tile_dir is the tile directory, where extract images are saved to, it should be empty before running coppafisher. output_dir is where the notebook and PDF diagnostics are saved, it should also be blank before running. More details about every config variable can be found at coppafisher/setup/default.ini in the source code.

target_values and d_max must both have n_seq_channels numbers, one for each channel. See call spots for details on how to set the values. If you are unsure, set target_values to all ones and d_max to the brightest channel in each dye.

Unique anchor raw file indices

If your anchor raw file has unique channel locations compared to the sequencing raw files, set raw_anchor_channel_indices under the file_names section in the config. Go to coppafisher/setup/default.ini and search for raw_anchor_channel_indices for a description and usage.

Running

Coppafisher must be run with a configuration file. In the command line

python3 -m coppafisher /path/to/config.ini

Or programmatically, using a python script

from coppafisher import run_pipeline

run_pipeline("/path/to/config.ini")

which can then be run from the command line

python3 coppafisher_script_name.py

Runtime

For an estimate of your pipeline runtime1, in the Python terminal:

from coppafisher.utils import estimate_runtime

estimate_runtime()

then type in the relevant information when prompted.


  1. All time estimations are made using an Intel i9-13900K @ 5.500GHz, NVIDIA RTX 4070Ti Super (optional), and NVMe local SSD. Raw, ND2 input files were saved on a server with read speed of ~200 MB/s.