Usage
Input data
Coppafish requires raw, uint16
microscope images, metadata, and a configuration file. We currently only support raw
data in ND2, JOBs, or numpy format. If your data is not already in one of these formats, we recommend configuring your
data into numpy format (see below).
Numpy
Each round is separated between directories. Label sequencing round directories 0
, 1
, etc. We recommend using
dask, this is installed in your coppafish environment by default. The code to save data in the
right format would look something like
import os
import dask.array
raw_path = "/path/to/raw/data"
dask_chunks = (1, n_total_channels, n_y, n_x, n_z)
for r in range(n_seq_rounds):
save_path = os.path.join(raw_path, f"{r}")
image_dask = dask.array.from_array(seq_image_tiles[r], chunks=dask_chunks)
dask.array.to_npy_stack(save_path, image_dask)
# Anchor round
save_path = os.path.join(raw_path, "anchor")
image_dask = dask.array.from_array(anchor_image, chunks=dask_chunks)
dask.array.to_npy_stack(save_path, image_dask)
# Presequence round (optional)
save_path = os.path.join(raw_path, "presequence")
image_dask = dask.array.from_array(preseq_image, chunks=dask_chunks)
dask.array.to_npy_stack(save_path, image_dask)
where n_...
variables represent counts (integers), n_total_channels
can include other channels other than the
sequencing channel (e.g. a DAPI channel and anchor channel). seq_image_tiles
is a numpy array of shape
(n_seq_rounds, n_tiles, n_total_channels, n_y, n_x, n_z)
, while anchor_image
and preseq_image
are numpy arrays of
shape (n_tiles, n_total_channels, n_y, n_x, n_z)
. Note that n_y
must be equal to n_x
.
Metadata
The metadata can be saved using python:
import json
metadata = {
"n_tiles": n_tiles,
"n_rounds": n_rounds,
"n_channels": n_total_channels,
"tile_sz": n_y, # or n_x
"pixel_size_xy": 0.26,
"pixel_size_z": 0.9,
"tile_centre": [n_y / 2, n_x / 2, n_z / 2],
"tilepos_yx": tile_origins_yx,
"tilepos_yx_nd2": list(reversed(tile_origins_yx)),
"channel_camera": [1] * n_total_channels,
"channel_laser": [1] * n_total_channels,
"xy_pos": tile_xy_pos,
"nz": n_z,
}
file_path = os.path.join(raw_path, "metadata.json")
with open(file_path, "w") as f:
json.dump(metadata, f, indent=4)
Code book
A code book is a .txt
file that tells coppafish the expected gene codes for each gene. An example of a four
gene code book is
gene_0 0123012
gene_1 1230123
gene_2 2301230
gene_3 3012301
the names (gene_0
, gene_1
, ...) can be changed. Do not assign any genes a constant gene code, e.g. 0000000
. To
learn how the codes can be generated, see advanced usage. For details on how the codes are
generated, see reed_solomon_codes
in
coppafish/utils/base.py
. See the
Wikipedia article for how gene codes are best selected.
Configuration
There are configuration variables used throughout the coppafish pipeline. Most of these have reasonable default values,
but some must be set by the user and you may wish to tweak other values for better performance. Save the config file as
something like config.ini
. The config file should contain, at the minimum:
[file_names]
input_dir = path/to/input/data
output_dir = path/to/output/directory
tile_dir = path/to/tile/output
round = 0, 1, 2, 3, 4, 5, 6 ; Go up to the number of sequencing rounds used
anchor = anchor
pre_seq = presequence
raw_extension = .npy
raw_metadata = path/to/metadata.json
[basic_info]
is_3d = True
dye_names = dye_0, dye_1, dye_2, dye_3
use_rounds = 0, 1, 2, 3, 4, 5, 6
use_z = 0, 1, 2, 3, 4
use_tiles = 0, 1
anchor_round = 7
use_channels = 1, 2, 3, 4
anchor_channel = 1
dapi_channel = 0
[stitch]
expected_overlap = 0.15
where the dapi_channel
is the index in the numpy arrays that the dapi channel is stored at. use_channels
includes
the anchor_channel
in this case because the anchor channel can also be used as a sequencing channel in the sequencing
rounds. dye_names
does not have to be set explicitly if n_seq_channels == n_dyes
. expected_overlap
is the
fraction of the tile in x (y) dimension that is overlapping between adjacent tiles, typically 0.1-0.15
. More details
about every config variable can be found at
coppafish/setup/settings.default.ini
in the source code. use_z
contains all selected z planes, they should all
be adjacent planes. It is recommended to use microscopic images where the middle z plane is roughly the brightest for
best performance; this can be configured by changing the selected z planes in use_z
. The z direction can be treated
differently to the y and x directions because typically a z pixel corresponds to a larger, real distance.
Running
Coppafish can be run with a config file. In the terminal
python -m coppafish /path/to/config.ini
Or using a python script
from coppafish import run_pipeline
run_pipeline("/path/to/config.ini")