readfish console scripts

readfish is a set of scripts that are accessed as sub-commands. These scripts are primarily aimed at implementing Adaptive Sampling (also known as Read Until or selective sequencing) for Oxford Nanopore Sequencers.

Commands

readfish targets

Run a targeted sequencing experiment on a device.

When given a TOML experiment configuration, readfish targets will:
  1. Initialise a connection to the sequencing device

  2. Load the experiment configuration

  3. Initialise a connection to the Read Until API

  4. Initialise a connection to your chosen basecaller

  5. Initialise the read aligner

Then, during sequencing, the start of each read is sampled. These chunks of raw data are processed by the basecaller to produce FASTA, which is then aligned against the chosen reference genome. The alignment result is used, along with the targets provided in the TOML file, to make a decision on each read.

In the new experimental Duplex mode, it is possible to override a decision for a read based on the action taken for a previous read. This is done by passing --chemistry and setting it to either duplex or duplex_simple. duplex_simple accepts a read if the action for the previous read on the same channel was stop_receiving; duplex checks that the previous read's alignment was on the same contig and the opposite strand. The default chemistry is simplex.
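The override rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not readfish's implementation: the PreviousRead structure, the accept_by_chemistry function and the "+"/"-" strand encoding are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreviousRead:
    """What was recorded for the last read on a channel (hypothetical structure)."""

    action: str                   # e.g. "stop_receiving" or "unblock"
    contig: Optional[str] = None  # contig the previous read aligned to, if any
    strand: Optional[str] = None  # "+" or "-" (assumed encoding), if aligned


def accept_by_chemistry(chemistry, previous, contig=None, strand=None):
    """Return True if the current read would be accepted because of the previous read."""
    if chemistry == "simplex" or previous is None:
        # Default chemistry, or nothing known about the channel: no override.
        return False
    if chemistry == "duplex_simple":
        # Accept when the previous read on this channel was stop_receiving.
        return previous.action == "stop_receiving"
    if chemistry == "duplex":
        # Accept when the previous alignment was on the same contig, opposite strand.
        return (
            previous.contig is not None
            and previous.contig == contig
            and previous.strand is not None
            and strand is not None
            and previous.strand != strand
        )
    raise ValueError(f"unknown chemistry: {chemistry!r}")
```

The sketch implements the duplex check exactly as worded above; whether readfish applies further conditions in duplex mode is not stated here.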

Running this should result in a very short (<1kb, ideally 400-600 bases) unblock peak at the start of a read length histogram and longer sequenced reads.

Example run command:

readfish targets --device X3 \
        --experiment-name "test" \
        --toml my_exp.toml \
        --log-file rf.log \
        --debug-log chunks.tsv

Example experimental duplex command:

readfish targets --device X3 \
       --experiment-name "test" \
       --toml my_exp.toml \
       --log-file rf.log \
       --debug-log chunks.tsv \
       --chemistry duplex

If --debug-log is passed, each line of the resulting file (chunks.tsv in the examples above) gives detailed information about a batch of read signal processed in one iteration.

The format of each line is as follows: loop_counter number_reads read_id channel read_number seq_length seen_count decision action condition barcode previous_action timestamp action_overridden

Parameter          Type         Description
-----------------  -----------  -----------
loop_counter       int          The iteration number for the loop.
number_reads       int          The number of reads processed in this iteration.
read_id            str          UUID4 string representing the read's unique read_id.
channel            int          Channel number the read is being sequenced on.
read_number        int          The number this read is in the sequencing run as a whole.
seq_length         int          Length of the base-called signal chunk (includes any previous chunks).
seen_count         int          Number of times this read has been seen in previous iterations.
decision           str          The name of the Decision variant taken for this read, see Regions sub-tables for values.
action             str          The name of the Action variant sent to the sequencer for this read, see Regions sub-tables for values.
condition          str          Name of the Condition that the read has been addressed with.
barcode            str or None  The name of the Barcode for this read if present, otherwise None.
previous_action    str or None  Name of the last Action taken for a read sequenced by this channel, or None if this is the first read on a channel.
action_overridden  bool         Indicates if the action has been overridden. Currently, actions are always overridden to be stop_receiving.
timestamp          float        Current time as given by the time module in seconds.

An action is overridden when the readfish run is a dry run and the action is unblock, or when the read is the first read that readfish has seen on a channel. This prevents trying to unblock reads of unknown length.
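For a quick look at a debug log, the column names from the format line can be used to load each line into a dictionary. The sketch below assumes the file is tab-separated (as the .tsv name suggests) and skips a header row if one is present; whether readfish writes a header is not stated here, so that check is a precaution. The function names are made up for this example.

```python
import csv
from collections import Counter

# Column order as given in the format line of this section.
COLUMNS = [
    "loop_counter", "number_reads", "read_id", "channel", "read_number",
    "seq_length", "seen_count", "decision", "action", "condition",
    "barcode", "previous_action", "timestamp", "action_overridden",
]


def read_debug_log(handle):
    """Yield one dict per data line of a tab-separated debug log."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and a header row, if the file has one (assumption).
        if not row or row[0] == "loop_counter":
            continue
        yield dict(zip(COLUMNS, row))


def count_actions(handle):
    """Count how often each action name appears in the log."""
    return Counter(record["action"] for record in read_debug_log(handle))
```

All values are kept as strings; convert fields such as seq_length or timestamp as needed. A typical call would open the file passed to --debug-log and hand the file object to count_actions.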

readfish unblock-all

An unblock all script. This will attempt to unblock all reads on all channels. This should result in a read length histogram that has very short peaks (<1kb), as these are the smallest chunks that can be acquired. If you are not seeing these peaks, the break_reads_after_seconds parameter in the MinKNOW configuration file may need to be reduced to 0.5-0.8 seconds.
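One simple way to check for the expected short peak is to compute the fraction of reads below 1 kb. The helper below is a hypothetical sketch: it takes read lengths that you have already collected by whatever means you prefer, and the 1000-base default threshold simply mirrors the <1kb figure above.

```python
def fraction_short(read_lengths, threshold=1000):
    """Return the fraction of reads shorter than `threshold` bases."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("no read lengths supplied")
    return sum(1 for n in lengths if n < threshold) / len(lengths)
```

In an unblock-all run, a value close to 1.0 would be consistent with reads being ejected promptly.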

This script is primarily for testing a computer’s response to processing data from the Read Until API without any other overheads (basecalling or mapping). It is only recommended to use this script when running a simulated (playback) sequencing experiment.

The unblock all command only requires the target device and a small description of the experiment, for example:

readfish unblock-all --device X3 --experiment-name "test unblock all"

readfish validate

Validate experiment configuration TOML files.

This script is used to check that an experiment configuration file can be loaded. By default, this will attempt to load the Caller and Aligner plugins as specified in the TOML file. The --no-check-plugins flag can be used to skip this step. The --no-describe flag can be used to skip printing a description of the configuration to the terminal. The description step will raise an error if there are contigs listed as targets which are not found in the reference.

These basic checks are for compatibility and do not indicate that a configuration/plugins will work efficiently with readfish.

Any errors are passed back through and printed to terminal. If you require help understanding or resolving an error you can check the TOML documentation pages or open an issue.

Example run command:

readfish validate my_exp.toml

readfish stats

This module is the stats entry point into the Readfish package, focussed specifically on calculating summary information about a Readfish experiment. It parses base-called FASTQ files produced after a Readfish experiment and, by using the same aligner setup as the experiment, aligns reads, aggregates stats and prints out a summary.

This module can be used to assess the results of a Readfish experiment, where the user wants to get insights into the experiment’s outcome within the context of readfish. It accepts several command-line arguments to customize its behaviour, including specifying the TOML file used in the Readfish experiment, the directory containing the FASTQ files, and options to control the output format.

The alignment and aggregation of the FASTQ happens in a separate package, readfish_summarise. https://github.com/Adoni5/ReadfishSummarise

FASTQ files are demultiplexed into separate files for each combination of the Condition (Region or Barcode) name and the decision that readfish makes about the read (stop_receiving, unblock or proceed). For example:

  1. barcode01_stop_receiving.fastq.gz

  2. control_region_unblock.fastq.gz
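Because condition names can themselves contain underscores (as in control_region), splitting these file names on the first underscore is not reliable. The sketch below recovers the condition and decision by matching the known decision names at the end of the name. It assumes the <condition>_<decision>.fastq.gz pattern shown in the examples above, and the function name is hypothetical.

```python
# Decision names as listed in this section.
DECISIONS = ("stop_receiving", "unblock", "proceed")


def split_demultiplexed_name(filename):
    """Split '<condition>_<decision>.fastq.gz' into (condition, decision)."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .fastq.gz file name: {filename!r}")
    stem = filename[: -len(suffix)]
    for decision in DECISIONS:
        if stem.endswith("_" + decision):
            # Strip the decision and the underscore that joins it to the condition.
            return stem[: -len(decision) - 1], decision
    raise ValueError(f"no known decision in file name: {filename!r}")
```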

Returns an exit code: 0 if the process completes successfully without errors, otherwise the number of errors.

Command-Line Arguments

The module accepts the following command-line arguments:

  • --toml (required):

    Path to the TOML file used in the Readfish experiment.

  • --fastq-directory (required):

    Path to the directory containing the FASTQ files produced by the Readfish experiment.

  • --no-paf-out (optional):

    Disables the output of the alignments in PAF format. Enabled by default.

  • --no-demultiplex (optional):

    If specified, the module won’t demultiplex and write out FASTQ. Demultiplexing is enabled by default.

  • --prom (optional):

    If specified, indicates that the target platform was a PromethION.

Usage

An example of how to use this module without outputting demultiplexed FASTQ and PAF alignments is shown below:

readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --no-paf-out --no-demultiplex

To run and demultiplex, but not output PAF alignments:

readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --no-paf-out

To run and output PAF alignments and demultiplexed FASTQ:

readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/

To run and output PAF alignments and demultiplexed FASTQ, and output an HTML summary file at summary_adaptive.html:

readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --html summary_adaptive
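When readfish stats is driven from a script, the command line can be assembled from keyword options. The sketch below uses only the flags shown in this section (--toml, --fastq-directory, --no-paf-out, --no-demultiplex, --prom and --html). The build_stats_command function and its parameter names are hypothetical conveniences, not part of readfish.

```python
def build_stats_command(toml, fastq_directory, paf_out=True,
                        demultiplex=True, prom=False, html=None):
    """Return the argument list for a `readfish stats` invocation."""
    cmd = [
        "readfish", "stats",
        "--toml", str(toml),
        "--fastq-directory", str(fastq_directory),
    ]
    if not paf_out:
        cmd.append("--no-paf-out")      # PAF output is on by default
    if not demultiplex:
        cmd.append("--no-demultiplex")  # demultiplexing is on by default
    if prom:
        cmd.append("--prom")            # the run was on a PromethION
    if html is not None:
        cmd += ["--html", str(html)]    # base name of the HTML summary
    return cmd
```

The resulting list can be passed to subprocess.run, and the returncode of the completed process inspected: as described above, 0 indicates success and any other value is the number of errors.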