readfish console scripts¶
readfish is a set of scripts that are accessed as sub-commands.
These scripts are primarily aimed at implementing Adaptive Sampling (also known as Read Until or selective sequencing) for Oxford Nanopore Sequencers.
Commands¶
readfish targets¶
Run a targeted sequencing experiment on a device.
When given a TOML experiment configuration, readfish targets will:
- Initialise a connection to the sequencing device
- Load the experiment configuration
- Initialise a connection to the Read Until API
- Initialise a connection to your chosen basecaller
- Initialise the read aligner
Then, during sequencing the start of each read is sampled. These chunks of raw data are processed by the basecaller to produce FASTA, which is then aligned against the chosen reference genome. The result of the alignment is used, along with the targets provided in the TOML file, to make a decision on each read.
In the new experimental Duplex mode, it is possible to override the decision for a read based on the action taken for the previous read. This is done by passing --chemistry and setting it to either duplex or duplex_simple. duplex_simple accepts a read if the action for the previous read on the same channel was stop_receiving. duplex checks that the previous read's alignment was on the same contig and the opposite strand. The default chemistry is simplex.
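The override rules described above can be sketched in Python as follows. This is an illustrative sketch, not readfish's implementation: the `PreviousRead` record, the `should_override_to_accept` function, and the `+`/`-` strand encoding are hypothetical names chosen for the example, and the code encodes only the two conditions stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreviousRead:
    """Hypothetical record of the last read seen on a channel."""

    action: str              # e.g. "stop_receiving" or "unblock"
    contig: Optional[str]    # contig the previous read aligned to, if any
    strand: Optional[str]    # "+" or "-" (assumed encoding), if aligned


def should_override_to_accept(chemistry, previous, contig=None, strand=None):
    """Return True if the current read would be accepted because of the previous read."""
    # simplex never overrides; the first read on a channel has no predecessor.
    if chemistry == "simplex" or previous is None:
        return False
    if chemistry == "duplex_simple":
        # Accept if the previous read on this channel was stop_receiving.
        return previous.action == "stop_receiving"
    if chemistry == "duplex":
        # Without both alignments there is nothing to compare.
        if None in (contig, strand, previous.contig, previous.strand):
            return False
        # Same contig, opposite strand.
        return previous.contig == contig and previous.strand != strand
    raise ValueError(f"unknown chemistry: {chemistry!r}")
```

Keeping the rule in a single pure function makes it straightforward to check each chemistry against hand-written cases before trying it on a sequencer.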
Running this should result in a very short (<1kb, ideally 400-600 bases) unblock peak at the start of a read length histogram and longer sequenced reads.
Example run command:
readfish targets --device X3 \
--experiment-name "test" \
--toml my_exp.toml \
--log-file rf.log \
--debug-log chunks.tsv
Example experimental duplex command:
readfish targets --device X3 \
--experiment-name "test" \
--toml my_exp.toml \
--log-file rf.log \
--debug-log chunks.tsv \
--chemistry duplex
When the --debug-log argument is passed, each line of the resulting file (chunks.tsv in the examples above) holds detailed information about a batch of read signal that has been processed in an iteration.
The format of each line is as follows: loop_counter number_reads read_id channel read_number seq_length seen_count decision action condition barcode previous_action timestamp action_overridden
Parameter | Type | Description |
---|---|---|
loop_counter | int | The iteration number for the loop. |
number_reads | int | The number of reads processed in this iteration. |
read_id | str | UUID4 string representing the read's unique read_id. |
channel | int | Channel number the read is being sequenced on. |
read_number | int | The number this read is in the sequencing run as a whole. |
seq_length | int | Length of the base-called signal chunk (includes any previous chunks). |
seen_count | int | Number of times this read has been seen in previous iterations. |
decision | str | The name of the Decision variant taken for this read, see Regions sub-tables for values. |
action | str | The name of the Action variant sent to the sequencer for this read, see Regions sub-tables for values. |
condition | str | Name of the Condition that the read has been addressed with. |
barcode | str or None | The name of the Barcode for this read if present, otherwise None. |
previous_action | str or None | Name of the last Action taken for a read sequenced by this channel, or None if this is the first read on a channel. |
action_overridden | bool | Indicates if the action has been overridden. Currently, actions are always overridden to be stop_receiving. |
timestamp | float | Current time as given by the time module in seconds. |
An action is overridden when the readfish run is a dry run and the action is unblock, or when the read is the first read that readfish has seen for a channel. This prevents readfish from trying to unblock reads of unknown length.
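To explore a debug log, the records can be loaded and tallied with a few lines of Python. The sketch below assumes the file is tab-separated (as the .tsv name suggests) and that columns appear in the order given in the format line above. The `read_debug_log` and `count_actions` helpers are hypothetical names, and whether the file starts with a header row is not stated here, so the code skips one if it is present.

```python
import csv
from collections import Counter

# Column order as listed in the format line above (assumed to match the file).
COLUMNS = [
    "loop_counter", "number_reads", "read_id", "channel", "read_number",
    "seq_length", "seen_count", "decision", "action", "condition",
    "barcode", "previous_action", "timestamp", "action_overridden",
]


def read_debug_log(lines):
    """Yield one dict per record from an iterable of tab-separated lines."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and a header row, if the file has one.
        if not row or row[0] == "loop_counter":
            continue
        # Values are kept as strings; convert the fields you need.
        yield dict(zip(COLUMNS, row))


def count_actions(lines):
    """Count how many records were sent each action."""
    return Counter(record["action"] for record in read_debug_log(lines))
```

For example, `with open("chunks.tsv", newline="") as fh: print(count_actions(fh))` would print the number of records per action.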
readfish unblock-all¶
An unblock all script.
This will attempt to unblock all reads on all channels.
This should result in a read length histogram that has very short peaks (<1kb) as these are the smallest chunks that we can acquire.
If you are not seeing these peaks, the break_reads_after_seconds parameter in the MinKNOW configuration file may need to be reduced to 0.5-0.8.
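One rough way to check for the short peak is to compute the fraction of read lengths that fall below 1 kb. This is a minimal sketch: the `fraction_below` helper and its 1000-base default are assumptions for illustration, and the read lengths would come from your own sequencing output.

```python
def fraction_below(read_lengths, threshold=1000):
    """Return the fraction of reads shorter than `threshold` bases."""
    lengths = list(read_lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < threshold) / len(lengths)
```

During an unblock-all playback run, a value close to 1.0 would be consistent with the behaviour described above.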
This script is primarily for testing a computer’s response to processing data from the Read Until API without any other overheads (basecalling or mapping). It is only recommended to use this script when running a simulated (playback) sequencing experiment.
The unblock all command only requires the target device and a small description of the experiment, for example:
readfish unblock-all --device X3 --experiment-name "test unblock all"
readfish validate¶
Validate experiment configuration TOML files.
This script is used to check that an experiment configuration file can be loaded.
By default, this will attempt to load the Caller and Aligner plugins as specified in the TOML file. The --no-check-plugins flag can be used to skip this step.
The --no-describe flag can be used to skip printing a description of the configuration to the terminal. The description step will raise an error if there are contigs listed as targets which are not found in the reference.
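The contig check mentioned above amounts to a set difference between the target contigs and the reference contigs. The sketch below illustrates that idea only; `missing_contigs` is a hypothetical helper, not part of readfish, and both inputs are assumed to be plain lists of contig names.

```python
def missing_contigs(target_contigs, reference_contigs):
    """Return target contig names that are absent from the reference, sorted."""
    return sorted(set(target_contigs) - set(reference_contigs))
```

An empty result means every target contig is present in the reference.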
These basic checks are for compatibility and do not indicate that a configuration/plugins will work efficiently with readfish.
Any errors are passed back through and printed to terminal. If you require help understanding or resolving an error you can check the TOML documentation pages or open an issue.
Example run command:
readfish validate my_exp.toml
readfish stats¶
This module is the stats entry point into the Readfish package, focussed specifically on calculating summary information about a Readfish experiment. It parses base-called FASTQ files produced after a Readfish experiment and, by using the same aligner setup as the experiment, aligns reads, aggregates stats and prints out a summary.
This module can be used to assess the results of a Readfish experiment, where the user wants to get insights into the experiment’s outcome within the context of readfish. It accepts several command-line arguments to customize its behaviour, including specifying the TOML file used in the Readfish experiment, the directory containing the FASTQ files, and options to control the output format.
The alignment and aggregation of the FASTQ happen in a separate package, readfish_summarise: https://github.com/Adoni5/ReadfishSummarise
FASTQ reads are demultiplexed into separate files for each combination of the Condition (Region or Barcode) name and the decision that readfish makes about the read (stop_receiving, unblock or proceed). For example:
barcode01_stop_receiving.fastq.gz
control_region_unblock.fastq.gz
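The two example names suggest a `<condition>_<decision>.fastq.gz` pattern. The sketch below reproduces that pattern for scripting purposes. The pattern is inferred from the examples rather than taken from the readfish source, and `demultiplexed_name` is a hypothetical helper.

```python
# Decisions named in the text above.
DECISIONS = {"stop_receiving", "unblock", "proceed"}


def demultiplexed_name(condition, decision):
    """Build the expected FASTQ file name for a condition/decision pair."""
    if decision not in DECISIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    return f"{condition}_{decision}.fastq.gz"
```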
Returns an exit code: 0 if the process completes successfully without errors, otherwise the number of errors.
Command-Line Arguments¶
The module accepts the following command-line arguments:
- toml (required): Path to the TOML file used in the Readfish experiment.
- fastq_dir (required): Path to the directory containing the FASTQ files produced by the Readfish experiment.
- --no-paf-out (optional): Disables the output of the alignments in PAF format. Enabled by default.
- --no-demultiplex (optional): If specified, the module won’t demultiplex and write out FASTQ. Demultiplexing is enabled by default.
- --prom (optional): If specified, indicates that the target platform was a PromethION.
Usage¶
An example of how to use this module without outputting demultiplexed FASTQ and PAF alignments is shown below:
readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --no-paf-out --no-demultiplex
To run and demultiplex, but not output PAF alignments:
readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --no-paf-out
To run and output PAF alignments and demultiplexed FASTQ:
readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/
To run and output PAF alignments and demultiplexed FASTQ, and output an HTML summary file at summary_adaptive.html:
readfish stats --toml tests/static/stats_test/yeast_summary_test.toml --fastq-directory tests/static/stats_test/ --html summary_adaptive
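When scripting several summaries, the command lines above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the usage examples and argument list. `build_stats_command` and `run_stats` are hypothetical helpers, and `run_stats` assumes that readfish is installed and on the PATH.

```python
import subprocess


def build_stats_command(toml, fastq_dir, paf_out=True, demultiplex=True,
                        prom=False, html=None):
    """Return the argument list for a `readfish stats` invocation."""
    cmd = ["readfish", "stats", "--toml", str(toml),
           "--fastq-directory", str(fastq_dir)]
    if not paf_out:
        cmd.append("--no-paf-out")
    if not demultiplex:
        cmd.append("--no-demultiplex")
    if prom:
        cmd.append("--prom")
    if html is not None:
        cmd += ["--html", str(html)]
    return cmd


def run_stats(**kwargs):
    """Run `readfish stats` and return its exit code (0 on success)."""
    return subprocess.run(build_stats_command(**kwargs)).returncode
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.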