Genome plots
Bakta allows the creation of circular genome plots via pyCirclize. Plots are generated as part of the default workflow and saved as PNG and SVG files. In addition to the default workflow, Bakta provides a dedicated CLI entry point bakta_plot:
Examples:
bakta_plot input.json
bakta_plot --output test --prefix test --config config.yaml --sequences 1,2 input.json
It accepts the results of a previous annotation run in JSON format and allows the selection of distinct sequences, denoted either by their FASTA identifiers or by their sequential number starting at 1. Colors for each feature type can be adapted via a simple configuration file in YAML format, e.g. config.yaml. Currently, two default plot types are supported: features and cog. Examples for chromosomes and plasmids are provided here.
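The sequence selection described above can be illustrated with a short Python sketch. It is not Bakta's own code: the helper name select_sequences, the sample identifiers, and the rule that an exact identifier match takes precedence over a number are all assumptions made for illustration.

```python
def select_sequences(spec, sequence_ids):
    """Resolve a --sequences style value against an ordered list of IDs.

    Hypothetical helper for illustration only; not part of Bakta's API.
    """
    if spec.strip().lower() == "all":
        return list(sequence_ids)
    selected = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if token in sequence_ids:
            # Assumption: an exact identifier match wins over a number.
            seq_id = token
        elif token.isdecimal() and 1 <= int(token) <= len(sequence_ids):
            # Numbers are one-based, so "1" is the first sequence.
            seq_id = sequence_ids[int(token) - 1]
        else:
            raise ValueError(f"unknown sequence: {token!r}")
        if seq_id not in selected:
            selected.append(seq_id)
    return selected


# Made-up identifiers standing in for FASTA IDs.
ids = ["chr", "p1", "p2"]
print(select_sequences("1,p2", ids))  # ['chr', 'p2']
```

Mixing numbers and identifiers in one value, as in the example call, mirrors the "comma separated number or name" wording of the --sequences option.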
Usage
usage: bakta_plot [--config CONFIG] [--output OUTPUT] [--prefix PREFIX]
[--sequences SEQUENCES] [--type {features,cog}] [--label LABEL] [--size {4,8,16}] [--dpi {150,300,600}]
[--help] [--verbose] [--debug] [--tmp-dir TMP_DIR] [--version]
<input>
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
positional arguments:
<input> Bakta annotations in (zipped) JSON format
Input / Output:
--config CONFIG, -c CONFIG
Plotting configuration in YAML format
--output OUTPUT, -o OUTPUT
Output directory (default = current working directory)
--prefix PREFIX, -p PREFIX
Prefix for output files
Plotting:
--sequences SEQUENCES
Sequences to plot: comma separated number or name (default = all, numbers one-based)
--type {features,cog}
Plot type: feature/cog (default = features)
--label LABEL Plot center label (for line breaks use '|')
--size {4,8,16} Plot size in inches: 4/8/16 (default = 8)
--dpi {150,300,600} Plot resolution as dots per inch: 150/300/600 (default = 300)
General:
--help, -h Show this help message and exit
--verbose, -v Print verbose information
--debug Run Bakta in debug mode. Temp data will not be removed.
--tmp-dir TMP_DIR Location for temporary files (default = system dependent auto detection)
--version show program's version number and exit
Description
Currently, there are two types of plots: features (the default) and cog. In the default mode (features), all features are plotted on two rings representing the forward and reverse strand from outer to inner, respectively, using the following feature colors:
- CDS: #cccccc
- tRNA/tmRNA: #b2df8a
- rRNA: #fb8072
- ncRNA: #fdb462
- ncRNA-region: #80b1d3
- CRISPR: #bebada
- Gap: #000000
- Misc: #666666
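For scripts that post-process or re-create such plots, the default colors listed above can be kept in a simple lookup table. The following Python sketch is only an illustration: the names FEATURE_COLORS, color_for and hex_to_rgb are hypothetical, the fallback to the Misc color for unknown feature types is an assumption, and the overrides argument does not reflect the actual schema of Bakta's YAML configuration file.

```python
# Default feature colors as listed in this section.
FEATURE_COLORS = {
    "CDS": "#cccccc",
    "tRNA/tmRNA": "#b2df8a",
    "rRNA": "#fb8072",
    "ncRNA": "#fdb462",
    "ncRNA-region": "#80b1d3",
    "CRISPR": "#bebada",
    "Gap": "#000000",
    "Misc": "#666666",
}


def color_for(feature_type, overrides=None):
    """Return the hex color for a feature type.

    Assumption: unknown types fall back to the Misc color.
    `overrides` is a hypothetical dict of user-supplied colors.
    """
    colors = {**FEATURE_COLORS, **(overrides or {})}
    return colors.get(feature_type, colors["Misc"])


def hex_to_rgb(color):
    """Convert '#rrggbb' into an (r, g, b) tuple of integers 0..255."""
    value = color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(color_for("tRNA/tmRNA"))              # #b2df8a
print(hex_to_rgb(color_for("tRNA/tmRNA")))  # (178, 223, 138)
print(color_for("CDS", {"CDS": "#ff0000"}))  # #ff0000
```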
In the cog mode, all protein-coding genes (CDS) are colored according to their assigned COG functional categories. To better distinguish non-coding genes, these are plotted on an additional third ring.
In addition, both plot types share two innermost rings for GC content and GC skew. The first ring represents the GC content per sliding window over the entire sequence(s), with green (#33a02c) and red (#e31a1c) marking GC content above and below the average, respectively. The second ring represents the GC skew in orange (#fdbf6f) and blue (#1f78b4). The GC skew gives hints on a replicon's replication bubble and hence on the completeness of the assembly. On a complete and circular bacterial chromosome, you normally see two inflection points: one at the origin of replication and one at the opposite region (see Wikipedia).
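To make the two inner rings more concrete, here is a minimal Python sketch of a sliding-window calculation. It uses the common definition of GC skew, (G - C) / (G + C), and reports GC content as the deviation from the sequence-wide average. The function name gc_windows, the window and step sizes, and the lack of wrap-around at the sequence ends are assumptions for illustration; Bakta's actual parameters are not stated in this section.

```python
def gc_windows(sequence, window=1000, step=500):
    """Return (start, gc_deviation, gc_skew) for each sliding window.

    gc_deviation: window GC fraction minus the sequence-wide GC fraction
                  (positive = above average, negative = below average).
    gc_skew:      (G - C) / (G + C) within the window, 0.0 if no G or C.
    Sketch only: windows do not wrap around the end of a circular sequence.
    """
    seq = sequence.upper()
    if not seq:
        return []
    mean_gc = (seq.count("G") + seq.count("C")) / len(seq)
    rows = []
    # If the sequence is shorter than one window, a single shorter
    # window covering the whole sequence is used.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_fraction = (g + c) / len(chunk)
        skew = (g - c) / (g + c) if (g + c) else 0.0
        rows.append((start, gc_fraction - mean_gc, skew))
    return rows


# Toy sequence: a G-rich half followed by a C-poor, AT-rich half.
toy = "GGGGGGAA" + "CCAAAAAA"
print(gc_windows(toy, window=8, step=8))
# [(0, 0.25, 1.0), (8, -0.25, -1.0)]
```

In the toy output, the sign change of the skew between the two windows is the kind of transition that shows up as an inflection point on a real chromosome.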
Custom plot labels (text in the center) can be provided via --label:
bakta_plot --sequences 2 --dpi 300 --size 8 --prefix plot-cog-p2 --type cog --label="pO157|plasmid, 92.7 kbp" input.json
