Protein bulk annotation

For the direct bulk annotation of protein sequences independent of a genome, Bakta provides a dedicated CLI entry point, bakta_proteins:

Examples:

bakta_proteins --db <db-path> input.fasta

bakta_proteins --db <db-path> --prefix test --output test --proteins special.faa --threads 8 input.fasta
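The examples above can also be driven from a Python pipeline. The following is a minimal sketch and not part of Bakta. The helper names build_bakta_proteins_cmd and run_bakta_proteins are hypothetical, and the sketch only passes options that appear in the usage text below, with the positional <input> last.

```python
import shutil
import subprocess
from typing import List, Optional


def build_bakta_proteins_cmd(
    input_fasta: str,
    db: Optional[str] = None,
    prefix: Optional[str] = None,
    output: Optional[str] = None,
    proteins: Optional[str] = None,
    threads: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Assemble a bakta_proteins argument list (hypothetical helper)."""
    cmd = ["bakta_proteins"]
    if db is not None:
        cmd += ["--db", db]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if output is not None:
        cmd += ["--output", output]
    if proteins is not None:
        cmd += ["--proteins", proteins]  # trusted protein sequences (FASTA)
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if force:
        cmd.append("--force")  # overwrite an existing output folder
    cmd.append(input_fasta)  # positional <input> goes last
    return cmd


def run_bakta_proteins(input_fasta: str, **options) -> None:
    """Run the assembled command; raises if bakta_proteins is missing or fails."""
    if shutil.which("bakta_proteins") is None:
        raise FileNotFoundError("bakta_proteins not found on PATH")
    subprocess.run(build_bakta_proteins_cmd(input_fasta, **options), check=True)
```

For instance, build_bakta_proteins_cmd("input.fasta", db="<db-path>", prefix="test", output="test", proteins="special.faa", threads=8) yields the same arguments as the second example. Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.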

Output

Annotation results are provided in standard bioinformatics file formats:

  • <prefix>.tsv: annotations as simple human-readable TSV
  • <prefix>.faa: protein sequences as FASTA
  • <prefix>.hypotheticals.tsv: further information on hypothetical proteins as simple human-readable tab-separated values
  • <prefix>.json: all (internal) annotation & sequence information as JSON

The <prefix> can be set via --prefix <prefix>. If no prefix is set, Bakta uses the input file prefix.
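To locate the result files in a downstream script, the file names can be predicted from the output directory and prefix. This is a minimal sketch under a stated assumption: the fallback prefix rule in the hypothetical default_prefix (drop a trailing .gz, then the last file extension) is a guess at what "input file prefix" means, not Bakta's actual code.

```python
from pathlib import Path
from typing import Dict, Optional

# Suffixes of the files listed in the Output section.
OUTPUT_SUFFIXES = (".tsv", ".faa", ".hypotheticals.tsv", ".json")


def default_prefix(input_path: str) -> str:
    """ASSUMPTION: file name without a .gz suffix and without its last extension."""
    name = Path(input_path).name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    return Path(name).stem  # e.g. "input.fasta" -> "input"


def expected_outputs(
    input_path: str, output_dir: str = ".", prefix: Optional[str] = None
) -> Dict[str, Path]:
    """Map each output suffix to its expected path (hypothetical helper)."""
    p = prefix if prefix is not None else default_prefix(input_path)
    out = Path(output_dir)  # "." mirrors the default: current working directory
    return {suffix: out / f"{p}{suffix}" for suffix in OUTPUT_SUFFIXES}
```

With an explicit --prefix the assumption does not matter, so passing the same prefix to both bakta_proteins and this helper is the safer choice.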

Usage

usage: bakta_proteins [--db DB] [--output OUTPUT] [--prefix PREFIX] [--force]
                      [--proteins PROTEINS]
                      [--help] [--verbose] [--debug] [--threads THREADS] [--tmp-dir TMP_DIR] [--version]
                      <input>

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

positional arguments:
  <input>               Protein sequences in (zipped) fasta format

Input / Output:
  --db DB, -d DB        Database path (default = <bakta_path>/db). Can also be provided as BAKTA_DB environment variable.
  --output OUTPUT, -o OUTPUT
                        Output directory (default = current working directory)
  --prefix PREFIX, -p PREFIX
                        Prefix for output files
  --force, -f           Force overwriting existing output folder

Annotation:
  --proteins PROTEINS   Fasta file of trusted protein sequences for annotation

General:
  --help, -h            Show this help message and exit
  --verbose, -v         Print verbose information
  --debug               Run Bakta in debug mode. Temp data will not be removed.
  --threads THREADS, -t THREADS
                        Number of threads to use (default = number of available CPUs)
  --tmp-dir TMP_DIR     Location for temporary files (default = system dependent auto detection)
  --version, -V         show program's version number and exit
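The help text names three possible sources for the database location (--db, the BAKTA_DB environment variable, and <bakta_path>/db) and defaults --threads to the number of available CPUs. The sketch below shows one way a wrapper could resolve these values up front. It assumes an explicit --db takes precedence over BAKTA_DB, which in turn takes precedence over the default; that order and the helper names are assumptions, not taken from Bakta's source, and bakta_path stands in for the installation directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_path(
    cli_db: Optional[str], bakta_path: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """ASSUMED precedence: --db, then BAKTA_DB, then <bakta_path>/db."""
    env = os.environ if env is None else env
    if cli_db:
        return Path(cli_db)
    env_db = env.get("BAKTA_DB")
    if env_db:
        return Path(env_db)
    return Path(bakta_path) / "db"


def resolve_threads(cli_threads: Optional[int]) -> int:
    """Use --threads if given, else the CPU count reported by the OS."""
    if cli_threads is not None:
        return cli_threads
    # os.cpu_count() may return None; it also ignores CPU affinity limits.
    return os.cpu_count() or 1
```

Resolving the database path in the wrapper makes it possible to fail early with a clear message if the directory does not exist, before bakta_proteins is started.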