Bakta API-Documentation
Bakta provides an open-access REST-API that can be used to annotate your own genomes programmatically. The API and the corresponding Swagger documentation can be found here
Usage
Using the REST-API is a three-stage process. First, a job must be initialized; then the initialized job must be started and its status monitored; finally, the results of the finished job can be retrieved. These stages must occur in exactly this order; any deviation will result in a failed job.
Initialization
Initialization of a Bakta job must start with an init request. The API responds with a unique jobID and a corresponding secret as well as three pre-authenticated S3-URLs (uploadLinkFasta, uploadLinkProdigal, uploadLinkReplicons).
The jobID and the corresponding secret must be stored locally; they are used as credentials to identify the user in subsequent requests.
The initialization procedure is finished by using the three S3-URLs (uploadLinkFasta, uploadLinkProdigal, uploadLinkReplicons) to upload data to the internal storage system. For this, PUT requests with the raw data as request body should be used.
uploadLinkFasta should be used to upload the (FASTA) sequence data for annotation.
uploadLinkProdigal (optional) can be used to upload an additional Prodigal training file.
uploadLinkReplicons (optional) should be used to upload a replicon table in TSV format that describes the provided replicons in the FASTA input file.
Note
A PUT request to all three S3-URLs is necessary to finish the initialization procedure; optional URLs should be satisfied with a request that has an empty body of length zero.
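The upload step could be sketched in Python roughly as follows, using only the standard library. The helper names build_upload_requests and upload_all are illustrative and not part of the Bakta API; the dictionary keys are the ones returned by the init request. Optional inputs that are not provided are sent as zero-length bodies, as the note above requires.

```python
import urllib.request


def build_upload_requests(init_response, fasta, prodigal=b"", replicons=b""):
    """Return one PUT request per pre-authenticated S3-URL.

    `init_response` is the parsed JSON body of the init request.
    Optional inputs default to an empty (zero-length) body.
    """
    payloads = {
        "uploadLinkFasta": fasta,
        "uploadLinkProdigal": prodigal,
        "uploadLinkReplicons": replicons,
    }
    return [
        urllib.request.Request(init_response[key], data=body, method="PUT")
        for key, body in payloads.items()
    ]


def upload_all(requests):
    """Send the PUT requests; urlopen raises HTTPError on 4xx/5xx answers.

    Note: urllib adds a default Content-Type header when a body is given.
    """
    for request in requests:
        with urllib.request.urlopen(request) as response:
            response.read()
```

A caller would read the FASTA file as bytes, pass it to build_upload_requests together with the init response, and hand the resulting list to upload_all.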
Job start & Monitoring
After initialization, the specific job (identified by its jobID and the corresponding secret) can be scheduled via the start request.
Scheduled jobs are monitored via the job list request. This request contains a list of all monitored jobIDs and secrets. The API responds with a matching JSON list of jobStatuses.
The job list request should be repeated until all jobs have either the SUCCESSFUL or the ERROR status. Recently scheduled jobs have the INIT status, currently running jobs the RUNNING status.
Note
The INIT status refers to a started job that is currently being initialized for execution. Depending on the current load of the underlying hardware and the job's position in the scheduling queue, it may take a while before a job transitions to the RUNNING status. A failed job is always indicated by the ERROR status.
If multiple jobs are monitored simultaneously, the finalization procedure can be started for a job with SUCCESSFUL status while others are still RUNNING. In this case, monitoring should continue in parallel for the remaining jobs, and the finished job can be removed from the list.
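The monitoring loop described above might look like the following sketch. The function poll_until_done, its parameters, and the poll limit are assumptions made for illustration; the actual job list request is injected as the fetch_statuses callable, so the loop itself makes no assumptions about the HTTP layer. Finished jobs are dropped from the monitored list while the remaining ones keep being polled.

```python
import time

FINAL_STATUSES = {"SUCCESSFUL", "ERROR"}


def poll_until_done(jobs, fetch_statuses, on_success,
                    interval=30.0, max_polls=1000, sleep=time.sleep):
    """Poll until every job is SUCCESSFUL or ERROR.

    jobs: list of {"jobID": ..., "secret": ...} dicts.
    fetch_statuses: callable that performs the job list request for the
        given jobs and returns the "jobs" list of the response.
    on_success: callback invoked once for each job that became SUCCESSFUL.
    Returns a dict mapping jobID to its final status.
    """
    pending = {job["jobID"]: job for job in jobs}
    final = {}
    polls = 0
    while pending:
        if polls >= max_polls:
            raise TimeoutError(f"jobs still pending: {sorted(pending)}")
        polls += 1
        for entry in fetch_statuses(list(pending.values())):
            job_id, status = entry["jobID"], entry["jobStatus"]
            if job_id in pending and status in FINAL_STATUSES:
                job = pending.pop(job_id)  # stop monitoring this job
                final[job_id] = status
                if status == "SUCCESSFUL":
                    on_success(job)  # e.g. start retrieving its results
        if pending:
            sleep(interval)
    return final
```

The poll limit guards against jobs that never reach a final status, for example because their jobID is not known to the server.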
Getting results
Results for jobs with a SUCCESSFUL status can be retrieved via the result request. The response contains a list (ResultFiles) of different file formats with corresponding download URLs. The result files can be retrieved with GET requests to these URLs, or via a regular web browser.
Currently the following file formats for results are provided:
EMBL
FAA
FAAHypothetical
FNA
GBFF
GFF3
JSON
TSV
TSVHypothetical
More information about the structure of these output formats can be found in the CLI Documentation.
Note
The JSON output format can be visualized locally via the WebUI at https://bakta.computational.bio.
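Retrieving the result files could be sketched as below. The helpers plan_downloads and download_all are hypothetical, and the local naming scheme (job name plus the lower-cased format key) is merely an assumption for this example; the API itself only provides the download URLs.

```python
import pathlib
import urllib.request


def plan_downloads(result, out_dir, formats=None):
    """Map a local target path to the download URL of each wanted format.

    `result` is the parsed JSON body of the result request. The file
    naming used here is an illustrative convention, not an API rule.
    """
    files = result["ResultFiles"]
    wanted = list(files) if formats is None else formats
    out_dir = pathlib.Path(out_dir)
    return {
        out_dir / f"{result['name']}.{fmt.lower()}": files[fmt]
        for fmt in wanted
    }


def download_all(plan):
    """Fetch every URL in the plan with a plain GET request."""
    for path, url in plan.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            path.write_bytes(response.read())
```

Passing formats=["GFF3", "JSON"], for instance, would restrict the download to those two entries of ResultFiles.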
Endpoints
/api/v1/job/init
The init endpoint is used to initialize a new job. Initialized jobs can be started via the start request.
HTTP-Method: POST
Expected request body:
{
  "repliconTableType": "CSV",
  "name": "string"
}
repliconTableType describes the file format of the provided replicon table; this should be either CSV or TSV. name is an arbitrary name, usually the name of the FASTA input file.
Expected response body:
{
  "uploadLinkFasta": "string",
  "uploadLinkProdigal": "string",
  "uploadLinkReplicons": "string",
  "job": {
    "secret": "string",
    "jobID": "string"
  }
}
The response contains three S3-URLs (uploadLinkFasta, uploadLinkProdigal, uploadLinkReplicons). These URLs are pre-authenticated and can be used to upload data to the internal storage using PUT requests. For a detailed, step-by-step guide on how to use these URLs, see Usage. Additionally, the response to the init request contains a job description with a unique jobID and a corresponding secret that are used by future requests to identify and authorize the initialized job.
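As a minimal sketch, the init request and its response could be handled in Python like this. The base_url argument is a placeholder for the API host, which is not stated in this section; InitResult, build_init_request and parse_init_response are illustrative names, and sending the body with a JSON Content-Type header is an assumption.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class InitResult:
    job: dict           # {"jobID": ..., "secret": ...}, reused as credentials
    upload_links: dict  # the three pre-authenticated S3-URLs


def build_init_request(base_url, name, replicon_table_type="TSV"):
    """Build the POST request for /api/v1/job/init.

    base_url is a placeholder for the API host (an assumption here).
    """
    if replicon_table_type not in ("CSV", "TSV"):
        raise ValueError("repliconTableType must be CSV or TSV")
    body = json.dumps({"repliconTableType": replicon_table_type, "name": name})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/job/init",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def parse_init_response(raw):
    """Split the init response body into credentials and upload links."""
    payload = json.loads(raw)
    links = {k: v for k, v in payload.items() if k.startswith("uploadLink")}
    return InitResult(job=payload["job"], upload_links=links)
```

Keeping the job dictionary unchanged is convenient because the start, list and result requests all expect exactly this jobID and secret pair.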
/api/v1/job/start
This endpoint is used to start a job that has been initialized via the init request.
HTTP-Method: POST
Expected request body:
{
  "job": {
    "secret": "string",
    "jobID": "string"
  },
  "config": {
    "hasProdigal": true,
    "hasReplicons": true,
    "translationalTable": 0,
    "completeGenome": true,
    "keepContigHeaders": true,
    "minContigLength": "string",
    "dermType": "UNKNOWN",
    "genus": "string",
    "species": "string",
    "strain": "string",
    "plasmid": "string",
    "locus": "string",
    "locusTag": "string"
  }
}
A successful response is indicated by a 200 status code and an empty response body.
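A request body of this shape could be assembled with a small helper such as the one below. build_start_body is a hypothetical name; the set of allowed keys is copied from the request body shown above, and the helper only guards against misspelled keys. Whether the API accepts a partial config is not stated here, so which fields to pass is left to the caller.

```python
import json

# Keys copied from the "config" object of the start request body.
CONFIG_KEYS = {
    "hasProdigal", "hasReplicons", "translationalTable", "completeGenome",
    "keepContigHeaders", "minContigLength", "dermType", "genus", "species",
    "strain", "plasmid", "locus", "locusTag",
}


def build_start_body(job, **config):
    """Serialize the start request body for an initialized job.

    job: the {"jobID": ..., "secret": ...} dict returned by the init request.
    config: keyword arguments named after the documented config fields.
    """
    unknown = set(config) - CONFIG_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return json.dumps({
        "job": {"jobID": job["jobID"], "secret": job["secret"]},
        "config": config,
    })
```

The resulting string would be sent as the body of a POST request to /api/v1/job/start; a 200 status code then indicates that the job was scheduled.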
/api/v1/job/list
Endpoint to query the current status of one (or more) running jobs.
HTTP-Method: POST
Expected request body:
{
  "jobs": [
    {
      "secret": "string",
      "jobID": "string"
    }
  ]
}
Response:
{
  "jobs": [
    {
      "jobID": "string",
      "jobStatus": "INIT",
      "started": "2021-07-02T11:41:10.675Z",
      "updated": "2021-07-02T11:41:10.675Z",
      "name": "string"
    }
  ],
  "failedJobs": [
    {
      "jobID": "string",
      "jobStatus": "NOT_FOUND"
    }
  ]
}
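The request and response bodies of the list endpoint could be handled as in the following sketch. Both function names are illustrative. The response keeps entries under failedJobs apart from the regular jobs list (NOT_FOUND in the example above), so the sketch returns the two groups separately instead of merging them.

```python
import json


def build_list_body(jobs):
    """Serialize the credentials of all monitored jobs for /api/v1/job/list."""
    return json.dumps(
        {"jobs": [{"jobID": j["jobID"], "secret": j["secret"]} for j in jobs]}
    )


def summarize_list_response(raw):
    """Return (statuses, failed), each mapping jobID to jobStatus.

    statuses covers the entries under "jobs", failed those under
    "failedJobs"; a missing key is treated as an empty list.
    """
    payload = json.loads(raw)
    statuses = {j["jobID"]: j["jobStatus"] for j in payload.get("jobs", [])}
    failed = {j["jobID"]: j["jobStatus"] for j in payload.get("failedJobs", [])}
    return statuses, failed
```

A monitoring loop would typically stop polling jobs that appear in the second mapping, since repeating the request is unlikely to change their status.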
/api/v1/job/result
Endpoint to query the results of a finished job.
HTTP-Method: POST
Request:
{
  "secret": "string",
  "jobID": "string"
}
Response:
{
  "jobID": "string",
  "ResultFiles": {
    "EMBL": "S3-URL",
    "FAA": "S3-URL",
    "FAAHypothetical": "S3-URL",
    "FNA": "S3-URL",
    "GBFF": "S3-URL",
    "GFF3": "S3-URL",
    "JSON": "S3-URL",
    "TSV": "S3-URL",
    "TSVHypothetical": "S3-URL"
  },
  "started": "2021-07-14T11:10:31.838Z",
  "updated": "2021-07-14T11:10:31.838Z",
  "name": "string"
}
/api/v1/version
Endpoint that can be used to determine the versions of Bakta, the internal database, and the backend.
HTTP-Method: GET
Response:
{
  "toolVersion": "string",
  "dbVersion": "string",
  "backendVersion": "string"
}
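Querying this endpoint could be sketched as follows. As before, base_url is a placeholder for the API host, and both function names are hypothetical; only the path and the three response fields are taken from the description above.

```python
import json
import urllib.request


def build_version_request(base_url):
    """Build the GET request for /api/v1/version (base_url is a placeholder)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/version", method="GET"
    )


def format_versions(raw):
    """Turn the version response body into a single readable line."""
    info = json.loads(raw)
    return (
        f"Bakta {info['toolVersion']}, "
        f"database {info['dbVersion']}, "
        f"backend {info['backendVersion']}"
    )
```

Logging this line next to each annotation run is one way to record which tool and database versions produced a given result.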