gempipe autopilot
gempipe autopilot is an additional command-line program that internally calls gempipe recon and gempipe derive, chaining them together by performing automated gap-filling on the draft pan-GSMM. It is a (discouraged) alternative to manual curation. By design, it accepts the options of both gempipe recon and gempipe derive.
usage: gempipe autopilot [-h] [-v] [-c] [-o] [--verbose] [--overwrite] [--dbs]
[-t] [-g] [-p] [-gb] [-s] [-b] [--buscoM] [--buscoF]
[--ncontigs] [--N50] [--identity] [--coverage] [-rm]
[-rp] [-rs] [-mc] [--tcdb] [--dedup] [--norec]
[--dbmem] [--sbml] [--nofig] [-md] [-m] [--minflux]
[--minpanflux] [--biolog] [--aux] [--cnps]
[--cnps_minmed] [--biosynth]
gempipe v1.38.4, please cite "TODO". Full documentation available at
https://gempipe.readthedocs.io/en/latest/index.html.
optional arguments:
-h, --help Show this help message and exit.
-v, --version Show version number and exit.
-c , --cores Number of parallel processes to use. (default: 1)
-o , --outdir Main output directory (will be created if not
existing). (default: ./)
--verbose Make stdout messages more verbose, including debug
messages. (default: False)
--overwrite Delete the working/ directory at the startup.
(default: False)
--dbs Path where the needed databases are stored (or
downloaded if not already existing). (default:
./working/dbs/)
-t , --taxids Taxids of the species to model (comma separated, for
example '252393,68334'). (default: -)
-g , --genomes Input genome files or folder containing the genomes
(see documentation). (default: -)
-p , --proteomes Input proteome files or folder containing the
proteomes (see documentation). (default: -)
-gb , --genbanks Input genbank files (.gb, .gbff) or folder containing
the genbanks (see documentation). (default: -)
-s , --staining Gram staining, 'pos' or 'neg'. (default: neg)
-b , --buscodb Busco database to use ('show' to see the list of
available databases). (default: bacteria_odb10)
--buscoM Maximum number of missing Busco's single copy
orthologs (absolute or percentage). (default: 2%)
--buscoF Maximum number of fragmented Busco's single copy
orthologs (absolute or percentage). (default: 100%)
--ncontigs Maximum number of contigs allowed per genome.
(default: 200)
--N50 Minimum N50 allowed per genome. (default: 50000)
--identity Minimum percentage amino acid sequence identity to
use when aligning against the BiGG gene database.
(default: 30)
--coverage Minimum percentage coverage to use when aligning
against the BiGG gene database. (default: 70)
-rm , --refmodel Model to be used as reference. (default: -)
-rp , --refproteome Proteome to be used as reference. (default: -)
-rs , --refspont Reference gene marking spontaneous reactions.
(default: spontaneous)
-mc , --mancor Manual corrections to apply during the reference
expansion. (default: -)
--tcdb Experimental feature: try to build transport reactions
using TCDB. (default: False)
--dedup Try to remove duplicate metabolites and reactions
using MNX annotation, when a reference is provided.
(default: False)
--norec Skip gene recovery when starting from genomes.
(default: False)
--dbmem Load the entire eggNOG-mapper database into memory
(should speed up the functional annotation step).
(default: False)
--sbml Save the output GSMMs in SBML format (L3V1 FBC2) in
addition to JSON. (default: False)
--nofig Skip the generation of figures. (default: False)
-md , --metadata Table for manual correction of genome metadata.
(default: -)
-m , --media Medium definition file or folder containing media
definitions, to be used during the automatic gap-
filling. (default: -)
--minflux Minimum flux through the objective of strain-specific
models. (default: 0.1)
--minpanflux Minimum flux through the objective of the pan model.
(default: 0.3)
--biolog Simulate Biolog's utilization tests on strain-specific
models. (default: False)
--aux Test auxotrophies for amino acids and vitamins.
(default: False)
--cnps Systematically simulate growth on all the available
C-N-P-S sources. (default: False)
--cnps_minmed Base the C-N-P-S simulations on a minimal medium
leading to the specified minimum objective value. If
0, user-defined medium will be used. (default: 0.0)
--biosynth Check biosynthesis of each metabolite while granting
the specified minimum fraction of objective. If 0,
this step will be skipped. (default: 0.0)
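As a rough illustration of how the options above combine, the following sketch starts a run from a folder of genomes. The paths ./genomes/ and ./autopilot_run/ are hypothetical placeholders, and the quality thresholds shown are simply the documented defaults written out explicitly. The command is guarded so that it only runs when gempipe is available on the PATH.

```shell
# Hypothetical paths (assumptions, not part of gempipe): adjust to your layout.
GENOMES_DIR="./genomes/"     # folder containing the input genomes
OUTDIR="./autopilot_run/"    # main output directory (created if not existing)

# Run only if gempipe is installed; otherwise report and skip.
if command -v gempipe >/dev/null 2>&1; then
    gempipe autopilot \
        --cores 4 \
        --outdir "$OUTDIR" \
        --genomes "$GENOMES_DIR" \
        --staining neg \
        --buscodb bacteria_odb10 \
        --ncontigs 200 \
        --N50 50000
else
    echo "gempipe not found on PATH; skipping the run" >&2
fi
```

Spelling out defaults such as --ncontigs and --N50 is optional. It is done here only to make the genome quality filters visible in the command.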
With gempipe autopilot, automatic gap-filling is applied to the draft pan-GSMM. The --minpanflux parameter sets the minimum flux required through the objective of the pan-GSMM, which is usually the biomass equation. Gap-filling is repeated once for each growth medium provided with -m/--media. Instructions on how to encode a medium recipe can be found in Gap-filling strain-specific models. For more information on how this gap-filling step is implemented, see the Methods section of the Gempipe paper.
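A minimal sketch of a run that exercises the automatic gap-filling might look like the following. It assumes a hypothetical ./media/ folder holding one or more medium definition files (their format is described in the documentation referenced above and is not reproduced here), plus hypothetical input and output paths. The flux thresholds are the documented defaults, passed explicitly for clarity.

```shell
# Hypothetical paths (assumptions, not part of gempipe): adjust to your layout.
GENOMES_DIR="./genomes/"     # folder containing the input genomes
MEDIA_DIR="./media/"         # folder containing the medium definition files
OUTDIR="./autopilot_gapfill/"

MIN_PAN_FLUX="0.3"           # minimum objective flux for the pan-GSMM
MIN_FLUX="0.1"               # minimum objective flux for strain-specific models

# Run only if gempipe is installed; otherwise report and skip.
if command -v gempipe >/dev/null 2>&1; then
    gempipe autopilot \
        --cores 4 \
        --outdir "$OUTDIR" \
        --genomes "$GENOMES_DIR" \
        --media "$MEDIA_DIR" \
        --minpanflux "$MIN_PAN_FLUX" \
        --minflux "$MIN_FLUX" \
        --sbml
else
    echo "gempipe not found on PATH; skipping the run" >&2
fi
```

Passing a folder to --media means gap-filling is repeated for every medium it contains. Pointing --media at a single file instead would restrict gap-filling to that one medium.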