gempipe autopilot

gempipe autopilot is an additional command-line program that internally calls gempipe recon and gempipe derive, linking them together by performing an automated gap-filling on the draft pan-GSMM, as a (discouraged) alternative to manual curation. By design, it accepts the options of both gempipe recon and gempipe derive.

usage: gempipe autopilot [-h] [-v] [-c] [-o] [--verbose] [--overwrite] [--dbs]
                         [-t] [-g] [-p] [-gb] [-s] [-b] [--buscoM] [--buscoF]
                         [--ncontigs] [--N50] [--identity] [--coverage] [-rm]
                         [-rp] [-rs] [-mc] [--tcdb] [--dedup] [--norec]
                         [--dbmem] [--sbml] [--nofig] [-md] [-m] [--minflux]
                         [--minpanflux] [--biolog] [--aux] [--cnps]
                         [--cnps_minmed] [--biosynth]

gempipe v1.38.4, please cite "TODO". Full documentation available at
https://gempipe.readthedocs.io/en/latest/index.html.

optional arguments:
  -h, --help            Show this help message and exit.
  -v, --version         Show version number and exit.
  -c , --cores          Number of parallel processes to use. (default: 1)
  -o , --outdir         Main output directory (will be created if not
                        existing). (default: ./)
  --verbose             Make stdout messages more verbose, including debug
                        messages. (default: False)
  --overwrite           Delete the working/ directory at the startup.
                        (default: False)
  --dbs                 Path where the needed databases are stored (or
                        downloaded if not already existing). (default:
                        ./working/dbs/)
  -t , --taxids         Taxids of the species to model (comma separated, for
                        example '252393,68334'). (default: -)
  -g , --genomes        Input genome files or folder containing the genomes
                        (see documentation). (default: -)
  -p , --proteomes      Input proteome files or folder containing the
                        proteomes (see documentation). (default: -)
  -gb , --genbanks      Input genbank files (.gb, .gbff) or folder containing
                        the genbanks (see documentation). (default: -)
  -s , --staining       Gram staining, 'pos' or 'neg'. (default: neg)
  -b , --buscodb        Busco database to use ('show' to see the list of
                        available databases). (default: bacteria_odb10)
  --buscoM              Maximum number of missing Busco's single copy
                        orthologs (absolute or percentage). (default: 2%)
  --buscoF              Maximum number of fragmented Busco's single copy
                        orthologs (absolute or percentage). (default: 100%)
  --ncontigs            Maximum number of contigs allowed per genome.
                        (default: 200)
  --N50                 Minimum N50 allowed per genome. (default: 50000)
  --identity            Minimum percentage amino acid sequence identity to
                        use when aligning against the BiGG gene database.
                        (default: 30)
  --coverage            Minimum percentage coverage to use when aligning
                        against the BiGG gene database. (default: 70)
  -rm , --refmodel      Model to be used as reference. (default: -)
  -rp , --refproteome   Proteome to be used as reference. (default: -)
  -rs , --refspont      Reference gene marking spontaneous reactions.
                        (default: spontaneous)
  -mc , --mancor        Manual corrections to apply during the reference
                        expansion. (default: -)
  --tcdb                Experimental feature: try to build transport reactions
                        using TCDB. (default: False)
  --dedup               Try to remove duplicate metabolites and reactions
                        using MNX annotation, when a reference is provided.
                        (default: False)
  --norec               Skip gene recovery when starting from genomes.
                        (default: False)
  --dbmem               Load the entire eggNOG-mapper database into memory
                        (should speed up the functional annotation step).
                        (default: False)
  --sbml                Save the output GSMMs in SBML format (L3V1 FBC2) in
                        addition to JSON. (default: False)
  --nofig               Skip the generation of figures. (default: False)
  -md , --metadata      Table for manual correction of genome metadata.
                        (default: -)
  -m , --media          Medium definition file or folder containing media
                        definitions, to be used during the automatic gap-
                        filling. (default: -)
  --minflux             Minimum flux through the objective of strain-specific
                        models. (default: 0.1)
  --minpanflux          Minimum flux through the objective of the pan model.
                        (default: 0.3)
  --biolog              Simulate Biolog's utilization tests on strain-specific
                        models. (default: False)
  --aux                 Test auxotrophies for amino acids and vitamins.
                        (default: False)
  --cnps                Systematically simulate growth on all the available
                        C-N-P-S sources. (default: False)
  --cnps_minmed         Base the C-N-P-S simulations on a minimal medium
                        leading to the specified minimum objective value. If
                        0, user-defined medium will be used. (default: 0.0)
  --biosynth            Check biosynthesis of each metabolite while granting
                        the specified minimum fraction of objective. If 0,
                        this step will be skipped. (default: 0.0)

With gempipe autopilot, an automatic gap-filling is applied to the draft pan-GSMM. The --minpanflux parameter sets the minimum flux required through the objective of the pan-GSMM, which is usually the biomass equation. The gap-filling is repeated once for each growth medium provided with -m/--media. Instructions on how to encode a medium recipe can be found in Gap-filling strain-specific models. For more information on how this gap-filling step is implemented, please read the Methods section of the Gempipe paper.
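
As an illustration of how these options combine, the following is a minimal sketch of an autopilot invocation that uses only flags listed in the help text above. The output directory, the media folder and the number of cores are hypothetical placeholders, and the taxids are the example values from the help text. The snippet only assembles and prints the command so that it can be reviewed before starting the actual run.

```shell
# Hypothetical locations (assumptions, not part of gempipe): adjust to your setup.
OUTDIR="./autopilot_run"
MEDIA="./media/"   # folder assumed to contain your medium definition files

# Assemble the command as positional parameters, using flags from the help text.
# --minpanflux and --minflux are set to their documented defaults for clarity.
set -- gempipe autopilot \
    --cores 8 \
    --outdir "$OUTDIR" \
    --taxids 252393,68334 \
    --staining neg \
    --media "$MEDIA" \
    --minpanflux 0.3 \
    --minflux 0.1 \
    --sbml

# Print the assembled command for review.
echo "$@"

# To start the run, execute the assembled command:
# "$@"
```

Since the gap-filling is repeated once per medium, pointing -m/--media at a folder with several media definitions should trigger one gap-filling pass for each of them.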