{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "f3e0a364-c8cb-4512-ba1b-3817c7e48b38", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%aimport gempipe, gempipe.flowchart\n", "%autoreload 1" ] }, { "cell_type": "code", "execution_count": 4, "id": "1d09aca0-417b-4658-b2af-b670c22a2ae4", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "\n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from gempipe import Flowchart\n", "\n", "file = open('flowcharts/part_1.flowchart', 'r')\n", "part_1 = file.read()\n", "file.close()\n", "\n", "file = open('flowcharts/autopilot.flowchart', 'r')\n", "autopilot = file.read()\n", "file.close()\n", "\n", "file = open('flowcharts/part_3.flowchart', 'r')\n", "part_3 = file.read()\n", "file.close()\n", "\n", "header = 'flowchart LR \\n'\n", "flowchart = Flowchart(header + part_1 + autopilot + part_3)\n", "flowchart.render(height=300, zoom=2)" ] }, { "cell_type": "markdown", "id": "30567491-89fa-4907-9520-5f4425cc31cc", "metadata": { "tags": [] }, "source": [ "# gempipe autopilot\n", "\n", "`gempipe autopilot` is an additional command-line program that internally calls [`gempipe recon`](part_1_gempipe_recon.ipynb) and [`gempipe derive`](part_3_gempipe_derive.ipynb) and links them by performing automated gap-filling on the draft pan-GSMM, as a (_discouraged_) alternative to manual curation. By design, it accepts the options of both `gempipe recon` and `gempipe derive`.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "4bf72f77-e116-49ce-a195-29ce6345742e", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: gempipe autopilot [-h] [-v] [-c] [-o] [--verbose] [--overwrite] [--dbs]\n", " [-t] [-g] [-p] [-gb] [-s] [-b] [--buscoM] [--buscoF]\n", " [--ncontigs] [--N50] [--identity] [--coverage] [-rm]\n", " [-rp] [-rs] [-mc] [--tcdb] [--dedup] [--norec]\n", " [--dbmem] [--sbml] [--nofig] [-md] [-m] [--minflux]\n", " [--minpanflux] [--biolog] [--aux] [--cnps]\n", " [--cnps_minmed] [--biosynth]\n", "\n", "gempipe v1.38.4, please cite \"TODO\". 
Full documentation available at\n", "https://gempipe.readthedocs.io/en/latest/index.html.\n", "\n", "optional arguments:\n", " -h, --help Show this help message and exit.\n", " -v, --version Show version number and exit.\n", " -c , --cores Number of parallel processes to use. (default: 1)\n", " -o , --outdir Main output directory (will be created if not\n", " existing). (default: ./)\n", " --verbose Make stdout messages more verbose, including debug\n", " messages. (default: False)\n", " --overwrite Delete the working/ directory at the startup.\n", " (default: False)\n", " --dbs Path were the needed databases are stored (or\n", " downloaded if not already existing). (default:\n", " ./working/dbs/)\n", " -t , --taxids Taxids of the species to model (comma separated, for\n", " example '252393,68334'). (default: -)\n", " -g , --genomes Input genome files or folder containing the genomes\n", " (see documentation). (default: -)\n", " -p , --proteomes Input proteome files or folder containing the\n", " proteomes (see documentation). (default: -)\n", " -gb , --genbanks Input genbank files (.gb, .gbff) or folder containing\n", " the genbanks (see documentation). (default: -)\n", " -s , --staining Gram staining, 'pos' or 'neg'. (default: neg)\n", " -b , --buscodb Busco database to use ('show' to see the list of\n", " available databases). (default: bacteria_odb10)\n", " --buscoM Maximum number of missing Busco's single copy\n", " orthologs (absolute or percentage). (default: 2%)\n", " --buscoF Maximum number of fragmented Busco's single copy\n", " orthologs (absolute or percentage). (default: 100%)\n", " --ncontigs Maximum number of contigs allowed per genome.\n", " (default: 200)\n", " --N50 Minimum N50 allowed per genome. 
(default: 50000)\n", " --identity Minimum percentage amino acidic sequence identity to\n", " use when aligning against the BiGG gene database.\n", " (default: 30)\n", " --coverage Minimum percentage coverage to use when aligning\n", " against the BiGG gene database. (default: 70)\n", " -rm , --refmodel Model to be used as reference. (default: -)\n", " -rp , --refproteome Proteome to be used as reference. (default: -)\n", " -rs , --refspont Reference gene marking spontaneous reactions.\n", " (default: spontaneous)\n", " -mc , --mancor Manual corrections to apply during the reference\n", " expansion. (default: -)\n", " --tcdb Experimental feature: try to build transport reactions\n", " using TCDB. (default: False)\n", " --dedup Try to remove duplicate metabolites and reactions\n", " using MNX annotation, when a reference is provided.\n", " (default: False)\n", " --norec Skip gene recovery when starting from genomes.\n", " (default: False)\n", " --dbmem Load the entire eggNOG-mapper database into memory\n", " (should speed up the functional annotation step).\n", " (default: False)\n", " --sbml Save the output GSMMs in SBML format (L3V1 FBC2) in\n", " addition to JSON. (default: False)\n", " --nofig Skip the generation of figures. (default: False)\n", " -md , --metadata Table for manual correction of genome metadata.\n", " (default: -)\n", " -m , --media Medium definition file or folder containing media\n", " definitions, to be used during the automatic gap-\n", " filling. (default: -)\n", " --minflux Minimum flux through the objective of strain-specific\n", " models. (default: 0.1)\n", " --minpanflux Minimum flux through the objective of the pan model.\n", " (default: 0.3)\n", " --biolog Simulate Biolog's utilization tests on strain-specific\n", " models. (default: False)\n", " --aux Test auxotrophies for aminoacids and vitamins.\n", " (default: False)\n", " --cnps Sistematically simulate growth on all the available\n", " C-N-P-S sources. 
(default: False)\n", " --cnps_minmed Base the C-N-P-S simulations on a minimal medium\n", " leading to the specified minimum objective value. If\n", " 0, user-defined medium will be used. (default: 0.0)\n", " --biosynth Check biosynthesis of each metabolite while granting\n", " the specified minimum fraction of objective. If 0,\n", " this step will be skipped. (default: 0.0)\n" ] } ], "source": [ "import subprocess\n", "\n", "command = f\"\"\"gempipe autopilot -h\"\"\"\n", "process = subprocess.Popen(command, shell=True)\n", "response = process.wait()\n" ] }, { "cell_type": "markdown", "id": "3c82cf33-251c-407d-bff7-7195c87201da", "metadata": {}, "source": [ "With `gempipe autopilot`, automatic gap-filling is applied to the draft pan-GSMM. The `--minpanflux` parameter specifies the minimum flux through the pan-GSMM objective, which is usually the biomass equation. Gap-filling is repeated once for each growth medium provided with `-m`/`--media`. Instructions on how to encode a medium recipe can be found in [Gap-filling strain-specific models](https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#gap-filling-strain-specific-models). For more information on how this gap-filling step is implemented, please read the Methods section of the [Gempipe paper](how_to_cite.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "797038d4-bb9d-4681-ac96-c4d6da0ba2f7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }