{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f3e0a364-c8cb-4512-ba1b-3817c7e48b38",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%aimport gempipe, gempipe.flowchart\n",
    "%autoreload 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1d09aca0-417b-4658-b2af-b670c22a2ae4",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "        <style> #outcellbox {display: flex; justify-content: center; overflow: hidden; width: 99%; height: 300px; background-color: #ffffff; border: 1px solid grey;} </style>\n",
       "        \n",
       "        <script src=\"https://unpkg.com/@panzoom/panzoom@4.5.1/dist/panzoom.min.js\"></script>\n",
       "\n",
       "        <div class=\"mermaid-ce422542-bec1-452f-99ec-b54576251bfa\" id=\"outcellbox\"></div> \n",
       "        <script type=\"module\">\n",
       "            import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10.6.1/+esm';\n",
       "            const graphDefinition = 'flowchart LR \\n\\nsubgraph Part_1[Part 1]\\n\\nInput_proteomes{{-p/--proteomes}} --> Proteomes --> BRH\\nInput_genbanks{{-gb/--genbanks}} --> Proteomes\\nInput_genomes{{-g/--genomes}} --> initial_Genomes\\nInput_taxids{{-t/--taxids}} --> NCBI_download --> initial_Genomes{{initial_Genomes}}\\nInput_ref_proteome{{-rp/--refproteome}} --> BRH\\nInput_ref_model{{-rm/--refmodel}} --> BRH --> translated_ref{{translated_ref}}\\nInput_manual_corrections{{-mc/--mancor}} --> expanded_RB_recon\\n\\ninitial_Genomes --> CDS_prediction --> initial_Proteomes{{initial_Proteomes}} --> Bio_metrics --> Busco(Busco)\\ninitial_Genomes --> Tech_metrics --> N50(N50) & n_contigs(n_contigs)\\nN50 & n_contigs & Busco & initial_Genomes & initial_Proteomes --> Filtering \\nFiltering --> Genomes{{Genomes}} & Proteomes{{Proteomes}}\\nBusco_db{{Busco_db}} --> Bio_metrics\\n\\nProteomes --> Clustering --> initial_PAM{{initial_PAM}} & initial_Ref_seqs{{initial_Ref_seqs}}\\ninitial_PAM & initial_Ref_seqs & Proteomes & Genomes --> Gene_recovery\\n\\n\\nGene_recovery --> PAM{{PAM}} & Ref_seqs{{Ref_seqs}}\\nRef_seqs --> Func_annot --> Gene_functions{{Gene_functions}} --> RF_recon\\nBiGG_genes{{BiGG_genes}} & Ref_seqs & Universe{{Universe}} --> RF_recon --> RF_draft_panmodel{{RF_draft_panmodel}}\\neggNOG_db{{eggNOG_db}} --> Func_annot\\n\\nRF_draft_panmodel & translated_ref --> expanded_RB_recon --> Initial_draft_panmodel{{Initial_draft_panmodel}}\\nRF_draft_panmodel -. reference not provided .-> Initial_draft_panmodel\\nInitial_draft_panmodel --> TCDB_expansion --> reannotation[\"reannotation\"] --> deduplication --> draft_panmodel{{draft_panmodel}}\\n\\nend\\n\\n\\n\\nclick Input_proteomes href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#starting-from-proteomes\" \"Link\"\\nclick Input_genomes href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#starting-from-genomes\" \"Link\"\\nclick Input_taxids href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#starting-from-genomes\" \"Link\"\\nclick Input_genbanks href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#starting-from-genbanks\" \"Link\"\\n\\n\\nclick Filtering href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#filtering-the-genomes\" \"Link\"\\nclick Gene_recovery href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#gene-recovery\" \"Link\"\\nclick RF_recon href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#draft-pan-gsmm-reference-free-reconstruction\" \"Link\"\\nclick expanded_RB_recon href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#draft-pan-gsmm--expanded-reference-based-reconstruction\" \"Link\"\\nclick TCDB_expansion href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#transport-reactions-expansion-with-tcdb\" \"Link\"\\nclick reannotation href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#annotation-and-duplicates-removal\" \"Link\"\\nclick deduplication href \"https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#annotation-and-duplicates-removal\" \"Link\"\\n\\n\\nstyle Part_1 fill:white\\n\\n\\n\\n\\nstyle Input_proteomes fill:lightgreen\\nstyle Input_genomes fill:lightgreen\\nstyle Input_taxids fill:lightgreen\\nstyle Input_genbanks fill:lightgreen\\nstyle Input_ref_proteome fill:lightgreen\\nstyle Input_ref_model fill:lightgreen\\nstyle Input_manual_corrections fill:lightgreen\\n\\nstyle initial_Genomes fill:lightblue \\nstyle Busco fill:burlywood\\nstyle N50 fill:burlywood\\nstyle n_contigs fill:burlywood\\nstyle Genomes fill:lightblue\\nstyle initial_Proteomes fill:lightblue\\nstyle Proteomes fill:lightblue\\nstyle initial_PAM fill:lightblue\\nstyle initial_Ref_seqs fill:lightblue\\nstyle PAM fill:gold\\nstyle Ref_seqs fill:lightblue\\nstyle BiGG_genes fill:violet\\nstyle Busco_db fill:violet\\nstyle Universe fill:violet\\nstyle eggNOG_db fill:violet\\nstyle RF_draft_panmodel fill:lightblue\\nstyle translated_ref fill:lightblue\\nstyle Initial_draft_panmodel fill:lightblue\\nstyle draft_panmodel fill:lightblue\\nstyle Gene_functions fill:lightblue\\n\\n\\nstyle Gene_recovery fill:whitesmoke\\nstyle NCBI_download fill:whitesmoke\\nstyle CDS_prediction fill:whitesmoke\\nstyle Filtering fill:whitesmoke\\nstyle Bio_metrics fill:whitesmoke\\nstyle Tech_metrics fill:whitesmoke\\nstyle Clustering fill:whitesmoke\\nstyle BRH fill:whitesmoke\\nstyle RF_recon fill:whitesmoke\\nstyle expanded_RB_recon fill:whitesmoke\\nstyle Func_annot fill:whitesmoke\\nstyle TCDB_expansion fill:whitesmoke\\nstyle reannotation fill:whitesmoke\\nstyle deduplication fill:whitesmoke \\n\\n\\n\\nPrio_gapfilling --> panmodel{{panmodel}}\\n\\n\\ndraft_panmodel{{draft_panmodel}} --> Prio_gapfilling\\nUniverse{{Universe}} --> Prio_gapfilling\\nPAM{{PAM}} --> Prio_gapfilling\\n\\n\\nstyle panmodel fill:gold\\n\\nstyle Prio_gapfilling fill:whitesmoke\\nstyle draft_panmodel fill:lightblue\\nstyle Universe fill:violet\\nstyle PAM fill:gold\\nsubgraph Part_3[Part 3]\\n\\nstrain_derivation --> strain_models{{strain_models}} --> gapfilling --> gf_strain_models{{gf_strain_models}} \\ngf_strain_models --> species_derivation --> species_models{{species_models}}\\ngf_strain_models --> aux_prediction --> aux{{aux}}\\ngf_strain_models --> cnps_prediction --> cnps{{cnps}}\\ngf_strain_models --> biosynth_prediction --> biosynth{{biosynth}}\\ngf_strain_models --> rpam{{rpam}}\\n\\nend\\n\\npanmodel{{panmodel}} & PAM{{PAM}} --> strain_derivation\\n\\n\\n\\nclick strain_derivation href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#deriving-strain-specific-models\" \"Link\"\\nclick gapfilling href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#gap-filling-strain-specific-models\" \"Link\"\\nclick species_derivation href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#deriving-species-specific-models\" \"Link\"\\nclick aux_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\nclick cnps_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\nclick biosynth_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\n\\n\\n\\nstyle Part_3 fill:white\\n\\nstyle strain_models fill:lightblue\\nstyle gf_strain_models fill:gold\\nstyle species_models fill:gold\\nstyle gapfilling fill:whitesmoke\\nstyle aux_prediction fill:whitesmoke\\nstyle cnps_prediction fill:whitesmoke\\nstyle biosynth_prediction fill:whitesmoke\\nstyle panmodel fill:gold\\nstyle PAM fill:gold\\nstyle aux fill:salmon\\nstyle cnps fill:salmon\\nstyle biosynth fill:salmon\\nstyle rpam fill:salmon\\n\\nstyle strain_derivation fill:whitesmoke\\nstyle species_derivation fill:whitesmoke \\n\\n\\n';\n",
       "            const element = document.querySelector('.mermaid-ce422542-bec1-452f-99ec-b54576251bfa');\n",
       "            const { svg } = await mermaid.render('graphDiv-ce422542-bec1-452f-99ec-b54576251bfa', graphDefinition);\n",
       "            element.innerHTML = svg;\n",
       "            \n",
       "            const elem = document.getElementById('graphDiv-ce422542-bec1-452f-99ec-b54576251bfa');\n",
       "            const panzoom = Panzoom(elem, {maxScale: 50});\n",
       "            panzoom.zoom(2);\n",
       "            elem.parentElement.addEventListener('wheel', panzoom.zoomWithWheel);\n",
       "        </script>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from gempipe import Flowchart\n",
    "\n",
    "file = open('flowcharts/part_1.flowchart', 'r')\n",
    "part_1 = file.read()\n",
    "file.close()\n",
    "\n",
    "file = open('flowcharts/autopilot.flowchart', 'r')\n",
    "autopilot = file.read()\n",
    "file.close()\n",
    "\n",
    "file = open('flowcharts/part_3.flowchart', 'r')\n",
    "part_3 = file.read()\n",
    "file.close()\n",
    "\n",
    "header = 'flowchart LR \\n'\n",
    "flowchart = Flowchart(header + part_1 + autopilot + part_3)\n",
    "flowchart.render(height=300, zoom=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30567491-89fa-4907-9520-5f4425cc31cc",
   "metadata": {
    "tags": []
   },
   "source": [
    "# gempipe autopilot\n",
    "\n",
    "`gempipe autopilot` is an additional command line program, which internally calls [`gempipe recon`](part_1_gempipe_recon.ipynb) and [`gempipe derive`](part_3_gempipe_derive.ipynb), linking them together performing an automated gap-filling on the draft pan-GSMM, as a (_discouraged_) alternative to the manual curation. By design, it has the options of both `gempipe recon` and `gempipe derive`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4bf72f77-e116-49ce-a195-29ce6345742e",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "usage: gempipe autopilot [-h] [-v] [-c] [-o] [--verbose] [--overwrite] [--dbs]\n",
      "                         [-t] [-g] [-p] [-gb] [-s] [-b] [--buscoM] [--buscoF]\n",
      "                         [--ncontigs] [--N50] [--identity] [--coverage] [-rm]\n",
      "                         [-rp] [-rs] [-mc] [--tcdb] [--dedup] [--norec]\n",
      "                         [--dbmem] [--sbml] [--nofig] [-md] [-m] [--minflux]\n",
      "                         [--minpanflux] [--biolog] [--aux] [--cnps]\n",
      "                         [--cnps_minmed] [--biosynth]\n",
      "\n",
      "gempipe v1.38.4, please cite \"TODO\". Full documentation available at\n",
      "https://gempipe.readthedocs.io/en/latest/index.html.\n",
      "\n",
      "optional arguments:\n",
      "  -h, --help            Show this help message and exit.\n",
      "  -v, --version         Show version number and exit.\n",
      "  -c , --cores          Number of parallel processes to use. (default: 1)\n",
      "  -o , --outdir         Main output directory (will be created if not\n",
      "                        existing). (default: ./)\n",
      "  --verbose             Make stdout messages more verbose, including debug\n",
      "                        messages. (default: False)\n",
      "  --overwrite           Delete the working/ directory at the startup.\n",
      "                        (default: False)\n",
      "  --dbs                 Path were the needed databases are stored (or\n",
      "                        downloaded if not already existing). (default:\n",
      "                        ./working/dbs/)\n",
      "  -t , --taxids         Taxids of the species to model (comma separated, for\n",
      "                        example '252393,68334'). (default: -)\n",
      "  -g , --genomes        Input genome files or folder containing the genomes\n",
      "                        (see documentation). (default: -)\n",
      "  -p , --proteomes      Input proteome files or folder containing the\n",
      "                        proteomes (see documentation). (default: -)\n",
      "  -gb , --genbanks      Input genbank files (.gb, .gbff) or folder containing\n",
      "                        the genbanks (see documentation). (default: -)\n",
      "  -s , --staining       Gram staining, 'pos' or 'neg'. (default: neg)\n",
      "  -b , --buscodb        Busco database to use ('show' to see the list of\n",
      "                        available databases). (default: bacteria_odb10)\n",
      "  --buscoM              Maximum number of missing Busco's single copy\n",
      "                        orthologs (absolute or percentage). (default: 2%)\n",
      "  --buscoF              Maximum number of fragmented Busco's single copy\n",
      "                        orthologs (absolute or percentage). (default: 100%)\n",
      "  --ncontigs            Maximum number of contigs allowed per genome.\n",
      "                        (default: 200)\n",
      "  --N50                 Minimum N50 allowed per genome. (default: 50000)\n",
      "  --identity            Minimum percentage amino acidic sequence identity to\n",
      "                        use when aligning against the BiGG gene database.\n",
      "                        (default: 30)\n",
      "  --coverage            Minimum percentage coverage to use when aligning\n",
      "                        against the BiGG gene database. (default: 70)\n",
      "  -rm , --refmodel      Model to be used as reference. (default: -)\n",
      "  -rp , --refproteome   Proteome to be used as reference. (default: -)\n",
      "  -rs , --refspont      Reference gene marking spontaneous reactions.\n",
      "                        (default: spontaneous)\n",
      "  -mc , --mancor        Manual corrections to apply during the reference\n",
      "                        expansion. (default: -)\n",
      "  --tcdb                Experimental feature: try to build transport reactions\n",
      "                        using TCDB. (default: False)\n",
      "  --dedup               Try to remove duplicate metabolites and reactions\n",
      "                        using MNX annotation, when a reference is provided.\n",
      "                        (default: False)\n",
      "  --norec               Skip gene recovery when starting from genomes.\n",
      "                        (default: False)\n",
      "  --dbmem               Load the entire eggNOG-mapper database into memory\n",
      "                        (should speed up the functional annotation step).\n",
      "                        (default: False)\n",
      "  --sbml                Save the output GSMMs in SBML format (L3V1 FBC2) in\n",
      "                        addition to JSON. (default: False)\n",
      "  --nofig               Skip the generation of figures. (default: False)\n",
      "  -md , --metadata      Table for manual correction of genome metadata.\n",
      "                        (default: -)\n",
      "  -m , --media          Medium definition file or folder containing media\n",
      "                        definitions, to be used during the automatic gap-\n",
      "                        filling. (default: -)\n",
      "  --minflux             Minimum flux through the objective of strain-specific\n",
      "                        models. (default: 0.1)\n",
      "  --minpanflux          Minimum flux through the objective of the pan model.\n",
      "                        (default: 0.3)\n",
      "  --biolog              Simulate Biolog's utilization tests on strain-specific\n",
      "                        models. (default: False)\n",
      "  --aux                 Test auxotrophies for aminoacids and vitamins.\n",
      "                        (default: False)\n",
      "  --cnps                Sistematically simulate growth on all the available\n",
      "                        C-N-P-S sources. (default: False)\n",
      "  --cnps_minmed         Base the C-N-P-S simulations on a minimal medium\n",
      "                        leading to the specified minimum objective value. If\n",
      "                        0, user-defined medium will be used. (default: 0.0)\n",
      "  --biosynth            Check biosynthesis of each metabolite while granting\n",
      "                        the specified minimum fraction of objective. If 0,\n",
      "                        this step will be skipped. (default: 0.0)\n"
     ]
    }
   ],
   "source": [
    "import subprocess\n",
    "\n",
    "command = f\"\"\"gempipe autopilot -h\"\"\"\n",
    "process = subprocess.Popen(command, shell=True)\n",
    "response = process.wait()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c82cf33-251c-407d-bff7-7195c87201da",
   "metadata": {},
   "source": [
    "With `gempipe autopilot`, an automatic gap-filling is applied to the draft pan-GSMM. The `--minpanflux` parameter specifies the minimal flux through the objective for the pan-GSMM, usually the biomass equation. The gapfilling is repeated once for each growth media indicated with `-m`/`--media`. Instructions on how to encode a medium recipe can be found on [Gap-filling strain-specific models](https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#gap-filling-strain-specific-models). For more information on how this gapfilling step is implemented, please read Methods in the [Gempipe paper](how_to_cite.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "797038d4-bb9d-4681-ac96-c4d6da0ba2f7",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}