{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6ab8296a-c2cf-4eaf-a47c-87d165e01aa4",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%aimport gempipe, gempipe.flowchart\n",
    "%autoreload 1 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b5a6b58b-aa71-4e3c-be9b-2343ee178888",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "        <style> #outcellbox {display: flex; justify-content: center; overflow: hidden; width: 99%; height: 300px; background-color: #ffffff; border: 1px solid grey;} </style>\n",
       "        \n",
       "        <script src=\"https://unpkg.com/@panzoom/panzoom@4.5.1/dist/panzoom.min.js\"></script>\n",
       "\n",
       "        <div class=\"mermaid-cd83c407-3a61-450f-b46d-5fbabb56525a\" id=\"outcellbox\"></div> \n",
       "        <script type=\"module\">\n",
       "            import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10.6.1/+esm';\n",
       "            const graphDefinition = 'flowchart LR \\nsubgraph Part_3[Part 3]\\n\\nstrain_derivation --> strain_models{{strain_models}} --> gapfilling --> gf_strain_models{{gf_strain_models}} \\ngf_strain_models --> species_derivation --> species_models{{species_models}}\\ngf_strain_models --> aux_prediction --> aux{{aux}}\\ngf_strain_models --> cnps_prediction --> cnps{{cnps}}\\ngf_strain_models --> biosynth_prediction --> biosynth{{biosynth}}\\ngf_strain_models --> rpam{{rpam}}\\n\\nend\\n\\npanmodel{{panmodel}} & PAM{{PAM}} --> strain_derivation\\n\\n\\n\\nclick strain_derivation href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#deriving-strain-specific-models\" \"Link\"\\nclick gapfilling href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#gap-filling-strain-specific-models\" \"Link\"\\nclick species_derivation href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#deriving-species-specific-models\" \"Link\"\\nclick aux_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\nclick cnps_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\nclick biosynth_prediction href \"https://gempipe.readthedocs.io/en/latest/part_3_gempipe_derive.html#strain-specific-screenings\" \"Link\"\\n\\n\\n\\nstyle Part_3 fill:white\\n\\nstyle strain_models fill:lightblue\\nstyle gf_strain_models fill:gold\\nstyle species_models fill:gold\\nstyle gapfilling fill:whitesmoke\\nstyle aux_prediction fill:whitesmoke\\nstyle cnps_prediction fill:whitesmoke\\nstyle biosynth_prediction fill:whitesmoke\\nstyle panmodel fill:gold\\nstyle PAM fill:gold\\nstyle aux fill:salmon\\nstyle cnps fill:salmon\\nstyle biosynth fill:salmon\\nstyle rpam fill:salmon\\n\\nstyle strain_derivation fill:whitesmoke\\nstyle species_derivation fill:whitesmoke \\n\\n\\n';\n",
       "            const element = document.querySelector('.mermaid-cd83c407-3a61-450f-b46d-5fbabb56525a');\n",
       "            const { svg } = await mermaid.render('graphDiv-cd83c407-3a61-450f-b46d-5fbabb56525a', graphDefinition);\n",
       "            element.innerHTML = svg;\n",
       "            \n",
       "            const elem = document.getElementById('graphDiv-cd83c407-3a61-450f-b46d-5fbabb56525a');\n",
       "            const panzoom = Panzoom(elem, {maxScale: 50});\n",
       "            panzoom.zoom(1);\n",
       "            elem.parentElement.addEventListener('wheel', panzoom.zoomWithWheel);\n",
       "        </script>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from gempipe import Flowchart\n",
    "\n",
    "file = open('flowcharts/part_3.flowchart', 'r')\n",
    "header = 'flowchart LR \\n'\n",
    "flowchart = Flowchart(header + file.read())\n",
    "file.close()\n",
    "flowchart.render(height=300, zoom=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42513ac2-81c9-42fa-9f02-0fdf5f977092",
   "metadata": {},
   "source": [
    "# Part 3: gempipe derive\n",
    "\n",
    "`gempipe derive` is designed to derive strain-specific GSMMs, using as main inputs the PAM (presence/absence matrix) and the pan-GSMM. Below are shown all its options: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "10f5f830-219f-46e1-94b0-ea8807aa1344",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "usage: gempipe derive [-h] [-v] [-c] [-o] [--verbose] [-im] [-ip] [-ir] [-ig]\n",
      "                      [-m] [--minflux] [--biolog] [--sbml] [--skipgf]\n",
      "                      [--nofig] [--aux] [--cnps] [--cnps_minmed] [--biosynth]\n",
      "\n",
      "gempipe v1.38.4, please cite \"TODO\". Full documentation available at\n",
      "https://gempipe.readthedocs.io/en/latest/index.html.\n",
      "\n",
      "optional arguments:\n",
      "  -h, --help           Show this help message and exit.\n",
      "  -v, --version        Show version number and exit.\n",
      "  -c , --cores         How many parallel processes to use. (default: 1)\n",
      "  -o , --outdir        Main output directory (will be created if not\n",
      "                       existing). (default: ./)\n",
      "  --verbose            Make stdout messages more verbose, including debug\n",
      "                       messages. (default: False)\n",
      "  -im , --inpanmodel   Path to the input pan-model. (default: -)\n",
      "  -ip , --inpam        Path to the input PAM. (default: -)\n",
      "  -ir , --inreport     Path to the input report file. (default: -)\n",
      "  -ig , --ingannots    Path to the input genes annotation file. (default: -)\n",
      "  -m , --media         Medium definition file or folder containing media\n",
      "                       definitions, to be used during the automatic gap-\n",
      "                       filling. (default: -)\n",
      "  --minflux            Minimum flux through the objective of strain-specific\n",
      "                       models. (default: 0.1)\n",
      "  --biolog             Simulate Biolog's utilization tests on strain-specific\n",
      "                       models. (default: False)\n",
      "  --sbml               Save the output GSMMs in SBML format (L3V1 FBC2) in\n",
      "                       addition to JSON. (default: False)\n",
      "  --skipgf             Skip the gap-filling step applied to the strain-\n",
      "                       specific models. (default: False)\n",
      "  --nofig              Skip the generation of figures. (default: False)\n",
      "  --aux                Test auxotrophies for aminoacids and vitamins.\n",
      "                       (default: False)\n",
      "  --cnps               Sistematically simulate growth on all the available\n",
      "                       C-N-P-S sources. (default: False)\n",
      "  --cnps_minmed        Base the C-N-P-S simulations on a minimal medium\n",
      "                       leading to the specified minimum objective value. If 0,\n",
      "                       user-defined medium will be used. (default: 0.0)\n",
      "  --biosynth           Check biosynthesis of each metabolite while granting\n",
      "                       the specified minimum fraction of objective. If 0, this\n",
      "                       step will be skipped. (default: 0.0)\n"
     ]
    }
   ],
   "source": [
    "import subprocess\n",
    "\n",
    "command = f\"\"\"gempipe derive -h\"\"\"\n",
    "process = subprocess.Popen(command, shell=True)\n",
    "response = process.wait()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1f1c1a8-4246-4b77-bfcd-755a6bfb1cb2",
   "metadata": {},
   "source": [
    "## Deriving strain-specific models\n",
    "\n",
    "Three inputs are required by `gempipe derive` to start: \n",
    "\n",
    "1. The pan-GSMM (`-im`/`--inpanmodel`), in its final version. Ideally, the  pan-GSMM was produced with `gempipe recon` ([Part 1](part_1_gempipe_recon.ipynb)) and subsequently curated ([Part 2](part_2_manual_curation.ipynb)).\n",
    "\n",
    "2. The presence/absence matrix (PAM) (`-ip`/`--inpam`), as it was provided by `gempipe recon`. As discussed in [Part 1](https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#creation-of-a-pam), this table contains clusters in row and strains in column, and each cell contains the strain's genes belonging to the corresponding cluster.\n",
    "\n",
    "3. The report (`-ir`/`--inreport`) produced by `gempipe recon`, contaning the strain-to-species relationships. \n",
    "\n",
    "An example of command line is reported below: \n",
    "\n",
    "    gempipe derive -c 16 -im curation/panmodel.json -ip reconout/pam.csv -ir reconout/report.csv\n",
    "\n",
    "When `gempipe derive` starts, a species-specific GSMM is created for each genome appearing in the PAM. This procedure may be called \"derivation by subtraction\", meaning that a copy of the pan-GSMM is subtracted from all the genes that are missing from the corresponding strain, according to the PAM content. Removing genes can eventually lead to the removal of metabolic reactions, in accordance to the GPR definitions. \n",
    "\n",
    "💡 **Tip!** If [`gempipe recon`](part_1_gempipe_recon.ipynb) started from genbanks, then a genome annotation file was produced. It can be passed to `gempipe derive` using `-ig`/`--ingannots` to improve the \"gene annotation\" section of the [Memote reports](https://memote.readthedocs.io/en/latest/index.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5690d1d-1238-45eb-87e8-02d1d2b570d8",
   "metadata": {},
   "source": [
    "## Gap-filling strain-specific models\n",
    "\n",
    "After strain-specific GSMMs are derived, they may not grow. The reasons are several:\n",
    "\n",
    "* the biomass equation coming from the pan-GSMM contains some strain-specific precursors. In this case, it would be better to first remove the strain-specific precursors from the biomass assembly during the manual curation in [Part 2](part_2_manual_curation.ipynb).\n",
    "\n",
    "* the GPR of some reactions essential to form biomass doesn't take into account strain-specific gene variants. By design, the [clustering step](https://gempipe.readthedocs.io/en/latest/part_1_gempipe_recon.html#creation-of-a-pam) in `gempipe recon` is performed at high sequence identity (90%), meaning that two orthologous genes, if sufficiently dissimilar, may be grouped in different clusters even if the function is the same. Despite `gempipe recon` does its best to manage these cases, it could be necessary to expand some GPR during the manual curation in [Part 2](part_2_manual_curation.ipynb).\n",
    "\n",
    "To solve the cases where a strain-specific GSMM doesn't grow, an automated and strain-specific step of **gap-filling** is performed. As the gap-filling is dependant on the nutritional inputs, one or more growth media recipes, better if minimal, have to be provided (`--media`): they are assumed to support growth of **all** the strains in input. Despite this automated gap-filling, each strain that doesn't grow should deserve a manual inspection, to see if one of the cases discussed above has occurred, leading to an improved manual curation.  \n",
    "\n",
    "Medium recipes can be described using a **json file** following a rigid syntax, composed by the keys `name` and `exchanges`, plus the lower bounds of each component. For example:\n",
    "\n",
    "```\n",
    "{\"name\": \"my_medium_name\", \"exchanges\": {\"EX_ca2_e\": -1000, \"EX_cl_e\": -1000, \"EX_co2_e\": -1000, \"EX_cobalt2_e\": -1000, \"EX_cu2_e\": -1000, \"EX_h_e\": -1000, \"EX_fe2_e\": -1000, \"EX_fe3_e\": -1000, \"EX_h2o_e\": -1000, \"EX_pi_e\": -1000, \"EX_glc__D_e\": -1000, \"EX_so4_e\": -1000, \"EX_k_e\": -1000, \"EX_zn2_e\": -1000, \"EX_mg2_e\": -1000, \"EX_mn2_e\": -1000, \"EX_mobd_e\": -1000, \"EX_nh4_e\": -1000, \"EX_o2_e\": -1000}}\n",
    "```\n",
    "\n",
    "However, please note that the gap-filling may hide strain-specific auxotrophies. If the detection of auxotrophies is of interest, and the medium provided is not rich (e.g. it does not include amino acids and vitamins), then the gap-filling phase should be skipped using `--skipgf`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb963baf-27e9-49d5-8aab-ef5963c0580b",
   "metadata": {},
   "source": [
    "## Biolog® PM simulations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "199d5ede-9463-4c16-83a7-9c5d4bcd7f2b",
   "metadata": {},
   "source": [
    "[Biolog® PM](https://www.biolog.com/products/metabolic-characterization-microplates/microbial-phenotype/) growth assays are commonly used to drive manual curation and to validate a GSMM. Using the `--biolog` option, the user can automatically simulate all the main [Biolog® PM plates](https://www.biolog.com/wp-content/uploads/2024/06/00A-042-Rev-E-Phenotype-MicroArrays-1-10-Plate-Maps.pdf) (such as PM1, PM2A, PM3, PM4, and so on) for each modeled strain. Alternatively, the same function can be interactively called via the [Gempipe API](autoapi/gempipe/interface/index) (see the function [gempipe.biolog_preview](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.biolog_preview)). \n",
    "\n",
    "With this feature, substrates composing the Biolog® PM plates are iterated and tested for utilization, see below for an example output. The same substrate (`substrate`) can be considered as a C, N, P, or S source (`source`) according to the plate or origin; in these cases, a different growth simulation will be performed for each nutritive source. For example, if the starting C source is glucose (`exr_before`), then it is first removed and an FBA is performed; then the new C source (`exr_after`) is introduced, if it is present in the GSMM (`exr_after_present`), and the FBA is performed again; if the objective value increases, then the strain is assumed to be able to use the substrate (`growth=True`). For more information please read the [Gempipe paper](how_to_cite.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8cd70cbc-fd35-4394-b516-37a7b67d168e",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>substrate</th>\n",
       "      <th>source</th>\n",
       "      <th>formula</th>\n",
       "      <th>exr_before</th>\n",
       "      <th>exr_before_present</th>\n",
       "      <th>exr_after</th>\n",
       "      <th>exr_after_present</th>\n",
       "      <th>growth</th>\n",
       "      <th>value</th>\n",
       "      <th>status</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>L-Arabinose</td>\n",
       "      <td>C</td>\n",
       "      <td>NaN</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_arab__L_e</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>D-Saccharic Acid</td>\n",
       "      <td>C</td>\n",
       "      <td>C6H8O8</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_glcr_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>19.728400</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>D-Galactose</td>\n",
       "      <td>C</td>\n",
       "      <td>C6H12O6</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_gal_e</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>L-Aspartic Acid</td>\n",
       "      <td>N</td>\n",
       "      <td>C4H6NO4</td>\n",
       "      <td>EX_nh4_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_asp__L_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>28.743739</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>L-Aspartic Acid</td>\n",
       "      <td>C</td>\n",
       "      <td>C4H6NO4</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_asp__L_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>17.614535</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>646</th>\n",
       "      <td>Thiourea</td>\n",
       "      <td>S</td>\n",
       "      <td>NaN</td>\n",
       "      <td>EX_so4_e</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>647</th>\n",
       "      <td>D-Serine</td>\n",
       "      <td>N</td>\n",
       "      <td>C3H7NO3</td>\n",
       "      <td>EX_nh4_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_ser__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>34.731331</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>648</th>\n",
       "      <td>D-Serine</td>\n",
       "      <td>C</td>\n",
       "      <td>C3H7NO3</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_ser__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>33.295776</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>649</th>\n",
       "      <td>Succinic Acid</td>\n",
       "      <td>C</td>\n",
       "      <td>C4H4O4</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_succ_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>17.473351</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>650</th>\n",
       "      <td>Fumaric Acid</td>\n",
       "      <td>C</td>\n",
       "      <td>C4H2O4</td>\n",
       "      <td>EX_glc__D_e</td>\n",
       "      <td>True</td>\n",
       "      <td>EX_fum_e</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>17.614535</td>\n",
       "      <td>optimal</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>651 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            substrate source  formula   exr_before  exr_before_present   \n",
       "0         L-Arabinose      C      NaN  EX_glc__D_e                True  \\\n",
       "1    D-Saccharic Acid      C   C6H8O8  EX_glc__D_e                True   \n",
       "2         D-Galactose      C  C6H12O6  EX_glc__D_e                True   \n",
       "3     L-Aspartic Acid      N  C4H6NO4     EX_nh4_e                True   \n",
       "4     L-Aspartic Acid      C  C4H6NO4  EX_glc__D_e                True   \n",
       "..                ...    ...      ...          ...                 ...   \n",
       "646          Thiourea      S      NaN     EX_so4_e                True   \n",
       "647          D-Serine      N  C3H7NO3     EX_nh4_e                True   \n",
       "648          D-Serine      C  C3H7NO3  EX_glc__D_e                True   \n",
       "649     Succinic Acid      C   C4H4O4  EX_glc__D_e                True   \n",
       "650      Fumaric Acid      C   C4H2O4  EX_glc__D_e                True   \n",
       "\n",
       "        exr_after  exr_after_present growth      value   status  \n",
       "0    EX_arab__L_e              False    NaN        NaN      NaN  \n",
       "1       EX_glcr_e               True   True  19.728400  optimal  \n",
       "2        EX_gal_e               True  False   0.000000  optimal  \n",
       "3     EX_asp__L_e               True   True  28.743739  optimal  \n",
       "4     EX_asp__L_e               True   True  17.614535  optimal  \n",
       "..            ...                ...    ...        ...      ...  \n",
       "646           NaN              False    NaN        NaN      NaN  \n",
       "647   EX_ser__D_e               True   True  34.731331  optimal  \n",
       "648   EX_ser__D_e               True   True  33.295776  optimal  \n",
       "649     EX_succ_e               True   True  17.473351  optimal  \n",
       "650      EX_fum_e               True   True  17.614535  optimal  \n",
       "\n",
       "[651 rows x 10 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pnd\n",
    "\n",
    "biolog_example = pnd.read_csv('tutoring_materials/K27.csv', index_col=0)\n",
    "biolog_example"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d7dc9bf-a7fe-4de6-926f-9a32bb4f7ded",
   "metadata": {},
   "source": [
    "## Strain-specific screenings\n",
    "\n",
    "GSMMs can be used to predict strain-specific metabolic features, on which a multi-strain analysis can be based (see [Part 4](part_4_multi_strain_analysis.ipynb)). With `gempipe derive`, for each strain, it is possible to predict some of these features. They are binary, meaning that only their presence/absence is discribed, not their magnitude.\n",
    "\n",
    "Using the `--aux` option, each strain is screened for auxotrophies for amino acids and vitamins. The output is a binary feature table (`aux.csv`) having strains in column and amino acids or vitamins in row, for which the auxotrophy is evaluated. Cells contain 1 when the auxotrophy is present, otherwise 0.\n",
    "\n",
    "Using the `--cnps` option, the ability to catabolize alternative C/N/P/S sources is evaluated for each strain. Similarly to Biolog substrates, a starting C source is first removed and an FBA is performed; then an alternative C source is introduced and the FBA is performed again; if the objective value increases, this indicates that the strain can potentially catabolize the alternative source. This is iterated for every C source having a dedicated exchange reaction in the GSMM, then alternative N, P, and S sources are evaluated using the same procedure. The output is a binary feature table (`cnps.csv`) containing 1 where a catabolic activity is detected, otherwise 0. To perform this analysis on a computed minimal medium, add the `--cnps_minmed` option.\n",
    "\n",
    "Using the `--biosynth` option, the potential biosynthesis of target metabolites, is evaluated for each strain. Biomass production is constrained to a user-specified fraction of its maximum. The resulting a binary feature table (`biosynth.csv`) contains 1 when strains are considered potential producers of the tested metabolite. By default, all cytosolic metabolites are tested. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c91b0213-2c2b-42b4-90d4-d2343b00abed",
   "metadata": {},
   "source": [
    "## Deriving species-specific models\n",
    "\n",
    "A species-specific GSMM is here defined as a GSMM composed by all the reactions **always** present in **all** the strain-specific GSMMs of the same species. Having a representative GSMM for each species enables to perform **comparisons** at the species level, detecting for example which pathways distinguish a species from other phylogenetically close species.\n",
    "\n",
    "In `gempipe derive`, after gap-filling the strain-specific GSMMs, one species-specific GSMM is derived for each of the species in input. Moreover, like strain-specific GSMMs, species-specific GSMMs are also gapfilled on the media indicated with `--media`, unless `--skipgf` is activated. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1e1ffd9-70f0-4da7-adea-601c89c38939",
   "metadata": {},
   "source": [
    "## Output files\n",
    "\n",
    "`gempipe derive` produces 2 **main** output files in the current directory (`-o`/`--outdir`):\n",
    "\n",
    "* **strain_models_gf/*.json.** The gap-filled strain-specific GSMMs.\n",
    "\n",
    "* **species_model_gf/*.json.** The gap-filled species-specific GSMMs.\n",
    "\n",
    "Other useful files are the following:\n",
    "\n",
    "* **strain_models/*.json.** The strain-specific GSMMs, before gap-filling. \n",
    "\n",
    "* **species_models/*.json.** The species-specific GSMMs, before gap-filling. \n",
    "\n",
    "* **derive_strains.csv.** Table showing metrics for each strain-specific GSMM, such as the number of reactions, the objective value and the solver status, before and after the gap-filling step. \n",
    "\n",
    "* **derive_species.csv.** Table showing metrics for each species-specific GSMM, such as the number of reaction, the objective value and the solver status. One row for each species inputted in `gempipe recon`.\n",
    "\n",
    "* **rpam.csv** Binary feature table reporting the presence of each reaction in each strain-specific GSMM.\n",
    "\n",
    "* **aux.csv** Binary feature table reporting the presence amino acids and vitamins auxotrophies for each strain-sspecific GSMM (requires `--aux`).\n",
    "\n",
    "* **cnps.csv** Binary feature table reporting the potential catabolic activity for alternative C, N, P, and S sources (requires `--cnps`).\n",
    "\n",
    "* **biosynth.csv** Binary feature table reporting potential biosynthetic activities (requires `--biosynth`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82206a8b-97e7-409b-9716-c8119095f691",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}