{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "21ff81aa-2c70-470f-8e45-47ca02741f8f",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%aimport gempipe, gempipe.interface, gempipe.interface.sanity, gempipe.interface.gaps, gempipe.interface.medium\n",
    "%autoreload 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2167d1a0-e0b8-4cf6-b37b-a27e22865fd9",
   "metadata": {},
   "source": [
    "# _Tutorial:_ sanity check"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15f5aad9-6693-4b83-a414-f79513357622",
   "metadata": {},
   "source": [
    "The main output of `gempipe recon` is a folder (`-o`/`--outdir`) containing, among the other files, the draft pan-GSMM. Like every draft GSMM, it needs some sanity checks, and eventually some gap-filling. In this tutorial we show some handy functions contained in Gempipe to perform the sanity check. This is not meant to replace the community effort [Memote](https://memote.readthedocs.io/en/latest/), but just to provide a quick and convenient way to check the main sanity standards before going on deriving strain-specific GSMMs with `gempipe derive`. \n",
    "\n",
    "Here we start from an expanded **reference**-based reconstruction created with `gempipe recon`, based on a curated model for _Lactiplantibacillus plantarum_ (formerly _Lactobacillus plantarum_). This reference was originally created in [Teusink2006](https://doi.org/10.1074/jbc.M606263200), but the GSMM used during this reconstruction is an updated version taken from the supplementary materials of [Mendoza2021](https://doi.org/10.1186/s13059-019-1769-1). 22 _L. plantarum_ genomes were selected from the strains included in the comparative analysis by [Siezen2010](https://doi.org/10.1111/j.1462-2920.2009.02119.x).\n",
    "\n",
    "    gempipe recon -s pos -g plantarum_genomes/ -b lactobacillales_odb10 -c 16 -rm from_mendoza/iLP728.xml -rp from_mendoza/protein_fasta.faa -mc mancor.txt --outdir docs/tutoring_materials/plantarum\n",
    "    \n",
    "To begin, we load the library and the draft pan-GSMM we want to check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "93f17d33-0266-44f7-b26b-5d73dabe8af3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading PAM (tutoring_materials/plantarum/pam.csv)...\n",
      "Loading functional annotation table (tutoring_materials/plantarum/annotation.csv)...\n",
      "Loading report table (tutoring_materials/plantarum/report.csv)...\n",
      "Loading draft pan-GSMM (tutoring_materials/plantarum/draft_panmodel.json)...\n"
     ]
    }
   ],
   "source": [
    "import gempipe\n",
    "\n",
    "# initialize gempipe on the 'gempipe recon' --outdir:\n",
    "panmodel = gempipe.initialize(\"tutoring_materials/plantarum\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17e7365a-ab14-4480-83f8-1e97ff925bd4",
   "metadata": {},
   "source": [
    "We start from setting the unconstrained lower and upper bounds. Originally they were set as -999999/999999, and this doesn't represent an issue. Anyway, we prefer to have them in the style -1000/1000, that seems more established today. We use the function [gempipe.reset_unconstrained_bounds](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.reset_unconstrained_bounds):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e312222d-a402-40ef-a6a7-0534dc811f08",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(-999999.0, 999999.0)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the current unconstrained lb/ub:\n",
    "gempipe.get_unconstrained_bounds(panmodel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ed10f80c-8eaf-49bf-9419-36cf660494a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(-1000, 1000)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# set the unconstrained lb/ub to -1000/1000:\n",
    "gempipe.reset_unconstrained_bounds(panmodel)\n",
    "\n",
    "# verify the edit:\n",
    "gempipe.get_unconstrained_bounds(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e3cdc9e-c30f-4990-a28a-f248ce822dbe",
   "metadata": {},
   "source": [
    "Below we seek to identify the biomass assembly reaction of this pan-GSMM using the [gempipe.search_biomass](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.search_biomass) function. The model appears to contain two alternative definitions, one of them lacking the GAM (growth-associated maintainence) term, which is also set as the current **objective**. It's usually adopted to simulate growth in a [retentostat](https://doi.org/10.1038/msb.2010.67), that is a particular kind of chemostat for studying physiology at near-zero growth rates.\n",
    "\n",
    "💡 **Tip!** In [gempipe.search_biomass](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.search_biomass), use `show_reaction=True` to print also the full biomass reaction string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c7edbda3-e605-4596-8264-f7aede69219b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : biomass_LPL_RETB_t576_NoATP : R_biomass_equation_LPL_Retentostat_B_t576h_No_ATP_costs : (0.000435, 1000.0)\n",
      "2 : biomass_LPL60 : R_biomass_equation_LPL_specific : (0.0, 0.0)\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.search_biomass(panmodel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5490314c-5961-44b6-9659-d41e635cb8c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['biomass_LPL_RETB_t576_NoATP']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gempipe.get_objectives(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc8fc893-f6da-45e3-9079-49ae357803db",
   "metadata": {},
   "source": [
    "Since we are interesting in \"classic\" growth simulations, we need to (1) reset the objective, and (2) **close** the bounds of the other biomass reaction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5ef0d822-3820-4210-863b-834172c6bb22",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the new objective:\n",
    "panmodel.objective = 'biomass_LPL60'\n",
    "\n",
    "# enable biomass formation:\n",
    "panmodel.reactions.get_by_id('biomass_LPL60').bounds = (0, 1000)\n",
    "\n",
    "# close the other biomass assembly:\n",
    "panmodel.reactions.get_by_id('biomass_LPL_RETB_t576_NoATP').bounds = (0, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae042a80-dd78-4829-ad43-55fcdec38718",
   "metadata": {},
   "source": [
    "Now it's time for a preliminary check of wheter the growth predictions are realistic or not. First of all, we need to define the growth conditions. Instead of using the medium **concentrations** like in the [gap-filling tutorial](tutorial_gapfilling.ipynb), here we set the exchange reactions to the uptake and secretion **rates** experimentally determined in [Teusink2006](https://doi.org/10.1074/jbc.M606263200) for the growth rate **0.3** 1/h. _L. plantarum_ was grown in a 25 mM glucose chemically defined medium (CDM) with a long list of components: those lacking an experimentally determined rate are left unconstrained.\n",
    "\n",
    "💡 **Tip!** In [gempipe.set_bounded_uptakes](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/medium/index.html#gempipe.interface.medium.set_bounded_uptakes) and [gempipe.set_bounded_secretions](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/medium/index.html#gempipe.interface.medium.set_bounded_secretions), bounds can be provided also as a pair of (value, error), if the **experimental error** is available (for example: `'EX_glc__D_e': (11.609, 0.5311)`). Otherwise, just an absolute value is sufficient (for example: `'EX_glc__D_e': 11.609`).\n",
    "\n",
    "As we can see below, the predicted growth rate looks realistic. Later in this tutorial we will try to simulate using medium concentrations too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "de670b73-dbed-4ca3-903d-8b5160f4e2af",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.36310820624546114"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# define the growth conditions:\n",
    "def apply_rates(model):\n",
    "    gempipe.reset_growth_env(model)\n",
    "    gempipe.set_bounded_uptakes(model, {'EX_glc__D_e': (7.41, 0.339), 'EX_cit_e': (0.6, 0.063), 'EX_ala__L_e': (0.03, 0.006), 'EX_arg__L_e': (0.036, 0.006), 'EX_asp__L_e': (0.147, 0.021), 'EX_cys__L_e': (0.021, 0.009), 'EX_glu__L_e': (0.096, 0.009), 'EX_gly_e': (0.078, 0.021), 'EX_ile__L_e': (0.042, 0.003), 'EX_leu__L_e': (0.066, 0.027), 'EX_lys__L_e': (0.051, 0.006), 'EX_phe__L_e': (0.027, 0.003), 'EX_pro__L_e': (0.054, 0.03), 'EX_ser__L_e': (0.474, 0.318), 'EX_thr__L_e': (0.21, 0.054), 'EX_tyr__L_e': (0.069, 0.033), 'EX_val__L_e': (0.096, 0.015)}) \n",
    "    gempipe.set_bounded_secretions(model, {'EX_lac__D_e': (12.15, 0.495), 'EX_pyr_e': (0.06, 0.009), 'EX_for_e': (1.17, 0.513), 'EX_ac_e': (2.04, 0.444), 'EX_etoh_e': (0.42, 0.114), 'EX_succ_e': (0.72, 0.015)})\n",
    "    gempipe.set_unbounded_exchanges(model, ['EX_h2o_e', 'EX_h_e'])  \n",
    "    # other components in CDM (no experimental rate provided):\n",
    "    gempipe.set_unbounded_exchanges(model, ['EX_pnto__R_e', 'EX_btn_e', 'EX_nac_e', 'EX_4abz_e', 'EX_pydam_e', 'EX_pydxn_e', 'EX_ribflv_e', 'EX_thm_e', 'EX_fol_e', 'EX_pi_e', 'EX_na1_e', 'EX_nh4_e', 'EX_mn2_e', 'EX_ade_e', 'EX_gua_e', 'EX_ins_e', 'EX_orot_e', 'EX_thymd_e', 'EX_ura_e', 'EX_xan_e', 'EX_his__L_e', 'EX_met__L_e', 'EX_trp__L_e'])\n",
    "\n",
    "    \n",
    "# apply medium to the panmodel:\n",
    "apply_rates(panmodel)\n",
    "\n",
    "# simulate biomass production:\n",
    "panmodel.slim_optimize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38c4a19e-8e44-4813-9db0-b3c70811a089",
   "metadata": {},
   "source": [
    "We now check the eventual presence of **sink** and **demand** reactions, using the function [gempipe.check_sinks_demands](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.check_sinks_demands). There is a demand reaction for the metabolite \"4-hydroxy-5-methyl-3(2H)-furanone\" (`hmfurn_c`). Looking at [literature](https://doi.org/10.1021/jf60160a008), we know that this metabolite is volatile, and must be able to leave the cell. Without any information on specific transporters, it's legitimate to leave this demand reaction, otherwise the FBA would result to be infeasible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "fde53897-6444-4fe1-a380-b14e18cbfd1e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : DM_hmfurn_c : hmfurn_c -->  : (0.0, 1000)\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_sinks_demands(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2214abc-bc43-44d6-bb08-be4f50532cc9",
   "metadata": {},
   "source": [
    "Next, we check that every exchange reaction in the model has an ID starting with `EX_`. This way, we respect **conventions**. Here, exchange reactions are defined simply as reaction having a single metabolite involved, present in the extracellular compartment. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "783b2ad0-4854-4f81-8110-3b8fdc4200d0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No EX_change reaction with bad ID found.\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_exr_notation(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "930cee63-5450-483e-a980-2d6ef20d7b65",
   "metadata": {},
   "source": [
    "We now check the presence of metabolic reactions (no exchange, demand or sink reactions) with **constrained** bounds, which we define as everything except for (-1000, 1000) or reversible, and (-0, 1000) or irreversible. This means that reactions defined to go in the opposite direction will also be highlighted. We note for example, that 3 reactions have been closed, such as the catalase, inherited from the original reference GSMM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ac0c6e17-ffd8-4159-88b1-236ff0eec743",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : CAT : (0.0, 0.0) : 2.0 h2o2_c --> 2.0 h2o_c + o2_c\n",
      "2 : RHCYS : (0.0, 0.0) : h2o_c + rhcys_c --> hcys__L_c + rib__D_c\n",
      "3 : CYTB_B2 : (0.0, 0.0) : 2.0 h_c + mql7_c + 0.5 o2_c --> h2o_c + 2.0 h_e + mqn7_c\n",
      "4 : AH6PI : (-1000.0, 0.0) : ah6p__D_c <-- f6p_c\n",
      "5 : CLt3_2pp : (-1000.0, 0.0) : 2.0 cl_p + h_c <-- 2.0 cl_c + h_p\n",
      "6 : INOSR : (-1000.0, 0.0) : inost_c + nadp_c <-- 2ins_c + h_c + nadph_c\n",
      "7 : SPMDt3i : (-1000.0, 0.0) : h_c + spmd_e <-- h_e + spmd_c\n",
      "8 : SUCD1 : (-1000.0, 0.0) : fad_c + succ_c <-- fadh2_c + fum_c\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_constrained_metabolic(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "516b020c-3e89-4894-a44d-a168774628a8",
   "metadata": {},
   "source": [
    "Going on, we check which metabolites are using **artificial** atoms, here defined as those not appearing on the periodic table. For example, it's common to see fatty acids linked to ACP (acyl-carrier protein) being modeled with 'X' in the formula: in these cases, a common **paradigm** is used which involves the sobstitution of the ACP part with just an 'X'. We should be aware of all the modeling paradigms adopted in our GSMMs. Below we get an overview using the [gempipe.check_artificial_atoms](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.check_artificial_atoms) function. We can see that, apart from ACP-fatty acids, also the oxidized/reduced thioredoxin (`trdox_c`/`trdrd_c`) is modeled with X. Moreover, all the transfer-RNA are modeled with 'R', together with `apoACP_c`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f9d8faaa-200b-48f4-9979-004fb5c6cead",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : X : but2eACP_c[C4H4OX], 2chdeacp_c[C16H28OX], 2cocdacp_c[C18H32OX], tddec2eACP_c[C12H20OX], tdec2eACP_c[C10H16OX], tpalm2eACP_c[C16H28OX], thex2eACP_c[C6H8OX], toctd2eACP_c[C18H32OX], toct2eACP_c[C8H12OX], tmrs2eACP_c[C14H24OX], 3haACP_c[C4H6O2X], 3hddecACP_c[C12H22O2X], 3hdecACP_c[C10H18O2X], 3hhexACP_c[C6H10O2X], 3hoctACP_c[C8H14O2X], 3hoctaACP_c[C18H34O2X], 3hpalmACP_c[C16H30O2X], 3hmrsACP_c[C14H26O2X], 3oddecACP_c[C12H20O2X], 3odecACP_c[C10H16O2X], 3ohexACP_c[C6H8O2X], 3opalmACP_c[C16H28O2X], 3ooctACP_c[C8H12O2X], 3ooctdACP_c[C18H32O2X], 3omrsACP_c[C14H24O2X], actACP_c[C4H4O2X], ACP_c[X], butACP_c[C4H6OX], cpocdacp_c[C19H34OX], ddcaACP_c[C12H22OX], dcaACP_c[C10H18OX], palmACP_c[C16H30OX], hexACP_c[C6H10OX], malACP_c[C3HO3X], ocdcaACP_c[C18H34OX], ocACP_c[C8H14OX], myrsACP_c[C14H26OX], trdox_c[X], trdrd_c[XH2], 3hcmrs7eACP_c[C14H25O2X], t3c7mrseACP_c[C14H23OX], tdeACP_c[C14H25OX]\n",
      "2 : R : alatrna_c[C18H28NO17P2R3], apoACP_c[RHO], argtrna_c[C21H36N4O17P2R3], asntrna_c[C19H29N2O18P2R3], asptrna_c[C19H27NO19P2R3], cystrna_c[C18H28NO17P2SR3], glntrna_c[C20H31N2O18P2R3], glutrna_c[C20H29NO19P2R3], glytrna_c[C17H26NO17P2R3], histrna_c[C21H30N3O17P2R3], iletrna_c[C21H34NO17P2R3], leutrna_c[C21H34O17P2R3N], lystrna_c[C21H36N2O17P2R3], mettrna_c[C20H32NO17P2SR3], phetrna_c[C24H32NO17P2R3], protrna_c[C20H30NO17P2R3], sertrna_c[C18H28NO18P2R3], thrtrna_c[C19H30NO18P2R3], trnaala_c[C15H23O16P2R3], trnaarg_c[C15H23O16P2R3], trnaasn_c[C15H23O16P2R3], trnaasp_c[C15H23O16P2R3], trnacys_c[C15H23O16P2R3], trnaglu_c[C15H23O16P2R3], trnagly_c[C15H23O16P2R3], trnahis_c[C15H23O16P2R3], trnaile_c[C15H23O16P2R3], trnaleu_c[C15H23O16P2R3], trnalys_c[C15H23O16P2R3], trnamet_c[C15H23O16P2R3], trnaphe_c[C15H23O16P2R3], trnapro_c[C15H23O16P2R3], trnaser_c[C15H23O16P2R3], trnathr_c[C15H23O16P2R3], trnatrp_c[C15H23O16P2R3], trnatyr_c[C15H23O16P2R3], trnaval_c[C15H23O16P2R3], trptrna_c[C26H33N2O17P2R3], tyrtrna_c[C24H32NO18P2R3], valtrna_c[C20H32NO17P2R3]\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_artificial_atoms(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd441d1c-5fd7-4ee2-a089-cb8b39eef157",
   "metadata": {},
   "source": [
    "Going on, we check that each metabolite in the GSMM has a chemical **formula** and a **charge**. We use two Gempipe functions, [gempipe.check_missing_formulas](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.check_missing_formulas) and [gempipe.check_missing_charges](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.check_missing_charges), respectively. We aim at having all metabolites fully defined. Two metabolites without a defined chemical formula pop out: `MCOOH_c` is the small subunit of the molybdopterin (MPT) synhtese, while `MCOSH_c` is its sulfurylated form. We are going to fix them in the next few cells. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f2dd4f35-4e12-4b87-aa71-269b449c76c8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : MCOOH_c\n",
      "2 : MCOSH_c\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_missing_formulas(panmodel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b48bafc8-1194-4e88-b694-28045ebdbb77",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No metabolite with missing charge attribute found.\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_missing_charges(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e61b3d9-6f88-4141-9a43-226b62ec4d78",
   "metadata": {},
   "source": [
    "But first, let's have an overview of which reactions are currently **unbalanced** in mass or charge. This is a crucial part of the sanity check, as unbalances may alter predictions. Keep in mind that, during `gempipe recon`, the user can provide a text file of manual corrections (`-m`/`--mancor`) to be applied during the phase of reference expansion (see [Part 1](part_1_gempipe_recon.ipynb)). These corrections apply to the reactions coming from the reference-free reconstruction to be inserted into the reference GSMM. This means that reactions contained in the reference are assumed as good, and are not affected by `gempipe recon`. Anyway, as we see below, the reference may still have some imbalances, which we therefore find again in the reconstructed pan-GSMM: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "61d263f1-885f-45d0-9c40-99912043b371",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : ACPS1 : apoACP_c + coa_c --> ACP_c + h_c + pap_c : {'X': 1.0}\n",
      "2 : THZPSN2 : MCOSH_c + dxyl5p_c + gly_c + nadp_c --> 4mpetz_c + MCOOH_c + co2_c + 2.0 h2o_c + 2.0 h_c + nadph_c : {'H': 1.0, 'S': 1.0}\n",
      "3 : MPTS_LPL : 2.0 MCOSH_c + cpmp_c + h2o_c --> 2.0 MCOOH_c + 3.0 h_c + mpt_c : {'H': 2.0, 'S': 2.0}\n",
      "4 : MOADCST : MCOOH_c + atp_c + cys__L_c + h2o_c --> MCOSH_c + amp_c + ppi_c + ser__L_c : {'O': 1.0}\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_mass_unbalances(panmodel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "dbd79203-3498-4ba9-97d5-7aa56900ac5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 : ACPS1 : apoACP_c + coa_c --> ACP_c + h_c + pap_c : {'charge': 1.0}\n",
      "2 : LTAS_LPL : 0.01 dgdag_LPL_c + 0.25 pg_LPL_c --> 0.01 LTA_LPL_c + 0.25 dag_LPL_c : {'charge': 25.0}\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_charge_unbalances(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36680565-f094-427d-833e-fd02aa4c1373",
   "metadata": {},
   "source": [
    "In the next cell, we are going to set some chemical formulas and charges in order to: (1) put a formula where was lacking (MTP subunit, see above), and (2) solve all the unbalances of the model. Doing so, as we see below, the resulting pan-GSMM no longer contains unbalanced reactions. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "c4ca9c73-44e2-4102-87c1-8db8ed20cb8d",
   "metadata": {},
   "outputs": [],
   "source": [
    "panmodel.metabolites.get_by_id(\"MCOSH_c\").formula = 'XSH'\n",
    "panmodel.metabolites.get_by_id(\"MCOOH_c\").formula = 'XO'\n",
    "\n",
    "panmodel.metabolites.get_by_id(\"apoACP_c\").formula = 'X'\n",
    "panmodel.metabolites.get_by_id(\"apoACP_c\").charge = 1\n",
    "\n",
    "panmodel.metabolites.get_by_id(\"ACP_c\").formula = 'X'\n",
    "panmodel.metabolites.get_by_id(\"ACP_c\").charge = 0\n",
    "\n",
    "panmodel.metabolites.get_by_id('LTA_LPL_c').charge = -2500\n",
    "panmodel.metabolites.get_by_id('LTAglc_LPL_c').charge = -2500"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6cec38f6-2b11-4919-a260-c5fda89d5b29",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No mass-unbalanced reactions found.\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_mass_unbalances(panmodel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "b39afba4-bfc2-4554-acd8-f3571795e9cd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No charge-unbalanced reactions found.\n"
     ]
    }
   ],
   "source": [
    "_ = gempipe.check_charge_unbalances(panmodel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d786af7-dc66-46a8-aea4-a4071cc221a8",
   "metadata": {},
   "source": [
    "Now we proceed checking the presence of eventual **EGCs** (energy-generating cycles). An EGC is defined as a set of reactions producing an energy-containing molecule in absence of any external input. EGCs are **artifacts** of reconstruction pipelines, and must be solved to prevent wrong predictions. For example, if the EGC for ATP is present, an unrealistic biomass could be predicted via FBA, due to the fact that the cell has more ATP to spend than the pool physiologically available. \n",
    "\n",
    "Before starting, to have a reference, we compute the current maximal thoeretical biomass yield (g/L). To do so, we have to switch the environment definition from experimental rates (mmol/gDW/h) to medium concentrations (mmol/L). Therefore, we set the exchange reactions to the CDM concentrations as defined in [Teusink2006](https://doi.org/10.1074/jbc.M606263200). The biomass results as 4.07 g/L, souspiciously high considering that the original publication reports ~1 g/L."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "60640ee3-180a-4343-bd95-ebf2c2bdf0ad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.071587468671884"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# define the growth conditions:\n",
    "def apply_medium(model):\n",
    "    gempipe.reset_growth_env(model)\n",
    "    gempipe.set_bounded_uptakes(model, {'EX_glc__D_e': 25, 'EX_ala__L_e': 2.693905040, 'EX_arg__L_e': 0.717566016, 'EX_asp__L_e': 3.155451042, 'EX_cys__L_e': 1.073014510, 'EX_glu__L_e': 3.398355196, 'EX_gly_e': 2.331157586, 'EX_his__L_e': 0.966806316, 'EX_ile__L_e': 1.600853789, 'EX_leu__L_e': 3.621116829, 'EX_lys__L_e': 3.009781791, 'EX_met__L_e': 0.837756689, 'EX_phe__L_e': 1.664749682, 'EX_pro__L_e': 5.862937549, 'EX_ser__L_e': 3.235229749, 'EX_thr__L_e': 1.888851578, 'EX_trp__L_e': 0.244823213, 'EX_tyr__L_e': 1.379767095, 'EX_val__L_e': 2.774268447, 'EX_pnto__R_e': 0.004196919, 'EX_btn_e': 0.010232901, 'EX_nac_e': 0.008122817, 'EX_4abz_e': 0.072918186, 'EX_pydam_e': 0.020737423, 'EX_pydxn_e': 0.009725734, 'EX_ribflv_e': 0.002657031, 'EX_thm_e': 0.002964984, 'EX_fol_e': 0.002265519, 'EX_ade_e': 0.074002812, 'EX_gua_e': 0.0661682, 'EX_ins_e': 0.018640788, 'EX_orot_e': 0.032031365, 'EX_thymd_e': 0.020641539, 'EX_ura_e': 0.089215616, 'EX_xan_e': 0.065741897, 'EX_pi_e': 42.48157966, 'EX_na1_e': 12.19066195, 'EX_cit_e': 2.652637163, 'EX_nh4_e': 5.318382608, 'EX_mn2_e': 0.134807226}) \n",
    "    gempipe.set_unbounded_exchanges(model, ['EX_h2o_e', 'EX_h_e'])  \n",
    "\n",
    "    \n",
    "# apply medium to the panmodel:\n",
    "apply_medium(panmodel)\n",
    "\n",
    "# simulate biomass production:\n",
    "panmodel.slim_optimize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0844ee15-0a79-41ba-8666-be47abc62a54",
   "metadata": {},
   "source": [
    "We now start to verify the EGC for ATP. With [gempipe.verify_egc](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/sanity/index.html#gempipe.interface.sanity.verify_egc), we can test the presence of EGCs for a particular molecule. As we can see below, a cycle composed by 5 reactions (`NH3t`, `NH4DISex`, `ATPS3r`, `NH3c`, `NH4t`) is able to produce ATP indefinitely. \n",
    "\n",
    "💡 **Tip!** With the `escher=True` option, the function will produce an [Escher](https://escher.github.io/)-compatible model, containing just the reactions composing the EGC. This way it will be easier to draw the EGC map, supporting the user in the identification of the fix.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "59fd35c2-8e10-4072-aa41-d1ae5deba776",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atp_c + h2o_c --> adp_c + h_c + pi_c\n",
      "333.3333333333333 : optimal\n",
      "\n",
      "NH3t        1000.000000\n",
      "NH4DISex    1000.000000\n",
      "ATPS3r       333.333333\n",
      "NH3c        1000.000000\n",
      "NH4t       -1000.000000\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='atp')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b55ffcb2-7f24-49a1-bab6-c4b5eebd10c5",
   "metadata": {},
   "source": [
    "After drawing the EGC on a [Escher map](https://escher.github.io/), it's easy to see that the reaction `NH4t`, originally not present in the reference but acquired during the reference expansion, is generating a cycle producing an additional extracellular proton that is used by the ATP-synthase to generate ATP. In the reference model, the transport of ammonium (NH4+) was encoded as transport of ammonia (NH3), after proton dissociation in the extracellular compartment. Therefore, we decide to remove the `NH4t` reaction. Doing so, as we can see, the EGC disappears, and the biomass **deflates** down to 1.5 g/L. \n",
    "\n",
    "📌 **Note!** This pan-GSMM is currently not constrained for representing the **lactic fermentation**. Switching off the acetate branch will result in a even more deflated biomass yield, comparable to the experimental value. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "b30ba53e-ba31-406b-a002-5bfd733f4d65",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atp_c + h2o_c --> adp_c + h_c + pi_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "# remove NH4t reaction:\n",
    "panmodel.remove_reactions([panmodel.reactions.get_by_id('NH4t')])\n",
    "\n",
    "# verify the EGC again:\n",
    "gempipe.verify_egc(panmodel, mid='atp')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "137e550d-bc9b-4092-b2a8-83e831058f29",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.519447465904365"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# simulate biomass production again:\n",
    "panmodel.slim_optimize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6425638-ad14-4773-974b-401aa71cba98",
   "metadata": {},
   "source": [
    "Now that the ATP EGC is solved, we check all the others. As we can see, no other EGC is detected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "33f5b098-b771-40c6-8d03-b0644383d349",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ctp_c + h2o_c --> cdp_c + h_c + pi_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='ctp')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "e7df5422-350e-4079-a61c-6b4a6681c256",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gtp_c + h2o_c --> gdp_c + h_c + pi_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='gtp')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "167f16c5-3070-41e5-9f91-3a683d67ea38",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "h2o_c + utp_c --> h_c + pi_c + udp_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='utp')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "768b5197-921d-4fca-a2df-87b76fe7f06e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "h2o_c + itp_c --> h_c + idp_c + pi_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='itp')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "3f14b874-f521-4d56-8716-dec19063949d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nadh_c --> h_c + nad_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='nadh')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "9a5b5c7f-71be-43ab-90fc-ec07b26a1157",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nadph_c --> h_c + nadp_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='nadph')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "2c44eea2-9cb6-45c4-bf80-582abd755578",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fadh2_c --> fad_c + 2.0 h_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='fadh2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "76f6c76e-4c52-4085-8dc0-95ba48c02b16",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accoa_c + h2o_c --> ac_c + coa_c + h_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='accoa')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "7d02a6e2-ec7e-47fa-b3aa-25f0cafc0721",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "glu__L_c + h2o_c --> akg_c + 2.0 h_c + nh4_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='glu__L')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "d6e169d3-08c8-41c7-9004-bde3efe571be",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "q8h2_c --> 2.0 h_c + q8_c\n",
      "0 : optimal\n"
     ]
    }
   ],
   "source": [
    "gempipe.verify_egc(panmodel, mid='q8h2')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c84202e8-9f1f-4f00-a341-bce0ae60f4d0",
   "metadata": {},
   "source": [
    "We end here the sanity check, so it's time the **save** the final pan-GSMM using the appropriate `cobrapy` function. This way we can continue deriving strain-specific GSMMs, starting from this pan-GSMM and the associated PAM (in this case we didn't perform any gapfilling).\n",
    "\n",
    "Keep in mind that [`gempipe derive`](part_3_gempipe_derive.ipynb) will derive GSMMs that have (1) the same **objective** of the pan-GSMM, and (2) the same reactions **bounds** of the pan-GSMM. It gives the possibility to indicate one (or more) growth media on which to guarantee the biomass formation, which are know or assumed to sustain the growth of all the input strains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "e29e8fcd-5444-4425-8194-521ac1291b8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cobra\n",
    "\n",
    "cobra.io.save_json_model(panmodel, 'tutoring_materials/plantarum/panmodel.json')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e1a5ff7-21b7-4132-84fa-8ad223f55ccb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}