{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "77791dc4-03d4-4611-ac14-9f5c3c195883", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%aimport gempipe, gempipe.interface, gempipe.interface.sanity, gempipe.interface.gaps, gempipe.interface.medium\n", "%autoreload 1" ] }, { "cell_type": "markdown", "id": "a53e7c60-ea9c-4298-aef8-22e19f122b2e", "metadata": {}, "source": [ "# _Tutorial:_ gap-filling" ] }, { "cell_type": "markdown", "id": "24e54e8d-1ab5-42f7-8921-5081b69bb093", "metadata": {}, "source": [ "We generated a draft pan-GSMM and a PAM (presence-absence matrix) using `gempipe recon`. Taxid [68334](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=68334) is for _Erwinia aphidicola_, the species we want to model in this tutorial. Please note that several different species taxids could have been inputted at the same time, but here we are interested in just one species. \n", "\n", "```bash\n", "gempipe recon -c 8 -s neg -t 68334 -b enterobacterales_odb10 -o docs/tutoring_materials/aphidicola\n", "```" ] }, { "cell_type": "markdown", "id": "e82e5252-f73d-4b5d-8d48-9adbc8474cd2", "metadata": {}, "source": [ "First of all we load the Gempipe library. Then we load the draft pan-GSMM, the PAM and the functional annotation table, using just one function: `gempipe.initialize`. Next we load the corresponding universe on which this reference-free reconstruction was based. 
" ] }, { "cell_type": "code", "execution_count": 5, "id": "f440e539-702e-4696-96e8-886a8318b6e7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading PAM (tutoring_materials/aphidicola/pam.csv)...\n", "Loading functional annotation table (tutoring_materials/aphidicola/annotation.csv)...\n", "Loading report table (tutoring_materials/aphidicola/report.csv)...\n", "Loading draft pan-GSMM (tutoring_materials/aphidicola/draft_panmodel.json)...\n" ] } ], "source": [ "import gempipe\n", "\n", "# initialize gempipe on the 'gempipe recon' --outdir:\n", "panmodel = gempipe.initialize(\"tutoring_materials/aphidicola\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "845bdb33-44fa-429a-97d6-917532b626e8", "metadata": {}, "outputs": [], "source": [ "# grab the gram negative universe:\n", "universe = gempipe.get_universe('neg')" ] }, { "cell_type": "markdown", "id": "a559ca47-163d-4700-a622-601476391c66", "metadata": {}, "source": [ "Since we want to check the biomass production for this free-living species, we have to be sure that `Growth` is the reaction ID set as the current **objective**. Then we set the growth medium reflecting the concentrations of an old chemically defined medium (CDM) recipe for _Erwinia_, taken from [Grula 1960](https://doi.org/10.1128/jb.80.3.375-385.1960). 
As we can see, the biomass production is 0 on this medium, which may be due to a number of things:\n", "\n", "* some metabolic reaction is missing\n", "* some EX_change reaction needs tuning\n", "* the biomass assembly reaction needs adjustments" ] }, { "cell_type": "code", "execution_count": 7, "id": "66ab1f3a-ad5e-4aec-903c-aa3856119e26", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Growth']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check which objective was selected:\n", "gempipe.get_objectives(panmodel)" ] }, { "cell_type": "code", "execution_count": 8, "id": "02a5e778-2584-4899-84f9-c0f4649b703a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# define the medium:\n", "def apply_medium(model):\n", "    gempipe.reset_growth_env(model)\n", "    gempipe.set_bounded_uptakes(model, {'EX_k_e': 29.973072, 'EX_asp__L_e': 21.036340, 'EX_pi_e': 19.983346, 'EX_glc__D_e': 16.652235, 'EX_so4_e': 0.125871, 'EX_mg2_e': 0.121714, 'EX_fe2_e': 0.001275, 'EX_fe3_e': 0.001275, 'EX_nh4_e': 0.001275, 'EX_ca2_e': 0.000999, 'EX_zn2_e': 0.000174, 'EX_mn2_e': 0.000118, 'EX_cu2_e': 0.000040})\n", "    gempipe.set_unbounded_exchanges(model, ['EX_h2o_e', 'EX_h_e', 'EX_o2_e'])\n", "\n", " \n", "# apply medium to the panmodel:\n", "apply_medium(panmodel)\n", "\n", "# simulate biomass production:\n", "panmodel.slim_optimize()" ] }, { "cell_type": "markdown", "id": "6150eb40-3815-45b2-8c5f-438ef91dbcc5", "metadata": {}, "source": [ "Now we have to be sure that the universe grows under exactly the same conditions. Otherwise, our pan-GSMM will be impossible to gap-fill. Below we can see that even the universe cannot grow. 
Indeed, using [gempipe.check_reactants](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.check_reactants) we can see 2 blocked biomass precursors: chloride and cobalt, trace elements present in the **generic** biomass definition that we are using. " ] }, { "cell_type": "code", "execution_count": 9, "id": "bc02eb50-d0de-40b7-8191-696d017779ee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check if the universe can grow on the same medium:\n", "apply_medium(universe)\n", "universe.slim_optimize()" ] }, { "cell_type": "code", "execution_count": 10, "id": "4c4e05c4-e5fc-4365-80a9-9e2288d23be6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cl_c : Chloride\n", "2 : 0.0 : optimal : cobalt2_c : Co2+\n" ] } ], "source": [ "# check blocked biomass precursors:\n", "_ = gempipe.check_reactants(universe, 'Growth')" ] }, { "cell_type": "markdown", "id": "1292f7c3-7175-4cbe-a50b-751a31263155", "metadata": {}, "source": [ "Instead of removing them from the biomass definition, it's quicker to open additional exchanges. With [gempipe.sensitivity_analysis](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.sensitivity_analysis), we can easily see which EX_change reactions to open: `EX_cl_e` and `EX_cobalt2_e`." 
] }, { "cell_type": "code", "execution_count": 11, "id": "91ae9939-432d-4b85-9bc3-a26f130d7c8c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'EX_12dgr160_e': 0.0,\n", " 'EX_12dgr180_e': 0.0,\n", " 'EX_12ppd__R_e': 0.0,\n", " 'EX_xylu__L_e': 0.0,\n", " 'EX_zn2_e': 0.0,\n", " 'EX_cl_e': -2.0}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check which EX_change reactions can provide chloride:\n", "gempipe.sensitivity_analysis(universe, mid='cl_c')" ] }, { "cell_type": "code", "execution_count": 12, "id": "0f0eb230-225a-4141-a49c-058ae58a6e62", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'EX_12dgr160_e': 0.0,\n", " 'EX_12dgr180_e': 0.0,\n", " 'EX_12ppd__R_e': 0.0,\n", " 'EX_xylu__L_e': 0.0,\n", " 'EX_zn2_e': 0.0,\n", " 'EX_cobalt2_e': -2.0}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check which EX_change reactions can provide cobalt:\n", "gempipe.sensitivity_analysis(universe, mid='cobalt2_c')" ] }, { "cell_type": "markdown", "id": "b4e058b7-f020-45ce-9e81-5f398b9a0a0e", "metadata": {}, "source": [ "This way, we can redefine our in-silico medium, adding these 2 EX_change reactions. Now the universe grows, and we can use it to gap-fill our draft pan-GSMM. 
" ] }, { "cell_type": "code", "execution_count": 13, "id": "60f20561-59bb-4115-a0cf-c26d3b6462ba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.056417489421720736" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# re-define the medium:\n", "def apply_medium(model):\n", " gempipe.reset_growth_env(model)\n", " gempipe.set_bounded_uptakes(model, {'EX_k_e': 29.973072, 'EX_asp__L_e': 21.036340, 'EX_pi_e': 19.983346, 'EX_glc__D_e': 16.652235, 'EX_so4_e': 0.125871, 'EX_mg2_e': 0.121714, 'EX_fe2_e': 0.001275, 'EX_fe3_e': 0.001275, 'EX_nh4_e': 0.001275, 'EX_ca2_e': 0.000999, 'EX_zn2_e': 0.000174, 'EX_mn2_e': 0.000118, 'EX_cu2_e': 0.000040})\n", " gempipe.set_unbounded_exchanges(model, ['EX_h2o_e', 'EX_h_e', 'EX_o2_e'])\n", " gempipe.set_unbounded_exchanges(model, ['EX_cl_e', 'EX_cobalt2_e']) \n", "\n", " \n", "# apply new in-silico medium to the universe:\n", "apply_medium(universe)\n", "universe.slim_optimize()" ] }, { "cell_type": "markdown", "id": "75a455cd-49c5-4f42-9b43-b0841a708dce", "metadata": {}, "source": [ "Since the draft pan-GSMM growth is still 0, we check which biomass precursors are blocked:" ] }, { "cell_type": "code", "execution_count": 14, "id": "4f639bc8-3ba5-49d0-895d-262665f7b753", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pan-GSMM still can't grow:\n", "apply_medium(panmodel)\n", "panmodel.slim_optimize()" ] }, { "cell_type": "code", "execution_count": 15, "id": "4024d20f-f85d-4bf5-91ae-0f49aa8f70fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n", "2 : 0.0 : optimal : fe3_c : Iron (Fe3+)\n", "3 : 0.0 : optimal : kdo2lipid4_p : KDO(2)-lipid IV(A)\n", "4 : 0.0 : optimal : pe160_c : Phosphatidylethanolamine (dihexadecanoyl, n-C16:0)\n", "5 : 0.0 : optimal : pe160_p : Phosphatidylethanolamine 
(dihexadecanoyl, n-C16:0)\n", "6 : 0.0 : optimal : pe161_c : Phosphatidylethanolamine (dihexadec-9enoyl, n-C16:1)\n", "7 : 0.0 : optimal : pe161_p : Phosphatidylethanolamine (dihexadec-9enoyl, n-C16:1)\n", "8 : 0.0 : optimal : thmpp_c : Thiamine diphosphate\n" ] } ], "source": [ "# check blocked biomass precursors:\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "07fbbc27-5a75-4fb1-a74e-f2ffc11f5087", "metadata": {}, "source": [ "We choose to start from phosphatidylethanolamine 16:0, to see which reactions are missing. With [gempipe.perform_gapfilling](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.perform_gapfilling) it's possible to focus on the biosynthesis of a particular metabolite simply by specifying its ID.\n", "\n", "💡 **Tip!** The time required to solve optimization problems varies with the solver used. Gap-filling is an optimization problem, and its computation may run for an unacceptably long time. In these cases, we suggest trying a commercial solver such as [CPLEX](https://en.wikipedia.org/wiki/CPLEX), which is usually faster than the default [GLPK](https://en.wikipedia.org/wiki/GNU_Linear_Programming_Kit). Once CPLEX is installed, Gempipe will automatically switch to it as the default solver." ] }, { "cell_type": "code", "execution_count": 16, "id": "467d873d-188c-4087-be20-aee040d51cbe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 1.\n", "1 KAS8 B-ketoacyl synthetase (palmitate, n-C16:0)\n", "\n", "Solution 2. Reactions to add: 1.\n", "1 KAS7 B-ketoacyl synthetase (n-C16:1)\n", "\n", "Solution 3. Reactions to add: 1.\n", "1 KAS17 B-ketoacyl synthetase (n-C18:1)\n", "\n", "Solution 4. Reactions to add: 1.\n", "1 KAS13 B-ketoacyl synthetase (octadecanoate)\n", "\n", "Solution 5. 
Reactions to add: 1.\n", "1 KAS13 B-ketoacyl synthetase (octadecanoate)\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='pe160_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "9ff92bb5-c8c7-4083-8d13-75719b321675", "metadata": {}, "source": [ "Mapping these reactions on an [Escher map](https://escher.github.io/), it seems that the fatty acid biosynthesis pathway is missing. This is not possible for a free-living species, which has to be self-sufficient in the biosynthesis of all its biomass precursors. Therefore, we take a look at the corresponding [KEGG module (M00083)](https://www.genome.jp/module/M00083), to see if the corresponding KEGG Orthologs (KO) are present in this species. We use the [gempipe.query_pam](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.query_pam) function to search for metabolic genes still missing from the model. This function uses the PAM and the functional annotation produced by `gempipe recon`, which were automatically loaded by the `gempipe.initialize` function called at the beginning. \n", "\n", "💡 **Tip!** With [gempipe.query_pam](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.query_pam), genes can be searched via KO code, EC code, gene name, function description, and more. For each parameter it is possible to specify, instead of a single value, a **list** of values, for example `ko=['K00209', 'K10780', 'K02371']`. This way, more PAM rows may be obtained." ] }, { "cell_type": "code", "execution_count": 17, "id": "bbfbef94-19a8-4b5e-bbc7-4366f5134808", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
Cluster_1282OEIEKCCN_02320OCILAMFM_00215LBGCNHPF_01597
\n", "
" ], "text/plain": [ " Erwinia aphidicola GCA_014773485.1 \n", "Cluster_1282 OEIEKCCN_02320 \\\n", "\n", " Erwinia aphidicola GCA_024169515.1 \n", "Cluster_1282 OCILAMFM_00215 \\\n", "\n", " Erwinia aphidicola GCA_918698235.1 \n", "Cluster_1282 LBGCNHPF_01597 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko='K00647', name='fabB')" ] }, { "cell_type": "code", "execution_count": 18, "id": "872931d7-14ec-40be-a4a9-9fa043ee25fe", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
Cluster_3003OEIEKCCN_00814OCILAMFM_03222LBGCNHPF_01227
Cluster_2990OEIEKCCN_02727OCILAMFM_00984LBGCNHPF_02154
\n", "
" ], "text/plain": [ " Erwinia aphidicola GCA_014773485.1 \n", "Cluster_3003 OEIEKCCN_00814 \\\n", "Cluster_2990 OEIEKCCN_02727 \n", "\n", " Erwinia aphidicola GCA_024169515.1 \n", "Cluster_3003 OCILAMFM_03222 \\\n", "Cluster_2990 OCILAMFM_00984 \n", "\n", " Erwinia aphidicola GCA_918698235.1 \n", "Cluster_3003 LBGCNHPF_01227 \n", "Cluster_2990 LBGCNHPF_02154 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko='K00059', name='fabG')" ] }, { "cell_type": "code", "execution_count": 19, "id": "bf5bc291-70b9-41d4-9d9e-e842e23991d9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
Cluster_3940OEIEKCCN_00708OCILAMFM_03060LBGCNHPF_01333
\n", "
" ], "text/plain": [ " Erwinia aphidicola GCA_014773485.1 \n", "Cluster_3940 OEIEKCCN_00708 \\\n", "\n", " Erwinia aphidicola GCA_024169515.1 \n", "Cluster_3940 OCILAMFM_03060 \\\n", "\n", " Erwinia aphidicola GCA_918698235.1 \n", "Cluster_3940 LBGCNHPF_01333 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko='K01716', name='fabA')" ] }, { "cell_type": "code", "execution_count": 20, "id": "a0d62113-b557-4ef7-a55b-530e793179f4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
Cluster_1346OEIEKCCN_01394;OEIEKCCN_03959OCILAMFM_03809LBGCNHPF_00219
\n", "
" ], "text/plain": [ " Erwinia aphidicola GCA_014773485.1 \n", "Cluster_1346 OEIEKCCN_01394;OEIEKCCN_03959 \\\n", "\n", " Erwinia aphidicola GCA_024169515.1 \n", "Cluster_1346 OCILAMFM_03809 \\\n", "\n", " Erwinia aphidicola GCA_918698235.1 \n", "Cluster_1346 LBGCNHPF_00219 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko='K00209', name='fabV')" ] }, { "cell_type": "markdown", "id": "cfd006fd-6d06-4e38-9d0a-a33d6cf57b57", "metadata": {}, "source": [ "Several alternative KOs can be covered for the same metabolic function (for example fabL `K10780` and fabV `K00209`). Probably, the BiGG genes database is **missing** good representatives for these KOs. Indeed, the BiGG database is **small** and **biased** towards model organisms, so it's understandable that some metabolic genes are lost during the alignment, despite relaxing the `--indentity` and `--coverage` thresholds. As expected, _E. aphidicola_ has all the needed genes to include the KAS / FAS (keto-acyl synthase / fatty-acid synthase) series of reactions. 
Below we choose to add the FAS series:" ] }, { "cell_type": "code", "execution_count": 21, "id": "fcc44753-02d5-499c-be0f-e2a6cdf47a02", "metadata": {}, "outputs": [], "source": [ "# define a GPR for the FAS family of reactions: \n", "fabB = 'Cluster_1282'\n", "fabG = 'Cluster_2990 or Cluster_3003'\n", "fabA = 'Cluster_3940'\n", "fabV = 'Cluster_1346' \n", "\n", "gpr = f'{fabB} and ({fabG}) and {fabA} and {fabV}'\n", "\n", "# copy new reactions from the universe:\n", "gempipe.import_from_universe(panmodel, universe, 'FAS80_L', gpr=gpr) \n", "gempipe.import_from_universe(panmodel, universe, 'FAS100', gpr=gpr) \n", "gempipe.import_from_universe(panmodel, universe, 'FAS120', gpr=gpr) \n", "gempipe.import_from_universe(panmodel, universe, 'FAS140', gpr=gpr) \n", "gempipe.import_from_universe(panmodel, universe, 'FAS160', gpr=gpr) \n", "gempipe.import_from_universe(panmodel, universe, 'FAS180', gpr=gpr) " ] }, { "cell_type": "markdown", "id": "85025620-a787-46cd-a86f-f6aa74300023", "metadata": {}, "source": [ "Checking the biomass precursors again, we see that we have **effectively gap-filled** phosphatidylethanolamine 16:0:" ] }, { "cell_type": "code", "execution_count": 22, "id": "c9086b24-053a-41ed-9140-78900c70c3ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n", "2 : 0.0 : optimal : fe3_c : Iron (Fe3+)\n", "3 : 0.0 : optimal : kdo2lipid4_p : KDO(2)-lipid IV(A)\n", "4 : 0.0 : optimal : pe161_c : Phosphatidylethanolamine (dihexadec-9enoyl, n-C16:1)\n", "5 : 0.0 : optimal : pe161_p : Phosphatidylethanolamine (dihexadec-9enoyl, n-C16:1)\n", "6 : 0.0 : optimal : thmpp_c : Thiamine diphosphate\n" ] } ], "source": [ "# check blocked biomass precursors:\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "79f03acc-b941-4baf-9095-acfa53b34c75", "metadata": {}, "source": [ "Let's now continue with the next blocked biomass precursor: `pe161_c`, another 
phosphatidylethanolamine, but this time with a double bond derived from a desaturase activity, as one can see by opening an [Escher map](https://escher.github.io/). Gap-filling for the biosynthesis of this precursor, we obtain `DESATPE160` as the missing desaturase reaction:" ] }, { "cell_type": "code", "execution_count": 23, "id": "157d622a-f49d-4251-9b66-51f03e894041", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 1.\n", "1 DESATPE160 PE160 desaturase pe C160 pe C161d9\n", "\n", "Solution 2. Reactions to add: 1.\n", "1 DESATPE160 PE160 desaturase pe C160 pe C161d9\n", "\n", "Solution 3. Reactions to add: 1.\n", "1 DESATPE160 PE160 desaturase pe C160 pe C161d9\n", "\n", "Solution 4. Reactions to add: 3.\n", "1 12DGR161tipp 1,2 diacylglycerol transport via flipping (periplasm to cytoplasm, n-C16:1)\n", "2 CLPNH161pp Cardiolipin hydrolase (periplasm, n-C16:1)\n", "3 DESATPG160 PG160 desaturase pg C160 pg C161d9\n", "\n", "Solution 5. Reactions to add: 1.\n", "1 DESATPE160 PE160 desaturase pe C160 pe C161d9\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='pe161_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "5c17ee70-61f9-4a57-8c41-25a406ebff1d", "metadata": {}, "source": [ "This is a little more tricky. [Searching this reaction](http://bigg.ucsd.edu/universal/reactions/DESATPE160) on the BiGG database v1.6, we see it derives from **just 1** model: `iJN1463`, for _Pseudomonas putida_ KT2440. Searching its associated [gene `PP_0217` in KEGG](https://www.genome.jp/entry/ppu:PP_0217), we see it isn't associated with any KO code. Therefore, we try to search [`PP_0217` in EggNOG](http://eggnog6.embl.de/search/seqid/160488.PP_0217/). Under the class Gammaproteobacteria, the same class as _Erwinia_, two KO codes are suggested: K00507 and K23054. 
However, these orthologs do not seem to appear in our organism:" ] }, { "cell_type": "code", "execution_count": 24, "id": "4d82cb2b-3d52-4d38-9f67-07d71d6aa2af", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Erwinia aphidicola GCA_014773485.1, Erwinia aphidicola GCA_024169515.1, Erwinia aphidicola GCA_918698235.1]\n", "Index: []" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko=['K00507', 'K23054'])" ] }, { "cell_type": "markdown", "id": "c148e1fc-d291-4a5d-b80a-fd137ef83803", "metadata": {}, "source": [ "Drawing the phosphatidylethanolamine pathway on an [Escher map](https://escher.github.io/), we see it is complate apart from this desaturase. Therefore, there must be an alternative way to get to this metabolite, or it shuldn't appear in the biomass definition. Given that we do not dispose phenotipic wet-lab data on membrane composition for our species, we decide to include this reaction without specifying a GPR. Doing so, as expected, all the ramaining phosphatidylethanolamine blocks disappear:" ] }, { "cell_type": "code", "execution_count": 25, "id": "51a7cfbd-b562-48ff-8b2a-6dd4f68416d3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n", "2 : 0.0 : optimal : fe3_c : Iron (Fe3+)\n", "3 : 0.0 : optimal : kdo2lipid4_p : KDO(2)-lipid IV(A)\n", "4 : 0.0 : optimal : thmpp_c : Thiamine diphosphate\n" ] } ], "source": [ "gempipe.import_from_universe(panmodel, universe, 'DESATPE160') \n", "\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "37738469-9ac7-4b0e-a1e1-82d046b5c5b5", "metadata": {}, "source": [ "We continue the gap-filling with the thiamin biosynthesis. On [Escher](https://escher.github.io/), drawing the thiamine pathway for the gram negative universe, we see that the metabolite 4-hydroxy-benzyl alcohol (`4hba_c`) is a product of the thiazole phosphate synthesis (`THZPSN`), intermediate reaction of the thiamin biosynthetic pathway. Since `4hba_c` must be consumed in some way, modelers have encoded several alternatives. 
One of them is simply a demand reaction to let `4hba_c` leave the system. Another one, more biologically meaningful, is to translocate it via a transporter `4HBAt` and an associated EX_change reaction:" ] }, { "cell_type": "code", "execution_count": 26, "id": "6c5d02eb-1537-4d5f-99af-036da012d1f8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 1.\n", "1 sink_4hba_c R_sink_4hba_c\n", "\n", "Solution 2. Reactions to add: 1.\n", "1 sink_4hba_c R_sink_4hba_c\n", "\n", "Solution 3. Reactions to add: 2.\n", "1 4HBAt MNXR68734\n", "2 EX_4hba_e R_EX_4hba_e\n", "\n", "Solution 4. Reactions to add: 1.\n", "1 sink_4hba_c R_sink_4hba_c\n", "\n", "Solution 5. Reactions to add: 3.\n", "1 4HBADH 4 hydroxy benzyl alcohol dehydrogenase\n", "2 VNDH_2 4-hydroxybenzaldehyde dehydrogenase\n", "3 sink_2ohph_c R_sink_2ohph_c\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='thmpp_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "f6aa4c7b-6e9a-4035-a223-2611d59643c2", "metadata": {}, "source": [ "Searching the **literature**, we understand that `pcaK` should be a **permease** for `4hba_c` (see [Nichols and Harwood, 1997](https://doi.org/10.1128/jb.179.16.5056-5061.1997)). Since we find a cluster annotated as pcaK, we include the transporter and the associated EX_change reaction in the GSMM. To let the GSMM be a little bit more predictive, we decide not to constrain the transporter bounds. Checking again the blocked biomass precursors, as expected, `thmpp_c` disappears." ] }, { "cell_type": "code", "execution_count": 27, "id": "5e7bc882-e2ab-4e55-b231-1103698f4fdd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
Cluster_1015OEIEKCCN_01251OCILAMFM_03656LBGCNHPF_00358
\n", "
" ], "text/plain": [ " Erwinia aphidicola GCA_014773485.1 \n", "Cluster_1015 OEIEKCCN_01251 \\\n", "\n", " Erwinia aphidicola GCA_024169515.1 \n", "Cluster_1015 OCILAMFM_03656 \\\n", "\n", " Erwinia aphidicola GCA_918698235.1 \n", "Cluster_1015 LBGCNHPF_00358 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(name='pcaK')" ] }, { "cell_type": "code", "execution_count": 28, "id": "96c1c104-bc57-4e88-973f-ea0a172b5962", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n", "2 : 0.0 : optimal : fe3_c : Iron (Fe3+)\n", "3 : 0.0 : optimal : kdo2lipid4_p : KDO(2)-lipid IV(A)\n" ] } ], "source": [ "gempipe.import_from_universe(panmodel, universe, '4HBAt', bounds=(-1000,1000)) \n", "gempipe.import_from_universe(panmodel, universe, 'EX_4hba_e', bounds=(0,1000)) \n", "\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "3fb2bbde-2485-4efe-ac84-36f5dd0ed30c", "metadata": {}, "source": [ "Now it's the turn of the KDO-lipid-IV (`kdo2lipid4_p`), a membrane component. From the gap-filling suggestions, `3HAACOAT140` always appears. " ] }, { "cell_type": "code", "execution_count": 29, "id": "467c738f-f805-4cf1-9cbe-8f12ca924b69", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 2.\n", "1 3HAACOAT140 3 Hydroxyacyl ACPCoA Transacylase\n", "2 RECOAH6 3 hydroxyacyl Coa dehydratase 3R 3 hydroxytetradecanoyl CoA\n", "\n", "Solution 2. Reactions to add: 2.\n", "1 3HAACOAT140 3 Hydroxyacyl ACPCoA Transacylase\n", "2 RHACOAR140 3R 3 Hydroxyacyl CoANADP oxidoreductase\n", "\n", "Solution 3. Reactions to add: 2.\n", "1 3HAACOAT140 3 Hydroxyacyl ACPCoA Transacylase\n", "2 RHACOAR140 3R 3 Hydroxyacyl CoANADP oxidoreductase\n", "\n", "Solution 4. 
Reactions to add: 2.\n", "1 3HAACOAT140 3 Hydroxyacyl ACPCoA Transacylase\n", "2 RECOAH6 3 hydroxyacyl Coa dehydratase 3R 3 hydroxytetradecanoyl CoA\n", "\n", "Solution 5. Reactions to add: 2.\n", "1 3HAACOAT140 3 Hydroxyacyl ACPCoA Transacylase\n", "2 RECOAH6 3 hydroxyacyl Coa dehydratase 3R 3 hydroxytetradecanoyl CoA\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='kdo2lipid4_p', nsol=5)" ] }, { "cell_type": "markdown", "id": "a70d59ab-236e-49da-89f9-67fb4fea0783", "metadata": {}, "source": [ "Once again, searching [3HAACOAT140 in BiGG](http://bigg.ucsd.edu/universal/reactions/3HAACOAT140), we see it's associated with just 1 model (`iJN1463`), and just 1 gene: `PP_1408`. On KEGG, [this gene](https://www.genome.jp/entry/ppu:PP_1408) is not annotated with KO or EC codes. Therefore, we [search it](http://eggnog6.embl.de/search/seqid/160488.PP_1408/) on EggNOG, revealing the code K18100 for Gammaproteobacteria. Unfortunately, we do not find any equivalent in our species:" ] }, { "cell_type": "code", "execution_count": 30, "id": "2e979389-3b64-4688-a4eb-bb506f255e3b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Erwinia aphidicola GCA_014773485.1Erwinia aphidicola GCA_024169515.1Erwinia aphidicola GCA_918698235.1
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Erwinia aphidicola GCA_014773485.1, Erwinia aphidicola GCA_024169515.1, Erwinia aphidicola GCA_918698235.1]\n", "Index: []" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.query_pam(ko=['K18100'])" ] }, { "cell_type": "markdown", "id": "f5c15449-4c6c-4367-b748-1d9b672c1fa2", "metadata": {}, "source": [ "Since all the rest of the KDO-lipid-IV biosynthetic pathway appears complete, we decide to close the gap without GPR. The last two remaining blocks are copper and iron: " ] }, { "cell_type": "code", "execution_count": 31, "id": "2d7c31fd-7971-4436-83c9-cf217f3c4fe0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n", "2 : 0.0 : optimal : fe3_c : Iron (Fe3+)\n" ] } ], "source": [ "gempipe.import_from_universe(panmodel, universe, '3HAACOAT140') \n", "gempipe.import_from_universe(panmodel, universe, 'RHACOAR140') \n", "\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "b9ac96b8-95c5-4cef-a88a-9b5c03575a93", "metadata": {}, "source": [ "We first try to compute gap-filling reactions as we did before. Anyway, this time an _\"infeasible\"_ error appears, meaning that the optimization problem we are asking has **no solution**. The absence of solution does not depend on the reaction content of the universe, because we know the gap-filling reactions are there, ready to be transfered. In this context, the absence of solution depends on the required `minflux` parameter, set to 0.1, **too much** considering the iron content defined in our growth medium: as reported in the paragraph above, it was defined as 0.001275 mmol/L of iron-II plus 0.001275 mmol/L of iron-III. Even considered together, they could never reach the requested minumum 0.1." 
] }, { "cell_type": "code", "execution_count": 32, "id": "9fc64d29-7be3-42a7-bd72-0ee42953b6c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ERROR: cobrapy: gap filling optimization failed (infeasible).\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='fe3_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "98ed2a6f-0df1-423c-b666-64316adae6cc", "metadata": {}, "source": [ "Therefore, we lower the `minflux` to an amount compatible with our medium, and try the computation again. This time, two other types of errors could appear: (1) the _\"try lowering the integer threshold\"_ error, or (2) the _\"check original solver status\"_ error. Both these errors were discussed at length in [issue #941](https://github.com/opencobra/cobrapy/issues/941) of the cobrapy package." ] }, { "cell_type": "code", "execution_count": 33, "id": "3415d857-34b2-43ec-b476-987a20a23a6f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ERROR: cobrapy: gap filling optimization failed (check_original_solver_status).\n" ] } ], "source": [ "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.0001, mid='fe3_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "01f6d010-3b3f-4f1e-9120-8f3f86a171c1", "metadata": {}, "source": [ "This time the error is telling us that, simply speaking, there are some difficulties in handling such low flux values. After having tried several different workarounds, we suggest applying the following easy trick. First, boost the nutrient input to an unrealistically high value, then run the gap-filling again, also raising the minimal flux through the objective (`minflux`). 
This way, the algorithm has no trouble suggesting the right gap-filling reactions:" ] }, { "cell_type": "code", "execution_count": 34, "id": "926c2e13-d2f0-406a-9c0b-d5008ab28796", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 1.\n", "1 FE3t Ferric iron uptake, plasma membrane\n", "\n", "Solution 2. Reactions to add: 1.\n", "1 FE3Gabcpp Iron (III) transport via ABC system (GTP) (periplasm)\n", "\n", "Solution 3. Reactions to add: 1.\n", "1 FE3abc Iron (III) transport via ABC system\n", "\n", "Solution 4. Reactions to add: 1.\n", "1 FE3abcpp Iron (III) transport via ABC system (periplasm to cytoplasm)\n", "\n", "Solution 5. Reactions to add: 1.\n", "1 FE3abcpp Iron (III) transport via ABC system (periplasm to cytoplasm)\n" ] } ], "source": [ "gempipe.set_unbounded_exchanges(panmodel, ['EX_fe3_e']) \n", "gempipe.set_unbounded_exchanges(universe, ['EX_fe3_e']) \n", "\n", "\n", "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='fe3_c', nsol=5)" ] }, { "cell_type": "markdown", "id": "712ff20c-fd03-40f4-8707-467a42b386fe", "metadata": {}, "source": [ "After adding an iron transporter, only the copper block remains: " ] }, { "cell_type": "code", "execution_count": 35, "id": "b8244a98-af1b-4992-a80c-8b6a86fa1ed3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 : 0.0 : optimal : cu2_c : Copper\n" ] } ], "source": [ "gempipe.import_from_universe(panmodel, universe, 'FE3abcpp')\n", "\n", "_ = gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "e333ae10-2c28-4178-b560-fe63b269680d", "metadata": {}, "source": [ "Since it's another trace element, the solution is similar to what we just saw for iron. 
After adding a copper transporter, calling the [gempipe.check_reactants](https://gempipe.readthedocs.io/en/latest/autoapi/gempipe/interface/gaps/index.html#gempipe.interface.gaps.check_reactants) function again returns an empty output, meaning that our _Erwinia_ model is able to synthesize **all** precursors defined in the biomass assembly reaction: " ] }, { "cell_type": "code", "execution_count": 36, "id": "7ba0de61-ecd0-4848-8d51-f674e7508380", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Solution 1. Reactions to add: 1.\n", "1 CU2tpp Copper transport in via permease (no H+)\n", "\n", "Solution 2. Reactions to add: 1.\n", "1 CUabcpp Copper transport via ABC system (periplasm)\n", "\n", "Solution 3. Reactions to add: 1.\n", "1 Cuabc Copper transport via ABC system\n", "\n", "Solution 4. Reactions to add: 1.\n", "1 Cuabc Copper transport via ABC system\n", "\n", "Solution 5. Reactions to add: 1.\n", "1 CU2tpp Copper transport in via permease (no H+)\n" ] } ], "source": [ "gempipe.set_unbounded_exchanges(panmodel, ['EX_cu2_e']) \n", "gempipe.set_unbounded_exchanges(universe, ['EX_cu2_e']) \n", "\n", "\n", "_ = gempipe.perform_gapfilling(panmodel, universe, minflux=0.1, mid='cu2_c', nsol=5)" ] }, { "cell_type": "code", "execution_count": 37, "id": "519ddfc0-8ce5-4b30-b69b-b444fcca627f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gempipe.import_from_universe(panmodel, universe, 'CUabcpp')\n", "\n", "gempipe.check_reactants(panmodel, 'Growth')" ] }, { "cell_type": "markdown", "id": "a58eb9c6-1c31-4e4a-85a3-cffbb5dd7363", "metadata": {}, "source": [ "Indeed, we now see our _Erwinia_ model **growing** on the _Erwinia_ medium defined by [Grula 1960](https://doi.org/10.1128/jb.80.3.375-385.1960): " ] }, { "cell_type": "code", "execution_count": 38, "id": "895048b5-980f-4013-acfc-24502fd1a954", "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "0.056417489421720736" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "apply_medium(panmodel)\n", "\n", "panmodel.slim_optimize()" ] }, { "cell_type": "markdown", "id": "ff6c5162-c541-402f-8bed-d44ec1c337d6", "metadata": {}, "source": [ "We just finshed a gap-filling for the production of biomass. After in-depth sanity checks and gap-fillings, the draft pan-GSMM outputted by `gempipe recon` could be finally called simply \"pan-GSMM\", indicating its final form. From this GSMM, we can now start to derive **strain**-specific GSMMs with `gempipe derive` (read [Part 3](part_3_gempipe_derive.ipynb)). Assuming that we are now ready to go on with the derivation, we save the pan-GSMM as it will be the main input of `gempipe derive`:" ] }, { "cell_type": "code", "execution_count": 40, "id": "e665976f-2c8b-4450-a71a-989dfcccb197", "metadata": {}, "outputs": [], "source": [ "import cobra\n", "\n", "cobra.io.save_json_model(panmodel, 'tutoring_materials/aphidicola/panmodel.json')" ] }, { "cell_type": "code", "execution_count": null, "id": "92140051-62bc-460a-9476-e5d6222b2ab2", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }