{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "21ff81aa-2c70-470f-8e45-47ca02741f8f", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%aimport gempipe, gempipe.interface, gempipe.interface.sanity, gempipe.interface.gaps, gempipe.interface.medium\n", "%autoreload 1" ] }, { "cell_type": "markdown", "id": "2167d1a0-e0b8-4cf6-b37b-a27e22865fd9", "metadata": {}, "source": [ "# _Tutorial:_ strain clustering\n", "\n", "Using `gempipe derive`, we generated GSMMs for 45 randomly picked strains belonging to 3 species of _Lacticaseibacillus_: _L. casei_, _L. paracasei_, _L. rhamnosus_ (15 each). During the process, the binary feature table `rpam.csv` was generated, showing the presence / absence of metabolic reactions in each strain. Moreover, since we ran `gempipe derive` with the options `--cnps` and `--aux`, two other binary feature tables were produced: `cnps.csv` and `aux.csv`. The first records the ability / inability of each strain to catabolize alternative C, N, P and S sources; the second reports the presence / absence of auxotrophies for specific amino acids and vitamins.\n", "\n", "These tables always have the same structure: binary metabolic features in rows, strains (genome accessions) in columns. As the features are binary, cells contain either 1 (feature present) or 0 (feature absent). Below we see the table for alternative substrates (`cnps.csv`): IDs are preceded by the source type (C, N, P or S), and a substrate is repeated if it is considered a source of multiple types of atoms at the same time (for example, methionine `met__L`, having chemical formula `C5H11NO2S`, is tested as a C, N and S source, so it appears in three separate rows)." ] }, { "cell_type": "code", "execution_count": 2, "id": "839d8c04-8081-4fc4-bf74-046cb192655d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | GCA_025190575.1 | GCA_000026525.1 | GCA_005864085.1 | GCA_001013375.1 | GCA_024329605.1 | GCA_959021285.1 | GCA_000311965.1 | GCA_024329385.1 | GCA_025190605.1 | GCA_026427555.1 | ... | GCA_002091975.1 | GCA_007989685.1 | GCA_030224505.1 | GCA_000735255.1 | GCA_030480425.1 | GCA_032466035.1 | GCA_030215365.1 | GCA_030361365.1 | GCA_028578865.1 | GCA_002091995.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [C]melib | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| [C]Larab | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| [C]acald | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| [C]serglugly | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| [C]pacald | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| [S]met__L | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| [S]h2s | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| [S]gly_met | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| [S]taur | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| [S]methal_ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
307 rows × 45 columns
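
The row-ID convention described above (source type in square brackets, followed by the substrate ID) can be unpacked with pandas. The following is only a minimal sketch: it assumes the file is comma-separated with feature IDs in the first column, it uses a five-row, three-strain excerpt copied from the table above instead of the real `cnps.csv`, and the variable names (`toy_csv`, `parts`, `usable`) are illustrative rather than part of gempipe.

```python
import io

import pandas as pd

# Toy excerpt copied from the table above (first three strains only).
# Assumption: with the real file, something like
# pd.read_csv("cnps.csv", index_col=0) would play the same role.
toy_csv = """,GCA_025190575.1,GCA_000026525.1,GCA_005864085.1
[C]melib,1,1,1
[C]Larab,0,0,0
[C]acald,1,0,0
[S]met__L,1,1,1
[S]h2s,0,0,0
"""
cnps = pd.read_csv(io.StringIO(toy_csv), index_col=0)

# Sanity check: the table is binary, so every cell must be 0 or 1.
assert cnps.isin([0, 1]).all().all()

# Split row IDs such as "[C]melib" into source type and substrate ID.
parts = cnps.index.str.extract(r"^\[(?P<source>[CNPS])\](?P<substrate>.+)$")
parts.index = cnps.index

# Count, per strain, how many substrates of each source type are usable.
usable = cnps.groupby(parts["source"]).sum()
print(usable)
```

Grouping on the extracted source type keeps the per-strain columns intact, so the result can be compared across strains in the same way as the original table.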

| accession | species | strain | niche | G | R | M | obj_value | status | R.1 | inserted_rids | solver_error | obj_value_gf | status_gf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCA_008868595.1 | Lacticaseibacillus casei | BIO5773 | - | 1039 | 1312 | 1212 | 0.0 | optimal | 1313 | {'cdm25_gempipe': ['GLUR']} | - | {'cdm25_gempipe': 1.6898873415546019} | {'cdm25_gempipe': 'optimal'} |
| GCA_005864085.1 | Lacticaseibacillus casei | FAM 20446 | - | 948 | 1255 | 1212 | 0.0 | optimal | 1256 | {'cdm25_gempipe': ['ALATA_D']} | - | {'cdm25_gempipe': 1.6898873415543527} | {'cdm25_gempipe': 'optimal'} |
| GCA_018363095.1 | Lacticaseibacillus casei | FBL6 | - | 1018 | 1289 | 1212 | 0.0 | optimal | 1290 | {'cdm25_gempipe': ['GLUR']} | - | {'cdm25_gempipe': 1.6898873415546252} | {'cdm25_gempipe': 'optimal'} |
| GCA_002091995.1 | Lacticaseibacillus casei | GCRL 163 | - | 1039 | 1265 | 1212 | 0.0 | optimal | 1266 | {'cdm25_gempipe': ['GLUR']} | - | {'cdm25_gempipe': 1.7027403931953289} | {'cdm25_gempipe': 'optimal'} |
| GCA_037901485.1 | Lacticaseibacillus casei | LC130 | - | 877 | 1229 | 1212 | 0.0 | optimal | 1230 | {'cdm25_gempipe': ['GLUR']} | - | {'cdm25_gempipe': 1.1298198948419083} | {'cdm25_gempipe': 'optimal'} |