diff --git a/doc/assets/hybrid_types.png b/doc/assets/hybrid_types.png new file mode 100644 index 0000000..6f717ad Binary files /dev/null and b/doc/assets/hybrid_types.png differ diff --git a/doc/assets/hybrid_types_dark.png b/doc/assets/hybrid_types_dark.png new file mode 100644 index 0000000..544ee5f Binary files /dev/null and b/doc/assets/hybrid_types_dark.png differ diff --git a/doc/examples/getting_started/getting_started.ipynb b/doc/examples/getting_started/getting_started.ipynb index 146dba7..0ed784b 100644 --- a/doc/examples/getting_started/getting_started.ipynb +++ b/doc/examples/getting_started/getting_started.ipynb @@ -5,18 +5,24 @@ "id": "5f668bef-04b2-4ecd-a195-1f107ef83c07", "metadata": {}, "source": [ - "# Getting Started" - ] - }, - { - "cell_type": "markdown", - "id": "c1fbd084-6c5b-46c5-add0-7210f175aeb3", - "metadata": {}, - "source": [ - "## Overview\n", + "# Getting Started\n", + "\n", + "This introductory tutorial shows how to set up a PEtab SciML problem using [AMICI](https://amici.readthedocs.io/en/latest/index.html). It walks through the main PEtab SciML problem files and focuses on creating a problem where a neural network (NN) enters the model dynamics, which is often called a universal differential equation (UDE) (also referred to as a grey-box model or hybrid Neural ODE). Familiarity with the PEtab v2 format is assumed; see the [PEtab tutorial](https://petab.readthedocs.io/en/latest/v2/tutorial/tutorial.html).\n", + "\n", + "The tutorial is provided as a notebook, available [here](https://github.com/PEtab-dev/petab_sciml/blob/main/doc/examples/getting_started/getting_started.ipynb), and the corresponding PEtab SciML problem files can be downloaded [here](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/getting_started).\n", "\n", - "This getting started tutorial outlines how to set up and simulate a PEtab SciML problem using [AMICI](https://amici.readthedocs.io/en/latest/index.html). Some familiarity with the PEtab format is assumed, see [PEtab](https://petab.readthedocs.io/en/latest/v1/documentation_data_format.html) for a refresher.\n", - "The environment and example PEtab files to run this notebook are provided in the PEtab SciML repo." + "
\n", + " \n", + " \n", + "
\n" ] }, { @@ -24,19 +30,33 @@ "id": "6a17e84a-04b6-449c-81e7-9a686723a062", "metadata": {}, "source": [ - "As an example, we will use the Lotka-Voltera system:\n", + "## Input problem\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta \\cdot \\text{prey} \\cdot \\text{predator}$$\n", + "As a working example, the Lotka–Volterra system is considered:\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\gamma \\cdot \\text{prey} \\cdot \\text{predator} - \\delta \\cdot \\text{predator}$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\text{beta} \\cdot \\text{prey} \\cdot \\text{predator}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\text{gamma} \\cdot \\text{prey} \\cdot \\text{predator}\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "We will replace the two interactive terms ($\\beta$ and $\\gamma$) with outputs from a neural network. \n", - "The ``prey`` and ``predator`` variables in the model will be used as the inputs to this network.\n", - "This example follows the Universal Differential Equation (UDE) case in [Rackauckas et al. 2020](https://arxiv.org/pdf/2001.04385).\n", + "Following Rackauckas et al. [1], to create a UDE, the interaction terms are replaced by the outputs of a NN that takes the states `prey` and `predator` as input:\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\text{NN}(\\text{prey}, \\text{predator})[0]$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\mathrm{NN}(\\text{prey}, \\text{predator}; \\text{theta})[0], \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\mathrm{NN}(\\text{prey}, \\text{predator}; \\text{theta})[1]\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\text{NN}(\\text{prey}, \\text{predator})[1] - \\delta \\cdot \\text{predator}$$" + "Here, $\\mathrm{NN}(\\cdot; \\text{theta})[i]$ denotes the `i`-th output of the neural network.\n", + "\n", + "Given time-series measurements of the model states `prey` and `predator`, the goal of this tutorial is to set up a PEtab-SciML problem to estimate both mechanistic parameters (`alpha`, `delta`) and neural-network parameters (`theta`) from these measurements." ] }, { @@ -54,9 +74,26 @@ "id": "c00cbb82-588e-433c-9b2b-5b8240ee252b", "metadata": {}, "source": [ - "## Loading the PEtab problem\n", + "## PEtab SciML problem files\n", + "\n", + "A PEtab SciML problem is defined by a set of tabular TSV files, a model file, NN file(s), optional array data files, and a problem YAML file that ties everything together. Compared to PEtab, PEtab SciML adds a small number of files to describe the ML models and how they interact with the mechanistic model. Files specific to PEtab SciML are:\n", + "\n", + "- neural network file(s)\n", + "- hybridization table\n", + "- array data files\n", "\n", - "We will start by loading the PEtab problem using the PEtab Python library, so that we can interactively explore the files that are used to build and simulate the problem." + "Files inherited from PEtab:\n", + "\n", + "- model file\n", + "- conditions table\n", + "- experiment table\n", + "- mapping table\n", + "- measurements table\n", + "- observables table\n", + "- parameters table\n", + "- YAML file\n", + "\n", + "To make the format concrete, we will load the example problem (via AMICI) and then walk through the files, focusing on the PEtab SciML-specific additions." ] }, { @@ -71,40 +108,25 @@ "petab_problem = Problem.from_yaml(\"problem.yaml\")" ] }, - { - "cell_type": "markdown", - "id": "8201b3ab-6c90-4a03-807f-91aa079e76f9", - "metadata": {}, - "source": [ - "The files needed to define a PEtab SciML problem are:\n", - "- model file\n", - "- neural network file\n", - "- mapping table\n", - "- hybridization table\n", - "- parameters table\n", - "- network array inputs\n", - "- measurements table\n", - "- observables table\n", - "- conditions table\n", - "- YAML file\n", - "\n", - "The hybridization, network YAML and network input files are new and specific to PEtab SciML problems. The other files already exist in the PEtab format but have been expanded in order to support PEtab SciML problems." - ] - }, { "cell_type": "markdown", "id": "89d0f00c-daef-47cb-a5db-7d2b2566f5da", "metadata": {}, "source": [ - "### Model File\n", + "### Model file for a UDE\n", "\n", - "PEtab SciML importers accept models in various exchangeable standard formats (e.g., SBML, CellML, BioNetGen). For our example we use an SBML model because this standard is widely supported across PEtab importers. The SBML model will need modifying because outputs from the neural network can only be assigned to parameters in the PEtab problem. This means the ``prey`` and ``predator`` species will need to be removed from the expressions we want to parameterise, i.e.\n", + "PEtab v2 (and, by extension, PEtab SciML) accepts mechanistic models in common exchange formats (e.g. SBML, CellML, BioNetGen). In this tutorial, an SBML model is used since it is widely supported across PEtab SciML importers.\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta$$\n", + "In PEtab SciML, NN outputs are linked to the mechanistic model by assigning them to parameters in the model file. Therefore, the parts of the equations to be learned must be represented as parameters. In this example, the interaction terms are replaced by the parameters `beta` and `gamma`, which are later mapped to the network outputs to form a UDE, so the model file corresponds to:\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\gamma - \\delta \\cdot \\text{predator}$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\alpha \\cdot \\text{prey} - \\text{beta}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\gamma - \\delta \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "Whichever model format is used, the idea is that parameters in the model are replaced by network elements. There are a large number of tools available for creating SBML model files like these, such as GUI based [COPASI](https://copasi.org/). " + "In the hybridization table (see below), `beta` and `gamma` are then assigned the outputs of a NN." ] }, { @@ -112,31 +134,45 @@ "id": "6c6bea9d-0f3d-43bc-ad9b-f1fb7a2e18d8", "metadata": {}, "source": [ - "### Neural Network File\n", + "### Neural network file\n", "\n", - "This file defines the network architecture, and should be provided in the PEtab SciML associated network YAML format. You can generate a file like this by defining a neural network in PyTorch and exporting it to the SciML YAML format. The provided example defines a network with three ``Linear`` layers and a ``tanh`` activation function. See the page on supported layers and activation functions for a complete list." + "PEtab SciML defines a programming-language-independent YAML format for storing NN models. This file is most easily generated by defining the network as a PyTorch module and exporting it to the PEtab SciML YAML representation.\n", + "\n", + "For our working example, we will use a simple feed-forward network with a single hidden layer and a `tanh` activation function. It can be defined and exported as:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "id": "f1a5301d-ee1e-4091-9ec7-755f6c36c0e8", "metadata": {}, "outputs": [], "source": [ "from petab_sciml.standard.nn_model import Input, NNModel, NNModelStandard\n", "import torch\n", - "from torch import nn\n", "import torch.nn.functional as F\n", "\n", - "class NeuralNetwork(nn.Module):\n", + "\n", + "class NeuralNetwork(torch.nn.Module):\n", + " \"\"\"\n", + " Neural network for the Lotka–Volterra UDE example.\n", + "\n", + " Notes\n", + " -----\n", + " For PEtab SciML export, the model must:\n", + " - define its layers in `__init__`\n", + " - implement `forward` (to define the execution trace).\n", + " \"\"\"\n", + "\n", " def __init__(self):\n", + " \"\"\"Define all layers of the network.\"\"\"\n", " super().__init__()\n", " self.layer1 = torch.nn.Linear(2, 5)\n", " self.layer2 = torch.nn.Linear(5, 5)\n", " self.layer3 = torch.nn.Linear(5, 2)\n", "\n", " def forward(self, net_input):\n", + " \"\"\"Forward execution trace.\"\"\"\n", " x = self.layer1(net_input)\n", " x = F.tanh(x)\n", " x = self.layer2(x)\n", @@ -144,13 +180,13 @@ " x = self.layer3(x)\n", " return x\n", "\n", + "\n", + "# Export using NNModel from the petab_sciml library\n", "net1 = NeuralNetwork()\n", "nn_model1 = NNModel.from_pytorch_module(\n", " module=net1, nn_model_id=\"net1\", inputs=[Input(input_id=\"input0\")]\n", ")\n", - "NNModelStandard.save_data(\n", - " data=nn_model1, filename=\"net1_from_pytorch.yaml\"\n", - ")" + "NNModelStandard.save_data(data=nn_model1, filename=\"net1_from_pytorch.yaml\")" ] }, { @@ -158,7 +194,7 @@ "id": "a580250b-0eb9-4502-a9bb-5b2fc514cbc1", "metadata": {}, "source": [ - "Note that where any specific network layers or parameters are referenced in the ``mapping.tsv``, it should refer to them by the layer ids in this file." + "Here, `nn_model_id` is the unique NN model ID, which is used throughout the PEtab SciML problem to refer to this NN (e.g. in the mapping table and problem yaml file)." ] }, { @@ -166,18 +202,16 @@ "id": "c94ad071-2c03-4be7-9a1a-f9bcefab7d4d", "metadata": {}, "source": [ - "### Mapping Table\n", - "\n", - "The purpose of the mapping table is to define PEtab compatible identifiers for model entities that would otherwise not be valid PEtab. Inputs, outputs and parameters of our neural network are all examples of model entities that would not be valid PEtab identifiers, so we define PEtab ids for these in this file.\n", + "### Mapping table\n", "\n", - "Take the first row for example. The valid PEtab identifier ``net1_input1`` is mapped to the model id ``net1.inputs[0][0]``. This model id denotes the first index of the first input to the neural network (note that zero based indexing is used).\n", + "The mapping table links NN entities (inputs, outputs, and parameters) to PEtab identifiers. These PEtab IDs can then be referenced in other PEtab SciML tables where appropriate. This step is needed because raw network entity names such as `net1.inputs[0][0]` are not valid PEtab IDs.\n", "\n", - "``net1.parameters`` refers to all the trainable parameters in the network i.e. weights and biases of all the layers." + "For the working example, the mapping table is:" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "id": "a6631bb7-f198-4ac7-8c28-8c076d2fdc74", "metadata": {}, "outputs": [ @@ -244,7 +278,7 @@ "net1_ps net1.parameters" ] }, - "execution_count": 5, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -253,14 +287,22 @@ "petab_problem.mapping_df" ] }, + { + "cell_type": "markdown", + "id": "ff74f5a5", + "metadata": {}, + "source": [ + "For example, `net1_input1` is mapped to `net1.inputs[0][0]`, where the first index selects the input argument (the network `forward` function can have multiple input arguments), and the second index selects the entry within that input vector. The same indexing convention is used for outputs. Meanwhile, `net1.parameters` refers to all trainable parameters in the network (weights and biases across all layers). It is also possible to map subsets, for example `net1.parameters.layer1`, which can then be referenced (e.g. for priors) in the parameter table." + ] + }, { "cell_type": "markdown", "id": "31a5da7b-cb0b-4f50-9975-4c15f57a84cd", "metadata": {}, "source": [ - "### Hybridization Table\n", + "### Hybridization table\n", "\n", - "The hybridization table defines where our neural network inputs and outputs get used in the model. As these hybridization elements are integral to the construction of the model, the supporting tools add hybridization information when the model is defined. At this point we will therefore build the model from the PEtab problem in order to demonstrate how the neural network inputs and outputs are inserted into the ODE system." + "The hybridization table specifies where NN inputs and outputs are used in the mechanistic model (e.g. which parameters are replaced by network outputs). To make this concrete, let's build the model from the PEtab problem and inspect how the NN inputs and outputs are hybridized:" ] }, { @@ -276,21 +318,27 @@ " run_simulations,\n", ")\n", "\n", + "# TODO: Update when AMICI is updated!\n", "# Create AMICI model for the petab problem\n", "jax_model = import_petab_problem(\n", - " petab_problem,\n", - " model_output_dir=\"model\",\n", - " compile_=True,\n", - " jax=True\n", + " petab_problem, model_output_dir=\"model\", compile_=True, jax=True\n", ")\n", "\n", "# Create the JAXProblem - wrapper for the AMICI model\n", "jax_problem = JAXProblem(jax_model, petab_problem)" ] }, + { + "cell_type": "markdown", + "id": "0875fb46", + "metadata": {}, + "source": [ + "The hybridization table can be inspected with:" + ] + }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "id": "4169bb35-28dc-4a6e-b23b-d188775a1d3d", "metadata": {}, "outputs": [ @@ -315,28 +363,29 @@ " \n", " \n", " \n", - " targetValue\n", - " \n", - " \n", " targetId\n", - " \n", + " targetValue\n", " \n", " \n", " \n", " \n", - " net1_input1\n", + " 0\n", + " net1_input1\n", " prey\n", " \n", " \n", - " net1_input2\n", + " 1\n", + " net1_input2\n", " predator\n", " \n", " \n", - " beta\n", + " 2\n", + " beta\n", " net1_output1\n", " \n", " \n", - " gamma\n", + " 3\n", + " gamma\n", " net1_output2\n", " \n", " \n", @@ -344,21 +393,24 @@ "" ], "text/plain": [ - " targetValue\n", - "targetId \n", - "net1_input1 prey\n", - "net1_input2 predator\n", - "beta net1_output1\n", - "gamma net1_output2" + " targetId targetValue\n", + "0 net1_input1 prey\n", + "1 net1_input2 predator\n", + "2 beta net1_output1\n", + "3 gamma net1_output2" ] }, - "execution_count": 7, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "jax_problem._petab_problem.hybridization_df" + "# TODO: Fix when AMICI is updated\n", + "import pandas as pd\n", + "\n", + "hybridization_df = pd.read_csv(\"hybridization.tsv\", sep=\"\\t\")\n", + "hybridization_df" ] }, { @@ -366,7 +418,7 @@ "id": "b4b478b7-2bfd-401a-9454-522d51e6583b", "metadata": {}, "source": [ - "In the example, the inputs to the network are the ``prey`` and ``predator`` values and the outputs of the network provide $\\beta$ and $\\gamma$. These substitutions hold for all simulation conditions. Note that we refer to the inputs and outputs of the network by their PEtab identifiers, as defined in the mapping file." + "From the output, we see the input variables defined in the mapping table are assigned the values of the model states `prey` and `predator`. The network outputs are mapped to the model parameters `beta` and `gamma`. Effectively, when importing the PEtab SciML problem the mechanistic model is constructed such that `beta` and `gamma` are provided by a NN evaluated on `prey` and `predator`, yielding the desired UDE." ] }, { @@ -374,14 +426,14 @@ "id": "3ebb97b5-878b-4d31-87ea-6d508801078a", "metadata": {}, "source": [ - "### Parameters Table\n", + "### Parameters table\n", "\n", - "This file contains configuration information on all model parameters, which in our case, includes parameters of the neural network. Actual values of the network parameters should be provided in a separate HDF5 array file (see more details in the next section). The parameters file specifies the scale, bounds and nominal values of all the parameters. We want our neural network parameters to be freely optimised during training, so we set the bounds to ``[-inf, inf]`` and leave the nominal value blank (translates to a NaN when loaded into the PEtab problem)." + "In PEtab SciML, the parameters table also lists NN parameters. For each network parameter ID defined in the mapping table, it specifies bounds and (optionally) nominal values. The actual numerical values used to initialize the network parameters are stored separately in an HDF5 array file (see the next section):" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 6, "id": "6171d8ed-a485-49a8-b822-068fd70d8559", "metadata": {}, "outputs": [ @@ -458,7 +510,7 @@ "net1_ps lin -inf inf NaN 1" ] }, - "execution_count": 8, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -467,6 +519,14 @@ "petab_problem.parameter_df" ] }, + { + "cell_type": "markdown", + "id": "735942ff", + "metadata": {}, + "source": [ + "In this example, NN parameters are left unconstrained by using bounds `(-inf, inf)`. The nominal value is left empty (loaded as `NaN` by pandas), meaning initialization is handled via the network parameter array file." + ] + }, { "cell_type": "markdown", "id": "376d7687-24dc-484b-820e-44a7a28a0a6e", @@ -474,29 +534,50 @@ "jp-MarkdownHeadingCollapsed": true }, "source": [ - "### Array inputs\n", + "### Array data\n", "\n", - "In this example, the values of parameters for the neural network are provided in HDF5 format.\n", - "The structure inside the HDF5 file is as follows.\n", + "PEtab TSV tables are not well-suited for storing large array-valued data (e.g. NN parameters). PEtab SciML therefore stores such parameter data in an HDF5 file with the following structure\n", "\n", + "```text\n", + "arrays.hdf5\n", + "├── metadata\n", + "│ └── pytorch_format\n", + "└── parameters\n", + " └── net1\n", + " ├── layer1\n", + " │ ├── weight\n", + " │ └── bias\n", + " ├── layer2\n", + " │ ├── weight\n", + " │ └── bias\n", + " └── layer3\n", + " ├── weight\n", + " └── bias\n", "```\n", - " arrays.hdf5\n", - " ├── metadata\n", - " │ └── perm \n", - " └── parameters\n", - " └── net1\n", - " ├── layer1 \n", - " │ ├── weight\n", - " │ └── bias\n", - " ├── layer2\n", - " │ ├── weight\n", - " │ └── bias\n", - " └── layer3\n", - " ├── weight\n", - " └── bias\n", - "```\n", "\n", - "Note the network id ``net1`` matches the identifier in the network YAML file and the ``modelEntityId`` in the ``mapping.tsv``. Likewise the layer names (``layer1``, ``layer2``, ``layer3``) in the HDF5 are identifiers and must match the layer names in the network YAML and mapping files." + "The network ID (`net1`) must match the identifier used in the PEtab SciML YAML file. Likewise, layer names (`layer1`, `layer2`, `layer3`) must match the layer names in the NN YAML file.\n", + "\n", + "An array file with the correct parameter structure can most easily be generated (with random initialization) from the NN YAML file using the `petab_sciml` library:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "70a39c82", + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: Update when AMICI is updated with filename\n", + "from petab_sciml.standard.array_data import (\n", + " ArrayData,\n", + " ArrayDataStandard,\n", + " extract_nn_yaml_parameters,\n", + ")\n", + "\n", + "# Generates array file with parameters in the required structure\n", + "net_ps = extract_nn_yaml_parameters(\"net1_from_pytorch.yaml\")\n", + "array_data = ArrayData.model_validate(net_ps)\n", + "ArrayDataStandard.save_data(array_data, \"net1_ps_tmp.hdf5\")" ] }, { @@ -504,14 +585,14 @@ "id": "b8097112-5f96-4c2a-822a-5c56647ccf14", "metadata": {}, "source": [ - "### Measurements, Observables and Conditions Tables\n", + "### Measurements, observables, experiment and conditions tables\n", "\n", - "These tables are unchanged from a regular PEtab probem. For completeness, the measurement table contains the measurement data, which experimental condition they where collected, and which model entities each measurement maps to." + "These tables are unchanged from standard PEtab. Briefly, the measurements table contains the measurement data, the experiment under which each measurement was collected, and the observable (`observableId`) that links the data to model output:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "id": "e119d605-964e-4603-b120-47ce367c3825", "metadata": {}, "outputs": [ @@ -711,7 +792,7 @@ "19 predator_o cond1 1.192784 10.0" ] }, - "execution_count": 9, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -725,12 +806,12 @@ "id": "fac45ce4-45ee-4b3b-82fa-f5f403039439", "metadata": {}, "source": [ - "The observables table links the model entities to the measured values." + "The observables table links the model entities to the measurements:" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "id": "4984f33e-2373-4614-9198-e03398276f95", "metadata": {}, "outputs": [ @@ -799,7 +880,7 @@ "predator_o normal " ] }, - "execution_count": 10, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -813,12 +894,12 @@ "id": "613e0af7-41e0-4d74-9f55-579b31872f55", "metadata": {}, "source": [ - "The conditions table in intended to fully describe the experimental conditions the measurements were collected under. It can have any number of columns, but in this example there is only one experimental condition and no additional information about it." + "The conditions table describes the experimental conditions the measurements were collected under. It can have any number of columns, but in this example there is only one experimental condition and no additional information about it." ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 10, "id": "ef2136e1-9bdb-4c20-a6d2-9388f33a0fa0", "metadata": {}, "outputs": [ @@ -862,7 +943,7 @@ "Index: [cond1]" ] }, - "execution_count": 11, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } @@ -876,40 +957,43 @@ "id": "b60ce1a4-87a0-4be6-92f6-dbd74fd39e2f", "metadata": {}, "source": [ - "### YAML File\n", + "### YAML problem file\n", "\n", - "Finally, the ``problem.yaml`` file brings everything together by describing how the problem should be constructed from the PEtab files. The ``extensions`` section contains the details of the SciML elements of our problem.\n", + "Finally, `problem.yaml` ties everything together by listing the PEtab files used to build the problem. PEtab SciML-specific information is provided under `extensions.sciml`:\n", "\n", "```yaml\n", - " format_version: 2\n", - " parameter_file: \"parameters.tsv\"\n", - " problems:\n", - " - model_files:\n", - " lv:\n", - " location: \"lv.xml\"\n", - " language: \"sbml\"\n", - " measurement_files:\n", - " - \"measurements.tsv\"\n", - " observable_files:\n", - " - \"observables.tsv\"\n", - " condition_files:\n", - " - \"conditions.tsv\"\n", - " mapping_files:\n", - " - \"mapping.tsv\"\n", - " extensions:\n", - " sciml:\n", - " array_files:\n", - " - \"net1_ps.hdf5\"\n", - " hybridization_files:\n", - " - \"hybridization.tsv\"\n", - " neural_nets:\n", - " net1:\n", - " location: \"net1.yaml\"\n", - " static: false\n", - " format: \"YAML\"\n", + "format_version: 2\n", + "problems:\n", + " - model_files:\n", + " lv:\n", + " location: \"lv.xml\"\n", + " language: \"sbml\"\n", + " parameter_files:\n", + " - \"parameters.tsv\"\n", + " measurement_files:\n", + " - \"measurements.tsv\"\n", + " observable_files:\n", + " - \"observables.tsv\"\n", + " condition_files:\n", + " - \"conditions.tsv\"\n", + " mapping_files:\n", + " - \"mapping.tsv\"\n", + "extensions:\n", + " sciml:\n", + " version: 0.1.0\n", + " required: true\n", + " neural_networks:\n", + " net1:\n", + " location: \"net1.yaml\"\n", + " pre_initialization: false\n", + " format: \"YAML\"\n", + " array_files:\n", + " - \"net1_ps.hdf5\"\n", + " hybridization_files:\n", + " - \"hybridization.tsv\"\n", "```\n", "\n", - "Where the network is referenced in the ``mapping.tsv`` file, the ``modelEntityId`` must match the network id listed in this YAML, ``net1`` in this case." + "In more detail, `extensions.sciml` must list all neural networks. Each network is given a unique ID (here `net1`), which is then referenced throughout the PEtab SciML files (e.g. in the mapping table and array files). For each network, the YAML specifies where the network definition is stored (`location`), its format, and whether it is evaluated pre-initialization (`pre_initialization = true`) or during simulation (`false`, as in this UDE example). The `extensions.sciml` section should also list the hybridization table (if provided) and any array files." ] }, { @@ -917,14 +1001,14 @@ "id": "e80d4679-3ac3-47e5-afb0-165c1eb3f96f", "metadata": {}, "source": [ - "## Running Simulations\n", + "## Running simulations\n", "\n", - "We are now ready to run a simulation using the jax problem. AMICI's `run_simulations` will simulate all conditions provided. In this example though, there is only one. See the [AMICI docs](https://amici.readthedocs.io/en/latest/python_examples.html) for more examples such as training loops." + "The problem is now ready to simulate with the JAX backend. AMICI’s `run_simulations` evaluates the model for all simulation conditions defined in the PEtab problem:" ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 11, "id": "47eebce8-9cbe-4970-8ac6-50f07a5a8498", "metadata": {}, "outputs": [], @@ -938,28 +1022,42 @@ "sllh, rgrad = eqx.filter_grad(run_simulations, has_aux=True)(jax_problem)" ] }, + { + "cell_type": "markdown", + "id": "06e111b5", + "metadata": {}, + "source": [ + "For more examples (including model training), see the [AMICI docs](https://amici.readthedocs.io/en/latest/python_examples.html). Likewise, PEtab SciML problems can also be imported and trained in Julia using [PEtab.jl](https://sebapersson.github.io/PEtab.jl/stable/tutorials/sciml/standard_format)." + ] + }, { "cell_type": "markdown", "id": "febc0864-5602-4339-9a23-c123ce126534", "metadata": {}, "source": [ - "## Next Steps\n", + "## Next steps\n", + "\n", + "This tutorial showed how to set up a PEtab SciML problem where a machine-learning (ML) model enters the ODE dynamics (a UDE). PEtab SciML also supports other hybridizations, covered in the how-to guides:\n", + "\n", + "- [Neural ODEs](https://petab-sciml.readthedocs.io/latest/examples/how_to_neural_ode/how_to_neural_ode.html): Replace all ODE equations with a neural network.\n", + "- [ML models in observable formulas](https://petab-sciml.readthedocs.io/latest/examples/how_to_observable/how_to_observable.html): Use an ML model in the observable formula (e.g. to correct model misspecification).\n", + "- [Pre-initialization ML models](https://petab-sciml.readthedocs.io/latest/examples/how_to_dmms/how_to_dmms.html): Use ML models that map input data (e.g. images) to ODE parameters and/or initial conditions prior to simulation.\n", + "\n", + "A collection of real-data PEtab SciML problems is also available in the [PEtab SciML benchmark collection](https://github.com/sebapersson/Benchmark-Models-PEtab SciML).\n", + "\n", + "By design, the PEtab SciML tutorials focus on problem specification. For downstream tasks such as simulation and model training, see the [Software support page](ADD!) for tools supporting PEtab SciML.\n", "\n", - "This tutorial has outlined how to set up a PEtab SciML problem where a neural network appears in the ODE equations (creating what is sometimes referred to as an UDE). \n", - "The PEtab SciML format also supports others ways of formulating a problem, which you can find examples of in the how-to-guides. These include:\n", - "- neural ODEs (all ODE equations are replaced with a neural network)\n", - "- using networks in the observable function which links simulated values to measurements\n", - "- having a neural network upstream of the ODE mapping high-dimensional inputs (e.g., images) to ODE model parameters.\n", + "Lastly, this tutorial only scratches the surface of what PEtab SciML supports. For a complete description of the format, see the [format specification](https://petab-sciml.readthedocs.io/latest/format.html). While a simple NN architecture is used here, the PEtab SciML NN YAML format supports a wide range of standard layers and activation functions; see the [layers overview](https://petab-sciml.readthedocs.io/latest/layers.html). As PEtab SciML extends PEtab, it also supports PEtab features such as partial observability, multiple simulation conditions, and events.\n", "\n", - "We implemented a simple network in this tutorial for demonstration purposes but the PEtab SciML format supports a range of standard neural network layers and activation functions based on the PyTorch framework. The dedicated docs page lists these layers and functions, along with the tools that support each.\n", + "## References\n", "\n", - "For a complete description of the PEtab SciML format visit the [format specification](https://petab.readthedocs.io/en/latest/v1/documentation_data_format.html) page and for more information about more complex problem specifications supported by the PEtab format (e.g., models with partial observabillity, multiple experimental/simulations conditions, events), see the [PEtab documentation](https://petab.readthedocs.io/en/latest/)." + "1. Rackauckas, Christopher, et al. \"Universal differential equations for scientific machine learning.\" arXiv preprint arXiv:2001.04385 (2020)." ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -973,7 +1071,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.13.13" } }, "nbformat": 4, diff --git a/doc/examples/getting_started/requirements.txt b/doc/examples/getting_started/requirements.txt index f467db8..46887c8 100644 --- a/doc/examples/getting_started/requirements.txt +++ b/doc/examples/getting_started/requirements.txt @@ -17,7 +17,7 @@ bleach==6.2.0 certifi==2025.8.3 cffi==2.0.0 charset-normalizer==3.4.3 -chex==0.1.91 +chex==0.1.90 cmake==4.1.0 cmake-build-extension==0.6.0 colorama==0.4.6 @@ -97,7 +97,7 @@ notebook==7.4.5 notebook_shim==0.2.4 numpy==2.2.3 opt_einsum==3.4.0 -optax==0.2.6 +optax==0.2.5 optimistix==0.0.10 oyaml==1.0 packaging==25.0 @@ -105,7 +105,7 @@ pandas==2.3.2 pandocfilters==1.5.1 parso==0.8.5 petab @ git+https://github.com/petab-dev/libpetab-python.git@99373c4340512def5a23bf2a75805c3f88251251 -petab_sciml @ git+https://github.com/PEtab-dev/petab_sciml.git@80af726a3c2e430a068ea1ebf567f77c072fd85f#subdirectory=src/python +petab_sciml @ git+https://github.com/PEtab-dev/petab_sciml.git@main pexpect==4.9.0 pillow==11.3.0 Pint==0.25 diff --git a/doc/examples/getting_started/sim_hybrid.png b/doc/examples/getting_started/sim_hybrid.png new file mode 100644 index 0000000..ae6ef41 Binary files /dev/null and b/doc/examples/getting_started/sim_hybrid.png differ diff --git a/doc/examples/getting_started/sim_hybrid_dark.png b/doc/examples/getting_started/sim_hybrid_dark.png new file mode 100644 index 0000000..dfbf3d4 Binary files /dev/null and b/doc/examples/getting_started/sim_hybrid_dark.png differ diff --git a/doc/examples/how_to_dmms/how_to_dmms.ipynb b/doc/examples/how_to_dmms/how_to_dmms.ipynb index c007c3f..7185bc9 100644 --- a/doc/examples/how_to_dmms/how_to_dmms.ipynb +++ b/doc/examples/how_to_dmms/how_to_dmms.ipynb @@ -1,39 +1,63 @@ { "cells": [ - { - "cell_type": "markdown", - "id": "b3339048-6d80-4eeb-befa-4bc1b3c468ec", - "metadata": {}, - "source": [ - "# How To: Setting ODE parameters with a Neural Network" - ] - }, { "cell_type": "markdown", "id": "d54f42a2-a31d-4fad-926a-b562ba26cc11", "metadata": {}, "source": [ - "This is the case where the neural network is evaluated prior to the simulation and the output from it sets values of parameters in the ODE. This approach can be referred to as a Deep Mechanistic Model (DMM). We assume some familiarity with the getting started tutorial, which examines an entire PEtab SciML problem, while this guide focuses on the parts that are relevant to this use case. \n", + "# Pre-initialization machine-learning models\n", + "\n", + "Sometimes informative high dimensional non-time-series data (e.g. images, omics data) are available. One way to leverage such data is to use a machine-learning (ML) model that takes it as input and, before simulation, maps it to ODE parameters and/or initial conditions.\n", "\n", - "As a case study we will use the Lotka-Volterra ODE system:\n", + "This tutorial shows how to define SciML problems where an ML model is evaluated pre-initialization to set model parameters using [AMICI](https://amici.readthedocs.io/en/latest/index.html). It focuses on the PEtab SciML files specific to this hybridization and covers the case where the ML model takes array inputs that vary across PEtab conditions. Familiarity with the [SciML starter tutorial](https://petab-sciml.readthedocs.io/latest/examples/getting_started/getting_started.html) is assumed. The tutorial is provided as a notebook, available [here](https://github.com/PEtab-dev/petab_sciml/blob/main/doc/examples/how_to_dmms/how_to_dmms.ipynb), and the corresponding PEtab SciML problem files can be downloaded [here](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/how_to_dmms).\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta \\cdot \\text{prey} \\cdot \\text{predator}$$\n", + "
\n", + " \n", + " \n", + "
\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\gamma \\cdot \\text{prey} \\cdot \\text{predator} - \\delta \\cdot \\text{predator}$$\n", + "## Input problem\n", "\n", - "The observable models, which link measurement data to the model output, are then defined as,\n", + "As a working example, the Lotka–Volterra system is considered:\n", "\n", - "$$y_{\\text{prey}} = \\text{prey} + \\epsilon$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\text{beta} \\cdot \\text{prey} \\cdot \\text{predator}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\text{gamma} \\cdot \\text{prey} \\cdot \\text{predator}\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "$$y_{\\text{predator}} = \\text{predator} + \\epsilon$$\n", + "With observation model:\n", "\n", - "In our formulation the output from the network replaces the parameter $\\gamma$, so that the ODE system becomes,\n", + "$$\n", + "\\begin{aligned}\n", + "y_{\\text{prey}} &= \\text{prey} + \\epsilon, \\\\\n", + "y_{\\text{predator}} &= \\text{predator} + \\epsilon.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta \\cdot \\text{prey} \\cdot \\text{predator}$$\n", + "where $\\epsilon \\sim \\mathcal{N}(0, \\sigma^2)$ is normally distributed measurement noise. In this example, a neural network (NN), which is evaluated prior to simulation, sets the value of the model parameter `gamma`. The ODE system becomes:\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\text{NN}[0] \\cdot \\text{prey} \\cdot \\text{predator} - \\delta \\cdot \\text{predator}$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\text{beta} \\cdot \\text{prey} \\cdot \\text{predator}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\mathrm{NN}(\\text{array}; \\text{theta})[0] \\cdot \\text{prey} \\cdot \\text{predator}\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "The input to the neural network can be either vector based (see other tutorials) or provided via an array file, as demonstrated in this tutorial." + "where $\\mathrm{NN}(\\text{array}; \\text{theta})$ denotes a NN that takes array (tensor) inputs whose values vary across PEtab simulation conditions. The goal of this tutorial is to set up a PEtab SciML problem for estimating both mechanistic parameters (`alpha`, `beta`, `delta`) and NN parameters (`theta`)." ] }, { @@ -41,15 +65,16 @@ "id": "292488b8-24c6-43c9-9e88-5270fa0efb33", "metadata": {}, "source": [ - "## Loading the PEtab problem" - ] - }, - { - "cell_type": "markdown", - "id": "79c7c39c-e010-4bfd-971f-70bf26c82d93", - "metadata": {}, - "source": [ - "Let's load the PEtab problem, build our model and define the overall hybrid problem as a ``JAXProblem``." + "## Defining a pre-initialization SciML problem\n", + "\n", + "A pre-initialization SciML problem with array inputs is defined by:\n", + "\n", + "1. Mapping NN inputs and outputs to PEtab identifiers in the mapping table.\n", + "2. Assigning the network output to a model parameter (or initial condition) in the hybridization table.\n", + "3. Providing condition-specific input arrays in a PEtab SciML HDF5 array file.\n", + "4. Setting the network to be evaluated pre-initialization in the problem YAML.\n", + "\n", + "To make this concrete, let's load the PEtab problem and inspect the relevant tables:" ] }, { @@ -62,7 +87,6 @@ "from amici.petab import import_petab_problem\n", "from amici.jax import (\n", " JAXProblem,\n", - " run_simulations,\n", ")\n", "from petab.v2 import Problem\n", "\n", @@ -71,13 +95,10 @@ "\n", "# Create a simulator for the ODE and NN models.\n", "jax_model = import_petab_problem(\n", - " petab_problem,\n", - " model_output_dir=\"model\",\n", - " compile_=True,\n", - " jax=True\n", + " petab_problem, model_output_dir=\"model\", compile_=True, jax=True\n", ")\n", "\n", - "# Create a JAXProblem to handle addition simulation information \n", + "# Create a JAXProblem to handle addition simulation information\n", "# (e.g. simulate multiple conditions).\n", "jax_problem = JAXProblem(jax_model, petab_problem)" ] @@ -87,12 +108,14 @@ "id": "da148931-f7ba-4b1b-850d-8c8754cd4ad9", "metadata": {}, "source": [ - "By looking at the hybridization, parameters and mapping tables we can see how this deep mechanistic modelling problem has been defined." + "### Mapping and hybridization tables\n", + "\n", + "First, we inspect the mapping and hybridization tables:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "id": "7c1e0233-7e4b-411b-8ff7-ac57a2febf06", "metadata": {}, "outputs": [ @@ -117,48 +140,50 @@ " \n", " \n", " \n", - " targetValue\n", + " modelEntityId\n", " \n", " \n", - " targetId\n", + " petabEntityId\n", " \n", " \n", " \n", " \n", " \n", - " gamma\n", - " net3_output1\n", + " input0\n", + " net3.inputs[0]\n", + " \n", + " \n", + " net3_output1\n", + " net3.outputs[0][0]\n", + " \n", + " \n", + " net3_ps\n", + " net3.parameters\n", " \n", " \n", "\n", "" ], "text/plain": [ - " targetValue\n", - "targetId \n", - "gamma net3_output1" + " modelEntityId\n", + "petabEntityId \n", + "input0 net3.inputs[0]\n", + "net3_output1 net3.outputs[0][0]\n", + "net3_ps net3.parameters" ] }, - "execution_count": 3, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "jax_problem._petab_problem.hybridization_df" - ] - }, - { - "cell_type": "markdown", - "id": "38411614-bf14-4854-b655-5b29c3e345ec", - "metadata": {}, - "source": [ - "The ``gamma`` parameter from the model is mapped to the first output of the neural network, ``net3_output1``, via the hybridization and mapping tables." + "petab_problem.mapping_df" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "id": "832c8deb-927d-4189-b278-0bc8e5dfdfbd", "metadata": {}, "outputs": [ @@ -183,45 +208,36 @@ " \n", " \n", " \n", - " modelEntityId\n", - " \n", - " \n", - " petabEntityId\n", - " \n", + " targetId\n", + " targetValue\n", " \n", " \n", " \n", " \n", - " input0\n", - " net3.inputs[0]\n", - " \n", - " \n", - " net3_output1\n", - " net3.outputs[0][0]\n", - " \n", - " \n", - " net3_ps\n", - " net3.parameters\n", + " 0\n", + " gamma\n", + " net3_output1\n", " \n", " \n", "\n", "" ], "text/plain": [ - " modelEntityId\n", - "petabEntityId \n", - "input0 net3.inputs[0]\n", - "net3_output1 net3.outputs[0][0]\n", - "net3_ps net3.parameters" + " targetId targetValue\n", + "0 gamma net3_output1" ] }, - "execution_count": 4, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "petab_problem.mapping_df" + "# TODO: Fix when AMICI is updated\n", + "import pandas as pd\n", + "\n", + "hybridization_df = pd.read_csv(\"hybridization.tsv\", sep=\"\\t\")\n", + "hybridization_df" ] }, { @@ -229,12 +245,14 @@ "id": "c999332c-e901-4f2c-877a-7ca482f75fdd", "metadata": {}, "source": [ - "Finally, in the parameter table, the network parameters are listed with infinite bounds and specified to be estimated. Also note that $\\gamma$ does not appear in the parameters table, as its value is set by the neural network model." + "From the hybridization table, the network output is mapped to the model parameter `gamma`. During import, this means that `gamma` is computed by the NN. The network input `input0` is mapped to `net3.inputs[0]`. Note that no additional indexing into the input argument is used here, since the full input tensor is provided via an array input file. The NN input must be mapped to a PEtab identifier in the mapping table so it can be matched to the corresponding array input data (see section below).\n", + "\n", + "Finally, since `gamma` is assigned by the NN, it should not appear as an estimated parameter in the parameters table." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 4, "id": "3cb05448-ca2a-4fbc-92e8-9d6f1f8ff630", "metadata": {}, "outputs": [ @@ -320,7 +338,7 @@ "net3_ps lin -inf inf NaN 1" ] }, - "execution_count": 5, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -334,76 +352,60 @@ "id": "bb0d08d2-088b-4850-96e2-d0c12e1605eb", "metadata": {}, "source": [ - "### Array input data\n", + "### Neural network input array data\n", + "\n", + "Neural network input array data should be provided in the PEtab SciML HDF5-array format, with the following structure:\n", + "\n", + "```text\n", + "arrays.hdf5\n", + "├── metadata\n", + "│ └── pytorch_format\n", + "└── inputs\n", + " └── inputId\n", + " ├── conditionId1\n", + " └── conditionId2\n", + "```\n", "\n", - "The PEtab problem YAML file, in the `petab_sciml` section of the `extensions` field, specifies two files of array data to be included in the problem:\n", - "- ``net3_ps.hdf5`` to set the values of the network parameters\n", - "- ``net3_input2.hdf5`` to provide the inputs into the network" + "Here, `inputId` links a PEtab identifier to condition-specific array data. For our working example, `inputId` is `input0` as defined in the mapping table, and `pytorch_format = true`, as the arrays follow PyTorch indexing conventions. Assuming two PEtab conditions, `cond1` and `cond2`, input data in this format can be generated using helper functions from the `petab_sciml` library:" ] }, { - "cell_type": "markdown", - "id": "7202b239-6e80-4a19-bde7-b35e809dc3bb", + "cell_type": "code", + "execution_count": 5, + "id": "5f58ec2d", "metadata": {}, + "outputs": [], "source": [ - "```yaml\n", - "...\n", - "extensions:\n", - " sciml:\n", - " array_files:\n", - " - \"net3_ps.hdf5\"\n", - " - \"net3_input2.hdf5\"\n", - " hybridization_files:\n", - " - \"hybridization.tsv\"\n", - " neural_nets:\n", - " net3:\n", - " location: \"net3.yaml\"\n", - " static: true\n", - " format: \"YAML\"\n", - "```" + "import numpy as np\n", + "from petab_sciml.standard.array_data import ArrayData, ArrayDataStandard\n", + "\n", + "# Generate random condition-specific neural network input arrays\n", + "# for illustration\n", + "input_cond1 = np.random.rand(1, 3, 10, 10)\n", + "input_cond2 = np.random.rand(1, 3, 10, 10)\n", + "\n", + "# Define the array data using the PEtab SciML HDF5-array structure.\n", + "data = {\n", + " \"metadata\": {\"pytorch_format\": True},\n", + " \"inputs\": {\"inputId1\": {\"cond1\": input_cond1, \"cond2\": input_cond2}},\n", + "}\n", + "\n", + "# Export the data to PEtab SciML HDF5 format\n", + "array_data = ArrayData.model_validate(data)\n", + "ArrayDataStandard.save_data(array_data, filename=\"input_data.hdf5\")" ] }, { "cell_type": "markdown", - "id": "3393b6c1-b193-4f64-8269-838c054bce97", + "id": "fca2b55e", "metadata": {}, "source": [ - "The nested structure of the neural network input file can be seen below. The input data to the neural network is under ``inputs/input0`` and data for two conditions is specified. The keys for the different conditions match those defined in the conditions table." + "When defining condition-specific inputs, the corresponding conditions must also be defined in the condition table and linked to experiments table. In our example, valid condition and experiment tables are:" ] }, { "cell_type": "code", - "execution_count": 7, - "id": "87200584-39ec-4bb2-a4ee-919065598b65", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " inputs\n", - " input0\n", - " cond1\n", - " cond2\n", - " metadata\n", - " perm\n" - ] - } - ], - "source": [ - "import h5py\n", - "\n", - "# Convenience function to show the nested structure of an HDF5 file\n", - "def show_h5_struct(file):\n", - " file.visit(lambda x: print(\" \" * (len(x.split(\"/\")) - 1), x.split(\"/\")[-1]))\n", - "\n", - "file = h5py.File(\"net3_input2.hdf5\")\n", - "show_h5_struct(file)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, + "execution_count": 6, "id": "78615c62-63c7-4b53-ab69-1a0cff91482d", "metadata": {}, "outputs": [ @@ -450,7 +452,7 @@ "Index: [cond1, cond2]" ] }, - "execution_count": 9, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -459,14 +461,43 @@ "petab_problem.condition_df" ] }, + { + "cell_type": "code", + "execution_count": 7, + "id": "c5189fc8", + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: When AMICI updated, add experiment table" + ] + }, { "cell_type": "markdown", - "id": "c26425d8-2b07-4262-9bbe-fb67f7dc24d0", + "id": "f03e00fd", "metadata": {}, "source": [ - "
\n", - "Also note the ``static: true`` setting in the neural network definition of the extensions config. This means the input to the neural network is not expected to depend on the model. This keyword will indicate to PEtab SciML importers that the network precedes the ODE, as opposed to being inside it (i.e. a UDE) or in one of the observable formulae.\n", - "
\n" + "### YAML problem file\n", + "\n", + "For problems with array-based inputs, `problem.yaml` should list the relevant array files (for both network inputs and network parameters). Since the NN is evaluated pre-initialization in this example, `pre_initialization` must be set to `true`. For this problem, the PEtab SciML extension section therefore is:\n", + "\n", + "```yaml\n", + "extensions:\n", + " sciml:\n", + " version: 0.1.0\n", + " required: true\n", + " neural_networks:\n", + " net1:\n", + " location: \"net3.yaml\"\n", + " pre_initialization: true\n", + " format: \"YAML\"\n", + " array_files:\n", + " - \"nn_parameters.hdf5\" # Parameter values (optional as parameters set to estimate)\n", + " - \"input_data.hdf5\" # Array Input data\n", + " hybridization_files:\n", + " - \"hybridization.tsv\"\n", + "```\n", + "\n", + "Providing an array file for NN parameters is optional when the network parameters are set to be estimated (as here). It is included to illustrate that multiple array files can be listed under `array_files` (e.g. one for parameters and one for condition-specific input data)." ] }, { @@ -474,14 +505,14 @@ "id": "c6a998ed-e561-4025-8572-36d7d1eb350a", "metadata": {}, "source": [ - "### Network Architecture\n", + "### Neural network architecture\n", "\n", - "The PyTorch snippet below shows how a network architecture would be defined and exported to YAML format using PEtab SciML. The predefined [YAML file](./net3.yaml) is also provided in the PEtab SciML repo for completeness. We have used a convolution architecture here to indicate how the DMM problem set up could enable inclusion of information from high dimensional inputs in the mechanistic model. " + "For completeness, the snippet below shows how the NN architecture used in this problem is defined and exported. A convolutional architecture was used to illustrate how PEtab SciML can map high-dimensional data to a mechanistic model." ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 8, "id": "743ed691-5ceb-41b8-9255-6576bfcc3996", "metadata": {}, "outputs": [], @@ -491,6 +522,7 @@ "from torch import nn\n", "import torch.nn.functional as F\n", "\n", + "\n", "class NeuralNetwork(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", @@ -507,13 +539,12 @@ " x = F.relu(x)\n", " return x\n", "\n", + "\n", "net1 = NeuralNetwork()\n", "nn_model1 = NNModel.from_pytorch_module(\n", " module=net1, nn_model_id=\"net3\", inputs=[Input(input_id=\"input0\")]\n", ")\n", - "NNModelStandard.save_data(\n", - " data=nn_model1, filename=\"net3_from_pytorch.yaml\"\n", - ")" + "NNModelStandard.save_data(data=nn_model1, filename=\"net3_from_pytorch.yaml\")" ] }, { @@ -521,35 +552,46 @@ "id": "8b04d5fe-42f0-443b-b02d-38832b6523f6", "metadata": {}, "source": [ - "### Parameterizing network inputs\n", + "## Parameterizing network inputs\n", + "\n", + "As an alternative to array files, NN input values can be provided either in the parameter table or in the PEtab v2 condition table when `pre_initialization = true`.\n", + "\n", + "### Option 1: Parameter table\n", + "\n", + "Scalar NN inputs can be declared as fixed parameters in the parameter table (here `estimate = 0` means they are not fitted):\n", + "\n", + "| parameterId | lowerBound | upperBound | nominalValue | estimate |\n", + "|------------------|------------|------------|--------------|----------|\n", + "| net1_input00 | 0.0 | 10.0 | 1.0 | 0 |\n", + "| net1_input01 | 0.0 | 10.0 | 1.0 | 0 |\n", + "\n", + "The mapping table links these PEtab identifiers to the network input entries:\n", "\n", - "A straightforward alternative to defining network inputs as array data in files, is to define them in the parameters table, with appropriate bounds and nominal values. \n", + "| petabEntityId | modelEntityId |\n", + "|--------------|-------------------|\n", + "| net1_input00 | net1.inputs[0][0] |\n", + "| net1_input01 | net1.inputs[0][1] |\n", "\n", - "| parameterId | parameterScale | lowerBound | upperBound | nominalValue | estimate |\n", - "|-----------------|----------------|------------|------------|--------------|----------|\n", - "| net1_input_pre1 | lin | -inf | inf | 1 | 0 |\n", - "| net1_input_pre2 | lin | -inf | inf | 1 | 0 |\n", - "| | | | | | |\n", + "When provided via the parameter table, input values apply across all PEtab conditions.\n", "\n", - "The network inputs can be defined as condition specific, with different scalar values for different conditions.\n", + "### Option 2: Condition table\n", "\n", - "| conditionId | net1_input1 | net1_input2 |\n", - "|-----------------|-------------------|-------------------|\n", - "| cond1 | 10.0 | 20.0 |\n", - "| cond2 | net1_input_pre1 | net1_input_pre2 |\n", + "Scalar NN inputs can also be assigned per condition in the PEtab v2 condition table:\n", "\n", - "The corresponding mapping table then defines the model entities for condition PEtab identifiers.\n", + "| conditionId | targetId | targetValue |\n", + "|------------|-------------|------------:|\n", + "| cond1 | net1_input00 | 10.0 |\n", + "| cond1 | net1_input01 | 20.0 |\n", + "| cond2 | net1_input00 | 1.0 |\n", + "| cond2 | net1_input01 | 1.0 |\n", "\n", - "| petabEntityId | modelEntityId |\n", - "|-----------------|-------------------|\n", - "| net1_input1 | net1.inputs[0][0] |\n", - "| net1_input2 | net1.inputs[0][1] |" + "The mapping table (same as above) connects `net1_input00`/`net1_input01` to the network input entries via `modelEntityId`.\n" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -563,7 +605,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.13.13" } }, "nbformat": 4, diff --git a/doc/examples/how_to_dmms/pre_init_hybrid.png b/doc/examples/how_to_dmms/pre_init_hybrid.png new file mode 100644 index 0000000..87cc12d Binary files /dev/null and b/doc/examples/how_to_dmms/pre_init_hybrid.png differ diff --git a/doc/examples/how_to_dmms/pre_init_hybrid_dark.png b/doc/examples/how_to_dmms/pre_init_hybrid_dark.png new file mode 100644 index 0000000..2d6b588 Binary files /dev/null and b/doc/examples/how_to_dmms/pre_init_hybrid_dark.png differ diff --git a/doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb b/doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb index 262979d..8d4c31d 100644 --- a/doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb +++ b/doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb @@ -5,33 +5,74 @@ "id": "85e61efa-c382-4fcd-a649-19aff93f7309", "metadata": {}, "source": [ - "# How To: Neural ODE" - ] - }, - { - "cell_type": "markdown", - "id": "e2ae1512-526e-4287-82a8-f95234bc5f84", - "metadata": {}, - "source": [ - "In this guide we will cover how to set up a Neural ODE problem using the PEtab SciML format and utility functions from the `petab_sciml` Python library. We assume some familiarity with the getting started tutorial, which examines an entire PEtab SciML problem, while this guide focuses on the parts that are relevant to the Neural ODE use case. \n", + "# Neural ODE\n", + "\n", + "Neural ODEs are models where the full ODE right-hand side is represented by a neural network (NN).\n", "\n", - "In the Neural ODE case, the whole right-hand-side of the ODE model is replaced with a neural network. Taking the Lotka-Volterra system as an example,\n", + "This tutorial shows how to set up a Neural ODE PEtab SciML problem using the utility functions in the `petab_sciml` Python library and [AMICI](https://amici.readthedocs.io/en/latest/index.html). It assumes familiarity with the [SciML starter tutorial](https://petab-sciml.readthedocs.io/latest/examples/getting_started/getting_started.html). The tutorial is provided as a notebook, available [here](https://github.com/PEtab-dev/petab_sciml/blob/main/doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb), and the corresponding PEtab SciML problem files can be downloaded [here](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/how_to_neural_ode).\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta \\cdot \\text{prey} \\cdot \\text{predator}$$\n", + "
\n", + " \n", + " \n", + "
\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\gamma \\cdot \\text{prey} \\cdot \\text{predator} - \\delta \\cdot \\text{predator}$$\n", "\n", - "where the observable models, which link measurement data to the model output, are defined as,\n", + "## Input problem\n", "\n", - "$$y_{\\text{prey}} = \\text{prey} + \\epsilon$$\n", + "As a working example, the Lotka–Volterra system is considered:\n", "\n", - "$$y_{\\text{predator}} = \\text{predator} + \\epsilon$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\text{beta} \\cdot \\text{prey} \\cdot \\text{predator}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\text{gamma} \\cdot \\text{prey} \\cdot \\text{predator}\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "The observables (linking measurement data to model output) are defined as:\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "y_{\\text{prey}} &= \\text{prey} + \\epsilon, \\\\\n", + "y_{\\text{predator}} &= \\text{predator} + \\epsilon\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "where $\\epsilon \\sim \\mathcal{N}(0, \\sigma^2)$ is normally distributed measurement noise. A Neural ODE is created by replacing the full ODE right-hand side with a NN:\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\mathrm{NN}(\\text{prey}, \\text{predator}; \\text{theta})[0], \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\mathrm{NN}(\\text{prey}, \\text{predator}; \\text{theta})[1].\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "Measurements of both `prey` and `predator` are available. The goal of this tutorial is to set up a PEtab SciML problem for estimating the NN parameters (`theta`)." + ] + }, + { + "cell_type": "markdown", + "id": "96466651", + "metadata": {}, + "source": [ + "## Creating a Neural ODE problem\n", "\n", - "The problem can be configured as a Neural ODE with,\n", + "A Neural ODE PEtab SciML problem is created by:\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\text{NN}(\\text{prey}, \\text{predator})[0]$$\n", + "1. Defining the neural-network architecture.\n", + "2. Defining the corresponding model file.\n", + "3. Defining the remaining PEtab tables. \n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\text{NN}(\\text{prey}, \\text{predator})[1]$$" + "The `petab_sciml` library provides helper functions for these steps as showcased below." ] }, { @@ -39,7 +80,9 @@ "id": "a706dd6a-98c8-493b-b73e-a4173af450cd", "metadata": {}, "source": [ - "## Defining the network architecture" + "### Defining the neural network file\n", + "\n", + "The first step is to define the neural-network architecture, which should be provided as a PEtab SciML YAML file. As described in the getting started tutorial, this file is easily generated by defining the network as a PyTorch module and exporting it. In this example, a simple feed-forward network is used:" ] }, { @@ -51,17 +94,21 @@ "source": [ "from petab_sciml.standard.nn_model import Input, NNModel, NNModelStandard\n", "import torch\n", - "from torch import nn\n", "import torch.nn.functional as F\n", "\n", - "class NeuralNetwork(nn.Module):\n", + "\n", + "class NeuralNetwork(torch.nn.Module):\n", + " \"\"\"Feed-forward network for the Neural ODE example (2 inputs -> 2 outputs).\"\"\"\n", + "\n", " def __init__(self):\n", + " \"\"\"Define all layers (required so the exporter can discover parameters).\"\"\"\n", " super().__init__()\n", " self.layer1 = torch.nn.Linear(2, 10)\n", " self.layer2 = torch.nn.Linear(10, 10)\n", " self.layer3 = torch.nn.Linear(10, 2)\n", "\n", " def forward(self, net_input):\n", + " \"\"\"Forward pass (required): maps [prey, predator] -> RHS components.\"\"\"\n", " x = self.layer1(net_input)\n", " x = F.tanh(x)\n", " x = self.layer2(x)\n", @@ -69,21 +116,12 @@ " x = self.layer3(x)\n", " return x\n", "\n", + "\n", "net1 = NeuralNetwork()\n", "nn_model1 = NNModel.from_pytorch_module(\n", " module=net1, nn_model_id=\"net1\", inputs=[Input(input_id=\"input0\")]\n", ")\n", - "NNModelStandard.save_data(\n", - " data=nn_model1, filename=\"net1.yaml\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "320f12b5-8b91-42c4-ab26-0e9977746e47", - "metadata": {}, - "source": [ - "The network architecture in this example is kept simple for demonstration purposes. Refer to the page on supported layers and activation functions for more inspiration, but note that PEtab SciML and its importers currently only support Neural ODEs where the neural network has vector inputs and outputs. " + "NNModelStandard.save_data(data=nn_model1, filename=\"net1.yaml\")" ] }, { @@ -91,21 +129,23 @@ "id": "56e1044a-da75-4766-a063-db803c64962b", "metadata": {}, "source": [ - "## Generating the PEtab files\n", + "### Generating the PEtab SciML files\n", + "\n", + "Given a neural-network YAML file, the `petab_sciml` Python library provides helper functions to generate the PEtab SciML files for a Neural ODE.\n", "\n", - "The PEtab SciML Python package provides utility functions to generate the model and PEtab files for a neural ODE case. The names of the species in the ODE system are required to generate the model. The utility functions will generate hybridization, mapping and parameter files." + "To generate the SBML model file, the species names in the ODE system must be provided. Given this, the helper functions generate the hybridization, mapping, and parameters tables:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "id": "47c89dd2-822d-465c-bc15-3406aa91a59f", "metadata": {}, "outputs": [], "source": [ "from petab_sciml.problem_utils.neural_ode import (\n", - " create_neural_ode, \n", - " create_neural_ode_problem\n", + " create_neural_ode,\n", + " create_neural_ode_problem,\n", ")\n", "\n", "create_neural_ode([\"prey\", \"predator\"], model_filename=\"lv.xml\")" @@ -116,22 +156,22 @@ "id": "ee0a9cd6-ce92-4e35-a89e-ee820b90b292", "metadata": {}, "source": [ - "In order to completely define the PEtab problem, the [measurement](./measurements.tsv), [observable](./observables.tsv) and [array input files](./net1_ps.hdf5) need to be supplied by the user. There is then a utility function to generate the `problem.yaml` file and reference all the PEtab files in it. Example files are included in the docs as a demonstration." + "To completely define the PEtab problem, the user must provide the [measurements](./measurements.tsv), [observables](./observables.tsv), and the network parameter array file ([net1_ps.hdf5](./net1_ps.hdf5)). A helper function can then generate `problem.yaml` and reference all PEtab and PEtab SciML files. For reference, all files are provided in the [directory for this example](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/how_to_neural_ode):" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "id": "1e6b4ea0-2e9f-48d5-9205-3c6bb4ca6c24", "metadata": {}, "outputs": [], "source": [ "create_neural_ode_problem(\n", - " \"lv.xml\", \n", - " \"measurements.tsv\", \n", - " \"observables.tsv\", \n", - " \"net1.yaml\", \n", - " [\"net1_ps.hdf5\"]\n", + " model_filename=\"lv.xml\",\n", + " measurements_filename=\"measurements.tsv\",\n", + " observables_filename=\"observables.tsv\",\n", + " network_filename=\"net1.yaml\",\n", + " array_filenames=[\"net1_ps.hdf5\"],\n", ")" ] }, @@ -140,7 +180,9 @@ "id": "d6af94a7-08fe-456b-a91b-6f326f2e286e", "metadata": {}, "source": [ - "## Loading the PEtab problem" + "## Loading the PEtab problem\n", + "\n", + "After generating the PEtab SciML files, the problem can be loaded:" ] }, { @@ -155,10 +197,7 @@ "outputs": [], "source": [ "from amici.petab import import_petab_problem\n", - "from amici.jax import (\n", - " JAXProblem,\n", - " run_simulations,\n", - ")\n", + "from amici.jax import JAXProblem\n", "from petab.v2 import Problem\n", "\n", "# Load the PEtab problem information from disk.\n", @@ -166,13 +205,10 @@ "\n", "# Create a simulator for the ODE and NN models.\n", "jax_model = import_petab_problem(\n", - " petab_problem,\n", - " model_output_dir=\"model\",\n", - " compile_=True,\n", - " jax=True\n", + " petab_problem, model_output_dir=\"model\", compile_=True, jax=True\n", ")\n", "\n", - "# Create a JAXProblem to handle addition simulation information \n", + "# Create a JAXProblem to handle addition simulation information\n", "# (e.g. simulate multiple conditions).\n", "jax_problem = JAXProblem(jax_model, petab_problem)" ] @@ -182,12 +218,12 @@ "id": "b8ab3aa8-a5dc-498e-9457-34e3e0d5bcec", "metadata": {}, "source": [ - "The hybridization and mapping tables show us how the neural network inputs and outputs are mapped to the model." + "The hybridization and mapping tables show how the NN is connected to the model:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 5, "id": "c33ade41-69d5-4b7c-8e4c-6f06137c017a", "metadata": {}, "outputs": [ @@ -212,28 +248,29 @@ " \n", " \n", " \n", - " targetValue\n", - " \n", - " \n", " targetId\n", - " \n", + " targetValue\n", " \n", " \n", " \n", " \n", - " net1_input0\n", + " 0\n", + " net1_input0\n", " prey\n", " \n", " \n", - " net1_input1\n", + " 1\n", + " net1_input1\n", " predator\n", " \n", " \n", - " prey_param\n", + " 2\n", + " prey_param\n", " net1_output0\n", " \n", " \n", - " predator_param\n", + " 3\n", + " predator_param\n", " net1_output1\n", " \n", " \n", @@ -241,26 +278,29 @@ "" ], "text/plain": [ - " targetValue\n", - "targetId \n", - "net1_input0 prey\n", - "net1_input1 predator\n", - "prey_param net1_output0\n", - "predator_param net1_output1" + " targetId targetValue\n", + "0 net1_input0 prey\n", + "1 net1_input1 predator\n", + "2 prey_param net1_output0\n", + "3 predator_param net1_output1" ] }, - "execution_count": 9, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "jax_problem._petab_problem.hybridization_df" + "# TODO: Fix when AMICI is updated\n", + "import pandas as pd\n", + "\n", + "hybridization_df = pd.read_csv(\"hybridization.tsv\", sep=\"\\t\")\n", + "hybridization_df" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 6, "id": "0b42b898-b3a9-4163-9847-d40170f18835", "metadata": {}, "outputs": [ @@ -327,7 +367,7 @@ "net1_ps net1.parameters" ] }, - "execution_count": 10, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -341,24 +381,14 @@ "id": "736d0ff5-a4b3-427b-a2e4-ccd3516a60bf", "metadata": {}, "source": [ - "The input to the neural network is the state (the amounts of ``prey`` and ``predator``). The rate of change in the ``prey`` and ``predator`` is given respectively by the first and second output of the network." - ] - }, - { - "cell_type": "markdown", - "id": "c134e21c-1654-4ec8-a5ec-e3a3be337314", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "It is also worth showing that the parameter table only has the network parameters defined in it. Unlike previous examples, there are no other parameters to be estimated in the problem. This PEtab SciML problem specifies that the neural network parameters must be optimised, and that the outputs of the neural network will provide the time derivative of the solution." + "As seen in the hybridization table, the neural network inputs are the model states (`prey` and `predator`), and the two network outputs define the right-hand side of the ODE: the first output is used for $\\mathrm{d}(\\text{prey})/\\mathrm{d}t$ and the second output for $\\mathrm{d}(\\text{predator})/\\mathrm{d}t$.\n", + "\n", + "Inspecting the parameters table shows that, in contrast to the UDE example, it only contains neural-network parameters, since no mechanistic parameters are estimated:" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 7, "id": "53d584e2-b3c5-485c-a648-7139a6f4c4ca", "metadata": {}, "outputs": [ @@ -417,7 +447,7 @@ "net1_ps lin -inf inf NaN 1" ] }, - "execution_count": 11, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -429,7 +459,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": ".getting_started (3.13.13)", "language": "python", "name": "python3" }, @@ -443,7 +473,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.13.13" } }, "nbformat": 4, diff --git a/doc/examples/how_to_neural_ode/sim_hybrid.png b/doc/examples/how_to_neural_ode/sim_hybrid.png new file mode 100644 index 0000000..ae6ef41 Binary files /dev/null and b/doc/examples/how_to_neural_ode/sim_hybrid.png differ diff --git a/doc/examples/how_to_neural_ode/sim_hybrid_dark.png b/doc/examples/how_to_neural_ode/sim_hybrid_dark.png new file mode 100644 index 0000000..dfbf3d4 Binary files /dev/null and b/doc/examples/how_to_neural_ode/sim_hybrid_dark.png differ diff --git a/doc/examples/how_to_observable/how_to_observable.ipynb b/doc/examples/how_to_observable/how_to_observable.ipynb index dac32eb..b2d54f5 100644 --- a/doc/examples/how_to_observable/how_to_observable.ipynb +++ b/doc/examples/how_to_observable/how_to_observable.ipynb @@ -5,25 +5,54 @@ "id": "81cd17cb-2d7f-4d74-a63f-e6ff16ec8319", "metadata": {}, "source": [ - "# How To: Machine Learning models in the observables\n", + "# Machine-learning models in observables\n", "\n", - "This guide covers how to include a machine learning (ML) model in the observable formula, which links the model output to the observed measurement data. We assume some familiarity with the getting started tutorial, which examines an entire PEtab SciML problem, while this guide focuses on the parts that are relevant to the observable use case. As a case study we will use the Lotka-Volterra ODE system:\n", + "Sometimes mechanistic models can be misspecified, or the mapping from model states to measurements may be only partially known. Both scenarios can be addressed by augmenting the observable formula in the PEtab problem with a neural network (NN).\n", "\n", - "$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta \\cdot \\text{prey} \\cdot \\text{predator}$$\n", + "This tutorial shows how to include an ML model in an observable formula using [AMICI](https://amici.readthedocs.io/en/latest/index.html) and focuses on the PEtab SciML files specific to this hybridization. It assumes familiarity with the [SciML starter tutorial](https://petab-sciml.readthedocs.io/latest/examples/getting_started/getting_started.html). This tutorial is also provided as a notebook, available [here](https://github.com/PEtab-dev/petab_sciml/blob/main/doc/examples/how_to_observable/how_to_observable.ipynb), and the corresponding PEtab SciML problem files can be downloaded [here](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/how_to_observable).\n", "\n", - "$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\gamma \\cdot \\text{prey} \\cdot \\text{predator} - \\delta \\cdot \\text{predator}$$\n", + "
\n", + " \n", + " \n", + "
\n", "\n", - "The observable models are defined as,\n", + "## Input problem\n", "\n", - "$$y_{\\text{prey}} = \\text{prey} + \\epsilon$$\n", + "As a working example, the Lotka–Volterra system is considered:\n", "\n", - "$$y_{\\text{predator}} = \\text{predator} + \\epsilon$$\n", + "$$\n", + "\\begin{aligned}\n", + "\\frac{\\mathrm{d}\\,\\text{prey}}{\\mathrm{d}t} &= \\text{alpha} \\cdot \\text{prey}\n", + "- \\text{beta} \\cdot \\text{prey} \\cdot \\text{predator}, \\\\\n", + "\\frac{\\mathrm{d}\\,\\text{predator}}{\\mathrm{d}t} &= \\text{gamma} \\cdot \\text{prey} \\cdot \\text{predator}\n", + "- \\text{delta} \\cdot \\text{predator}.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "There are two observables in this system, `prey` and `predator`, which will have measurements associated with them. The species in these expressions are not substituted in the SBML model file, but instead the inclusion of the ML model is accomplished by correctly formulating the observable, hybridization and mapping PEtab tables. The observable model for `prey` is formulated as,\n", + "With observation model:\n", "\n", - "$$y_{\\text{prey}} = \\text{NN}(\\text{prey}, \\text{predator})[0] + \\epsilon$$\n", + "$$\n", + "\\begin{aligned}\n", + "y_{\\text{prey}} &= \\text{prey} + \\epsilon, \\\\\n", + "y_{\\text{predator}} &= \\text{predator} + \\epsilon.\n", + "\\end{aligned}\n", + "$$\n", "\n", - "The `predator` observable model is unchanged in this example." + "where $\\epsilon \\sim \\mathcal{N}(0, \\sigma^2)$ is normally distributed measurement noise. To create the observable hybridization, the `prey` observable is replaced by a NN that takes the states `prey` and `predator` as input:\n", + "\n", + "$$\n", + "y_{\\text{prey}} = \\mathrm{NN}(\\text{prey}, \\text{predator}; \\text{theta})[0] + \\epsilon.\n", + "$$\n", + "\n", + "The `predator` observable is left unchanged. The goal of this tutorial is to set up a PEtab SciML problem for estimating both mechanistic parameters (`alpha`, `beta`, `gamma`, `delta`) and NN parameters (`theta`)." ] }, { @@ -31,15 +60,15 @@ "id": "51c1a0d5-102f-46a4-821d-28f58bc4be48", "metadata": {}, "source": [ - "## Loading the PEtab problem" - ] - }, - { - "cell_type": "markdown", - "id": "ad529fe5-a91b-41b3-af22-6a8b855c50cf", - "metadata": {}, - "source": [ - "Let's load the PEtab problem so that we can examine the contents of the relevant PEtab tables." + "## Defining ML models in observable formulas\n", + "\n", + "An ML model is used in an observable formula by \n", + "\n", + "1. Mapping the NN output to a PEtab identifier in the mapping table. \n", + "2. Referencing that mapped output in the observables table\n", + "3. Specifying the NN inputs in the hybridization table.\n", + "\n", + "To make this concrete, let's load the PEtab problem and inspect the relevant tables:" ] }, { @@ -50,10 +79,7 @@ "outputs": [], "source": [ "from amici.petab import import_petab_problem\n", - "from amici.jax import (\n", - " JAXProblem,\n", - " run_simulations,\n", - ")\n", + "from amici.jax import JAXProblem\n", "from petab.v2 import Problem\n", "\n", "# Load the PEtab problem information from disk.\n", @@ -61,21 +87,26 @@ "\n", "# Create a simulator for the ODE and NN models.\n", "jax_model = import_petab_problem(\n", - " petab_problem,\n", - " model_output_dir=\"model\",\n", - " compile_=True,\n", - " jax=True\n", + " petab_problem, model_output_dir=\"model\", compile_=True, jax=True\n", ")\n", "\n", - "# Create a JAXProblem to handle addition simulation information \n", + "# Create a JAXProblem to handle addition simulation information\n", "# (e.g. simulate multiple conditions).\n", "jax_problem = JAXProblem(jax_model, petab_problem)" ] }, + { + "cell_type": "markdown", + "id": "b19dbdc0", + "metadata": {}, + "source": [ + "The first step is to map the NN output to a valid PEtab identifier in the mapping table:" + ] + }, { "cell_type": "code", "execution_count": 2, - "id": "d44cbd06-a3fe-4984-86dc-d5a48c7718c5", + "id": "a8a1f341", "metadata": {}, "outputs": [ { @@ -99,48 +130,41 @@ " \n", " \n", " \n", - " observableFormula\n", - " noiseFormula\n", - " observableTransformation\n", - " noiseDistribution\n", + " modelEntityId\n", " \n", " \n", - " observableId\n", - " \n", - " \n", - " \n", + " petabEntityId\n", " \n", " \n", " \n", " \n", " \n", - " prey_o\n", - " net1_output1\n", - " 0.05\n", - " lin\n", - " normal\n", + " net1_input1\n", + " net1.inputs[0][0]\n", " \n", " \n", - " predator_o\n", - " predator\n", - " 0.05\n", - " lin\n", - " normal\n", + " net1_input2\n", + " net1.inputs[0][1]\n", + " \n", + " \n", + " net1_output1\n", + " net1.outputs[0][0]\n", + " \n", + " \n", + " net1_ps\n", + " net1.parameters\n", " \n", " \n", "\n", "" ], "text/plain": [ - " observableFormula noiseFormula observableTransformation \\\n", - "observableId \n", - "prey_o net1_output1 0.05 lin \n", - "predator_o predator 0.05 lin \n", - "\n", - " noiseDistribution \n", - "observableId \n", - "prey_o normal \n", - "predator_o normal " + " modelEntityId\n", + "petabEntityId \n", + "net1_input1 net1.inputs[0][0]\n", + "net1_input2 net1.inputs[0][1]\n", + "net1_output1 net1.outputs[0][0]\n", + "net1_ps net1.parameters" ] }, "execution_count": 2, @@ -149,21 +173,21 @@ } ], "source": [ - "petab_problem.observable_df" + "petab_problem.mapping_df" ] }, { "cell_type": "markdown", - "id": "618ca834-e138-4189-bf93-16c46c5daa7a", + "id": "ca8286a6", "metadata": {}, "source": [ - "We can see here that the formula for the ``prey`` observable is defined by a PEtab identifier indicating a network output. The model definition of that PEtab id can be found in the mapping table. It is given by the first output of the neural network." + "The mapped output (`net1_output`) can then be referenced directly in the `observableFormula` column of the observables table:" ] }, { "cell_type": "code", "execution_count": 3, - "id": "217954a4-3488-4f23-9a8e-15ca29ca5a08", + "id": "d44cbd06-a3fe-4984-86dc-d5a48c7718c5", "metadata": {}, "outputs": [ { @@ -187,41 +211,48 @@ " \n", " \n", " \n", - " modelEntityId\n", + " observableFormula\n", + " noiseFormula\n", + " observableTransformation\n", + " noiseDistribution\n", " \n", " \n", - " petabEntityId\n", + " observableId\n", + " \n", + " \n", + " \n", " \n", " \n", " \n", " \n", " \n", - " net1_input1\n", - " net1.inputs[0][0]\n", - " \n", - " \n", - " net1_input2\n", - " net1.inputs[0][1]\n", - " \n", - " \n", - " net1_output1\n", - " net1.outputs[0][0]\n", + " prey_o\n", + " net1_output1\n", + " 0.05\n", + " lin\n", + " normal\n", " \n", " \n", - " net1_ps\n", - " net1.parameters\n", + " predator_o\n", + " predator\n", + " 0.05\n", + " lin\n", + " normal\n", " \n", " \n", "\n", "" ], "text/plain": [ - " modelEntityId\n", - "petabEntityId \n", - "net1_input1 net1.inputs[0][0]\n", - "net1_input2 net1.inputs[0][1]\n", - "net1_output1 net1.outputs[0][0]\n", - "net1_ps net1.parameters" + " observableFormula noiseFormula observableTransformation \\\n", + "observableId \n", + "prey_o net1_output1 0.05 lin \n", + "predator_o predator 0.05 lin \n", + "\n", + " noiseDistribution \n", + "observableId \n", + "prey_o normal \n", + "predator_o normal " ] }, "execution_count": 3, @@ -230,21 +261,23 @@ } ], "source": [ - "petab_problem.mapping_df" + "petab_problem.observable_df" ] }, { "cell_type": "markdown", - "id": "8b8e370f-49ea-40e3-93b2-b3c7c4a74f48", + "id": "618ca834-e138-4189-bf93-16c46c5daa7a", "metadata": {}, "source": [ - "Wherever the problem formulation refers to the prey observable (``prey_o``) the neural network will be evaluated and its output used in place of that observable value, for instance, in the measurements table." + "Since the `prey` observable formula is defined by a PEtab identifier that represents a network output, it evaluates to the corresponding network value at each time point. While a simple formula is used here, more complex expressions are supported, for example `net1_output^2 + 2` or `net1_output + prey`.\n", + "\n", + "In addition to the output, the network inputs must be specified. This is done in the hybridization table, where the input identifiers defined in the mapping table are assigned model quantities, here the states `prey` and `predator`:" ] }, { "cell_type": "code", "execution_count": 4, - "id": "e1fac571-1ad7-4acc-9fb6-c43b06e6d53d", + "id": "217954a4-3488-4f23-9a8e-15ca29ca5a08", "metadata": {}, "outputs": [ { @@ -268,179 +301,29 @@ " \n", " \n", " \n", - " observableId\n", - " simulationConditionId\n", - " measurement\n", - " time\n", + " targetId\n", + " targetValue\n", " \n", " \n", " \n", " \n", " 0\n", - " prey_o\n", - " cond1\n", - " 0.173017\n", - " 1.0\n", + " net1_input1\n", + " prey\n", " \n", " \n", " 1\n", - " prey_o\n", - " cond1\n", - " 0.489177\n", - " 2.0\n", - " \n", - " \n", - " 2\n", - " prey_o\n", - " cond1\n", - " 1.643996\n", - " 3.0\n", - " \n", - " \n", - " 3\n", - " prey_o\n", - " cond1\n", - " 5.451963\n", - " 4.0\n", - " \n", - " \n", - " 4\n", - " prey_o\n", - " cond1\n", - " 2.977522\n", - " 5.0\n", - " \n", - " \n", - " 5\n", - " prey_o\n", - " cond1\n", - " 0.181663\n", - " 6.0\n", - " \n", - " \n", - " 6\n", - " prey_o\n", - " cond1\n", - " 0.348112\n", - " 7.0\n", - " \n", - " \n", - " 7\n", - " prey_o\n", - " cond1\n", - " 0.937919\n", - " 8.0\n", - " \n", - " \n", - " 8\n", - " prey_o\n", - " cond1\n", - " 3.113240\n", - " 9.0\n", - " \n", - " \n", - " 9\n", - " prey_o\n", - " cond1\n", - " 8.863933\n", - " 10.0\n", - " \n", - " \n", - " 10\n", - " predator_o\n", - " cond1\n", - " 0.847416\n", - " 1.0\n", - " \n", - " \n", - " 11\n", - " predator_o\n", - " cond1\n", - " 0.211135\n", - " 2.0\n", - " \n", - " \n", - " 12\n", - " predator_o\n", - " cond1\n", - " -0.025054\n", - " 3.0\n", - " \n", - " \n", - " 13\n", - " predator_o\n", - " cond1\n", - " 0.125010\n", - " 4.0\n", - " \n", - " \n", - " 14\n", - " predator_o\n", - " cond1\n", - " 6.700455\n", - " 5.0\n", - " \n", - " \n", - " 15\n", - " predator_o\n", - " cond1\n", - " 2.007158\n", - " 6.0\n", - " \n", - " \n", - " 16\n", - " predator_o\n", - " cond1\n", - " 0.420092\n", - " 7.0\n", - " \n", - " \n", - " 17\n", - " predator_o\n", - " cond1\n", - " 0.048032\n", - " 8.0\n", - " \n", - " \n", - " 18\n", - " predator_o\n", - " cond1\n", - " 0.128669\n", - " 9.0\n", - " \n", - " \n", - " 19\n", - " predator_o\n", - " cond1\n", - " 1.192784\n", - " 10.0\n", + " net1_input2\n", + " predator\n", " \n", " \n", "\n", "" ], "text/plain": [ - " observableId simulationConditionId measurement time\n", - "0 prey_o cond1 0.173017 1.0\n", - "1 prey_o cond1 0.489177 2.0\n", - "2 prey_o cond1 1.643996 3.0\n", - "3 prey_o cond1 5.451963 4.0\n", - "4 prey_o cond1 2.977522 5.0\n", - "5 prey_o cond1 0.181663 6.0\n", - "6 prey_o cond1 0.348112 7.0\n", - "7 prey_o cond1 0.937919 8.0\n", - "8 prey_o cond1 3.113240 9.0\n", - "9 prey_o cond1 8.863933 10.0\n", - "10 predator_o cond1 0.847416 1.0\n", - "11 predator_o cond1 0.211135 2.0\n", - "12 predator_o cond1 -0.025054 3.0\n", - "13 predator_o cond1 0.125010 4.0\n", - "14 predator_o cond1 6.700455 5.0\n", - "15 predator_o cond1 2.007158 6.0\n", - "16 predator_o cond1 0.420092 7.0\n", - "17 predator_o cond1 0.048032 8.0\n", - "18 predator_o cond1 0.128669 9.0\n", - "19 predator_o cond1 1.192784 10.0" + " targetId targetValue\n", + "0 net1_input1 prey\n", + "1 net1_input2 predator" ] }, "execution_count": 4, @@ -449,21 +332,41 @@ } ], "source": [ - "petab_problem.measurement_df" + "# TODO: Fix when AMICI is updated\n", + "import pandas as pd\n", + "\n", + "hybridization_df = pd.read_csv(\"hybridization.tsv\", sep=\"\\t\")\n", + "hybridization_df" ] }, { "cell_type": "markdown", - "id": "261867c9-48a2-4310-ac0a-e52ae2012e85", + "id": "8b8e370f-49ea-40e3-93b2-b3c7c4a74f48", "metadata": {}, "source": [ - "It is worth noting that the PEtab standard supports [mathematical expressions](https://petab.readthedocs.io/en/latest/v2/documentation_data_format.html#math-expressions-syntax) in the observables table. For instance, instead of `net1_output1`, we could write `net1_output1^2 + 2` in its place and PEtab importers would duly perform that arithmetic. " + "Lastly, since the neural network is evaluated during simulation (its output depends on the ODE solution), it must be listed in the problem YAML with `pre_initialization = false`. Concretely, the SciML section of `problem.yaml` should look like:\n", + "\n", + "```yaml\n", + "extensions:\n", + " sciml:\n", + " version: 0.1.0\n", + " required: true\n", + " neural_networks:\n", + " net1:\n", + " location: \"net1.yaml\"\n", + " pre_initialization: false\n", + " format: \"YAML\"\n", + " array_files:\n", + " - \"net1_ps.hdf5\" # neural network parameters\n", + " hybridization_files:\n", + " - \"hybridization.tsv\"\n", + "```" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -477,7 +380,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.3" + "version": "3.13.13" } }, "nbformat": 4, diff --git a/doc/examples/how_to_observable/obs_hybrid.png b/doc/examples/how_to_observable/obs_hybrid.png new file mode 100644 index 0000000..25734cf Binary files /dev/null and b/doc/examples/how_to_observable/obs_hybrid.png differ diff --git a/doc/examples/how_to_observable/obs_hybrid_dark.png b/doc/examples/how_to_observable/obs_hybrid_dark.png new file mode 100644 index 0000000..7776bcf Binary files /dev/null and b/doc/examples/how_to_observable/obs_hybrid_dark.png differ diff --git a/doc/how_to_guides.rst b/doc/how_to_guides.rst index bf16565..f4455b7 100644 --- a/doc/how_to_guides.rst +++ b/doc/how_to_guides.rst @@ -7,6 +7,6 @@ hybridization scenarios supported by the format. .. toctree:: :maxdepth: 1 - Networks setting ODE parameters - Observable Formulae - Neural ODE + Neural ODEs + ML models in observables + Pre-initialization ML models diff --git a/doc/introduction.rst b/doc/introduction.rst index 1deef8c..e89cd32 100644 --- a/doc/introduction.rst +++ b/doc/introduction.rst @@ -13,6 +13,21 @@ models. support for it has been implemented in PEtab importers, though not yet released. Documentation and utility functions are currently being added. +.. container:: figure align-center + + .. image:: assets/hybrid_types.png + :alt: Hybrid model types + :class: only-light + + .. image:: assets/hybrid_types_dark.png + :alt: Hybrid model types + :class: only-dark + + .. container:: caption + + Hybridization patterns supported by PEtab SciML. + + Main features ------------- @@ -25,7 +40,7 @@ importable by downstream tools. The main features are: combined in three ways: (1) ML within the ODE dynamics (includes Neural ODEs), (2) ML in the observable/measurement model linking simulations to data, and (3) ML upstream of the ODE, mapping high-dimensional inputs - (e.g., images) to ODE model parameters. + (e.g., images) to ODE model parameters. See figure above. - **Import across ecosystems.** PEtab SciML problems can be imported into state-of-the-art toolboxes for dynamic-model training in Julia (`PEtab.jl `_) and Python/JAX @@ -63,10 +78,11 @@ How to read the documentation ----------------------------- If you are new to PEtab SciML, start with the -:doc:`Getting Started tutorial `. It is a prerequisite for the -How-to guides, which cover different model scenarios (e.g., Neural ODEs, ML -model upstream of the ODE). For a complete description of all options when -defining a SciML problem, see the :doc:`Format specification `. +:doc:`Getting started tutorial `. +It is a prerequisite for the How-to guides, which cover different model +scenarios (e.g., Neural ODEs, ML model upstream of the ODE). For a complete +description of all options when defining a SciML problem, see the +:doc:`Format specification `. Why a SciML data format? ------------------------