
Update tutorials #79

Open
sebapersson wants to merge 10 commits into main from update_tutorials

Collaborator

@sebapersson sebapersson commented May 21, 2026

This PR updates the PEtab-SciML tutorials to:

  1. Ensure a consistent style across tutorials.
  2. Improve clarity and completeness (wording and a few additional details).
  3. Add examples for the HDF5 utility functions used to generate array-based input
    data.

The tutorial text is ready for review. The code snippets and accompanying PEtab files are
currently outdated and will be updated once the linter is in place. This can be done at a
later stage, and the PR should not be merged until then.

This PR is related to completing #23

@sebapersson sebapersson requested review from BSnelling and dilpath May 21, 2026 09:06
Comment thread doc/examples/getting_started/getting_started.ipynb Outdated
"$$\n",
"\n",
"$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\text{NN}(\\text{prey}, \\text{predator})[1] - \\delta \\cdot \\text{predator}$$"
"Measurements of both `prey` and `predator` are assumed. The goal of this tutorial is to set up a PEtab-SciML problem for estimating both mechanistic parameters (`alpha`, `delta`) and neural-network parameters (`theta`)."
Collaborator

Could it be misleading to mention measurements here? It is the model states prey and predator that are inputs to the neural network, rather than the measurement data.

Collaborator Author

I think it is good to mention early which data we use to estimate the parameters. I made it more explicit that measurements of prey and predator are used for estimation:

Time-series measurements of the model states prey and predator are available. The goal of this tutorial is to set up a PEtab-SciML problem to estimate both mechanistic parameters (alpha, delta) and neural-network parameters (theta) from these measurements.

"$$\n",
"\n",
"$$\\frac{\\mathrm{d} \\text{predator}}{\\mathrm{d} t} = \\text{NN}(\\text{prey}, \\text{predator})[1] - \\delta \\cdot \\text{predator}$$"
"Measurements of both `prey` and `predator` are assumed. The goal of this tutorial is to set up a PEtab-SciML problem for estimating both mechanistic parameters (`alpha`, `delta`) and neural-network parameters (`theta`)."
Collaborator

Inline symbols for alpha, delta and theta might be nice, to clearly link them to the equations above.

Collaborator Author

@sebapersson sebapersson May 25, 2026

This one is a bit tricky. In the SBML model and in all tables, we use alpha instead of $\alpha$. I think it is better to stay consistent and therefore use alpha throughout the text (including equations).

I have updated the PR to this end, but happy to discuss this further :)

Comment thread doc/examples/getting_started/getting_started.ipynb Outdated
Comment thread doc/examples/getting_started/getting_started.ipynb Outdated
"# Machine-learning models in observables\n",
"\n",
"This guide covers how to include a machine learning (ML) model in the observable formula, which links the model output to the observed measurement data. We assume some familiarity with the getting started tutorial, which examines an entire PEtab SciML problem, while this guide focuses on the parts that are relevant to the observable use case. As a case study we will use the Lotka-Volterra ODE system:\n",
"Sometimes mechanistic models can be misspecified, or the mapping from model states to measurements may be only partially known. Both scenarios can be addressed by augmenting the observable formula in the PEtab problem with a neural network.\n",
Collaborator

Can we frame "misspecified" in a different way? Perhaps by referencing that mechanistic models are by necessity coarse-grained?

Collaborator Author

@sebapersson sebapersson May 25, 2026

I like "misspecified" here (though I am a non-native speaker :)), because the model can be wrong simply because we are wrong (very common), but it might also be too coarse-grained. So I think it captures most scenarios.

What specifically might be problematic with "misspecified"?

Comment thread doc/examples/how_to_observable/how_to_observable.ipynb Outdated
Comment thread doc/examples/how_to_observable/how_to_observable.ipynb Outdated
Comment thread doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb Outdated
Comment thread doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb Outdated
Member

@dilpath dilpath left a comment

Thanks! Some comments apply to all notebooks, e.g. naming of inputs/outputs in the mapping table.

Comment thread doc/examples/getting_started/getting_started.ipynb Outdated
Comment thread doc/examples/getting_started/getting_started.ipynb Outdated
"The environment and example PEtab files to run this notebook are provided in the PEtab SciML repo."
"This introductory tutorial shows how to set up a PEtab-SciML problem using [AMICI](https://amici.readthedocs.io/en/latest/index.html). It walks through the main PEtab-SciML problem files and focuses on creating a problem where a neural network enters the model dynamics, which is often called a universal differential equation (UDE) (also referred to as a grey-box model or hybrid Neural ODE). Familiarity with the PEtab v2 format is assumed; see the [PEtab tutorial](https://petab.readthedocs.io/en/latest/v2/tutorial/tutorial.html).\n",
"\n",
"The tutorial is provided as a notebook, available [here](https://github.com/PEtab-dev/petab_sciml/blob/main/doc/examples/getting_started/getting_started.ipynb), and the corresponding PEtab-SciML problem files can be downloaded [here](https://github.com/PEtab-dev/petab_sciml/tree/main/doc/examples/getting_started)."
Member

This could be combined with the Environment section and simplified to e.g.

All files required to reproduce the results on this page are provided here. In particular, there is the Python 3 Jupyter notebook that generated this page, and the Python dependencies in requirements.txt.

Collaborator Author

I believe this entire section will be dropped when AMICI is updated, so I will leave this comment as a reminder.

"PEtab v2 (and, by extension, PEtab-SciML) accepts dynamic models in common exchange formats (e.g. SBML, CellML, BioNetGen). In this tutorial, an SBML model is used since it is widely supported across PEtab-SciML importers.\n",
"\n",
"$$\\frac{\\mathrm{d} \\text{prey}}{\\mathrm{d} t} = \\alpha \\cdot \\text{prey} - \\beta$$\n",
"In PEtab-SciML, neural-network outputs are linked to the dynamic model by assigning them to parameters in the model file. Therefore, the parts of the equations to be learned must be represented as parameters. In this example, the interaction terms are replaced by the parameters `beta` and `gamma`, which are later mapped to the network outputs to form a UDE. Thus, the model file corresponds to:\n",
Member

gamma is a prior distribution in PEtab v2, so maybe change it to e.g. gamma_? I think beta also used to be an issue, because sympy can convert it to the beta function when the model is imported with AMICI, so I usually use beta_ as well just to be safe, but I guess it's OK now...

Collaborator Author

This is an interesting point that I had not thought about. But as alpha, beta, gamma and delta are the canonical parameters of this LV system, let's hope no problems arise with AMICI.

Member

Actually I was wrong. gamma is not a reserved keyword in PEtab v2 even though it's a prior distribution keyword, which makes sense because if gamma appears in the priorDistribution column then it's clear it's the distribution, and if it appears anywhere else then it's clear that it's the parameter.

So, not even an issue I think.
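
For readers of this thread, here is a minimal sketch of what "replacing the interaction terms by the parameters `beta` and `gamma`" could look like numerically. It is not the tutorial's code and does not use AMICI or any PEtab-SciML importer. The prey equation follows the snippet under review. The predator equation (`gamma - delta * predator`), the tiny network, its weights, and the Euler integrator are all assumptions made for illustration.

```python
import math


def nn(prey, predator, theta):
    """Hypothetical one-hidden-layer network with two outputs (beta, gamma)."""
    w1, b1, w2, b2 = theta
    hidden = [math.tanh(w[0] * prey + w[1] * predator + b) for w, b in zip(w1, b1)]
    return [sum(wi * h for wi, h in zip(w, hidden)) + b for w, b in zip(w2, b2)]


def rhs(state, alpha, delta, theta):
    """Right-hand side of the UDE: NN outputs take the place of beta and gamma."""
    prey, predator = state
    beta, gamma = nn(prey, predator, theta)
    d_prey = alpha * prey - beta            # as in the reviewed snippet
    d_predator = gamma - delta * predator   # assumed form of the second equation
    return [d_prey, d_predator]


def euler(state, alpha, delta, theta, dt=0.01, steps=100):
    """Explicit Euler integration, chosen only to keep the sketch dependency-free."""
    for _ in range(steps):
        derivative = rhs(state, alpha, delta, theta)
        state = [s + dt * ds for s, ds in zip(state, derivative)]
    return state


# Hypothetical network parameters: 2 inputs, 2 hidden units, 2 outputs.
theta = (
    [[0.1, -0.2], [0.3, 0.1]],  # input-to-hidden weights
    [0.0, 0.0],                 # hidden biases
    [[0.5, -0.5], [0.2, 0.4]],  # hidden-to-output weights
    [0.0, 0.0],                 # output biases
)

final_state = euler([1.0, 2.0], alpha=1.3, delta=1.8, theta=theta)
```

In the actual problem files the network and its inputs are declared in the PEtab-SciML tables rather than in Python. The sketch only shows why the learned terms must appear as plain parameters in the model equations.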

"metadata": {},
"source": [
"Note that where any specific network layers or parameters are referenced in the ``mapping.tsv``, it should refer to them by the layer ids in this file."
"Here, `nn_model_id` is the unique neural-network model ID, which is used throughout the PEtab-SciML problem to refer to this neural network (e.g. in the mapping table and problem yaml file)."
Member

Replace neural-network, neural network, net, and network everywhere with NN? Just like we use ODE for the mechanistic model everywhere.

Comment thread doc/examples/how_to_dmms/how_to_dmms.ipynb Outdated
Comment on lines 550 to 555
Member

This is a v1 conditions table.

In v2 (and probably also in v1), a targetId in the condition table cannot also appear as a parameterId in the parameter table.

Hence, we could add a note that only one of the two options presented here is possible (either in the parameter table for all conditions, or in the condition table for condition-specific... or in the array data file).

Collaborator Author

Thanks, somehow I completely missed this section when updating the tutorials
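
The constraint raised in this thread can be checked mechanically. The sketch below reads two tab-separated tables and reports any ID that appears both as a `targetId` in the condition table and as a `parameterId` in the parameter table. Only those two column names come from the discussion above. The helper name `conflicting_ids`, the example rows, and the ID `net1_input1` are hypothetical, and the helper is not part of any PEtab library.

```python
import csv
import io


def conflicting_ids(condition_tsv: str, parameter_tsv: str) -> set:
    """Return IDs used both as a condition targetId and as a parameterId."""
    conditions = csv.DictReader(io.StringIO(condition_tsv), delimiter="\t")
    parameters = csv.DictReader(io.StringIO(parameter_tsv), delimiter="\t")
    targets = {row["targetId"] for row in conditions}
    parameter_ids = {row["parameterId"] for row in parameters}
    return targets & parameter_ids


# Hypothetical tables: net1_input1 is assigned in both places, which is not allowed.
condition_table = (
    "conditionId\ttargetId\ttargetValue\n"
    "cond1\tnet1_input1\t1.0\n"
    "cond2\tnet1_input1\t2.0\n"
)
parameter_table = (
    "parameterId\testimate\tnominalValue\n"
    "alpha\ttrue\t1.3\n"
    "net1_input1\tfalse\t1.0\n"
)

conflicts = conflicting_ids(condition_table, parameter_table)
```

An empty result means each input is assigned in only one of the two tables.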

Comment thread doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb Outdated
Comment thread doc/examples/how_to_neural_ode/how_to_neural_ode.ipynb Outdated
"Let's load the PEtab problem so that we can examine the contents of the relevant PEtab tables."
"## Defining ML models in observable formulas\n",
"\n",
"An ML model is used in an observable formula by (1) mapping the neural-network output to a PEtab identifier in the mapping table, (2) referencing that mapped output in the observables table, and (3) specifying the neural-network inputs in the hybridization table.\n",
Member

Or condition table?

Collaborator Author

No, for this kind of hybridization the input must be provided in the hybridization table (we do not want the input equation to change depending on the condition).

Co-authored-by: BSnelling <branwen.snelling@crick.ac.uk>
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
codecov-commenter commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.01%. Comparing base (c4d7e97) to head (5faa1ba).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #79   +/-   ##
=======================================
  Coverage   94.01%   94.01%           
=======================================
  Files           6        6           
  Lines         301      301           
=======================================
  Hits          283      283           
  Misses         18       18           

Collaborator Author

Thanks for the feedback!

I have now implemented it. Let's wait for AMICI to be updated; then we can update the code and finally merge this PR.

Comment thread doc/examples/how_to_dmms/how_to_dmms.ipynb Outdated
4 participants