mrc-ide · absternator · Jun 26, 2026 · Jun 23, 2026 · Jun 23, 2026 · Jun 23, 2026
diff --git a/.coverage b/.coverage
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -0,0 +1,33 @@
+name: Publish Release to PyPI
+
+on:
+  push:
+    tags:
+      - 'v*.*.*'
+
+jobs:
+  run:
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+    permissions:
+      id-token: write
+      contents: read
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6
+      - name: Install uv
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
+        with:
+          enable-cache: true
+          version: "0.11.18"
+      - name: Set up Python
+        run: uv python install
+      - name: Build
+        run: uv build
+      # - name: Smoke test (wheel)
+        # run: uv run --isolated --no-project --with dist/*.whl tests/smoke_test.py
+      # - name: Smoke test (source distribution)
+        # run: uv run --isolated --no-project --with dist/*.tar.gz tests/smoke_test.py
+      - name: Publish
+        run: uv publish
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -0,0 +1,36 @@
+name: tests
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+
+concurrency:
+  group: tests-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  test:
+    name: pytest (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+          python-version: ${{ matrix.python-version }}
+
+      # Install core + dev only. This both runs the test suite and proves the
+      # core (inference) install works without the train/viz extras.
+      - name: Install project
+        run: uv sync --extra dev
+
+      - name: Run tests
+        run: uv run pytest
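The comment in this workflow relies on the train/viz dependencies staying optional. A common way to keep them optional is to import them lazily and fail with an install hint. The sketch below assumes a hypothetical helper `require_extra`; estiMINT may guard its optional imports differently.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or say which estiMINT extra provides it.

    Hypothetical helper for illustration; not part of the estiMINT API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so both land here.
        raise ImportError(
            f"{module_name} is not installed; install it with "
            f'pip install "estimint[{extra}]"'
        ) from exc


# Example call site inside a training-only function (hypothetical):
# duckdb = require_extra("duckdb", "train")
```

Because the import happens inside the function that needs it, `import estimint` itself never touches duckdb or matplotlib. This is the property the core-only CI install exercises.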
diff --git a/.gitignore b/.gitignore
@@ -40,3 +40,9 @@ uv.lock
 # Training outputs
 output/
 scripts/output/
+
+# Model training artifacts (regenerable; shipped copies live in src/estimint/data/)
+models/**/*.parquet
+models/**/*.pkl
+models/**/*.model
+models/**/plots/
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,4 +1,3 @@
 include README.md
-include requirements.txt
 recursive-include src/estimint/data *
 recursive-include src/estimint/inst *
diff --git a/README.md b/README.md
@@ -2,19 +2,44 @@
 
 Python port of the estiMINT R package for EIR (Entomological Inoculation Rate) estimation using machine learning.
 
+It estimates EIR from prevalence, converts between EIR and human biting rate (including the effect of changes in mosquito density), and turns a bednet specification (net type and resistance level) into the `dn0` killing parameter.
+
 ## Installation
 
 ```bash
-pip install -e .
+pip install estimint            # core: inference only (numpy, pandas, xgboost, scipy)
+```
+
+Optional extras, by use case:
+
+```bash
+pip install "estimint[train]"   # data prep + model training (duckdb, scikit-learn, pyarrow)
+pip install "estimint[viz]"     # plotting (matplotlib)
+pip install "estimint[all]"     # train + viz + model download
+pip install "estimint[dev]"     # test/lint/type-check toolchain
+```
+
+The `run_scenarios` pipeline also needs the stateMINT emulator (Python 3.12+). For now it
+comes from the `mamba2-train` branch. With uv this is handled for you:
+
+```bash
+uv sync --extra scenarios
+```
+
+With plain pip, install stateMINT from the branch yourself, then estiMINT:
+
+```bash
+pip install "git+https://github.com/mrc-ide/stateMINT.git@mamba2-train"
+pip install estimint
 ```
 
-Or install dependencies directly:
+For local development with [uv](https://docs.astral.sh/uv/):
 
 ```bash
-pip install -r requirements.txt
+uv sync --extra all --extra dev
 ```
 
-## File Mapping (R → Python)
+## File mapping (R to Python)
 
 | R File | Python File | Description |
 |--------|-------------|-------------|
@@ -28,35 +53,38 @@ pip install -r requirements.txt
 | `storage.R` | `storage.py` | Model persistence and loading |
 | `run.R` | `run.py` | Model inference |
 
-## API Reference
+## Data & retraining pipeline
 
-### Training
+All training data lives in `datasets/estimint_simulations_y9.parquet`. Each of the two model
+folders derives its own training view from it and trains on that view:
 
-```python
-from estimint import train_xgb_model
-
-model = train_xgb_model(
-    in_parquet="data/input.parquet",
-    out_dir="output/",
-    thr_lo=0.02,           # Lower prevalence threshold
-    thr_hi=0.95,           # Upper prevalence threshold
-    k_strata=16,           # K-means strata for EIR
-    K=10,                  # CV folds
-    seed=42,
-    save_pkl=True,
-    save_plots=True,
-    save_artifacts=True
-)
+```
+datasets/               # training data (see datasets/README.md)
+models/
+  prevalence/           # prev_y9 -> EIR     (estiMINT_model.pkl)
+  hbr/                  # HBR<->EIR sub-models (estiMINT_HBR_model.pkl, estiMINT_EIR_to_HBR_model.pkl)
 ```
 
+Retrain a model end-to-end, e.g. the prevalence model:
+
+```bash
+python models/prevalence/prepare.py        # derive the training view from the parquet
+python models/prevalence/train.py          # train -> estiMINT_model.pkl + metrics/ + plots/
+```
+
+The deployed models shipped with the package live in `src/estimint/data/` and are loaded by
+name (`prevalence`, `hbr`, `eir_to_hbr`). This is independent of the training pipeline above.
+
+## API Reference
+
 ### Inference
 
 ```python
 from estimint import load_xgb_model, run_xgb_model
 import pandas as pd
 
-# Load model
-model = load_xgb_model("output/models/estiMINT_model.pkl")
+# Load a bundled model by name: "prevalence", "hbr", or "eir_to_hbr"
+model = load_xgb_model("prevalence")
 
 # Prepare input data
 new_data = pd.DataFrame({
@@ -80,13 +108,73 @@ print(f"Predicted EIR: {eir_predictions[0]:.2f}")
 from estimint import load_xgb_model, run_xgb_model, set_global_model
 
 # Set global model once
-model = load_xgb_model("output/models/estiMINT_model.pkl")
+model = load_xgb_model("prevalence")
 set_global_model(model)
 
 # Run predictions without passing model
 predictions = run_xgb_model(new_data)  # Uses global model
 ```
 
+### Bednet to dn0
+
+Turn a bednet specification (a mix of net types and an insecticide resistance level) into
+the `dn0` covariate, the probability a mosquito dies on contact, along with total ITN usage.
+
+```python
+from estimint import calculate_dn0, net_types
+
+net_types()                      # ['pyrethroid_only', 'pyrethroid_pbo', 'pyrethroid_ppf', 'pyrethroid_pyrrole']
+res = calculate_dn0(0.5, py_only=0.4, py_pbo=0.3, py_pyrrole=0.2, py_ppf=0.1)
+res.dn0, res.itn_use             # weighted dn0, total net usage
+```
+
+### Run scenarios
+
+`run_scenarios` runs the whole pipeline in one call. You give it a list of scenarios and
+get back a DataFrame. For each scenario it works out the bednet killing effect, estimates
+the EIR (from prevalence, from biting rate, or taken directly), optionally adjusts for a
+change in mosquito density, then runs the stateMINT emulator forward to the prevalence and
+cases trajectories.
+
+This needs the [stateMINT](https://github.com/mrc-ide/stateMINT) package installed alongside
+estiMINT. estiMINT imports it only when you call `run_scenarios`, and the emulator's model
+weights are downloaded from Hugging Face.
+
+```python
+from estimint import run_scenarios
+
+scenarios = [
+    dict(name="PBO nets, prevalence input, 60% more mosquitoes",
+         input="prevalence", value=0.30,
+         net="pyrethroid_pbo", resistance=0.55, net_usage=0.85,
+         Q0=0.90, phi_bednets=0.85, seasonal=1, irs_use=0.40, lsm=0.0,
+         mosquito_delta=0.60),
+    dict(name="Biting rate input",
+         input="hbr", value=250000.0,
+         net="pyrethroid_ppf", resistance=0.45, net_usage=0.50,
+         Q0=0.80, phi_bednets=0.82, seasonal=0, irs_use=0.0),
+    dict(name="EIR supplied directly, no nets",
+         input="eir", value=20.0,
+         Q0=0.88, phi_bednets=0.78, seasonal=1, irs_use=0.60),
+]
+
+df = run_scenarios(scenarios)
+print(df[["name", "eir_baseline", "eir_final", "prev_y9", "cases_endline"]])
+```
+
+Every scenario needs `input` (`"prevalence"`, `"hbr"` or `"eir"`) and `value`, plus `Q0`,
+`phi_bednets`, `seasonal` and `irs_use`. `lsm` defaults to 0. To include nets, give `net`,
+`resistance` and `net_usage`, or leave `net` out for none. `mosquito_delta` only applies when
+`input` is `"prevalence"`.
+
+The returned DataFrame has one row per scenario. Alongside the inputs it gives the
+estimated EIR (`eir_baseline`, and `eir_final` after any mosquito-density change) and the
+stateMINT output. That output is year-9 prevalence (`prev_y9`), endline prevalence and
+cases, and the full 157-step `prev_series` and `cases_series`. What you do with it is up to
+you.
+
+The `estimint.scenarios` module is also where the simulation-based inference and experiment
+code will go.
+
 ## Utility Functions
 
 ```python
@@ -110,6 +198,9 @@ y_calibrated = predict_qmap_w(y_pred, cal)
 
 ## Data Processing
 
+These functions need the training extras. Install them with `pip install "estimint[train]"`,
+which adds duckdb, scikit-learn and pyarrow.
+
 ```python
 from estimint import load_and_filter, make_value_weights, strata_and_split
 
@@ -126,6 +217,28 @@ df["eir_log10"] = np.log10(df["eir"])
 df = strata_and_split(df, k_strata=16, seed=42)
 ```
 
+## Testing
+
+```bash
+uv sync --extra dev          # or: pip install -e ".[dev]"
+uv run pytest                # or: pytest
+```
+
+This covers the metric and utility helpers, the EIR estimators (prevalence, HBR and direct
+EIR), the mosquito-density HBR pipeline, and the bednet calculation.
+
+## CI and releases
+
+The test suite runs on every push and pull request across Python 3.10 to 3.14, defined in
+[`.github/workflows/tests.yml`](.github/workflows/tests.yml).
+
+Releases publish to PyPI from [`.github/workflows/publish.yml`](.github/workflows/publish.yml),
+which runs whenever a tag matching `v*.*.*` is pushed. It builds with `uv build` and uploads
+with `uv publish` using
+[PyPI trusted publishing](https://docs.astral.sh/uv/guides/integration/github/#publishing-to-pypi),
+so no token is stored. To cut a release, bump `version` in `pyproject.toml`, then push a
+matching tag such as `v0.2.0`, either directly or by publishing a GitHub Release that creates
+it. The first time, register this repository as a trusted publisher in the PyPI project
+settings, naming the `publish.yml` workflow and the `pypi` environment.
+
 ## Key Differences from R Version
 
 1. **File format**: Models saved as `.pkl` (pickle) instead of `.rds`
@@ -135,14 +248,20 @@ df = strata_and_split(df, k_strata=16, seed=42)
 
 ## Dependencies
 
+Core dependencies, always installed and sufficient for inference:
+
 - numpy >= 1.20.0
 - pandas >= 1.3.0
-- duckdb >= 0.8.0
 - xgboost >= 1.6.0
-- scikit-learn >= 1.0.0
-- matplotlib >= 3.4.0
-- requests >= 2.28.0 (optional, for model download)
-- appdirs >= 1.4.0 (optional, for cache directory)
+- scipy >= 1.7.0
+
+Optional extras, installed with `estimint[name]`:
+
+- `train` adds duckdb, scikit-learn and pyarrow for data prep and model training
+- `viz` adds matplotlib for plotting
+- `download` adds requests and appdirs for fetching published models
+- `all` combines train, viz and download
+- `dev` is the test and lint toolchain (pytest, pytest-cov, black, isort, mypy, flake8)
 
 ## License
 

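The README lists the rules for a `run_scenarios` scenario: the required keys, the `lsm` default, the all-or-nothing net keys, and `mosquito_delta` being prevalence-only. These can be checked before a long emulator run. The sketch below assumes a hypothetical `normalise_scenario` helper. The accepted `input` values are taken from the README examples, and the net names from the `net_types()` output shown there. `run_scenarios` does its own handling, which may differ. For example, this sketch rejects a stray `mosquito_delta` rather than ignoring it.

```python
REQUIRED = ("input", "value", "Q0", "phi_bednets", "seasonal", "irs_use")
INPUT_KINDS = ("prevalence", "hbr", "eir")
NET_KEYS = ("net", "resistance", "net_usage")
NET_TYPES = ("pyrethroid_only", "pyrethroid_pbo", "pyrethroid_ppf", "pyrethroid_pyrrole")


def normalise_scenario(scenario: dict) -> dict:
    """Check one scenario dict against the README's rules and fill defaults.

    Hypothetical pre-flight check; not part of the estiMINT API.
    """
    s = dict(scenario)  # copy so the caller's dict is not mutated
    missing = [k for k in REQUIRED if k not in s]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if s["input"] not in INPUT_KINDS:
        raise ValueError(f"input must be one of {INPUT_KINDS}, got {s['input']!r}")
    s.setdefault("lsm", 0.0)  # README: lsm defaults to 0
    if "net" in s:
        # Nets need all three of net, resistance and net_usage.
        absent = [k for k in NET_KEYS if k not in s]
        if absent:
            raise ValueError(f"net given without {absent}")
        if s["net"] not in NET_TYPES:
            raise ValueError(f"unknown net type {s['net']!r}")
    # README: mosquito_delta only applies to prevalence input; reject it elsewhere.
    if "mosquito_delta" in s and s["input"] != "prevalence":
        raise ValueError("mosquito_delta is only used when input='prevalence'")
    return s
```

Calling `[normalise_scenario(s) for s in scenarios]` before `run_scenarios(scenarios)` surfaces a typo in seconds rather than after the emulator has started.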
diff --git a/datasets/README.md b/datasets/README.md
@@ -0,0 +1,9 @@
+# datasets/
+
+Training data for retraining the estiMINT models. Not shipped with the package.
+
+**`estimint_simulations_y9.parquet`** — 16,384 rows (4,096 parameter sets × 4 sims),
+year-9 aggregates. Columns: `parameter_index`, `simulation_index`, `eir`, `dn0_use`,
+`Q0`, `phi_bednets`, `seasonal`, `itn_use`, `irs_use`, `prev_y9`, `hbr_y9`.
+
+Each model's `prepare.py` filters this source, sorts it by key and writes the result as its training view.
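`prepare.py` is described only as filtering the source and sorting it by key. The sketch below shows what that step could look like for the prevalence model. It makes three assumptions: the key is (`parameter_index`, `simulation_index`); the filter reuses the prevalence thresholds from the old `train_xgb_model` example (`thr_lo=0.02`, `thr_hi=0.95`); and the target is on the log10 scale, as in the README's Data Processing snippet. The function name and filter are hypothetical, and the real script may differ. Reading the parquet with `pd.read_parquet` needs pyarrow from the `train` extra.

```python
import numpy as np
import pandas as pd

KEY = ["parameter_index", "simulation_index"]


def prevalence_view(src: pd.DataFrame, thr_lo: float = 0.02, thr_hi: float = 0.95) -> pd.DataFrame:
    """Filter the year-9 source to a prevalence-model training view, sorted by key.

    Hypothetical sketch; thresholds borrowed from the old train_xgb_model defaults.
    """
    # Keep rows inside the prevalence band (inclusive) with a positive EIR for log10.
    keep = src["prev_y9"].between(thr_lo, thr_hi) & (src["eir"] > 0)
    view = src.loc[keep].sort_values(KEY, kind="stable").reset_index(drop=True)
    view["eir_log10"] = np.log10(view["eir"])  # model target on the log10 scale
    return view


# src = pd.read_parquet("datasets/estimint_simulations_y9.parquet")
# prevalence_view(src).to_parquet(...)  # output path left to the real prepare.py
```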
diff --git a/datasets/estimint_simulations_y9.parquet b/datasets/estimint_simulations_y9.parquet
diff --git a/models/hbr/README.md b/models/hbr/README.md
@@ -0,0 +1,17 @@
+# models/hbr
+
+The HBR feature's two sub-models, both used by `estimate_eir_with_mosquito_delta`
+(`src/estimint/hbr.py`) to answer "what happens to EIR if mosquito density changes by X%?".
+
+| Training script | Direction | Bundle name | File |
+|---|---|---|---|
+| `train_hbr_to_eir.py` | HBR + interventions → EIR | `hbr` | `estiMINT_HBR_model.pkl` |
+| `train_eir_to_hbr.py` | EIR + interventions → HBR | `eir_to_hbr` | `estiMINT_EIR_to_HBR_model.pkl` |
+
+```bash
+python models/hbr/prepare.py            # source -> hbr_training.parquet + eir_to_hbr_training.parquet
+python models/hbr/train_hbr_to_eir.py   # -> estiMINT_HBR_model.pkl
+python models/hbr/train_eir_to_hbr.py   # -> estiMINT_EIR_to_HBR_model.pkl
+```
+
+Deployed copies live in `src/estimint/data/`.
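The two sub-models chain naturally for the mosquito-density question: EIR → HBR, scale HBR, then HBR → EIR. The sketch below shows one way to do this. It assumes HBR scales in proportion to mosquito density: HBR counts bites per person, so with other factors fixed, 60% more mosquitoes means 60% more bites. It also replaces the fitted XGBoost sub-models with plain callables. The function name is hypothetical, and the real `estimate_eir_with_mosquito_delta` may differ in inputs and details.

```python
from typing import Callable, Mapping

Interventions = Mapping[str, float]
Model = Callable[[float, Interventions], float]


def eir_after_density_change(
    eir_baseline: float,
    mosquito_delta: float,
    interventions: Interventions,
    eir_to_hbr: Model,
    hbr_to_eir: Model,
) -> float:
    """Estimate EIR after mosquito density changes by a fraction (0.6 means +60%).

    Sketch only: stand-in callables replace the bundled sub-models.
    """
    if mosquito_delta < -1.0:
        raise ValueError("mosquito_delta must be >= -1 (density cannot go negative)")
    hbr = eir_to_hbr(eir_baseline, interventions)  # EIR + interventions -> HBR
    hbr_new = hbr * (1.0 + mosquito_delta)  # assumption: HBR proportional to density
    return hbr_to_eir(hbr_new, interventions)  # HBR + interventions -> EIR
```

With toy linear stand-ins (`HBR = 1000 * EIR`), a baseline EIR of 20 and `mosquito_delta=0.6` give 32. The fitted models would generally not be linear.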
diff --git a/models/hbr/metrics/eir_OOF_metrics_K10CV.csv b/models/hbr/metrics/eir_OOF_metrics_K10CV.csv
@@ -0,0 +1,3 @@
+set,R2,bias,RMSE,MAE
+OOF_uncalibrated,0.9937114678283258,0.1975426785792437,7.503367295849099,1.811916642783641
+OOF_calibrated,0.9938349462151069,-0.11396988957813733,7.429335951261748,1.7968831645347725
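These metrics compare out-of-fold predictions before and after calibration. The README's utility section refers to a quantile-mapping calibrator (`predict_qmap_w(y_pred, cal)`); the `_w` suggests a weighted variant. The sketch below shows plain, unweighted empirical quantile mapping as an illustration of the idea. `fit_qmap` and `apply_qmap` are hypothetical names, not estiMINT's implementation.

```python
import numpy as np


def fit_qmap(y_pred_oof: np.ndarray, y_obs: np.ndarray, n_q: int = 101) -> tuple[np.ndarray, np.ndarray]:
    """Pair quantiles of out-of-fold predictions with quantiles of the observations."""
    probs = np.linspace(0.0, 1.0, n_q)
    return np.quantile(y_pred_oof, probs), np.quantile(y_obs, probs)


def apply_qmap(y_pred: np.ndarray, cal: tuple[np.ndarray, np.ndarray]) -> np.ndarray:
    """Map new predictions through the fitted quantile pairs by linear interpolation."""
    pred_q, obs_q = cal
    # np.interp expects increasing x-points; quantiles of continuous predictions are.
    # Inputs outside the fitted range are clamped to the end values.
    return np.interp(y_pred, pred_q, obs_q)
```

Fitting on out-of-fold predictions, not in-sample ones, means the mapping corrects the biases the model shows on data it did not train on.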
diff --git a/models/hbr/metrics/eir_test_metrics.csv b/models/hbr/metrics/eir_test_metrics.csv
@@ -0,0 +1,2 @@
+set,R2,bias,RMSE,MAE
+Test,0.9932201789731747,-0.1516625971494146,7.79412320408285,1.7008778672664435
diff --git a/models/hbr/metrics/hbr_OOF_metrics_K10CV.csv b/models/hbr/metrics/hbr_OOF_metrics_K10CV.csv
@@ -0,0 +1,3 @@
+set,R2,bias,RMSE,MAE
+OOF_uncalibrated,0.9998069592457883,-1107.6675606973777,23435.86049194724,2709.3842691018244
+OOF_calibrated,0.9998096907822673,119.85741076025299,23269.46044419048,3190.5668570747875
diff --git a/models/hbr/metrics/hbr_test_metrics.csv b/models/hbr/metrics/hbr_test_metrics.csv
@@ -0,0 +1,2 @@
+set,R2,bias,RMSE,MAE
+Test,0.9999753512398193,1533.1655993230922,8372.508464605351,2530.611551719603
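The metric files report `R2`, `bias`, `RMSE` and `MAE` for out-of-fold (K10 CV) and test predictions. The HBR errors run into the thousands while R2 stays near 1, which suggests the metrics are in the target's own units and HBR values are large. The sketch below computes the four metrics. It assumes `bias` is mean(prediction - observation) and `R2` is the coefficient of determination; the repository could instead use a squared correlation. `regression_metrics` is a hypothetical name.

```python
import numpy as np


def regression_metrics(y_obs, y_pred) -> dict[str, float]:
    """R2, bias, RMSE and MAE in the units of y (e.g. EIR or HBR).

    Assumptions: bias = mean(pred - obs); R2 = 1 - SS_res / SS_tot.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_obs
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_obs - y_obs.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "bias": float(err.mean()),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }
```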