Integrate all estimators #1
Changes from 7 commits
|
absternator marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| name: Publish Release to PyPI | ||
|
|
||
| on: | ||
| push: | ||
| tags: | ||
| - 'v*.*.*' | ||
|
|
||
| jobs: | ||
| run: | ||
| runs-on: ubuntu-latest | ||
| environment: | ||
| name: pypi | ||
| permissions: | ||
| id-token: write | ||
| contents: read | ||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v6 | ||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b | ||
| with: | ||
| enable-cache: true | ||
| version: "0.11.18" | ||
| - name: Set up Python | ||
| run: uv python install | ||
| - name: Build | ||
| run: uv build | ||
| # - name: Smoke test (wheel) | ||
|
absternator marked this conversation as resolved.
Outdated
|
||
| # run: uv run --isolated --no-project --with dist/*.whl tests/smoke_test.py | ||
| # - name: Smoke test (source distribution) | ||
| # run: uv run --isolated --no-project --with dist/*.tar.gz tests/smoke_test.py | ||
| - name: Publish | ||
| run: uv publish | ||
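
The `v*.*.*` tag filter is looser than a strict three-part version number. As a rough sketch, the snippet below uses Python's `fnmatch` to approximate GitHub's filter syntax (an approximation, not GitHub's own matcher) and compares it with a stricter rule. The helper names and the strict pattern are hypothetical and not part of this PR.

```python
import re
from fnmatch import fnmatchcase

TAG_FILTER = "v*.*.*"  # copied from on.push.tags in the workflow above
STRICT_RELEASE = re.compile(r"v\d+\.\d+\.\d+")  # hypothetical stricter rule


def matches_tag_filter(tag: str) -> bool:
    """Approximate the workflow filter: '*' matches any run of characters except '/'."""
    return "/" not in tag and fnmatchcase(tag, TAG_FILTER)


def is_strict_release(tag: str) -> bool:
    """True only for tags of the exact form vMAJOR.MINOR.PATCH."""
    return STRICT_RELEASE.fullmatch(tag) is not None


for tag in ["v1.2.3", "v1.2", "1.2.3", "v1.2.3-rc1", "vnext.a.b"]:
    print(f"{tag:12} filter={matches_tag_filter(tag)} strict={is_strict_release(tag)}")
```

Tags such as `v1.2.3-rc1` pass the filter in this approximation, so a pre-release tag would also start the publish job unless the filter is tightened.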
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| name: tests | ||
|
|
||
| on: | ||
| push: | ||
| branches: ["**"] | ||
| pull_request: | ||
|
CosmoNaught marked this conversation as resolved.
|
||
|
|
||
| concurrency: | ||
| group: tests-${{ github.ref }} | ||
| cancel-in-progress: true | ||
|
|
||
| jobs: | ||
| test: | ||
| name: pytest (Python ${{ matrix.python-version }}) | ||
| runs-on: ubuntu-latest | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] | ||
|
|
||
| steps: | ||
|
absternator marked this conversation as resolved.
|
||
| - uses: actions/checkout@v4 | ||
|
|
||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@v5 | ||
| with: | ||
| enable-cache: true | ||
| python-version: ${{ matrix.python-version }} | ||
|
|
||
| # Install core + dev only. This both runs the test suite and proves the | ||
| # core (inference) install works without the train/viz extras. | ||
| - name: Install project | ||
| run: uv sync --extra dev | ||
|
|
||
| - name: Run tests | ||
| run: uv run pytest | ||
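
The comment on the install step relies on the core package working when the `train` and `viz` extras are absent. One hedged way to make that visible in a test run is to report which optional modules are importable. The extras-to-module mapping below is an assumption based on the README's dependency list, and `missing_optional_modules` is a hypothetical helper, not part of estiMINT.

```python
from importlib.util import find_spec

# Import names for each extra (scikit-learn is imported as `sklearn`).
EXTRAS = {
    "train": ["duckdb", "sklearn", "pyarrow"],
    "viz": ["matplotlib"],
}


def missing_optional_modules(extras=EXTRAS):
    """Return {extra: [modules that cannot be found]} without importing them."""
    return {
        extra: [name for name in modules if find_spec(name) is None]
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    for extra, missing in missing_optional_modules().items():
        status = "installed" if not missing else "missing " + ", ".join(missing)
        print(f"{extra}: {status}")
```

In the `uv sync --extra dev` environment this would be expected to list the `train` and `viz` modules as missing, which is the situation the workflow comment describes.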
|
Contributor
is this file needed?
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,3 @@ | ||
| include README.md | ||
| include requirements.txt | ||
| recursive-include src/estimint/data * | ||
| recursive-include src/estimint/inst * |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,19 +2,44 @@ | |
|
|
||
| Python port of the estiMINT R package for EIR (Entomological Inoculation Rate) estimation using machine learning. | ||
|
|
||
| It estimates EIR from prevalence, converts between EIR and human biting rate (including the effect of changes in mosquito density), and turns a bednet specification (net type and resistance level) into the `dn0` killing parameter. | ||
|
|
||
| ## Installation | ||
|
|
||
| ```bash | ||
| pip install -e . | ||
| pip install estimint # core: inference only (numpy, pandas, xgboost, scipy) | ||
| ``` | ||
|
|
||
| Optional extras, by use case: | ||
|
|
||
| ```bash | ||
| pip install "estimint[train]" # data prep + model training (duckdb, scikit-learn, pyarrow) | ||
| pip install "estimint[viz]" # plotting (matplotlib) | ||
| pip install "estimint[all]" # train + viz + model download | ||
| pip install "estimint[dev]" # test/lint/type-check toolchain | ||
| ``` | ||
|
|
||
| The `run_scenarios` pipeline also needs the stateMINT emulator (Python 3.12+). For now it | ||
| comes from the `mamba2-train` branch. With uv this is handled for you: | ||
|
|
||
| ```bash | ||
| uv sync --extra scenarios | ||
| ``` | ||
|
|
||
| With plain pip, install stateMINT from the branch yourself, then estiMINT: | ||
|
|
||
| ```bash | ||
| pip install "git+https://github.com/mrc-ide/stateMINT.git@mamba2-train" | ||
| pip install estimint | ||
| ``` | ||
|
|
||
| Or install dependencies directly: | ||
| For local development with [uv](https://docs.astral.sh/uv/): | ||
|
|
||
| ```bash | ||
| pip install -r requirements.txt | ||
| uv sync --extra all --extra dev | ||
| ``` | ||
|
|
||
| ## File Mapping (R → Python) | ||
|
CosmoNaught marked this conversation as resolved.
|
||
| ## File mapping (R to Python) | ||
|
|
||
| | R File | Python File | Description | | ||
| |--------|-------------|-------------| | ||
|
|
@@ -28,35 +53,38 @@ pip install -r requirements.txt | |
| | `storage.R` | `storage.py` | Model persistence and loading | | ||
| | `run.R` | `run.py` | Model inference | | ||
|
|
||
| ## API Reference | ||
| ## Data & retraining pipeline | ||
|
|
||
| ### Training | ||
| All training data lives in `datasets/estimint_simulations_y9.parquet`. Two model folders | ||
| derive their views from it and train: | ||
|
|
||
| ```python | ||
| from estimint import train_xgb_model | ||
|
|
||
| model = train_xgb_model( | ||
| in_parquet="data/input.parquet", | ||
| out_dir="output/", | ||
| thr_lo=0.02, # Lower prevalence threshold | ||
| thr_hi=0.95, # Upper prevalence threshold | ||
| k_strata=16, # K-means strata for EIR | ||
| K=10, # CV folds | ||
| seed=42, | ||
| save_pkl=True, | ||
| save_plots=True, | ||
| save_artifacts=True | ||
| ) | ||
| ``` | ||
| datasets/ # training data (see datasets/README.md) | ||
| models/ | ||
| prevalence/ # prev_y9 -> EIR (estiMINT_model.pkl) | ||
| hbr/ # HBR<->EIR sub-models (estiMINT_HBR_model.pkl, estiMINT_EIR_to_HBR_model.pkl) | ||
| ``` | ||
|
|
||
| Retrain a model end-to-end, e.g. the prevalence model: | ||
|
|
||
| ```bash | ||
| python models/prevalence/prepare.py # derive the training view from the parquet | ||
| python models/prevalence/train.py # train -> estiMINT_model.pkl + metrics/ + plots/ | ||
| ``` | ||
|
|
||
| The deployed models shipped with the package live in `src/estimint/data/` and are loaded by | ||
| name (`prevalence`, `hbr`, `eir_to_hbr`). This is independent of the training pipeline above. | ||
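
Loading by name can be pictured as a lookup from bundle name to file. The sketch below is illustrative only: the names and file names come from the READMEs in this PR, but `resolve_model_path` is a hypothetical helper, and the fallback to an explicit path is an assumption about how `load_xgb_model` might treat other arguments.

```python
from pathlib import Path

# Bundle names and file names as listed in this PR's READMEs.
BUNDLED_MODELS = {
    "prevalence": "estiMINT_model.pkl",
    "hbr": "estiMINT_HBR_model.pkl",
    "eir_to_hbr": "estiMINT_EIR_to_HBR_model.pkl",
}


def resolve_model_path(name_or_path, data_dir=Path("src/estimint/data")):
    """Map a bundle name to its file; treat anything else as an explicit path."""
    if name_or_path in BUNDLED_MODELS:
        return Path(data_dir) / BUNDLED_MODELS[name_or_path]
    return Path(name_or_path)


print(resolve_model_path("prevalence"))
print(resolve_model_path("output/models/estiMINT_model.pkl"))
```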
|
|
||
| ## API Reference | ||
|
|
||
| ### Inference | ||
|
|
||
| ```python | ||
| from estimint import load_xgb_model, run_xgb_model | ||
| import pandas as pd | ||
|
|
||
| # Load model | ||
| model = load_xgb_model("output/models/estiMINT_model.pkl") | ||
| # Load a bundled model by name: "prevalence", "hbr", or "eir_to_hbr" | ||
| model = load_xgb_model("prevalence") | ||
|
|
||
| # Prepare input data | ||
| new_data = pd.DataFrame({ | ||
|
|
@@ -80,13 +108,73 @@ print(f"Predicted EIR: {eir_predictions[0]:.2f}") | |
| from estimint import load_xgb_model, run_xgb_model, set_global_model | ||
|
|
||
| # Set global model once | ||
| model = load_xgb_model("output/models/estiMINT_model.pkl") | ||
| model = load_xgb_model("prevalence") | ||
| set_global_model(model) | ||
|
|
||
| # Run predictions without passing model | ||
| predictions = run_xgb_model(new_data) # Uses global model | ||
| ``` | ||
|
|
||
| ### Bednet to dn0 | ||
|
|
||
| Turn a bednet specification (a mix of net types and an insecticide resistance level) into | ||
| the `dn0` covariate, the probability a mosquito dies on contact, along with total ITN usage. | ||
|
|
||
| ```python | ||
| from estimint import calculate_dn0, net_types | ||
|
|
||
| net_types() # ['pyrethroid_only', 'pyrethroid_pbo', 'pyrethroid_ppf', 'pyrethroid_pyrrole'] | ||
| res = calculate_dn0(0.5, py_only=0.4, py_pbo=0.3, py_pyrrole=0.2, py_ppf=0.1) | ||
| res.dn0, res.itn_use # weighted dn0, total net usage | ||
| ``` | ||
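
The comment `weighted dn0, total net usage` suggests a usage-weighted average over net types. A minimal sketch of that arithmetic follows. It is not estiMINT's implementation: the per-net `dn0` values are made-up placeholders, and `weighted_dn0` is a hypothetical helper. In the package these values would depend on net type and resistance level.

```python
def weighted_dn0(usage, dn0_by_net):
    """Return (usage-weighted mean dn0, total usage) for a mix of net types."""
    total = sum(usage.values())
    if total <= 0:
        return 0.0, 0.0  # no nets in use
    weighted = sum(share * dn0_by_net[net] for net, share in usage.items()) / total
    return weighted, total


# Usage shares from the README example; dn0 values are illustrative placeholders.
usage = {"pyrethroid_only": 0.4, "pyrethroid_pbo": 0.3,
         "pyrethroid_pyrrole": 0.2, "pyrethroid_ppf": 0.1}
placeholder_dn0 = {"pyrethroid_only": 0.2, "pyrethroid_pbo": 0.3,
                   "pyrethroid_pyrrole": 0.4, "pyrethroid_ppf": 0.25}

dn0, itn_use = weighted_dn0(usage, placeholder_dn0)
print(f"dn0={dn0:.3f}, itn_use={itn_use:.2f}")
```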
|
|
||
| ### Run scenarios | ||
|
|
||
| `run_scenarios` runs the whole pipeline in one call. You give it a list of scenarios and | ||
| get back a DataFrame. For each scenario it works out the bednet killing effect, estimates | ||
| the EIR (from prevalence, from biting rate, or taken directly), optionally adjusts for a | ||
| change in mosquito density, then runs the stateMINT emulator forward to the prevalence and | ||
| cases trajectories. | ||
|
|
||
| This needs the [stateMINT](https://github.com/mrc-ide/stateMINT) package installed as well | ||
| as estiMINT. estiMINT only loads it when you call `run_scenarios`, and the model weights | ||
| download from HuggingFace. | ||
|
|
||
| ```python | ||
| from estimint import run_scenarios | ||
|
|
||
| scenarios = [ | ||
| dict(name="PBO nets, prevalence input, 60% more mosquitoes", | ||
| input="prevalence", value=0.30, | ||
|
Contributor
why is input param needed? how would mint web call this?
||
| net="pyrethroid_pbo", resistance=0.55, net_usage=0.85, | ||
| Q0=0.90, phi_bednets=0.85, seasonal=1, irs_use=0.40, lsm=0.0, | ||
| mosquito_delta=0.60), | ||
| dict(name="Biting rate input", | ||
| input="hbr", value=250000.0, | ||
| net="pyrethroid_ppf", resistance=0.45, net_usage=0.50, | ||
| Q0=0.80, phi_bednets=0.82, seasonal=0, irs_use=0.0), | ||
| dict(name="EIR supplied directly, no nets", | ||
| input="eir", value=20.0, | ||
| Q0=0.88, phi_bednets=0.78, seasonal=1, irs_use=0.60), | ||
| ] | ||
|
|
||
| df = run_scenarios(scenarios) | ||
| print(df[["name", "eir_baseline", "eir_final", "prev_y9", "cases_endline"]]) | ||
| ``` | ||
|
|
||
| Every scenario needs `input` and `value`, plus `Q0`, `phi_bednets`, `seasonal` and | ||
| `irs_use`. `lsm` defaults to 0. To include nets give `net`, `resistance` and `net_usage`, | ||
| or leave `net` out for none. `mosquito_delta` only applies when `input` is `"prevalence"`. | ||
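
The rules in the paragraph above can be written down as a small pre-flight check. This is a sketch under stated assumptions: `check_scenario` and `with_defaults` are hypothetical helpers, not part of estiMINT, and the accepted `input` kinds are taken from the three examples in the README rather than from the code.

```python
REQUIRED = ("input", "value", "Q0", "phi_bednets", "seasonal", "irs_use")
NET_KEYS = ("net", "resistance", "net_usage")
INPUT_KINDS = ("prevalence", "hbr", "eir")  # as seen in the README examples


def check_scenario(scenario):
    """Return a list of problems with one scenario dict (empty list means OK)."""
    problems = [f"missing required key: {key}" for key in REQUIRED if key not in scenario]
    if "input" in scenario and scenario["input"] not in INPUT_KINDS:
        problems.append(f"unknown input kind: {scenario['input']!r}")
    if "net" in scenario:
        problems += [f"nets need {key}" for key in NET_KEYS if key not in scenario]
    if "mosquito_delta" in scenario and scenario.get("input") != "prevalence":
        problems.append("mosquito_delta only applies when input is 'prevalence'")
    return problems


def with_defaults(scenario):
    """Copy the scenario and fill in the documented default lsm=0."""
    return {"lsm": 0.0, **scenario}


scenario = dict(input="eir", value=20.0, Q0=0.88, phi_bednets=0.78, seasonal=1, irs_use=0.60)
print(check_scenario(scenario), with_defaults(scenario)["lsm"])
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a batch of scenarios at once.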
|
|
||
| The returned DataFrame has one row per scenario. Alongside the inputs it gives the | ||
| estimated EIR (`eir_baseline`, and `eir_final` after any mosquito-density change) and the | ||
| stateMINT output. That output is year-9 prevalence (`prev_y9`), endline prevalence and | ||
| cases, and the full 157-step `prev_series` and `cases_series`. What you do with it is up to | ||
| you. | ||
|
|
||
| The `estimint.scenarios` module is also where the simulation-based inference and experiment | ||
| code will go. | ||
|
|
||
| ## Utility Functions | ||
|
|
||
| ```python | ||
|
|
@@ -110,6 +198,9 @@ y_calibrated = predict_qmap_w(y_pred, cal) | |
|
|
||
| ## Data Processing | ||
|
|
||
| These functions need the training extras. Install them with `pip install "estimint[train]"`, | ||
| which adds duckdb, scikit-learn and pyarrow. | ||
|
|
||
| ```python | ||
| from estimint import load_and_filter, make_value_weights, strata_and_split | ||
|
|
||
|
|
@@ -126,6 +217,28 @@ df["eir_log10"] = np.log10(df["eir"]) | |
| df = strata_and_split(df, k_strata=16, seed=42) | ||
| ``` | ||
|
|
||
| ## Testing | ||
|
|
||
| ```bash | ||
| uv sync --extra dev # or: pip install -e ".[dev]" | ||
| uv run pytest # or: pytest | ||
| ``` | ||
|
|
||
| This covers the metric and utility helpers, the EIR estimators (prevalence, HBR and direct | ||
| EIR), the mosquito-density HBR pipeline, and the bednet calculation. | ||
|
|
||
| ## CI and releases | ||
|
|
||
| The test suite runs on every push and pull request across Python 3.10 to 3.14, defined in | ||
| [`.github/workflows/tests.yml`](.github/workflows/tests.yml). | ||
|
|
||
| Releases publish to PyPI from [`.github/workflows/publish.yml`](.github/workflows/publish.yml). | ||
| It builds with `uv build` and uploads with `uv publish` using | ||
| [PyPI trusted publishing](https://docs.astral.sh/uv/guides/integration/github/#publishing-to-pypi), | ||
| so no token is stored. To cut a release, bump `version` in `pyproject.toml` and push a tag | ||
| matching `v*.*.*`, for example by publishing a GitHub Release with such a tag. The first time, register this repository as a trusted publisher in the PyPI | ||
| project settings. | ||
|
|
||
| ## Key Differences from R Version | ||
|
|
||
| 1. **File format**: Models saved as `.pkl` (pickle) instead of `.rds` | ||
|
|
@@ -135,14 +248,20 @@ df = strata_and_split(df, k_strata=16, seed=42) | |
|
|
||
| ## Dependencies | ||
|
absternator marked this conversation as resolved.
Outdated
|
||
|
|
||
| Core, always installed, and enough for inference: | ||
|
|
||
| - numpy >= 1.20.0 | ||
| - pandas >= 1.3.0 | ||
| - duckdb >= 0.8.0 | ||
| - xgboost >= 1.6.0 | ||
| - scikit-learn >= 1.0.0 | ||
| - matplotlib >= 3.4.0 | ||
| - requests >= 2.28.0 (optional, for model download) | ||
| - appdirs >= 1.4.0 (optional, for cache directory) | ||
| - scipy >= 1.7.0 | ||
|
|
||
| Optional extras, installed with `estimint[name]`: | ||
|
|
||
| - `train` adds duckdb, scikit-learn and pyarrow for data prep and model training | ||
| - `viz` adds matplotlib for plotting | ||
| - `download` adds requests and appdirs for fetching published models | ||
| - `all` combines train, viz and download | ||
| - `dev` is the test and lint toolchain (pytest, pytest-cov, black, isort, mypy, flake8) | ||
|
|
||
| ## License | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| # datasets/ | ||
|
|
||
| Training data for retraining the estiMINT models. Not shipped with the package. | ||
|
|
||
| **`estimint_simulations_y9.parquet`** — 16,384 rows (4,096 parameter sets × 4 sims), | ||
|
CosmoNaught marked this conversation as resolved.
Outdated
|
||
| year-9 aggregates. Columns: `parameter_index`, `simulation_index`, `eir`, `dn0_use`, | ||
| `Q0`, `phi_bednets`, `seasonal`, `itn_use`, `irs_use`, `prev_y9`, `hbr_y9`. | ||
|
|
||
| Each model's `prepare.py` filters this source and sorts by key into its training view. | ||
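
The "filter, then sort by key" step can be sketched in plain Python. This is not the code in `prepare.py`: the key columns and the prevalence band below are assumptions chosen for illustration (the band reuses the `thr_lo=0.02` and `thr_hi=0.95` values from the training example removed from the README), and `prepare_view` is a hypothetical helper.

```python
def prepare_view(rows, keep, key=("parameter_index", "simulation_index")):
    """Filter rows with `keep`, then sort them by the key columns."""
    return sorted((r for r in rows if keep(r)), key=lambda r: tuple(r[k] for k in key))


rows = [
    {"parameter_index": 2, "simulation_index": 1, "prev_y9": 0.40},
    {"parameter_index": 1, "simulation_index": 2, "prev_y9": 0.99},
    {"parameter_index": 1, "simulation_index": 1, "prev_y9": 0.10},
]

# Hypothetical filter: keep rows whose year-9 prevalence lies inside the band.
view = prepare_view(rows, keep=lambda r: 0.02 <= r["prev_y9"] <= 0.95)
print([(r["parameter_index"], r["simulation_index"]) for r in view])
```

Sorting on a fixed key makes the derived view deterministic, so retraining from the same source file sees rows in the same order.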
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # models/hbr | ||
|
|
||
| The HBR feature's two sub-models, both used by `estimate_eir_with_mosquito_delta` | ||
| (`src/estimint/hbr.py`) to answer "what happens to EIR if mosquito density changes by X%?". | ||
|
|
||
| | Sub-model | Direction | Bundle name | File | | ||
| |---|---|---|---| | ||
| | `train_hbr_to_eir.py` | HBR + interventions → EIR | `hbr` | `estiMINT_HBR_model.pkl` | | ||
| | `train_eir_to_hbr.py` | EIR + interventions → HBR | `eir_to_hbr` | `estiMINT_EIR_to_HBR_model.pkl` | | ||
|
|
||
| ```bash | ||
| python models/hbr/prepare.py # source -> hbr_training.parquet + eir_to_hbr_training.parquet | ||
| python models/hbr/train_hbr_to_eir.py # -> estiMINT_HBR_model.pkl | ||
| python models/hbr/train_eir_to_hbr.py # -> estiMINT_EIR_to_HBR_model.pkl | ||
| ``` | ||
|
|
||
| Deployed copies live in `src/estimint/data/`. |
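
One plausible reading of how the two directions combine to answer the mosquito-density question is: map EIR to HBR, scale HBR by the density change, then map back to EIR. The sketch below shows only that chaining. It is an assumption about the approach, not a copy of `estimate_eir_with_mosquito_delta`, and the two toy functions stand in for the trained sub-models.

```python
def eir_after_density_change(eir_baseline, delta, eir_to_hbr, hbr_to_eir):
    """Chain the two directions: EIR -> HBR, scale HBR by (1 + delta), HBR -> EIR."""
    hbr_baseline = eir_to_hbr(eir_baseline)
    hbr_scaled = hbr_baseline * (1.0 + delta)
    return hbr_to_eir(hbr_scaled)


def toy_eir_to_hbr(eir):
    """Stand-in for the `eir_to_hbr` sub-model (fixed linear relation, illustration only)."""
    return 1000.0 * eir


def toy_hbr_to_eir(hbr):
    """Stand-in for the `hbr` sub-model (inverse of the toy relation above)."""
    return hbr / 1000.0


# A 60% increase in mosquito density, as in the README's first scenario.
print(eir_after_density_change(20.0, 0.60, toy_eir_to_hbr, toy_hbr_to_eir))
```

With real sub-models the two directions also take intervention covariates, and they are not exact inverses of each other, so a zero `delta` would not necessarily return the baseline EIR unchanged.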
|
Contributor
Usually you don't commit these metrics.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| set,R2,bias,RMSE,MAE | ||
| OOF_uncalibrated,0.9937114678283258,0.1975426785792437,7.503367295849099,1.811916642783641 | ||
| OOF_calibrated,0.9938349462151069,-0.11396988957813733,7.429335951261748,1.7968831645347725 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| set,R2,bias,RMSE,MAE | ||
| Test,0.9932201789731747,-0.1516625971494146,7.79412320408285,1.7008778672664435 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| set,R2,bias,RMSE,MAE | ||
| OOF_uncalibrated,0.9998069592457883,-1107.6675606973777,23435.86049194724,2709.3842691018244 | ||
| OOF_calibrated,0.9998096907822673,119.85741076025299,23269.46044419048,3190.5668570747875 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| set,R2,bias,RMSE,MAE | ||
| Test,0.9999753512398193,1533.1655993230922,8372.508464605351,2530.611551719603 |
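
For readers comparing the CSV files above, the four columns can be computed as in the sketch below. The definitions are the conventional ones and are an assumption here: the PR does not show the metric code, so the sign convention for `bias` (prediction minus truth) and the scale on which errors are measured may differ from what the training scripts use. The input values are illustrative.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R2, bias, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "bias": sum(errors) / n,  # assumed sign: prediction minus truth
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

RMSE weights large errors more heavily than MAE, which is one reason the two can differ by a wide margin when a few predictions are far off.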