PolicyEngine · hua7450 · May 30, 2026 · May 30, 2026 · May 30, 2026
diff --git a/README.md b/README.md
@@ -22,7 +22,7 @@ The app is fully static — all TANF benefits are precomputed into JSON files, s
 | Data generation | Python, [PolicyEngine US](https://github.com/PolicyEngine/policyengine-us) |
 | Hosting | GitHub Pages (via `docs/` folder) |
 
-**Current data version:** policyengine-us `1.598.0`
+**Current data version:** policyengine-us `1.715.3` + [Indiana TANF fix #8543](https://github.com/PolicyEngine/policyengine-us/pull/8543), tax year 2026 — all 56 data files regenerated.
 
 ### Precomputed data grid
 
@@ -33,7 +33,10 @@ The app is fully static — all TANF benefits are precomputed into JSON files, s
 | Adults | 1–2 | — |
 | Children | 0–7 | — |
 
-This produces 15,376 simulations per state (~23 minutes each).
+This produces 15,376 benefit values per state. The vectorized generator
+(`precompute_vec.py`) computes a full state in ~2–5 seconds — roughly **600×
+faster** than the cell-by-cell generator — and is validated bit-for-bit
+identical to it. See [scripts/README.md](scripts/README.md).
 
 ## Getting Started
 
@@ -58,11 +61,21 @@ Requires Python 3.10+ and policyengine-us:
 ```bash
 cd scripts
 pip install -r requirements.txt
-python precompute.py           # Generate all state JSON files
-python precompute.py --states CA,NY  # Generate specific states only
-python precompute.py --metadata-only # Regenerate metadata.json only
+
+# Fast vectorized generator (recommended; ~2–5s per state)
+python precompute_vec.py                 # Generate all state JSON files
+python precompute_vec.py --states CA,NY   # Generate specific states only
+
+# Reference cell-by-cell generator (slow; metadata.json lives here)
+python precompute.py --states CA,NY       # Generate specific states (slow)
+python precompute.py --metadata-only      # Regenerate metadata.json only
 ```
 
+The two generators are interchangeable and produce byte-for-byte identical
+data; `precompute_vec.py` is just far faster. See
+[scripts/README.md](scripts/README.md) for how the vectorization works and the
+validation behind that claim.
+
 Then rebuild the frontend:
 
 ```bash

diff --git a/public/data/AK.json b/public/data/AK.json
diff --git a/public/data/CA_1.json b/public/data/CA_1.json
diff --git a/public/data/CA_2.json b/public/data/CA_2.json
diff --git a/public/data/CT.json b/public/data/CT.json
diff --git a/public/data/HI.json b/public/data/HI.json
diff --git a/public/data/IL.json b/public/data/IL.json
diff --git a/public/data/IN.json b/public/data/IN.json
diff --git a/public/data/KS.json b/public/data/KS.json
diff --git a/public/data/KY.json b/public/data/KY.json
diff --git a/public/data/MA.json b/public/data/MA.json
diff --git a/public/data/MN.json b/public/data/MN.json
diff --git a/public/data/MT.json b/public/data/MT.json
diff --git a/public/data/ND.json b/public/data/ND.json
diff --git a/public/data/NE.json b/public/data/NE.json
diff --git a/public/data/NH.json b/public/data/NH.json
diff --git a/public/data/NY.json b/public/data/NY.json
diff --git a/public/data/OH.json b/public/data/OH.json
diff --git a/public/data/SC.json b/public/data/SC.json
diff --git a/public/data/SD.json b/public/data/SD.json
diff --git a/public/data/TX.json b/public/data/TX.json
diff --git a/public/data/WA.json b/public/data/WA.json
diff --git a/public/data/WI.json b/public/data/WI.json
diff --git a/public/data/WY.json b/public/data/WY.json
diff --git a/public/data/metadata.json b/public/data/metadata.json
diff --git a/scripts/README.md b/scripts/README.md
@@ -0,0 +1,117 @@
+# Data generation scripts
+
+The frontend is fully static: every TANF benefit it shows is precomputed into
+`public/data/<STATE>.json`. These scripts produce those files.
+
+## Files
+
+| File | Role |
+|---|---|
+| `calculator.py` | Builds a PolicyEngine situation for one household and returns its TANF benefit. Single source of truth for income injection and state-variable mapping. |
+| `config.py` | State list, county→region/group mappings, default year. |
+| `precompute.py` | **Reference** generator — one `Simulation` per grid cell. Slow but simple; also owns `metadata.json`. |
+| `precompute_vec.py` | **Fast** generator — vectorized with PolicyEngine `axes`. Recommended. Produces byte-for-byte identical output. |
+
+## The grid
+
+Each state file is a full grid of **15,376** benefit values:
+
+| Dimension | Values | Count |
+|---|---|---|
+| Earned income (monthly) | $0–$3,000, $100 steps | 31 |
+| Unearned income (monthly) | $0–$3,000, $100 steps | 31 |
+| Adults | 1, 2 | 2 |
+| Children | 0–7 | 8 |
+
+`31 × 31 × 2 × 8 = 15,376`. Stored as `data["<adults>_<children>_false"][earned_idx][unearned_idx]`
+= the rounded **monthly** benefit.
+
+## Why the vectorized generator is ~600× faster
+
+The bottleneck was never the math — it was constructing **15,376 separate
+`Simulation` objects per state**. `precompute_vec.py` constructs only **16**:
+
+* The **household-structure** dimensions (adults × children) *can't* be
+  expressed as axes — they change the number of person entities — so they stay
+  a 16-iteration loop (2 adults × 8 children).
+* The **income** dimensions (31 earned × 31 unearned = 961 cells) become two
+  PolicyEngine **axis groups**, so one `Simulation` computes all 961 cells in a
+  single vectorized pass.
+
+That's 16 builds per state instead of 15,376. Measured on Illinois: **4.3 s
+vectorized vs ~47.6 min cell-by-cell (~664×)**.
+
+### How the axes are built (and why it matches exactly)
+
+`precompute_vec.py` reuses `calculator.create_situation` verbatim to build the
+base household (with zero income), then attaches axes that reproduce *exactly*
+the same inputs the per-cell code would have set:
+
+* `employment_income` (annual) → one year-axis, `min=0, max=36000`.
+* `tanf_gross_earned_income` (monthly) → **12 lock-step month-axes**, one per
+  month, `min=0, max=3000`. Parallel axes in a group step together, so cell *i*
+  gets `employment_income = i·1200` and each month `= i·100` — i.e.
+  `employment_income = monthly·12`, identical to `create_situation`.
+* State-specific **person-level monthly** vars (DC, IL, MT, SC, TX) → the same
+  12-month treatment.
+
+Two subtleties that the code documents inline:
+
+1. **Entity homogeneity.** PolicyEngine lays out a whole parallel-axis group
+   using the *first* axis's entity, so person-level and SPM-unit-level vars
+   can't share a group. The **SPM-unit annual** vars (CA, CO, NC) are therefore
+   set *after* the simulation is built, via `simulation.set_input`, with a
+   961-length array matching the cell layout.
+2. **Cell orientation.** PolicyEngine expands axis group 0 (earned) as the
+   *inner/fast* index (`np.meshgrid` uses `'xy'` indexing), so the flat result
+   is laid out `[unearned][earned]`. The code reshapes then transposes (`.T`)
+   to the `[earned][unearned]` layout the frontend expects.
+
+### Safety net
+
+If the vectorized path ever raises for a single (adults, children) structure
+(e.g. a state with parameters that don't resolve at the target year), that
+structure silently falls back to the trusted cell-by-cell path, so the run can
+never emit wrong or missing data. Any fallback is reported at the end of the run.
+
+## Validation
+
+`precompute_vec.py` was confirmed **bit-for-bit identical** to `precompute.py`
+on 10 states chosen to cover every code path — plain states, person-monthly
+special vars (IL), county selection (CA), and SPM-unit annual vars (CA/CO):
+
+```
+AK CA_1 CA_2 CT HI IL IN KS KY CO  →  0 mismatches across 138,384+ cells
+```
+
+To re-verify after any change, regenerate a state to a temp dir and diff:
+
+```bash
+python precompute_vec.py --states IL --output-dir /tmp/vec_check
+python - <<'PY'
+import json
+a = json.load(open("../public/data/IL.json"))
+b = json.load(open("/tmp/vec_check/IL.json"))
+print("mismatches:", sum(a[k][e][u] != b[k][e][u]
+                          for k in a for e in range(31) for u in range(31)))
+PY
+```
+
+## Usage
+
+```bash
+pip install -r requirements.txt
+
+# Fast (recommended)
+python precompute_vec.py                # all states -> public/data/
+python precompute_vec.py --states CA,NY  # subset
+python precompute_vec.py --states IL --output-dir /tmp/vec_check  # don't clobber
+
+# Reference / metadata
+python precompute.py --states CA,NY      # slow, cell-by-cell
+python precompute.py --metadata-only     # regenerate metadata.json
+```
+
+> Note: `metadata.json` (year, FPG, grid config, county data) is owned by
+> `precompute.py --metadata-only`. `precompute_vec.py` only writes the per-state
+> benefit grids.
diff --git a/scripts/precompute.py b/scripts/precompute.py
@@ -8,7 +8,7 @@
 import os
 import sys
 import time
-from multiprocessing import Pool, cpu_count
+from multiprocessing import Pool
 
 # Add scripts dir to path (calculator.py and config.py live here)
 sys.path.insert(0, os.path.dirname(__file__))
@@ -17,7 +17,7 @@
 from config import PILOT_STATES, CA_COUNTIES, PA_COUNTIES, VA_COUNTIES
 
 # Grid configuration
-YEAR = 2025
+YEAR = 2026
 EARNED_STEPS = list(range(0, 3001, 100))  # $0-$3000/mo in $100 steps (31 values)
 UNEARNED_STEPS = list(range(0, 3001, 100))  # $0-$3000/mo in $100 steps (31 values)
 ADULTS_RANGE = [1, 2]
@@ -43,7 +43,7 @@
 }
 
 OUTPUT_DIR = os.path.join(
-    os.path.dirname(__file__), "..", "frontend", "public", "data"
+    os.path.dirname(__file__), "..", "public", "data"
 )
 
 
@@ -113,11 +113,11 @@ def build_county_list(counties):
     pa_counties, pa_county_groups = build_county_list(PA_COUNTIES)
     va_counties, va_county_groups = build_county_list(VA_COUNTIES)
 
-    # Federal Poverty Guidelines 2025
+    # Federal Poverty Guidelines 2026
     fpg = {
-        "default": {"base": 15650, "per_additional": 5500},
-        "AK": {"base": 19560, "per_additional": 6880},
-        "HI": {"base": 18000, "per_additional": 6330},
+        "default": {"base": 15960, "per_additional": 5680},
+        "AK": {"base": 19950, "per_additional": 7100},
+        "HI": {"base": 18360, "per_additional": 6530},
     }
 
     from importlib.metadata import version as pkg_version
@@ -154,6 +154,7 @@ def build_county_list(counties):
 
 
 def main():
+    sys.stdout.reconfigure(line_buffering=True)
     import argparse
 
     parser = argparse.ArgumentParser()
@@ -166,6 +167,12 @@ def main():
         action="store_true",
         help="Only generate metadata.json",
     )
+    parser.add_argument(
+        "--workers",
+        type=int,
+        default=3,
+        help="Number of parallel worker processes (default: 3).",
+    )
     args = parser.parse_args()
 
     os.makedirs(OUTPUT_DIR, exist_ok=True)
@@ -221,7 +228,7 @@ def main():
     start = time.time()
 
     # Use multiprocessing
-    num_workers = min(cpu_count(), len(tasks))
+    num_workers = min(args.workers, len(tasks))
     print(f"Using {num_workers} workers...\n")
 
     completed = 0