
fix: surface MultiOutputRegressor guidance when treatment_featurizer + single-output model_t (closes #1012) #1036

Open
immu4989 wants to merge 1 commit into py-why:main from immu4989:fix-1012-multioutput-error

immu4989 commented Jun 6, 2026

Contributor

Closes #1012.

What's happening

When treatment_featurizer produces a multi-column featurized treatment (e.g. PolynomialFeatures yielding [T, T**2, ...]), the first-stage model_t receives a 2D target (the y passed to its fit). If the user-supplied regressor doesn't support multi-output regression (CatBoost, XGBoost before 1.6, etc.), the underlying estimator raises an opaque shape error from inside _fit_with_groups, far from the actual cause.
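To make the shape issue concrete, here is a minimal sketch using only scikit-learn and NumPy (no econml). The sample size, seed, and degree are assumptions chosen for illustration; the point is just that a single treatment column becomes a two-column target once featurized.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
T = rng.normal(size=(300, 1))  # one continuous treatment, 300 samples (illustrative)

# degree=2 without the bias column yields the columns [T, T**2]
featurizer = PolynomialFeatures(degree=2, include_bias=False)
T_feat = featurizer.fit_transform(T)

# T has one column; the featurized treatment has two.
# A first-stage model fit against T_feat therefore sees a 2D target.
print(T.shape, T_feat.shape)
```

A regressor that only accepts a 1D y will reject a target of this shape.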

The reporter asks: "Is this the intended behavior? Can a model be fitted to each polynomial degree of the feature or do I need a multi-output regressor?"

The answer is the latter, but they shouldn't have had to ask.

Fix

_FirstStageSelector.train (econml/dml/dml.py) now catches ValueError from the underlying self._model.train(...) call. When the target is multi-column (ndim == 2 and shape[1] > 1) and is not a discrete-treatment one-hot encoding, the error is re-raised with:

First-stage model failed to fit a N-column target. This typically happens when treatment_featurizer (or a multi-dimensional outcome) produces a multi-column target but the supplied model does not support multi-output regression. Wrap your model with sklearn.multioutput.MultiOutputRegressor, or use a model with native multi-output support (e.g. LinearRegression, RandomForestRegressor, GradientBoostingRegressor). Original error: <underlying message>

The original error is preserved via raise ... from exc so debugging is unaffected.
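The catch-and-re-raise pattern described above can be sketched roughly as follows. This is not the code in econml/dml/dml.py: train_with_guidance, OneDimOnly, the is_discrete flag, and the shortened message are hypothetical stand-ins, and the sketch calls fit directly rather than going through self._model.train.

```python
import numpy as np


def train_with_guidance(model, X, target, is_discrete=False):
    """Hypothetical helper illustrating the pattern, not the econml implementation."""
    try:
        return model.fit(X, target)
    except ValueError as exc:
        target = np.asarray(target)
        multi_column = target.ndim == 2 and target.shape[1] > 1
        if multi_column and not is_discrete:
            # Chain the original exception so its traceback stays available.
            raise ValueError(
                f"First-stage model failed to fit a {target.shape[1]}-column target. "
                "Wrap your model with sklearn.multioutput.MultiOutputRegressor. "
                f"Original error: {exc}"
            ) from exc
        raise  # unrelated ValueError: leave it untouched


class OneDimOnly:
    """Hypothetical model that rejects 2D targets."""

    def fit(self, X, y):
        if np.ndim(y) != 1:
            raise ValueError(f"expected 1D target, got shape {np.shape(y)}")
        return self


caught = None
try:
    train_with_guidance(OneDimOnly(), np.zeros((300, 3)), np.zeros((300, 2)))
except ValueError as err:
    caught = err
```

Because of the from exc clause, the underlying error remains reachable as caught.__cause__ and is still printed in the traceback.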

Verification

New test_single_output_model_t_with_featurizer_raises_helpful_error (in test_treatment_featurization.py):

  • Builds a mock _SingleOutputOnlyRegressor that errors on 2D y (mirrors CatBoost's failure mode)
  • Asserts the raised ValueError mentions MultiOutputRegressor and the multi-output framing
  • Then re-runs the same setup with MultiOutputRegressor(_SingleOutputOnlyRegressor()) and asserts fit succeeds and effect() returns finite values

Pre-fix: test fails with opaque expected 1D target, got shape (300, 2).
Post-fix: test passes.
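For readers who want to reproduce the two halves of the test outside econml, a rough sketch follows. SingleOutputOnlyRegressor is an assumed stand-in for the PR's mock (its real body is not shown here), and the data are synthetic; only the scikit-learn calls are real APIs.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor


class SingleOutputOnlyRegressor(RegressorMixin, BaseEstimator):
    """Assumed mock: delegates to LinearRegression but refuses a 2D y."""

    def fit(self, X, y):
        y = np.asarray(y)
        if y.ndim != 1:
            raise ValueError(f"expected 1D target, got shape {y.shape}")
        self.model_ = LinearRegression().fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
T = X[:, 0] + rng.normal(size=300)
target = np.column_stack([T, T ** 2])  # two-column featurized treatment

# Bare model: fails on the 2D target.
try:
    SingleOutputOnlyRegressor().fit(X, target)
    bare_failed = False
except ValueError:
    bare_failed = True

# Wrapped model: MultiOutputRegressor fits one clone per target column.
wrapped = MultiOutputRegressor(SingleOutputOnlyRegressor()).fit(X, target)
pred = wrapped.predict(X)
```

The wrapper fits an independent copy of the base model for each column, so it trades any cross-output structure for compatibility with single-output estimators.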

Broader sanity: 28 tests passed across test_dml.py + test_treatment_featurization.py (ray-marked tests deselected per the known env issue on main).

…+ single-output model_t (py-why#1012)

When treatment_featurizer produces a multi-column featurized treatment
(e.g. PolynomialFeatures yielding [T, T**2, ...]), the first-stage
model_t receives a 2D Y target. If the user-supplied regressor only
supports single-output (e.g. CatBoost, older XGBoost), the underlying
estimator raises an opaque shape error far from the actual cause.

_FirstStageSelector.train now catches ValueError from the underlying
model.train when the target is multi-column and the target is not
discrete (i.e. featurizer path, not one-hot-encoded discrete treatment),
and re-raises with guidance pointing the user to
sklearn.multioutput.MultiOutputRegressor or natively multi-output
estimators (LinearRegression, RandomForestRegressor,
GradientBoostingRegressor). The original error is preserved as the
exception chain root.

Adds test_single_output_model_t_with_featurizer_raises_helpful_error
which uses a mock single-output regressor and asserts both that the
wrapped error mentions MultiOutputRegressor and that wrapping the same
mock model with MultiOutputRegressor produces a successful fit.

Signed-off-by: Imran Ahamed <immu4989@gmail.com>
Development

Successfully merging this pull request may close these issues.

Treatment featurizer in CausalForestDML
