fix: surface MultiOutputRegressor guidance when treatment_featurizer + single-output model_t (closes #1012) #1036
Open
immu4989 wants to merge 1 commit into
…+ single-output model_t (py-why#1012)

When treatment_featurizer produces a multi-column featurized treatment (e.g. PolynomialFeatures yielding [T, T**2, ...]), the first-stage model_t receives a 2D Y target. If the user-supplied regressor only supports single-output (e.g. CatBoost, older XGBoost), the underlying estimator raises an opaque shape error far from the actual cause.

_FirstStageSelector.train now catches ValueError from the underlying model.train when the target is multi-column and the target is not discrete (i.e. featurizer path, not one-hot-encoded discrete treatment), and re-raises with guidance pointing the user to sklearn.multioutput.MultiOutputRegressor or natively multi-output estimators (LinearRegression, RandomForestRegressor, GradientBoostingRegressor). The original error is preserved as the exception chain root.

Adds test_single_output_model_t_with_featurizer_raises_helpful_error, which uses a mock single-output regressor and asserts both that the wrapped error mentions MultiOutputRegressor and that wrapping the same mock model with MultiOutputRegressor produces a successful fit.

Signed-off-by: Imran Ahamed <immu4989@gmail.com>
Closes #1012.
What's happening
When `treatment_featurizer` produces a multi-column featurized treatment (e.g. `PolynomialFeatures` yielding `[T, T**2, ...]`), the first-stage `model_t` receives a 2D `Y` target. If the user-supplied regressor doesn't support multi-output (CatBoost, older XGBoost pre-1.6, etc.), the underlying estimator raises an opaque shape error from inside `_fit_with_groups`, far from the actual cause.

The reporter asks: "Is this the intended behavior? Can a model be fitted to each polynomial degree of the feature or do I need a multi-output regressor?"

The answer is the latter, but they shouldn't have had to ask.
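To make the shape problem concrete, here is a minimal sketch that stays outside EconML. It featurizes a one-column treatment with `PolynomialFeatures`, shows a single-output regressor rejecting the resulting two-column target, and then wraps that regressor in `MultiOutputRegressor`, which fits one clone per target column. `SingleOutputOnlyRegressor` is a hypothetical stand-in (it is not CatBoost or XGBoost), and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import PolynomialFeatures


class SingleOutputOnlyRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for a regressor that rejects 2D targets."""

    def fit(self, X, y):
        y = np.asarray(y)
        if y.ndim == 2 and y.shape[1] > 1:
            raise ValueError(f"expected 1D target, got shape {y.shape}")
        # Ordinary least squares with an explicit intercept column.
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), X])
        return A @ self.coef_


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
T = X[:, [0]] + rng.normal(size=(300, 1))

# Featurized treatment with columns [T, T**2]: this is the 2D target
# that the first-stage treatment model is asked to predict.
T_feat = PolynomialFeatures(degree=2, include_bias=False).fit_transform(T)

# The bare single-output regressor rejects the two-column target.
try:
    SingleOutputOnlyRegressor().fit(X, T_feat)
    failed = False
except ValueError:
    failed = True

# MultiOutputRegressor fits one clone of the base model per column.
wrapped = MultiOutputRegressor(SingleOutputOnlyRegressor()).fit(X, T_feat)
pred = wrapped.predict(X)
```

In an EconML estimator, the wrapped object would be what gets passed as `model_t`; the sketch stops short of that so it depends only on NumPy and scikit-learn.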
Fix
`_FirstStageSelector.train` (`econml/dml/dml.py`) now catches `ValueError` from the underlying `self._model.train(...)` call. When the target is multi-column (`ndim == 2` and `shape[1] > 1`) and is not a discrete-treatment one-hot encoding, the error is re-raised with guidance pointing the user to `sklearn.multioutput.MultiOutputRegressor` or to a natively multi-output estimator.

The original error is preserved via `raise ... from exc`, so debugging is unaffected.

Verification
New `test_single_output_model_t_with_featurizer_raises_helpful_error` (in `test_treatment_featurization.py`):

- Uses a mock `_SingleOutputOnlyRegressor` that errors on 2D y (mirrors CatBoost's failure mode)
- Asserts the wrapped `ValueError` mentions `MultiOutputRegressor` and the multi-output framing
- Wraps the same mock as `MultiOutputRegressor(_SingleOutputOnlyRegressor())` and asserts `fit` succeeds and `effect()` returns finite values

Pre-fix: test fails with opaque `expected 1D target, got shape (300, 2)`.
Post-fix: test passes.
Broader sanity: 28 tests passed across `test_dml.py` + `test_treatment_featurization.py` (ray-marked tests deselected per the known env issue on `main`).
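For readers who want the shape of the change without opening the diff, the catch-and-re-raise pattern described under Fix can be sketched roughly as follows. This is not the code in `econml/dml/dml.py`: `train_with_guidance`, `SingleOutputOnlyModel`, the `is_discrete` flag, and the message wording are all hypothetical names chosen for illustration. Only the condition (multi-column, non-discrete target) and the use of `raise ... from exc` come from the description above.

```python
import numpy as np


class SingleOutputOnlyModel:
    """Hypothetical first-stage model that rejects multi-column targets."""

    def train(self, X, y):
        if np.ndim(y) == 2 and np.shape(y)[1] > 1:
            raise ValueError(f"expected 1D target, got shape {np.shape(y)}")
        return self


def train_with_guidance(model, X, y, is_discrete=False):
    """Call model.train, adding guidance if a multi-column target fails."""
    try:
        return model.train(X, y)
    except ValueError as exc:
        y = np.asarray(y)
        multi_column = y.ndim == 2 and y.shape[1] > 1
        if multi_column and not is_discrete:
            # Illustrative wording only; the real message is not shown here.
            raise ValueError(
                f"model_t received a multi-column target (shape {y.shape}). "
                "Consider wrapping it in "
                "sklearn.multioutput.MultiOutputRegressor or using a "
                "natively multi-output estimator."
            ) from exc
        raise  # any other ValueError propagates unchanged


X = np.zeros((300, 3))
T_feat = np.zeros((300, 2))

# Featurizer path: the error is wrapped and the original is chained.
try:
    train_with_guidance(SingleOutputOnlyModel(), X, T_feat)
    wrapped_err = None
except ValueError as err:
    wrapped_err = err

# Discrete (one-hot) path: the original error passes through untouched.
try:
    train_with_guidance(SingleOutputOnlyModel(), X, T_feat, is_discrete=True)
    discrete_err = None
except ValueError as err:
    discrete_err = err
```

Because the new exception is raised `from exc`, the original shape error stays reachable as `__cause__` and still appears in the traceback, which is what keeps debugging unaffected.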