fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator #11385
Hi @NIK-TIGER-BILL!
git config user.email "new.email@example.com"
git commit --amend --author="Your Name <new.email@example.com>" --no-edit
git push --force-with-lease |
|
Hi @bogdankostic, thank you for the heads-up! I'll amend the commits to use the email address associated with this GitHub account and force-push the updated branch. That should resolve the CLA check. I'll ping you once it's done. |
aba0041 to a0c8d17
|
@bogdankostic Done — I amended the commit to use the verified email associated with this account and force-pushed the updated branch. The CLA check should now pass. Thanks for the guidance! |
a0c8d17 to a368154
|
@bogdankostic Fixed — amended the commit author to use the verified email associated with this GitHub account and force-pushed the branch. The CLA check should now be fully resolved. Thanks again for the guidance! |
|
@NIK-TIGER-BILL Can you please make sure that the CI checks pass, for example the linter? You can find more details in our contributing guidelines. |
|
@bogdankostic Thanks for the follow-up! I checked the linter output on the changed files. The |
|
The linting error is a result of adding It also seems that your newest commit isn't linked to your GitHub account again, making the CLA check fail. |
FaithfulnessEvaluator and ContextRelevanceEvaluator previously included
NaN scores from failed LLM calls when computing the aggregate mean,
causing the overall score to silently become NaN. Now failed queries
are excluded and a warning is logged.
Fixes deepset-ai#11383
Signed-off-by: NIK-TIGER-BILL <59732804+NIK-TIGER-BILL@users.noreply.github.com>
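For readers following the thread, here is a minimal sketch of the behaviour the commit message above describes. It is not the code from this PR: the function name `aggregate_score` is hypothetical, the warning uses the standard library `logging` module rather than Haystack's logger, and returning NaN when every query failed is an assumption.

```python
import logging
import math
from statistics import mean

logger = logging.getLogger(__name__)


def aggregate_score(scores: list[float]) -> float:
    """Mean of the per-query scores, ignoring NaN entries from failed queries.

    Hypothetical helper for illustration only.
    """
    # Keep only the scores that came from successful LLM calls.
    valid = [s for s in scores if not math.isnan(s)]
    excluded = len(scores) - len(valid)
    if excluded:
        # Standard library logging, so %-style positional arguments are fine here.
        logger.warning(
            "Excluded %d of %d queries with NaN scores from the aggregate score.",
            excluded,
            len(scores),
        )
    # Assumption: with no valid scores there is nothing to average, so return NaN.
    return mean(valid) if valid else float("nan")
```

With this sketch, `aggregate_score([1.0, float("nan"), 0.5])` averages only the two valid scores, whereas a plain mean over all three values would be NaN.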
a368154 to f11e551
|
@bogdankostic Thanks for the detailed review! I have fixed both issues:
Please let me know if anything else needs adjustment. |
f11e551 to e4f1caf
The warning for excluded NaN scores in FaithfulnessEvaluator and
ContextRelevanceEvaluator used %s positional formatting, which
Haystack's keyword-only logger rejects with a TypeError, crashing
run() in exactly the failed-query path this change is meant to handle.
Switch to {}-style interpolation with a keyword argument.
Also give the release note a proper reno hash filename and add the
missing YAML document start marker.
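The failure mode described in this commit message can be illustrated with a small stand-in. The class `KeywordOnlyLogger` below is hypothetical and is not Haystack's logger implementation; it only assumes, as the message states, that interpolation values must be passed as keyword arguments.

```python
import logging


class KeywordOnlyLogger:
    """Hypothetical stand-in for a logger that takes interpolation values only as keywords."""

    def __init__(self, name: str) -> None:
        self._logger = logging.getLogger(name)

    def warning(self, msg: str, **fields: object) -> None:
        # Values are substituted with str.format, so the template uses {name} placeholders.
        self._logger.warning(msg.format(**fields))


log = KeywordOnlyLogger("evaluator_sketch")


def warn_positional(excluded: int) -> None:
    # %s-style call: the extra positional argument does not match the signature,
    # so Python raises TypeError before anything is logged.
    log.warning("Excluded %s queries from the aggregate score.", excluded)


def warn_keyword(excluded: int) -> None:
    # {}-style call: the value is passed by keyword and interpolated by name.
    log.warning("Excluded {excluded} queries from the aggregate score.", excluded=excluded)
```

Because the warning is only emitted when a query has failed, a bad call like `warn_positional` would surface only in that path, which matches the crash the commit message reports.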
|
Related Issues
Proposed Changes:
When FaithfulnessEvaluator or ContextRelevanceEvaluator runs with raise_on_failure=False and an LLM call fails, the per-query score becomes NaN. Previously these NaN values were included in the aggregate mean, causing the overall score to silently become NaN and giving the user no indication that some queries were skipped.
Changes:
- Exclude NaN scores before computing the aggregate mean.
- Log a WARNING telling the user how many queries were excluded.
How did you test it?
Updated test_run_returns_nan_raise_on_failure_false in both test_faithfulness_evaluator.py and test_context_relevance_evaluator.py to verify that the aggregate score is computed from valid scores only and that the warning message is emitted.
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.