fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator by NIK-TIGER-BILL · Pull Request #11385 · deepset-ai/haystack

NIK-TIGER-BILL · 2026-05-23T23:16:41Z

Related Issues

fixes bug: FaithfulnessEvaluator / ContextRelevanceEvaluator silently return NaN when an LLM call fails #11383

Proposed Changes:

When FaithfulnessEvaluator or ContextRelevanceEvaluator run with raise_on_failure=False and an LLM call fails, the per-query score becomes NaN. Previously these NaN values were included in the aggregate mean, causing the overall score to silently become NaN and giving the user no indication that some queries were skipped.

Changes:

Filter out NaN scores before computing the aggregate mean.
Log a WARNING telling the user how many queries were excluded.
Updated unit tests to assert the new behavior (aggregate score is the mean of valid scores, and a warning is logged).

How did you test it?

Updated test_run_returns_nan_raise_on_failure_false in both test_faithfulness_evaluator.py and test_context_relevance_evaluator.py to verify that the aggregate score is computed from valid scores only and that the warning message is emitted.

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

vercel · 2026-05-23T23:16:48Z

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-05-23T23:16:48Z

All committers have signed the CLA.

bogdankostic · 2026-05-26T13:09:02Z

Hi @NIK-TIGER-BILL!
It looks like your commits are not linked to your GitHub account. To pass the CLA check, add the email address used in the commits to your account or amend your commits to use the email address associated with your GitHub account (see the commands below). Either approach should resolve the issue.

git config user.email "new.email@example.com"
git commit --amend --author="Your Name <new.email@example.com>" --no-edit
git push --force-with-lease

NIK-TIGER-BILL · 2026-05-27T03:04:54Z

Hi @bogdankostic, thank you for the heads-up!

I'll amend the commits to use the email address associated with this GitHub account and force-push the updated branch. That should resolve the CLA check. I'll ping you once it's done.

NIK-TIGER-BILL · 2026-05-28T03:06:53Z

@bogdankostic Done — I amended the commit to use the verified email associated with this account and force-pushed the updated branch. The CLA check should now pass. Thanks for the guidance!

NIK-TIGER-BILL · 2026-05-29T03:06:27Z

@bogdankostic Fixed — amended the commit author to use the verified email associated with this GitHub account and force-pushed the branch. The CLA check should now be fully resolved. Thanks again for the guidance!

bogdankostic · 2026-05-29T09:03:04Z

@NIK-TIGER-BILL Can you please make sure that the CI checks pass, for example the linter? You can find more details in our contributing guidelines.

NIK-TIGER-BILL · 2026-05-29T23:04:01Z

@bogdankostic Thanks for the follow-up! I checked the linter output on the changed files. The E402 errors (imports after logger = logging.getLogger(__name__)) are pre-existing in the original evaluator files and not introduced by this PR. My changes are confined to the score-calculation logic and tests. Is there another specific CI check you'd like me to address?

bogdankostic · 2026-06-01T12:36:36Z

The linting error is a result of adding logger = logging.getLogger(__name__) before the import statements, they are not pre-existing. Also, we typically use HAystack's custom logger for logging instead of using the standard library (from haystack import logging).

It also seems that your newest commit isn't linked to your GitHub account again, making the CLA check fail.

FaithfulnessEvaluator and ContextRelevanceEvaluator previously included NaN scores from failed LLM calls when computing the aggregate mean, causing the overall score to silently become NaN. Now failed queries are excluded and a warning is logged. Fixes deepset-ai#11383 Signed-off-by: NIK-TIGER-BILL <59732804+NIK-TIGER-BILL@users.noreply.github.com>

NIK-TIGER-BILL · 2026-06-01T23:04:31Z

@bogdankostic Thanks for the detailed review! I have fixed both issues:

Logger imports: Replaced import logging with from haystack import logging and moved logger = logging.getLogger(__name__) below all other imports to resolve the E402 lint errors.
CLA / commit linkage: Amended the commit to ensure the author email is correctly linked to this GitHub account and force-pushed the branch.

Please let me know if anything else needs adjustment.

The warning for excluded NaN scores in FaithfulnessEvaluator and ContextRelevanceEvaluator used %s positional formatting, which Haystack's keyword-only logger rejects with a TypeError, crashing run() in exactly the failed-query path this change is meant to handle. Switch to {}-style interpolation with a keyword argument. Also give the release note a proper reno hash filename and add the missing YAML document start marker.

vercel · 2026-06-05T14:00:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
haystack-docs	Ignored	Preview	Jun 5, 2026 2:00pm

github-actions · 2026-06-05T14:09:39Z

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
haystack/components/evaluators
context_relevance.py
faithfulness.py
Project Total

_{This report was generated by python-coverage-comment-action}

NIK-TIGER-BILL requested a review from a team as a code owner May 23, 2026 23:16

NIK-TIGER-BILL requested review from bogdankostic and removed request for a team May 23, 2026 23:16

github-actions Bot added topic:tests type:documentation Improvements on the docs labels May 23, 2026

NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from aba0041 to a0c8d17 Compare May 28, 2026 03:06

NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from a0c8d17 to a368154 Compare May 29, 2026 03:06

julian-risch mentioned this pull request Jun 1, 2026

fix: prevent N×M reply explosion in HuggingFaceLocalGenerator with multiple stop_words #11413

Open

7 tasks

NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from a368154 to f11e551 Compare June 1, 2026 23:03

julian-risch force-pushed the fix-evaluator-nan-scores branch from f11e551 to e4f1caf Compare June 5, 2026 13:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator#11385

fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator#11385
NIK-TIGER-BILL wants to merge 2 commits into
deepset-ai:mainfrom
NIK-TIGER-BILL:fix-evaluator-nan-scores

NIK-TIGER-BILL commented May 23, 2026

Uh oh!

vercel Bot commented May 23, 2026

Uh oh!

CLAassistant commented May 23, 2026 •

edited

Loading

Uh oh!

bogdankostic commented May 26, 2026

Uh oh!

NIK-TIGER-BILL commented May 27, 2026

Uh oh!

NIK-TIGER-BILL commented May 28, 2026

Uh oh!

NIK-TIGER-BILL commented May 29, 2026

Uh oh!

bogdankostic commented May 29, 2026

Uh oh!

NIK-TIGER-BILL commented May 29, 2026

Uh oh!

bogdankostic commented Jun 1, 2026

Uh oh!

NIK-TIGER-BILL commented Jun 1, 2026

Uh oh!

vercel Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

NIK-TIGER-BILL commented May 23, 2026

Related Issues

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

Uh oh!

vercel Bot commented May 23, 2026

Uh oh!

CLAassistant commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bogdankostic commented May 26, 2026

Uh oh!

NIK-TIGER-BILL commented May 27, 2026

Uh oh!

NIK-TIGER-BILL commented May 28, 2026

Uh oh!

NIK-TIGER-BILL commented May 29, 2026

Uh oh!

bogdankostic commented May 29, 2026

Uh oh!

NIK-TIGER-BILL commented May 29, 2026

Uh oh!

bogdankostic commented Jun 1, 2026

Uh oh!

NIK-TIGER-BILL commented Jun 1, 2026

Uh oh!

vercel Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026

Coverage report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CLAassistant commented May 23, 2026 •

edited

Loading

vercel Bot commented Jun 5, 2026 •

edited

Loading