Fix SpanOrQuery scores each document using the combined IDF of all clauses #15819

Open
gaobinlong wants to merge 1 commit into apache:main from gaobinlong:spanOrQuery

Conversation

@gaobinlong
Contributor

Description

Resolves #13796.

SpanOrQuery currently scores each document using the combined IDF of all clauses instead of only the IDF of the term that matched. This PR fixes that. With two docs (foo:bar, foo:baz) and a spanOr([foo:bar, foo:baz]) query, each doc should score the same as with the equivalent bool-should query (i.e. single-term IDF, not summed IDF). Compare the results in OpenSearch:

With the bool-should query, the IDF for doc (foo:bar) is 0.6931472.

With the SpanOr query, the IDF for doc (foo:bar) is 1.3862944, which is 0.6931472 + 0.6931472.
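
The two numbers above can be reproduced by hand. The following is a minimal sketch, not code from this PR: it assumes the default BM25-style IDF, `ln(1 + (N - n + 0.5) / (n + 0.5))`, with `N = 2` documents and each term appearing in `n = 1` document, as in the two-doc example. The class name `SpanOrIdfSketch` and the method `bm25Idf` are hypothetical names used only for illustration.

```java
public class SpanOrIdfSketch {

    // Assumption: BM25-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)),
    // where N is the number of documents and n is the term's document frequency.
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 2; // two docs: foo:bar and foo:baz
        long docFreq = 1;  // each term occurs in exactly one doc

        double idfBar = bm25Idf(docFreq, docCount); // ln(2), about 0.6931472
        double idfBaz = bm25Idf(docFreq, docCount); // same statistics, same IDF

        // Reported behavior: IDFs of all clauses are summed,
        // even though only one clause matches a given doc.
        double summed = idfBar + idfBaz; // 2 * ln(2), about 1.3862944

        System.out.println("matched-clause IDF (bool-should): " + (float) idfBar);
        System.out.println("summed IDF (SpanOr before fix):   " + (float) summed);
    }
}
```

Under these assumptions the summed value is exactly twice the single-term value, which matches the factor of two between the two explain outputs.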

…auses rather than the matched clauses

Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@github-actions (bot) added this to the 10.5.0 milestone on Mar 13, 2026
@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions (bot) added the Stale label on Mar 28, 2026

Development

Successfully merging this pull request may close these issues.

SpanOrQuery uses IDFs of failed subqueries in score calculation.
