SpanOrQuery uses IDFs of failed subqueries in score calculation. #13796

@tkarampAlpha

Description

It seems that, for SpanOrQuery, the IDF of terms belonging to subqueries that do not match a given document still affects that document's score.

I observed this on an index containing 3 documents:

doc1: 
    field: something
doc2:
    field: nothing
doc3: 
    field: anything

And I issue the following query:

spanOr([Contents:something, Contents:nothing])

If you look at the score explanation, you will notice that each matching document's score includes the IDF of both terms, even though only one of the terms matches each document.

This is an example of the explanation of the first document's score:

3.9616547 = weight(spanOr([Contents:something, Contents:nothing]) in 0) [AsBM25Similarity], result of:
  3.9616547 = score(freq=1.0), computed as boost * idf * tf from:
    51.0 = boost
    3.9616585 = idf, sum of:
      1.9808292 = idf for term nothing , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
      1.9808292 = idf for term something , computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) + 1 from:
        1 = docFreq
        3 = docCount
    0.019607842 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = phraseFreq=1.0
      50.0 = k1, term saturation parameter
      0.0 = b, length normalization parameter
      1.0 = dl, length of field
      2.0 = avgdl, average length of field
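
To make the effect concrete, the arithmetic in the explanation above can be reproduced by hand. The following is a minimal sketch in plain Python, not Lucene code: the formulas and parameter values are copied from the explain output (including the `+ 1` in the idf, which comes from the similarity shown there), and the names `idf`, `tf`, `reported_score`, and `matching_only_score` are hypothetical helpers introduced for illustration. The "matching only" value is an assumption about what the score would be if only the idf of the term that matches doc1 were counted.

```python
import math

def idf(doc_freq, doc_count):
    # Formula as printed in the explain output above (note the trailing "+ 1").
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)) + 1

def tf(freq, k1, b, dl, avgdl):
    # Formula as printed in the explain output above.
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Values taken from the explanation for the first document.
BOOST, K1, B = 51.0, 50.0, 0.0
idf_something = idf(doc_freq=1, doc_count=3)  # term that matches doc1
idf_nothing = idf(doc_freq=1, doc_count=3)    # term that does not match doc1
tf_doc1 = tf(freq=1.0, k1=K1, b=B, dl=1.0, avgdl=2.0)

# What the explanation reports: the idf values of both subquery terms are summed.
reported_score = BOOST * (idf_something + idf_nothing) * tf_doc1

# Assumed alternative: only the idf of the term that actually matches doc1.
matching_only_score = BOOST * idf_something * tf_doc1

print(f"reported (both idfs):   {reported_score:.7f}")
print(f"matching term idf only: {matching_only_score:.7f}")
```

With these inputs, `boost * tf` is 51 * (1/51) = 1, so the score is essentially the idf sum: roughly 3.96 when both terms are counted, versus roughly 1.98 if only the matching term contributed. The small difference between the double-precision result here and the 3.9616547 in the explanation is consistent with single-precision rounding.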

Version and environment details

Lucene 9.7.0, used through Solr 9.3.0
