[bugfix] Preserve ft:highlight-field-matches under facet drill-down #6454
joewiz wants to merge 1 commit
A faceted ft:query wraps the content query in a Lucene DrillDownQuery before searching. The same wrapped query was stored on every LuceneMatch, and ft:highlight-field-matches walks that stored query (via getTerms -> LuceneUtil.extractTerms) to recover per-field term matches. extractTerms cannot see into a DrillDownQuery, so a facet drill-down silently yielded zero matches and produced empty highlights: every faceted search lost its KWIC snippets, while the same query without a facet filter highlighted fine.
Decouple the search query from the match query: search with the DrillDownQuery, but store the pre-drill-down query (still index-type-filtered and boosted) on the LuceneMatch for term/highlight extraction. searchAndProcess gains a two-query overload; the single-query form delegates with the same query for both, so non-faceted paths are unchanged. The drill-down and boost wrapping is restructured around an applyBoost helper; search, scoring, and facet counts are unaffected (only the match query consumed by highlighting changes).
Regression test (facet-drilldown-highlight.xqm): a faceted query now highlights and marks the queried term, drill-down still selects/excludes by facet value, and a plain query still highlights. 649 lucene XQSuite tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
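The failure mode in this commit message can be illustrated with a toy model. The sketch below is not eXist-db or Lucene code: `TermQ`, `BoolQ`, `DrillDownQ`, and this `extractTerms` are hypothetical stand-ins, and the walker is assumed to skip query types it has no branch for. Under that assumption a transparent wrapper still yields terms, while an opaque one yields none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-ins only: none of these are the real Lucene or eXist-db classes. */
final class OpaqueWrapperSketch {

    interface Query {}

    /** Leaf query carrying one field/term pair. */
    static final class TermQ implements Query {
        final String field;
        final String text;
        TermQ(String field, String text) { this.field = field; this.text = text; }
    }

    /** Transparent wrapper: the walker has a branch that visits its clauses. */
    static final class BoolQ implements Query {
        final List<Query> clauses;
        BoolQ(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    /** Opaque wrapper: the walker below has no branch for it. */
    static final class DrillDownQ implements Query {
        final Query base;
        final String dim;
        final String value;
        DrillDownQ(Query base, String dim, String value) {
            this.base = base; this.dim = dim; this.value = value;
        }
    }

    /** Collects "field:text" for every term the walker can reach. */
    static List<String> extractTerms(Query q) {
        List<String> out = new ArrayList<>();
        collect(q, out);
        return out;
    }

    private static void collect(Query q, List<String> out) {
        if (q instanceof TermQ) {
            TermQ t = (TermQ) q;
            out.add(t.field + ":" + t.text);
        } else if (q instanceof BoolQ) {
            for (Query clause : ((BoolQ) q).clauses) {
                collect(clause, out);
            }
        }
        // Any other query type is skipped without an error, so nothing is collected.
    }

    public static void main(String[] args) {
        Query content = new BoolQ(new TermQ("body", "term"));
        System.out.println(extractTerms(content));
        System.out.println(extractTerms(new DrillDownQ(content, "kind", "para")));
    }
}
```

In this model the plain query yields one term and the wrapped query yields an empty list, which corresponds to the empty highlights described above.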
duncdrum left a comment
The bug is real and this fix works, but I think the change surface is larger than needed.
LuceneUtil.extractTermsFromDrillDown already exists specifically to handle DrillDownQuery
in term extraction — it was added alongside the original facet feature. The problem is its
implementation: it calls query.rewrite(new IndexSearcher(reader)), which in Lucene 10
expands the DrillDownQuery into a BooleanQuery containing both the content query and the dimension-filter clauses. Walking all clauses mixes content terms with internal dimension terms (e.g. $facets:kind$para), which don't appear in document text and apparently prevent correct highlight extraction.
DrillDownQuery exposes getBaseQuery(), which returns the
content query directly — no rewrite, no dimension noise. That reduces the fix to one line in
LuceneUtil:
private static void extractTermsFromDrillDown(DrillDownQuery query, ...) {
    extractTerms(query.getBaseQuery(), terms, reader, includeFields);
}
With that in place, there's no need to separate searchQuery from matchQuery in
searchAndProcess, no applyBoost helper, and no duplication across the two query() overloads. The existing test suite (facet-drilldown-highlight.xqm) would still be the right regression harness.
Did you give this a try, or am I missing something?
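The difference between the two extraction strategies in this comment can be sketched with toy classes. Nothing below is Lucene's implementation: `DrillDownQ`, its `rewrite()` expansion, and the `$facets` term format are hypothetical stand-ins that only mimic the behaviour as the comment describes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-ins only: this mimics the shape of the idea, not Lucene itself. */
final class BaseQuerySketch {

    interface Query {}

    static final class TermQ implements Query {
        final String field;
        final String text;
        TermQ(String field, String text) { this.field = field; this.text = text; }
    }

    static final class BoolQ implements Query {
        final List<Query> clauses;
        BoolQ(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    static final class DrillDownQ implements Query {
        private final Query base;
        private final String dim;
        private final String value;
        DrillDownQ(Query base, String dim, String value) {
            this.base = base; this.dim = dim; this.value = value;
        }
        /** Returns the content query untouched. */
        Query getBaseQuery() { return base; }
        /** Hypothetical expansion: content query plus a synthetic dimension term. */
        Query rewrite() { return new BoolQ(base, new TermQ("$facets", dim + "$" + value)); }
    }

    /** Walks the rewritten form, so dimension terms leak into the result. */
    static List<String> termsViaRewrite(Query q) {
        List<String> out = new ArrayList<>();
        collect(q, out, false);
        return out;
    }

    /** Walks only the base query, so only content terms are returned. */
    static List<String> termsViaBaseQuery(Query q) {
        List<String> out = new ArrayList<>();
        collect(q, out, true);
        return out;
    }

    private static void collect(Query q, List<String> out, boolean useBase) {
        if (q instanceof TermQ) {
            TermQ t = (TermQ) q;
            out.add(t.field + ":" + t.text);
        } else if (q instanceof BoolQ) {
            for (Query clause : ((BoolQ) q).clauses) {
                collect(clause, out, useBase);
            }
        } else if (q instanceof DrillDownQ) {
            DrillDownQ d = (DrillDownQ) q;
            collect(useBase ? d.getBaseQuery() : d.rewrite(), out, useBase);
        }
    }

    public static void main(String[] args) {
        Query faceted = new DrillDownQ(new TermQ("body", "term"), "kind", "para");
        System.out.println(termsViaRewrite(faceted));
        System.out.println(termsViaBaseQuery(faceted));
    }
}
```

In this model the rewrite path returns the content term plus a dimension term that never occurs in document text, while the base-query path returns the content term alone.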
  }
  searchAndProcess(contextId, qname, docs, contextSet, resultSet,
-         returnAncestor, searcher, query, config);
+         returnAncestor, searcher, applyBoost(searchQuery, config), applyBoost(query, config), config);
I'm not sure why we apply boost here; that is normally handled elsewhere. How is boost relevant to highlighting?
[This PR was co-authored with Claude Code. -Joe]
Summary
A faceted ft:query (one with a facets drill-down option) silently returns empty highlights: ft:highlight-field-matches finds nothing, so any KWIC snippet built on it falls back to raw leading text. The same query without a facet filter highlights correctly. This is a pre-existing defect (reproduces on develop and on 7.0.0-beta3), surfaced while wiring an /api/search-style endpoint where ?q=term highlights but ?q=term&facet=value does not.
Root cause
In LuceneIndexWorker.query(), when a facet drill-down is requested the content query is wrapped in a Lucene DrillDownQuery before searching. That same wrapped query is then handed to the hit collector and stored on every LuceneMatch. ft:highlight-field-matches walks the stored match query (getTerms(match.getQuery()) → LuceneUtil.extractTerms(...)) to recover the per-field terms to mark. extractTerms cannot see into a DrillDownQuery, so it recovers zero terms and highlights nothing.
The give-away is that filterByIndexType also wraps the query (in a plain BooleanQuery of MUST + FILTER) and is applied to every query, yet unfaceted queries still highlight, because extractTerms traverses a BooleanQuery fine. The opaque wrapper is specifically DrillDownQuery.
Fix
Decouple the search query from the match query:
- search with the DrillDownQuery (so drill-down filtering and facet counts are unchanged);
- store the pre-drill-down query (still index-type-filtered and boosted) on the LuceneMatch, so term/highlight extraction sees a query it can traverse.
searchAndProcess gains a two-query overload (…, searchQuery, matchQuery, config); the existing single-query form delegates with the same query for both, so every non-faceted path is byte-for-byte unchanged. The drill-down/boost wrapping is reorganized around a small applyBoost helper. Only the query consumed by highlighting changes: hits, scores, and facet counts are unaffected.
What changed
- LuceneIndexWorker.java: both query() arities (string and XML) compute searchQuery (with drill-down) and matchQuery (without); the searchAndProcess two-query overload passes matchQuery to the hit collector and searches with searchQuery; new applyBoost helper.
- facet-drilldown-highlight.xqm (new XQSuite regression test).
Test plan
- facet-drilldown-highlight.xqm: a faceted query now highlights and marks the queried term; a plain query still highlights; facet drill-down still selects and excludes by facet value.
Notes
ft:query itself (and therefore for anything built on it). A highlight-preserving workaround exists for callers in the meantime (run the unfaceted query and filter the hit set in XQuery, computing facet counts from the unfiltered set via ft:facets), but this is the correct fix.
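The two-query overload described under Fix can be pictured with a minimal sketch. This is not the LuceneIndexWorker code: `Query`, `Match`, and the simplified `searchAndProcess` signatures are hypothetical stand-ins, and documents are plain strings with an assumed "facet|text" layout. The one property shown is that hits are selected by one query while each match stores another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of "search with one query, remember another"; all names are hypothetical. */
final class TwoQuerySketch {

    /** A label plus a predicate deciding which documents hit. */
    static final class Query {
        final String label;
        final Predicate<String> accepts;
        Query(String label, Predicate<String> accepts) {
            this.label = label; this.accepts = accepts;
        }
    }

    /** A hit, carrying the query that highlighting would later walk. */
    static final class Match {
        final String doc;
        final Query query;
        Match(String doc, Query query) { this.doc = doc; this.query = query; }
    }

    /** Single-query form: delegates with the same query in both roles. */
    static List<Match> searchAndProcess(List<String> docs, Query query) {
        return searchAndProcess(docs, query, query);
    }

    /** Two-query form: searchQuery selects hits, matchQuery is stored on them. */
    static List<Match> searchAndProcess(List<String> docs, Query searchQuery, Query matchQuery) {
        List<Match> hits = new ArrayList<>();
        for (String doc : docs) {
            if (searchQuery.accepts.test(doc)) {
                hits.add(new Match(doc, matchQuery));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("para|a term here", "note|another term", "para|nothing");
        Query content = new Query("content", d -> d.contains("term"));
        Query drilled = new Query("drill-down(content)",
                d -> d.startsWith("para|") && content.accepts.test(d));
        for (Match m : searchAndProcess(docs, drilled, content)) {
            System.out.println(m.doc + " -> " + m.query.label);
        }
    }
}
```

Because the single-query form passes the same object for both roles, callers that never drill down see no change in this model, which matches the stated intent of the overload.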