
fix(falkordb): short-circuit fulltext query when every token was a stopword #1590

Open
aasc77 wants to merge 2 commits into getzep:main from aasc77:fix/empty-tokens-group-redisearch

aasc77 commented Jun 16, 2026

Summary

_build_falkor_fulltext_query (and the equivalent class method FalkorDriver.build_fulltext_query) emits (@group_id:"X") () whenever every content token in the input is dropped as a stopword during sanitize + filter. FalkorDB/RediSearch rejects the trailing empty group with Syntax error at offset N near X. There is a too-long guard above the wrap site but no matching guard for the empty case. This PR adds it.

sanitized_query = ' | '.join(filtered_words)

# new: short-circuit when every input token was a stopword
if not filtered_words:
    return ''

# existing: too-long guard
if len(sanitized_query.split(' ')) + len(group_ids or '') >= max_query_length:
    return ''

Every call site already special-cases the '' return as "no candidates" (if fuzzy_query == '': return [] in node_fulltext_search / edge_fulltext_search / the community siblings), so no caller-side change is required — the new guard piggybacks on the existing sentinel.
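For reviewers who want to see the guard in context, here is a minimal self-contained sketch of the builder's shape. It is not the code in graphiti_core: the stopword set, the whitespace tokenizer, the default max_query_length, and the name build_fulltext_query_sketch are all assumptions made for illustration. Only the join, the new empty-token guard, and the existing too-long guard mirror the snippet above.

```python
# Hypothetical stand-ins: the real stopword list and sanitizer live in graphiti_core.
STOPWORDS = {'a', 'an', 'and', 'is', 'of', 'the', 'to'}
MAX_QUERY_LENGTH = 128  # assumed default, for illustration only


def build_fulltext_query_sketch(query, group_ids=None, max_query_length=MAX_QUERY_LENGTH):
    """Build a RediSearch-style query string, or '' when there is nothing to search for."""
    # Group filter, e.g. (@group_id:"X"); empty when no group ids are given.
    group_filter = ''
    if group_ids:
        group_filter = '(@group_id:' + '|'.join(f'"{g}"' for g in group_ids) + ')'

    # Simplified sanitize + filter: split on whitespace and drop stopwords.
    filtered_words = [w for w in query.split() if w.lower() not in STOPWORDS]
    sanitized_query = ' | '.join(filtered_words)

    # New guard: every token was a stopword, so wrapping would yield "()".
    if not filtered_words:
        return ''

    # Existing guard: query too long.
    if len(sanitized_query.split(' ')) + len(group_ids or '') >= max_query_length:
        return ''

    return f'{group_filter} ({sanitized_query})'.strip()
```

Without the empty-token guard, the final line of this sketch would return (@group_id:"X") () for an input such as 'the of and', which is the string RediSearch rejects.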

How I hit it

During a 230-episode bulk replay against a 1849-entity FalkorDB graph, the dedup queue stalled at 31% when an episode's LLM-derived content tokens happened to be all stopwords. Container log (graphiti-core 0.28.2 + FalkorDB v4.18.10):

graphiti_core.driver.falkordb_driver - ERROR - Error executing FalkorDB query: Query timed out
{'query': '(@group_id:"X") ()', 'limit': 20, 'routing_': 'r', 'edge_uuids': [], 'group_ids': ['X']}
services.queue_service - ERROR - Failed to process episode None for group X: RediSearch: Syntax error at offset 26 near X

Reproducing the parse error directly against FalkorDB:

$ docker exec falkordb redis-cli GRAPH.QUERY mygraph \
    'CALL db.idx.fulltext.queryRelationships("RELATES_TO", "(@group_id:\"X\") ()") YIELD relationship RETURN relationship LIMIT 1'
RediSearch: Syntax error at offset 26 near X

With this PR applied the replay completed cleanly.

Test plan

  • Existing call sites already handle '' as "no candidates" — verified by grep across graphiti_core/driver/falkordb/operations/search_ops.py (4 sites: node_fulltext_search, edge_fulltext_search, community node + edge siblings).
  • Sanity: non-stopword input still produces a wrapped query of the expected shape (regression-tested downstream in our consumer's QA suite).
  • No callers depend on a malformed-empty return — the broken-() output was never a valid RediSearch expression so any caller that did something with it was broken anyway.

Happy to add a unit test directly inside this repo if a maintainer points me at the right place — I can see the testing structure under tests/ and would just want to confirm the location/style conventions.
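As a starting point for that test, a rough sketch of a helper that checks the property this PR cares about: the builder never emits an empty parenthesised group. The helper name find_empty_group_outputs and both stub builders are hypothetical; an in-repo test would pass the real builder instead of a stub, and the location and style would follow whatever the maintainers prefer.

```python
import re

# Matches a parenthesised group with nothing inside, e.g. the trailing "()".
EMPTY_GROUP = re.compile(r'\(\s*\)')


def find_empty_group_outputs(build, queries, group_ids):
    """Return (query, output) pairs where the builder emitted an empty group."""
    bad = []
    for query in queries:
        output = build(query, group_ids)
        if EMPTY_GROUP.search(output):
            bad.append((query, output))
    return bad


def unguarded_stub(query, group_ids):
    # Hypothetical stand-in that mimics the pre-fix output for all-stopword input.
    return '(@group_id:"X") ()'


def guarded_stub(query, group_ids):
    # Hypothetical stand-in that mimics the post-fix '' sentinel.
    return ''
```

A test would then assert that find_empty_group_outputs(builder, ['the of and'], ['X']) is an empty list; the unguarded stub fails that assertion and the guarded one passes.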

Notes

  • Two-file patch because the same builder lives in both graphiti_core/driver/falkordb_driver.py (class-method form on FalkorDriver.build_fulltext_query) and graphiti_core/driver/falkordb/operations/search_ops.py (module-level _build_falkor_fulltext_query). Same guard, same comment, applied to both.
  • Comment in the patch documents why the early return is safe (callers already special-case '').

…opword

`_build_falkor_fulltext_query` (and the equivalent class-method
`FalkorDriver.build_fulltext_query`) currently emits
`(@group_id:"X") ()` whenever the input's content tokens reduce to all
stopwords after sanitize + filter — FalkorDB/RediSearch rejects the
trailing empty group with `Syntax error at offset N near X`. There's a
too-long guard above the wrap site but no symmetric too-short/empty one.

Every call site of `_build_falkor_fulltext_query` already special-cases a
`''` return as "no candidates":

    fuzzy_query = _build_falkor_fulltext_query(query, group_ids)
    if fuzzy_query == '':
        return []

So the fix is a one-line early return after the stopword filter — same
sentinel the too-long path already uses, no caller-side change needed.

Hit during a 230-episode bulk replay against a 1849-entity FalkorDB graph:
the dedup queue stalled at ~31% when an episode's LLM-derived content
tokens happened to be all stopwords. With this guard, the rest of the
replay completed cleanly.

zep-cla-assistant Bot commented Jun 16, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

aasc77 commented Jun 16, 2026

Author

I have read the CLA Document and I hereby sign the CLA on behalf of myself, e-mail: angel.s@me.com

zep-cla-assistant Bot added a commit that referenced this pull request Jun 16, 2026