Closes #21326: Defer global search cache updates to a background job by jnovinger · Pull Request #22481 · netbox-community/netbox

jnovinger · 2026-06-17T14:41:56Z

Closes: #21326

Updates to the global search cache (CachedValue) currently happen synchronously inside the post_save/post_delete signal handlers, adding latency to every write. This defers that work to a background job that runs after the response is sent.

Search-cache updates are coalesced per transaction and dispatched on commit via a new SearchCacheJob, rather than running inline in the signal handlers.
When no RQ worker is available, indexing falls back to running synchronously, so installs without a worker behave as before.
Bulk operations enqueue a single job for the whole transaction instead of one update per object.
The originating database alias is captured from the save/delete and replayed on the deferred write, keeping cache entries correctly schema-routed under netbox-branching without core depending on the plugin.
docs/features/search.md is updated to note that search results are now eventually consistent (a newly created or changed object may not appear for a brief moment).

I'll follow up with a comment covering the design details (the coalescing buffer, the alias capture/replay, and the worker-gated fallback).

Previously the global search cache (CachedValue) was updated synchronously inside the post_save/post_delete signal handlers, adding latency to every write and, for bulk operations, one synchronous re-index per object. The signal handlers now buffer dirty objects and defer the work to a background job that runs after the transaction commits:

* netbox/netbox/search/deferred.py coalesces dirty objects per (database alias, transaction). The buffer lives inside a transaction.on_commit callback's closure, so a single flush is scheduled per alias per transaction (collapsing repeated operations; a deletion supersedes a pending create/update). Because Django clears run_on_commit on both commit and rollback, no buffered state can survive a rolled-back transaction. In autocommit (no open transaction), the indexing runs inline immediately.
* On flush, when an RQ worker is available the work is dispatched as a SearchCacheJob (netbox/netbox/search/tasks.py); otherwise it runs inline, so installs without a running worker behave as before.
* The database alias is captured from the signal's `using` kwarg and replayed on the deferred read and write via .using(alias). This keeps cache entries routed to the originating schema (e.g. a branch schema under netbox-branching) even though the worker has no active routing context, without core depending on the plugin. Only primitives (the alias string and {object_type_id: [pk]} maps) cross the job boundary.
* SearchBackend.cache()/remove() gain a `using` parameter, and a shared _remove_by_id() performs the raw delete by content type and object IDs. The worker re-fetches live instances on `using`, removes stale entries, and re-indexes within a single transaction so an object is never left with no cache rows. A missing schema (e.g. a branch merged/deprovisioned between enqueue and execution) is skipped; other database errors propagate so the job fails visibly.

remove() now resolves the object type via the indexer's (concrete) model rather than ContentType.objects.get_for_model(instance), matching the content type that cache() writes under so the two cannot diverge for a proxy/MTI searchable model. No registered search index is currently a proxy model, so this is not an observable change today.

Cover the deferred indexing behavior introduced for #21326:

* The existing save/delete tests now drive the post-commit flush via captureOnCommitCallbacks; with no RQ worker registered they exercise the synchronous fallback.
* Coalescing: a bulk save schedules a single flush carrying every object, and a create-then-delete in one transaction collapses to a removal.
* Rollback safety: a rolled-back transaction leaves no scheduled flush, and a commit on the same connection after a rollback still indexes (no stale buffer state suppresses it).
* Worker path: when a worker is available the flush enqueues a SearchCacheJob, and the originating database alias is forwarded to the job. (Cross-schema routing itself is covered by the netbox-branching test suite; here we assert only that the alias is captured and forwarded.)
* The deferred task tolerates an object deleted between enqueue and execution.
* AutocommitCachingTestCase uses TransactionTestCase so saves run outside an atomic block, exercising the inline-indexing branch that TestCase masks.

Search index updates are now applied by a background task after the response is sent, so a newly created or changed object may not appear in search results for a brief period. Update the global search documentation accordingly (and note the synchronous fallback when no background worker is running).

jnovinger · 2026-06-17T14:47:08Z

Some additional design context for reviewers (and the bot). Collapsed since it's deeper than most reviews will need.

Design details

Coalescing buffer

netbox/netbox/search/deferred.py buffers dirty objects per (database alias, transaction). The buffer dict lives inside the closure of the transaction.on_commit callback that will flush it, and the callback is tagged so subsequent mark_dirty() calls in the same transaction find and extend the existing batch rather than scheduling another flush. The result is one flush per alias per transaction: a 5,000-row bulk import enqueues a single job, not 5,000.

Coalescing collapses repeated operations on the same object, and a deletion supersedes a pending create/update (the object ends up removed from the index).
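
As a rough illustration of the coalescing rule above, the sketch below keys a plain dict by (object_type_id, pk), so repeated operations on one object collapse to a single entry and a later removal overwrites a pending cache operation. The constant values and the record() helper are assumptions made for this example, not the code in deferred.py. How the real buffer treats a save that follows a delete is not stated here, so the sketch simply lets the last operation win.

```python
# Placeholder values; the real OP_CACHE / OP_REMOVE constants may differ.
OP_CACHE = "cache"
OP_REMOVE = "remove"


def record(batch, object_type_id, pk, op):
    """Record one operation; a later op for the same object replaces the earlier one."""
    batch[(object_type_id, pk)] = op
    return batch


batch = {}
for pk in (1, 2, 3):
    record(batch, 10, pk, OP_CACHE)   # three creates of the same object type
record(batch, 10, 2, OP_CACHE)        # a repeated save collapses into the existing entry
record(batch, 10, 3, OP_REMOVE)       # a deletion supersedes the pending create

# One entry per object, however many signals fired for it.
print(batch)
```

Because the dict holds one entry per object, the size of the flushed batch is bounded by the number of distinct objects touched, not by the number of signals fired.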

Rollback safety

Django clears a connection's run_on_commit list on both commit and rollback. Because the batch only exists inside a scheduled callback's closure, a rolled-back transaction's buffered objects are discarded with the callback: there is no thread- or module-level state that can leak a rolled-back object into a later transaction's flush. A commit on the same connection after a rollback indexes normally.

In autocommit (no open transaction), on_commit runs its callback immediately at registration, so mark_dirty() indexes inline rather than scheduling a flush that would fire before the batch is populated.
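
The rollback argument can be sketched with a toy connection that keeps only a list of pending callbacks. FakeConnection and schedule_flush() are hypothetical stand-ins written for this example, not Django or NetBox classes; the point is only that a batch reachable solely through a pending callback disappears when that callback list is cleared.

```python
class FakeConnection:
    """Toy stand-in for a database connection; tracks on_commit callbacks only."""

    def __init__(self):
        self.run_on_commit = []

    def on_commit(self, func):
        self.run_on_commit.append(func)

    def commit(self):
        callbacks, self.run_on_commit = self.run_on_commit, []
        for func in callbacks:
            func()

    def rollback(self):
        # Pending callbacks (and the batches in their closures) are dropped.
        self.run_on_commit = []


def schedule_flush(conn, flushed):
    """Register a flush whose batch lives only in the callback's closure."""
    batch = {}

    def flush():
        flushed.append(dict(batch))

    conn.on_commit(flush)
    return batch


conn = FakeConnection()
flushed = []

batch = schedule_flush(conn, flushed)
batch[(10, 1)] = "cache"
conn.rollback()            # discards the callback together with its batch
conn.commit()              # nothing left to flush

batch = schedule_flush(conn, flushed)
batch[(10, 2)] = "cache"
conn.commit()              # a later transaction on the same connection flushes normally
```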

Schema routing without a plugin dependency

netbox-branching routes CachedValue writes to a per-branch Postgres schema via a database router that reads a context variable at query time. Deferred work runs after that context is gone (and, on a worker, in a separate process), so the deferral would otherwise write to the main schema.

Rather than have core know about branching, the database alias is captured from the signal's using kwarg (the alias Django resolved for the write) and replayed on the deferred read and write via .using(alias). QuerySet.using() overrides the router, so the entries land in the originating schema regardless of routing context. Only primitives cross the job boundary: the alias string and {object_type_id: [pk]} maps. The worker re-resolves the model and re-fetches live instances on that alias. Cross-schema correctness is covered by netbox-branching's own routing test suite; this PR's tests assert only that the alias is captured and forwarded.
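
To make "only primitives cross the job boundary" concrete, here is a minimal sketch that turns a coalesced batch into an alias string plus {object_type_id: [pk]} maps. The build_payload() helper, the payload key names, and the alias value are assumptions for illustration; the actual job arguments may be shaped differently.

```python
from collections import defaultdict


def build_payload(alias, batch):
    """Group a {(object_type_id, pk): op} batch into primitive-only job arguments."""
    cache_groups = defaultdict(list)
    remove_groups = defaultdict(list)
    for (object_type_id, pk), op in batch.items():
        target = remove_groups if op == "remove" else cache_groups
        target[object_type_id].append(pk)
    return {
        "using": alias,                       # alias string, replayed via .using(alias)
        "cache_groups": dict(cache_groups),   # {object_type_id: [pk, ...]} to re-index
        "remove_groups": dict(remove_groups), # {object_type_id: [pk, ...]} to remove
    }


payload = build_payload(
    "example_alias",  # hypothetical alias name
    {(10, 1): "cache", (10, 2): "cache", (12, 7): "remove"},
)
print(payload)
```

No model instances or querysets appear in the payload, so the worker has to re-resolve the model and re-fetch rows on the given alias, which is what keeps the write in the originating schema.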

Worker-gated fallback

On flush, if a default-queue worker is registered the work is dispatched as a SearchCacheJob; otherwise it runs inline. Installs without a running worker keep today's synchronous behavior, so this is not a hard dependency on having a worker.

Why a `JobRunner` rather than a bare `queue.enqueue`

The deferred work is dispatched as a SearchCacheJob(JobRunner) rather than a raw queue.enqueue('module.func', ...). The raw-enqueue form exists in core only for webhooks (extras/events.py), and predates the JobRunner framework; the documented mechanism for decoupling work from the request/response cycle (including housekeeping and synchronization jobs) is JobRunner. Coalescing keeps the volume reasonable (one job per alias per transaction), so this doesn't create a Job row per object.

jnovinger · 2026-06-17T14:47:31Z

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

github-actions · 2026-06-17T14:47:55Z

Claude finished @jnovinger's task in 2m 49s

Review

Read changed files
Analyze for issues of concern
Post findings

1. Redis outage turns a successful save into a 500

_flush() calls any_workers_for_queue() (which opens a Redis connection) inside the on_commit callback. The callback is registered without robust=True, so any ConnectionError propagates back to the HTTP handler — causing a 500 even though the database write already committed. Before this PR, search-cache updates had no Redis dependency in the signal path.

netbox/netbox/search/deferred.py:63

transaction.on_commit(flush, using=alias)
# Should be:
transaction.on_commit(flush, using=alias, robust=True)

robust=True (Django 4.2+) logs the exception rather than propagating it, which matches the intent of graceful degradation already expressed in the worker-fallback logic.

2. `removal_handler` is missing `ProgrammingError` handling

caching_handler wraps ObjectType.objects.get_for_model() in a ProgrammingError guard (for incomplete schemas during migrations), but removal_handler does not. Deletes can fire during migrations too.

netbox/netbox/search/backends.py:98

object_type = ObjectType.objects.get_for_model(indexer.model)  # unguarded


3. `connection.run_on_commit` internal structure

_pending_batch() unpacks each entry as (_sids, func, _robust) — a 3-tuple that is Django's internal representation, not a documented API. This is the only viable mechanism for finding an existing callback without introducing module-level state, so there may not be a better option, but it's worth flagging as a fragile dependency that could silently break on a Django upgrade.

netbox/netbox/search/deferred.py:83

for _sids, func, _robust in connection.run_on_commit:

4. `SearchCacheJob` creates a `Job` DB row per committed transaction

Job.enqueue() calls job.save(), so every transaction that touches a searchable model (when a worker is running) creates a persistent Job row. The design notes acknowledge this is one row per transaction rather than one per object, but on a busy instance this could produce a large number of short-lived Job records. It's worth considering whether a bare queue.enqueue() (bypassing the Job ORM record) would be more appropriate here, given that these housekeeping jobs have no meaningful object association or user-visible state — unlike the rest of the Job framework users.

* Register the on_commit flush with robust=True so a failure while flushing (e.g. Redis being unreachable when checking for an available worker) is logged rather than propagated, preventing a committed save from returning a 500.
* Guard removal_handler's get_for_model() lookup with the same ProgrammingError handling as caching_handler, for deletes that fire during migrations.
* Add a test asserting Django's connection.run_on_commit entry shape (savepoint_ids, func, robust), so a future Django change fails with a clear pointer to _pending_batch() rather than an opaque unpack error.

jnovinger · 2026-06-17T15:52:11Z

1. Redis outage turns a successful save into a 500

Fixed. The on_commit flush is now registered with robust=True, so a Redis failure while checking for an available worker is logged instead of propagating. A committed save will never 500 on a search-cache update.

2. removal_handler is missing ProgrammingError handling

Fixed. removal_handler now wraps the get_for_model() lookup in the same ProgrammingError guard as caching_handler.

3. connection.run_on_commit internal structure

Agreed it's a Django internal with no public alternative (short of module-level state, which we avoid for rollback safety). Added a test that asserts the (savepoint_ids, func, robust) shape directly, so a change in a future Django release fails with a message pointing at _pending_batch() rather than an opaque unpack error.

4. SearchCacheJob creates a Job DB row per committed transaction

Staying as-is. JobRunner is the documented mechanism for decoupling work from the request/response cycle; the bare queue.enqueue() form exists in core only for webhooks and predates the JobRunner framework. Coalescing holds this to one job per transaction (not per object), and expired Job records are already pruned by the housekeeping job. Our view is that consistency with the job framework is worth the row; if row volume proves to be a problem on large instances, narrowing it is a reasonable follow-up.

jnovinger · 2026-06-17T15:52:46Z

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration. See previous comments for review response.

github-actions · 2026-06-17T15:53:03Z

Claude finished @jnovinger's task in 3m 18s

Review

Read changed files
Analyze for remaining issues of concern
Post findings

Previous issues #1 and #2 are correctly fixed. Issue #3 is acknowledged with a test. Issue #4 is staying as-is (noted).

One remaining concern:

Autocommit path lacks the same Redis-error protection as the transactional path

mark_dirty() has two branches:

Transactional (connection.in_atomic_block is True): registers an on_commit callback with robust=True, so a Redis ConnectionError inside _flush() (e.g. from any_workers_for_queue()) is logged and swallowed — the committed write is never turned into a 500.
Autocommit (connection.in_atomic_block is False): calls _flush() directly with no error handling. A Redis ConnectionError from the same any_workers_for_queue() call propagates through mark_dirty() → the signal handler → Django's signal.send() → the save/delete caller.

any_workers_for_queue opens a real Redis connection (get_connection(queue_name) + Worker.all_keys()); if Redis is unreachable it raises. This path was not present before this PR, so this introduces a new Redis dependency in the autocommit post_save/post_delete path.

netbox/netbox/search/deferred.py:46–48

if not connection.in_atomic_block:
    _flush({(object_type_id, pk): op}, alias)
    return

Suggested fix — wrap the autocommit flush to match the transactional path's intent:

if not connection.in_atomic_block:
    try:
        _flush({(object_type_id, pk): op}, alias)
    except Exception:
        logger.exception("Error flushing search cache (autocommit)")
    return


Make _flush the single guarded dispatcher so a search cache update can never turn an already-committed save into an error, on either the transactional or the autocommit path:

* Wrap the worker-availability probe and the job enqueue (both of which touch Redis, and a worker can die between them) in one try/except RedisError, falling back to inline indexing. The search index lives in PostgreSQL, so the inline fallback has no Redis dependency.
* Guard the autocommit branch of mark_dirty() so a flush failure there is logged rather than propagated to the post_save/post_delete caller. The transactional path keeps transaction.on_commit(..., robust=True) as defense in depth.
* Document that a broker failure mid-enqueue can leave a stranded PENDING Job row (the index is still written inline; the row ages out via housekeeping).
* Note why the removal loop is not wrapped in a transaction (single DELETE per content type) and that a dropped update is recovered by reindex, not retry.

Tests: add coverage for the inline fallback when the broker is unreachable and when enqueue itself fails after the probe succeeds (pinning the decision to guard the whole dispatch, not just the probe).

jnovinger · 2026-06-18T08:37:10Z

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

Since the last review, the autocommit-path Redis gap you flagged is fixed: _flush is now the single guarded dispatcher, wrapping the worker probe and the enqueue together in a try/except RedisError with an inline fallback, and the autocommit branch of mark_dirty() is guarded as well. The transactional path keeps robust=True as defense in depth. See the latest commit message ("Harden deferred search cache error handling") and the collapsed design-details comment above for the full rationale, including why the removal loop isn't wrapped in a transaction and the recover-via-reindex (no retry) failure model.
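
For readers skimming the thread, the guarded-dispatcher shape can be sketched as follows. This is a dependency-free illustration, not the code in deferred.py: BrokerError stands in for redis.exceptions.RedisError, and flush() with its injected workers_available, enqueue, and index_inline callables is a hypothetical signature chosen so the example runs on its own.

```python
class BrokerError(Exception):
    """Stand-in for redis.exceptions.RedisError so the sketch needs no Redis client."""


def flush(batch, alias, *, workers_available, enqueue, index_inline):
    """Enqueue when a worker is reachable; otherwise (or on broker failure) index inline."""
    try:
        # The probe and the enqueue are guarded together: either can hit the broker.
        if workers_available():
            enqueue(batch, alias)
            return "enqueued"
    except BrokerError:
        pass  # fall through to the inline path, which has no broker dependency
    index_inline(batch, alias)
    return "inline"


def _broker_down(*args):
    raise BrokerError("broker unreachable")


indexed = []


def index_inline(batch, alias):
    indexed.append((alias, dict(batch)))


batch = {(10, 1): "cache"}
common = {"index_inline": index_inline}

healthy = flush(batch, "default", workers_available=lambda: True,
                enqueue=lambda b, a: None, **common)
no_worker = flush(batch, "default", workers_available=lambda: False,
                  enqueue=lambda b, a: None, **common)
probe_down = flush(batch, "default", workers_available=_broker_down,
                   enqueue=lambda b, a: None, **common)
enqueue_down = flush(batch, "default", workers_available=lambda: True,
                     enqueue=_broker_down, **common)
```

The last case is the one that motivates guarding the whole dispatch: the probe succeeds, the enqueue fails, and the update still lands through the inline path.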

github-actions · 2026-06-18T08:37:24Z

Claude finished @jnovinger's task in 4m 5s

Review

Read changed files
Analyze latest changes for remaining issues
Post findings

All previous issues are resolved. No remaining concerns.

The three structural fixes hold up:

Single guarded dispatcher: _flush() wraps the worker probe and enqueue together in try/except RedisError, and falls through to inline indexing on any Redis failure. Both callers add a second layer (robust=True on the transactional path, except Exception on the autocommit path), so a non-Redis exception from enqueue() — e.g. a DatabaseError saving the Job row — is logged and swallowed rather than propagated. The object won't be indexed in that case, but that's consistent with the stated recover-via-reindex failure model, and such a failure would indicate broader DB trouble.
removal_handler ProgrammingError guard: present and symmetric with caching_handler.
run_on_commit tuple shape test: correctly asserts the (savepoint_ids, func, robust) structure with a pointer back to _pending_batch().

The psycopg sqlstate attribute in _is_missing_schema is correct for psycopg3, which is the only supported driver (psycopg[c,pool]==3.3.4).

Move the search deferral imports to module top level wherever possible, leaving a single function-local import for the one genuinely cycle-bound case:

* backends.py imports OP_CACHE/OP_REMOVE/mark_dirty from .deferred at module level (the handlers no longer each import them locally).
* deferred.py imports RQ_QUEUE_DEFAULT and any_workers_for_queue at module level.
* The only remaining function-local import is _flush's `from netbox.search.tasks import SearchCacheJob, update_search_cache`, which is required because tasks.py imports the search_backend singleton defined at the bottom of backends.py (which itself imports this module). A cleaner fix (extracting the singleton) is tracked as a follow-up.

Because any_workers_for_queue is now bound in deferred's namespace, the tests patch netbox.search.deferred.any_workers_for_queue (where it is looked up) rather than utilities.rqworker.any_workers_for_queue.

* Rename mark_dirty() to mark_for_deferred_indexing(), which says what it does (schedule deferred search indexing) rather than borrowing vague cache-invalidation vocabulary. Update the docstring's "dirty" wording to match.
* Expand the comment on the on_commit robust=True flag to record why it is required rather than optional: Django runs on_commit callbacks synchronously as the atomic block exits (after COMMIT), so an exception escaping the flush would surface as a 500 on an already-committed write. _flush handles the recoverable Redis fault itself; robust=True keeps any other failure from becoming that post-commit 500.

jnovinger · 2026-06-18T11:23:31Z

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

Since the last review, a few follow-up items identified during local review have been addressed:

Reduced the deferral's nested imports to a single, genuinely cycle-bound one (_flush's import of netbox.search.tasks); everything else is now top-level. The remaining nested import is commented with the cycle and a pointer to a tracked follow-up.
Renamed mark_dirty() to mark_for_deferred_indexing() for clarity about what it does.
Expanded the robust=True comment on the on_commit flush to record why it's required (an exception escaping the post-COMMIT callback would otherwise become a 500 on an already-committed write).

Full detail is in the commit messages and the collapsed design-details comment above; please consult them as needed.

github-actions · 2026-06-18T11:23:46Z

Claude finished @jnovinger's task in 2m 18s

Review

Read changed files
Analyze latest changes for remaining issues
Post findings

All previous issues have been resolved. The three changes described in the trigger comment are all correct:

Import reduction: _flush now has exactly one nested import — the genuine cycle (netbox.search.tasks → netbox.search.backends → netbox.search.deferred → netbox.search.tasks) — and everything else is top-level. The remaining nested import is commented with the cycle explanation and a pointer to a follow-up.
Rename: mark_dirty → mark_for_deferred_indexing is complete. No stale references remain anywhere in the codebase.
robust=True comment: The expanded comment correctly records both that robust=True is required (not just defensive) and why — an exception escaping a post-COMMIT callback propagates out of the atomic block and becomes a 500 on an already-committed write.

No remaining concerns.

pheus

Thanks! The direction looks good to me.

I do think this needs a bit more adjustment before merge, mainly around transaction correctness and code organization.

Nested `atomic()` rollback behavior

I think the current batching logic can leak rolled-back work from a nested transaction into an outer batch.

_pending_batch() appears to find an existing pending callback by database alias only. That means an operation performed inside an inner atomic() block can be added to a batch that belongs to an outer on_commit() callback.

For example:

with transaction.atomic():
    site_a.save()  # schedules an outer search-cache batch

    try:
        with transaction.atomic():
            site_b.delete()  # added to the existing outer batch
            raise RuntimeError()
    except RuntimeError:
        pass

# outer transaction commits

At the database level, site_b still exists because the inner savepoint rolled back. But because the delete was appended to the outer batch, the search cache update can still remove site_b after the outer transaction commits.

Could we make the pending batch lookup savepoint-aware, or otherwise avoid reusing a batch registered outside the current savepoint scope?

I think this should have a regression test covering:

schedule a search-cache update in an outer transaction;
delete or update another searchable object inside an inner atomic();
roll back the inner savepoint;
commit the outer transaction;
confirm the rolled-back inner operation did not affect the search cache.

Code organization / ownership

I also think the new deferred search-cache path would be easier to maintain with clearer ownership boundaries.

Right now the import/dependency shape feels a bit circular:

search.backends -> search.deferred -> search.tasks -> search.backends

The local import avoids the runtime circular import, but I think it is a sign that the responsibilities are not quite separated yet. deferred.py is handling transaction batching, callback coalescing, enqueue-vs-inline behavior, and dispatching. Then the task/job code reaches back into backend-specific cache internals.

I would prefer something closer to:

netbox/search/
├── backends.py   # search backend implementations
├── deferred.py   # transaction/on_commit batching and coalescing
└── jobs.py       # SearchCacheJob(JobRunner)

In particular, I think SearchCacheJob(JobRunner) should live in netbox/search/jobs.py, matching the usual NetBox job organization. The job itself can stay very thin and delegate the actual cache mutation to the backend.

For example:

# netbox/search/jobs.py

class SearchCacheJob(JobRunner):
    class Meta:
        name = "Search cache update"

    def run(self, using=None, cache_groups=None, remove_groups=None, **kwargs):
        from netbox.search.backends import search_backend

        search_backend.apply_deferred_updates(
            using=using,
            cache_groups=cache_groups,
            remove_groups=remove_groups,
            log=self.logger,
        )

Then the CachedValue-specific behavior can remain owned by CachedValueSearchBackend:

class CachedValueSearchBackend(SearchBackend):
    def apply_deferred_updates(self, using=None, cache_groups=None, remove_groups=None, log=None):
        # owns CachedValue lookups, removals, cache(..., using=...), etc.
        ...

That would also make the custom backend contract clearer. Before this change, custom SEARCH_BACKEND implementations mostly needed to provide the public cache() / remove() behavior used by the signal handlers. With the deferred path, the job now appears to assume CachedValueSearchBackend internals such as _remove_by_id() and a cache(..., using=...) signature. Keeping the deferred implementation on CachedValueSearchBackend would avoid making those assumptions globally.

Comments

Small style note: I think some of the comments in this path could be trimmed once the responsibilities are split up.

The deferred search-cache buffer used a single on_commit callback per (database alias, transaction), found by scanning run_on_commit for the alias tag and mutating its batch. An op buffered inside a nested atomic() that later rolled back was appended to the outer callback's batch, which Django's savepoint_rollback never inspects (it prunes callbacks by the savepoint-id set captured at registration, not by their closure). The rolled-back op survived and flushed on the outer commit, for example removing from the cache an object whose delete was rolled back.

Scope each flush callback to the savepoint stack active when it is registered (tuple(connection.savepoint_ids)); _pending_batch now matches on alias and scope. Each savepoint scope gets its own callback and batch, so a rolled-back savepoint's callback is pruned by Django automatically. Coalescing still holds within a scope; cross-scope ops produce one job per scope, and correctness is preserved because the worker re-fetches live rows at flush time.

Add regression coverage for the nested-rollback leak plus committed nested scopes, cross-scope save-then-delete, sibling savepoints, and deep nesting with a middle rollback.

Relocate SearchCacheJob into netbox/search/jobs.py, matching the per-app job convention (core/jobs.py, extras/jobs.py); tasks.py was the lone violation. The job is now thin: it delegates to the backend.

Move the deferred-update logic (the remove loop, the re-fetch + atomic remove-then-cache loop, and the missing-schema skip) from the free function update_search_cache onto CachedValueSearchBackend as apply_deferred_updates(), with _is_missing_schema and the Postgres SQLSTATE set as private members of that class. The CachedValue specifics (_remove_by_id, cache(..., using=...)) are now reached only from within the backend that owns them, rather than from the job and the inline fallback reaching into backend internals.

SearchBackend gains an abstract apply_deferred_updates() (raising NotImplementedError like its siblings) so the deferred contract is explicit for custom SEARCH_BACKEND classes.

deferred._flush now dispatches via SearchCacheJob (worker path) or search_backend.apply_deferred_updates() (inline fallback). This renames and relocates the backends/deferred/jobs import cycle but does not break it; the single function-local import in _flush remains, and the clean fix (extracting the search_backend singleton) is still tracked in #22485.

jnovinger · 2026-06-22T10:10:29Z

Both addressed, in two commits now pushed.

Nested atomic() rollback leak (19ba073c9). Confirmed with a failing regression test first, and the root cause is what you described. The mechanism: Django scopes on_commit callbacks to savepoints by the savepoint-id set captured when the callback is registered, and savepoint_rollback prunes callbacks by that set. Our single callback was registered at the outer level and only mutated (not re-registered) from the inner savepoint, so the inner rollback never pruned it and the op flushed on the outer commit.

Fix: scope each flush callback to the savepoint stack active when it is registered (tuple(connection.savepoint_ids)). _pending_batch now matches on alias and scope, so each savepoint scope gets its own callback and batch, and Django's existing savepoint_rollback pruning discards a rolled-back scope's callback for free, no manual bookkeeping. Coalescing still holds within a scope; cross-scope operations produce one job per scope, which is correct because the worker re-fetches live rows at flush time. Added your requested regression test plus coverage for committed nested scopes, cross-scope save-then-delete, sibling savepoints, and deep nesting with a middle rollback.
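
A toy model of the mechanism, for anyone who wants to see the pruning in isolation. FakeConnection and mark() are hypothetical stand-ins written for this sketch (the alias dimension is omitted for brevity); only the rule "a rolled-back savepoint prunes callbacks registered under it" is taken from Django's behavior as described above.

```python
class FakeConnection:
    """Toy model of on_commit bookkeeping; not Django's real connection class."""

    def __init__(self):
        self.savepoint_ids = []
        self.run_on_commit = []  # entries: (savepoint-id set at registration, func)

    def on_commit(self, func):
        self.run_on_commit.append((set(self.savepoint_ids), func))

    def savepoint(self, sid):
        self.savepoint_ids.append(sid)

    def savepoint_rollback(self, sid):
        self.savepoint_ids.remove(sid)
        # Drop every callback that was registered while this savepoint was active.
        self.run_on_commit = [
            (sids, func) for sids, func in self.run_on_commit if sid not in sids
        ]

    def commit(self):
        callbacks, self.run_on_commit = self.run_on_commit, []
        for _sids, func in callbacks:
            func()


def mark(conn, key, op, flushed):
    """Buffer an op in the batch for the current savepoint scope, creating it if needed."""
    scope = tuple(conn.savepoint_ids)
    for _sids, func in conn.run_on_commit:
        if getattr(func, "scope", None) == scope:
            func.batch[key] = op  # same scope: extend the existing batch
            return
    batch = {key: op}

    def flush():
        flushed.append(dict(batch))

    flush.scope, flush.batch = scope, batch
    conn.on_commit(flush)  # new scope: new callback, tagged with its scope


conn = FakeConnection()
flushed = []
mark(conn, (10, 1), "cache", flushed)    # outer transaction: save site_a
conn.savepoint("s1")
mark(conn, (10, 2), "remove", flushed)   # inner atomic(): delete site_b
conn.savepoint_rollback("s1")            # inner block rolls back
conn.commit()                            # only the outer scope's batch flushes
```

Had mark() matched on the connection alone, the removal of (10, 2) would have been written into the outer callback's batch and survived the inner rollback, which is the leak the regression test pins down.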

Code organization (4b87d7383). Done as you sketched. SearchCacheJob moved to netbox/search/jobs.py and is thin; it delegates to CachedValueSearchBackend.apply_deferred_updates(), which now owns the deferred-update logic and the missing-schema handling (_is_missing_schema and the Postgres SQLSTATE set are private to that class). SearchBackend gains an abstract apply_deferred_updates() raising NotImplementedError like its siblings, so the contract is explicit for custom SEARCH_BACKEND classes rather than leaking _remove_by_id / cache(..., using=...) assumptions. Old tasks.py is gone, and I trimmed the comments while splitting things up.

One thing this does not fully resolve: it renames and relocates the backends -> deferred -> jobs -> backends cycle but does not break it. deferred._flush still carries a single function-local import, because deferred.py can't import the search_backend singleton at module level (it's bound at the bottom of backends.py, after backends.py imports deferred). Breaking the cycle properly means extracting the singleton and moving signal wiring into an AppConfig.ready(), which is broad enough that I've kept it separate as #22485.

pheus

Thanks for the updates! This looks very close.

I have two small follow-ups.

For the delete/removal tests, could we make sure the object PK is captured before calling delete() wherever we assert against CachedValue.object_id? Assertions using site.pk after site.delete() can accidentally query for object_id=None instead of the deleted object’s original ID.

It would also be useful for at least one delete/removal test to seed existing CachedValue rows before the delete and then assert that rows for the captured PK are removed.
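
To illustrate why both points matter, here is a small dependency-free sketch. FakeSite and the rows list are stand-ins invented for this example (they only mimic Django clearing the primary key on a deleted in-memory instance), and the delete here deliberately leaves the seeded rows in place to simulate a broken removal path.

```python
class FakeSite:
    """Stand-in for a model instance; delete() clears pk as Django does in memory."""

    def __init__(self, pk):
        self.pk = pk

    def delete(self):
        self.pk = None  # note: the cached rows below are intentionally left untouched


def rows_for(rows, object_id):
    """Stand-in for filtering cached rows by object_id."""
    return [row for row in rows if row["object_id"] == object_id]


rows = [{"object_id": 7, "value": "Site A"}]  # seeded before the delete
site = FakeSite(pk=7)

site_pk = site.pk   # capture the original ID first
site.delete()

vacuous = rows_for(rows, site.pk)   # filters on None: empty whether or not removal ran
real = rows_for(rows, site_pk)      # filters on 7: exposes the rows that were left behind
```

An assertion that `vacuous` is empty passes even though removal never happened, while the check on `real` fails as it should; seeding rows beforehand is what gives the second check something to find.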

One other API question: do we want deferred indexing to become part of the base SearchBackend contract? apply_deferred_updates() makes sense for CachedValueSearchBackend, but it is a new expectation for custom SEARCH_BACKEND implementations that may previously have only implemented cache(), remove(), search(), and clear().

Other than that, this looks good to me. Thanks again for iterating on it.

jnovinger added 3 commits June 17, 2026 16:40

jnovinger marked this pull request as draft June 17, 2026 14:44

jnovinger added 2 commits June 18, 2026 12:39

jnovinger marked this pull request as ready for review June 18, 2026 11:28

jnovinger requested review from a team and pheus and removed request for a team June 18, 2026 11:28

Fixes #21326: Reference issue #22485 in the deferred-import comment

bac3d43

pheus requested changes Jun 18, 2026


jnovinger added 2 commits June 18, 2026 17:05

jnovinger requested a review from pheus June 22, 2026 10:10

pheus requested changes Jun 22, 2026

