Skip to content

Closes #21326: Defer global search cache updates to a background job#22481

Open
jnovinger wants to merge 10 commits into
featurefrom
21326-defer-search-cache-updates
Open

Closes #21326: Defer global search cache updates to a background job#22481
jnovinger wants to merge 10 commits into
featurefrom
21326-defer-search-cache-updates

Conversation

@jnovinger

Copy link
Copy Markdown
Member

Closes: #21326

Updates to the global search cache (CachedValue) currently happen synchronously inside the post_save/post_delete signal handlers, adding latency to every write. This defers that work to a background job that runs after the response is sent.

  • Search-cache updates are coalesced per transaction and dispatched on commit via a new SearchCacheJob, rather than running inline in the signal handlers.
  • When no RQ worker is available, indexing falls back to running synchronously, so installs without a worker behave as before.
  • Bulk operations enqueue a single job for the whole transaction instead of one update per object.
  • The originating database alias is captured from the save/delete and replayed on the deferred write, keeping cache entries correctly schema-routed under netbox-branching without core depending on the plugin.
  • docs/features/search.md is updated to note that search results are now eventually consistent (a newly created or changed object may not appear for a brief moment).

I'll follow up with a comment covering the design details (the coalescing buffer, the alias capture/replay, and the worker-gated fallback).

Previously the global search cache (CachedValue) was updated synchronously
inside the post_save/post_delete signal handlers, adding latency to every write
and, for bulk operations, one synchronous re-index per object.

The signal handlers now buffer dirty objects and defer the work to a background
job that runs after the transaction commits:

* netbox/netbox/search/deferred.py coalesces dirty objects per (database alias,
  transaction). The buffer lives inside a transaction.on_commit callback's
  closure, so a single flush is scheduled per alias per transaction (collapsing
  repeated operations; a deletion supersedes a pending create/update). Because
  Django clears run_on_commit on both commit and rollback, no buffered state can
  survive a rolled-back transaction. In autocommit (no open transaction), the
  indexing runs inline immediately.

* On flush, when an RQ worker is available the work is dispatched as a
  SearchCacheJob (netbox/netbox/search/tasks.py); otherwise it runs inline, so
  installs without a running worker behave as before.

* The database alias is captured from the signal's `using` kwarg and replayed on
  the deferred read and write via .using(alias). This keeps cache entries routed
  to the originating schema (e.g. a branch schema under netbox-branching) even
  though the worker has no active routing context, without core depending on the
  plugin. Only primitives (the alias string and {object_type_id: [pk]} maps)
  cross the job boundary.

* SearchBackend.cache()/remove() gain a `using` parameter, and a shared
  _remove_by_id() performs the raw delete by content type and object IDs. The
  worker re-fetches live instances on `using`, removes stale entries, and
  re-indexes within a single transaction so an object is never left with no
  cache rows. A missing schema (e.g. a branch merged/deprovisioned between
  enqueue and execution) is skipped; other database errors propagate so the job
  fails visibly.

remove() now resolves the object type via the indexer's (concrete) model rather
than ContentType.objects.get_for_model(instance), matching the content type that
cache() writes under so the two cannot diverge for a proxy/MTI searchable model.
No registered search index is currently a proxy model, so this is not an
observable change today.
Cover the deferred indexing behavior introduced for #21326:

* The existing save/delete tests now drive the post-commit flush via
  captureOnCommitCallbacks; with no RQ worker registered they exercise the
  synchronous fallback.
* Coalescing: a bulk save schedules a single flush carrying every object, and a
  create-then-delete in one transaction collapses to a removal.
* Rollback safety: a rolled-back transaction leaves no scheduled flush, and a
  commit on the same connection after a rollback still indexes (no stale buffer
  state suppresses it).
* Worker path: when a worker is available the flush enqueues a SearchCacheJob,
  and the originating database alias is forwarded to the job. (Cross-schema
  routing itself is covered by the netbox-branching test suite; here we assert
  only that the alias is captured and forwarded.)
* The deferred task tolerates an object deleted between enqueue and execution.
* AutocommitCachingTestCase uses TransactionTestCase so saves run outside an
  atomic block, exercising the inline-indexing branch that TestCase masks.
Search index updates are now applied by a background task after the response is
sent, so a newly created or changed object may not appear in search results for
a brief period. Update the global search documentation accordingly (and note the
synchronous fallback when no background worker is running).
@jnovinger jnovinger marked this pull request as draft June 17, 2026 14:44
@jnovinger

Copy link
Copy Markdown
Member Author

Some additional design context for reviewers (and the bot). Collapsed since it's deeper than most reviews will need.

Design details

Coalescing buffer

netbox/netbox/search/deferred.py buffers dirty objects per (database alias, transaction). The buffer dict lives inside the closure of the transaction.on_commit callback that will flush it, and the callback is tagged so subsequent mark_dirty() calls in the same transaction find and extend the existing batch rather than scheduling another flush. The result is one flush per alias per transaction: a 5,000-row bulk import enqueues a single job, not 5,000.

Coalescing collapses repeated operations on the same object, and a deletion supersedes a pending create/update (the object ends up removed from the index).

Rollback safety

Django clears a connection's run_on_commit list on both commit and rollback. Because the batch only exists inside a scheduled callback's closure, a rolled-back transaction's buffered objects are discarded with the callback: there is no thread- or module-level state that can leak a rolled-back object into a later transaction's flush. A commit on the same connection after a rollback indexes normally.

In autocommit (no open transaction), on_commit runs its callback immediately at registration, so mark_dirty() indexes inline rather than scheduling a flush that would fire before the batch is populated.

Schema routing without a plugin dependency

netbox-branching routes CachedValue writes to a per-branch Postgres schema via a database router that reads a context variable at query time. Deferred work runs after that context is gone (and, on a worker, in a separate process), so the deferral would otherwise write to the main schema.

Rather than have core know about branching, the database alias is captured from the signal's using kwarg (the alias Django resolved for the write) and replayed on the deferred read and write via .using(alias). QuerySet.using() overrides the router, so the entries land in the originating schema regardless of routing context. Only primitives cross the job boundary: the alias string and {object_type_id: [pk]} maps. The worker re-resolves the model and re-fetches live instances on that alias. Cross-schema correctness is covered by netbox-branching's own routing test suite; this PR's tests assert only that the alias is captured and forwarded.

Worker-gated fallback

On flush, if a default-queue worker is registered the work is dispatched as a SearchCacheJob; otherwise it runs inline. Installs without a running worker keep today's synchronous behavior, so this is not a hard dependency on having a worker.

Why a JobRunner rather than a bare queue.enqueue

The deferred work is dispatched as a SearchCacheJob(JobRunner) rather than a raw queue.enqueue('module.func', ...). The raw-enqueue form exists in core only for webhooks (extras/events.py), and predates the JobRunner framework; the documented mechanism for decoupling work from the request/response cycle (including housekeeping and synchronization jobs) is JobRunner. Coalescing keeps the volume reasonable (one job per alias per transaction), so this doesn't create a Job row per object.

@jnovinger

Copy link
Copy Markdown
Member Author

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jnovinger's task in 2m 49s —— View job


Review

  • Read changed files
  • Analyze for issues of concern
  • Post findings

1. Redis outage turns a successful save into a 500

_flush() calls any_workers_for_queue() (which opens a Redis connection) inside the on_commit callback. The callback is registered without robust=True, so any ConnectionError propagates back to the HTTP handler — causing a 500 even though the database write already committed. Before this PR, search-cache updates had no Redis dependency in the signal path.

netbox/netbox/search/deferred.py:63

transaction.on_commit(flush, using=alias)
# Should be:
transaction.on_commit(flush, using=alias, robust=True)

robust=True (Django 4.1+) logs the exception rather than propagating it, which matches the intent of graceful degradation already expressed in the worker-fallback logic. Fix this →


2. removal_handler is missing ProgrammingError handling

caching_handler wraps ObjectType.objects.get_for_model() in a ProgrammingError guard (for incomplete schemas during migrations), but removal_handler does not. Deletes can fire during migrations too.

netbox/netbox/search/backends.py:98

object_type = ObjectType.objects.get_for_model(indexer.model)  # unguarded

Fix this →


3. connection.run_on_commit internal structure

_pending_batch() unpacks each entry as (_sids, func, _robust) — a 3-tuple that is Django's internal representation, not a documented API. This is the only viable mechanism for finding an existing callback without introducing module-level state, so there may not be a better option, but it's worth flagging as a fragile dependency that could silently break on a Django upgrade.

netbox/netbox/search/deferred.py:83

for _sids, func, _robust in connection.run_on_commit:

4. SearchCacheJob creates a Job DB row per committed transaction

Job.enqueue() calls job.save(), so every transaction that touches a searchable model (when a worker is running) creates a persistent Job row. The design notes acknowledge this is one row per transaction rather than one per object, but on a busy instance this could produce a large number of short-lived Job records. It's worth considering whether a bare queue.enqueue() (bypassing the Job ORM record) would be more appropriate here, given that these housekeeping jobs have no meaningful object association or user-visible state — unlike the rest of the Job framework users.

* Register the on_commit flush with robust=True so a failure while flushing
  (e.g. Redis being unreachable when checking for an available worker) is logged
  rather than propagated, preventing a committed save from returning a 500.
* Guard removal_handler's get_for_model() lookup with the same ProgrammingError
  handling as caching_handler, for deletes that fire during migrations.
* Add a test asserting Django's connection.run_on_commit entry shape
  (savepoint_ids, func, robust), so a future Django change fails with a clear
  pointer to _pending_batch() rather than an opaque unpack error.
@jnovinger

Copy link
Copy Markdown
Member Author

1. Redis outage turns a successful save into a 500

Fixed. The on_commit flush is now registered with robust=True, so a Redis failure while checking for an available worker is logged instead of propagating. A committed save will never 500 on a search-cache update.

2. removal_handler is missing ProgrammingError handling

Fixed. removal_handler now wraps the get_for_model() lookup in the same ProgrammingError guard as caching_handler.

3. connection.run_on_commit internal structure

Agreed it's a Django internal with no public alternative (short of module-level state, which we avoid for rollback safety). Added a test that asserts the (savepoint_ids, func, robust) shape directly, so a change in a future Django release fails with a message pointing at _pending_batch() rather than an opaque unpack error.

4. SearchCacheJob creates a Job DB row per committed transaction

Staying as-is. JobRunner is the documented mechanism for decoupling work from the request/response cycle; the bare queue.enqueue() form exists in core only for webhooks and predates the JobRunner framework. Coalescing holds this to one job per transaction (not per object), and expired Job records are already pruned by the housekeeping job. Our view is that consistency with the job framework is worth the row; if row volume proves to be a problem on large instances, narrowing it is a reasonable follow-up.

@jnovinger

Copy link
Copy Markdown
Member Author

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration. See previous comments for review response.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jnovinger's task in 3m 18s —— View job


Review

  • Read changed files
  • Analyze for remaining issues of concern
  • Post findings

Previous issues #1 and #2 are correctly fixed. Issue #3 is acknowledged with a test. Issue #4 is staying as-is (noted).

One remaining concern:


Autocommit path lacks the same Redis-error protection as the transactional path

mark_dirty() has two branches:

  • Transactional (connection.in_atomic_block is True): registers an on_commit callback with robust=True, so a Redis ConnectionError inside _flush() (e.g. from any_workers_for_queue()) is logged and swallowed — the committed write is never turned into a 500.
  • Autocommit (connection.in_atomic_block is False): calls _flush() directly with no error handling. A Redis ConnectionError from the same any_workers_for_queue() call propagates through mark_dirty() → the signal handler → Django's signal.send() → the save/delete caller.

any_workers_for_queue opens a real Redis connection (get_connection(queue_name) + Worker.all_keys()); if Redis is unreachable it raises. This path was not present before this PR, so this introduces a new Redis dependency in the autocommit post_save/post_delete path.

netbox/netbox/search/deferred.py:46–48

if not connection.in_atomic_block:
    _flush({(object_type_id, pk): op}, alias)
    return

Suggested fix — wrap the autocommit flush to match the transactional path's intent:

if not connection.in_atomic_block:
    try:
        _flush({(object_type_id, pk): op}, alias)
    except Exception:
        logger.exception("Error flushing search cache (autocommit)")
    return

Fix this →

Make _flush the single guarded dispatcher so a search cache update can never
turn an already-committed save into an error, on either the transactional or the
autocommit path:

* Wrap the worker-availability probe and the job enqueue (both of which touch
  Redis, and a worker can die between them) in one try/except RedisError, falling
  back to inline indexing. The search index lives in PostgreSQL, so the inline
  fallback has no Redis dependency.
* Guard the autocommit branch of mark_dirty() so a flush failure there is logged
  rather than propagated to the post_save/post_delete caller. The transactional
  path keeps transaction.on_commit(..., robust=True) as defense in depth.
* Document that a broker failure mid-enqueue can leave a stranded PENDING Job
  row (the index is still written inline; the row ages out via housekeeping).
* Note why the removal loop is not wrapped in a transaction (single DELETE per
  content type) and that a dropped update is recovered by reindex, not retry.

Tests: add coverage for the inline fallback when the broker is unreachable and
when enqueue itself fails after the probe succeeds (pinning the decision to
guard the whole dispatch, not just the probe).
@jnovinger

Copy link
Copy Markdown
Member Author

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

Since the last review, the autocommit-path Redis gap you flagged is fixed: _flush is now the single guarded dispatcher, wrapping the worker probe and the enqueue together in a try/except RedisError with an inline fallback, and the autocommit branch of mark_dirty() is guarded as well. The transactional path keeps robust=True as defense in depth. See the latest commit message ("Harden deferred search cache error handling") and the collapsed design-details comment above for the full rationale, including why the removal loop isn't wrapped in a transaction and the recover-via-reindex (no retry) failure model.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jnovinger's task in 4m 5s —— View job


Review

  • Read changed files
  • Analyze latest changes for remaining issues
  • Post findings

All previous issues are resolved. No remaining concerns.

The three structural fixes hold up:

  • Single guarded dispatcher: _flush() wraps the worker probe and enqueue together in try/except RedisError, and falls through to inline indexing on any Redis failure. Both callers add a second layer (robust=True on the transactional path, except Exception on the autocommit path), so a non-Redis exception from enqueue() — e.g. a DatabaseError saving the Job row — is logged and swallowed rather than propagated. The object won't be indexed in that case, but that's consistent with the stated recover-via-reindex failure model, and such a failure would indicate broader DB trouble.
  • removal_handler ProgrammingError guard: present and symmetric with caching_handler.
  • run_on_commit tuple shape test: correctly asserts the (savepoint_ids, func, robust) structure with a pointer back to _pending_batch().

The psycopg sqlstate attribute in _is_missing_schema is correct for psycopg3, which is the only supported driver (psycopg[c,pool]==3.3.4).

Move the search deferral imports to module top level wherever possible, leaving
a single function-local import for the one genuinely cycle-bound case:

* backends.py imports OP_CACHE/OP_REMOVE/mark_dirty from .deferred at module
  level (the handlers no longer each import them locally).
* deferred.py imports RQ_QUEUE_DEFAULT and any_workers_for_queue at module level.
* The only remaining function-local import is _flush's
  `from netbox.search.tasks import SearchCacheJob, update_search_cache`, which is
  required because tasks.py imports the search_backend singleton defined at the
  bottom of backends.py (which itself imports this module). A cleaner fix
  (extracting the singleton) is tracked as a follow-up.

Because any_workers_for_queue is now bound in deferred's namespace, the tests
patch netbox.search.deferred.any_workers_for_queue (where it is looked up)
rather than utilities.rqworker.any_workers_for_queue.
* Rename mark_dirty() to mark_for_deferred_indexing(), which says what it does
  (schedule deferred search indexing) rather than borrowing vague cache-
  invalidation vocabulary. Update the docstring's "dirty" wording to match.
* Expand the comment on the on_commit robust=True flag to record why it is
  required rather than optional: Django runs on_commit callbacks synchronously
  as the atomic block exits (after COMMIT), so an exception escaping the flush
  would surface as a 500 on an already-committed write. _flush handles the
  recoverable Redis fault itself; robust=True keeps any other failure from
  becoming that post-commit 500.
@jnovinger

Copy link
Copy Markdown
Member Author

@claude /review Only report on issues of potential concern; we don't need commentary on anything that doesn't require further consideration.

Since the last review, a few follow-up items identified during local review have been addressed:

  • Reduced the deferral's nested imports to a single, genuinely cycle-bound one (_flush's import of netbox.search.tasks); everything else is now top-level. The remaining nested import is commented with the cycle and a pointer to a tracked follow-up.
  • Renamed mark_dirty() to mark_for_deferred_indexing() for clarity about what it does.
  • Expanded the robust=True comment on the on_commit flush to record why it's required (an exception escaping the post-COMMIT callback would otherwise become a 500 on an already-committed write).

Full detail is in the commit messages and the collapsed design-details comment above; please consult them as needed.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @jnovinger's task in 2m 18s —— View job


Review

  • Read changed files
  • Analyze latest changes for remaining issues
  • Post findings

All previous issues have been resolved. The three changes described in the trigger comment are all correct:

  • Import reduction: _flush now has exactly one nested import — the genuine cycle (netbox.search.tasksnetbox.search.backendsnetbox.search.deferrednetbox.search.tasks) — and everything else is top-level. The remaining nested import is commented with the cycle explanation and a pointer to a follow-up.
  • Rename: mark_dirtymark_for_deferred_indexing is complete. No stale references remain anywhere in the codebase.
  • robust=True comment: The expanded comment correctly records both that robust=True is required (not just defensive) and why — an exception escaping a post-COMMIT callback propagates out of the atomic block and becomes a 500 on an already-committed write.

No remaining concerns.

@pheus pheus left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The direction looks good to me.

I do think this needs a bit more adjustment before merge, mainly around transaction correctness and code organization.

Nested atomic() rollback behavior

I think the current batching logic can leak rolled-back work from a nested transaction into an outer batch.

_pending_batch() appears to find an existing pending callback by database alias only. That means an operation performed inside an inner atomic() block can be added to a batch that belongs to an outer on_commit() callback.

For example:

with transaction.atomic():
    site_a.save()  # schedules an outer search-cache batch

    try:
        with transaction.atomic():
            site_b.delete()  # added to the existing outer batch
            raise RuntimeError()
    except RuntimeError:
        pass

# outer transaction commits

At the database level, site_b still exists because the inner savepoint rolled back. But because the delete was appended to the outer batch, the search cache update can still remove site_b after the outer transaction commits.

Could we make the pending batch lookup savepoint-aware, or otherwise avoid reusing a batch registered outside the current savepoint scope?

I think this should have a regression test covering:

  1. schedule a search-cache update in an outer transaction;
  2. delete or update another searchable object inside an inner atomic();
  3. roll back the inner savepoint;
  4. commit the outer transaction;
  5. confirm the rolled-back inner operation did not affect the search cache.

Code organization / ownership

I also think the new deferred search-cache path would be easier to maintain with clearer ownership boundaries.

Right now the import/dependency shape feels a bit circular:

search.backends -> search.deferred -> search.tasks -> search.backends

The local import avoids the runtime circular import, but I think it is a sign that the responsibilities are not quite separated yet. deferred.py is handling transaction batching, callback coalescing, enqueue-vs-inline behavior, and dispatching. Then the task/job code reaches back into backend-specific cache internals.

I would prefer something closer to:

netbox/search/
├── backends.py   # search backend implementations
├── deferred.py   # transaction/on_commit batching and coalescing
└── jobs.py       # SearchCacheJob(JobRunner)

In particular, I think SearchCacheJob(JobRunner) should live in netbox/search/jobs.py, matching the usual NetBox job organization. The job itself can stay very thin and delegate the actual cache mutation to the backend.

For example:

# netbox/search/jobs.py

class SearchCacheJob(JobRunner):
    class Meta:
        name = "Search cache update"

    def run(self, using=None, cache_groups=None, remove_groups=None, **kwargs):
        from netbox.search.backends import search_backend

        search_backend.apply_deferred_updates(
            using=using,
            cache_groups=cache_groups,
            remove_groups=remove_groups,
            log=self.logger,
        )

Then the CachedValue-specific behavior can remain owned by CachedValueSearchBackend:

class CachedValueSearchBackend(SearchBackend):
    def apply_deferred_updates(self, using=None, cache_groups=None, remove_groups=None, log=None):
        # owns CachedValue lookups, removals, cache(..., using=...), etc.
        ...

That would also make the custom backend contract clearer. Before this change, custom SEARCH_BACKEND implementations mostly needed to provide the public cache() / remove() behavior used by the signal handlers. With the deferred path, the job now appears to assume CachedValueSearchBackend internals such as _remove_by_id() and a cache(..., using=...) signature. Keeping the deferred implementation on CachedValueSearchBackend would avoid making those assumptions globally.

Comments

Small style note: I think some of the comments in this path could be trimmed once the responsibilities are split up.

The deferred search-cache buffer used a single on_commit callback per
(database alias, transaction), found by scanning run_on_commit for the
alias tag and mutating its batch. An op buffered inside a nested
atomic() that later rolled back was appended to the outer callback's
batch, which Django's savepoint_rollback never inspects (it prunes
callbacks by the savepoint-id set captured at registration, not by their
closure). The rolled-back op survived and flushed on the outer commit,
for example removing from the cache an object whose delete was rolled
back.

Scope each flush callback to the savepoint stack active when it is
registered (tuple(connection.savepoint_ids)); _pending_batch now matches
on alias and scope. Each savepoint scope gets its own callback and batch,
so a rolled-back savepoint's callback is pruned by Django automatically.
Coalescing still holds within a scope; cross-scope ops produce one job
per scope, and correctness is preserved because the worker re-fetches
live rows at flush time.

Add regression coverage for the nested-rollback leak plus committed
nested scopes, cross-scope save-then-delete, sibling savepoints, and
deep nesting with a middle rollback.
Relocate SearchCacheJob into netbox/search/jobs.py, matching the per-app
job convention (core/jobs.py, extras/jobs.py); tasks.py was the lone
violation. The job is now thin: it delegates to the backend.

Move the deferred-update logic (the remove loop, the re-fetch +
atomic remove-then-cache loop, and the missing-schema skip) from the
free function update_search_cache onto CachedValueSearchBackend as
apply_deferred_updates(), with _is_missing_schema and the Postgres
SQLSTATE set as private members of that class. The CachedValue specifics
(_remove_by_id, cache(..., using=...)) are now reached only from within
the backend that owns them, rather than from the job and the inline
fallback reaching into backend internals. SearchBackend gains an abstract
apply_deferred_updates() (raising NotImplementedError like its siblings)
so the deferred contract is explicit for custom SEARCH_BACKEND classes.

deferred._flush now dispatches via SearchCacheJob (worker path) or
search_backend.apply_deferred_updates() (inline fallback). This renames
and relocates the backends/deferred/jobs import cycle but does not break
it; the single function-local import in _flush remains, and the clean fix
(extracting the search_backend singleton) is still tracked in #22485.
@jnovinger

Copy link
Copy Markdown
Member Author

Both addressed, in two commits now pushed.

Nested atomic() rollback leak (19ba073c9). Confirmed with a failing regression test first, and the root cause is what you described. The mechanism: Django scopes on_commit callbacks to savepoints by the savepoint-id set captured when the callback is registered, and savepoint_rollback prunes callbacks by that set. Our single callback was registered at the outer level and only mutated (not re-registered) from the inner savepoint, so the inner rollback never pruned it and the op flushed on the outer commit.

Fix: scope each flush callback to the savepoint stack active when it is registered (tuple(connection.savepoint_ids)). _pending_batch now matches on alias and scope, so each savepoint scope gets its own callback and batch, and Django's existing savepoint_rollback pruning discards a rolled-back scope's callback for free, no manual bookkeeping. Coalescing still holds within a scope; cross-scope operations produce one job per scope, which is correct because the worker re-fetches live rows at flush time. Added your requested regression test plus coverage for committed nested scopes, cross-scope save-then-delete, sibling savepoints, and deep nesting with a middle rollback.

Code organization (4b87d7383). Done as you sketched. SearchCacheJob moved to netbox/search/jobs.py and is thin; it delegates to CachedValueSearchBackend.apply_deferred_updates(), which now owns the deferred-update logic and the missing-schema handling (_is_missing_schema and the Postgres SQLSTATE set are private to that class). SearchBackend gains an abstract apply_deferred_updates() raising NotImplementedError like its siblings, so the contract is explicit for custom SEARCH_BACKEND classes rather than leaking _remove_by_id / cache(..., using=...) assumptions. Old tasks.py is gone, and I trimmed the comments while splitting things up.

One thing this does not fully resolve: it renames and relocates the backends -> deferred -> jobs -> backends cycle but does not break it. deferred._flush still carries a single function-local import, because deferred.py can't import the search_backend singleton at module level (it's bound at the bottom of backends.py, after backends.py imports deferred). Breaking the cycle properly means extracting the singleton and moving signal wiring into an AppConfig.ready(), which is broad enough that I've kept it separate as #22485.

@jnovinger jnovinger requested a review from pheus June 22, 2026 10:10

@pheus pheus left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! This looks very close.

I have two small follow-ups.

For the delete/removal tests, could we make sure the object PK is captured before calling delete() wherever we assert against CachedValue.object_id? Assertions using site.pk after site.delete() can accidentally query for object_id=None instead of the deleted object’s original ID.

It would also be useful for at least one delete/removal test to seed existing CachedValue rows before the delete and then assert that rows for the captured PK are removed.

One other API question: do we want deferred indexing to become part of the base SearchBackend contract? apply_deferred_updates() makes sense for CachedValueSearchBackend, but it is a new expectation for custom SEARCH_BACKEND implementations that may previously have only implemented cache(), remove(), search(), and clear().

Other than that, this looks good to me. Thanks again for iterating on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants