feat: [DSM-142] Use CanisterStates in ReplicatedState#10287
Open
alin-at-dfinity wants to merge 22 commits into
Lays the foundation for splitting `ReplicatedState::canister_states` into
"hot" (potentially active) and "cold" (definitely idle) pools, so that
per-round operations can skip the long tail of idle canisters.
This PR is intentionally a no-op for the running replica: it only adds
the new types and predicates. The integration into `ReplicatedState` and
the migration of all consumers follow in subsequent PRs.
Specifically:
* `CanisterState::is_cold()` — pure predicate that classifies a canister
as "definitely idle": no input/output, no task queue entries, no
heartbeat method, inactive global timer, not `Stopping`, no
unexpired best-effort callbacks, and no scheduler debits.
* `CallContextManager::has_unexpired_callbacks()` and the matching
`SystemState::has_unexpired_callbacks()` accessor, used by `is_cold`.
* `CanisterStates`, a hot/cold-partitioned container with eager
promotion (mutations land in `hot`) and lazy demotion (via
`try_cool`/`try_cool_all`), plus the common map operations
(`get`/`get_mut`/`insert`/`remove`/`contains_key`/`len`/`is_empty`/
`retain`), per-pool iterators (`hot_iter`/`hot_values`/
`hot_values_mut`), merged iterators in `CanisterId` order
(`all_iter`/`all_keys`/`all_values`), and bulk mutation
(`for_each_mut`/`try_for_each_mut`).
* `CanisterStates::validate_strict_split()` for the canonical-partition
invariant used in checkpoint validation.
* `debug_assert_invariants()` runs on every mutating operation in
debug builds.
`ColdStats` and the aggregate accessors (`total_compute_allocation`,
`total_canister_memory_usage`, `memory_taken`, `callback_count`, ...)
are intentionally **not** part of this PR — they will be added once the
struct is in place.
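The eager-promotion / lazy-demotion discipline described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `Canister`, `Partitioned`, the `u64` key standing in for `CanisterId`, and the two-field `is_cold()` predicate are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `CanisterState`; tracks only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Canister {
    pub pending_messages: usize,
    pub has_heartbeat: bool,
}

impl Canister {
    /// Heavily simplified "definitely idle" predicate (the real one checks more).
    pub fn is_cold(&self) -> bool {
        self.pending_messages == 0 && !self.has_heartbeat
    }
}

/// Hot/cold-partitioned map keyed by a `u64` stand-in for `CanisterId`.
#[derive(Debug, Default)]
pub struct Partitioned {
    hot: BTreeMap<u64, Canister>,
    cold: BTreeMap<u64, Canister>,
}

impl Partitioned {
    /// Eager promotion: every inserted canister lands in `hot`.
    pub fn insert(&mut self, id: u64, canister: Canister) {
        self.cold.remove(&id);
        self.hot.insert(id, canister);
    }

    /// Eager promotion: mutable access first moves a cold entry into `hot`,
    /// because the caller may make the canister active.
    pub fn get_mut(&mut self, id: u64) -> Option<&mut Canister> {
        if let Some(canister) = self.cold.remove(&id) {
            self.hot.insert(id, canister);
        }
        self.hot.get_mut(&id)
    }

    /// Lazy demotion: move every hot entry that satisfies `is_cold()` into `cold`.
    pub fn try_cool_all(&mut self) {
        let idle: Vec<u64> = self
            .hot
            .iter()
            .filter(|(_, canister)| canister.is_cold())
            .map(|(id, _)| *id)
            .collect();
        for id in idle {
            if let Some(canister) = self.hot.remove(&id) {
                self.cold.insert(id, canister);
            }
        }
    }

    pub fn hot_len(&self) -> usize {
        self.hot.len()
    }

    pub fn cold_len(&self) -> usize {
        self.cold.len()
    }

    pub fn len(&self) -> usize {
        self.hot.len() + self.cold.len()
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }
}
```

In this sketch an idle canister stays in `hot` until something calls `try_cool_all`, so `hot` may over-approximate the active set but `cold` never contains an active canister.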
Co-authored-by: Cursor <cursoragent@cursor.com>
Maintains a small `ColdStats` aggregate over the canisters in the `cold` pool, updated incrementally on every transition into / out of `cold`. This lets the "touch every canister" aggregate queries (`total_compute_allocation`, `total_canister_memory_usage`, `memory_taken`, `callback_count`, `guaranteed_response_message_memory_taken`, `best_effort_message_memory_taken`) run in `O(|hot|)` instead of `O(|all canisters|)`, which is the primary motivation for the hot/cold split on subnets with a long tail of idle canisters.
The aggregates are derived (not persisted) and are reconstructed by `CanisterStates::new` on checkpoint load. `debug_assert_invariants`, which now also runs an `O(|cold|)` recompute and compares the result against the live aggregate, ensures every mutating method keeps them in sync. The `ColdStats` struct stays module-private: callers always reach the totals through the public aggregator methods on `CanisterStates`.
`MemoryTaken`'s fields are bumped from private to `pub(crate)` so that `CanisterStates::memory_taken` can construct the struct directly, keeping `MemoryTaken` in its current home in `replicated_state.rs`. `CanisterStates::memory_taken` itself is `pub(crate)` and will be wired up to `ReplicatedState::memory_taken` in the next PR; an `#[allow(dead_code)]` keeps the build warning-free until then.
Aggregator behaviour is exercised by two new tests (`memory_aggregators_combine_hot_and_cold`, `callback_count_combines_hot_and_cold`) and the bookkeeping discipline is exercised by an extended set of `*_updates_cold_stats*` tests covering `insert`, `remove`, `try_cool*`, `for_each_mut`, `try_for_each_mut`, and `retain`.
Co-authored-by: Cursor <cursoragent@cursor.com>
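The incremental bookkeeping described in this commit can be illustrated with a small sketch. The names `Usage`, `ColdTotals`, `Pools`, `promote`, and `demote` are hypothetical and exist only for this example; the real `ColdStats` tracks more fields and is maintained inside `CanisterStates`.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-canister figures; the real `CanisterState` tracks far more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub compute_allocation: u64,
    pub memory_bytes: u64,
}

/// Running totals over the cold pool only (a simplified take on `ColdStats`).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ColdTotals {
    pub compute_allocation: u64,
    pub memory_bytes: u64,
}

impl ColdTotals {
    fn add(&mut self, u: &Usage) {
        self.compute_allocation += u.compute_allocation;
        self.memory_bytes += u.memory_bytes;
    }

    fn sub(&mut self, u: &Usage) {
        self.compute_allocation -= u.compute_allocation;
        self.memory_bytes -= u.memory_bytes;
    }
}

#[derive(Debug, Default)]
pub struct Pools {
    hot: BTreeMap<u64, Usage>,
    cold: BTreeMap<u64, Usage>,
    cold_totals: ColdTotals,
}

impl Pools {
    /// New or replaced entries always land in `hot`.
    pub fn insert_hot(&mut self, id: u64, usage: Usage) {
        if let Some(old) = self.cold.remove(&id) {
            self.cold_totals.sub(&old);
        }
        self.hot.insert(id, usage);
    }

    /// hot -> cold transition: add the canister to the running totals.
    pub fn demote(&mut self, id: u64) {
        if let Some(usage) = self.hot.remove(&id) {
            self.cold_totals.add(&usage);
            self.cold.insert(id, usage);
        }
    }

    /// cold -> hot transition: subtract the canister from the running totals.
    pub fn promote(&mut self, id: u64) {
        if let Some(usage) = self.cold.remove(&id) {
            self.cold_totals.sub(&usage);
            self.hot.insert(id, usage);
        }
    }

    /// Subnet-wide total in O(|hot|): cached cold part plus a walk over `hot`.
    pub fn total_memory_bytes(&self) -> u64 {
        self.cold_totals.memory_bytes
            + self.hot.values().map(|u| u.memory_bytes).sum::<u64>()
    }

    pub fn cold_totals(&self) -> ColdTotals {
        self.cold_totals
    }

    /// O(|cold|) recompute, the kind of check a debug-only invariant could run.
    pub fn recomputed_cold_totals(&self) -> ColdTotals {
        let mut totals = ColdTotals::default();
        for usage in self.cold.values() {
            totals.add(usage);
        }
        totals
    }
}
```

The property worth noting is conservation: moving a canister between pools changes where its contribution is counted, but `total_memory_bytes()` stays the same as long as every transition goes through `promote` or `demote`.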
…ry. Rename raw_memory to execution_memory, so it better matches the equivalent MemoryTaken field. Update documentation and tests.
A canister can satisfy `CanisterState::is_cold()` while still holding a guaranteed-response slot reservation: `is_cold()` only requires empty input/output *messages* (the pool count) and no unexpired best-effort callback, both of which are independent of whether the canister has in-flight guaranteed-response requests. A canister that has pushed a guaranteed-response request that's already been moved to an outgoing stream still keeps the input-slot reservation for the eventual response, which contributes `MAX_RESPONSE_COUNT_BYTES` to its `guaranteed_response_message_memory_usage()`.
The previous commit dropped this field from `ColdStats` on the assumption it was always zero. It isn't, and the consequence is that `guaranteed_response_message_memory_taken()` quietly under-reports subnet-wide memory: promoting a cold canister with a reservation to `hot` (e.g. on the next `get_mut`) makes the subnet total jump up out of nowhere, breaking conservation invariants in downstream code (stream handler `debug_assert!`s, in particular).
Restore the field and the corresponding `add`/`sub` bookkeeping, fold it into `guaranteed_response_message_memory_taken`, `total_canister_memory_usage`, and `memory_taken`, and add a focused test (`cold_canister_with_guaranteed_response_reservation_is_aggregated`) exercising the case via `push_output_request` followed by draining the output queue.
Best-effort message memory remains hot-only: an unexpired best-effort callback forces the canister into `hot`, and any expired best-effort callback shows up as a pending input which also forces `hot`.
Co-authored-by: Cursor <cursoragent@cursor.com>
Switches `ReplicatedState::canister_states` from a flat
`BTreeMap<CanisterId, Arc<CanisterState>>` to `CanisterStates`,
exposing the hot/cold partition to the rest of the system and
migrating every caller.
`ReplicatedState` changes:
* `canister_states` field is now `CanisterStates`.
* Drop `canisters_iter_mut()`. Round-level callers move to
`hot_canisters_iter_mut()` (skips the long tail of cold
canisters); bulk callers move to `canisters_for_each_mut` /
`canisters_try_for_each_mut`, which iterate every canister and
re-establish the partition afterwards.
* Add `hot_canisters_iter()` for read-only hot-only iteration.
* Add `repartition_canister_states()`, called from
`StateManager::commit_and_certify` after
`flush_checkpoint_ops_and_page_maps` to drive canisters that went
quiet during the round back into `cold` before checkpointing, so
that replicas continuing through a checkpoint and replicas
(re)starting from it agree on the partition.
* `take_canister_states` / `put_canister_states` now exchange the
`CanisterStates` directly instead of going through a flat
`BTreeMap` round-trip.
* Aggregator delegations: `total_compute_allocation`, `memory_taken`,
`total_canister_memory_usage`,
`guaranteed_response_message_memory_taken`,
`best_effort_message_memory_taken`, `callback_count` now delegate
to `CanisterStates` and run in `O(|hot|)`.
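One way to picture a bulk mutation that "iterates every canister and re-establishes the partition afterwards" is the sketch below. It is an assumption about the general shape, not the code in this PR: `Load`, `is_idle`, and `for_each_mut_sketch` are hypothetical names, and the real methods may avoid moving every cold entry up front.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `CanisterState`.
#[derive(Debug, Clone, PartialEq)]
pub struct Load {
    pub queued: usize,
}

impl Load {
    /// Simplified idleness check standing in for `is_cold()`.
    pub fn is_idle(&self) -> bool {
        self.queued == 0
    }
}

/// Applies `f` to every canister in key order, then splits the pools again.
pub fn for_each_mut_sketch<F>(
    hot: &mut BTreeMap<u64, Load>,
    cold: &mut BTreeMap<u64, Load>,
    mut f: F,
) where
    F: FnMut(u64, &mut Load),
{
    // Promote everything: the closure may turn any idle canister active.
    hot.append(cold);

    for (id, canister) in hot.iter_mut() {
        f(*id, canister);
    }

    // Re-establish the partition: demote whatever is idle after the mutation.
    let idle: Vec<u64> = hot
        .iter()
        .filter(|(_, canister)| canister.is_idle())
        .map(|(id, _)| *id)
        .collect();
    for id in idle {
        if let Some(canister) = hot.remove(&id) {
            cold.insert(id, canister);
        }
    }
}
```

Unlike a hot-only iterator, this visits the long tail too, which is why the description reserves it for bulk callers rather than per-round work.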
`state_manager`:
* `commit_and_certify` calls `state.repartition_canister_states()`
after `flush_checkpoint_ops_and_page_maps` and before tip
handover.
* `validate_eq_canister_states` calls
`CanisterStates::validate_strict_split` on the reference state to
verify that the persisted partition matches what
`CanisterStates::new` would produce on a fresh load.
* `flush_checkpoint_ops_and_page_maps` and
`switch_to_checkpoint` switch from `canisters_iter_mut` to
`canisters_for_each_mut` / `canisters_try_for_each_mut`.
* Bench: `bench_traversal` likewise.
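The strict-split check mentioned above can be pictured as follows. This sketch assumes that "strict" means no idle canister is left in `hot` and no active canister sits in `cold`, which is what a fresh load would produce if it classified each canister by `is_cold()`; the PR text does not spell out the exact rule. `Entry` and `validate_strict_split_sketch` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `CanisterState`.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub queued: usize,
}

impl Entry {
    /// Simplified idleness check standing in for `is_cold()`.
    pub fn is_cold(&self) -> bool {
        self.queued == 0
    }
}

/// Returns `Ok(())` only if the two pools form the canonical partition:
/// disjoint keys, nothing idle in `hot`, nothing active in `cold`.
pub fn validate_strict_split_sketch(
    hot: &BTreeMap<u64, Entry>,
    cold: &BTreeMap<u64, Entry>,
) -> Result<(), String> {
    for (id, entry) in hot {
        if cold.contains_key(id) {
            return Err(format!("canister {id} is in both pools"));
        }
        if entry.is_cold() {
            return Err(format!("canister {id} is idle but sits in hot"));
        }
    }
    for (id, entry) in cold {
        if !entry.is_cold() {
            return Err(format!("canister {id} is active but sits in cold"));
        }
    }
    Ok(())
}
```

Under this reading, lazy demotion alone would fail the check (idle canisters may linger in `hot`), which is consistent with the description running `repartition_canister_states()` before checkpointing.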
`execution_environment`:
* `scheduler.rs`: switch to hot-only iteration where appropriate
(`add_heartbeat_and_global_timer_tasks`,
`purge_expired_ingress_messages`, the
`ongoing_long_install_code` check); migrate
`charge_canisters_for_resource_allocation_and_usage` and the
log-memory-store migration loop to `canisters_for_each_mut`.
* `round_schedule.rs`: `partition_canisters_to_cores` now takes /
returns a `CanisterStates`; idle canisters are dropped before the
main hot-canister iteration.
* `query_handler.rs`, `execution_environment.rs`: callers updated.
* `canister_manager/tests.rs`, scheduler tests
(`scheduling.rs`, `metrics.rs`, `dts.rs`, `ecdsa.rs`,
`round_schedule/tests.rs`, `test_utilities.rs`, `tests.rs`)
updated.
* `benches/scheduler.rs`: updated.
`canonical_state`:
* `lazy_tree_conversion.rs`: new `CanisterStatesFork<'_>` that
presents a `CanisterStates` as a `LazyFork` over the merged
hot+cold pools in `CanisterId` order.
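Presenting two disjoint, individually sorted pools as one sequence in `CanisterId` order amounts to a two-way merge. The sketch below shows that idea with plain `BTreeMap`s; `MergedIter` and `all_iter_sketch` are hypothetical names, `u64` stands in for `CanisterId`, and this is not the `LazyFork` implementation from the PR.

```rust
use std::collections::btree_map::Iter;
use std::collections::BTreeMap;
use std::iter::Peekable;

/// Yields the entries of two key-disjoint maps in ascending key order.
pub struct MergedIter<'a, V> {
    hot: Peekable<Iter<'a, u64, V>>,
    cold: Peekable<Iter<'a, u64, V>>,
}

impl<'a, V> Iterator for MergedIter<'a, V> {
    type Item = (&'a u64, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        // Peek at both heads and decide which pool holds the smaller key.
        let take_hot = match (self.hot.peek(), self.cold.peek()) {
            (Some((h, _)), Some((c, _))) => h <= c,
            (Some(_), None) => true,
            (None, _) => false,
        };
        if take_hot {
            self.hot.next()
        } else {
            self.cold.next()
        }
    }
}

/// Builds the merged view; both maps are only borrowed, never copied.
pub fn all_iter_sketch<'a, V>(
    hot: &'a BTreeMap<u64, V>,
    cold: &'a BTreeMap<u64, V>,
) -> MergedIter<'a, V> {
    MergedIter {
        hot: hot.iter().peekable(),
        cold: cold.iter().peekable(),
    }
}
```

Because each `BTreeMap` already iterates in key order, the merge needs no sorting and no intermediate allocation, so a full traversal is linear in the total number of canisters.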
`canister_sandbox`:
* `sandboxed_execution_controller.rs`: switch
`evict_sandbox_processes` to per-id `state.canister_state(id)` /
`state.canister_priority(id)` lookups (also enables removing the
bulk `canister_accumulated_priorities` method). This duplicates
the standalone "perf: Look up sandbox scheduler priorities per
canister" PR; whichever lands first, the other becomes a no-op.
`messaging`:
* `stream_handler/tests.rs`: pre-heat `LOCAL_CANISTER` in the
`out_of_memory` reject-signal test so that the expected and
inducted states share the same hot/cold partition.
* `stream_builder/tests.rs`, `state_machine/tests.rs`,
`tests/common/mod.rs`: caller updates.
`replicated_state` queues and system_state:
* `CanisterQueues` / `SystemState` `local_canisters` parameter type
flips from `&BTreeMap<CanisterId, Arc<CanisterState>>` to
`&CanisterStates` (no behavioural change; queues only need
`contains_key`).
* `replicated_state.rs` deletes the now-unused
`canister_accumulated_priorities` method.
`metrics.rs`:
* `check_dts` walks `hot_canisters_iter()` (only hot canisters can
have non-empty task queues).
* `check_subnet_memory_usage` switches to
`CanisterStates::memory_taken()` for `O(|hot|)` aggregation.
`test_utilities` and `state_tool`:
* `test_utilities/execution_environment`, `test_utilities/state`,
and `state_tool/src/commands/canister_metrics.rs` updated to use
the new iteration APIs.
Co-authored-by: Cursor <cursoragent@cursor.com>
… canisters from one pool to the other; add more tests for is_cold(); misc test additions.
pull Bot pushed a commit to bit-cook/ic that referenced this pull request May 27, 2026
dfinity#10288)
Move the "drop idle canisters with 0-100 AP from the subnet schedule" logic out of the `NextExecution::None` branch of the main per-canister loop and into a dedicated pre-loop at the top of `start_iteration`. Behavior is unchanged: the same set of idle canisters with priorities in the 0-100 AP range get dropped.
Also clarify the doc comment for `IterationSchedule::partition_canisters_to_cores`.
This is a small standalone refactor extracted from dfinity#10287, where the main per-canister loop will switch from iterating all canisters to iterating only hot canisters (at which point hoisting becomes a correctness requirement: cold canisters would otherwise no longer be visited by the main loop and their idle entries would not be dropped).
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: IDX GitHub Automation <infra+github-automation@dfinity.org>
…c comments; simplify memory_taken().
…lementation; also apply pub(crate) to best_effort_message_memory_taken() and guaranteed_response_message_memory_taken(), as they are also potentially dangerous to use directly.
…d() test, so that all stats are covered; and for both hot and cold canisters.
Base automatically changed from alin/DSM-142-canister-states-cold-stats to master May 29, 2026 11:24