[fm] Add simple disk diagnoser based on zpool health#10460
    pub(super) fn analyze(
        input: &Input,
        builder: &mut SitrepBuilder<'_>,
    ) -> anyhow::Result<()> {
So, the whole point of this PR is "to be able to build something here, and re-use it", but ironically the contents of this particular DE are especially prone to change.
The "short version" of what we're doing:
- Look at inventory, DB state, old sitreps
- Make sure a case exists for each unhealthy zpool, with a corresponding "DiskFact"
- Close old cases if their zpool is now healthy (or expunged)
We're doing this with a jumble of indices, iterations, etc. I think those will change. I think this DE will grow to track other state about these disks. I think each of these cases will potentially grow to have different facts.
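The "short version" above can be sketched as a small reconciliation function. This is only an illustration under stated assumptions: `ZpoolId`, `ZpoolHealth`, `Observed`, `Action`, and `plan` are hypothetical stand-ins, not the real `Input` / `SitrepBuilder` API, and the sketch leaves facts out entirely. It also assumes that a case whose zpool is no longer observed at all should be closed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a zpool identifier (the real code uses UUIDs).
type ZpoolId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ZpoolHealth {
    Online,
    Degraded,
    Faulted,
}

/// What inventory and DB state say about one zpool (hypothetical shape).
#[derive(Clone, Copy, Debug)]
struct Observed {
    health: ZpoolHealth,
    /// False once the backing disk has been expunged.
    in_service: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    OpenCase(ZpoolId),
    CloseCase(ZpoolId),
}

/// Decide which cases to open and which to close.
/// `open_cases` holds the zpools that already have an open case.
fn plan(
    observed: &BTreeMap<ZpoolId, Observed>,
    open_cases: &BTreeSet<ZpoolId>,
) -> Vec<Action> {
    let mut actions = Vec::new();

    // Ensure a case exists for every unhealthy zpool on an in-service disk.
    for (&id, obs) in observed {
        let unhealthy = obs.in_service && obs.health != ZpoolHealth::Online;
        if unhealthy && !open_cases.contains(&id) {
            actions.push(Action::OpenCase(id));
        }
    }

    // Close cases whose zpool recovered, was expunged, or (assumption)
    // is no longer observed at all.
    for &id in open_cases {
        let still_unhealthy = observed
            .get(&id)
            .map_or(false, |o| o.in_service && o.health != ZpoolHealth::Online);
        if !still_unhealthy {
            actions.push(Action::CloseCase(id));
        }
    }

    actions
}
```

Keeping the decision as a pure function over plain maps is one way to make the logic testable while the indices and iteration order around it keep changing.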

    /// Fetch all `fm_case_fact` rows belonging to cases in the given sitrep,
    /// grouped by `case_id`.
    async fn fm_case_facts_read_on_conn(
By reading facts alongside cases, there isn't really a need to mark "DE" on the fact table, so I removed it. It's redundant data anyway.
(Figured I'd mention this because it diverges slightly from the DB structure we talked about - but still sorts facts into case-specific buckets, so we can still "parse by the case DE type").
There was a problem hiding this comment.
Hm, I still think it's probably worth including in the DB record as a structured field, even if only for debugging reasons for now.
Also, at some point, I think we will probably have to figure out a way to allow multiple DEs to add facts to a case, although we don't have to cross that bridge yet. Consider the example of an ereport.data_loss.possible ereport indicating that a service processor has restarted and will need to be health-checked, as described in RFD 589. Suppose we have a trivial DE for handling data loss reports from SPs by doing a complete health check of that SP. This might open a case, and then request additional health checking of that SP, which might record some facts. Suppose one of those facts includes data that another DE would use to diagnose a fault. We should figure out how that flow will work, although we don't have to in this PR...
There was a problem hiding this comment.
Couldn't each of those DEs just make a duplicate copy of that fact in their own cases? That seems like it helps keep the fact lifecycle scoped "per-case", which is what we want.
I really hesitate to include this data "just to have it", because then we need to handle the case where "fact.de != fact.case.de", which is an impossible data corruption case we could just avoid by omitting the column.
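To make the argument concrete, here is a minimal sketch of facts that carry only a `case_id`, with the owning DE always looked up through the case. `CaseId`, `FactRow`, `Case`, `group_by_case`, and `de_for_fact` are hypothetical names for illustration, not the types in this PR; only `DiagnosisEngineKind::Disk` comes from the PR description.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in; the real IDs are UUIDs.
type CaseId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DiagnosisEngineKind {
    Disk,
}

/// A fact row with no `de` column: only its owning case is recorded.
#[derive(Clone, Debug, PartialEq, Eq)]
struct FactRow {
    case_id: CaseId,
    payload: String,
}

struct Case {
    de: DiagnosisEngineKind,
}

/// Group fact rows into per-case buckets keyed by `case_id`.
fn group_by_case(rows: Vec<FactRow>) -> BTreeMap<CaseId, Vec<FactRow>> {
    let mut grouped: BTreeMap<CaseId, Vec<FactRow>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.case_id).or_default().push(row);
    }
    grouped
}

/// The DE that interprets a fact is derived from its case, so a fact can
/// never disagree with its case about which DE owns it.
fn de_for_fact(
    cases: &BTreeMap<CaseId, Case>,
    fact: &FactRow,
) -> Option<DiagnosisEngineKind> {
    cases.get(&fact.case_id).map(|case| case.de)
}
```

With this shape, "fact.de != fact.case.de" is unrepresentable rather than merely checked for.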
There was a problem hiding this comment.
Specifically with your data-loss case: my main argument is that "facts are associated with cases", regardless of how they're generated.
So: in the case where we have "DE 1 which does something, but wants to write down a fact for a case managed by DE 2" - I think we can make this happen in-memory during sitrep construction, but on-disk, this could look like:
- DE1 has a case C1, queries for data
- (next sitrep) DE1 sees new data for C1, decides to open a case C2 for analysis by a different DE (DE2). It can also pass along a fact for C2
- On-disk: That fact is associated with C2. We could have a "comment" about how it was originally noticed by DE1/C1? But that origination doesn't really matter
a3cddcc to 26f2ade
andrewjstone left a comment
It's really exciting to see this coming together!
I think it makes sense to use JSON for payloads in the DB due to the explosion in types as discussed in chat. I wonder about the versioning strategy though. The DEs in Nexus are the only things that need to interpret payloads, but they are essentially client-side versioned. During an update, Nexus will not understand new payloads. Do we plan to use a two-phase update model where reporters can't issue newly added reports until a second update, or will DEs just ignore payloads they can't understand?
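The second option (DEs ignoring payloads they can't understand) could look roughly like the sketch below. It assumes each stored fact carries an explicit schema version, which nothing in this PR confirms; `StoredFact`, `Interpreted`, `MAX_SUPPORTED_VERSION`, and `interpret` are hypothetical names, and the payload is kept as an opaque string instead of parsed JSON.

```rust
/// Hypothetical stored fact: a schema version plus an opaque payload.
struct StoredFact {
    version: u32,
    payload: String,
}

#[derive(Debug, PartialEq, Eq)]
enum Interpreted {
    /// Payload written in a schema this build knows about.
    Known(String),
    /// Payload written by a newer schema; skipped rather than treated as an error.
    Ignored { version: u32 },
}

/// Newest payload schema this (hypothetical) build can interpret.
const MAX_SUPPORTED_VERSION: u32 = 1;

fn interpret(fact: &StoredFact) -> Interpreted {
    if fact.version <= MAX_SUPPORTED_VERSION {
        Interpreted::Known(fact.payload.clone())
    } else {
        Interpreted::Ignored { version: fact.version }
    }
}
```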
Nexus performs an atomic handoff from "old" to "new" before the database can be accessed, right? I don't think we need to worry about a mixed-version Nexus scenario. I believe we'll have "old Nexus, working with old data", then we'll perform handoff, and only worry about "new Nexus working with old + new data, which it can migrate". Regardless, there are a bunch of strategies we could use for doing "fact payload" schema migration:
26f2ade to 67b661f
The first fault management diagnosis engine: opens a case for any
non-Online zpool whose backing physical disk is currently in service
in the control plane, and closes it on recovery or expungement.
Supporting infrastructure introduced along the way:
- DiagnosisEngineKind::Disk variant (Rust + DB enum)
- fm_case_fact child table for per-engine state (one case has 0..N
  immutable facts; stable UUIDs across sitreps; participates in
  copy-forward + GC like other sitrep child tables)
- CaseBuilder::{add_fact, remove_fact, facts} API
- InServiceDisk nexus-types projection consumed by FM, populated from
  the existing zpool_list_all_external_batched datastore method with
  policy filtering done in the background task
Schema migration: add-disk-de-and-facts (version 260) adds the 'disk'
enum value and creates fm_case_fact.
67b661f to 793b1ec
Ah, I must be misunderstanding how payloads get populated. I was presuming that it's possible for the ingester of the payload to write to the database without actually knowing the format of the payload. But if we limit ingestion of new payloads until Nexus is updated, then I agree there is no problem.
hawkw left a comment
Here's an incomplete review focusing on the database models and domain types; I haven't actually gotten as far as the actual diagnosis engine yet. I figured it would be more useful to leave a smaller review sooner rather than waiting to get to the "other half" of this PR.
    writeln!(f, "{:>indent$}{PAYLOAD:<WIDTH$} {payload}", "")?;
    writeln!(f, "{:>indent$}{COMMENT:<WIDTH$} {comment}\n", "")?;
nit: i might put the comment before the payload, and also consider making the JSON multiline...though you might have to indent it nicely to make it not look bad.
    for CaseFact { id, payload, comment } in facts.iter() {
        const PAYLOAD: &str = "payload:";
        const COMMENT: &str = "comment:";
        const WIDTH: usize = const_max_len(&[PAYLOAD, COMMENT]);

        writeln!(f, "{BULLET:>indent$}fact {id}")?;
        writeln!(f, "{:>indent$}{PAYLOAD:<WIDTH$} {payload}", "")?;
        writeln!(f, "{:>indent$}{COMMENT:<WIDTH$} {comment}\n", "")?;
nit: i would love to have an indented displayer for facts and make this just call that for each fact, since we might want to use that elsewhere. not a huge deal though.
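One possible shape for such an indented displayer is a small `Display` adapter, sketched below. The `CaseFact` here is a simplified, hypothetical stand-in (numeric `id`, string payload), and `IndentedFact`, `display_indented`, the hard-coded `WIDTH`, and the extra two-space nesting of the fields are all assumptions made for illustration. It also puts the comment before the payload, as suggested in the earlier nit.

```rust
use std::fmt;

/// Hypothetical simplified fact; the real `CaseFact` has a UUID `id`
/// and a JSON payload.
struct CaseFact {
    id: u32,
    payload: String,
    comment: String,
}

/// Display adapter that renders one fact at a given indentation.
struct IndentedFact<'a> {
    fact: &'a CaseFact,
    indent: usize,
}

impl CaseFact {
    fn display_indented(&self, indent: usize) -> IndentedFact<'_> {
        IndentedFact { fact: self, indent }
    }
}

impl fmt::Display for IndentedFact<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        const COMMENT: &str = "comment:";
        const PAYLOAD: &str = "payload:";
        // Both labels are 8 bytes long; the PR computes this with
        // `const_max_len` instead of hard-coding it.
        const WIDTH: usize = 8;
        let indent = self.indent;
        let CaseFact { id, payload, comment } = self.fact;
        // `{:>indent$}` applied to "" emits `indent` spaces.
        writeln!(f, "{:>indent$}fact {id}", "")?;
        writeln!(f, "{:>indent$}  {COMMENT:<WIDTH$} {comment}", "")?;
        writeln!(f, "{:>indent$}  {PAYLOAD:<WIDTH$} {payload}", "")
    }
}
```

The case formatter could then reduce to something like `write!(f, "{}", fact.display_indented(indent))?` per fact, and other callers could reuse the same adapter.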
    CREATE TABLE IF NOT EXISTS omicron.public.fm_case_fact (
        id UUID NOT NULL,
        sitrep_id UUID NOT NULL,
        case_id UUID NOT NULL,
would like a `created_sitrep_id UUID NOT NULL,` column here, right after `case_id UUID NOT NULL,`
    let mut support_bundles_requested = Vec::new();
    let mut bundle_data_selections_requested = Vec::new();
    let mut case_ereports = Vec::new();
    let mut case_facts = Vec::new();
would be nice to be able to with_capacity this to be as long as the case's facts map...but i also notice we are not doing this for any of the other ones so it's kinda fine i guess...
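For reference, the pre-sizing being suggested would look something like the sketch below. `Case` and `collect_facts` are hypothetical stand-ins; the real case type and the shape of its facts map may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a case that keeps its facts in a map.
struct Case {
    facts: BTreeMap<u32, String>,
}

/// Collect fact payloads into a Vec pre-sized to the number of facts,
/// so filling it does not need to reallocate.
fn collect_facts(case: &Case) -> Vec<String> {
    let mut case_facts = Vec::with_capacity(case.facts.len());
    case_facts.extend(case.facts.values().cloned());
    case_facts
}
```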
    /// Open cases from the parent sitrep, copied forward into this analysis
    /// input. Closed cases live separately on the (crate-private)
    /// `closed_cases_copied_forward` accessor.
    pub fn open_cases(&self) -> &IdOrdMap<fm::Case> {
probably just me being extremely persnickety but i would have kind of rather we refactor this in a separate smaller PR. not a big deal though.