feat(tesseract): native domain model representation behind env flag #10986

Open: waralexrom wants to merge 2 commits into master from tesseract-native-model-data

@waralexrom
Member

Summary

Introduces the Tesseract native domain model — a Rust-side representation of the schema and the bridge to populate it from JS — staged behind the off-by-default CUBEJS_TESSERACT_NATIVE_MODEL flag. The model is built and held but not yet consumed for SQL: the planner stays on the existing per-request path, so this PR is a no-op in production and a foundation for the follow-up that routes planning through the model.

Changes

  • Domain model (cubesqlplanner/src/model/*): cubes, measures, dimensions, segments, joins, pre-aggregations, access policies, and view resolution, plus the cube_bridge traits and SchemaModelBuilder that populate it from the JS schema.
  • Native endpoints + JS wrapper: prepareModel builds the model and hands JS a JsBox handle wrapped in a TesseractModel; CubeEvaluator builds it at the end of compile() only when CUBEJS_TESSERACT_NATIVE_MODEL is enabled. BaseQuery is unchanged (still nativeBuildSqlAndParams).
  • Measure types: MeasureType now supports the multi-stage-only rank / numberAgg types so cubes with rank measures build; build_multi_stage_spec maps rank to a filtering stage.
  • Scope trim: hierarchies are deliberately excluded — they're presentation-only metadata (BI drill-down, /meta) and never participate in SQL generation. Pre-aggregation build/refresh metadata (refresh_key, indexes, build_range, …) is collected but not yet read, kept on purpose for the upcoming index/refresh-key SQL work.

Testing

  • cargo check + cargo clippy --tests clean; 975 cubesqlplanner lib tests pass (incl. YAML-fixture model build with rank/numberAgg).
  • yarn test:bridge → 205/205 native bridge tests, including the new model-roundtrip suite and the extended object-bridges-coverage for the new cube/measure/dimension/segment/granularity/pre-agg getters.
  • tsc passes on @cubejs-backend/shared, @cubejs-backend/native, @cubejs-backend/schema-compiler.
  • Flag off by default → no production behavior change.

…ehind CUBEJS_TESSERACT_NATIVE_MODEL

Introduces the Tesseract domain Model (cubesqlplanner/src/model/*) and the
cube_bridge traits to populate it from the JS schema, plus the native
endpoints (prepareModel / modelBuildSqlAndParams) and the TesseractModel
JS wrapper.

Population is gated behind the new CUBEJS_TESSERACT_NATIVE_MODEL flag
(off by default): CubeEvaluator builds the model at the end of compile()
only when the flag is on. The planner is NOT routed through the model yet
— BaseQuery stays on the per-request nativeBuildSqlAndParams path, so the
model is built and held but not consumed for SQL.

MeasureType supports the multi-stage-only rank / numberAgg types so cubes
with rank measures build; build_multi_stage_spec maps rank to a filtering
stage on the parsed enum.

Tests: native bridge model-roundtrip + extended object-bridges-coverage
for the new cube/dimension/measure/segment/granularity/pre-agg getters;
Rust YAML-fixture model build incl. rank/numberAgg.

Hierarchies are presentation-only metadata (BI drill-down, exposed via
/meta) and never participate in SQL generation, so they don't need to be
collected into the Tesseract domain model that the planner will read.

Removes model/hierarchy.rs and cube_bridge/hierarchy_definition.rs,
Cube.hierarchies, ViewSpec.evaluated_hierarchies, IncludedMemberKind::
Hierarchy, the CubeDefinition.hierarchies() getter, and the harness /
fixture references. View included members of kind "hierarchies" are now
skipped during model build (they contribute no SQL member).
@waralexrom requested review from a team as code owners June 1, 2026 17:17
@github-actions Bot added the rust and javascript labels Jun 1, 2026
@claude
Contributor

claude Bot commented Jun 1, 2026

Claude finished @waralexrom's task in 3m 22s

Code review — verdict: approve with notes (0 high, 0 medium, 4 low)

The PR cleanly stages the native model behind CUBEJS_TESSERACT_NATIVE_MODEL (default off), routes through tryPrepareTesseractModel with errors degraded to warnings, and never touches the planner — so the production behavior promise (no-op when flag is off) holds. The Rust side is well-organized, the bridge surface is mirrored by the JS contract in bridge-fixtures.ts, and the new model-roundtrip test exercises real prepareModel → handle → describe roundtrips. Posted 4 low-severity inline notes; none block merge.

Details

Inline findings

  • model/builder.rs:375-389: build_case_label swallows MemberSql::sql() errors and substitutes an empty CaseLabel::String(""). Since this is the one-shot build phase (and tryPrepareTesseractModel already has a warning channel), the natural fix is to return Result<CaseLabel, CubeError> and let a malformed case label fail fast at build time.
  • model/model.rs:69-71: ModelBuilder::add_cube silently overwrites on duplicate CubeName. build already returns Result; surfacing a "duplicate cube" error would catch a class of schema-compiler bugs that otherwise vanish silently.
  • model/path.rs:53-65: MemberPath::parse is intentionally 2-segment-only. None of the current call sites feed join-hinted paths through it, but the restriction is load-bearing and quietly violatable; worth either a clearer doc-comment or growing the type to model join hints explicitly.
  • packages/cubejs-schema-compiler/src/compiler/SchemaSource.ts:40-46: wrapDimension / wrapPreAggregation mutate gran.name/idx.name on the underlying objects, leaking a name field back onto the cached EvaluatedCube. Spreading into a fresh object would keep this wrapper read-only against the evaluator.

Notes on what looks right

  • CubeEvaluator.tryPrepareTesseractModel correctly degrades native errors into warnings, and compile() still proceeds — schema compilation cannot fail just because the model layer hiccups.
  • Lifetime story (NativeRustHandle held by TesseractModel, released via JS GC when the evaluator is replaced) is sound and called out in comments.
  • MeasureType::Rank / NumberAgg only become valid when the multi-stage flag is set; build_multi_stage_spec maps Rank to Filtering and everything else to Aggregating, matching the JS validator contract.
  • build_included_member correctly drops "hierarchies" entries via Result<Option<_>> + transpose() + filter_map, so a view that includes a hierarchy contributes no SQL member.
  • The bridge-test harness uses a drift guard (invoke_bridge_inner compares invoked vs. expected field sets) so a new trait method without a matching invoke entry fails loudly — good defense against silent coverage loss.
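
The Result<Option<_>> + transpose() + filter_map idiom mentioned for build_included_member can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `IncludedMember`, the `CubeError` stand-in, the set of accepted kinds, and the `build_included_members` helper are hypothetical; only the "hierarchies" skip comes from the review text.

```rust
// Hypothetical stand-in for the real error type.
#[derive(Debug, PartialEq)]
pub struct CubeError(pub String);

// Hypothetical, simplified shape of a view's included member.
#[derive(Debug, PartialEq)]
pub struct IncludedMember {
    pub kind: String,
    pub path: String,
}

// Ok(None) means "valid input that contributes no SQL member".
pub fn build_included_member(
    kind: &str,
    path: &str,
) -> Result<Option<IncludedMember>, CubeError> {
    match kind {
        "hierarchies" => Ok(None),
        // The accepted kinds below are assumed for the sketch.
        "measures" | "dimensions" | "segments" => Ok(Some(IncludedMember {
            kind: kind.to_string(),
            path: path.to_string(),
        })),
        other => Err(CubeError(format!("Unknown member kind: {other}"))),
    }
}

pub fn build_included_members(
    raw: &[(&str, &str)],
) -> Result<Vec<IncludedMember>, CubeError> {
    raw.iter()
        // transpose() turns Result<Option<T>, E> into Option<Result<T, E>>,
        // so filter_map drops the Ok(None) entries but keeps every error.
        .filter_map(|(kind, path)| build_included_member(kind, path).transpose())
        // Collecting into Result stops at the first error.
        .collect()
}
```

With this shape, skipped entries disappear from the output while a genuinely invalid entry still fails the whole build.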

Security / perf / docs

  • No new auth or input-validation surface; prepareModel consumes already-validated schema-compiler output.
  • Model build is per-schema-compile, not per-request, so cost is one-time.
  • Env flag isn't documented user-facing yet — fine for an off-by-default feature, but worth a docs follow-up once it flips on.
· [Branch](https://github.com/cube-js/cube/tree/tesseract-native-model-data)

Comment on lines +375 to +389
fn build_case_label(label: StringOrSql) -> CaseLabel {
    match label {
        StringOrSql::String(s) => CaseLabel::String(s),
        StringOrSql::MemberSql(member) => {
            // StructWithSqlMember holds a `sql` callable behind another
            // trait — surface it as Expression. This swallows the
            // `sql()` Result; we'd rather fail at build time, but the
            // bridge surface returns it eagerly.
            match member.sql() {
                Ok(sql) => CaseLabel::Sql(Expression::new(sql)),
                Err(_) => CaseLabel::String(String::new()),
            }
        }
    }
}

build_case_label silently swallows a MemberSql::sql() error and substitutes an empty CaseLabel::String(""). That converts a real bridge failure (e.g. a malformed JS callable returned for a case label) into a built model with a broken/empty label that won't surface until a query actually hits this case. The comment acknowledges this is a shortcut, but since this is the model-build phase (one-shot, errors are already meant to be reported via tryPrepareTesseractModel's warning channel), the natural fix is to make build_case_label return Result<CaseLabel, CubeError> and propagate the error up through build_case_variant. That way a broken label fails fast at build time rather than producing wrong SQL later.
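
A minimal sketch of the suggested change, assuming simplified stand-ins for the PR's types: `CubeError`, `Expression`, `MemberSql`, and `StringOrSql` below are hypothetical and only mirror the names in the quoted snippet. The point is the `?` propagation instead of the empty-label fallback.

```rust
// Hypothetical stand-ins; the real types live in the PR and its dependencies.
#[derive(Debug, Clone, PartialEq)]
pub struct CubeError(pub String);

#[derive(Debug, PartialEq)]
pub struct Expression(pub String);

#[derive(Debug, PartialEq)]
pub enum CaseLabel {
    String(String),
    Sql(Expression),
}

// Stand-in for the bridge object whose `sql()` call can fail.
pub struct MemberSql {
    pub result: Result<String, CubeError>,
}

impl MemberSql {
    pub fn sql(&self) -> Result<String, CubeError> {
        self.result.clone()
    }
}

pub enum StringOrSql {
    String(String),
    MemberSql(MemberSql),
}

// Returns Result so a failing `sql()` call surfaces at model-build time
// instead of becoming an empty string label.
pub fn build_case_label(label: StringOrSql) -> Result<CaseLabel, CubeError> {
    match label {
        StringOrSql::String(s) => Ok(CaseLabel::String(s)),
        StringOrSql::MemberSql(member) => {
            let sql = member.sql()?;
            Ok(CaseLabel::Sql(Expression(sql)))
        }
    }
}
```

Callers such as build_case_variant would then need to return Result as well and forward the error with `?`.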

Comment on lines +69 to +71
pub fn add_cube(&mut self, cube: Rc<Cube>) {
    let name = cube.name.clone();
    self.cubes.insert(name, cube);

add_cube silently overwrites on duplicate CubeName. If two cube definitions ever share a name (e.g. a bug in schema-compiler, a stale view) the second one wins with no diagnostic — the model loses a cube and nothing tells you. Consider returning an error when the key is already present, since SchemaModelBuilder::build already returns Result<Model, CubeError> and could surface a clear "duplicate cube" message at model-build time.
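
One way this could look, sketched with the standard `HashMap` entry API. The `Cube`, `CubeError`, and `ModelBuilder` definitions here are simplified, hypothetical stand-ins (the real builder keys on CubeName rather than `String`), and `cube_count` exists only for the illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-ins for the PR's types.
#[derive(Debug, PartialEq)]
pub struct CubeError(pub String);

#[derive(Debug)]
pub struct Cube {
    pub name: String,
}

#[derive(Default)]
pub struct ModelBuilder {
    cubes: HashMap<String, Rc<Cube>>,
}

impl ModelBuilder {
    // Rejects a second cube with the same name instead of overwriting the first.
    pub fn add_cube(&mut self, cube: Rc<Cube>) -> Result<(), CubeError> {
        match self.cubes.entry(cube.name.clone()) {
            Entry::Occupied(existing) => Err(CubeError(format!(
                "Duplicate cube: {}",
                existing.key()
            ))),
            Entry::Vacant(slot) => {
                slot.insert(cube);
                Ok(())
            }
        }
    }

    // Helper for the sketch only.
    pub fn cube_count(&self) -> usize {
        self.cubes.len()
    }
}
```

The entry API checks and inserts with a single lookup, and the first definition stays in place when a duplicate is rejected.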

Comment on lines +53 to +65
/// Parses a `Cube.member` reference. Returns an error for paths
/// that do not split into exactly two segments (we'll grow this
/// to support view-style join chains later).
pub fn parse(path: &str) -> Result<Self, cubenativeutils::CubeError> {
    match path.split_once('.') {
        Some((cube, name)) if !cube.is_empty() && !name.is_empty() => {
            Ok(MemberPath::new(CubeName::new(cube), name.to_string()))
        }
        _ => Err(cubenativeutils::CubeError::user(format!(
            "Invalid member path: {path}"
        ))),
    }
}

MemberPath::parse is restricted to exactly two segments, but it's already being called on inputs that the JS layer may hand over with a join-hint prefix (e.g. evaluatePreAggregationReferences collects with { collectJoinHints: true }, producing View.Cube.member). In the current PR none of the call sites in builder.rs go through those join-hinted paths (access-policy resolves to cube.member, alias members come from pathFromArray of a 2-element array, view includedMember.memberPath is cube.member, etc.), so this is fine for now — but the restriction is load-bearing and easy to violate as soon as something starts feeding multi-segment paths through. Worth either: (a) leaving a clearer doc-comment that this is intentionally 2-segment-only and any join-hinted input needs different handling, or (b) growing the type now to model Vec<JoinHint> + member so callers can't accidentally bypass it.
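
A rough sketch of option (b), under the assumption that every segment before the last two is a join hint. The `MemberPath` layout, the `join_hints` field, and the `CubeError` stand-in are hypothetical and not the PR's types. One detail worth noting about the quoted code: `split_once('.')` splits only at the first dot, so an input like `View.Cube.member` would come back as cube `View` with name `Cube.member` rather than as an error.

```rust
// Hypothetical stand-in for cubenativeutils::CubeError.
#[derive(Debug, PartialEq)]
pub struct CubeError(pub String);

// Hypothetical shape: leading segments are kept as explicit join hints.
#[derive(Debug, PartialEq)]
pub struct MemberPath {
    /// Segments before the owning cube, e.g. ["View"] for `View.Cube.member`.
    pub join_hints: Vec<String>,
    pub cube: String,
    pub name: String,
}

impl MemberPath {
    pub fn parse(path: &str) -> Result<Self, CubeError> {
        let segments: Vec<&str> = path.split('.').collect();
        match segments.as_slice() {
            // At least two segments, none of them empty.
            [hints @ .., cube, name] if segments.iter().all(|s| !s.is_empty()) => {
                Ok(MemberPath {
                    join_hints: hints.iter().map(|s| s.to_string()).collect(),
                    cube: cube.to_string(),
                    name: name.to_string(),
                })
            }
            _ => Err(CubeError(format!("Invalid member path: {path}"))),
        }
    }
}
```

A plain `Cube.member` input yields an empty `join_hints` vector, so two-segment callers keep working while longer paths are split explicitly.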

Comment on lines +40 to +46
const wrapped = Object.create(dim);
wrapped.granularities = Object.entries(dim.granularities).map(([name, gran]: [string, any]) => {
  if (gran.name === undefined) {
    gran.name = name;
  }
  return gran;
});

wrapDimension / wrapPreAggregation mutate the underlying gran / idx objects in place (gran.name = name). These objects live on cube.dimensions[*].granularities / cube.preAggregations[*].indexes in the evaluator, so the mutation leaks back into the cached EvaluatedCube state. It's idempotent on repeated calls, but it also means any downstream code that introspects the evaluator will start seeing a name field stamped on every granularity / index — surprising for a wrapper that's meant to be read-only against the source of truth. Cleaner: spread { name, ...gran } into a fresh object rather than mutating the original.

@codecov

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.26%. Comparing base (35bed42) to head (96675df).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10986      +/-   ##
==========================================
+ Coverage   78.81%   83.26%   +4.45%     
==========================================
  Files         470      254     -216     
  Lines       93438    76836   -16602     
  Branches     3466        0    -3466     
==========================================
- Hits        73644    63981    -9663     
+ Misses      19291    12855    -6436     
+ Partials      503        0     -503     
| Flag | Coverage Δ |
| --- | --- |
| cube-backend | ? |
| cubesql | 83.26% <ø> (-0.11%) ⬇️ |

Flags with carried forward coverage won't be shown.