feat: add json_tree explain format #5576

Open
Swiddis wants to merge 7 commits into opensearch-project:main from Swiddis:feat/explain-json


Swiddis (Collaborator) commented Jun 20, 2026

Description

Adds a new machine-readable explain format. The existing format works well for humans but is hard to parse and analyze in bulk. This should simplify some kinds of testing (for example #5505), address #5519, and enable tooling that checks whether specific optimizations are applied across an entire query set.

Existing explain queries are unaffected; the new format is available via _plugins/_ppl/_explain?format=json_tree.

> curl -XPOST 'localhost:9200/_plugins/_ppl/_explain?format=json_tree' -H 'Content-Type: application/json' --data '{"query":"source=logs | stats avg(latency_ms) as avg_lat by http_method, status_code | sort - avg_lat | head"}'
{
  "calcite": {
    "logical": {
      "rels": [
        {
          "id": "0",
          "relOp": "org.opensearch.sql.opensearch.storage.scan.CalciteLogicalIndexScan",
          "table": [
            "OpenSearch",
            "logs"
          ],
          "inputs": []
        },
        {
          "id": "1",
          "relOp": "LogicalProject",
          "fields": [
            "http_method",
            "status_code",
            "latency_ms"
          ],
          "exprs": [
            {
              "input": 5,
              "name": "$5"
            },
            {
              "input": 23,
              "name": "$23"
            },
            {
              "input": 14,
              "name": "$14"
            }
          ]
        },
        {
          "id": "2",
          "relOp": "LogicalAggregate",
          "group": [
            0,
            1
          ],
          "aggs": [
            {
              "agg": {
                "name": "AVG",
                "kind": "AVG",
                "syntax": "FUNCTION"
              },
              "type": {
                "type": "DOUBLE",
                "nullable": true
              },
              "distinct": false,
              "operands": [
                2
              ],
              "name": "avg_lat"
            }
          ]
        },
        {
          "id": "3",
          "relOp": "LogicalProject",
          "fields": [
            "avg_lat",
            "http_method",
            "status_code"
          ],
          "exprs": [
            {
              "input": 2,
              "name": "$2"
            },
            {
              "input": 0,
              "name": "$0"
            },
            {
              "input": 1,
              "name": "$1"
            }
          ]
        },
        {
          "id": "4",
          "relOp": "LogicalSort",
          "collation": [
            {
              "field": 0,
              "direction": "DESCENDING",
              "nulls": "LAST"
            }
          ],
          "fetch": {
            "literal": 10,
            "type": {
              "type": "INTEGER",
              "nullable": false
            }
          }
        },
        {
          "id": "5",
          "relOp": "org.opensearch.sql.calcite.plan.rel.LogicalSystemLimit",
          "collation": [
            {
              "field": 0,
              "direction": "DESCENDING",
              "nulls": "LAST"
            }
          ],
          "fetch": {
            "literal": 10000,
            "type": {
              "type": "INTEGER",
              "nullable": false
            }
          },
          "type": "QUERY_SIZE_LIMIT"
        }
      ]
    },
    "physical": {
      "rels": [
        {
          "id": "0",
          "relOp": "org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan",
          "table": [
            "OpenSearch",
            "logs"
          ],
          "PushDownContext": "[AGGREGATION->rel#950:LogicalAggregate.NONE.[](input=RelSubset#949,group={0, 1},avg_lat=AVG($2)), PROJECT->[avg_lat, http_method, status_code]]",
          "sourceBuilder": {
            "from": 0,
            "size": 0,
            "timeout": "1m",
            "aggregations": {
              "composite_buckets": {
                "composite": {
                  "size": 10000,
                  "sources": [
                    {
                      "http_method": {
                        "terms": {
                          "field": "http_method.keyword",
                          "missing_bucket": true,
                          "missing_order": "first",
                          "order": "asc"
                        }
                      }
                    },
                    {
                      "status_code": {
                        "terms": {
                          "field": "status_code",
                          "missing_bucket": true,
                          "missing_order": "first",
                          "order": "asc"
                        }
                      }
                    }
                  ]
                },
                "aggregations": {
                  "avg_lat": {
                    "avg": {
                      "field": "latency_ms"
                    }
                  }
                }
              }
            }
          },
          "inputs": []
        },
        {
          "id": "1",
          "relOp": "org.opensearch.sql.opensearch.planner.physical.CalciteEnumerableTopK",
          "collation": [
            {
              "field": 0,
              "direction": "DESCENDING",
              "nulls": "LAST"
            }
          ],
          "fetch": {
            "literal": 10,
            "type": {
              "type": "INTEGER",
              "nullable": false
            }
          }
        },
        {
          "id": "2",
          "relOp": "org.apache.calcite.adapter.enumerable.EnumerableLimit",
          "fetch": {
            "literal": 10000,
            "type": {
              "type": "INTEGER",
              "nullable": false
            }
          }
        }
      ]
    }
  }
}
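Because the output is plain JSON, checking whether an optimization fired across many queries becomes a tree walk instead of string matching. The following is a minimal sketch, assuming the response has already been deserialized into nested Map/List values (for example with Jackson's ObjectMapper.readValue(json, Object.class), the same call the PR uses for sourceBuilder). The class and method names are hypothetical, and treating an "AGGREGATION->" entry in PushDownContext as evidence of aggregation pushdown is an assumption based on the sample output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: inspects a json_tree explain response already parsed into Map/List values.
class ExplainTreeInspector {

  /** Returns the "rels" of one plan section ("logical" or "physical"), or an empty list if absent. */
  @SuppressWarnings("unchecked")
  static List<Map<String, Object>> rels(Map<String, Object> response, String section) {
    Object calcite = response.get("calcite");
    if (!(calcite instanceof Map)) {
      return List.of();
    }
    Object plan = ((Map<String, Object>) calcite).get(section);
    if (!(plan instanceof Map)) {
      return List.of();
    }
    Object rels = ((Map<String, Object>) plan).get("rels");
    if (!(rels instanceof List)) {
      return List.of();
    }
    List<Map<String, Object>> result = new ArrayList<>();
    for (Object rel : (List<Object>) rels) {
      if (rel instanceof Map) {
        result.add((Map<String, Object>) rel);
      }
    }
    return result;
  }

  /** Lists the relOp of each node in a section, in the order the nodes appear in the response. */
  static List<String> relOps(Map<String, Object> response, String section) {
    List<String> ops = new ArrayList<>();
    for (Map<String, Object> rel : rels(response, section)) {
      ops.add(String.valueOf(rel.get("relOp")));
    }
    return ops;
  }

  /** Assumption: a physical node whose PushDownContext mentions "AGGREGATION->" had its aggregation pushed down. */
  static boolean aggregationPushedDown(Map<String, Object> response) {
    for (Map<String, Object> rel : rels(response, "physical")) {
      Object ctx = rel.get("PushDownContext");
      if (ctx instanceof String && ((String) ctx).contains("AGGREGATION->")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Trimmed-down copy of the physical section of the sample response.
    Map<String, Object> response = Map.of(
        "calcite", Map.of(
            "physical", Map.of(
                "rels", List.of(
                    Map.of(
                        "id", "0",
                        "relOp", "org.opensearch.sql.opensearch.storage.scan.CalciteEnumerableIndexScan",
                        "PushDownContext", "[AGGREGATION->..., PROJECT->[avg_lat, http_method, status_code]]"),
                    Map.of("id", "1", "relOp", "org.apache.calcite.adapter.enumerable.EnumerableLimit")))));
    System.out.println(relOps(response, "physical"));
    System.out.println("aggregation pushed down: " + aggregationPushedDown(response));
  }
}
```

Run over a query set, the same check can count how many queries get aggregation pushdown without parsing the human-readable explain text.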

Related Issues

Resolves #5519

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Swiddis added 5 commits June 19, 2026 22:53
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
String logical = RelOptUtil.toString(rel, SqlExplainLevel.NO_ATTRIBUTES);
listener.onResponse(
    new ExplainResponse(new ExplainResponseNodeV2(logical, null, null)));
if (format == Format.JSON_TREE) {

Collaborator Author

The core serialization block lives here; the rest is mostly plumbing. We use Calcite's own RelJsonWriter.

github-actions (Contributor)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The parseSourceBuilderInPhysicalTree method catches all exceptions and logs per-field parse failures only at debug level (the outer catch logs at warn). If JSON parsing fails for a sourceBuilder field, the field remains a string instead of being parsed into an object. This breaks the contract of the json_tree format, which promises structured JSON objects: clients expecting parsed objects will receive strings instead, which can cause downstream parsing failures or incorrect behavior.

private void parseSourceBuilderInPhysicalTree(Object physicalTree) {
  try {
    if (!(physicalTree instanceof Map)) {
      return;
    }
    Map<String, Object> tree = (Map<String, Object>) physicalTree;
    Object relsObj = tree.get("rels");
    if (!(relsObj instanceof List)) {
      return;
    }

    List<Object> rels = (List<Object>) relsObj;
    for (Object relObj : rels) {
      if (!(relObj instanceof Map)) {
        continue;
      }
      Map<String, Object> rel = (Map<String, Object>) relObj;

      // Parse sourceBuilder if it exists as a JSON string
      Object sourceBuilderObj = rel.get("sourceBuilder");
      if (sourceBuilderObj instanceof String) {
        try {
          String sourceBuilderJson = (String) sourceBuilderObj;
          Object parsed = objectMapper.readValue(sourceBuilderJson, Object.class);
          rel.put("sourceBuilder", parsed);
        } catch (Exception e) {
          logger.debug("Failed to parse sourceBuilder JSON: {}", e.getMessage());
        }
      }
    }
  } catch (Exception e) {
    logger.warn("Failed to parse sourceBuilder in physical tree: " + e.getMessage());
  }
}
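One way to address this while keeping the best-effort design is to record which nodes failed instead of swallowing the error, so the caller can decide whether to log at warn, fail the request, or annotate the response. The following is a minimal sketch. The JSON parser is passed in as a function (in the PR this would be a lambda wrapping objectMapper.readValue(json, Object.class), since that call throws a checked exception), and the class name and the returned list of failed rel ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SourceBuilderParsing {

  /**
   * Replaces string-valued "sourceBuilder" fields in physicalTree.rels with parsed objects.
   * Returns the ids of rels whose sourceBuilder could not be parsed, so failures stay visible.
   */
  @SuppressWarnings("unchecked")
  static List<String> parseSourceBuilders(Object physicalTree, Function<String, Object> parser) {
    List<String> failedIds = new ArrayList<>();
    if (!(physicalTree instanceof Map)) {
      return failedIds;
    }
    Object relsObj = ((Map<String, Object>) physicalTree).get("rels");
    if (!(relsObj instanceof List)) {
      return failedIds;
    }
    for (Object relObj : (List<Object>) relsObj) {
      if (!(relObj instanceof Map)) {
        continue;
      }
      Map<String, Object> rel = (Map<String, Object>) relObj;
      Object sourceBuilder = rel.get("sourceBuilder");
      if (sourceBuilder instanceof String) {
        try {
          rel.put("sourceBuilder", parser.apply((String) sourceBuilder));
        } catch (RuntimeException e) {
          // Keep the original string in place, but report the failure to the caller.
          failedIds.add(String.valueOf(rel.get("id")));
        }
      }
    }
    return failedIds;
  }

  public static void main(String[] args) {
    // Stand-in parser for the sketch: accepts only "{}" and rejects everything else.
    Function<String, Object> parser = s -> {
      if (!s.equals("{}")) {
        throw new IllegalArgumentException("not JSON: " + s);
      }
      return Map.of();
    };
    Map<String, Object> good = new HashMap<>();
    good.put("id", "0");
    good.put("sourceBuilder", "{}");
    Map<String, Object> bad = new HashMap<>();
    bad.put("id", "1");
    bad.put("sourceBuilder", "not json");
    List<String> failed = parseSourceBuilders(Map.of("rels", List.of(good, bad)), parser);
    System.out.println("failed ids: " + failed);
  }
}
```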
Possible Issue

In the json_tree format branch, if physicalError.get() is not null, the exception is thrown but the CalcitePlanContext.skipEncoding ThreadLocal is never reset, because the finally block that resets it is outside the json_tree branch. If the same thread is reused for a later request, skipEncoding remains true and may affect that request.

if (physicalError.get() != null) {
  throw physicalError.get();
}
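The usual fix is to make the reset unconditional: set the flag, then do all of the work, including the rethrow, inside a try block whose finally clears it. The following is a minimal sketch, not the PR's code: SKIP_ENCODING stands in for CalcitePlanContext.skipEncoding, explainJsonTree is a hypothetical method, and physicalError is assumed to hold an unchecked exception.

```java
import java.util.concurrent.atomic.AtomicReference;

class SkipEncodingReset {
  // Stand-in for CalcitePlanContext.skipEncoding.
  static final ThreadLocal<Boolean> SKIP_ENCODING = ThreadLocal.withInitial(() -> false);

  static String explainJsonTree(AtomicReference<RuntimeException> physicalError) {
    SKIP_ENCODING.set(true);
    try {
      // ... build and serialize the physical plan; failures are captured in physicalError ...
      if (physicalError.get() != null) {
        throw physicalError.get();
      }
      return "{}";
    } finally {
      // Runs on both the normal return and the throw, so a pooled thread never keeps the flag.
      SKIP_ENCODING.remove();
    }
  }

  public static void main(String[] args) {
    try {
      explainJsonTree(new AtomicReference<>(new IllegalStateException("physical plan failed")));
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
    System.out.println("skipEncoding after failure: " + SKIP_ENCODING.get());
  }
}
```

ThreadLocal.remove() restores the initial value; if the flag could already be set by an outer caller, saving and restoring the previous value would be the safer variant.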

github-actions (Contributor)

PR Code Suggestions ✨

Explore these optional code suggestions:

Category    Suggestion    Impact
General
Handle unsupported format parameter explicitly

The default implementation silently ignores the format parameter, which could lead to unexpected behavior when callers expect format-specific output. Consider logging a warning or throwing an exception to alert implementers that format support is not implemented.

core/src/main/java/org/opensearch/sql/executor/ExecutionEngine.java [79-87]

 default void explain(
     RelNode plan,
     ExplainMode mode,
     Format format,
     CalcitePlanContext context,
     ResponseListener<ExplainResponse> listener) {
+  if (format != null && format != Format.JSON && format != Format.YAML) {
+    listener.onFailure(
+        new UnsupportedOperationException(
+            getClass().getSimpleName() + " does not support format: " + format));
+    return;
+  }
   // Default: ignore format parameter, delegate to old signature for BWC
   explain(plan, mode, context, listener);
 }
Suggestion importance [1-10]: 5

Why: The suggestion adds explicit validation for unsupported formats, which improves error handling. However, the check format != Format.JSON && format != Format.YAML is overly restrictive, since the new Format.JSON_TREE is a valid format that should be supported. The default implementation intentionally delegates to the old signature for backward compatibility, so the current behavior is acceptable.

Low
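If a guard like this were wanted, one way to keep it from rejecting JSON_TREE (or any format added later) is to have each engine declare the formats it supports instead of hard-coding a list in the default method. The following is a minimal sketch under stated assumptions: the Format enum contains only the three values named in this thread, and supportedFormats(), the callback-style explain, and the engine classes are hypothetical simplifications of ExecutionEngine, not its real API.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Consumer;

class FormatGuard {
  // Only the formats named in this discussion; the real enum may have more values.
  enum Format { JSON, YAML, JSON_TREE }

  interface Engine {
    /** Hypothetical: formats this engine can render. Defaults to the pre-JSON_TREE set. */
    default Set<Format> supportedFormats() {
      return EnumSet.of(Format.JSON, Format.YAML);
    }

    String render(Format format);

    /** Reports unsupported formats through onFailure instead of silently ignoring them. */
    default void explain(Format format, Consumer<String> onResponse, Consumer<Exception> onFailure) {
      if (format != null && !supportedFormats().contains(format)) {
        onFailure.accept(new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support format: " + format));
        return;
      }
      onResponse.accept(render(format));
    }
  }

  // An engine that never opted in to JSON_TREE.
  static class LegacyEngine implements Engine {
    public String render(Format format) {
      return "text plan";
    }
  }

  // An engine that declares support for every format, including JSON_TREE.
  static class JsonTreeEngine implements Engine {
    public Set<Format> supportedFormats() {
      return EnumSet.allOf(Format.class);
    }

    public String render(Format format) {
      return format == Format.JSON_TREE ? "{\"calcite\":{}}" : "text plan";
    }
  }

  public static void main(String[] args) {
    Consumer<Exception> onFailure = e -> System.out.println("failed: " + e.getMessage());
    new LegacyEngine().explain(Format.JSON_TREE, System.out::println, onFailure);
    new JsonTreeEngine().explain(Format.JSON_TREE, System.out::println, onFailure);
  }
}
```

The trade-off is that every engine must opt in explicitly, which is stricter than the current backward-compatible delegation.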
Validate sourceBuilder parsing completion

The parseSourceBuilderInPhysicalTree method modifies the physicalTree object in place but doesn't validate that the modification succeeded. If parsing fails silently (caught by the internal try-catch), the tree may contain inconsistent data, with some sourceBuilders parsed and others not.

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java [180-213]

-parseSourceBuilderInPhysicalTree(physicalTree);
+boolean parseSuccess = parseSourceBuilderInPhysicalTree(physicalTree);
+if (!parseSuccess) {
+  logger.warn("Some sourceBuilder fields could not be parsed in physical tree");
+}
Suggestion importance [1-10]: 4

Why: The suggestion proposes returning a boolean to track parsing success, but the parseSourceBuilderInPhysicalTree method already has internal error handling with debug logging at line 206. The method is designed to be best-effort, and partial failures are acceptable. Adding a return value would require changing the method signature without significant benefit since failures are already logged.

Low


Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[FEATURE] Expose queried indices as a structured field in PPL _explain response

1 participant