Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions superset/daos/dashboard.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,10 @@

class DashboardDAO(BaseDAO[Dashboard]):
base_filter = DashboardAccessFilter
# Column used by MCP tools for title-based identifier fallback, so a
# slug-like identifier can resolve to a dashboard even when its slug
# field is empty.
title_column = "dashboard_title"

@classmethod
def apply_column_operators(
Expand Down
71 changes: 70 additions & 1 deletion superset/mcp_service/mcp_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
from __future__ import annotations

import logging
import re
from abc import ABC, abstractmethod
from datetime import datetime, timezone
Comment on lines 18 to 23
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This fallback performs query.all() with full query_options, which in dashboard lookups includes heavy eager loads; on large instances this can load every dashboard plus related slices/owners/tags into memory just to find one match. Restrict the scan to lightweight columns (e.g., id/title only, without eager options), then fetch the single matched row with query_options. [performance]

Severity Level: Critical 🚨
- ❌ get_dashboard_info slug-fallback loads every dashboard row eagerly.
- ⚠️ Large installations risk timeouts on slug-like lookups.
- ⚠️ Increased memory pressure from unnecessary relationship loading.
Steps of Reproduction ✅
1. In `superset/mcp_service/dashboard/tool/get_dashboard_info.py:111–120`, see that
`eager_options` includes multiple `subqueryload` calls for `Dashboard.slices`,
`Slice.owners`, `Slice.tags`, `Dashboard.owners`, `Dashboard.tags`, and `Dashboard.roles`,
and these are passed to `ModelGetInfoCore` as `query_options=eager_options` at lines
122–131.

2. For a Superset instance with a large number of dashboards and charts (hundreds or
thousands), ensure there exists at least one dashboard whose `slug` is empty but whose
`dashboard_title` slugifies to a predictable value such as `"world-banks-data"`.

3. From any MCP client, call the `get_dashboard_info` tool with `request.identifier` set
to that slug-like value so that `ModelGetInfoCore._find_object` in
`superset/mcp_service/mcp_core.py:47–81` exhausts ID/UUID/slug lookups and falls through
to `_find_by_slugified_title`.

4. In `_find_by_slugified_title` (`superset/mcp_service/mcp_core.py`, lines shown in this
PR from `if self.query_options:` through the `matches = [...]` list comprehension), the
code applies the heavy `query.options(*self.query_options)` and then executes
`query.all()`, loading every dashboard plus all eagerly-loaded related
slices/owners/tags/roles into memory just to filter in Python, causing high memory usage
and slow responses for this fallback path.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/mcp_service/mcp_core.py
**Line:** 18:23
**Comment:**
	*Performance: This fallback performs `query.all()` with full `query_options`, which in dashboard lookups includes heavy eager loads; on large instances this can load every dashboard plus related slices/owners/tags into memory just to find one match. Restrict the scan to lightweight columns (e.g., id/title only, without eager options), then fetch the single matched row with `query_options`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

from typing import Any, Callable, Dict, Generic, List, Literal, Type, TypeVar
Expand All @@ -28,6 +29,23 @@
from superset.mcp_service.constants import ModelType
from superset.mcp_service.utils import _is_uuid


def _slugify(value: str) -> str:
"""Normalize a string to a slug-like form for comparison.

Lowercases, drops apostrophes so possessives collapse
("World Bank's" → "worldbanks" territory), then collapses any
remaining non-alphanumerics to single hyphens and trims
leading/trailing hyphens. Mirrors how agents typically guess slugs
from a dashboard title (e.g. "World Bank's Data" → "world-banks-data").
"""
lowered = value.lower()
# Drop apostrophes entirely so "bank's" collapses to "banks" rather than
# splitting into "bank-s". Covers straight and curly variants.
stripped = re.sub(r"['’]", "", lowered)
return re.sub(r"[^a-z0-9]+", "-", stripped).strip("-")


# Type variables for generic model tools
T = TypeVar("T") # For model objects
S = TypeVar("S", bound=BaseModel) # For Pydantic schemas
Expand Down Expand Up @@ -262,6 +280,7 @@ def __init__(
supports_slug: bool = False,
logger: logging.Logger | None = None,
query_options: list[Any] | None = None,
title_column_name: str | None = None,
) -> None:
super().__init__(logger)
self.dao_class = dao_class
Expand All @@ -270,6 +289,49 @@ def __init__(
self.serializer = serializer
self.supports_slug = supports_slug
self.query_options = query_options or []
# When set, enables a slugified-title fallback after slug lookup
# fails, so identifiers like "world-banks-data" still resolve to
# "World Bank's Data" when the dashboard's slug field is empty.
# Defaults to the DAO's `title_column` attribute when not overridden.
self.title_column_name = title_column_name or getattr(
dao_class, "title_column", None
)

def _find_by_slugified_title(self, identifier: str) -> Any:
"""Resolve a slug-like identifier by matching against slugified titles.

Loads all rows and compares `_slugify(row.title)` to `_slugify(identifier)`.
If multiple rows match, logs a warning and returns the first one —
collisions in real dashboards are rare and the caller can always
disambiguate by id or UUID.
"""
if not self.title_column_name:
return None
target = _slugify(identifier)
if not target:
return None

from superset.extensions import db

model_class = self.dao_class.model_cls
query = db.session.query(model_class)
if self.query_options:
query = query.options(*self.query_options)
matches = [
obj
for obj in query.all()
if _slugify(getattr(obj, self.title_column_name, "") or "") == target
Comment on lines +317 to +323
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Architect Review — CRITICAL

The slugified-title fallback in ModelGetInfoCore queries Dashboard rows directly via db.session.query without applying the DAO's base_filter (DashboardAccessFilter), so get_dashboard_info can return dashboards the user is not allowed to access when resolving slug-like identifiers.

Suggestion: Build the fallback query through the DAO (or call BaseDAO._apply_base_filter/DashboardAccessFilter on the query) before scanning titles so that only rows allowed by the base_filter are considered for slugified-title matches.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** superset/mcp_service/mcp_core.py
**Line:** 317:323
**Comment:**
	*CRITICAL: The slugified-title fallback in ModelGetInfoCore queries Dashboard rows directly via db.session.query without applying the DAO's base_filter (DashboardAccessFilter), so get_dashboard_info can return dashboards the user is not allowed to access when resolving slug-like identifiers.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

]
if not matches:
return None
if len(matches) > 1:
ids = [getattr(m, "id", None) for m in matches]
self._log_warning(
f"Identifier '{identifier}' matched {len(matches)} rows by "
f"slugified title (ids={ids}); returning the first. Pass an "
"id or UUID to disambiguate."
)
return matches[0]

def _find_object(self, identifier: int | str) -> Any:
"""Find object by identifier using appropriate method."""
Expand Down Expand Up @@ -309,7 +371,14 @@ def _find_object(self, identifier: int | str) -> Any:
query = db.session.query(model_class).filter(id_or_slug_filter(identifier))
if opts:
query = query.options(*opts)
return query.one_or_none()
slug_result = query.one_or_none()
if slug_result is not None:
return slug_result

# Many dashboards have empty slugs, so slug lookup alone silently
# fails when agents pass a slug-like string derived from the
# dashboard title. Fall back to slugified-title matching.
return self._find_by_slugified_title(identifier)

# If we get here, it's an invalid identifier
return None
Expand Down
213 changes: 213 additions & 0 deletions tests/unit_tests/mcp_service/test_mcp_core.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Unit tests for mcp_core reusable core classes.

Focused on the ModelGetInfoCore title-based fallback that resolves
slug-like identifiers (e.g. "world-banks-data") to dashboards whose
slug column is empty but whose title matches.
"""

from datetime import datetime
from unittest.mock import MagicMock, patch

import pytest
from pydantic import BaseModel

from superset.mcp_service.mcp_core import _slugify, ModelGetInfoCore


class _FakeOutput(BaseModel):
id: int
title: str
Comment on lines +35 to +37
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a class docstring describing the purpose of this test schema. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The class is newly introduced and has no docstring. If the custom rule requires
documentation for new class definitions, this is a real violation in the
existing code.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/test_mcp_core.py
**Line:** 35:37
**Comment:**
	*Custom Rule: Add a class docstring describing the purpose of this test schema.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎



class _FakeError(BaseModel):
error: str
error_type: str
timestamp: datetime
Comment on lines +40 to +43
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a class docstring explaining the role of this error schema in the tests. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The class has no docstring in the final file. This matches the suggestion and
constitutes a real documentation omission if the rule requires docstrings on
new class definitions.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/test_mcp_core.py
**Line:** 40:43
**Comment:**
	*Custom Rule: Add a class docstring explaining the role of this error schema in the tests.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎



class _Unset:
"""Sentinel meaning "DAO has no title_column attribute at all"."""


@pytest.fixture(autouse=True)
def _patch_id_or_slug_filter():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add an explicit return type annotation for this fixture function (for example, an iterator/generator type) to satisfy typing requirements for new functions. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The fixture function is defined without a return type annotation. If the rule
requires explicit typing on new functions, this is a genuine violation in the
current code.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/test_mcp_core.py
**Line:** 51:51
**Comment:**
	*Custom Rule: Add an explicit return type annotation for this fixture function (for example, an iterator/generator type) to satisfy typing requirements for new functions.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

"""id_or_slug_filter is called inside _find_object's slug branch, but its
return value is only used by SQLAlchemy internals we've mocked away —
we just need it to not blow up."""
with patch("superset.models.dashboard.id_or_slug_filter", return_value=MagicMock()):
yield


def _make_dashboard(id_: int, title: str, slug: str = "") -> MagicMock:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a short docstring describing what object this helper constructs and what inputs it expects. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The helper function has no docstring in the final file. That is a real
documentation gap if the custom rule mandates docstrings for added helpers.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/test_mcp_core.py
**Line:** 59:59
**Comment:**
	*Custom Rule: Add a short docstring describing what object this helper constructs and what inputs it expects.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

dashboard = MagicMock()
dashboard.id = id_
dashboard.dashboard_title = title
dashboard.slug = slug
return dashboard


def _build_core(
*,
supports_slug: bool = True,
title_column: str | None = "dashboard_title",
dao_title_column: str | None | type[_Unset] = None,
) -> tuple[ModelGetInfoCore, MagicMock]:
"""Build a core with configurable title-column wiring.

- `title_column` is the explicit override passed into the core.
- `dao_title_column` simulates the DAO's class attribute; defaults
to None so we rely on the explicit override in most tests.
"""
dao_class = MagicMock()
# getattr(model_class, "dashboard_title") needs to return a truthy column
# so the fallback proceeds past its guard.
dao_class.model_cls = MagicMock(dashboard_title=MagicMock())
dao_class.find_by_id.return_value = None
# MagicMock auto-vivifies attrs, so explicitly control title_column.
if dao_title_column is _Unset:
del dao_class.title_column
else:
dao_class.title_column = dao_title_column

def serializer(obj: MagicMock) -> _FakeOutput:
return _FakeOutput(id=obj.id, title=obj.dashboard_title)

core = ModelGetInfoCore(
dao_class=dao_class,
output_schema=_FakeOutput,
error_schema=_FakeError,
serializer=serializer,
supports_slug=supports_slug,
title_column_name=title_column,
)
return core, dao_class


def test_slugify_matches_agent_guesses() -> None:
"""Agents slugify titles by lowercasing and hyphenating non-alphanumerics."""
assert _slugify("World Bank's Data") == "world-banks-data"
assert _slugify(" Multiple Spaces ") == "multiple-spaces"
assert _slugify("!!!") == ""


def _make_db_mocks(*, all_rows: list[MagicMock]) -> tuple[MagicMock, MagicMock]:
"""Return (slug_query, title_query) for db.session.query.side_effect.

The core calls db.session.query twice during a string-identifier lookup:
once for the id_or_slug_filter query (returns None to force fallback),
and once for the slugified-title scan (returns `all_rows`).
"""
slug_query = MagicMock()
slug_query.filter.return_value = slug_query
slug_query.options.return_value = slug_query
slug_query.one_or_none.return_value = None

title_query = MagicMock()
title_query.options.return_value = title_query
title_query.all.return_value = all_rows
return slug_query, title_query


@patch("superset.extensions.db")
def test_title_fallback_resolves_dashboard_with_empty_slug(
mock_db: MagicMock,
) -> None:
"""Regression: slug lookup must not silently fail when slug is empty."""
core, _ = _build_core()
dashboard = _make_dashboard(id_=2, title="World Bank's Data", slug="")
slug_query, title_query = _make_db_mocks(all_rows=[dashboard])
mock_db.session.query.side_effect = [slug_query, title_query]

result = core.run_tool("world-banks-data")

assert isinstance(result, _FakeOutput)
assert result.id == 2
assert result.title == "World Bank's Data"


@patch("superset.extensions.db")
def test_title_fallback_ambiguous_picks_first_and_warns(
mock_db: MagicMock, caplog: pytest.LogCaptureFixture
) -> None:
"""Two dashboards slugify to the same identifier — pick the first, warn."""
core, _ = _build_core()
dash_a = _make_dashboard(id_=2, title="World Bank's Data")
dash_b = _make_dashboard(id_=7, title="World Banks Data")
slug_query, title_query = _make_db_mocks(all_rows=[dash_a, dash_b])
mock_db.session.query.side_effect = [slug_query, title_query]

with caplog.at_level("WARNING"):
result = core.run_tool("world-banks-data")

assert isinstance(result, _FakeOutput)
assert result.id == 2 # first match wins
assert any("matched 2 rows" in rec.message for rec in caplog.records)


@patch("superset.extensions.db")
def test_not_found_error_when_no_title_match(
mock_db: MagicMock,
) -> None:
"""No slug, no title, no slugified-title match — plain not_found."""
core, _ = _build_core()
slug_query, title_query = _make_db_mocks(all_rows=[])
mock_db.session.query.side_effect = [slug_query, title_query]

result = core.run_tool("does-not-exist")

assert isinstance(result, _FakeError)
assert result.error_type == "not_found"


def test_title_fallback_disabled_returns_not_found() -> None:
"""When neither override nor DAO provides a title column, no fallback."""
core, _ = _build_core(
supports_slug=False, title_column=None, dao_title_column=_Unset
)

result = core.run_tool("anything")

assert isinstance(result, _FakeError)
assert result.error_type == "not_found"


def test_title_column_defaults_from_dao_attribute() -> None:
"""No explicit override → core picks up DAO.title_column."""
core, _ = _build_core(title_column=None, dao_title_column="dashboard_title")
assert core.title_column_name == "dashboard_title"


def test_explicit_title_column_overrides_dao_attribute() -> None:
"""Explicit override beats the DAO default."""
core, _ = _build_core(title_column="custom_col", dao_title_column="dashboard_title")
assert core.title_column_name == "custom_col"


@pytest.mark.parametrize(
"identifier,expected_slug",
[
("World Bank's Data", "world-banks-data"),
("multi space", "multi-space"),
("leading--and--trailing--", "leading-and-trailing"),
],
)
def test_slugify_handles_edge_cases(identifier: str, expected_slug: str) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a test docstring summarizing the edge-case behavior validated by this parametrized test. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The parametrized test function has no docstring in the final file. If the
custom rule requires docstrings for new test functions, this is a real
violation.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/test_mcp_core.py
**Line:** 212:212
**Comment:**
	*Custom Rule: Add a test docstring summarizing the edge-case behavior validated by this parametrized test.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

assert _slugify(identifier) == expected_slug
Loading