design-system-mcp: Update get_components to optionally support multiple names#78185
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
Size Change: 0 B Total Size: 7.97 MB ℹ️ View Unchanged
|
|
Flaky tests detected in d851ecb. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/25924368281
|
| function fakeDetail( name: string ): ComponentDetail { | ||
| return { | ||
| name, | ||
| description: `${ name } description.`, | ||
| packageName: '@wordpress/ui', | ||
| importStatement: `import { ${ name } } from '@wordpress/ui';`, | ||
| props: [], | ||
| stories: [], | ||
| }; | ||
| } |
There was a problem hiding this comment.
Is there a chance that, since we're mocking this data, any real API response breakage won't be flagged by tests?
Maybe we could generate mock data by running a real API call?
There was a problem hiding this comment.
Is there a chance that, since we're mocking this data, any real API response breakage won't be flagged by tests?
Maybe we could generate mock data by running a real API call?
Yeah, I waffled quite a bit on just how much integration testing made sense here to give us confidence. With how this is set up currently, what you're describing probably falls more in the domain of the tests for data.ts, or a separate end-to-end / integration test setup. Personally I'd be concerned about the time cost and chance for flakiness through HTTP availability.
Some other ideas we could consider to improve our confidence around the type of issue you describe:
- If Storybook published their own TypeScript types for manifest shape, we could have TypeScript-based indicators if the object shape ever changes.
- Similarly, we could create our own expected schema and validate that the actual manifest output matches our expected shape.
- We could have a low-fidelity test that runs after
npm run storybook:buildto make sure thatstorybook/build/components/manifest.jsonexists and contains a particular property we're expecting to access.
Since the manifest we're loading is managed in the same project, this isn't really an external change outside our realm of control. The scenario you're describing would be a process failure where someone updates the manifest without updating dependencies in other parts of the same project.
There was a problem hiding this comment.
Could we generate the manifest locally and validate its schema?
Alternatively, boosting our confidence via type checks (using Storybook types), that would be a nice improvement.
As a last resort, we could also consider adding some runtime validation — ie, when we receive the data payload, we analyze its shape and throw a warning / error. It won't prevent breakages, but at least it makes them obvious and allows us to fix them quickly. We could even instruct the agent to take that report and open a GH issue directly in the MCP response
There was a problem hiding this comment.
To address this, I did a bit of both:
- In 4c026d4, I updated to use Storybook's internal TypeScript types for the manifest object shape. This isn't complete, however, because it doesn't include the react-docgen types. And through some debugging I discovered that our current version of Storybook doesn't properly respect our use of react-docgen-typescript when generating the manifest. So it is very likely that this will in-fact break in future version upgrades (
reactDocgenbecomingreactDocgenTypeScript). - For that reason, I also added an extra check in 8f53c41 to ensure that the actual generated manifest can be parsed by the design-system-mcp package logic. This way, if the manifest shape changes (e.g. in future update) in a way that would prevent the MCP from being able to access component data, the CI build will fail.
There was a problem hiding this comment.
Looks like the manifest validation check is failing in CI, likely because the experimentalComponentsManifest option is only enabled in production envinronment, while the CI check runs in the test environment.
Maybe we can change the check from NODE_ENV === 'production' to NODE_ENV !== 'development' ?
There was a problem hiding this comment.
Yeah I think you're right. I was a bit confused because this was working locally for me, which would somehow imply that npm run storybook:build on my computer is NODE_ENV=production. I guess that's expected when the environment variable is not set. And because it's set in CI, we'll need to target both 'production' and 'test'. At that point 'development' is basically just npm run storybook:dev, which is the one place I think it'd be more important to not generate the manifests.
There was a problem hiding this comment.
Although I think in newer versions Storybook now defaults componentsManifest: true (no longer experimental, default to true). We could probably remove the option when we upgrade if we're not as concerned about preventing it being generated.
fbb9953 to
ac99e4b
Compare
| const missing: string[] = []; | ||
|
|
||
| for ( const componentName of names ) { | ||
| const detail = await getComponentDetail( componentName ); |
There was a problem hiding this comment.
Small nit (feel free to ignore):
the await in this for loop is effectively async only for the first iteration, since from the second iteration onwards the content is cached (in the getComponentDetail implementation).
Conceptually, it could make sense to rewrite using Promise.all/map for added robustness if caching ever changes?
There was a problem hiding this comment.
I think I see what you're saying, that this is fine, but only because we're piercing an abstraction to know how getComponentDetail would cache, but in a closed-box situation we would want to parallelize this.
I think it makes sense to change, but after starting to dig into it, it reveals a separate problem knock-on requirement to fetchComponents and getDesignTokens: Because we only cache the value once we finish fetching it, parallelized calls could trigger multiple simultaneous fetches. We can clean this up by caching the promise and not the value, but that's a bit of a bigger refactor that might be worth doing separate from this pull request as a quick follow-on. Especially since this should work well enough as-is.
There was a problem hiding this comment.
We can clean this up by caching the promise and not the value, but that's a bit of a bigger refactor that might be worth doing separate from this pull request as a quick follow-on. Especially since this should work well enough as-is.
I created a separate pull request at #78311 with this refactor. We could either merge it into this branch or separately after this one lands.
There was a problem hiding this comment.
Definitely good as a follow-up after this PR gets merged, IMO
| name: z | ||
| .union( [ | ||
| z.string().min( 1 ), | ||
| z.array( z.string().min( 1 ) ).min( 1 ), |
There was a problem hiding this comment.
Should also set a max limit to protect against huge reponses?
There was a problem hiding this comment.
Should also set a
maxlimit to protect against huge reponses?
It might be a good idea. It doesn't seem like the MCP spec itself sets any constraints on maximum tool response sizes, but I feel like I've seen agents reject especially large responses. I'm not sure what a reasonable maximum would be. Maybe 10 as an arbitrary number that should be safely compact enough and sufficient to fulfill most requests?
There was a problem hiding this comment.
Updated to set max of 10 component names in 7586be4.
There was a problem hiding this comment.
This file is currently published in the npm release. We could move it to scripts/ or change the files glob to exclude it?
There was a problem hiding this comment.
This file is currently published in the npm release. We could move it to
scripts/or change thefilesglob to exclude it?
Moved to scripts/ in c2ff825.
Separately, I noticed that we don't have any type-checking on these files, both bin/ and now scripts/. We'll probably want a separate config similar to what we do for the theme package, because just adding it to include causes it to produce compiled files in that folder.
ciampo
left a comment
There was a problem hiding this comment.
LGTM 🚀 Feels like the PR is in a good state to merge and iterate as needed in follow-ups
|
|
||
| export async function handler( { name }: z.infer< typeof inputSchema > ) { | ||
| const names = Array.isArray( name ) ? name : [ name ]; | ||
| const sections: string[] = []; |
There was a problem hiding this comment.
Apologies in case this comment doesn't make much sense, but — seeing that sections is an array, could it ever include duplicate entries (eg [ 'Button;, 'Button' ]) ? Would it make sense to make it a Set instead?
Also, separately — how do we differentiate between components with the same name from different packages? Button, for example, exists in @wordpress/ui and @wordpress/components. Is parseComponentDetail sylently dropping cross-package duplicates?
There was a problem hiding this comment.
In other words, what I believe may be happening as of now:
- calling
get_component_details("Button")returns only one package'sButton(the otherButtonis lost) - calling
get_component_details(["Button", "Button"])returns the same package'sButtontwice
Am I understanding correctly?
There was a problem hiding this comment.
I had designed it as an intentional constraint that there's at most one component of any given name in the design system (i.e. one "Button"), forcing us to choose between @wordpress/components and @wordpress/ui as the one that should be used today. In other words, it's an explicit goal of the MCP to provide clear guidance. That "choosing" happens manually ahead-of-time by us applying tags: ['manifest'] to the Storybook story, and rigor to make sure that we don't apply it to more than one component of the same name. In the future hopefully we can drop this manual tagging in favor of just deferring to whatever's in the canonical set of design system packages (@wordpress/ui, excluding @wordpress/components altogether).
That being said, I've had it in my mind that the MCP could be more context-aware in ways that would reveal this kind of duplication:
- Since many components are blocked by styling incompatibilities when used alongside
@wordpress/components, it would be nice if the client could be aware and express when knowing that that constraint doesn't apply (e.g. a greenfield project or self-contained screen, like the AI experiment and Connectors screen are already doing). - I think it could be good to provide some context to the agent about which components are "coming soon" or explicitly deprecated, so that it can weigh that information and relay to the user accordingly (e.g. "here's what to use today, but heads-up about XYZ coming in the future"). I might imagine this would end up being surfaced as a separate section(s) of components in the listing, and then maybe combined with additional parameters on the tool, or entirely separate tools for getting the detail.
As far as duplication, I think it's technically possible, but seems both unlikely and unproblematic to preemptively address. If we start seeing agents issuing tool calls with duplicate names, I feel like that suggests a bigger problem for why it chose to do that that should be addressed at the source rather than deduplicating the responses.
There was a problem hiding this comment.
Thank you for the extended context 👌 I agree with everything you said.
Also, none of the above should be a blocker for this PR, anyway.
May help catch future issues if the shape changes in future updates (i.e. TypeScript will start failing)
Because we're now testing the generated manifest
Exclude from published build
Avoid running into issues where agent clients won't process large tool responses. 10 components should be sufficient for most usage and fall within most client limitations.
Avoid hard-coding specific component name. We're primarily interested that we didn't lose the ability to parse props across components.
5207ab9 to
d851ecb
Compare
What?
Updates the WordPress Design System MCP
get_component_detailstool to allow retrieving multiple components' details at once by passing an array of names, while still supporting individual component details by passing a string.Why?
AI agents will often try to limit their tool use. When working on a complex user interaction that might involve several components, agents interacting with the current server implementation may only retrieve details for a few components. If there are more relevant components, it has been observed to make assumptions (often incorrect) about how a component should work, rather than make another tool call. We can anticipate and work within this tool call constraint by allowing a single tool call to retrieve information for several components at once, supporting common workflows where a developer would be arranging several design system components into a UI or component implementation.
How?
Testing Instructions
Repeat testing instructions from #77159.
In MCP inspector tool, observe that
get_componentstool accepts an array of component names.In real agent usage, provide the agent a prompt which asks it to retrieve details about several components. (Example, which covers both existing and missing components: "In the WordPress Design System, what's the difference between Button, ToggleButton, and MenuButton? When should I use each?").
Testing Instructions for Keyboard
Screenshots or screencast
Initial state:
Single component:
Multiple components:
Claude Desktop example prompt, showing multiple requests for components and handling missing components:
Use of AI Tools
Implemented using Claude Code Opus 4.7. Several rounds of iteration, particularly around input schema and test coverage approach.