-
Notifications
You must be signed in to change notification settings - Fork 2.8k
draft: add SkillStore abstraction #11480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
tstadel
wants to merge
15
commits into
skills-toolset
Choose a base branch
from
skills-toolset-skill-store-abstraction
base: skills-toolset
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+533
−164
Draft
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
0d92db9
draft: add SkillStore abstraction
tstadel 99a2016
Update haystack/tools/skills/skill_store.py
tstadel 40ce5a6
Update haystack/tools/skills/skill_store.py
tstadel 7905338
convert into protocol
tstadel 4d6fafe
improve protocol
tstadel 4cdd5f4
remove asserts
tstadel 01ba7c1
fix docstrings
tstadel ccdee75
remove path and str params
tstadel 5b1509a
refactor modules
tstadel 85db16a
docstrings
tstadel c28c831
docstrings
tstadel c466a6c
enforce default_from_dict
tstadel 60769d4
apply feedback
tstadel f468da3
fix headers
tstadel 783b7bf
fix headers
tstadel File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| from dataclasses import dataclass, field | ||
| from pathlib import Path | ||
|
|
||
|
|
||
| @dataclass | ||
| class SkillMeta: | ||
| """ | ||
| Metadata describing a single skill. | ||
|
|
||
| :param name: The skill's name, used by the agent to load it. | ||
| :param description: A short description of when to use the skill. Shown to the agent up front. | ||
| :param path: The skill's directory. Set by `FileSystemSkillStore`; can be `None` for other stores. | ||
| """ | ||
|
|
||
| name: str | ||
| description: str | ||
| path: Path | None = field(default=None) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| import sys | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from lazy_imports import LazyImporter | ||
|
|
||
| _import_structure = {"skill_store": ["FileSystemSkillStore"]} | ||
|
|
||
| if TYPE_CHECKING: | ||
| from .skill_store import FileSystemSkillStore as FileSystemSkillStore | ||
| else: | ||
| sys.modules[__name__] = LazyImporter(name=__name__, module_file=__file__, import_structure=_import_structure) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| import yaml | ||
|
|
||
| from haystack.core.serialization import default_from_dict, default_to_dict | ||
| from haystack.dataclasses.skill_meta import SkillMeta | ||
| from haystack.skill_stores.types.protocol import SKILL_FILE_NAME | ||
|
|
||
|
|
||
| def _parse_frontmatter(text: str) -> tuple[dict[str, Any], str]: | ||
| """ | ||
| Split a `SKILL.md` file into its YAML frontmatter and markdown body. | ||
|
|
||
| The frontmatter is the YAML block delimited by leading and trailing `---` lines. If no frontmatter is | ||
| present, an empty mapping and the original text are returned. | ||
|
|
||
| :param text: The full contents of a `SKILL.md` file. | ||
| :returns: A tuple of (frontmatter mapping, body). | ||
| :raises ValueError: If the frontmatter is present but is not a valid YAML mapping. | ||
| """ | ||
| stripped = text.lstrip() | ||
| if not stripped.startswith("---"): | ||
| return {}, text | ||
|
|
||
| # Drop the leading '---' line, then split on the closing '---'. | ||
| after_open = stripped[len("---") :].lstrip("\n") | ||
| parts = after_open.split("\n---", 1) | ||
| if len(parts) != 2: | ||
| return {}, text | ||
|
|
||
| frontmatter_block, body = parts | ||
| loaded = yaml.safe_load(frontmatter_block) or {} | ||
| if not isinstance(loaded, dict): | ||
| raise ValueError("Skill frontmatter must be a YAML mapping.") # noqa: TRY004 | ||
| return loaded, body.lstrip("\n") | ||
|
|
||
|
|
||
| class FileSystemSkillStore: | ||
| """ | ||
| SkillStore backed by a directory of skill sub-directories on the local filesystem. | ||
|
|
||
| Expected layout: | ||
|
|
||
| ``` | ||
| skills/ | ||
| pdf-forms/ | ||
| SKILL.md # frontmatter (name, description) + markdown instructions | ||
| reference/forms.md # optional bundled file | ||
| ``` | ||
|
|
||
| Only the frontmatter of each `SKILL.md` is read at construction time (cheap); bodies and bundled | ||
| files are read lazily when the agent calls the corresponding tool. | ||
|
|
||
| :param skills_dir: Root directory that contains one sub-directory per skill. | ||
| :raises ValueError: If `skills_dir` does not exist, is not a directory, a skill is missing a required | ||
| frontmatter field, or two skills share the same name. | ||
| """ | ||
|
|
||
| def __init__(self, skills_dir: str | Path) -> None: | ||
| self.skills_dir = Path(skills_dir) | ||
| self._skills = self._scan() | ||
|
|
||
| def _scan(self) -> dict[str, SkillMeta]: | ||
| if not self.skills_dir.is_dir(): | ||
| raise ValueError(f"Skills directory '{self.skills_dir}' does not exist or is not a directory.") | ||
|
|
||
| skills: dict[str, SkillMeta] = {} | ||
| for skill_file in sorted(self.skills_dir.glob(f"*/{SKILL_FILE_NAME}")): | ||
| skill_dir = skill_file.parent | ||
| frontmatter, _ = _parse_frontmatter(skill_file.read_text(encoding="utf-8")) | ||
|
|
||
| name = frontmatter.get("name", skill_dir.name) | ||
| description = frontmatter.get("description") | ||
| if not description: | ||
| raise ValueError(f"Skill '{name}' ({skill_file}) is missing a 'description' in its frontmatter.") | ||
| if name in skills: | ||
| raise ValueError(f"Duplicate skill name '{name}' found in '{self.skills_dir}'.") | ||
|
|
||
| skills[name] = SkillMeta(name=name, description=description, path=skill_dir) | ||
| return skills | ||
|
|
||
| def list_skills(self) -> dict[str, SkillMeta]: | ||
| """Lists all skills available on disk""" | ||
| return self._skills | ||
|
|
||
| def load_skill_body(self, name: str) -> str: | ||
| """Loads the skill body from disk""" | ||
| meta = self._skills.get(name) | ||
| if meta is None: | ||
| raise KeyError(name) | ||
| if meta.path is None: | ||
| raise ValueError(f"Skill '{name}' is missing its directory path in metadata.") | ||
| _, body = _parse_frontmatter((meta.path / SKILL_FILE_NAME).read_text(encoding="utf-8")) | ||
| return body | ||
|
|
||
| def list_skill_files(self, name: str) -> list[str]: | ||
| """List all files in a skill directory, excluding the SKILL.md file.""" | ||
| meta = self._skills.get(name) | ||
| if meta is None: | ||
| raise KeyError(name) | ||
| if meta.path is None: | ||
| raise ValueError(f"Skill '{name}' is missing its directory path in metadata.") | ||
| return sorted( | ||
| p.relative_to(meta.path).as_posix() | ||
| for p in meta.path.rglob("*") | ||
| if p.is_file() and p.name != SKILL_FILE_NAME | ||
| ) | ||
|
|
||
| def read_skill_file(self, name: str, path: str) -> str: | ||
| """read_skill_file implementation that prevents path traversal outside the skill directory.""" | ||
| meta = self._skills.get(name) | ||
| if meta is None: | ||
| raise KeyError(name) | ||
| if meta.path is None: | ||
| raise ValueError(f"Skill '{name}' is missing its directory path in metadata.") | ||
| skill_dir = meta.path.resolve() | ||
| target = (skill_dir / path).resolve() | ||
| if skill_dir != target and skill_dir not in target.parents: | ||
| raise PermissionError(f"path escapes the '{name}' skill directory") | ||
| if not target.is_file(): | ||
| raise FileNotFoundError(f"File '{path}' not found in skill '{name}'") | ||
| return target.read_text(encoding="utf-8") | ||
|
|
||
| def to_dict(self) -> dict[str, Any]: | ||
| """Serialize this store to a dictionary for use with :meth:`from_dict`.""" | ||
| return default_to_dict(self, skills_dir=str(self.skills_dir)) | ||
|
|
||
| @classmethod | ||
| def from_dict(cls, data: dict[str, Any]) -> "FileSystemSkillStore": | ||
| """Deserialize a FileSystemSkillStore from its dictionary representation.""" | ||
| return default_from_dict(cls, data) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
tstadel marked this conversation as resolved.
|
||
|
|
||
| from .protocol import SkillStore | ||
|
|
||
| __all__ = ["SkillStore"] | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you make a version of this SkillStore using a db (like the one to be used in platform) to make sure this protocol definition works for at least two different backends? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| # SPDX-FileCopyrightText: 2022-present deepset GmbH <info@deepset.ai> | ||
| # | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| from typing import Any, Protocol, runtime_checkable | ||
|
|
||
| from haystack.dataclasses.skill_meta import SkillMeta | ||
|
|
||
| SKILL_FILE_NAME = "SKILL.md" | ||
|
|
||
|
|
||
| @runtime_checkable | ||
| class SkillStore(Protocol): | ||
| """ | ||
| Protocol for a skill storage layer. | ||
|
|
||
| A `SkillStore` is responsible for discovering available skills and providing their content on demand. | ||
| Implement this class to back `haystack.tools.SkillToolset` with any storage system — a local | ||
| directory, a database, a remote API, or an in-memory fixture. | ||
|
|
||
| The three content methods (`load_skill_body`, `list_skill_files`, | ||
| `read_skill_file`) are called lazily at agent runtime, not at construction time, so | ||
| implementations may defer I/O until a skill is actually needed. | ||
| """ | ||
|
|
||
| def list_skills(self) -> dict[str, SkillMeta]: | ||
| """ | ||
| Discover and return all available skills. | ||
|
|
||
| Called once during `haystack.tools.SkillToolset` initialization to build the skills catalog. | ||
|
|
||
| :returns: Mapping of skill name to its metadata. | ||
| """ | ||
| ... | ||
|
|
||
|
tstadel marked this conversation as resolved.
|
||
| def load_skill_body(self, name: str) -> str: | ||
| """ | ||
| Return the markdown body of the named skill's instructions. | ||
|
|
||
| :param name: Skill name as returned by `list_skills`. | ||
| :returns: The raw markdown body (frontmatter stripped). | ||
| :raises KeyError: If no skill with `name` exists. | ||
| """ | ||
| ... | ||
|
|
||
| def list_skill_files(self, name: str) -> list[str]: | ||
| """ | ||
| Return the relative paths of any files bundled with the named skill. | ||
|
|
||
| :param name: Skill name as returned by `list_skills`. | ||
| :returns: Sorted list of POSIX-style paths relative to the skill root. Empty when there are no extras. | ||
| :raises KeyError: If no skill with `name` exists. | ||
| """ | ||
| ... | ||
|
|
||
| def read_skill_file(self, name: str, path: str) -> str: | ||
| """ | ||
| Read a file bundled with the named skill. | ||
|
|
||
| :param name: Skill name as returned by `list_skills`. | ||
| :param path: Path of the file relative to the skill root (e.g. `"reference/forms.md"`). | ||
| :returns: The file's text content. | ||
| :raises KeyError: If no skill with `name` exists. | ||
| :raises PermissionError: If `path` escapes the skill's root (path-traversal attempt). | ||
| :raises FileNotFoundError: If the file does not exist within the skill. | ||
| """ | ||
| ... | ||
|
|
||
| def to_dict(self) -> dict[str, Any]: | ||
| """ | ||
| Serialize this store to a dictionary for use with `from_dict`. | ||
|
|
||
| Override both this method and `from_dict` to make your custom store serializable with | ||
| `haystack.tools.SkillToolset`. | ||
| """ | ||
| ... | ||
|
|
||
| @classmethod | ||
| def from_dict(cls, data: dict[str, Any]) -> "SkillStore": | ||
| """ | ||
| Deserialize a store from a dictionary produced by `to_dict`. | ||
|
|
||
| Override both this method and `to_dict` to make your custom store serializable with | ||
| `haystack.tools.SkillToolset`. | ||
|
|
||
| :param data: Dictionary as produced by `to_dict`. | ||
| """ | ||
| ... | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.