OpenHands · aivong-openhands · May 1, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -99,7 +99,9 @@ When editing or adding skills in this repo, follow these rules (and add new skil
 ## CI / validation gotchas
 
 - The test suite expects **every directory under `skills/`** to be listed in a marketplace. If you add a new skill (or rebase onto a main branch that added skills), update the appropriate marketplace file or CI will fail with `Skills missing from marketplace: [...]`.
+- New marketplace skills also need `README.md` and `.plugin/plugin.json`; otherwise `tests/test_skills_have_readme.py` and `tests/test_skill_plugin_loading.py` will fail.
 - `scripts/sync_extensions.py` keeps generated artifacts in sync: Claude Code command files, README catalog section, coverage checks, and vendor symlinks. Run `python scripts/sync_extensions.py --check` (or just push — CI runs it) to verify everything is consistent. Run without `--check` to auto-fix.
+- After adding `.plugin/plugin.json` for a skill, re-run `scripts/sync_extensions.py` so the expected `.claude-plugin` and `.codex-plugin` symlinks are created.
 - The sync script uses PyYAML to parse SKILL.md frontmatter. If you add a skill with a slash trigger (e.g., `triggers: ["/mycommand"]`), the script auto-generates `commands/mycommand.md`. **Note:** Slash triggers in SKILL.md frontmatter are deprecated — prefer adding a `commands/command-name.md` file to the plugin's `commands/` directory instead. Keyword triggers (non-slash) remain the recommended way to activate skills by topic.
 
 ## OpenHands SDK documentation policy

diff --git a/README.md b/README.md
@@ -32,7 +32,7 @@ Browse available plugins in [`plugins/`](plugins/).
 ## Extensions Catalog
 
 <!-- BEGIN AUTO-GENERATED CATALOG -->
-This repository contains **2 marketplace(s)** with **47 extensions** (38 skills, 9 plugins).
+This repository contains **2 marketplace(s)** with **51 extensions** (42 skills, 9 plugins).
 
 ### Quick Start
 
@@ -87,14 +87,14 @@ OpenHands skills for interacting, improving, and refactoring large codebases
 
 ### openhands-extensions
 
-Official skills and plugins for OpenHands — the open-source AI software engineer.
+Official skills and plugins for OpenHands - the open-source AI software engineer.
 
-**43 extensions** (36 skills, 7 plugins)
+**47 extensions** (40 skills, 7 plugins)
 
 | Name | Type | Description | Commands |
 |------|------|-------------|----------|
 | add-skill | skill | Add (import) an OpenHands skill from a GitHub repository into the current workspace. | — |
-| agent-creator | skill | Create file-based sub-agents as Markdown files — no Python code required. Guides the user through a structured interv... | `/agent-creator` |
+| agent-creator | skill | Create file-based sub-agents as Markdown files - no Python code required. Guides the user through a structured interv... | `/agent-creator` |
 | agent-memory | skill | Persist and retrieve repository-specific knowledge using AGENTS.md files. Use when you want to save important informa... | `/remember` |
 | agent-sdk-builder | skill | Guided workflow for building custom AI agents using the OpenHands Software Agent SDK. Use when you want to create a n... | `/agent-builder` |
 | azure-devops | skill | Interact with Azure DevOps repositories, pull requests, and APIs using the AZURE_DEVOPS_TOKEN environment variable. U... | — |
@@ -111,7 +111,7 @@ Official skills and plugins for OpenHands — the open-source AI software engine
 | github | skill | Interact with GitHub repositories, pull requests, issues, and workflows using the GITHUB_TOKEN environment variable a... | — |
 | github-pr-review | skill | Post structured PR reviews to GitHub with inline comments/suggestions in a single API call. | `/github-pr-review` |
 | gitlab | skill | Interact with GitLab repositories, merge requests, and APIs using the GITLAB_TOKEN environment variable. Use when wor... | — |
-| iterate | skill | Iterate on a GitHub pull request — drive it through CI, code review, and QA until merge-ready. Monitors state, fixes ... | `/iterate`, `/verify`, `/babysit` |
+| iterate | skill | Iterate on a GitHub pull request - drive it through CI, code review, and QA until merge-ready. Monitors state, fixes ... | `/iterate`, `/verify`, `/babysit` |
 | jupyter | skill | Read, modify, execute, and convert Jupyter notebooks programmatically. Use when working with .ipynb files for data sc... | — |
 | kubernetes | skill | Set up and manage local Kubernetes clusters using KIND (Kubernetes IN Docker). Use when testing Kubernetes applicatio... | — |
 | learn-from-code-review | skill | Distill code review feedback from GitHub PRs into reusable skills and guidelines. Use when users ask to learn from co... | `/learn-from-reviews` |
@@ -120,18 +120,22 @@ Official skills and plugins for OpenHands — the open-source AI software engine
 | notion | skill | Create, search, and update Notion pages/databases using the Notion API. Use for documenting work, generating runbooks... | — |
 | npm | skill | Handle npm package installation in non-interactive environments by piping confirmations. Use when installing Node.js ... | — |
 | onboarding | plugin | Assess repository agent-readiness across five pillars, propose high-impact fixes, and generate repo-specific AGENTS.m... | — |
-| openhands | plugin | Unified OpenHands plugin — bundles Cloud CLI, REST API (openhands-api), and Automations (openhands-automation) into a... | `/openhands-cloud` |
+| openhands | plugin | Unified OpenHands plugin - bundles Cloud CLI, REST API (openhands-api), and Automations (openhands-automation) into a... | `/openhands-cloud` |
 | openhands-api | skill | Use the OpenHands Cloud REST API (V1) to create and manage app conversations, including multi-conversation delegation... | — |
 | openhands-automation | skill | Create and manage OpenHands automations - scheduled tasks that run in sandboxes. Use the prompt preset to create auto... | `/automation:create` |
 | openhands-sdk | skill | Reference skill for the OpenHands Software Agent SDK - build AI agents with custom tools, LLM configuration, conversa... | `/sdk` |
 | pdflatex | skill | Install and use pdflatex to compile LaTeX documents into PDFs on Linux. Use when generating academic papers, research... | — |
-| pr-review | plugin | Automated PR code review — analyzes diffs and posts inline review comments via the GitHub API. | — |
+| pr-review | plugin | Automated PR code review - analyzes diffs and posts inline review comments via the GitHub API. | — |
 | prd | skill | Generate a Product Requirements Document (PRD) for a new feature through an interactive clarifying-question workflow.... | `/prd` |
 | release-notes | plugin | Generate consistent, well-structured release notes from git history. Produces categorized changelog with breaking cha... | `/release-notes` |
 | security | skill | Security best practices for secure coding, authentication, authorization, and data protection. Use when developing fe... | — |
 | skill-creator | skill | Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an ex... | — |
 | ssh | skill | Establish and manage SSH connections to remote machines, including key generation, configuration, and file transfers.... | — |
 | swift-linux | skill | Install and configure Swift programming language on Debian Linux for server-side development. Use when building Swift... | — |
+| test-improvement-patterns | skill | Common execution patterns for implementing validated test-suite improvements with safe refactoring and TDD loops. | — |
+| test-improvement-workflow | skill | Concise orchestrator for auditing, prioritizing, validating, and implementing test-suite improvements using Dave Farl... | — |
+| test-prioritization-framework | skill | Reliability-first framework for consolidating and prioritizing test audit findings into CRITICAL, HIGH, and MEDIUM work. | — |
+| test-validation-checklist | skill | Checklist for validating proposed test-suite improvements against the real code before implementation. | — |
 | theme-factory | skill | Toolkit for styling artifacts with a theme. These artifacts can be slides, docs, reportings, HTML landing pages, etc.... | — |
 | uv | skill | Common project, dependency, and environment operations using uv. | — |
 | vercel | skill | Deploy and manage applications on Vercel, including preview deployments and deployment protection. | — |

diff --git a/marketplaces/openhands-extensions.json b/marketplaces/openhands-extensions.json
@@ -5,15 +5,15 @@
         "email": "contact@all-hands.dev"
     },
     "metadata": {
-        "description": "Official skills and plugins for OpenHands — the open-source AI software engineer.",
+        "description": "Official skills and plugins for OpenHands - the open-source AI software engineer.",
         "maintainer": "OpenHands",
         "homepage": "https://github.com/OpenHands/extensions"
     },
     "plugins": [
         {
             "name": "agent-creator",
             "source": "./skills/agent-creator",
-            "description": "Create file-based sub-agents as Markdown files — no Python code required. Guides the user through a structured interview and generates a ready-to-deploy .md agent file following the OpenHands SDK specification.",
+            "description": "Create file-based sub-agents as Markdown files - no Python code required. Guides the user through a structured interview and generates a ready-to-deploy .md agent file following the OpenHands SDK specification.",
             "category": "development",
             "keywords": [
                 "agent",
@@ -362,7 +362,7 @@
         {
             "name": "openhands",
             "source": "./plugins/openhands",
-            "description": "Unified OpenHands plugin — bundles Cloud CLI, REST API (openhands-api), and Automations (openhands-automation) into a single plugin.",
+            "description": "Unified OpenHands plugin - bundles Cloud CLI, REST API (openhands-api), and Automations (openhands-automation) into a single plugin.",
             "category": "openhands",
             "keywords": [
                 "openhands",
@@ -403,7 +403,7 @@
         {
             "name": "pr-review",
             "source": "./plugins/pr-review",
-            "description": "Automated PR code review — analyzes diffs and posts inline review comments via the GitHub API.",
+            "description": "Automated PR code review - analyzes diffs and posts inline review comments via the GitHub API.",
             "category": "code-quality",
             "keywords": [
                 "pr-review",
@@ -536,7 +536,7 @@
         {
             "name": "iterate",
             "source": "./skills/iterate",
-            "description": "Iterate on a GitHub pull request — drive it through CI, code review, and QA until merge-ready. Monitors state, fixes failures, addresses review feedback, retries flaky checks, and pushes fixes in one continuous loop.",
+            "description": "Iterate on a GitHub pull request - drive it through CI, code review, and QA until merge-ready. Monitors state, fixes failures, addresses review feedback, retries flaky checks, and pushes fixes in one continuous loop.",
             "category": "productivity",
             "keywords": [
                 "github",
@@ -546,6 +546,58 @@
                 "pull-request",
                 "iterate"
             ]
+        },
+        {
+            "name": "test-improvement-workflow",
+            "source": "./skills/test-improvement-workflow",
+            "description": "Concise orchestrator for auditing, prioritizing, validating, and implementing test-suite improvements using Dave Farley's 8 properties.",
+            "category": "testing",
+            "keywords": [
+                "testing",
+                "test-quality",
+                "workflow",
+                "dave-farley",
+                "audit"
+            ]
+        },
+        {
+            "name": "test-prioritization-framework",
+            "source": "./skills/test-prioritization-framework",
+            "description": "Reliability-first framework for consolidating and prioritizing test audit findings into CRITICAL, HIGH, and MEDIUM work.",
+            "category": "testing",
+            "keywords": [
+                "testing",
+                "prioritization",
+                "test-quality",
+                "audit",
+                "reliability"
+            ]
+        },
+        {
+            "name": "test-validation-checklist",
+            "source": "./skills/test-validation-checklist",
+            "description": "Checklist for validating proposed test-suite improvements against the real code before implementation.",
+            "category": "testing",
+            "keywords": [
+                "testing",
+                "validation",
+                "audit",
+                "checklist",
+                "verification"
+            ]
+        },
+        {
+            "name": "test-improvement-patterns",
+            "source": "./skills/test-improvement-patterns",
+            "description": "Common execution patterns for implementing validated test-suite improvements with safe refactoring and TDD loops.",
+            "category": "testing",
+            "keywords": [
+                "testing",
+                "patterns",
+                "refactoring",
+                "tdd",
+                "test-quality"
+            ]
         }
     ]
 }
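The AGENTS.md note added in this change says new marketplace skills need `README.md` and `.plugin/plugin.json`, plus `.claude-plugin` and `.codex-plugin` symlinks. A minimal sketch of a local check for those artifacts might look like the following. `missing_skill_artifacts` is a hypothetical helper, not part of the repository's tests or of `scripts/sync_extensions.py`, and it assumes the vendor symlinks are relative links to `.plugin`, as the symlink entries in this diff suggest.

```python
from __future__ import annotations

import os
from pathlib import Path

# Files the AGENTS.md note lists as required for a marketplace skill.
REQUIRED_FILES = ("README.md", ".plugin/plugin.json")
# Vendor symlinks this diff adds next to each skill's .plugin directory.
EXPECTED_SYMLINKS = (".claude-plugin", ".codex-plugin")


def missing_skill_artifacts(skill_dir: Path) -> list[str]:
    """Return the required files and symlinks that are absent from skill_dir.

    Hypothetical helper for illustration only; the real checks live in the
    repository's own tests and sync script.
    """
    problems = []
    for rel in REQUIRED_FILES:
        if not (skill_dir / rel).is_file():
            problems.append(rel)
    for name in EXPECTED_SYMLINKS:
        link = skill_dir / name
        # Assumption: each vendor link is a relative symlink to ".plugin".
        if not (link.is_symlink() and os.readlink(link) == ".plugin"):
            problems.append(name)
    return problems
```

Calling `missing_skill_artifacts(Path("skills/test-improvement-patterns"))` from the repository root would return an empty list once the files and symlinks in this diff are in place.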
diff --git a/skills/test-improvement-patterns/.claude-plugin b/skills/test-improvement-patterns/.claude-plugin
@@ -0,0 +1 @@
+.plugin
diff --git a/skills/test-improvement-patterns/.codex-plugin b/skills/test-improvement-patterns/.codex-plugin
@@ -0,0 +1 @@
+.plugin
diff --git a/skills/test-improvement-patterns/.plugin/plugin.json b/skills/test-improvement-patterns/.plugin/plugin.json
@@ -0,0 +1,19 @@
+{
+  "name": "test-improvement-patterns",
+  "version": "1.0.0",
+  "description": "Common execution patterns for implementing validated test-suite improvements with safe refactoring and TDD loops.",
+  "author": {
+    "name": "OpenHands",
+    "email": "contact@all-hands.dev"
+  },
+  "homepage": "https://github.com/OpenHands/extensions",
+  "repository": "https://github.com/OpenHands/extensions",
+  "license": "MIT",
+  "keywords": [
+    "testing",
+    "patterns",
+    "refactoring",
+    "tdd",
+    "test-quality"
+  ]
+}
diff --git a/skills/test-improvement-patterns/README.md b/skills/test-improvement-patterns/README.md
@@ -0,0 +1,12 @@
+# Test Improvement Patterns
+
+Execution guidance and recurring patterns for implementing validated test-suite improvements.
+
+## What it covers
+
+- when to use TDD versus refactoring
+- safe execution and commit loops
+- recurring clean-up patterns for tests
+- commit naming and verification reminders
+
+Use this after prioritization and validation, when you are ready to change code.
diff --git a/skills/test-improvement-patterns/SKILL.md b/skills/test-improvement-patterns/SKILL.md
@@ -0,0 +1,171 @@
+---
+name: test-improvement-patterns
+description: >
+  Common execution patterns for implementing validated test-suite improvements.
+  Use after planning work to apply safe refactoring, TDD loops, and recurring
+  test clean-up patterns.
+triggers:
+  - test improvement patterns
+  - apply test refactoring patterns
+  - execute test improvements
+version: 1.0.0
+metadata:
+  openhands:
+    requires:
+      bins: ["pytest", "git"]
+---
+
+# Test Improvement Patterns
+
+Use this skill after the user approves validated improvements. It focuses on how to execute the work safely and efficiently.
+
+## Planning principles
+
+- Use `tdd` when the change introduces new behavior or new helper code.
+- Use `refactoring` when the change should preserve behavior.
+- Keep changes small enough to verify after each phase.
+- Commit before and after large refactoring phases so you always have a safe checkpoint.
+
+## Execution loops
+
+### TDD loop for new behavior
+
+```bash
+# 1. Write a failing test, then run it and confirm it fails
+pytest test_file.py::TestClass::test_method -v
+
+# 2. Implement the minimum code to pass, then rerun until it passes
+pytest test_file.py::TestClass::test_method -v
+
+# 3. Commit the working change before further cleanup
+git add -A && git commit -m "feat: add helper method"
+```
+
+### Refactoring loop for existing behavior
+
+```bash
+# 1. Save the known-good state
+git add -A && git commit -m "checkpoint: before refactoring"
+
+# 2. Make behavior-preserving changes, then rerun the tests to confirm they still pass
+pytest test_file.py -v
+
+# 3. Commit the refactor
+git add -A && git commit -m "refactor: consolidate tests with parametrize"
+```
+
+## Commit message conventions
+
+| Change type | Format | Example |
+|---|---|---|
+| New feature | `feat: <description>` | `feat: add is_unchanged helper` |
+| Refactoring | `refactor: <description>` | `refactor: consolidate tests with parametrize` |
+| Tests only | `test: <description>` | `test: add failing coverage for helper` |
+
+## Code hygiene reminder
+
+Keep imports at the top of the file. Do not hide imports inside individual tests unless delayed import behavior is the subject of the test.
+
+## Common improvement patterns
+
+### Pattern 0: Replace session-scoped mutable fixtures
+
+**Problem**: shared fixture state leaks between tests.
+
+```python
+@pytest.fixture(autouse=True, scope='session')
+def mock_api_client():
+    with mock_service() as client:
+        service._client = client
+        yield client
+        service._client = None
+```
+
+**Preferred direction**: use function scope unless there is a strong, proven reason not to.
+
+```python
+@pytest.fixture(autouse=True)
+def mock_api_client():
+    with mock_service() as client:
+        service._client = client
+        yield client
+        service._client = None
+```
+
+### Pattern 1: Reduce implementation coupling
+
+**Problem**: tests reach into internal data structures.
+
+```python
+assert ("key", "value") in result.unchanged
+```
+
+**Preferred direction**: assert through a stable interface.
+
+```python
+assert result.is_unchanged("key")
+```
+
+### Pattern 2: Consolidate with parameterization
+
+**Problem**: many tests differ only by input and expected output.
+
+```python
+def test_case_a(self):
+    assert func("a") == "A"
+
+def test_case_b(self):
+    assert func("b") == "B"
+```
+
+**Preferred direction**: collapse them into one parametrized test.
+
+```python
+@pytest.mark.parametrize("value,expected", [("a", "A"), ("b", "B")])
+def test_func_transforms_correctly(self, value, expected):
+    assert func(value) == expected
+```
+
+### Pattern 3: Extract common assertions
+
+**Problem**: the same assertion bundle appears across multiple tests.
+
+```python
+content = file.read_text()
+assert "key1" in content
+assert "key2" in content
+```
+
+**Preferred direction**: move the repeated assertion logic into a reusable helper.
+
+```python
+def assert_file_contains_all(file_path, expected_strings):
+    content = file_path.read_text()
+    for expected in expected_strings:
+        assert expected in content
+```
+
+### Pattern 4: Replace `time.sleep()` with time control
+
+**Problem**: real delays make tests slow and flaky.
+
+```python
+time.sleep(1)
+assert runtime.running_time >= 1.0
+```
+
+**Preferred direction**: use deterministic time control.
+
+```python
+from datetime import timedelta
+from freezegun import freeze_time
+
+with freeze_time("2025-01-01 12:00:00") as frozen_time:
+    start_runtime()
+    frozen_time.tick(delta=timedelta(seconds=5))
+    assert runtime.running_time >= 5.0
+```
+
+## Final verification reminder
+
+After each pattern application, rerun the smallest relevant test slice first, then the broader suite needed to prove no regression.
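Pattern 1 in the SKILL.md above asserts through `result.is_unchanged("key")` but does not show the helper itself. One possible shape for it is sketched below. `DiffResult` is a hypothetical type invented for illustration, and the `unchanged` set of `(key, value)` tuples is an assumption inferred from the "Problem" snippet, not a structure confirmed by the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiffResult:
    """Hypothetical result type; `unchanged` mirrors the tuples from Pattern 1."""

    unchanged: set[tuple[str, str]] = field(default_factory=set)

    def is_unchanged(self, key: str) -> bool:
        # Tests call this instead of inspecting the (key, value) tuples,
        # so the internal container can change without breaking them.
        return any(k == key for k, _ in self.unchanged)


# Usage: the test no longer needs to know how `unchanged` is stored.
result = DiffResult(unchanged={("key", "value")})
assert result.is_unchanged("key")
assert not result.is_unchanged("other")
```

Because the helper adds new behavior, it would follow the TDD loop described in the skill: write a failing test for `is_unchanged` first, then commit it under a `feat:` message.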
diff --git a/skills/test-improvement-workflow/.claude-plugin b/skills/test-improvement-workflow/.claude-plugin
@@ -0,0 +1 @@
+.plugin
diff --git a/skills/test-improvement-workflow/.codex-plugin b/skills/test-improvement-workflow/.codex-plugin
@@ -0,0 +1 @@
+.plugin