diff --git a/services/testing/README.md b/services/testing/README.md
index cb6c908f..241f4e32 100644
--- a/services/testing/README.md
+++ b/services/testing/README.md
@@ -24,10 +24,20 @@ This directory is on the Python path via `pyproject.toml`
 - `spec_collector.py` — pytest plugin (registered via `pytest_plugins` in the
   repo-root `conftest.py`). Turns each MD spec into a pytest item that builds
   the service payload, calls the service via `ApolloClient`, and runs the judge.
-  Any project YAML in the response (`response_yaml`, `workflow_yaml`,
-  `content_yaml`, or a `workflow_yaml` attachment) is written to a `tmp/`
-  folder next to the spec file (e.g.
-  `services/workflow_chat/tests/acceptance/tmp/.yaml`) for inspection.
+  For inspection, three artifacts are written to a `tmp/` folder next to the
+  spec file:
+  - `.yaml` — any project YAML in the response (`response_yaml`,
+    `workflow_yaml`, `content_yaml`, or a `workflow_yaml` attachment).
+  - `.txt` — the response text, prefixed with the agent path
+    (e.g. `agents: router -> planner -> job_code_agent`).
+  - `.judges.txt` — the judge verdict(s) in the same format printed
+    during the run (per-judge PASS/FAIL header + criteria/flags summary).
+
+  Filenames use `__` as the metadata separator (the spec id and extension only
+  use `.`/`-`), so they stay splittable. Multi-run specs append `__run-N`. Pass
+  `-E