diff --git a/.claude/skills/database-migrations/SKILL.md b/.claude/skills/database-migrations/SKILL.md new file mode 100644 index 0000000..b1562c2 --- /dev/null +++ b/.claude/skills/database-migrations/SKILL.md @@ -0,0 +1,429 @@ +--- +name: database-migrations +description: Database migration best practices for schema changes, data migrations, rollbacks, and zero-downtime deployments across PostgreSQL, MySQL, and common ORMs (Prisma, Drizzle, Kysely, Django, TypeORM, golang-migrate). +origin: ECC +--- + +# Database Migration Patterns + +Safe, reversible database schema changes for production systems. + +## When to Activate + +- Creating or altering database tables +- Adding/removing columns or indexes +- Running data migrations (backfill, transform) +- Planning zero-downtime schema changes +- Setting up migration tooling for a new project + +## Core Principles + +1. **Every change is a migration** — never alter production databases manually +2. **Migrations are forward-only in production** — rollbacks use new forward migrations +3. **Schema and data migrations are separate** — never mix DDL and DML in one migration +4. **Test migrations against production-sized data** — a migration that works on 100 rows may lock on 10M +5. **Migrations are immutable once deployed** — never edit a migration that has run in production + +## Migration Safety Checklist + +Before applying any migration: + +- [ ] Migration has both UP and DOWN (or is explicitly marked irreversible) +- [ ] No full table locks on large tables (use concurrent operations) +- [ ] New columns have defaults or are nullable (never add NOT NULL without default) +- [ ] Indexes created concurrently (not inline with CREATE TABLE for existing tables) +- [ ] Data backfill is a separate migration from schema change +- [ ] Tested against a copy of production data +- [ ] Rollback plan documented + +## PostgreSQL Patterns + +### Adding a Column Safely + +```sql +-- GOOD: Nullable column, no lock +ALTER TABLE users ADD COLUMN avatar_url TEXT; + +-- GOOD: Column with default (Postgres 11+ is instant, no rewrite) +ALTER TABLE users ADD COLUMN is_active BOOLEAN NOT NULL DEFAULT true; + +-- BAD: NOT NULL without default on existing table (requires full rewrite) +ALTER TABLE users ADD COLUMN role TEXT NOT NULL; +-- This locks the table and rewrites every row +``` + +### Adding an Index Without Downtime + +```sql +-- BAD: Blocks writes on large tables +CREATE INDEX idx_users_email ON users (email); + +-- GOOD: Non-blocking, allows concurrent writes +CREATE INDEX CONCURRENTLY idx_users_email ON users (email); + +-- Note: CONCURRENTLY cannot run inside a transaction block +-- Most migration tools need special handling for this +``` + +### Renaming a Column (Zero-Downtime) + +Never rename directly in production. Use the expand-contract pattern: + +```sql +-- Step 1: Add new column (migration 001) +ALTER TABLE users ADD COLUMN display_name TEXT; + +-- Step 2: Backfill data (migration 002, data migration) +UPDATE users SET display_name = username WHERE display_name IS NULL; + +-- Step 3: Update application code to read/write both columns +-- Deploy application changes + +-- Step 4: Stop writing to old column, drop it (migration 003) +ALTER TABLE users DROP COLUMN username; +``` + +### Removing a Column Safely + +```sql +-- Step 1: Remove all application references to the column +-- Step 2: Deploy application without the column reference +-- Step 3: Drop column in next migration +ALTER TABLE orders DROP COLUMN legacy_status; + +-- For Django: use SeparateDatabaseAndState to remove from model +-- without generating DROP COLUMN (then drop in next migration) +``` + +### Large Data Migrations + +```sql +-- BAD: Updates all rows in one transaction (locks table) +UPDATE users SET normalized_email = LOWER(email); + +-- GOOD: Batch update with progress +DO $$ +DECLARE + batch_size INT := 10000; + rows_updated INT; +BEGIN + LOOP + UPDATE users + SET normalized_email = LOWER(email) + WHERE id IN ( + SELECT id FROM users + WHERE normalized_email IS NULL + LIMIT batch_size + FOR UPDATE SKIP LOCKED + ); + GET DIAGNOSTICS rows_updated = ROW_COUNT; + RAISE NOTICE 'Updated % rows', rows_updated; + EXIT WHEN rows_updated = 0; + COMMIT; + END LOOP; +END $$; +``` + +## Prisma (TypeScript/Node.js) + +### Workflow + +```bash +# Create migration from schema changes +npx prisma migrate dev --name add_user_avatar + +# Apply pending migrations in production +npx prisma migrate deploy + +# Reset database (dev only) +npx prisma migrate reset + +# Generate client after schema changes +npx prisma generate +``` + +### Schema Example + +```prisma +model User { + id String @id @default(cuid()) + email String @unique + name String? + avatarUrl String? @map("avatar_url") + createdAt DateTime @default(now()) @map("created_at") + updatedAt DateTime @updatedAt @map("updated_at") + orders Order[] + + @@map("users") + @@index([email]) +} +``` + +### Custom SQL Migration + +For operations Prisma cannot express (concurrent indexes, data backfills): + +```bash +# Create empty migration, then edit the SQL manually +npx prisma migrate dev --create-only --name add_email_index +``` + +```sql +-- migrations/20240115_add_email_index/migration.sql +-- Prisma cannot generate CONCURRENTLY, so we write it manually +CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users (email); +``` + +## Drizzle (TypeScript/Node.js) + +### Workflow + +```bash +# Generate migration from schema changes +npx drizzle-kit generate + +# Apply migrations +npx drizzle-kit migrate + +# Push schema directly (dev only, no migration file) +npx drizzle-kit push +``` + +### Schema Example + +```typescript +import { pgTable, text, timestamp, uuid, boolean } from "drizzle-orm/pg-core"; + +export const users = pgTable("users", { + id: uuid("id").primaryKey().defaultRandom(), + email: text("email").notNull().unique(), + name: text("name"), + isActive: boolean("is_active").notNull().default(true), + createdAt: timestamp("created_at").notNull().defaultNow(), + updatedAt: timestamp("updated_at").notNull().defaultNow(), +}); +``` + +## Kysely (TypeScript/Node.js) + +### Workflow (kysely-ctl) + +```bash +# Initialize config file (kysely.config.ts) +kysely init + +# Create a new migration file +kysely migrate make add_user_avatar + +# Apply all pending migrations +kysely migrate latest + +# Rollback last migration +kysely migrate down + +# Show migration status +kysely migrate list +``` + +### Migration File + +```typescript +// migrations/2024_01_15_001_create_user_profile.ts +import { type Kysely, sql } from 'kysely' + +// IMPORTANT: Always use Kysely, not your typed DB interface. +// Migrations are frozen in time and must not depend on current schema types. +export async function up(db: Kysely): Promise { + await db.schema + .createTable('user_profile') + .addColumn('id', 'serial', (col) => col.primaryKey()) + .addColumn('email', 'varchar(255)', (col) => col.notNull().unique()) + .addColumn('avatar_url', 'text') + .addColumn('created_at', 'timestamp', (col) => + col.defaultTo(sql`now()`).notNull() + ) + .execute() + + await db.schema + .createIndex('idx_user_profile_avatar') + .on('user_profile') + .column('avatar_url') + .execute() +} + +export async function down(db: Kysely): Promise { + await db.schema.dropTable('user_profile').execute() +} +``` + +### Programmatic Migrator + +```typescript +import { Migrator, FileMigrationProvider } from 'kysely' +import { promises as fs } from 'fs' +import * as path from 'path' +// ESM only — CJS can use __dirname directly +import { fileURLToPath } from 'url' +const migrationFolder = path.join( + path.dirname(fileURLToPath(import.meta.url)), + './migrations', +) + +// `db` is your Kysely database instance +const migrator = new Migrator({ + db, + provider: new FileMigrationProvider({ + fs, + path, + migrationFolder, + }), + // WARNING: Only enable in development. Disables timestamp-ordering + // validation, which can cause schema drift between environments. + // allowUnorderedMigrations: true, +}) + +const { error, results } = await migrator.migrateToLatest() + +results?.forEach((it) => { + if (it.status === 'Success') { + console.log(`migration "${it.migrationName}" executed successfully`) + } else if (it.status === 'Error') { + console.error(`failed to execute migration "${it.migrationName}"`) + } +}) + +if (error) { + console.error('migration failed', error) + process.exit(1) +} +``` + +## Django (Python) + +### Workflow + +```bash +# Generate migration from model changes +python manage.py makemigrations + +# Apply migrations +python manage.py migrate + +# Show migration status +python manage.py showmigrations + +# Generate empty migration for custom SQL +python manage.py makemigrations --empty app_name -n description +``` + +### Data Migration + +```python +from django.db import migrations + +def backfill_display_names(apps, schema_editor): + User = apps.get_model("accounts", "User") + batch_size = 5000 + users = User.objects.filter(display_name="") + while users.exists(): + batch = list(users[:batch_size]) + for user in batch: + user.display_name = user.username + User.objects.bulk_update(batch, ["display_name"], batch_size=batch_size) + +def reverse_backfill(apps, schema_editor): + pass # Data migration, no reverse needed + +class Migration(migrations.Migration): + dependencies = [("accounts", "0015_add_display_name")] + + operations = [ + migrations.RunPython(backfill_display_names, reverse_backfill), + ] +``` + +### SeparateDatabaseAndState + +Remove a column from the Django model without dropping it from the database immediately: + +```python +class Migration(migrations.Migration): + operations = [ + migrations.SeparateDatabaseAndState( + state_operations=[ + migrations.RemoveField(model_name="user", name="legacy_field"), + ], + database_operations=[], # Don't touch the DB yet + ), + ] +``` + +## golang-migrate (Go) + +### Workflow + +```bash +# Create migration pair +migrate create -ext sql -dir migrations -seq add_user_avatar + +# Apply all pending migrations +migrate -path migrations -database "$DATABASE_URL" up + +# Rollback last migration +migrate -path migrations -database "$DATABASE_URL" down 1 + +# Force version (fix dirty state) +migrate -path migrations -database "$DATABASE_URL" force VERSION +``` + +### Migration Files + +```sql +-- migrations/000003_add_user_avatar.up.sql +ALTER TABLE users ADD COLUMN avatar_url TEXT; +CREATE INDEX CONCURRENTLY idx_users_avatar ON users (avatar_url) WHERE avatar_url IS NOT NULL; + +-- migrations/000003_add_user_avatar.down.sql +DROP INDEX IF EXISTS idx_users_avatar; +ALTER TABLE users DROP COLUMN IF EXISTS avatar_url; +``` + +## Zero-Downtime Migration Strategy + +For critical production changes, follow the expand-contract pattern: + +``` +Phase 1: EXPAND + - Add new column/table (nullable or with default) + - Deploy: app writes to BOTH old and new + - Backfill existing data + +Phase 2: MIGRATE + - Deploy: app reads from NEW, writes to BOTH + - Verify data consistency + +Phase 3: CONTRACT + - Deploy: app only uses NEW + - Drop old column/table in separate migration +``` + +### Timeline Example + +``` +Day 1: Migration adds new_status column (nullable) +Day 1: Deploy app v2 — writes to both status and new_status +Day 2: Run backfill migration for existing rows +Day 3: Deploy app v3 — reads from new_status only +Day 7: Migration drops old status column +``` + +## Anti-Patterns + +| Anti-Pattern | Why It Fails | Better Approach | +|-------------|-------------|-----------------| +| Manual SQL in production | No audit trail, unrepeatable | Always use migration files | +| Editing deployed migrations | Causes drift between environments | Create new migration instead | +| NOT NULL without default | Locks table, rewrites all rows | Add nullable, backfill, then add constraint | +| Inline index on large table | Blocks writes during build | CREATE INDEX CONCURRENTLY | +| Schema + data in one migration | Hard to rollback, long transactions | Separate migrations | +| Dropping column before removing code | Application errors on missing column | Remove code first, drop column next deploy | diff --git a/.claude/skills/docker-patterns/SKILL.md b/.claude/skills/docker-patterns/SKILL.md new file mode 100644 index 0000000..c438c4a --- /dev/null +++ b/.claude/skills/docker-patterns/SKILL.md @@ -0,0 +1,364 @@ +--- +name: docker-patterns +description: Docker and Docker Compose patterns for local development, container security, networking, volume strategies, and multi-service orchestration. +origin: ECC +--- + +# Docker Patterns + +Docker and Docker Compose best practices for containerized development. + +## When to Activate + +- Setting up Docker Compose for local development +- Designing multi-container architectures +- Troubleshooting container networking or volume issues +- Reviewing Dockerfiles for security and size +- Migrating from local dev to containerized workflow + +## Docker Compose for Local Development + +### Standard Web App Stack + +```yaml +# docker-compose.yml +services: + app: + build: + context: . + target: dev # Use dev stage of multi-stage Dockerfile + ports: + - "3000:3000" + volumes: + - .:/app # Bind mount for hot reload + - /app/node_modules # Anonymous volume -- preserves container deps + environment: + - DATABASE_URL=postgres://postgres:postgres@db:5432/app_dev + - REDIS_URL=redis://redis:6379/0 + - NODE_ENV=development + depends_on: + db: + condition: service_healthy + redis: + condition: service_started + command: npm run dev + + db: + image: postgres:16-alpine + ports: + - "5432:5432" + environment: + POSTGRES_USER: postgres + POSTGRES_PASSWORD: postgres + POSTGRES_DB: app_dev + volumes: + - pgdata:/var/lib/postgresql/data + - ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 5s + timeout: 3s + retries: 5 + + redis: + image: redis:7-alpine + ports: + - "6379:6379" + volumes: + - redisdata:/data + + mailpit: # Local email testing + image: axllent/mailpit + ports: + - "8025:8025" # Web UI + - "1025:1025" # SMTP + +volumes: + pgdata: + redisdata: +``` + +### Development vs Production Dockerfile + +```dockerfile +# Stage: dependencies +FROM node:22-alpine AS deps +WORKDIR /app +COPY package.json package-lock.json ./ +RUN npm ci + +# Stage: dev (hot reload, debug tools) +FROM node:22-alpine AS dev +WORKDIR /app +COPY --from=deps /app/node_modules ./node_modules +COPY . . +EXPOSE 3000 +CMD ["npm", "run", "dev"] + +# Stage: build +FROM node:22-alpine AS build +WORKDIR /app +COPY --from=deps /app/node_modules ./node_modules +COPY . . +RUN npm run build && npm prune --production + +# Stage: production (minimal image) +FROM node:22-alpine AS production +WORKDIR /app +RUN addgroup -g 1001 -S appgroup && adduser -S appuser -u 1001 +USER appuser +COPY --from=build --chown=appuser:appgroup /app/dist ./dist +COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules +COPY --from=build --chown=appuser:appgroup /app/package.json ./ +ENV NODE_ENV=production +EXPOSE 3000 +HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:3000/health || exit 1 +CMD ["node", "dist/server.js"] +``` + +### Override Files + +```yaml +# docker-compose.override.yml (auto-loaded, dev-only settings) +services: + app: + environment: + - DEBUG=app:* + - LOG_LEVEL=debug + ports: + - "9229:9229" # Node.js debugger + +# docker-compose.prod.yml (explicit for production) +services: + app: + build: + target: production + restart: always + deploy: + resources: + limits: + cpus: "1.0" + memory: 512M +``` + +```bash +# Development (auto-loads override) +docker compose up + +# Production +docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d +``` + +## Networking + +### Service Discovery + +Services in the same Compose network resolve by service name: +``` +# From "app" container: +postgres://postgres:postgres@db:5432/app_dev # "db" resolves to the db container +redis://redis:6379/0 # "redis" resolves to the redis container +``` + +### Custom Networks + +```yaml +services: + frontend: + networks: + - frontend-net + + api: + networks: + - frontend-net + - backend-net + + db: + networks: + - backend-net # Only reachable from api, not frontend + +networks: + frontend-net: + backend-net: +``` + +### Exposing Only What's Needed + +```yaml +services: + db: + ports: + - "127.0.0.1:5432:5432" # Only accessible from host, not network + # Omit ports entirely in production -- accessible only within Docker network +``` + +## Volume Strategies + +```yaml +volumes: + # Named volume: persists across container restarts, managed by Docker + pgdata: + + # Bind mount: maps host directory into container (for development) + # - ./src:/app/src + + # Anonymous volume: preserves container-generated content from bind mount override + # - /app/node_modules +``` + +### Common Patterns + +```yaml +services: + app: + volumes: + - .:/app # Source code (bind mount for hot reload) + - /app/node_modules # Protect container's node_modules from host + - /app/.next # Protect build cache + + db: + volumes: + - pgdata:/var/lib/postgresql/data # Persistent data + - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql # Init scripts +``` + +## Container Security + +### Dockerfile Hardening + +```dockerfile +# 1. Use specific tags (never :latest) +FROM node:22.12-alpine3.20 + +# 2. Run as non-root +RUN addgroup -g 1001 -S app && adduser -S app -u 1001 +USER app + +# 3. Drop capabilities (in compose) +# 4. Read-only root filesystem where possible +# 5. No secrets in image layers +``` + +### Compose Security + +```yaml +services: + app: + security_opt: + - no-new-privileges:true + read_only: true + tmpfs: + - /tmp + - /app/.cache + cap_drop: + - ALL + cap_add: + - NET_BIND_SERVICE # Only if binding to ports < 1024 +``` + +### Secret Management + +```yaml +# GOOD: Use environment variables (injected at runtime) +services: + app: + env_file: + - .env # Never commit .env to git + environment: + - API_KEY # Inherits from host environment + +# GOOD: Docker secrets (Swarm mode) +secrets: + db_password: + file: ./secrets/db_password.txt + +services: + db: + secrets: + - db_password + +# BAD: Hardcoded in image +# ENV API_KEY=sk-proj-xxxxx # NEVER DO THIS +``` + +## .dockerignore + +``` +node_modules +.git +.env +.env.* +dist +coverage +*.log +.next +.cache +docker-compose*.yml +Dockerfile* +README.md +tests/ +``` + +## Debugging + +### Common Commands + +```bash +# View logs +docker compose logs -f app # Follow app logs +docker compose logs --tail=50 db # Last 50 lines from db + +# Execute commands in running container +docker compose exec app sh # Shell into app +docker compose exec db psql -U postgres # Connect to postgres + +# Inspect +docker compose ps # Running services +docker compose top # Processes in each container +docker stats # Resource usage + +# Rebuild +docker compose up --build # Rebuild images +docker compose build --no-cache app # Force full rebuild + +# Clean up +docker compose down # Stop and remove containers +docker compose down -v # Also remove volumes (DESTRUCTIVE) +docker system prune # Remove unused images/containers +``` + +### Debugging Network Issues + +```bash +# Check DNS resolution inside container +docker compose exec app nslookup db + +# Check connectivity +docker compose exec app wget -qO- http://api:3000/health + +# Inspect network +docker network ls +docker network inspect _default +``` + +## Anti-Patterns + +``` +# BAD: Using docker compose in production without orchestration +# Use Kubernetes, ECS, or Docker Swarm for production multi-container workloads + +# BAD: Storing data in containers without volumes +# Containers are ephemeral -- all data lost on restart without volumes + +# BAD: Running as root +# Always create and use a non-root user + +# BAD: Using :latest tag +# Pin to specific versions for reproducible builds + +# BAD: One giant container with all services +# Separate concerns: one process per container + +# BAD: Putting secrets in docker-compose.yml +# Use .env files (gitignored) or Docker secrets +``` diff --git a/.claude/skills/postgres-optimization/SKILL.md b/.claude/skills/postgres-optimization/SKILL.md new file mode 100644 index 0000000..b72bc7b --- /dev/null +++ b/.claude/skills/postgres-optimization/SKILL.md @@ -0,0 +1,319 @@ +--- +name: postgres-optimization +description: Comprehensive PostgreSQL optimization — indexes, query plans, partitioning, JSONB, connection pooling, and unconventional techniques like constraint exclusion, function-based indexes, and hash uniqueness +--- + +# PostgreSQL Optimization + +> **"Beyond 'just add an index' — systematic and creative solutions for real performance problems."** + +--- + +## 1. Index Strategies + +### Standard Index Types + +```sql +-- B-tree: equality and range queries (default) +CREATE INDEX idx_orders_customer_id ON orders (customer_id); + +-- Composite: equality columns first, range/sort last +CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC); + +-- Partial: smaller index for filtered queries +CREATE INDEX idx_orders_pending ON orders (created_at) + WHERE status = 'pending'; + +-- Covering: avoids table lookup entirely +CREATE INDEX idx_users_email_name ON users (email) INCLUDE (name, avatar_url); + +-- GIN: for JSONB containment queries +CREATE INDEX idx_products_metadata ON products USING GIN (metadata); + +-- GiST: for full-text search +CREATE INDEX idx_articles_search ON articles USING GiST ( + to_tsvector('english', title || ' ' || body) +); + +-- Concurrent creation (no table lock) +CREATE INDEX CONCURRENTLY idx_large_table_col ON large_table (col); +``` + +### Function-Based Indexes (Reduce Size for Low-Cardinality Columns) + +When a timestamp column is queried at coarser granularity (daily), indexing the full timestamp wastes space: + +```sql +-- Instead of indexing the full timestamptz (214 MB) +CREATE INDEX sale_sold_at_ix ON sale(sold_at); + +-- Index only the derived date (66 MB — 3x smaller) +CREATE INDEX sale_sold_at_date_ix +ON sale((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date); +``` + +**Important:** Queries must use the exact same expression to hit this index: + +```sql +-- Uses the index ✓ +WHERE date_trunc('day', sold_at AT TIME ZONE 'UTC')::date BETWEEN '2025-01-01' AND '2025-01-31' + +-- Does NOT use the index ✗ +WHERE (sold_at AT TIME ZONE 'UTC')::date BETWEEN '2025-01-01' AND '2025-01-31' +``` + +**PostgreSQL 18+:** Use virtual generated columns to avoid discipline issues: + +```sql +ALTER TABLE sale ADD sold_at_date DATE +GENERATED ALWAYS AS (date_trunc('day', sold_at AT TIME ZONE 'UTC')); +``` + +### Hash Index for Uniqueness on Large Text Values + +For tables with large text columns (URLs, documents), a B-Tree unique index stores actual values in leaf nodes — making it nearly the size of the table. A hash exclusion constraint stores only hash values: + +```sql +-- B-Tree unique (154 MB for a 160 MB table) +CREATE UNIQUE INDEX urls_url_unique_ix ON urls(url); + +-- Hash exclusion constraint (32 MB — 5x smaller) +ALTER TABLE urls +ADD CONSTRAINT urls_url_unique_hash +EXCLUDE USING HASH (url WITH =); +``` + +Uniqueness is still enforced: + +```sql +INSERT INTO urls (id, url) VALUES (1000002, 'https://example.com'); +-- ERROR: conflicting key value violates exclusion constraint +``` + +Hash index lookup is also faster (0.022 ms vs B-Tree 0.046 ms). + +**Limitations of hash exclusion vs B-Tree unique:** + +| Feature | B-Tree Unique | Hash Exclusion | +|---------|--------------|----------------| +| Foreign key reference | ✓ | ✗ | +| `ON CONFLICT (column)` | ✓ | ✗ | +| `ON CONFLICT ON CONSTRAINT` | ✓ | ✓ (DO NOTHING only) | +| `ON CONFLICT DO UPDATE` | ✓ | ✗ | +| `MERGE` | ✓ | ✓ | + +Use `MERGE` as a workaround for upserts: + +```sql +MERGE INTO urls t +USING (VALUES (1000004, 'https://example.com')) AS s(id, url) +ON t.url = s.url +WHEN MATCHED THEN UPDATE SET id = s.id +WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url); +``` + +--- + +## 2. Constraint Exclusion + +PostgreSQL can use check constraints to skip impossible query scans entirely. + +```sql +CREATE TABLE users ( + id INT PRIMARY KEY, + username TEXT NOT NULL, + plan TEXT NOT NULL, + CONSTRAINT plan_check CHECK (plan IN ('free', 'pro')) +); +``` + +Without constraint exclusion, `WHERE plan = 'Pro'` (capital P — impossible) scans the whole table. With it: + +```sql +SET constraint_exclusion TO 'on'; + +EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro'; +-- Result: One-Time Filter: false | Execution Time: 0.008 ms +``` + +| Environment | Recommendation | +|-------------|----------------| +| OLTP production | Leave as `'partition'` (default) | +| BI / Data Warehouse | Set to `'on'` | +| Ad-hoc query / reporting | Set to `'on'` | + +**Cost:** Slight extra planning overhead evaluating constraints. + +--- + +## 3. Reading Query Plans + +```sql +EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) +SELECT o.id, o.total, u.name +FROM orders o +JOIN users u ON o.user_id = u.id +WHERE o.status = 'shipped' + AND o.created_at > NOW() - INTERVAL '30 days' +ORDER BY o.created_at DESC +LIMIT 20; +``` + +What to look for: + +| Signal | Meaning | +|--------|---------| +| `Seq Scan` on large table | Missing index | +| `Nested Loop` with high row estimates | Missing join index | +| `Sort` without `Index Scan` | In-memory/disk sort — add index | +| `Buffers: shared hit` vs `shared read` | Cache efficiency | + +--- + +## 4. Partitioning + +Partition tables over ~10M rows when queries consistently filter on the partition key. + +```sql +CREATE TABLE events ( + id BIGINT GENERATED ALWAYS AS IDENTITY, + event_type TEXT NOT NULL, + payload JSONB NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +) PARTITION BY RANGE (created_at); + +CREATE TABLE events_2024_q1 PARTITION OF events + FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); +CREATE TABLE events_2024_q2 PARTITION OF events + FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); + +-- Indexes are inherited automatically (PG 11+) +CREATE INDEX ON events (created_at, event_type); +``` + +--- + +## 5. JSONB Operations + +```sql +-- Containment query (uses GIN index) +SELECT * FROM products +WHERE metadata @> '{"category": "electronics"}' + AND (metadata ->> 'price')::numeric < 500; + +-- Update nested JSONB +UPDATE products +SET metadata = jsonb_set(metadata, '{stock}', to_jsonb(stock - 1)) +WHERE id = 'abc'; + +-- Expand JSONB arrays +SELECT id, jsonb_array_elements_text(metadata -> 'tags') AS tag +FROM products +WHERE metadata ? 'tags'; +``` + +--- + +## 6. Connection Pooling + +Each PostgreSQL connection uses ~10 MB of server memory. Always use a pooler in front of Postgres for web applications. + +```ini +# pgbouncer.ini +[databases] +app = host=localhost port=5432 dbname=app + +[pgbouncer] +pool_mode = transaction # transaction-level for web apps +max_client_conn = 1000 +default_pool_size = 25 +min_pool_size = 5 +reserve_pool_size = 5 +server_idle_timeout = 300 +``` + +Use **session-level pooling** only if the app relies on prepared statements or temp tables. + +--- + +## 7. Diagnostic Queries + +**Slow queries:** +```sql +SELECT query, calls, mean_exec_time, total_exec_time +FROM pg_stat_statements +ORDER BY total_exec_time DESC +LIMIT 10; +``` + +**Unused indexes:** +```sql +SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) +FROM pg_stat_user_indexes +WHERE idx_scan = 0 +ORDER BY pg_relation_size(indexrelid) DESC; +``` + +**Compare index vs table size:** +```sql +SELECT + relname AS name, + pg_size_pretty(pg_relation_size(oid)) AS size +FROM pg_class +WHERE relname LIKE 'your_table%' +ORDER BY pg_relation_size(oid) DESC; +``` + +**Check constraint exclusion setting:** +```sql +SHOW constraint_exclusion; +``` + +--- + +## 8. Decision Tree + +``` +Query too slow? +├── Check EXPLAIN ANALYZE +│ ├── Seq Scan on large table → Add index +│ ├── Condition always false? → Enable constraint_exclusion +│ └── Sort without index → Add sorted index +│ +├── Index too large? +│ ├── Timestamp queried by day/week → Function-based index on truncated date +│ └── Large text uniqueness → Hash exclusion constraint +│ +├── Table over 10M rows with date filters? +│ └── Partition by range +│ +└── Connection exhaustion? + └── Add PgBouncer / pgcat +``` + +--- + +## 9. Anti-Patterns + +- Indexing every column instead of analyzing actual query patterns +- Using `SELECT *` when only a few columns are needed +- Not using `EXPLAIN ANALYZE` to verify index usage +- Storing large blobs in JSONB when a typed table is better +- Skipping connection pooling +- Running `VACUUM FULL` during peak hours (locks entire table) + +--- + +## 10. Checklist + +- [ ] `pg_stat_statements` enabled for query monitoring +- [ ] Indexes match actual query patterns +- [ ] Composite indexes ordered: equality → sort → range +- [ ] `EXPLAIN ANALYZE` run on all critical queries +- [ ] Partial indexes for frequently filtered subsets +- [ ] Function-based indexes for low-cardinality derived columns +- [ ] Hash exclusion for large-text uniqueness where upserts aren't needed +- [ ] Constraint exclusion enabled on BI/reporting databases +- [ ] Connection pooler (PgBouncer/pgcat) deployed +- [ ] Table partitioning for tables over 10M rows +- [ ] Unused indexes identified and dropped diff --git a/.claude/skills/productionalize-node/SKILL.md b/.claude/skills/productionalize-node/SKILL.md new file mode 100644 index 0000000..ccf5bfe --- /dev/null +++ b/.claude/skills/productionalize-node/SKILL.md @@ -0,0 +1,110 @@ +--- +name: productionalize-node +description: Use when transforming a vibe-coded Node.js project into production-ready code. Triggers on requests to productionalize, harden, or bring a Node.js codebase to production quality. Handles TypeScript strict migration, testing, linting, security hardening, CI setup, dependency management, and documentation. +--- + +# productionalize-node + +One-shot skill that takes a scaffolded-but-hollow Node.js project to production quality. Works standalone without any sub-skills installed. Enhanced by optional sub-skills when available. + +## Parameters + +- `--spec=` — Reference spec (API spec, RFC, protocol doc) for compliance validation. +- `--mode=parallel|sequential` — Execution mode. Default: `parallel`. +- `--coverage=` — Target test coverage percentage. Default: `80`. Confirmed with user during opening flow. + +## Phase 0: Opening Flow + +Execute these steps before any code changes: + +1. Read `package.json`, `tsconfig.json`, `README.md`, and list the top-level directory structure. Identify the runtime, framework, language version, and current tooling. +2. If `--spec` was not provided, ask: _"Do you have a reference spec (API spec, RFC, protocol doc) for this project? It can be a file path or URL."_ Store the answer for Phase 1. +3. Ask: _"Do you have specific tooling preferences, or should I propose a default stack?"_ +4. Read `references/tooling-defaults.md`. Present the default tooling stack to the user. Do not install anything until the user confirms. +5. Ask about optional items: + - _"Would you like me to set up CHANGELOG + commit conventions?"_ + - _"Should I evaluate whether CORS is needed?"_ +6. Confirm coverage target: _"I'll target 80% test coverage. Does that work, or would you prefer a different target?"_ Use `--coverage` value if provided instead of asking. + +## Phase 1: Assessment + +1. Read `references/assessment-checklist.md`. +2. Evaluate all 19 checklist items against the codebase. For each item, record: status (pass/fail/partial), evidence, and remediation needed. +3. If a spec was provided (`--spec`), read it and cross-reference against the codebase for compliance gaps. +4. Generate a plan document at `docs/vibe-to-production-plan.md` with findings, prioritized remediation steps, and phase assignments. +5. Present the plan to the user. Wait for approval before proceeding to execution. + +## Execution + +Read the corresponding phase file from `references/` before executing each phase. + +### Phase Dependency Order + +``` +Phase 1 (Foundation) — no dependencies +Phase 2 (Quality Infra) ─┐ +Phase 3 (Hardening) ─┤── depend on Phase 1, run in parallel with each other +Phase 4 (Dependencies) ─┘ +Phase 5 (Testing) — depends on Phases 2, 3 +Phase 6 (CI/CD) — depends on Phases 2, 4 +Phase 7 (Documentation) — depends on all above +Phase 8 (Final Review) — depends on all above +``` + +### Parallel Mode (default) + +If a parallel agent dispatch skill is available, invoke it to run independent phases concurrently. Otherwise, execute phases sequentially within each dependency group: + +1. Execute Phase 1. Run quality gates. +2. Execute Phases 2, 3, 4 concurrently (or sequentially if no dispatch skill). Run quality gates after each. +3. Execute Phase 5. Run quality gates. +4. Execute Phase 6. Run quality gates. +5. Execute Phase 7. Run quality gates. +6. Execute Phase 8. + +### Sequential Mode + +Execute phases one at a time in dependency order. Confirm with the user between each phase before proceeding. + +## Quality Gates + +After each phase completes, run every available check: + +```bash +pnpm run check:types # TypeScript type check +pnpm run lint # Lint +pnpm test -- --run # Unit tests +pnpm run build # Build +``` + +Skip any command that is not yet configured (e.g., lint may not exist until Phase 2). If any check fails, remediate the failure before proceeding to the next phase. + +## Sub-skill Integration + +Sub-skills are capability-based, not name-bound. For each capability below, invoke the matching skill if one is available. Otherwise, follow the inline fallback instructions in the relevant phase file. + +| Capability | Used In | Fallback | +| ------------------------------ | ---------------- | ---------------------------------------------------- | +| Security review | Phase 3, Phase 8 | Manual review per `references/phase-3-hardening.md` | +| TDD workflow | Phase 5 | Write tests per `references/phase-5-testing.md` | +| Code review | Phase 8 | Self-review per `references/phase-8-final-review.md` | +| Plan writing | Assessment | Generate plan inline | +| Parallel agent dispatch | Parallel mode | Execute sequentially within dependency groups | +| Verification before completion | All phases | Run quality gates manually after each phase | + +## Completion + +1. Read `references/phase-8-final-review.md`. +2. Evaluate all 13 completion criteria against the final codebase state. +3. Produce a final report table with pass/fail per criterion: + +``` +| # | Criterion | Status | Notes | +|---|---------------------|--------|-------| +| 1 | TypeScript strict | PASS | | +| 2 | ... | ... | | +``` + +4. If any criterion fails, remediate and re-evaluate. +5. Leave `docs/vibe-to-production-plan.md` in the repo as an artifact of the transformation. +6. Present the final report to the user. diff --git a/.claude/skills/productionalize-node/references/assessment-checklist.md b/.claude/skills/productionalize-node/references/assessment-checklist.md new file mode 100644 index 0000000..795eb39 --- /dev/null +++ b/.claude/skills/productionalize-node/references/assessment-checklist.md @@ -0,0 +1,136 @@ +# Assessment Checklist + +Evaluate each item against the target codebase. Record: status (pass/fail/partial), evidence, and remediation needed. + +## 1. TypeScript Configuration + +**What to check:** Does `tsconfig.json` exist? Is `strict: true` enabled? Check for: `strictNullChecks`, `noImplicitAny`, `noImplicitReturns`, `isolatedModules`, `target` (should be ES2022+), `module` (ESNext/NodeNext), `moduleResolution` (bundler/nodenext). +**How to evaluate:** Read `tsconfig.json` at the repo root (and any extended configs like `tsconfig.build.json`). Verify the `compilerOptions` object explicitly sets `strict: true`. If `strict` is absent, check whether the individual strict-family flags are enabled independently. Inspect `target`, `module`, and `moduleResolution` values. +**Pass criteria:** All strict flags enabled, modern target and module settings. +**Common vibe-code state:** tsconfig exists but `strict` is false or missing, loose target like ES5/ES6, `moduleResolution` set to `node` (legacy) instead of `bundler` or `nodenext`. + +## 2. Error Handling + +**What to check:** Search for unhandled promise rejections, missing try/catch around async operations, generic catch blocks that swallow errors, missing error types/classes. +**How to evaluate:** Grep for `catch` blocks — look for empty catches (`catch {}`, `catch (e) {}`), catches that only `console.log` the error, and catches that don't rethrow or wrap. Search for `.then()` chains without `.catch()`. Search for `async` functions that lack any error handling. Check whether custom error classes exist (search for `extends Error`). Check if `process.on('unhandledRejection')` or `process.on('uncaughtException')` are set up in entry points. +**Pass criteria:** Custom error hierarchy exists, async operations have proper error handling, errors propagate with context (wrapped with cause or custom message). +**Common vibe-code state:** No custom errors, bare `try/catch` with `console.log`, unhandled promise rejections, `.catch(() => {})` to silence errors. + +## 3. Input Validation + +**What to check:** Search for raw `req.body`, `req.params`, `req.query` usage without validation. Check for missing Zod/Joi schemas at API boundaries. +**How to evaluate:** Find all route handlers (search for `app.get`, `app.post`, `router.get`, `router.post`, etc.). For each handler, check whether request data is validated before use. Search for Zod schemas (`z.object`), Joi schemas (`Joi.object`), or manual validation. Check if validated types flow through to business logic or if `any` is used after the boundary. +**Pass criteria:** Every API endpoint validates its input using a schema library. Validated types propagate into handler logic. No raw `req.body` access without prior validation. +**Common vibe-code state:** Routes access `req.body.fieldName` directly with no validation, casting to `any`, or ad-hoc `if (!req.body.field)` checks that miss edge cases. + +## 4. Security + +**What to check:** Hardcoded secrets (grep for API keys, passwords, tokens in source), missing `helmet`, missing rate limiting, raw `process.env` without validation. +**How to evaluate:** Grep for common secret patterns: strings matching API key formats, `password`, `secret`, `token`, `apiKey` assigned to string literals. Check for `.env` files committed to git (`git ls-files .env`). Check if `helmet` middleware is used in Express apps. Check if rate limiting middleware exists. Search for `process.env.` usage and verify it goes through a validation layer. Check for SQL injection vectors (string concatenation in queries), XSS vectors (unescaped user input in responses), and command injection (`exec`, `spawn` with user input). +**Pass criteria:** No hardcoded secrets, helmet enabled, rate limiting on public endpoints, env vars validated, no injection vectors. +**Common vibe-code state:** API keys in source code, no helmet, no rate limiting, raw `process.env` usage everywhere. + +## 5. Test Coverage + +**What to check:** Do any test files exist? What framework is used? Can you run a coverage report? +**How to evaluate:** Search for test files (`*.test.ts`, `*.spec.ts`, `__tests__/`). Check `package.json` for test framework dependencies (vitest, jest, mocha). Run the test suite and check if it passes. If possible, run with coverage (`--coverage` flag). Count test files vs source files to estimate coverage breadth. Check if tests are meaningful (not just `expect(true).toBe(true)`). +**Pass criteria:** Test framework configured and passing, meaningful tests exist for core business logic, coverage is measurable and reasonable (>70% for critical paths). +**Common vibe-code state:** Zero test files, or a test framework is installed but no actual tests written, or tests exist but are all skipped/broken. + +## 6. Linting + Formatting + +**What to check:** Does an ESLint config exist? Does a Prettier config exist? Are they consistent (no conflicting rules)? +**How to evaluate:** Look for ESLint config files (`eslint.config.mjs`, `.eslintrc.*`, `eslint.config.ts`). Look for Prettier config (`.prettierrc`, `.prettierrc.json`, `prettier.config.*`). Check if `eslint-config-prettier` or `eslint-plugin-prettier` is used to avoid conflicts. Run `pnpm run lint` and `pnpm run format:check` (or equivalent) to see if the codebase passes. Check if ESLint uses typescript-eslint for TS-aware rules. +**Pass criteria:** Both ESLint and Prettier configured, no conflicts between them, codebase passes both checks, typescript-eslint rules enabled. +**Common vibe-code state:** No ESLint config, or a default config with no TS support, no Prettier, or both installed but never run (many violations). + +## 7. CI Pipeline + +**What to check:** Does `.github/workflows/` exist? What checks run? Does it mirror local dev checks? +**How to evaluate:** List files in `.github/workflows/`. Read each workflow YAML. Check which steps run (lint, format check, typecheck, test, build). Compare the CI steps to the local dev scripts in `package.json`. Verify that CI runs on pull requests and pushes to main. Check if CI uses the same Node version and package manager as the project. +**Pass criteria:** CI exists, runs all quality checks (lint, format, types, tests, build), triggers on PRs and main pushes, uses correct Node version and package manager. +**Common vibe-code state:** No CI at all, or a minimal CI that only runs `npm test` (which may be a no-op), or CI exists but is broken/disabled. + +## 8. Package.json Scripts + +**What to check:** Does `package.json` have `build`, `test`, `lint`, `check:types`, `format:check` scripts? +**How to evaluate:** Read `package.json` and inspect the `scripts` object. Verify each essential script exists and runs the correct tool. Essential scripts: `build` (should run `tsc` or equivalent), `test` (should run test framework), `lint` (should run ESLint), `check:types` (should run `tsc --noEmit`), `format:check` (should run `prettier --check`). Bonus: `format` (auto-fix), `lint:fix`. +**Pass criteria:** All five essential scripts present and functional. +**Common vibe-code state:** Only `start` and `dev` scripts, missing `build`, `lint`, `check:types`, and `format:check`. + +## 9. Makefile + +**What to check:** Does a Makefile or similar automation exist? Does it mirror CI? +**How to evaluate:** Check for `Makefile` at repo root. If present, read it and compare targets to CI steps and package.json scripts. Look for a `check` or `ci` target that runs the full quality pipeline. Check for a `help` target. If no Makefile, check for alternatives like `justfile`, `taskfile.yml`, or scripts in a `scripts/` directory. +**Pass criteria:** Makefile (or equivalent) exists with a single command that runs the full quality pipeline matching CI. +**Common vibe-code state:** No Makefile or automation — developer must remember each command individually. + +## 10. Dependency Audit + +**What to check:** Run `pnpm audit` or `npm audit` and list CVEs by severity. +**How to evaluate:** Run `pnpm audit` (or `npm audit`) and capture the output. Count vulnerabilities by severity (critical, high, moderate, low). For critical and high CVEs, note the affected package and whether a fix is available. Check if `pnpm audit --fix` resolves any issues. If audit fails due to lockfile issues, note that as a separate problem. +**Pass criteria:** Zero critical or high vulnerabilities. Moderate/low are acceptable if no fix is available. +**Common vibe-code state:** Multiple critical/high vulnerabilities from outdated transitive dependencies, audit never run. + +## 11. Dependency Freshness + +**What to check:** Check if any dependency was published less than 1 week ago. +**How to evaluate:** For each dependency in `package.json` (both `dependencies` and `devDependencies`), run `npm view time --json` and check the publish date of the installed version. Flag any dependency whose installed version was published less than 1 week ago (7 days from today). Very new packages may have undiscovered bugs or supply chain risks. +**Pass criteria:** No dependencies published in the last 7 days. If found, recommend pinning to the previous version or waiting. +**Common vibe-code state:** Dependencies installed via `latest` tag without checking publish date, possibly pulling in a compromised or broken release. + +## 12. Dependency Upgrades + +**What to check:** Check for outdated deps that have newer stable versions published more than 1 week ago. +**How to evaluate:** Run `pnpm outdated` (or `npm outdated`). For each outdated dependency, check if the newer version is a major/minor/patch bump. For major bumps, check the changelog for breaking changes. Focus on: security-related updates, dependencies with known issues, and dependencies more than 1 major version behind. Ignore pre-release versions. +**Pass criteria:** All dependencies within 1 minor version of latest stable (published >7 days ago). No outdated dependencies with known security fixes. +**Common vibe-code state:** Many dependencies several major versions behind, locked to old versions with no explanation. + +## 13. Logger + +**What to check:** Is there structured logging (pino, winston)? Or just `console.log` scattered around? +**How to evaluate:** Search for `console.log`, `console.error`, `console.warn` usage in source code (not test files). Check if a logging library is installed (`pino`, `winston`, `bunyan`). If a logger exists, check whether it's used consistently or if `console.log` still appears alongside it. Check if the logger supports structured output (JSON format), log levels, and context attachment. +**Pass criteria:** Structured logger configured and used consistently. No `console.log` in production source code. Logger supports JSON output and log levels. +**Common vibe-code state:** `console.log` everywhere, no structured logging, no log levels, debug output left in production code. + +## 14. .gitignore + Editor Configs + +**What to check:** Does `.gitignore` cover `node_modules`, `dist`, `.env`, `coverage`? Does `.editorconfig` or `.vscode/settings.json` exist? +**How to evaluate:** Read `.gitignore` and verify it includes: `node_modules/`, `dist/` (or build output dir), `.env` (and `.env.*` variants), `coverage/`, `*.log`, `.DS_Store`. Check for `.editorconfig` with consistent settings (indent style, indent size, end of line, charset). Check for `.vscode/settings.json` with workspace-specific settings. Verify that no ignored files are actually tracked (`git ls-files .env node_modules`). +**Pass criteria:** `.gitignore` covers all standard entries. Editor config exists for consistency. No ignored files accidentally tracked. +**Common vibe-code state:** Minimal `.gitignore` missing `.env` or `coverage`, no `.editorconfig`, `node_modules` or `.env` accidentally committed. + +## 15. README Quality + +**What to check:** Does the README have: project description, install instructions, usage/API docs, environment setup, examples? +**How to evaluate:** Read `README.md`. Check for these sections (or equivalent content): (1) project description — what it does, who it's for; (2) installation instructions — prerequisites, install command; (3) usage — basic API examples or getting started guide; (4) environment setup — required env vars, how to configure; (5) examples — runnable code snippets or link to examples directory. Check if the README is up-to-date with the current codebase (e.g., referenced commands actually work). +**Pass criteria:** All five sections present and accurate. README reflects current state of the project. +**Common vibe-code state:** Default `create-react-app` or `npm init` README, or a README with only the project name, or completely outdated instructions. + +## 16. Subpath Exports / Package Design + +**What to check:** If it's a library, does `package.json` have an `exports` field? `sideEffects`? `files` field? +**How to evaluate:** Read `package.json` and check for: `exports` field (maps subpath imports to source/dist files), `files` field (limits what gets published to npm), `sideEffects` (for tree-shaking), `main` and `types` fields (for backwards compatibility). If `exports` exists, verify each entry points to a real file. Check if the `files` field excludes test files, examples, and dev configs from the published package. If it's not a library (pure application), this check may not apply. +**Pass criteria:** For libraries: `exports` field properly configured, `files` limits published content, `types` field set. For applications: mark as N/A. +**Common vibe-code state:** No `exports` field, everything published via default (including tests and dev files), no `files` field. + +## 17. Spec Compliance + +**What to check:** If a spec or standard was provided (e.g., RFC, OpenAPI doc, protocol spec), read it and cross-reference against the codebase. List deviations. +**How to evaluate:** If the user provides a spec document or reference, read it thoroughly. Map each requirement in the spec to its implementation in the codebase. Check for: missing endpoints/methods, incorrect field names, wrong data types, missing validation rules, deviations from the spec's error handling requirements. List each deviation with the spec reference and the actual implementation. +**Pass criteria:** All spec requirements implemented correctly. No undocumented deviations. +**Common vibe-code state:** Partial implementation of the spec, field names that don't match, missing required behaviors, no awareness that a spec exists. If no spec is provided, mark as N/A. + +## 18. CORS Evaluation (optional) + +**What to check:** Does the app serve cross-origin requests? Is CORS middleware configured? Should it be? +**How to evaluate:** Check if the application is an API server that will be called from browsers (e.g., a frontend app on a different origin). Search for `cors` middleware installation and configuration. If CORS is configured, check: allowed origins (should not be `*` in production), allowed methods, allowed headers, credentials setting. If the app is server-to-server only (e.g., an MPP SDK), CORS may not be needed — note this. +**Pass criteria:** If cross-origin access is needed: CORS configured with specific origins, not wildcard. If not needed: CORS absent or disabled (not misconfigured). Mark as N/A for libraries or server-to-server apps. +**Common vibe-code state:** `cors()` with no options (allows everything), or CORS missing when the frontend needs it, or CORS enabled unnecessarily. + +## 19. Env Validation + +**What to check:** Search for raw `process.env` usage. Check if there's a centralized env config. Check for missing env vars that would crash at runtime. +**How to evaluate:** Grep for `process.env.` across all source files (not just test files). Check if env vars are accessed directly in business logic or if there's a centralized env validation module. If centralized: verify it validates all required vars at startup (fail-fast). Check for type coercion (e.g., `process.env.PORT` is a string, not a number). Look for optional vars that have sensible defaults. Verify that a `.env.example` or documentation lists all required vars. +**Pass criteria:** Centralized env validation at startup, all required vars validated with correct types, fail-fast on missing vars, `.env.example` exists documenting all vars. +**Common vibe-code state:** `process.env.THING` scattered throughout codebase, no validation, app crashes deep in a request handler when an env var is missing, no `.env.example`. diff --git a/.claude/skills/productionalize-node/references/phase-1-foundation.md b/.claude/skills/productionalize-node/references/phase-1-foundation.md new file mode 100644 index 0000000..a333425 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-1-foundation.md @@ -0,0 +1,128 @@ +# Phase 1: Foundation + +No dependencies. This phase establishes the project's structural foundation. + +## 1a: TypeScript Strict Migration + +1. Check if `tsconfig.json` exists. If not, run `npx tsc --init`. +2. Enable strict mode and recommended flags: + ```json + { + "compilerOptions": { + "strict": true, + "noImplicitReturns": true, + "isolatedModules": true, + "esModuleInterop": true, + "declaration": true, + "declarationMap": true, + "sourceMap": true, + "target": "ES2022", + "module": "NodeNext", + "moduleResolution": "NodeNext", + "outDir": "dist", + "rootDir": "src" + }, + "include": ["src"], + "exclude": ["node_modules", "dist", "**/*.test.ts"] + } + ``` +3. Fix all type errors introduced by strict mode. Common fixes: + - Add explicit return types to functions + - Handle null/undefined with narrowing or optional chaining + - Replace `any` with proper types + - Add missing type annotations to function parameters +4. Ensure `"type": "module"` is in package.json for ESM projects. +5. Verify: `pnpm run check:types` passes with zero errors. + +## 1b: Package.json Scripts + Makefile + +1. Ensure these scripts exist in package.json: + ```json + { + "scripts": { + "build": "tsc", + "check:types": "tsc --noEmit", + "test": "vitest", + "lint": "eslint .", + "format:check": "prettier --check .", + "format": "prettier --write .", + "prepare": "tsc" + } + } + ``` + Adjust based on confirmed tooling (e.g., if user chose Jest over Vitest). + +2. Create a `Makefile` that mirrors the CI pipeline: + ```makefile + .PHONY: help check format lint types test build deps-audit deps-freshness + + help: ## Show all targets + @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}' + + check: format lint types test build deps-audit deps-freshness ## Run full quality pipeline (mirrors CI) + + format: ## Check formatting + pnpm run format:check + + lint: ## Run linter + pnpm run lint + + types: ## Type-check + pnpm run check:types + + test: ## Run tests + pnpm test -- --run + + build: ## Build + pnpm run build + + deps-audit: ## Audit dependencies for CVEs + pnpm audit --audit-level=moderate + + deps-freshness: ## Check no dependency is newer than 1 week + @node scripts/check-deps-freshness.js + ``` + +3. If a `scripts/check-deps-freshness.js` doesn't exist, create it (see Phase 4 for the script). + +## 1c: .gitignore + Editor Configs + +1. Ensure `.gitignore` covers at minimum: + ``` + node_modules/ + dist/ + coverage/ + .env + .env.* + !.env.example + *.log + .DS_Store + ``` + +2. Create `.editorconfig` if missing: + ```ini + root = true + + [*] + indent_style = space + indent_size = 2 + end_of_line = lf + charset = utf-8 + trim_trailing_whitespace = true + insert_final_newline = true + ``` + +3. If the project uses VS Code, ensure `.vscode/settings.json` has: + ```json + { + "editor.formatOnSave": true, + "editor.defaultFormatter": "esbenp.prettier-vscode" + } + ``` + +## Verification + +After completing all sub-phases: +- `pnpm run check:types` passes +- `pnpm run build` succeeds +- All source files are properly gitignored diff --git a/.claude/skills/productionalize-node/references/phase-2-quality-infra.md b/.claude/skills/productionalize-node/references/phase-2-quality-infra.md new file mode 100644 index 0000000..307bbc2 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-2-quality-infra.md @@ -0,0 +1,84 @@ +# Phase 2: Quality Infrastructure + +Depends on Phase 1 (Foundation). + +## 2a: Linting + Formatting Setup + +### ESLint (flat config) + +1. Install: `pnpm add -D eslint @eslint/js typescript-eslint` +2. Create `eslint.config.mjs`: + ```js + import eslint from '@eslint/js' + import tseslint from 'typescript-eslint' + + export default tseslint.config( + eslint.configs.recommended, + ...tseslint.configs.recommended, + { + rules: { + '@typescript-eslint/no-unused-vars': ['error', { argsIgnorePattern: '^_', varsIgnorePattern: '^_' }], + '@typescript-eslint/no-explicit-any': 'warn', + }, + }, + { + ignores: ['dist/', 'coverage/', 'node_modules/', '**/*.test.ts'], + } + ) + ``` +3. Adjust rules based on the project's needs. Don't be overly strict on a first pass — `warn` for things the team can tighten later. + +### Prettier + +1. Install: `pnpm add -D prettier` +2. Create `.prettierrc`: + ```json + { + "semi": false, + "singleQuote": true, + "trailingComma": "all", + "printWidth": 100, + "tabWidth": 2 + } + ``` + Adjust to match existing code style if the project already has a dominant pattern. +3. Create `.prettierignore`: + ``` + dist/ + coverage/ + pnpm-lock.yaml + ``` + +### Run and fix + +1. Run `pnpm run format -- --write .` to auto-format all files. +2. Run `pnpm run lint -- --fix` to auto-fix lint issues. +3. Fix remaining lint errors manually. +4. Verify: `pnpm run format:check` and `pnpm run lint` both pass with zero errors. + +## 2b: Logger Setup + +1. Check if the project already uses a logger. If yes, keep it. +2. If only `console.log` is used: + - Install pino: `pnpm add pino` + - Create a logger module (e.g., `src/logger.ts`): + ```typescript + import pino from 'pino' + + export const logger = pino({ + level: process.env.LOG_LEVEL ?? 'info', + }) + ``` + - Replace `console.log` calls with appropriate logger levels: + - `console.log` → `logger.info` + - `console.error` → `logger.error` + - `console.warn` → `logger.warn` + - Debug/trace output → `logger.debug` +3. Do NOT replace console.log in test files or CLI scripts where stdout is the interface. +4. Verify: application starts and produces structured JSON logs. + +## Verification + +- `pnpm run lint` passes with zero errors +- `pnpm run format:check` passes +- Application logs are structured JSON (if logger was set up) diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3-hardening.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3-hardening.md new file mode 100644 index 0000000..07172d3 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3-hardening.md @@ -0,0 +1,28 @@ +# Phase 3: Hardening + +Depends on Phase 1 (Foundation). Can run in parallel with Phases 2 and 4. + +This phase hardens the application against errors, invalid input, and security threats. + +## Sub-phases + +Execute in order (each builds on the previous): + +1. Read and execute `phase-3.1-error-handling.md` +2. Read and execute `phase-3.2-input-validation.md` +3. Read and execute `phase-3.3-security.md` +4. Read and execute `phase-3.4-env-validation.md` +5. Read and execute `phase-3.5-cors-evaluation.md` (only if user opted in) + +## Capability-based sub-skill integration + +If a security review skill is available, invoke it after completing sub-phases 3.3 and 3.4 to validate the security hardening. Otherwise, use the inline checklists in each sub-phase file. + +## Verification + +After all sub-phases complete: +- `pnpm run check:types` passes +- `pnpm run build` succeeds +- Application starts and handles errors gracefully (no unhandled rejections) +- No hardcoded secrets in source code +- All env vars validated at startup diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.1-error-handling.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.1-error-handling.md new file mode 100644 index 0000000..1a8f2a4 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.1-error-handling.md @@ -0,0 +1,97 @@ +# Phase 3.1: Error Handling + +## Goal + +Replace ad-hoc error handling with a structured error hierarchy and consistent patterns. + +## Steps + +### 1. Create a custom error hierarchy + +Create `src/errors.ts` (or similar, matching project structure): + +```typescript +export class AppError extends Error { + readonly statusCode: number + readonly details: Record + + constructor(message: string, statusCode = 500, details: Record = {}) { + super(message) + this.name = this.constructor.name + this.statusCode = statusCode + this.details = details + } +} + +export class ValidationError extends AppError { + constructor(message: string, details: Record = {}) { + super(message, 400, details) + } +} + +export class NotFoundError extends AppError { + constructor(resource: string, id?: string) { + super(`${resource}${id ? ` '${id}'` : ''} not found`, 404, { resource, id }) + } +} + +export class UnauthorizedError extends AppError { + constructor(message = 'Unauthorized') { + super(message, 401) + } +} +``` + +Adapt the hierarchy to the project's domain. Don't over-engineer — start with 3-5 error classes that cover the actual error cases in the codebase. + +### 2. Add Express error handler (if applicable) + +```typescript +import type { ErrorRequestHandler } from 'express' + +export const errorHandler: ErrorRequestHandler = (err, _req, res, _next) => { + const statusCode = err instanceof AppError ? err.statusCode : 500 + const message = err instanceof AppError ? err.message : 'Internal Server Error' + + // Log the full error for debugging + logger.error({ err, statusCode }, message) + + // Don't leak internal errors to clients + res.status(statusCode).json({ + error: { message, ...(err instanceof AppError ? err.details : {}) }, + }) +} +``` + +### 3. Scan and fix error handling patterns + +Search the codebase for these anti-patterns and fix them: + +- **Bare `catch` with `console.log`**: Replace with logger + proper error propagation +- **Empty catch blocks**: At minimum log the error. If truly ignorable, add a comment explaining why. +- **`catch (e: any)`**: Type the error properly or use `unknown` and narrow +- **Missing async error handling**: Ensure all async route handlers catch errors (use express-async-errors or wrap handlers) +- **Unhandled promise rejections**: Add `process.on('unhandledRejection', ...)` in the entry point + +### 4. Add process-level handlers + +In the application entry point: + +```typescript +process.on('unhandledRejection', (reason) => { + logger.error({ reason }, 'Unhandled promise rejection') + process.exit(1) +}) + +process.on('uncaughtException', (err) => { + logger.error({ err }, 'Uncaught exception') + process.exit(1) +}) +``` + +## Verification + +- No bare `console.error` or `console.log` in catch blocks (use logger) +- All Express routes have error handling +- Custom error classes are used for domain-specific errors +- Process-level handlers catch unhandled rejections diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.2-input-validation.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.2-input-validation.md new file mode 100644 index 0000000..f02c458 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.2-input-validation.md @@ -0,0 +1,90 @@ +# Phase 3.2: Input Validation + +## Goal + +Validate all external input at system boundaries using Zod schemas. + +## Steps + +### 1. Install Zod + +```bash +pnpm add zod +``` + +### 2. Identify validation boundaries + +External input enters the system through: +- **API request bodies** (`req.body`) +- **API query parameters** (`req.query`) +- **API path parameters** (`req.params`) +- **Webhook payloads** +- **File uploads / external file reads** +- **CLI arguments** + +List all entry points in the codebase. + +### 3. Create Zod schemas for each boundary + +For each entry point, create a schema that validates the expected shape: + +```typescript +import { z } from 'zod' + +export const CreateUserSchema = z.object({ + email: z.string().email(), + name: z.string().min(1).max(255), + role: z.enum(['admin', 'user']).default('user'), +}) + +export type CreateUserInput = z.infer +``` + +Place schemas near the code that uses them (colocated), or in a `schemas/` directory if the project has many. + +### 4. Add validation middleware (Express) + +```typescript +import type { Request, Response, NextFunction } from 'express' +import type { ZodSchema } from 'zod' + +export function validate(schema: ZodSchema) { + return (req: Request, _res: Response, next: NextFunction) => { + const result = schema.safeParse(req.body) + if (!result.success) { + throw new ValidationError('Invalid request body', { + issues: result.error.issues, + }) + } + req.body = result.data + next() + } +} +``` + +Apply to routes: +```typescript +app.post('/users', validate(CreateUserSchema), createUser) +``` + +### 5. Validate path and query params too + +Don't just validate body — query params and path params are strings by default and need coercion: + +```typescript +const PaginationSchema = z.object({ + page: z.coerce.number().int().positive().default(1), + limit: z.coerce.number().int().positive().max(100).default(20), +}) +``` + +### 6. Internal trust boundary + +Do NOT add Zod validation to internal function calls between your own modules. Validation belongs at the system boundary only. Internal code can trust the types. + +## Verification + +- Every API endpoint validates its input with a Zod schema +- No raw `req.body` access without prior validation +- Validation errors return 400 with descriptive messages +- Internal code does not over-validate diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.3-security.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.3-security.md new file mode 100644 index 0000000..c9ae5ce --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.3-security.md @@ -0,0 +1,97 @@ +# Phase 3.3: Security Hardening + +## Goal + +Protect the application against common web vulnerabilities (OWASP Top 10). + +## Steps + +### 1. Security headers (Express) + +Install and configure helmet: + +```bash +pnpm add helmet +``` + +```typescript +import helmet from 'helmet' +app.use(helmet()) +``` + +For non-Express frameworks, add equivalent headers manually or use the framework's security plugin. + +### 2. Rate limiting + +Install and configure rate limiting: + +```bash +pnpm add express-rate-limit +``` + +```typescript +import rateLimit from 'express-rate-limit' + +const limiter = rateLimit({ + windowMs: 15 * 60 * 1000, // 15 minutes + limit: 100, + standardHeaders: 'draft-7', + legacyHeaders: false, +}) + +app.use(limiter) +``` + +Consider stricter limits for auth endpoints (login, password reset). + +### 3. Secrets audit + +Search the entire codebase for hardcoded secrets: + +```bash +# Patterns to search for +grep -rn "password\s*=" --include="*.ts" --include="*.js" . +grep -rn "api[_-]?key\s*=" --include="*.ts" --include="*.js" . +grep -rn "secret\s*=" --include="*.ts" --include="*.js" . +grep -rn "token\s*=" --include="*.ts" --include="*.js" . +grep -rn "Bearer " --include="*.ts" --include="*.js" . +``` + +Move all secrets to environment variables. Create a `.env.example` with placeholder values. + +### 4. Dependency injection for secrets + +Never import secrets directly. Pass them through configuration: + +```typescript +// Bad +const apiKey = process.env.API_KEY // scattered across files + +// Good — centralized in env.ts (Phase 3.4 will formalize this) +export const config = { apiKey: env.API_KEY } +``` + +### 5. Security checklist (inline fallback) + +If no security review skill is available, manually verify: + +- [ ] No hardcoded secrets in source code +- [ ] All secrets loaded from environment variables +- [ ] `.env` is in `.gitignore` +- [ ] `.env.example` exists with placeholder values +- [ ] helmet (or equivalent) is configured +- [ ] Rate limiting is configured on public endpoints +- [ ] No SQL/NoSQL injection vectors (parameterized queries) +- [ ] No XSS vectors (output encoding, CSP headers) +- [ ] No command injection (avoid `exec`, `spawn` with user input) +- [ ] HTTPS enforced in production (or documented as a deployment concern) +- [ ] Auth tokens have expiry +- [ ] Sensitive data not logged (passwords, tokens, PII) + +## Verification + +- `grep` for hardcoded secrets returns zero results +- helmet middleware is active +- Rate limiting is configured +- `.env.example` exists +- Security checklist items pass diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.4-env-validation.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.4-env-validation.md new file mode 100644 index 0000000..b69640c --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.4-env-validation.md @@ -0,0 +1,86 @@ +# Phase 3.4: Env Validation + +## Goal + +Replace raw `process.env` access with type-safe, validated environment configuration using t3-env. + +## Steps + +### 1. Install t3-env + +```bash +pnpm add @t3-oss/env-core zod +``` + +(Zod should already be installed from Phase 3.2.) + +### 2. Inventory all env var usage + +Search the codebase for all `process.env` references: + +```bash +grep -rn "process\.env\." --include="*.ts" --include="*.js" src/ +``` + +List every env var used, its expected type, and whether it's required or has a default. + +### 3. Create env configuration + +Create `src/env.ts`: + +```typescript +import { createEnv } from '@t3-oss/env-core' +import { z } from 'zod' + +export const env = createEnv({ + server: { + NODE_ENV: z.enum(['development', 'production', 'test']).default('development'), + PORT: z.coerce.number().default(3000), + DATABASE_URL: z.string().url(), + API_KEY: z.string().min(1), + LOG_LEVEL: z.enum(['debug', 'info', 'warn', 'error']).default('info'), + }, + runtimeEnv: process.env, +}) +``` + +Adapt the schema to match the actual env vars discovered in step 2. Only include vars the application actually uses. + +### 4. Replace all process.env usage + +Replace every `process.env.X` with `env.X`. This gives you: +- Type safety (autocomplete, compile-time errors) +- Runtime validation (app crashes immediately with a clear error if env is misconfigured) +- Single source of truth for all configuration + +### 5. Update .env.example + +Ensure `.env.example` lists all required env vars with placeholder values and comments: + +```env +# Server +NODE_ENV=development +PORT=3000 + +# Database +DATABASE_URL=postgresql://user:password@localhost:5432/mydb + +# API +API_KEY=your-api-key-here +``` + +### 6. Add startup validation + +t3-env validates on import, so the app will crash at startup if required vars are missing. Verify this works: + +```bash +# Should crash with clear error message about missing DATABASE_URL +unset DATABASE_URL && node dist/index.js +``` + +## Verification + +- Zero `process.env` references outside of `src/env.ts` +- `env.ts` has Zod schemas for every env var +- App crashes with clear error if required env var is missing +- `.env.example` is complete and up to date diff --git a/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.5-cors-evaluation.md b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.5-cors-evaluation.md new file mode 100644 index 0000000..2af70f2 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-3-hardening/phase-3.5-cors-evaluation.md @@ -0,0 +1,69 @@ +# Phase 3.5: CORS Evaluation (Optional) + +Only execute this sub-phase if the user opted in during Phase 0. + +## Goal + +Evaluate whether the application needs CORS configuration, and set it up correctly if so. + +## Decision Criteria + +CORS is needed if ANY of these are true: +- The API is consumed by a browser-based frontend on a different origin +- The API is a public API that third-party websites may call +- The app serves both a frontend and an API from different origins/ports + +CORS is NOT needed if: +- The API is only consumed by server-side clients (other services, CLI tools) +- The frontend and API are served from the same origin +- The app is behind a reverse proxy that handles CORS + +## Steps + +### 1. Analyze the application + +- Check if there are any CORS-related headers or middleware already +- Check the README or docs for information about how the API is consumed +- Look for frontend code or references to a frontend +- Ask the user if unclear: "Is this API consumed by a browser-based frontend on a different origin?" + +### 2. If CORS is needed + +Install and configure: + +```bash +pnpm add cors +pnpm add -D @types/cors +``` + +```typescript +import cors from 'cors' + +app.use(cors({ + origin: env.CORS_ORIGIN, // Add to env.ts + methods: ['GET', 'POST', 'PUT', 'DELETE', 'PATCH'], + allowedHeaders: ['Content-Type', 'Authorization'], + credentials: true, + maxAge: 86400, // 24 hours preflight cache +})) +``` + +Add `CORS_ORIGIN` to `env.ts` and `.env.example`. + +### 3. If CORS is NOT needed + +Document the decision: +```typescript +// CORS is intentionally not configured. +// This API is consumed only by server-side clients. +``` + +### 4. Present recommendation to user + +Whether CORS is needed or not, present your finding and recommendation. Let the user confirm before making changes. + +## Verification + +- CORS decision is documented (either configured or explicitly not needed) +- If configured: only the intended origins are allowed (no wildcard `*` in production) +- If configured: credentials handling matches the auth strategy diff --git a/.claude/skills/productionalize-node/references/phase-4-dependencies.md b/.claude/skills/productionalize-node/references/phase-4-dependencies.md new file mode 100644 index 0000000..c27358c --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-4-dependencies.md @@ -0,0 +1,109 @@ +# Phase 4: Dependencies + +Depends on Phase 1 (Foundation). + +## 4a: Dependency Audit (CVEs) + +1. Run `pnpm audit` (or `npm audit` if using npm). +2. For each vulnerability: + - **Critical/High**: Fix immediately. Upgrade the dependency or find an alternative. + - **Moderate**: Fix if a patch is available. Document if no fix exists. + - **Low**: Document and move on. +3. If a dependency has no fix available, evaluate: + - Can it be replaced with an alternative? + - Can the vulnerable code path be avoided? + - Document the risk in the plan. +4. Run `pnpm audit` again to verify zero critical/high CVEs. + +## 4b: Dependency Upgrades + +Upgrade dependencies to the latest stable version that is NOT newer than 1 week old. + +1. List outdated dependencies: `pnpm outdated` (or `npm outdated`). +2. For each outdated dependency: + - Check publish date: `npm view time --json` + - Only upgrade if the latest version was published MORE than 7 days ago. + - Check the changelog/release notes for breaking changes. +3. Upgrade one at a time (or in small batches of related deps). +4. After each upgrade batch, run the quality gates: + ```bash + pnpm run check:types + pnpm run lint + pnpm test -- --run + pnpm run build + ``` +5. If an upgrade breaks something, either fix the breakage or skip that upgrade and document why. + +## 4c: Dependency Freshness Gate + +Create a CI + Makefile check that rejects dependencies newer than 1 week. + +1. Create `scripts/check-deps-freshness.js`: + ```javascript + #!/usr/bin/env node + + /** + * Checks that no dependency in package.json was published less than 7 days ago. + * This protects against supply chain attacks via compromised fresh packages. + */ + + import { execSync } from 'node:child_process' + import { readFileSync } from 'node:fs' + + const pkg = JSON.parse(readFileSync('package.json', 'utf8')) + const allDeps = { + ...pkg.dependencies, + ...pkg.devDependencies, + } + + const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000 + const now = Date.now() + const violations = [] + + for (const [name, version] of Object.entries(allDeps)) { + try { + const raw = execSync(`npm view ${name} time --json`, { encoding: 'utf8', stdio: ['pipe', 'pipe', 'pipe'] }) + const times = JSON.parse(raw) + const resolvedVersion = version.replace(/^[\^~>=<]*/g, '') + const publishDate = times[resolvedVersion] + if (publishDate) { + const age = now - new Date(publishDate).getTime() + if (age < ONE_WEEK_MS) { + const days = Math.floor(age / (24 * 60 * 60 * 1000)) + violations.push(`${name}@${resolvedVersion} — published ${days} day(s) ago`) + } + } + } catch { + // Skip packages that can't be looked up (private, local, etc.) + } + } + + if (violations.length > 0) { + console.error('Dependencies newer than 1 week detected:') + violations.forEach((v) => console.error(` - ${v}`)) + process.exit(1) + } else { + console.log('All dependencies are older than 1 week.') + } + ``` + +2. Make it executable: `chmod +x scripts/check-deps-freshness.js` + +3. Add to package.json scripts: + ```json + "check:deps-freshness": "node scripts/check-deps-freshness.js" + ``` + +4. The Makefile target was already added in Phase 1b (`deps-freshness`). + +5. Add to CI pipeline (Phase 6 will wire this in, but document the step here): + ```yaml + - name: Check dependency freshness + run: pnpm run check:deps-freshness + ``` + +## Verification + +- `pnpm audit` shows zero critical/high CVEs +- `pnpm run check:deps-freshness` passes +- All quality gates still pass after upgrades diff --git a/.claude/skills/productionalize-node/references/phase-5-testing.md b/.claude/skills/productionalize-node/references/phase-5-testing.md new file mode 100644 index 0000000..e0c81b6 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-5-testing.md @@ -0,0 +1,144 @@ +# Phase 5: Testing + +Depends on Phase 2 (Quality Infra) and Phase 3 (Hardening). + +## 5a: Test Setup + +### 1. Install Vitest (or confirmed framework) + +```bash +pnpm add -D vitest @vitest/coverage-v8 +``` + +### 2. Configure Vitest + +Create or update `vitest.config.ts`: + +```typescript +import { defineConfig } from 'vitest/config' + +export default defineConfig({ + test: { + globals: true, + coverage: { + provider: 'v8', + reporter: ['text', 'lcov', 'html'], + include: ['src/**/*.ts'], + exclude: ['src/**/*.test.ts', 'src/**/*.d.ts'], + }, + }, +}) +``` + +### 3. Verify test infrastructure + +Create a smoke test to verify the setup works: + +```typescript +// src/smoke.test.ts +import { describe, it, expect } from 'vitest' + +describe('test setup', () => { + it('works', () => { + expect(true).toBe(true) + }) +}) +``` + +Run: `pnpm test -- --run`. Delete the smoke test after verifying. + +### 4. Colocate tests + +Follow the convention of placing test files next to source files: +``` +src/ + users/ + users.ts + users.test.ts + orders/ + orders.ts + orders.test.ts +``` + +## 5b: Write Tests to Reach Coverage Target + +### 1. Run coverage baseline + +```bash +pnpm test -- --run --coverage +``` + +Record the current coverage percentage. This is the baseline. + +### 2. Prioritize what to test + +Focus on the highest-value tests first: + +1. **Public API / route handlers** — these are the system boundary, most likely to break +2. **Business logic / domain functions** — core algorithms and decision-making +3. **Error paths** — verify error handling works (custom errors, validation rejection) +4. **Edge cases** — null/undefined inputs, empty arrays, boundary values +5. **Integration points** — database queries, external API calls (mock these) + +Do NOT waste time testing: +- Simple getters/setters +- Framework boilerplate +- Third-party library internals +- Trivial type conversions + +### 3. Writing tests + +If a TDD workflow skill is available, invoke it for writing new tests. Otherwise, follow this pattern: + +For each module: +1. Write a test that exercises the happy path +2. Write tests for each error/validation path +3. Write tests for edge cases +4. Mock external dependencies (database, APIs, file system) + +Test structure: +```typescript +describe('createUser', () => { + it('creates a user with valid input', async () => { + // Arrange + const input = { email: 'test@example.com', name: 'Test' } + // Act + const result = await createUser(input) + // Assert + expect(result).toMatchObject({ email: 'test@example.com' }) + }) + + it('throws ValidationError for invalid email', async () => { + await expect(createUser({ email: 'invalid', name: 'Test' })) + .rejects.toThrow(ValidationError) + }) + + it('throws NotFoundError when referenced resource missing', async () => { + // ... + }) +}) +``` + +### 4. Reach coverage target + +Run coverage after each batch of tests: +```bash +pnpm test -- --run --coverage +``` + +Keep writing tests until the confirmed coverage target is reached. Focus on the uncovered files/lines shown in the coverage report. + +### 5. Coverage in CI + +Add coverage flag to the test script for CI (Phase 6 will wire this): +```json +"test:ci": "vitest run --coverage --coverage.reporter=text" +``` + +## Verification + +- `pnpm test -- --run` passes with zero failures +- `pnpm test -- --run --coverage` shows ≥ target coverage +- Tests cover: happy paths, error paths, edge cases +- No flaky tests (run twice to confirm) +- Test files are colocated with source diff --git a/.claude/skills/productionalize-node/references/phase-6-ci.md b/.claude/skills/productionalize-node/references/phase-6-ci.md new file mode 100644 index 0000000..b90110e --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-6-ci.md @@ -0,0 +1,116 @@ +# Phase 6: CI/CD + +Depends on Phase 2 (Quality Infra) and Phase 4 (Dependencies). + +## 6a: GitHub Actions Pipeline + +### 1. Create workflow file + +Create `.github/workflows/ci.yml`: + +```yaml +name: CI + +on: + push: + branches: [main] + pull_request: + +concurrency: + group: ${{ github.workflow }}-${{ github.head_ref || github.ref }} + cancel-in-progress: true + +permissions: + contents: read + +jobs: + check: + name: Check + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - uses: pnpm/action-setup@v4 + + - uses: actions/setup-node@v4 + with: + node-version: 22 + cache: pnpm + + - name: Install dependencies + run: pnpm install --frozen-lockfile + + - name: Check formatting + run: pnpm run format:check + + - name: Lint + run: pnpm run lint + + - name: Type check + run: pnpm run check:types + + - name: Test + run: pnpm test -- --run --coverage --coverage.reporter=text + + - name: Build + run: pnpm run build + + - name: Audit dependencies + run: pnpm audit --audit-level=moderate + + - name: Check dependency freshness + run: pnpm run check:deps-freshness + + complete: + name: CI Complete + if: always() + needs: [check] + runs-on: ubuntu-latest + steps: + - name: Check result + run: | + if [ "${{ needs.check.result }}" != "success" ]; then + echo "CI failed" + exit 1 + fi +``` + +### 2. Adapt to project + +- If the project uses npm instead of pnpm, replace `pnpm` commands and remove `pnpm/action-setup` +- If the project uses yarn, use `yarn --frozen-lockfile` and the appropriate cache setup +- Adjust Node.js version to match the project's minimum supported version +- If the project has a `pnpm-lock.yaml`, ensure `--frozen-lockfile` is used +- If the project doesn't use pnpm, remove the `pnpm/action-setup` step + +### 3. Pipeline mirrors local + +The CI pipeline must run the exact same checks as `make check` (or the local equivalent). Verify the order matches: + +1. Format check +2. Lint +3. Type check +4. Test (with coverage) +5. Build +6. Dependency audit +7. Dependency freshness + +### 4. Sentinel job + +The `complete` job acts as a required status check. Configure branch protection to require this job to pass before merging to main. + +### 5. Security best practices for CI + +- Pin action versions to full SHAs (e.g., `actions/checkout@11bd7190...`) for supply chain security. At minimum use version tags. +- Use `permissions: contents: read` (least privilege) +- Use `--frozen-lockfile` to prevent lockfile manipulation +- Enable `cancel-in-progress` to save runner minutes +- Never store secrets in workflow files — use GitHub Secrets + +## Verification + +- `.github/workflows/ci.yml` exists and is valid YAML +- Pipeline runs the same checks in the same order as `make check` +- `complete` sentinel job fails if any upstream job fails +- Permissions are minimized +- Lockfile is frozen in CI diff --git a/.claude/skills/productionalize-node/references/phase-7-documentation.md b/.claude/skills/productionalize-node/references/phase-7-documentation.md new file mode 100644 index 0000000..5945d0c --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-7-documentation.md @@ -0,0 +1,123 @@ +# Phase 7: Documentation + +Depends on all previous phases. This is the last phase before final review. + +## 7a: README Quality Check + Fixes + +### 1. Evaluate the existing README + +Check for these sections (all required for production): + +- [ ] **Project title and description** — one-paragraph summary of what the project does +- [ ] **Installation** — exact commands to install and set up +- [ ] **Quick start / Usage** — minimal example to get running +- [ ] **Environment variables** — table of all env vars, types, defaults, descriptions (must match `env.ts`) +- [ ] **API documentation** — if it's an API, document endpoints, request/response shapes +- [ ] **Scripts / Commands** — table of available npm scripts and what they do +- [ ] **Development** — how to set up for development, run tests, lint +- [ ] **Architecture** (optional but recommended) — high-level overview for contributors +- [ ] **License** — license type and file reference + +### 2. Fix gaps + +For each missing section, add it. Pull information from: +- `package.json` (scripts, description, license) +- `env.ts` (environment variables) +- Source code (API routes, exported functions) +- Existing inline comments + +### 3. Verify accuracy + +Every command in the README must actually work: +- Copy each code block and run it +- Verify install instructions produce a working setup +- Verify the quick start example runs + +### 4. Remove stale content + +- Remove references to deleted features or old APIs +- Remove TODO placeholders +- Remove "coming soon" sections with no content + +## 7b: CHANGELOG + Commit Conventions (Optional) + +Only execute if user opted in during Phase 0. + +### 1. CHANGELOG + +Create `CHANGELOG.md` following [Keep a Changelog](https://keepachangelog.com/en/1.1.0/): + +```markdown +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [Unreleased] + +### Added +- TypeScript strict mode +- Vitest test suite with X% coverage +- ESLint + Prettier configuration +- GitHub Actions CI pipeline +- Input validation with Zod +- Environment validation with t3-env +- Custom error hierarchy +- Structured logging with pino +- Security hardening (helmet, rate limiting) +- Dependency freshness gate +``` + +### 2. Commit conventions + +Add a section to the README or a `CONTRIBUTING.md`: + +```markdown +## Commit Convention + +This project uses [Conventional Commits](https://www.conventionalcommits.org/): + +- `feat:` — new feature +- `fix:` — bug fix +- `refactor:` — code change that neither fixes a bug nor adds a feature +- `docs:` — documentation only +- `test:` — adding or updating tests +- `chore:` — maintenance (deps, CI, tooling) +``` + +Do NOT install commitlint or husky unless the user explicitly asks. Keep it convention-based, not enforced. + +## 7c: Spec Compliance Check (if --spec provided) + +### 1. Read the spec + +If `--spec` is a URL, fetch it. If it's a file path, read it. + +### 2. Cross-reference + +For each requirement in the spec: +- Find the corresponding implementation in the codebase +- Verify the behavior matches +- Document any deviations + +### 3. Report + +Present a compliance table: + +``` +| Spec Section | Requirement | Status | Notes | +|---|---|---|---| +| 3.1 | Must return 402 for unpaid requests | PASS | Handled in middleware | +| 3.2 | Must include X-Payment header | FAIL | Header not implemented | +``` + +Fix any FAIL items or document them as known gaps for the user to address. + +## Verification + +- README has all required sections +- All README commands actually work when executed +- CHANGELOG exists and is accurate (if opted in) +- Spec compliance table shows no unresolved FAILs (if --spec) diff --git a/.claude/skills/productionalize-node/references/phase-8-final-review.md b/.claude/skills/productionalize-node/references/phase-8-final-review.md new file mode 100644 index 0000000..4bb6583 --- /dev/null +++ b/.claude/skills/productionalize-node/references/phase-8-final-review.md @@ -0,0 +1,95 @@ +# Phase 8: Final Review + +Depends on all previous phases. This is the last phase. + +## 8a: Code Review + +If a code review skill is available, invoke it with: +- What was implemented: "Productionalization of a vibe-coded Node.js project" +- Plan: reference `docs/vibe-to-production-plan.md` +- Scope: all files changed since the skill started + +### Inline fallback (if no code review skill available) + +Review the codebase for: + +1. **Consistency** — naming conventions, file structure, import patterns are consistent across the project +2. **SOLID violations** — classes/modules doing too much, tight coupling, missing abstractions +3. **Code smells** — duplicated code, long functions (>50 lines), deep nesting (>3 levels), magic numbers +4. **Dead code** — unused imports, unreachable branches, commented-out code +5. **Error handling completeness** — all async paths have error handling, errors propagate with context +6. **Type safety** — no `any` types (except justified cases), no type assertions without explanation +7. **Test quality** — tests are meaningful (not just "it exists"), cover edge cases, use proper assertions + +Fix any issues found. + +## 8b: Security Review + +If a security review skill is available, invoke it with a focus on: +- OWASP Top 10 compliance +- Dependency supply chain security +- Secrets management +- Input validation coverage + +### Inline fallback (if no security review skill available) + +Run through this security checklist: + +- [ ] **Injection** — no SQL/NoSQL/command injection vectors. All queries parameterized. No `eval()`, no `exec()` with user input. +- [ ] **Broken Authentication** — auth tokens have expiry, passwords hashed (bcrypt/argon2), no credentials in logs +- [ ] **Sensitive Data Exposure** — secrets in env vars (not code), .env in .gitignore, no PII in logs, HTTPS enforced +- [ ] **XXE** — XML parsing disabled or configured securely (if applicable) +- [ ] **Broken Access Control** — authorization checks on every protected endpoint +- [ ] **Security Misconfiguration** — helmet configured, debug mode off in production, error messages don't leak internals +- [ ] **XSS** — output encoding, CSP headers via helmet +- [ ] **Insecure Deserialization** — no `JSON.parse` on untrusted input without validation (Zod handles this) +- [ ] **Known Vulnerabilities** — `pnpm audit` clean, no deps newer than 1 week +- [ ] **Insufficient Logging** — security events logged (auth failures, access denied, validation failures) + +Fix any issues found. + +## Completion Criteria + +Evaluate all 13 criteria. ALL must pass before declaring the transformation complete. + +| # | Criterion | How to verify | +|---|---|---| +| 1 | TypeScript strict — zero type errors | `pnpm run check:types` exits 0 | +| 2 | ESLint — zero errors | `pnpm run lint` exits 0 | +| 3 | Prettier — fully formatted | `pnpm run format:check` exits 0 | +| 4 | Tests pass with ≥ target coverage | `pnpm test -- --run --coverage` | +| 5 | Build succeeds | `pnpm run build` exits 0 | +| 6 | No CVEs in dependencies | `pnpm audit --audit-level=moderate` exits 0 | +| 7 | No dependency newer than 1 week | `pnpm run check:deps-freshness` exits 0 | +| 8 | All process.env replaced with validated env | `grep -rn "process\.env\." src/` returns 0 results (except env.ts) | +| 9 | Security review passed | All security checklist items checked | +| 10 | Code review passed | All code review items resolved | +| 11 | README complete | Has install, usage, env vars, scripts, API docs | +| 12 | Spec compliance (if --spec) | Compliance table shows no FAILs | +| 13 | CI pipeline complete | `.github/workflows/ci.yml` runs all gates | + +## Final Report + +Present this table to the user: + +``` +| # | Criterion | Status | Notes | +|---|----------------------------------|-----------|--------------------------------| +| 1 | TypeScript strict | PASS/FAIL | | +| 2 | ESLint — zero errors | PASS/FAIL | | +| 3 | Prettier — fully formatted | PASS/FAIL | | +| 4 | Tests ≥ target coverage | PASS/FAIL | Actual: XX% | +| 5 | Build succeeds | PASS/FAIL | | +| 6 | No CVEs | PASS/FAIL | | +| 7 | No deps newer than 1 week | PASS/FAIL | | +| 8 | Env validation complete | PASS/FAIL | | +| 9 | Security review passed | PASS/FAIL | | +| 10| Code review passed | PASS/FAIL | | +| 11| README complete | PASS/FAIL | | +| 12| Spec compliance | PASS/FAIL | N/A if no spec | +| 13| CI pipeline complete | PASS/FAIL | | +``` + +If any criterion fails, remediate and re-evaluate. Only declare the transformation complete when all items pass. + +The plan document at `docs/vibe-to-production-plan.md` remains in the repo as an artifact of the transformation. diff --git a/.claude/skills/productionalize-node/references/tooling-defaults.md b/.claude/skills/productionalize-node/references/tooling-defaults.md new file mode 100644 index 0000000..de424f8 --- /dev/null +++ b/.claude/skills/productionalize-node/references/tooling-defaults.md @@ -0,0 +1,35 @@ +# Default Tooling Stack + +Present this table to the user for confirmation before installing anything. + +## Proposed Tools + +| Concern | Tool | Why | +|---|---|---| +| Language | TypeScript (strict mode) | Type safety, catch errors at compile time | +| Test framework | Vitest | Fast, native ESM support, compatible with Jest API | +| Linter | ESLint (flat config, v9+) | Industry standard, typescript-eslint for TS support | +| Formatter | Prettier | Consistent formatting, no debates | +| Package manager | pnpm | Fast, disk-efficient, strict dependency resolution | +| CI | GitHub Actions | Most common, free for open source | +| Input validation | Zod | TypeScript-first, runtime validation with type inference | +| Env validation | t3-env (@t3-oss/env-core) | Type-safe env vars, Zod-based, prevents runtime crashes from missing env | +| Security headers | helmet | Express middleware for secure HTTP headers | +| Rate limiting | express-rate-limit | Prevent abuse, configurable per-route | + +## Alternatives (if user prefers) + +| Concern | Alternatives | +|---|---| +| Test framework | Jest, node:test | +| Linter | Biome (lint + format combined) | +| Package manager | npm, yarn | +| CI | GitLab CI, CircleCI | +| Input validation | Joi, Yup, ArkType | +| Env validation | envalid, env-var | + +## Notes + +- If the project already uses a tool (e.g., Jest for testing), prefer keeping it unless there's a strong reason to migrate. +- helmet and express-rate-limit only apply to Express-based projects. For Fastify, Koa, or other frameworks, suggest the equivalent middleware. +- t3-env requires Zod as a peer dependency, which is already included for input validation. diff --git a/.claude/skills/verification-before-completion/SKILL.md b/.claude/skills/verification-before-completion/SKILL.md new file mode 100644 index 0000000..2f14076 --- /dev/null +++ b/.claude/skills/verification-before-completion/SKILL.md @@ -0,0 +1,139 @@ +--- +name: verification-before-completion +description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always +--- + +# Verification Before Completion + +## Overview + +Claiming work is complete without verification is dishonesty, not efficiency. + +**Core principle:** Evidence before claims, always. + +**Violating the letter of this rule is violating the spirit of this rule.** + +## The Iron Law + +``` +NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE +``` + +If you haven't run the verification command in this message, you cannot claim it passes. + +## The Gate Function + +``` +BEFORE claiming any status or expressing satisfaction: + +1. IDENTIFY: What command proves this claim? +2. RUN: Execute the FULL command (fresh, complete) +3. READ: Full output, check exit code, count failures +4. VERIFY: Does output confirm the claim? + - If NO: State actual status with evidence + - If YES: State claim WITH evidence +5. ONLY THEN: Make the claim + +Skip any step = lying, not verifying +``` + +## Common Failures + +| Claim | Requires | Not Sufficient | +|-------|----------|----------------| +| Tests pass | Test command output: 0 failures | Previous run, "should pass" | +| Linter clean | Linter output: 0 errors | Partial check, extrapolation | +| Build succeeds | Build command: exit 0 | Linter passing, logs look good | +| Bug fixed | Test original symptom: passes | Code changed, assumed fixed | +| Regression test works | Red-green cycle verified | Test passes once | +| Agent completed | VCS diff shows changes | Agent reports "success" | +| Requirements met | Line-by-line checklist | Tests passing | + +## Red Flags - STOP + +- Using "should", "probably", "seems to" +- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.) +- About to commit/push/PR without verification +- Trusting agent success reports +- Relying on partial verification +- Thinking "just this once" +- Tired and wanting work over +- **ANY wording implying success without having run verification** + +## Rationalization Prevention + +| Excuse | Reality | +|--------|---------| +| "Should work now" | RUN the verification | +| "I'm confident" | Confidence ≠ evidence | +| "Just this once" | No exceptions | +| "Linter passed" | Linter ≠ compiler | +| "Agent said success" | Verify independently | +| "I'm tired" | Exhaustion ≠ excuse | +| "Partial check is enough" | Partial proves nothing | +| "Different words so rule doesn't apply" | Spirit over letter | + +## Key Patterns + +**Tests:** +``` +✅ [Run test command] [See: 34/34 pass] "All tests pass" +❌ "Should pass now" / "Looks correct" +``` + +**Regression tests (TDD Red-Green):** +``` +✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass) +❌ "I've written a regression test" (without red-green verification) +``` + +**Build:** +``` +✅ [Run build] [See: exit 0] "Build passes" +❌ "Linter passed" (linter doesn't check compilation) +``` + +**Requirements:** +``` +✅ Re-read plan → Create checklist → Verify each → Report gaps or completion +❌ "Tests pass, phase complete" +``` + +**Agent delegation:** +``` +✅ Agent reports success → Check VCS diff → Verify changes → Report actual state +❌ Trust agent report +``` + +## Why This Matters + +From 24 failure memories: +- your human partner said "I don't believe you" - trust broken +- Undefined functions shipped - would crash +- Missing requirements shipped - incomplete features +- Time wasted on false completion → redirect → rework +- Violates: "Honesty is a core value. If you lie, you'll be replaced." + +## When To Apply + +**ALWAYS before:** +- ANY variation of success/completion claims +- ANY expression of satisfaction +- ANY positive statement about work state +- Committing, PR creation, task completion +- Moving to next task +- Delegating to agents + +**Rule applies to:** +- Exact phrases +- Paraphrases and synonyms +- Implications of success +- ANY communication suggesting completion/correctness + +## The Bottom Line + +**No shortcuts for verification.** + +Run the command. Read the output. THEN claim the result. + +This is non-negotiable. diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml new file mode 100644 index 0000000..3c22e5f --- /dev/null +++ b/.github/workflows/claude-review.yml @@ -0,0 +1,178 @@ +# Claude Multi-Agent Review +# +# Runs a swarm of specialized Claude review agents on each PR, one per +# concern (PostgreSQL, TypeScript, Docker, security, documentation). +# Each agent runs in parallel under its own job and posts inline + +# summary comments on the PR. All agents run on `claude-haiku-4-5` to +# keep cost low. +# +# Security model is borrowed from stellar/actions PR #103 +# (https://github.com/stellar/actions/pull/103). This repo doesn't accept +# fork-based contributions, so we use the safer `pull_request` trigger. +# Switch to `pull_request_target` only if that changes — and re-read the +# threat model in that PR first. +# +# To add or change an aspect, edit the `matrix.include` list below. Each +# entry needs `id` (kebab-case, used in concurrency + comment headers), +# `title` (shown to humans), and `focus` (the prompt body for that +# agent). +name: Claude Multi-Agent Review + +on: + pull_request: + types: [opened, ready_for_review, synchronize, reopened] + +# Concurrency is set per-(PR, matrix aspect) inside the job below. +# A workflow-level concurrency group can't reference `matrix.*`, so it +# would collapse all five matrix entries into one group and cancel +# them. The job-level group below uses `matrix.id`, which is unique +# per agent. + +permissions: {} + +jobs: + review: + if: github.event.pull_request.draft == false + runs-on: ubuntu-latest + # Per-(PR, aspect) so a new push cancels stale runs of the same + # aspect, but parallel aspects on the same PR don't cancel each + # other. Must live on the job (not workflow) because `matrix.*` is + # only resolvable here. + concurrency: + group: claude-review-${{ github.event.pull_request.number }}-${{ matrix.id }} + cancel-in-progress: true + permissions: + contents: read + pull-requests: write + id-token: write + strategy: + fail-fast: false + matrix: + include: + - id: postgres + title: "PostgreSQL & Prisma" + skills: "postgres-optimization, database-migrations" + focus: | + Review the PR strictly for database concerns. Ignore everything else. + - Prisma schema design, indexes, relations, cascade rules + - Query patterns: N+1, missing `select`/`include` scoping, unbounded `findMany` + - Migration safety: backfills, nullability changes, locking on large tables + - Transaction boundaries, isolation, race conditions + - Connection pooling (Cloud SQL connector usage), prepared statements + - Pagination correctness (cursor vs offset) + Skip findings about formatting, naming, or non-DB code. + - id: typescript + title: "TypeScript Best Practices" + skills: "productionalize-node" + focus: | + Review the PR strictly for TypeScript and Node/Express idioms. + - Type narrowing, `any`/`unknown` use, unsafe casts, missing return types on exported fns + - Zod schema correctness and parse vs safeParse usage + - Async/await: unhandled promises, missing error handling at boundaries + - Express 5 patterns: route handler signatures, middleware order, error middleware + - Module boundaries, barrel-file abuse, circular imports + - Idiomatic Node 22+ usage (no legacy callback patterns where async exists) + Do not flag style/format issues handled by prettier/eslint. + - id: docker + title: "Docker & Bundle Size" + skills: "docker-patterns" + focus: | + Review the PR strictly for container image size and build hygiene. + Look at `Dockerfile`, `.dockerignore`, `Makefile`, and any added deps. + - Multi-stage layering: are dev deps leaking into the runtime stage? + - `node_modules` size: any heavy deps added that have lighter alternatives? + - Cache-friendliness of layer ordering (lockfile copy before source) + - `.dockerignore` coverage (tests, docs, .env, creds) + - Use of `alpine` base, `--ignore-scripts` for prod install, `USER node` + - Unused files copied into the runtime image + Skip non-container concerns. + - id: security + title: "Security Review" + skills: "security-audit" + focus: | + Review the PR strictly for security concerns. + - Secrets/credentials: hardcoded values, leaked env vars, `creds.json` handling + - Input validation: missing zod parsing at HTTP boundaries, SQL/NoSQL injection + - AuthN/AuthZ gaps, missing rate-limit / helmet config changes + - SSRF, open redirects, prototype pollution + - Dep additions: known CVEs, abandoned packages, supply-chain risk + - Logging of sensitive data (PII, tokens) via pino + - CORS / proxy-addr / trust-proxy misconfig + Cite OWASP categories where relevant. Skip non-security issues. + - id: docs + title: "Documentation Freshness" + skills: "" + focus: | + Review the PR strictly for documentation drift. + - Does `README.md` still reflect the changed behavior, env vars, scripts, endpoints? + - Are new env vars added to `.env.example`? + - Are new pnpm scripts or Makefile targets documented? + - Are JSDoc/TSDoc blocks on changed exported functions still accurate? + - Does `docs/` need updates for changed routes/serializers/pagination? + Do not propose new documentation pages — only flag drift introduced by this PR. + steps: + # Trusted base ref. We use `pull_request` (not `_target`), so this + # is effectively the same as the PR head from a permissions + # standpoint — secrets are only available for same-repo PRs. We + # still check out at the workspace root so tool/config files are + # picked up cleanly. Fetch full history for `git log` / `git blame`. + - uses: actions/checkout@v6 + with: + fetch-depth: 0 + + - uses: anthropics/claude-code-action@20c8abf165d5f85ab3fc970db9498436377dc9d1 # v1 + with: + anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} + track_progress: true + prompt: | + REPO: ${{ github.repository }} + PR NUMBER: ${{ github.event.pull_request.number }} + REVIEW ASPECT: ${{ matrix.title }} (${{ matrix.id }}) + + You are one of several specialist review agents running in + parallel on this PR. Stay strictly in your lane — other + agents cover other concerns. If something is outside your + scope, do not comment on it. + + ## Required: load your skill(s) first + + Before reading any PR code, invoke the Skill tool for each + of these skills (committed at `.claude/skills//SKILL.md`): + **${{ matrix.skills }}** + + If that field is empty, skip this step. Treat the skill + content as authoritative guidance for your review — your + findings should reflect the checks it prescribes. + + ## Your focus + + ${{ matrix.focus }} + + ## How to deliver findings + + 1. Inline comments for concrete issues. Prefix the body with + `[${{ matrix.title }}]` so reviewers can tell which agent + raised it. + 2. ONE summary PR comment at the end via + `gh pr comment ${{ github.event.pull_request.number }} --body "..."`. + Start it with `## ${{ matrix.title }} Review` so all five + agents' summaries form a coherent thread. + 3. If nothing notable, post a single one-line summary saying + "No ${{ matrix.title }} issues found." Do not post inline + comments in that case. + + ## Rules + + - Do not duplicate findings between inline and summary — + summary should reference inline counts, not repeat them. + - Cite file:line for every claim. No vague advice. + - Use `gh pr diff ${{ github.event.pull_request.number }}` / + `gh pr view ${{ github.event.pull_request.number }}` to + read the PR — pass the PR number explicitly, don't rely on + cwd. + - Be concise. This is one of five reviews — reviewers will + be reading them stacked. + claude_args: | + --model claude-haiku-4-5 + --max-turns 25 + --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment ${{ github.event.pull_request.number }}:*),Bash(gh pr diff ${{ github.event.pull_request.number }}:*),Bash(gh pr view ${{ github.event.pull_request.number }}:*),Bash(git log:*),Bash(git blame:*),Bash(git show:*),Bash(rg:*),Bash(cat:*),Bash(ls:*),Bash(find:*)"