Post-cutover: migrate body-heavy entities to gitsheets v1.2 content-typed (markdown) records #44

@themightychris

gitsheets v1.2.0 added content-typed records — sheets opt into format.type = 'markdown' to store records as .md files with TOML frontmatter and a designated body field. Plus lazy body loading via query({ withBody: false }).

This is the biggest one-time upgrade we'd take from the gitsheets 1.x line. Not urgent — defer to after cutover-prep ships so we're not refactoring entities mid-migration.

Why migrate

  • Snapshot is actually-readable. Contributors cloning codeforphilly-data-snapshot see real .md files in any markdown viewer — instead of parsing TOML records to find the prose.
  • Authoring via PR. Staff / maintainers can edit a project overview in any markdown editor and PR it; currently they roundtrip through the API.
  • Listing performance. Hot paths (projects-index, activity feed, FTS seeding, snapshot scrub) can call queryAll({ withBody: false }) and skip reading bodies.
  • Indexes stay fast. Index builds use body-less reads natively in v1.2.

What changes

Entities with substantial body content:

  • Project: overview (markdown body), summary (short markdown) → migrate overview as the body field, keep summary in frontmatter
  • ProjectUpdate: body (markdown) → migrate body as the body field
  • ProjectBuzz: summary (markdown) → migrate as body
  • Person: bio (markdown) → migrate as body
  • HelpWantedRole: description (markdown) → migrate as body
  • Tag: description (markdown, short) → optional; cheaper to leave as TOML field

The migration is bounded; entities without long bodies (ProjectMembership, SlugHistory, Revocation, TagAssignment, HelpWantedInterestExpression) stay as TOML records.
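
The entity-to-body-field mapping above could be captured in one place so the loader, serializers, and migration script agree on it. This is a sketch only: `BODY_FIELDS` and `splitRecord` are hypothetical names, not existing code in this repo, and Tag is left out because it is listed as optional.

```typescript
// Body field per entity, transcribed from the list above.
// Tag is omitted because this issue treats it as optional.
const BODY_FIELDS = {
  Project: "overview",
  ProjectUpdate: "body",
  ProjectBuzz: "summary",
  Person: "bio",
  HelpWantedRole: "description",
} as const;

type ContentTypedEntity = keyof typeof BODY_FIELDS;
type PlainRecord = Record<string, unknown>;

// Split a record into its frontmatter fields and its markdown body.
// A missing or non-string body field yields an empty body.
function splitRecord(
  entity: ContentTypedEntity,
  record: PlainRecord,
): { frontmatter: PlainRecord; body: string } {
  const bodyField = BODY_FIELDS[entity];
  const frontmatter: PlainRecord = { ...record };
  const rawBody = frontmatter[bodyField];
  delete frontmatter[bodyField];
  return { frontmatter, body: typeof rawBody === "string" ? rawBody : "" };
}
```

For a Project record, `splitRecord("Project", record)` leaves `summary` in the frontmatter and returns `overview` as the body, matching the first bullet.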

Tasks

  1. Schema reshape in packages/shared/src/schemas/ — one designated body field per content-typed entity (rename or restructure the existing overview / body / bio / description / summary fields).
  2. Update .gitsheets/<sheet>.toml configs with a [gitsheet.format] table that sets type = 'markdown' and body = '<fieldName>'.
  3. In-memory loader in apps/api/src/store/memory/loader.ts — use { withBody: false } for index-building reads; lazy-load via Sheet.loadBody(record) when serving record detail responses.
  4. Serializers in apps/api/src/services/serializers/: derive *Html / *Excerpt from the body field instead of the legacy string field.
  5. FTS pipeline in apps/api/src/store/fts.ts — body included in the indexed text via lazy-load batch.
  6. apps/api/scripts/import-laddr.ts — write the new markdown format for migrated entities.
  7. apps/api/scripts/scrub-data.ts — the snapshot now contains real .md files; verify the scrub still strips PII correctly across the new file shape.
  8. The data repo's existing TOML records need a one-time migration: write a one-shot apps/api/scripts/migrate-to-content-typed.ts that reads existing records and rewrites them as .md files per the new format.
  9. Update specs/behaviors/markdown-rendering.md and specs/data-model.md to reflect content-typed entities.
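
For the one-shot migration in task 8, the core transformation might look roughly like the sketch below. It rests on several assumptions: `renderMarkdownRecord` is a hypothetical helper, the `+++` delimiters follow the common convention for TOML frontmatter and have not been checked against what gitsheets v1.2 actually writes, and only scalar fields are handled. In practice the real script would likely let gitsheets do the serialization; this only shows the shape of the result.

```typescript
type Scalar = string | number | boolean;

// Serialize one scalar as a TOML value.
// JSON string escaping is valid TOML for ordinary text; unusual control
// characters and lone surrogates are not handled in this sketch.
function tomlValue(value: Scalar): string {
  if (typeof value === "string") return JSON.stringify(value);
  if (typeof value === "number" && !Number.isFinite(value)) {
    throw new Error("non-finite numbers are not supported in this sketch");
  }
  return String(value);
}

// Render a flat record as markdown with TOML frontmatter.
// ASSUMPTION: "+++" delimiters; the real gitsheets format may differ.
function renderMarkdownRecord(
  record: Record<string, Scalar>,
  bodyField: string,
): string {
  const entries = Object.entries(record)
    .filter(([key]) => key !== bodyField)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)); // stable key order

  const lines = entries.map(([key, value]) => {
    if (!/^[A-Za-z0-9_-]+$/.test(key)) {
      throw new Error(`unsupported frontmatter key: ${key}`);
    }
    return `${key} = ${tomlValue(value)}`;
  });

  const body = record[bodyField];
  const bodyText = typeof body === "string" ? body.trimEnd() : "";
  return `+++\n${lines.join("\n")}\n+++\n\n${bodyText}\n`;
}
```

Sorting the frontmatter keys is a choice made here so that re-running the migration produces identical files and clean diffs.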

Why defer

  • cutover-prep is next and depends on every other plan; this would invalidate frozen plans (storage-foundation, read-api, write-api, laddr-import, public-snapshot-scrub).
  • The benefit is real, but landing it is post-cutover work, not a pre-cutover refactor.

Out of scope

  • gitsheets check pre-commit hooks belong in the data repo, not this code repo.
  • The bundled Claude Code skill at node_modules/gitsheets/skills/gitsheets/ is available once we bump the dep range; future plans touching gitsheets can load it.
