From 0161cb0ddba3034426d60b75fae96054a80715f0 Mon Sep 17 00:00:00 2001
From: Tom Larkworthy
Date: Mon, 1 Jun 2026 20:41:20 +0200
Subject: [PATCH] Slack-Colibri bridge: update proposal for v0 production status

Reflect that the bridge has shipped and is operational. Tighten the narrative from "design we propose" to "this is live, here's what's running and where the code lives".

Status: v0 forward path live since 2026-05-31 on feelingofcomputing.bsky.social. Inverse-pattern channel ownership (community owner pre-creates channels under their DID, bot authors messages referencing those channel rkeys) is what shipped: the workaround the original proposal described under "Inverse pattern: community owner pre-creates channels".

Major changes:

- Status banner reflects live deployment with bot DID (4gcxakknd6hxtnhf33miwsob) and separate community-owner DID (j7nm3lrd5h7fm3sfhcv3lhfv).
- "What's live" bullet list of operational features: messages, edits, deletions, reactions, mentions (with DID resolution + inline pre-pass to populate the name cache), threaded replies, file attachments (image / video / audio inline, 5 MB cap, oversize placeholder), and lossless `com.feelingofcomputing.bridge.slackRaw` capture.
- "Code" section linking the slack-sync monorepo (worker + backfill + Slack manifest); auto-deploy via Cloudflare Workers Builds on push to main.
- Lexicon namespace renamed `com.feelingof.bridge.*` -> `com.feelingofcomputing.bridge.*` (community owns feelingofcomputing.com/.org/.net).
- Architecture diagram updated: no D1 cache in v0; slackRaw is the de-facto idempotency boundary; D1 binding is reserved for v0.1+ if cache reads become useful.
- slackRaw marked live; the slackOrigin / slackChannel / slackUser definitions are dropped from the page (the worker uses in-source maps today; see "Maintenance").
- "Asks of Colibri" + "Open questions" reframed as "Known gaps" + "Operational notes", with already-shipped items removed (reactions, edits, deletes, file attachments now in "What's live").

New "Operational notes" items: backfill in progress (4 weeks bridged as of 2026-06-01); unmapped-channel gap (6 channels, ~4600 msgs) pending community-owner action.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 pages/slack-colibri-bridge.md | 377 ++++++++--------------------------
 1 file changed, 86 insertions(+), 291 deletions(-)

diff --git a/pages/slack-colibri-bridge.md b/pages/slack-colibri-bridge.md
index 99dd077..4d9e5b9 100644
--- a/pages/slack-colibri-bridge.md
+++ b/pages/slack-colibri-bridge.md
@@ -1,176 +1,99 @@
 ---
-title: Slack → Colibri Bridge (Proposal)
+title: Slack → Colibri Bridge
 contributors: Tom Larkworthy
 ---
 
-> **Status:** Draft proposal. Looking for feedback.
+> **Status:** v0 forward path live since 2026-05-31. Bot identity [`@feelingofcomputing.bsky.social`](https://bsky.app/profile/feelingofcomputing.bsky.social) (`did:plc:4gcxakknd6hxtnhf33miwsob`). Community owned by a separate identity (`did:plc:j7nm3lrd5h7fm3sfhcv3lhfv`). Auto-deployed on push.
 
-A one-way sync that mirrors Slack messages from the FoC Slack workspace into the [Colibri](https://colibri.social) atproto network, so the conversation data ends up public and consumable via the atproto firehose.
+One-way sync from the FoC Slack workspace into the [Colibri](https://colibri.social) atproto network. Every bridged message, reaction, and attachment is a public record on the bot's bsky.social PDS; every raw Slack event is archived losslessly under a `com.feelingofcomputing.bridge.*` lexicon on the same repo. The bot authors messages into channels owned by a separate community-owner DID — the *inverse pattern* described under [Identity](#identity).
 
-## Why this shape
+## What's live
 
-- **Lowest-risk path to atproto.** The [social.colibri](https://lexicon.garden/browse/social.colibri) lexicon is the most fleshed-out Slack-shaped lexicon already on atproto. Reusing it lets us playtest Colibri as a Slack replacement without designing a new schema.
-- **Slack stays canonical.** One-way (Slack → atproto). No write-back.
-- **Public by default.** Records on atproto are public.
+Forward path (Slack → atproto), end-to-end:
 
-## Architecture
+- **Messages** — rich text → Colibri facets (bold, italic, strikethrough, code, link, channel); 2048-char cap; fallback to plain-text + URL regex for legacy non-`blocks` messages
+- **Mentions** — `@user` resolves to a `social.colibri.richtext.facet#mention` against the user's claimed DID via the [in-source map](https://github.com/tomlarkworthy/slack-sync/blob/main/packages/worker/src/slack-to-did.ts); plain-text fallback for unmapped users
+- **Threaded replies** — Slack `thread_ts` → Colibri `parent`
+- **Reactions** add + remove — deterministic per-emoji rkey on the target message
+- **Message edits** — re-derive in place, `putRecord` overwrites, `edited: true` set per the message lexicon
+- **Message deletes** — `deleteRecord` on the derived rkey
+- **File attachments** — fetch `url_private` with bot token → `com.atproto.repo.uploadBlob` → reference in `attachments[]`; 5 MB cap, oversize files get a `[file 'name' too large]` placeholder in the message text
+- **Lossless raw-event archive** — every `event_callback` envelope persisted to `com.feelingofcomputing.bridge.slackRaw` *before* derivation, keyed by `event_id` (idempotent on Slack redelivery)
+
+Not in scope:
 
-1. **Slack Events API** pushes message events to a Cloudflare Worker (the *producer*). It verifies Slack's HMAC signature, enqueues the event, and acks within Slack's 3-second budget.
-2. **Cloudflare Queue** holds events durably. A consumer Worker pulls batches and publishes to atproto. Queue retries cover PDS slowness and rate limits.
-3. **atproto is the source of truth.** The bot's bsky.social repo holds the published Colibri messages and our own sidecar records. The PDS is the dedupe authority — see Race resolution.
-4. **Cloudflare D1 (SQLite)** holds:
-   - A single `cache` table — dumb projection of atproto records from any repo, keyed by `(repo, collection, rkey)`, body stored opaquely as JSON. Wipe it and the firehose rebuilds it.
-   - Separate private tables (e.g. `oauth_tokens` in v2) for credentials that can't go on atproto.
+- Reverse path (Colibri → Slack)
+- Private channels, DMs
+- Per-Slack-user authorship — every bridged record is authored by the bot; the original speaker appears as `@user:` in the message text
+- `slackRaw` backfill — historical days (pre-2026-05-31) have derived `social.colibri.message` + `social.colibri.reaction` only; the raw archive only exists for traffic the live bridge has seen
 
-The consumer writes records in a fixed order per event: **first** `slackRaw` (lossless capture of the payload as received from Slack), **then** the derived `social.colibri.message`, **then** `slackOrigin` linking the two. If derivation crashes or is later improved, the raw record is still on atproto and the Colibri view can be regenerated without re-pulling Slack. See [slackRaw](#new-comfeelingofbridgeslackraw).
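For illustration, the text-body rules in the "What's live" list (the `@user:` attribution, the oversize-file placeholder, and the 2048-char cap) could be combined roughly as follows. This is a sketch, not the worker's code: `SlackFile`, `partitionFiles`, and `deriveMessageText` are hypothetical names, and reading "5 MB" as 5 * 1024 * 1024 bytes and "2048-char" as 2048 Unicode code points are assumptions.

```typescript
// Hypothetical shape: only the fields this sketch reads from a Slack file object.
interface SlackFile {
  name: string;
  size: number; // bytes, as reported by Slack
}

// Assumptions (not confirmed by the worker source): "5 MB" = 5 MiB,
// and the 2048 cap counts Unicode code points.
const MAX_ATTACHMENT_BYTES = 5 * 1024 * 1024;
const MAX_TEXT_CHARS = 2048;

// Split files into ones small enough to upload and placeholder lines for the rest.
function partitionFiles(files: SlackFile[]): { upload: SlackFile[]; placeholders: string[] } {
  const upload: SlackFile[] = [];
  const placeholders: string[] = [];
  for (const f of files) {
    if (f.size > MAX_ATTACHMENT_BYTES) {
      placeholders.push(`[file '${f.name}' too large]`);
    } else {
      upload.push(f);
    }
  }
  return { upload, placeholders };
}

// Build the bridged message body: attribution prefix, Slack text,
// oversize-file placeholders, then the length cap.
function deriveMessageText(speaker: string, slackText: string, files: SlackFile[]): string {
  const { placeholders } = partitionFiles(files);
  const body = [`@${speaker}: ${slackText}`, ...placeholders].join("\n");
  // Array.from splits by code point, so a surrogate pair is never cut in half.
  const codePoints = Array.from(body);
  return codePoints.length <= MAX_TEXT_CHARS ? body : codePoints.slice(0, MAX_TEXT_CHARS).join("");
}
```

Truncating last means a very long message can push the placeholders past the cap; putting them first, or reserving room for them before truncating, are alternatives.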
+## Code
 
-### Sequence diagram
+- **Repo**: [tomlarkworthy/slack-sync](https://github.com/tomlarkworthy/slack-sync) — Bun workspace monorepo
+- **Worker**: [`packages/worker/src/index.ts`](https://github.com/tomlarkworthy/slack-sync/blob/main/packages/worker/src/index.ts) — single Cloudflare Worker; producer (`fetch`) + consumer (`queue`) in one script
+- **Backfill**: [`packages/backfill/src/index.ts`](https://github.com/tomlarkworthy/slack-sync/blob/main/packages/backfill/src/index.ts) — Bun CLI for re-publishing day-files from [Mariano's archive](https://github.com/marianoguerra/Feeling-of-Computing)
+- **Slack app manifest**: [`manifest/slack-app.yaml`](https://github.com/tomlarkworthy/slack-sync/blob/main/manifest/slack-app.yaml)
+- **Deploy**: Cloudflare Workers Builds, push-to-main → auto-deploy
+
+## Architecture
 
 ```mermaid
 sequenceDiagram
     autonumber
     actor U as User in Slack
     participant S as Slack Events API
-    participant P as Producer Worker
+    participant P as Producer (fetch)
     participant Q as CF Queue
-    participant C as Consumer Worker
-    participant DB as D1 cache
+    participant C as Consumer (queue)
     participant PDS as bsky.social PDS<br/>(bot's repo)
-    participant App as Colibri app /<br/>firehose readers
+    participant App as Colibri / firehose readers
 
-    U->>S: send message / reply
+    U->>S: send / edit / delete / react
     S->>P: POST /slack/events
-    P->>P: verify X-Slack-Signature
-    P->>Q: enqueue { event }
+    P->>P: HMAC-verify X-Slack-Signature
+    P->>Q: enqueue { event_callback }
     P-->>S: 200 OK (within 3s)
 
     C->>Q: pull batch
-    C->>DB: SELECT cache<br/>WHERE repo=bot<br/>AND collection='…slackOrigin'<br/>AND rkey='channel-ts'
-    alt cache hit
-        Note over C: already bridged — skip
-    else cache miss
-        C->>PDS: createRecord social.colibri.message
-        PDS-->>C: { uri, cid }
-        C->>PDS: createRecord slackOrigin<br/>(deterministic rkey)
-        alt PDS 200
-            C->>DB: INSERT INTO cache
-        else PDS 409 (concurrent dup)
-            C->>PDS: getRecord slackOrigin
-            PDS-->>C: { record }
-            C->>DB: INSERT INTO cache
-        end
+    C->>PDS: putRecord slackRaw (rkey = event_id)
+    alt event.type = message
+        C->>PDS: putRecord / deleteRecord<br/>social.colibri.message
+    else event.type = reaction_added / removed
+        C->>PDS: putRecord / deleteRecord<br/>social.colibri.reaction
     end
+    C->>Q: ack (or retry → DLQ)
 
     App->>PDS: firehose / fetch
     App-->>U: public conversation visible
 ```
 
-ASCII fallback:
-
-```
-  Slack user
-      │
-      ▼
-  Slack Events API ──▶ Producer Worker (verify, ack <3s)
-                             │
-                             ▼
-                          CF Queue
-                             │
-                             ▼
-                       Consumer Worker
-                             │
-             SELECT cache (repo, collection, rkey)
-                             │
-                      hit ──┼── miss
-                       ▼         ▼
-                     skip    createRecord(message, slackOrigin)
-                                 │
-                    PDS 200      │      PDS 409 (race)
-                       ▼                  ▼
-                 INSERT cache    getRecord, INSERT cache
-                                 │
-                                 ▼
-                    firehose ──▶ Colibri app
-```
+Notes on the live shape vs. the proposal that preceded it:
 
-## Storage model
-
-atproto holds the public truth. D1 contains:
-
-- A single **`cache`** table — a polymorphic key-value mirror of atproto records from any repo, keyed by `(repo, collection, rkey)` with the record body stored as JSON. Dumb projection: no bridge bookkeeping, no fields that aren't already on atproto. Wipe it and the firehose rebuilds it.
-- Separate **private** tables for state that can't go on atproto (v2 OAuth tokens).
-
-Queries that need particular fields use SQLite's JSON1 functions:
-
-```sql
--- "what's the colibri channel for slack channel C123?"
-SELECT json_extract(record, '$.colibriChannelUri') FROM cache
-WHERE repo = 'did:plc:'
-  AND collection = 'com.feelingof.bridge.slackChannel'
-  AND rkey = 'C123';
-```
-
-### Schema sketch
-
-```sql
-CREATE TABLE cache (
-  repo TEXT NOT NULL, -- DID of the repo this record lives on
-  collection TEXT NOT NULL, -- e.g. 'com.feelingof.bridge.slackOrigin'
-  rkey TEXT NOT NULL,
-  record TEXT NOT NULL, -- JSON; lexicon-conformant record body
-  cached_at INTEGER NOT NULL,
-  PRIMARY KEY (repo, collection, rkey)
-);
-
--- v2; private; never on atproto
-CREATE TABLE oauth_tokens (
-  slack_user_id TEXT PRIMARY KEY,
-  did TEXT NOT NULL,
-  access_token_enc BLOB,
-  refresh_token_enc BLOB NOT NULL,
-  expires_at INTEGER NOT NULL,
-  scope TEXT,
-  updated_at INTEGER NOT NULL
-);
-```
-
-### Race resolution
-
-The PDS is the dedupe authority via the deterministic rkey on `com.feelingof.bridge.slackOrigin`. The consumer:
-
-1. Check `cache` for `(repo=bot-did, collection='…slackOrigin', rkey='channel-ts')`. Hit → skip.
-2. Miss → `createRecord` on PDS. 200 means we won the race. 409 means another consumer already published; `getRecord` fetches the canonical record.
-3. Either path → INSERT into `cache`.
-
-Cache staleness is the residual risk: an upstream edit on the PDS leaves our row out of date until it's evicted. Our sidecar records are mostly write-once so this is rare in practice; v0 ignores it.
+- **No D1 cache in v0.** The worker writes straight to the PDS and leans on deterministic rkeys + `putRecord` overwrite for idempotency. The D1 binding is reserved in `wrangler.toml` for v0.1+ if cache reads become useful.
+- **slackRaw is the de-facto idempotency boundary.** rkey = sanitised Slack `event_id`; Slack redeliveries hit the same rkey and overwrite identical content.
+- **Message rkey** = `tidFromSlackTs(message.ts)` — edits overwrite in place, deletes target the same rkey.
+- **Reaction rkey** = `tidFromSlackTs(target.ts, hash10('react:' + emoji))` — same emoji on the same message always lands on the same record, so `reaction_removed` can `deleteRecord` without lookup.
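The two rkey rules in the notes above can be sketched as follows, assuming `tidFromSlackTs` packs the Slack timestamp into a standard atproto TID (53 bits of microseconds since the Unix epoch followed by a 10-bit clock id, written as 13 base32-sortable characters) and that its optional second argument fills the clock-id bits. The slack-sync source is the authority on what those helpers really do; `hash10` here is a hypothetical FNV-1a hash truncated to 10 bits.

```typescript
// atproto's base32-sortable alphabet: ASCII order matches numeric order.
const S32 = "234567abcdefghijklmnopqrstuvwxyz";

// Convert a Slack ts such as "1717262480.123456" to integer microseconds.
function slackTsToMicros(ts: string): bigint {
  const [secs, frac = ""] = ts.split(".");
  // Slack ts fractions are six digits; pad/trim defensively.
  return BigInt(secs) * BigInt(1000000) + BigInt(frac.padEnd(6, "0").slice(0, 6));
}

// Hypothetical re-implementation of tidFromSlackTs under the TID layout
// described above: 1 zero bit, 53 bits of microseconds, 10 bits of clock id.
function tidFromSlackTs(ts: string, clockId = 0): string {
  if (!Number.isInteger(clockId) || clockId < 0 || clockId > 1023) {
    throw new RangeError("clockId must fit in 10 bits");
  }
  let v = (slackTsToMicros(ts) << BigInt(10)) | BigInt(clockId);
  let out = "";
  for (let i = 0; i < 13; i++) {
    out = S32[Number(v & BigInt(31))] + out; // lowest 5 bits → one character
    v >>= BigInt(5);
  }
  return out;
}

// Hypothetical 10-bit hash: FNV-1a over UTF-16 code units, truncated.
function hash10(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h & 0x3ff;
}

const messageRkey = (ts: string) => tidFromSlackTs(ts);
const reactionRkey = (targetTs: string, emoji: string) =>
  tidFromSlackTs(targetTs, hash10("react:" + emoji));
```

Because the rkey is a pure function of the Slack `ts`, the same message always maps to the same record and `putRecord` / `deleteRecord` need no lookup; a message and its reactions can share a timestamp prefix without clashing because they live in different collections. Under these assumptions the weak spot is the 10-bit clock field: with only 1,024 buckets, two different emoji on one message can hash to the same reaction rkey, so a production version needs either a collision check or a different key scheme.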
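The producer's `HMAC-verify X-Slack-Signature` step in the diagram follows Slack's documented request-signing scheme: HMAC-SHA256 over `v0:{X-Slack-Request-Timestamp}:{raw body}` keyed with the app's signing secret, hex-encoded and prefixed `v0=`, with a five-minute replay window. The sketch below uses Node's `node:crypto` for brevity (Cloudflare Workers expose it behind the `nodejs_compat` compatibility flag; a WebCrypto `crypto.subtle` version would be equivalent). `slackSignature` and `verifySlackRequest` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 5 * 60; // Slack's documented replay window

// Compute the expected X-Slack-Signature value for a request.
function slackSignature(signingSecret: string, timestamp: string, rawBody: string): string {
  const base = `v0:${timestamp}:${rawBody}`;
  return "v0=" + createHmac("sha256", signingSecret).update(base).digest("hex");
}

// Check freshness, then compare signatures in constant time.
function verifySlackRequest(
  signingSecret: string,
  timestampHeader: string | null,
  signatureHeader: string | null,
  rawBody: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  if (!timestampHeader || !signatureHeader) return false;
  const ts = Number(timestampHeader);
  if (!Number.isInteger(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;
  const expected = Buffer.from(slackSignature(signingSecret, timestampHeader, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature has to be computed over the raw request body before any JSON parsing, and the constant-time comparison keeps response timing from leaking how many leading characters matched.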
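The consumer branch of the diagram, with `slackRaw` written first and the derived write chosen by event type, can be sketched as a pure planning function. The event fields read here (`subtype`, `deleted_ts`, `message.ts`, `item.ts`, `reaction`) are the ones Slack's Events API uses for these events; `planWrites`, `PlannedWrite`, and the `source` strings (the Slack identifiers an rkey would be derived from, not the rkeys themselves) are hypothetical.

```typescript
// Trimmed, hypothetical view of the Slack event fields this sketch reads.
interface SlackEvent {
  type: string;              // "message", "reaction_added", "reaction_removed", ...
  subtype?: string;          // e.g. "message_changed", "message_deleted"
  ts?: string;               // new messages
  message?: { ts: string };  // message_changed: the edited message
  deleted_ts?: string;       // message_deleted: ts of the removed message
  item?: { ts: string };     // reactions: the target message
  reaction?: string;         // reactions: emoji name
}

// One planned PDS call; `source` is what the rkey would be derived from.
interface PlannedWrite {
  op: "put" | "delete";
  collection: string;
  source: string;
}

const RAW_COLLECTION = "com.feelingofcomputing.bridge.slackRaw";

function planWrites(eventId: string, ev: SlackEvent): PlannedWrite[] {
  // Archive first, unconditionally, so nothing is lost if derivation fails.
  const plan: PlannedWrite[] = [{ op: "put", collection: RAW_COLLECTION, source: eventId }];

  if (ev.type === "message") {
    if (ev.subtype === "message_deleted" && ev.deleted_ts) {
      plan.push({ op: "delete", collection: "social.colibri.message", source: ev.deleted_ts });
    } else if (ev.subtype === "message_changed" && ev.message) {
      // Edits re-derive and overwrite the record keyed by the original ts.
      plan.push({ op: "put", collection: "social.colibri.message", source: ev.message.ts });
    } else if (ev.ts) {
      plan.push({ op: "put", collection: "social.colibri.message", source: ev.ts });
    }
  } else if ((ev.type === "reaction_added" || ev.type === "reaction_removed") && ev.item && ev.reaction) {
    plan.push({
      op: ev.type === "reaction_added" ? "put" : "delete",
      collection: "social.colibri.reaction",
      source: `${ev.item.ts}:${ev.reaction}`,
    });
  }
  // Any other event type is archived but not derived.
  return plan;
}
```

Keeping the plan separate from the PDS calls makes the ordering testable without network access. This sketch treats every other message subtype as a new message; a real consumer would probably filter subtypes such as channel joins.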
 
 ## Lexicons
 
-### Reused: `social.colibri.message`
-
-- `text` ← Slack text (truncated to 2048 chars, prefixed `**@user:** ` for attribution — see Identity)
-- `channel` ← via the `com.feelingof.bridge.slackChannel` sidecar
-- `parent` ← parent's `slackOrigin.messageUri` → rkey
-- `createdAt` ← Slack `ts`
-- `facets` ← mentions, links (v0.1)
-- `attachments` ← deferred (Slack files need blob re-upload)
+### Reused from Colibri
 
-### New: `com.feelingof.bridge.slackRaw`
+- [`social.colibri.message`](https://lexicon.garden/browse/social.colibri.message) — text, facets, createdAt, channel, parent, attachments, edited
+- [`social.colibri.reaction`](https://lexicon.garden/browse/social.colibri.reaction) — emoji, targetMessage
+- [`social.colibri.richtext.facet`](https://lexicon.garden/browse/social.colibri.richtext.facet) — bold, italic, strikethrough, code, link, mention, channel
 
-Lossless archival of the raw Slack event payload. Written **before** any derivation, so the bridge never drops information it doesn't yet know how to render — reactions, edits, attachments, blocks, mrkdwn nuances — even if the v0 Colibri-message derivation ignores most of them. The bot's atproto repo becomes a public, replicable Slack archive that anyone can re-derive a Colibri view from.
+### Owned: `com.feelingofcomputing.bridge.*`
 
-`key: "any"` so the rkey matches the `slackOrigin` rkey for join-free lookup:
+FoC owns this namespace because it owns `feelingofcomputing.com/.org/.net`. The archive's lifetime, schema, and ownership belong to FoC, not Colibri. Making that boundary explicit in the lexicon namespace de-risks both Colibri lexicon churn (a substantial rework is in flight on Colibri's `feat/rework` branch) and Colibri disappearing entirely: if either happens, the raw archive on atproto is untouched and can be re-derived.
 
-```
-rkey = `${slackChannelId}-${slackTs.replace('.','-')}`
-```
+#### Live in v0: `com.feelingofcomputing.bridge.slackRaw`
 
-Edits and reactions update the same record (`putRecord`). v0.1 could split this into per-event-type records (`channelId-ts-eventType-seq`) if we need full event history rather than latest-known state.
+Lossless capture of the full `event_callback` envelope, written *before* any derivation.
 
 ```json
 {
   "lexicon": 1,
-  "id": "com.feelingof.bridge.slackRaw",
+  "id": "com.feelingofcomputing.bridge.slackRaw",
   "defs": {
     "main": {
       "type": "record",
@@ -181,8 +104,8 @@ Edits and reactions update the same record (`putRecord`). v0.1 could split this
         "properties": {
           "slackChannelId": { "type": "string" },
           "slackTs": { "type": "string" },
-          "eventType": { "type": "string", "description": "Slack event subtype: 'message', 'message_changed', 'message_deleted', 'reaction_added', etc." },
-          "payload": { "type": "unknown", "description": "Raw Slack message object as received (minus auth tokens)." },
+          "eventType": { "type": "string", "description": "Slack event type: message, message_changed, message_deleted, reaction_added, reaction_removed, etc." },
+          "payload": { "type": "unknown", "description": "Raw Slack event_callback envelope as received." },
           "capturedAt": { "type": "string", "format": "datetime" }
         }
       }
@@ -191,202 +114,74 @@ Edits and reactions update the same record (`putRecord`). v0.1 could split this
 }
 ```
 
-Slack file attachments referenced in `payload.files[]` are not blobbed in v0; their `url_private` is captured but the bytes stay on Slack. v0.1 fetches and re-uploads as atproto blobs, referencing them from the derived `social.colibri.message`.
-
-#### Why this lives under `com.feelingof.*`, not `social.colibri.*`
-
-`slackRaw` is the *FoC community's* archive of its own Slack history. Its lifetime, schema, and ownership belong to FoC — not to Colibri — and we want that boundary explicit in the lexicon namespace. Three practical consequences:
-
-- **De-risks Colibri lexicon churn.** Colibri is actively reworking its lexicons on the `feat/rework` branch (the v1 `src/utils/atproto/lexicons.ts` is deleted there; a new `apps/website/src/utils/atproto/lexicons.ts` is in flight, along with new appview spec, streaming event types, and a refactored Message component tree). If a future Colibri version renames `social.colibri.message`, splits the facet model, or changes the channel/community ownership semantics that drove the constraints in this proposal, the `slackRaw` archive is untouched. We re-run the derivation against the new lexicons and republish — no Slack re-pull, no loss.
-- **De-risks Colibri disappearing.** If Colibri is abandoned, the FoC archive is still complete, public, and addressable on atproto. A different reader (or a static-site generator off `foc-server`-style infrastructure) can render it.
-- **Avoids polluting Colibri's namespace.** Bridge-specific concepts (Slack `ts`, `slack_user_id`, `subtype`) have no business inside `social.colibri.*`. Other Slack-on-Colibri bridges (different communities, different workspaces) would invent their own `com..bridge.slackRaw` analogues; that's the right shape.
-
-### New: `com.feelingof.bridge.slackOrigin`
-
-Provenance + dedupe authority. One-to-one with a `social.colibri.message`. `key: "any"` so we control the rkey:
-
-```
-rkey = `${slackChannelId}-${slackTs.replace('.','-')}`
-```
-
-A redelivered Slack event hits a 409 at the PDS — that's what makes the bridge idempotent regardless of cache state. We can't put the deterministic rkey on the message itself: `social.colibri.message` uses `key: "tid"`, which expects monotonically-increasing TIDs per repo — backfilling old Slack history after live messages would be rejected. The sidecar sidesteps this; the message gets a fresh PDS-minted TID, the sidecar's `messageUri` carries the bridge.
+rkey = sanitised Slack `event_id`. Slack file attachments referenced in `payload.files[]` are captured by URL only; the bytes are re-uploaded as atproto blobs by the derived `social.colibri.message` path (see [Code → Worker](https://github.com/tomlarkworthy/slack-sync/blob/main/packages/worker/src/index.ts)).
 
-```json
-{
-  "lexicon": 1,
-  "id": "com.feelingof.bridge.slackOrigin",
-  "defs": {
-    "main": {
-      "type": "record",
-      "key": "any",
-      "record": {
-        "type": "object",
-        "required": ["messageUri", "slackChannelId", "slackTs", "createdAt"],
-        "properties": {
-          "messageUri": { "type": "string", "format": "at-uri" },
-          "slackChannelId": { "type": "string" },
-          "slackTs": { "type": "string" },
-          "slackUserId": { "type": "string" },
-          "createdAt": { "type": "string", "format": "datetime" }
-        }
-      }
-    }
-  }
-}
-```
+## Maintenance
 
-### New: `com.feelingof.bridge.slackChannel`
+The bridge's per-deployment configuration lives in source, not in lexicon records. Two maps:
 
-Slack → Colibri channel mapping. Rkey = Slack channel ID. Created lazily on first sighting of a new Slack channel, after auto-creating the corresponding `social.colibri.channel`.
+- **Slack channel → Colibri channel rkey** — `vendor/slack-sync/packages/worker/src/index.ts` (constant `CHANNEL_MAP`) for the live worker, and [`tools/slack-to-colibri-channel.json`](https://github.com/tomlarkworthy/slack-sync) for the backfill CLI. The two must agree.
+- **Slack user id → claimed atproto DID** — [`packages/worker/src/slack-to-did.ts`](https://github.com/tomlarkworthy/slack-sync/blob/main/packages/worker/src/slack-to-did.ts) for the live worker, `tools/slack-to-did.json` for the backfill. The two must agree.
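Building a `slackRaw` record from an `event_callback` envelope, per the lexicon above, might look like this. Assumptions: "sanitised" is taken to mean replacing anything outside the atproto record-key character set (`A-Z a-z 0-9 . - _ : ~`, at most 512 characters, never `.` or `..`) with `_`; reaction events take channel and ts from `event.item`; and message subtypes are recorded as the `eventType`. `toSlackRaw`, `sanitiseRkey`, and the trimmed types are hypothetical.

```typescript
// Hypothetical trimmed envelope: only the fields this sketch reads.
interface EventCallback {
  event_id: string;
  event: {
    type: string;
    subtype?: string;
    channel?: string;
    ts?: string;
    item?: { channel?: string; ts?: string };
  };
}

interface SlackRawRecord {
  $type: "com.feelingofcomputing.bridge.slackRaw";
  slackChannelId?: string;
  slackTs?: string;
  eventType: string;
  payload: unknown;
  capturedAt: string;
}

// Keep only characters allowed in atproto record keys; cap at 512 chars.
function sanitiseRkey(raw: string): string {
  const key = raw.replace(/[^A-Za-z0-9._:~-]/g, "_").slice(0, 512);
  if (key === "" || key === "." || key === "..") throw new Error(`unusable rkey: ${raw}`);
  return key;
}

function toSlackRaw(envelope: EventCallback, now: Date = new Date()): { rkey: string; record: SlackRawRecord } {
  const ev = envelope.event;
  return {
    rkey: sanitiseRkey(envelope.event_id),
    record: {
      $type: "com.feelingofcomputing.bridge.slackRaw",
      slackChannelId: ev.channel ?? ev.item?.channel,
      slackTs: ev.ts ?? ev.item?.ts,
      // Subtypes (message_changed, message_deleted) say more than "message".
      eventType: ev.type === "message" && ev.subtype ? ev.subtype : ev.type,
      payload: envelope, // the whole envelope, unmodified
      capturedAt: now.toISOString(),
    },
  };
}
```

Slack event ids (strings like `Ev08MFMKH6CX`) normally already fit the record-key syntax, so in practice the sanitiser is a guard against surprises rather than a transformation; because the rkey depends only on `event_id`, a redelivered event overwrites the same record.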
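Since the worker and backfill copies of each map must agree, a small check can make that mechanical. A sketch: `diffMaps` and `mapsAgree` are hypothetical helpers that report keys present on only one side and keys mapped to different values.

```typescript
type StringMap = Record<string, string>;

interface MapDiff {
  onlyInWorker: string[];
  onlyInBackfill: string[];
  conflicting: { key: string; worker: string; backfill: string }[];
}

// Own-property check, so inherited names like "toString" never count as keys.
const hasKey = (m: StringMap, k: string): boolean => Object.prototype.hasOwnProperty.call(m, k);

function diffMaps(worker: StringMap, backfill: StringMap): MapDiff {
  const diff: MapDiff = { onlyInWorker: [], onlyInBackfill: [], conflicting: [] };
  for (const key of Object.keys(worker)) {
    if (!hasKey(backfill, key)) {
      diff.onlyInWorker.push(key);
    } else if (worker[key] !== backfill[key]) {
      diff.conflicting.push({ key, worker: worker[key], backfill: backfill[key] });
    }
  }
  for (const key of Object.keys(backfill)) {
    if (!hasKey(worker, key)) diff.onlyInBackfill.push(key);
  }
  return diff;
}

function mapsAgree(diff: MapDiff): boolean {
  return diff.onlyInWorker.length === 0 && diff.onlyInBackfill.length === 0 && diff.conflicting.length === 0;
}
```

A CI job could parse the backfill's JSON file, import the worker's map, and fail the build unless `mapsAgree` returns true; the same check applies to the two Slack-user-to-DID maps.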
 
-```json
-{
-  "lexicon": 1,
-  "id": "com.feelingof.bridge.slackChannel",
-  "defs": {
-    "main": {
-      "type": "record",
-      "key": "any",
-      "record": {
-        "type": "object",
-        "required": ["slackChannelId", "colibriChannelUri", "createdAt"],
-        "properties": {
-          "slackChannelId": { "type": "string" },
-          "slackChannelName": { "type": "string" },
-          "colibriChannelUri": { "type": "string", "format": "at-uri" },
-          "createdAt": { "type": "string", "format": "datetime" }
-        }
-      }
-    }
-  }
-}
-```
+To add a new bridged channel: the community owner creates a `social.colibri.channel` record under their DID inside the existing community and category, then hands the rkey to the bridge maintainer, who edits both files, commits, and pushes; Cloudflare Workers Builds redeploys the worker automatically on push.
 
-### New: `com.feelingof.bridge.slackUser`
+To map a new user to their DID: get their bsky handle, resolve it to a DID with `com.atproto.identity.resolveHandle`, add the entry to both files, commit, and push.
 
-Identity record. Rkey = sanitised Slack user ID. `claimedDid` starts unset; populated via the claim flow. The OAuth half (v2) does not appear here — those credentials live in a separate D1 table, never on atproto.
+## Identity
 
-```json
-{
-  "lexicon": 1,
-  "id": "com.feelingof.bridge.slackUser",
-  "defs": {
-    "main": {
-      "type": "record",
-      "key": "any",
-      "record": {
-        "type": "object",
-        "required": ["slackUserId", "createdAt"],
-        "properties": {
-          "slackUserId": { "type": "string" },
-          "slackHandle": { "type": "string" },
-          "displayName": { "type": "string" },
-          "claimedDid": { "type": "string", "format": "did" },
-          "claimedAt": { "type": "string", "format": "datetime" },
-          "createdAt": { "type": "string", "format": "datetime" }
-        }
-      }
-    }
-  }
-}
-```
+Two DIDs are involved:
 
-## Identity
+- **Community owner** (`did:plc:j7nm3lrd5h7fm3sfhcv3lhfv`) owns the [Feeling of Computing community](https://colibri.social) and every category + channel under it.
+- **Bot** (`did:plc:4gcxakknd6hxtnhf33miwsob`, handle `feelingofcomputing.bsky.social`) is a member with a `social.colibri.membership` record; it authors every bridged message into channels owned by the community owner.
 
-One bot DID on `bsky.social` (e.g. `feelingof.bsky.social`). The bot owns the Colibri community, every category in it, every channel under those categories, and every bridged message. Attribution for the original Slack speaker lives in the message text body as `@user: ...` (rendered with a mention facet once the speaker has claimed a DID).
+Attribution for the original Slack speaker lives in the message text as `@user: ...` — rendered as a Colibri mention facet once the speaker has claimed a DID.
 
 ### Authorship is immutable
 
-The author of an atproto record is the DID of the repo it lives on. `putRecord` edits content; `deleteRecord` removes a record; nothing reassigns authorship. Consequences:
-
-- All bridged messages stay authored by the bot. Forever.
-- "Post from user's DID" (v2 below) only applies to *new* messages after claim. Hybrid history.
-- Retroactive delete + republish would change the at-uri, breaking links / threads / firehose state. Not recommended.
+The author of an atproto record is the DID of the repo it lives on. `putRecord` edits content; `deleteRecord` removes a record; nothing reassigns authorship. All bridged messages stay authored by the bot. Retroactive delete + republish would change the at-uri, breaking links, threads, and firehose state, so we don't.
 
-### Channels live in the bot's community
+### Channels are owned by the community owner
 
-The Colibri appview hard-couples channel ownership to community ownership by author DID. From `jetstream.rs` channel handler:
+The Colibri appview hard-couples channel ownership to community ownership by author DID. From `jetstream.rs`:
 
 ```rust
 let community_uri = format!("at://{}/social.colibri.community/{}", did, record.community);
 ```
 
-The community URI an indexed channel belongs to is constructed from the **channel record's author DID** plus the channel's `community` rkey field. A bot-authored channel record can only resolve to a community on the bot's own repo; the appview will not index a bot-authored channel into a community owned by someone else's DID.
-
-Consequences:
-
-- The bot bootstraps and owns its own `social.colibri.community` record. Bridged channels cannot be inserted into a pre-existing third-party community by the bot.
-- Discovery of the bot-owned space happens by linking to the bot's community URI, not by appearing inside an existing community's sidebar.
-- The bot's community + at least one category must exist before any channel lazy-creation. Categories' `channelOrder` is a read-modify-write append per new channel.
+The community URI for an indexed channel is constructed from the **channel record's author DID** plus the channel's `community` rkey field. A bot-authored channel record can only resolve to a community on the bot's own repo; the appview will not index a bot-authored channel into a community owned by a different DID.
 
-#### Inverse pattern: community owner pre-creates channels
-
-The constraint is on the channel record's author DID, not on who writes messages into the channel. So a cooperative community owner can pre-create the bridged channels on their own repo, under their own community + category, and the bot publishes messages referencing those channel rkeys. Validated against the FoC Colibri instance: six Slack channels (`present-company`, `share-your-work`, `thinking-together`, `of-ai`, `devlog-together`, `linking-together`) created by the community owner on `did:plc:j7nm3lrd5h7fm3sfhcv3lhfv` under the Feelingsof community; `trendingnotebooks.bsky.social` (with a `social.colibri.membership` for that community) published a 9-message backfill (1 top-level + 8 thread replies) into `#present-company`. Messages appeared with correct threading.
-
-In this mode the bridge only needs the slack-channel → colibri-channel-rkey mapping; it does no channel-creation, no `social.colibri.community` ownership, no `channelOrder` mutation. The trade-off is one-time manual setup by the community owner and an ongoing convention that the owner adds a Colibri channel whenever a new Slack channel should be bridged.
+v0 went with the **inverse pattern**: the community owner pre-creates the community + every bridged channel under their own DID; the bot only authors messages referencing the channel rkeys. The bot does *not* own the community, channels, or categories. Trade-off: one-time manual setup by the community owner and an ongoing convention that they add a Colibri channel whenever a new Slack channel should be bridged — but the bot's repo stays a pure archival identity, and ownership of the community is decoupled from the bridge's operational lifetime (we could replace the bot identity tomorrow without affecting the community).
 
 ### Per-message avatar / displayName
 
-Avatar and displayName come from the actor's profile record — one per DID. With one bot DID, every bridged message renders with the bot's avatar. The Colibri message lexicon has no override fields. Per-user ghost DIDs (Bridgy Fed's approach) would solve this but we reject them for v0/v1: bsky.social account-creation rate limits, and creating DIDs for users without consent.
-
-Upstream ask of Colibri: extend `social.colibri.message` with optional render-time author overrides.
- -```json -"displayAuthor": { - "type": "object", - "description": "Override author render for bridged messages.", - "properties": { - "name": { "type": "string", "maxLength": 64 }, - "avatar": { "type": "blob", "accept": ["image/jpeg", "image/png"] } - } -} -``` - -### Claim flow - -A user posts `I am did:plc:...` in any bridged Slack channel. The bridge extracts the DID and writes it to the `slackUser.claimedDid` atproto record (and the cache mirrors it). No cryptographic verification in v1 — posting it from their Slack account is the trust signal, and the claim is publicly visible for anyone to challenge. - -v2 requires a counter-claim record on the claimed DID's repo, verifiable from atproto alone. - -### Posting from the claimed DID (v2) - -Once `claimedDid` is set, future messages could be authored from the user's DID. Requires an OAuth credential delegated to the bridge, against the user's PDS — atproto OAuth specifically, not Bluesky app-passwords (which are a bsky.social UX, not portable across PDSes). Refresh tokens are medium-lived; the bridge re-prompts via Slack DM near expiry. Tokens live in D1's `oauth_tokens` table (separate from the atproto cache), encrypted at rest, keyed by Slack user ID. - -### Credential-free alternative: user-driven backfill - -A claimed user can republish their own messages onto their own repo at any time without granting the bridge anything. The bot's repo is a public archive — pull the `slackOrigin` records matching their `slackUserId`, republish the corresponding messages from their own DID. We ship a small CLI. No trust delegation, full data ownership. +Avatar and displayName come from the actor's profile record — one per DID. With one bot DID, every bridged message renders with the bot's avatar regardless of who sent it on Slack. The Colibri message lexicon has no override fields, so attribution lives in the message text body. 
This is the biggest visual gap and it isn't fixable without either upstream changes to the Colibri lexicon or per-user atproto identities (rejected: bsky.social account-creation rate limits and creating DIDs for users without their consent). -## Asks of Colibri +## Operational notes -These are upstream changes the bridge benefits from but does not block on. Until they land, `slackRaw` preserves enough state to re-derive when they do. +- **Backfill is in progress.** Sourced from Mariano's weekly archive dumps (`history/YYYY/MM/DD.json`). 4 weeks (2026-05-05 → 2026-06-01) bridged as of 2026-06-01. Rkeys are deterministic, so re-running a day overwrites in place; map updates get picked up on the next pass. +- **6 Slack channels are not bridged** (`#of-end-user-programming`, `#of-graphics`, `#of-music`, `#of-logic-programming`, `#reading-together`, `#of-functional-programming`; ~4600 messages of historical traffic combined). The backfill hard-fails on any day that references them. To bridge them, the community owner creates the corresponding Colibri channels and the maps in *Maintenance* get updated. +- **Lexicon records show "not validated"** in atproto-browser. `com.feelingofcomputing.bridge.slackRaw` isn't published anywhere yet; doing so would require either a `_lexicon` TXT record on `feelingofcomputing.com` or a `com.atproto.lexicon.schema` record. The bridge functions either way — validation is purely a tooling/discoverability signal. -- **Per-record author override** — optional `displayAuthor: { name, avatar? }` on `social.colibri.message` and `social.colibri.reaction`. Without it every bridged message and every aggregated reaction renders as the bot, with attribution hacked into the message text body as `@user: ` and reactions collapsed to a single "@bot reacted" entry per emoji. Single biggest UX win; unblocks proper reaction multi-reactor counts too. 
-- **Collapsed / nested thread rendering** — Colibri's current UI is Discourse-flat (every reply is a top-level row referencing a `parent` rkey). Slack's threaded conversations don't survive the trip: a 30-reply thread on one Slack message becomes 30 sibling rows in the channel scroll. Inspected `feat/rework` (substantial monorepo + lexicon rewrite in flight) and the new Message component still renders flat with `parent_message` as a jump-link, not as a collapsed sub-thread. Worth raising as a v2 UX direction. -- **Cross-repo channel ownership** — the appview hard-codes `community_uri = at://{channel_author}/social.colibri.community/{rkey}`. The bot cannot create channels in a community it does not own. Today's workaround is "community owner pre-creates channels"; cleaner is either a `communityRepo` field on `social.colibri.channel`, or a `social.colibri.delegation` record granting channel-creation to a specific DID. -- **Quote facet feature** — Slack's `rich_text_quote` blocks render as `> `-prefixed plain text today because Colibri's facet feature set covers bold/italic/strikethrough/code/mention/link/channel but not quote. -- **Confirm TID-on-rkey monotonicity expectations** — `social.colibri.message` uses `key: "tid"`. `bsky.social` tolerates non-monotonic TIDs on rkeys (otherwise our backfill would 409 against live messages). PDSes that *do* enforce monotonicity would break the bridge. Worth a one-line "we don't require monotonic rkeys" assurance in the lexicon docs, or a switch to `key: "any"`. -- **Attachment shape clarity** — examples / docs for `social.colibri.message.attachments[]` would unblock our v0.1 file-attachment work. +## Known gaps -## Open questions +Things observably missing from v0. Listed as facts, not commitments. -- Backfill from `dump-history.js` snapshot, or forward-only? All-channels backfill is significant volume. -- Channel / category layout: single community with flat siblings, or map Slack groupings to Colibri categories? 
Sidecar is agnostic. -- Private channels and DMs — out of scope. Bot joins public channels only. -- Reactions, edits, deletes — v0 publishes one `social.colibri.reaction` per (target_message, emoji) on the bot's repo; multi-reactor counts are preserved losslessly in `slackRaw` and become recoverable once per-record author override lands. Edits: `putRecord` on both `slackRaw` and the message. Deletes: tombstone the message, retain the `slackRaw` for audit. -- Slack file attachments — `payload.files[]` is preserved in `slackRaw` (including `url_private`) from v0; v0.1 fetches and re-uploads as atproto blobs. bsky.social blob size limits (~1 MB images, ~50 MB video) will force large attachments to external hosting or a more permissive PDS. -- False DID claims. v1 unverified; v2 requires two-sided counter-claim. -- OAuth re-auth UX (v2): frequency cap, fallback when user ignores the prompt. +- **Per-record author override on Colibri's lexicon.** Every bridged message and reaction renders as the bot. Optional `displayAuthor: { name, avatar? }` on `social.colibri.message` and `social.colibri.reaction` would be the cleanest fix and would also unblock proper multi-reactor reaction counts. +- **Thread rendering.** Slack's threads flatten to sibling rows in Colibri because Colibri's UI is Discourse-flat — a 30-reply Slack thread becomes 30 top-level rows referencing the same `parent_message`. Colibri's `feat/rework` branch still renders flat. +- **Cross-repo channel ownership in Colibri's appview.** The appview hard-codes `community_uri = at://{channel_author}/social.colibri.community/{rkey}`, so a single DID has to own both the community and every channel under it. The bridge's workaround is "community owner pre-creates channels under their own DID, bot authors messages referencing those channel rkeys"; this requires manual coordination for every newly bridged channel.
+- **Quote facet.** Slack `rich_text_quote` blocks render as `> `-prefixed plain text because Colibri's facet feature set has no quote feature. +- **TID-on-rkey monotonicity assurance from Colibri.** `social.colibri.message` uses `key: "tid"`. bsky.social tolerates non-monotonic TID rkeys; without that tolerance, backfilling old Slack history would fail with HTTP 409 conflicts against live messages. A PDS that strictly enforced monotonicity would break the bridge; a one-line assurance in the lexicon docs or a switch to `key: "any"` would lock this in. ## Prior art -- **[Bridgy Fed](https://fed.brid.gy)** — ActivityPub ↔ atproto. Not applicable directly (Slack isn't ActivityPub) but informs the rejected per-user ghost-DID approach and our `slackOrigin` provenance pattern. +- **[Bridgy Fed](https://fed.brid.gy)** — ActivityPub ↔ atproto. Not directly applicable (Slack isn't ActivityPub) but informs the rejected per-user ghost-DID approach. - **[matrix-appservice-slack](https://github.com/matrix-org/matrix-appservice-slack)** — closest sibling. Same Slack-webhook → ghost-users → federated-protocol shape, targeting Matrix. -- **Mariano's `scripts/dump-history.js` + `foc-server`** ([repo](https://github.com/marianoguerra/Feeling-of-Computing)) — the existing FoC Slack pipeline runs in a different shape: a Node CLI that pulls `conversations.history` + `conversations.replies` via Slack's REST API on a manual / weekly cadence, writes JSON to `history/YYYY/MM/DD{,.replies}.json`, indexes it into LanceDB with sentence-transformer embeddings, and serves search via a Rust `axum` binary deployed on Ubuntu under systemd behind nginx (see `foc-server/docs/systemd.md`). Pull, not push; ingest-and-reindex, not bridge. Our `slackRaw` lexicon is the atproto-native analog of those committed `history/*.json` dumps — same archival role, public over the firehose instead of `git push`.
+- **Mariano's `scripts/dump-history.js` + `foc-server`** ([repo](https://github.com/marianoguerra/Feeling-of-Computing)) — the existing FoC pipeline pulls `conversations.history` + `conversations.replies` via Slack REST on a weekly cadence, writes JSON to `history/YYYY/MM/DD{,.replies}.json`, indexes it into LanceDB with sentence-transformer embeddings, and serves search via a Rust `axum` binary on Ubuntu under systemd behind nginx. Pull, not push; ingest-and-reindex, not bridge. The bridge's `slackRaw` lexicon is the atproto-native analog of those `history/*.json` dumps — same archival role, public over the firehose instead of `git push`. The backfill CLI consumes these JSON dumps directly. ## Related - [[Projects]] — Colibri and FoC are both listed. - [Colibri lexicons](https://lexicon.garden/browse/social.colibri) - [Colibri source](https://github.com/colibri-social/colibri.social) -- [FoC repo](https://github.com/marianoguerra/Feeling-of-Computing) — see `scripts/dump-history.js` for the current Slack puller. +- [FoC repo](https://github.com/marianoguerra/Feeling-of-Computing)
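The *Operational notes* say rkeys are deterministic, so re-running a backfill day overwrites in place. The TID item under *Known gaps* says backfilled history lands on rkeys that are out of order relative to live messages. Both properties follow if each rkey is computed from the Slack message's `ts`. The sketch below assumes exactly that, and it is hypothetical, not the slack-sync worker's code. `rkey_for_slack_message`, `clock_id_for`, and the choice to hash the Slack channel ID into the 10-bit clock id are illustrative assumptions. The bit layout follows the atproto TID format: top bit 0, 53 bits of microseconds since the Unix epoch, and a 10-bit clock id, encoded as 13 base32-sortable characters.

```python
import hashlib

# atproto "base32-sortable" alphabet; characters are in ascending ASCII order.
S32_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"


def slack_ts_to_micros(ts: str) -> int:
    """Convert a Slack `ts` like "1717262480.123456" to integer microseconds."""
    seconds, _, fraction = ts.partition(".")
    # Pad or truncate the fractional part to exactly 6 digits (microseconds).
    return int(seconds) * 1_000_000 + int(fraction.ljust(6, "0")[:6])


def clock_id_for(channel_id: str) -> int:
    """Hypothetical: a stable 10-bit clock id derived from the Slack channel ID."""
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") & 0x3FF


def encode_tid(micros: int, clock_id: int) -> str:
    """Pack 53-bit microseconds and a 10-bit clock id into a 13-char TID."""
    if not 0 <= micros < 2**53:
        raise ValueError("timestamp out of TID range")
    if not 0 <= clock_id < 2**10:
        raise ValueError("clock id must fit in 10 bits")
    value = (micros << 10) | clock_id  # top bit of the 64-bit value stays 0
    chars = []
    for _ in range(13):  # 13 x 5 bits, least significant chunk first
        chars.append(S32_ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def rkey_for_slack_message(channel_id: str, ts: str) -> str:
    """Hypothetical scheme: same (channel, ts) always yields the same rkey."""
    return encode_tid(slack_ts_to_micros(ts), clock_id_for(channel_id))


if __name__ == "__main__":
    # "C0123456789" is a made-up Slack channel ID.
    print(rkey_for_slack_message("C0123456789", "1717262480.123456"))
```

Because the alphabet is in ASCII order, equal-length TIDs compare lexically in time order. A message backfilled from an older Slack `ts` therefore gets an rkey that sorts before rkeys the live path has already written, which is the kind of out-of-order write a strictly monotonic PDS would refuse. Slack `ts` values are unique only within a channel, so two channels posting in the same microsecond would collide under this scheme only if their hashed clock ids also matched.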
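On publishing `com.feelingofcomputing.bridge.slackRaw` so tools stop reporting "not validated": as we understand the atproto Lexicon resolution rules, the schema lives as a `com.atproto.lexicon.schema` record (keyed by the NSID) in some repo. A DNS TXT record with value `did=<DID>` links the NSID's authority to that repo. The TXT name is the NSID minus its final segment, reversed, prefixed with `_lexicon`, and each authority needs its own record. A minimal sketch of deriving the name and value follows. The helper names are hypothetical, and publishing from the bot's repo in the usage lines is an assumption, not a decision recorded here.

```python
def lexicon_dns_name(nsid: str) -> str:
    """Hypothetical helper: DNS TXT name consulted to resolve an NSID's authority."""
    segments = nsid.split(".")
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a valid NSID: {nsid!r}")
    # Drop the final "name" segment; the remaining segments are the authority, reversed.
    authority = segments[:-1]
    return "_lexicon." + ".".join(reversed(authority))


def lexicon_txt_value(did: str) -> str:
    """Hypothetical helper: TXT value pointing the authority at the publishing DID."""
    if not did.startswith("did:"):
        raise ValueError(f"not a DID: {did!r}")
    return f"did={did}"


if __name__ == "__main__":
    nsid = "com.feelingofcomputing.bridge.slackRaw"
    print(lexicon_dns_name(nsid))  # _lexicon.bridge.feelingofcomputing.com
    # Assumption: the bot's repo is the one that would publish the schema record.
    print(lexicon_txt_value("did:plc:4gcxakknd6hxtnhf33miwsob"))
```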