From b6a46df4055c47b6616fe76ea1779aa6c5aa10f2 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Wed, 4 Mar 2026 13:44:54 -0300 Subject: [PATCH 01/10] rfc: mutability & encryption for forge --- rfc/forge-mutability-encryption.md | 424 +++++++++++++++++++++++++++++ 1 file changed, 424 insertions(+) create mode 100644 rfc/forge-mutability-encryption.md diff --git a/rfc/forge-mutability-encryption.md b/rfc/forge-mutability-encryption.md new file mode 100644 index 0000000..1945d3a --- /dev/null +++ b/rfc/forge-mutability-encryption.md @@ -0,0 +1,424 @@ +# RFC: Mutability and Encryption for Forge (Guppy/Piri) + +**Status: Proposed Standard** + +## Authors + +- [Felipe Forbeck](https://github.com/fforbeck), [Storacha Network](https://storacha.network/) + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Introduction + +This RFC proposes the implementation of mutability and content encryption features for Forge enterprise customers using the Guppy client and Piri storage nodes. These features enable: + +1. **Mutability**: Stable, human-readable names that point to the latest version of uploaded content +2. **Content Encryption**: Client-side encryption of file content before upload to the Storacha network +3. **On-Network State Index**: Pail-based key-value store for tracking uploaded content across clients + +Both features are already implemented in the TypeScript ecosystem (`@storacha/ucn`, `@storacha/encrypt-upload-client`). This RFC proposes porting these capabilities to Go for use in Guppy and Piri. 
+ +## Motivation + +Enterprise Forge customers need: + +- **Encryption at rest**: Data protection for sensitive content stored on the network +- **Mutable references**: Backup workflows need stable names that update to point to the latest backup version +- **Cross-client sync**: Multiple Guppy instances should be able to resolve the same named reference +- **On-network state**: Track what has been uploaded without relying on local-only databases + +## Scope + +### In Scope + +- Content encryption (file bytes are encrypted) +- Infra for folder-level access control via `nb.prefix` on decrypt delegations +- Mutable naming via UCN (User Controlled Names) +- On-network state index via Pail (key-value store) +- Go implementation for Guppy client +- Compatibility with existing TypeScript implementations + +### Out of Scope + +- Metadata/path encryption (filenames and directory structure remain visible) +- Lit Protocol integration (KMS-only encryption) +- Cryptographic folder isolation (different encryption keys per folder) + +## Industry Context: Content-Only Encryption + +The proposed encryption model encrypts **file content only**, leaving metadata (paths, filenames, sizes) visible. This is the standard approach used by major cloud storage providers: + +| Provider | Content Encrypted | Paths/Names Encrypted | +|----------|-------------------|----------------------| +| AWS S3 (SSE-S3, SSE-KMS) | ✅ | ❌ | +| Azure Blob Storage | ✅ | ❌ | +| Google Cloud Storage | ✅ | ❌ | +| Dropbox | ✅ | ❌ | + +This model satisfies most enterprise "encryption at rest" requirements while maintaining the ability to list and traverse files without decryption. Full metadata encryption (as seen in zero-knowledge services like Proton Drive) is significantly more complex and breaks standard tooling compatibility. + +## Proposal + +### 1. Content Encryption + +#### 1.1 Encryption Model + +Guppy MUST implement envelope encryption: + +1. 
**Data Encryption Key (DEK) + IV**: A random 256-bit AES key and 128-bit IV generated per file +2. **Key Wrapping**: The combined DEK+IV is wrapped using RSA-OAEP with a space-specific public key from the KMS +3. **Metadata Block**: Wrapped key + KMS info stored as CBOR block alongside encrypted content + +``` +┌─────────────────────────────────────────────────────────────┐ +│ ENCRYPTION FLOW │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ 1. Source File + optional file metadata │ +│ │ │ +│ ▼ │ +│ 2. Generate DEK + IV │ +│ │ │ +│ ▼ │ +│ 3. Encrypt file content (AES-256-CTR) → encrypted stream │ +│ │ │ +│ ▼ │ +│ 4. KMS: Wrap DEK+IV with RSA-OAEP (KMS) → encryptedSymmetricKey │ +│ │ │ +│ ▼ │ +│ 5. Encode encrypted stream as UnixFS DAG → encryptedDataCID │ +│ │ │ +│ ▼ │ +│ 6. Create metadata block (CBOR): │ +│ - encryptedDataCID (from step 5) │ +│ - encryptedSymmetricKey (from step 4) │ +│ - space DID │ +│ - KMS provider info │ +│ │ │ +│ ▼ │ +│ 7. Upload CAR with encrypted content + metadata block │ +│ → returns root CID (metadata block CID) │ +│ │ +│ ─────────────── POST-UPLOAD (UCN + Pail) ─────────────── │ +│ │ +│ 8. Record in Pail: put(filePath, rootCID) │ +│ → returns new Pail root CID │ +│ │ │ +│ ▼ │ +│ 9. Publish to UCN: Name.publish(pailRootCID) │ +│ → mutable name now points to updated index │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +**Steps 1-7**: Encryption and upload (per file) +**Steps 8-9**: Index update and publish (after upload completes) + +#### 1.2 KMS Integration + +Guppy interacts with the existing `ucan-kms` service via UCAN invocations: + +**Encryption (key wrapping):** +1. Invoke `space/encryption/setup` to get the space's RSA public key +2. Wrap DEK+IV locally using RSA-OAEP with the public key +3. Store wrapped key in metadata block + +**Decryption (key unwrapping):** +1. Create `space/content/decrypt` delegation with appropriate proofs +2. 
Invoke `space/encryption/key/decrypt` on ucan-kms service +3. Service validates delegation, unwraps DEK+IV, returns plaintext key +4. Decrypt content locally with unwrapped key + +| UCAN Capability | Purpose | +|-----------------|---------| +| `space/encryption/setup` | Get space's RSA public key for wrapping | +| `space/encryption/key/decrypt` | Unwrap DEK using space's private key | +| `space/content/decrypt` | Authorization proof for decryption | + +Guppy MUST be configured with: +- `UCAN_KMS_URL` — ucan-kms service endpoint +- `UCAN_KMS_DID` — ucan-kms service DID for UCAN audience + +#### 1.3 Streaming Support + +Encryption MUST support streaming for large files (1TB+) with O(1) memory usage. + +Go's standard library provides native support for this via: +- `crypto/aes` — AES block cipher +- `crypto/cipher` — CTR mode stream cipher +- `io.Reader` wrapper pattern — encrypt/decrypt chunks on-the-fly without buffering entire file + +#### 1.4 Metadata Format + +The metadata block MUST be compatible with `@storacha/encrypt-upload-client`: + +```typescript +interface KMSMetadata { + encryptedDataCID: CID // CID of encrypted content + encryptedSymmetricKey: string // Base64-encoded wrapped DEK+IV + space: SpaceDID // Space the content belongs to + path?: string // File path (e.g., "/backups/server1/backup.tar") + kms: { + provider: string // e.g., "storacha" + keyId: string // KMS key identifier + algorithm: string // e.g., "RSA-OAEP-256" + } +} +``` + +The `path` field is RECOMMENDED for all new uploads. It enables folder-level access control via `nb.prefix` delegations (see §1.6). Files without `path` are treated as root (`/`) for access control purposes. + +#### 1.5 Decryption Flow + +``` +┌─────────────────────────────────────────────────────────────┐ +│ DECRYPTION FLOW │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ ─────────────── RESOLVE (UCN + Pail) ─────────────────── │ +│ │ +│ 1. 
Resolve UCN Name → get current Pail root CID │ +│ │ │ +│ ▼ │ +│ 2. Query Pail for filePath → get content root CID │ +│ │ +│ ─────────────── DECRYPT ──────────────────────────────── │ +│ │ +│ │ │ +│ ▼ │ +│ 3. Fetch CAR from gateway using content CID │ +│ │ │ +│ ▼ │ +│ 4. Extract metadata block from CAR (CBOR) │ +│ - encryptedDataCID │ +│ - encryptedSymmetricKey │ +│ - KMS info │ +│ │ │ +│ ▼ │ +│ 5. Extract encrypted content from CAR using encryptedDataCID│ +│ │ │ +│ ▼ │ +│ 6. KMS: Unwrap DEK+IV via UCAN proof (space/content/decrypt)│ +│ │ │ +│ ▼ │ +│ 7. Split combined key → DEK (256-bit) + IV (128-bit) │ +│ │ │ +│ ▼ │ +│ 8. Decrypt content stream (AES-256-CTR) │ +│ │ │ +│ ▼ │ +│ 9. Extract embedded file metadata (optional) │ +│ → returns decrypted stream + file metadata │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +User provides **UCN name + file path** to retrieve content: +1. **Resolve** — UCN name → Pail root → file path → content CID +2. **Decrypt** — fetch, unwrap key, decrypt stream + +Decryption requires a `space/content/decrypt` delegation proof. + +#### 1.6 Folder-Level Access Control + +Access control is enforced via the `nb.prefix` caveat on `space/content/decrypt` delegations: + +```typescript +// Grant access to all files under /backups/server1/ +space/content/decrypt + with: did:key:zSpace + nb: { prefix: "/backups/server1/" } + audience: did:key:zRecipient +``` + +**Validation flow:** + +1. User requests decryption with delegation containing `nb.prefix` +2. KMS extracts `path` from encrypted metadata block +3. KMS validates: `path.startsWith(delegation.nb.prefix)` +4. 
If valid → unwrap DEK; if invalid → reject + +**Access control examples:** + +| Delegation `nb.prefix` | File `path` | Access | +|------------------------|-------------|--------| +| `/backups/` | `/backups/server1/backup.tar` | ✅ Allowed | +| `/backups/server1/` | `/backups/server1/backup.tar` | ✅ Allowed | +| `/backups/server2/` | `/backups/server1/backup.tar` | ❌ Denied | +| (none) | `/backups/server1/backup.tar` | ✅ Space-level access | + +**Backward compatibility:** + +- Delegations without `nb.prefix` grant space-level access (all files) +- Files uploaded without `path` field are treated as root (`/`) and accessible with any space-level delegation + +**Implementation requirements:** + +| Component | Change | +|-----------|--------| +| `@storacha/capabilities` | Add `prefix` field to `space/content/decrypt` schema | +| `ucan-kms` | Validate prefix in decrypt handler | +| `encrypt-upload-client` | Store `path` in encrypted metadata (already done) | +| `guppy` | CLI command to mint scoped delegations | + +### 2. 
Mutability (UCN) + +#### 2.1 Overview + +UCN (User Controlled Names) provides mutable references using: + +- **Merkle Clock**: CRDT for conflict-free multi-writer updates +- **UCAN Delegation**: Access control for read/write permissions +- **Network Sync**: Publish/resolve via clock service + +#### 2.2 Service Layer Approach + +Rather than porting the full UCN implementation to Go, Guppy SHOULD use a service layer: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ UCN SERVICE LAYER │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ Guppy (Go) ──▶ UCN Service (TS) ──▶ Clock Service │ +│ │ │ │ +│ │ ▼ │ +│ │ Name.create() │ +│ │ Name.publish(cid) │ +│ │ Name.resolve() → cid │ +│ │ Name.grant(recipient) │ +│ │ │ +└─────────────────────────────────────────────────────────────┘ +``` + +#### 2.3 UCN Capabilities + +The following UCAN capabilities are used: + +| Capability | Description | +|------------|-------------| +| `clock/head` | Read current value of a Name | +| `clock/advance` | Publish new value to a Name | + +#### 2.4 Guppy Integration + +After upload completion, Guppy MUST: + +1. Obtain the root CID of the uploaded content +2. Invoke the UCN service to publish the new CID to the configured Name +3. Provide UCAN proofs authorizing the `clock/advance` capability + +This updates the mutable name to point to the latest upload, enabling clients to discover the current version without out-of-band CID coordination. + +#### 2.5 Name Resolution + +To retrieve the latest version, Guppy MUST: + +1. Invoke the UCN service to resolve the Name to its current CID +2. Provide UCAN proofs authorizing the `clock/head` capability +3. Fetch the content from the gateway using the resolved CID +4. Optionally decrypt if the content is encrypted + +### 3. On-Network State Index (Pail) + +#### 3.1 Overview + +Pail is a content-addressed key-value store implemented as a sharded prefix trie (Merkle DAG). 
It enables: + +- **Path → CID mapping**: Track which files have been uploaded and their CIDs +- **Cross-client sync**: Multiple Guppy instances can share the same index +- **Version history**: Every mutation produces a new root CID, preserving history + +#### 3.2 How Pail Works + +``` +┌─────────────────────────────────────────────────────────────┐ +│ PAIL STRUCTURE │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ Root CID (v3) │ +│ │ │ +│ ├── /backups/server1/2025-03-01.tar.gz → bafy...abc │ +│ ├── /backups/server1/2025-03-02.tar.gz → bafy...def │ +│ ├── /backups/server2/2025-03-01.tar.gz → bafy...ghi │ +│ └── ... │ +│ │ +│ Operations: put(key, cid), get(key), del(key), entries() │ +│ │ +│ Each mutation → new root CID (immutable history) │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +#### 3.3 Pail + UCN Integration + +Pail and UCN work together: + +1. **Pail** stores the key-value index (path → CID mappings) +2. **UCN** provides a mutable name pointing to the current Pail root + +``` +┌─────────────────────────────────────────────────────────────┐ +│ PAIL + UCN │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ UCN Name: "my-backup-index" │ +│ │ │ +│ ▼ │ +│ Current Pail Root: bafy...xyz │ +│ │ │ +│ ├── /server1/backup.tar → bafy...encrypted1 │ +│ ├── /server2/backup.tar → bafy...encrypted2 │ +│ └── ... │ +│ │ +│ Workflow: │ +│ 1. Resolve UCN → get current Pail root │ +│ 2. Query Pail for path → get content CID │ +│ 3. Fetch and decrypt content │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +#### 3.4 Service Layer Approach + +Like UCN, Pail SHOULD be accessed via a service layer rather than a full Go port. + +After each upload, Guppy MUST: + +1. Record the file path and content CID in the Pail index via the service API +2. Provide UCAN proofs authorizing the Pail mutation +3. Obtain the new Pail root CID from the service +4. 
Publish the new Pail root to the UCN Name (as described in §2.4) + +This ensures the on-network index stays synchronized with local upload state. + +## Security Considerations + +1. **Client-side encryption**: Plaintext MUST never leave the Guppy client +2. **Key management**: DEKs are wrapped with space-specific KEKs managed by KMS +3. **UCAN-gated decryption**: `space/content/decrypt` delegation required to unwrap keys +4. **No key reuse**: Each file MUST use a unique DEK +5. **Secure random**: Keys and IVs MUST be generated using cryptographically secure random + +## Compatibility + +- **Metadata format**: MUST be compatible with `@storacha/encrypt-upload-client` +- **UCN protocol**: MUST be compatible with `@storacha/ucn` +- **Pail format**: MUST be compatible with `@web3-storage/pail` +- **Existing uploads**: Unencrypted uploads continue to work unchanged + +## Future Work + +- **Metadata encryption**: Optional path/filename encryption for higher privacy requirements +- **Offline-first Pail**: Local Pail operations with sync when online + +## References + +- [Mutability & Privacy in Storacha — Strategy Document](https://www.notion.so/storacha/Mutability-Privacy-in-Storacha-Strategy-Document-3125305b5524807fb4a1ce6a3c9201e8) (internal) +- [Storacha UCN Package](https://github.com/storacha/upload-service/tree/main/packages/ucn) +- [Storacha Encrypt Upload Client](https://github.com/storacha/upload-service/tree/main/packages/encrypt-upload-client) +- [Storacha Pail Package](https://github.com/storacha/pail) +- [AWS S3 Server-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html) +- [UCAN Specification](https://github.com/ucan-wg/spec) From a3a48cdc35fec004880c306fffaf7064b6a5eeae Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Wed, 11 Mar 2026 14:16:25 -0300 Subject: [PATCH 02/10] rfc: forge encryption --- rfc/forge-encryption.md | 417 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 417 insertions(+) create 
mode 100644 rfc/forge-encryption.md diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md new file mode 100644 index 0000000..cf0985e --- /dev/null +++ b/rfc/forge-encryption.md @@ -0,0 +1,417 @@ +# RFC: Encryption for Forge (Guppy/Piri) + +**Status: Draft** + +## Authors + +- [Felipe Forbeck](https://github.com/fforbeck), [Storacha Network](https://storacha.network/) + +## Editors +- [Alan Shaw](https://github.com/alanshaw), [Storacha Network](https://storacha.network/) +- [Hannah Howard](https://github.com/hannahhoward), [Storacha Network](https://storacha.network/) +- [Alex Kinstler](https://github.com/prodalex), [Storacha Network](https://storacha.network/) + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Introduction + +This RFC proposes the implementation of content encryption for Forge enterprise customers using the Guppy client. This enables: + +1. **Encryption at Rest**: Client-side encryption of file content before upload +2. **UCAN-Gated Decryption**: Access control via delegations + +## Motivation + +Enterprise Forge customers need: + +- **Data protection**: Sensitive content must be encrypted before leaving the client +- **Key management**: Integration with Storacha KMS or customer-managed KMS +- **Access control**: Fine-grained control over who can decrypt content + +--- + +## Industry Context + +The proposed encryption model encrypts **file content only**, leaving metadata (paths, filenames, sizes) visible. 
This is the standard approach used by major cloud storage providers: + +| Provider | Content Encrypted | Paths/Names Encrypted | +|----------|-------------------|----------------------| +| AWS S3 (SSE-S3, SSE-KMS) | ✅ | ❌ | +| Azure Blob Storage | ✅ | ❌ | +| Google Cloud Storage | ✅ | ❌ | + +## Implementation Status + +Based on Alan's POC ([storacha/guppy#376](https://github.com/storacha/guppy/pull/376)). + +| Component | Status | Notes | +|-----------|--------|-------| +| AES-256-CTR encryption | ✅ Done | `pkg/encryption/aes_256_ctr.go` | +| Random IV per block | ✅ Done | Generated in `EncryptAES256CTR()` | +| IV stored in metadata | ✅ Done | `pkg/preparation/dags/nodemeta/aes_256_ctr.go` | +| Block-level splitter | ✅ Done | `pkg/preparation/dags/aes_256_ctr_splitter.go` | +| Decryption | ✅ Done | `DecryptAES256CTR()` function | +| KMS integration | ❌ Pending | `space/encryption/setup`, key unwrap | +| Metadata format | ❌ Pending | `EncryptedMetadata` CBOR block | +| UCAN delegation | ❌ Pending | `space/content/decrypt` handling | +| Folder access control | ❌ Pending | `nb.prefix` validation | + +## Encryption Approach: Block-Level + +### How It Works + +Block-level encryption combines file-scoped DEKs with block-scoped IVs: + +- **1 DEK per file**: Each file gets a unique 256-bit AES key, generated randomly +- **Random IV per block**: Each block within the file gets a unique 16-byte IV +- **AES-256-CTR**: Counter mode encryption with unique keystream per block +- **Incremental uploads**: Only changed blocks need re-encryption + +### DEK Lifecycle (Per File) + +``` +For each file to upload: + +1. Generate random DEK (256-bit AES key) + │ + ▼ +2. For each block in file: + - Generate random IV (16 bytes) + - Encrypt block with DEK + IV + - Store IV in block metadata + │ + ▼ +3. Wrap DEK with space's public key (RSA-OAEP) + │ + ▼ +4. Store wrapped DEK in file metadata block + │ + ▼ +5. 
Upload encrypted blocks + metadata +``` + +**Key insight:** The DEK is the same for all blocks within a file, but each block has a unique IV. This allows: +- Incremental uploads (only changed blocks re-encrypted with same DEK) +- File-level access control (revoke access to specific files) +- TS client compatibility (same pattern as `@storacha/encrypt-upload-client`) + +### Security: IV Requirements + +**MUST** +- Generate a new random IV (16 bytes) for every block +- Never reuse an IV with the same DEK +- Use cryptographically secure random number generator for IV generation + +These requirements apply to all encryption operations, including initial uploads and incremental re-uploads. + +### Why Block-Level (vs File-Level) + +| Approach | Incremental Uploads | KMS Calls | Metadata | +|----------|---------------------|-----------|----------| +| **File-level** | ❌ Re-upload entire file | 1 per file | 1 per file | +| **Block-level** | ✅ Only changed blocks | 1 per session | IV per block | + +Block-level enables Guppy's existing incremental upload capability to work with encrypted content. + + +## Metadata Format + +Encrypted content MUST include a metadata block compatible with `@storacha/encrypt-upload-client`: + +```typescript +interface EncryptedMetadata { + encryptedDataCID: CID // CID of encrypted content + encryptedSymmetricKey: string // Base64-encoded wrapped DEK + space: SpaceDID // Space the content belongs to + path?: string // File path (e.g., "/backups/server1/backup.tar") + kms: { + provider: string // e.g., "storacha" + keyId: string // KMS key identifier + algorithm: string // e.g., "RSA-OAEP-256" + } +} +``` + +The `path` field is RECOMMENDED for all new uploads. It enables folder-level access control via `nb.prefix` delegations (see Folder-Level Access Control section). + +## Streaming Support + +Encryption MUST support streaming for large files (1TB+) with O(1) memory usage. 
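
One way the streaming requirement could be met is sketched below, assuming the block-level model described earlier (one DEK per file, a fresh random IV per chunk). The names `encryptChunks`, `decryptChunk`, and `roundTrip` are hypothetical and are not taken from the POC; the chunk size and the `emit` callback are likewise illustrative. Memory use is bounded by the chunk size as long as the callback does not retain the chunks it receives.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// encryptChunks reads plaintext from r in chunks of at most chunkSize bytes,
// encrypts each chunk with AES-256-CTR under a fresh random 16-byte IV, and
// hands the IV and ciphertext to emit. Only one chunk is buffered at a time.
func encryptChunks(r io.Reader, dek []byte, chunkSize int, emit func(iv, ciphertext []byte) error) error {
	if len(dek) != 32 {
		return errors.New("DEK must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return err
	}
	buf := make([]byte, chunkSize)
	for {
		n, readErr := io.ReadFull(r, buf)
		if n > 0 {
			// A new IV per chunk: never reuse an IV with the same DEK.
			iv := make([]byte, aes.BlockSize)
			if _, err := rand.Read(iv); err != nil {
				return err
			}
			ciphertext := make([]byte, n)
			cipher.NewCTR(block, iv).XORKeyStream(ciphertext, buf[:n])
			if err := emit(iv, ciphertext); err != nil {
				return err
			}
		}
		if readErr == io.EOF || readErr == io.ErrUnexpectedEOF {
			return nil // input exhausted (possibly after a short final chunk)
		}
		if readErr != nil {
			return readErr
		}
	}
}

// decryptChunk reverses encryptChunks for a single chunk. CTR mode is
// symmetric, so decryption applies the same keystream again.
func decryptChunk(dek, iv, ciphertext []byte) ([]byte, error) {
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(ciphertext))
	cipher.NewCTR(block, iv).XORKeyStream(plaintext, ciphertext)
	return plaintext, nil
}

// roundTrip is a usage example: it encrypts plaintext chunk by chunk and
// immediately decrypts each chunk, returning the reassembled bytes.
func roundTrip(plaintext, dek []byte, chunkSize int) ([]byte, error) {
	var out bytes.Buffer
	err := encryptChunks(bytes.NewReader(plaintext), dek, chunkSize, func(iv, ct []byte) error {
		pt, err := decryptChunk(dek, iv, ct)
		if err != nil {
			return err
		}
		out.Write(pt)
		return nil
	})
	return out.Bytes(), err
}
```

In a real uploader the callback would store the IV alongside the block and pass the ciphertext to the DAG builder instead of decrypting it straight away.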
+ +Go's standard library provides native support via: +- `crypto/aes`: AES block cipher +- `crypto/cipher`: CTR mode stream cipher +- `io.Reader` wrapper pattern: encrypt/decrypt chunks on-the-fly without buffering entire file + +## Encryption Flow + +``` +1. guppy upload + │ + ▼ +2. Get space public key from KMS (space/encryption/setup) + → 1 KMS call per upload session + │ + ▼ +3. For each file: + a. Generate random DEK (256-bit AES key) + b. For each file chunk: + - Generate random IV (16 bytes) + - Encrypt chunk with DEK + IV (AES-256-CTR) + - Store IV in block metadata + c. Build UnixFS DAG from encrypted blocks + d. Wrap DEK with space public key (RSA-OAEP) + e. Create metadata block with wrapped DEK + file path + │ + ▼ +4. Upload encrypted blocks + metadata (blob/add) +``` + +**POC Status** +- Step 2 (KMS): ❌ Pending +- Step 3a (DEK generation): ❌ Pending (POC uses manually provided key) +- Step 3b (IV + encryption): ✅ Done +- Step 3c (UnixFS DAG): ✅ Done +- Step 3d (DEK wrapping): ❌ Pending +- Step 3e (metadata block): ❌ Pending +- Step 4 (upload): ✅ Existing Guppy functionality + +## Decryption Flow + +Two decryption modes are supported: + +### Option A: Gateway-Side Decryption (Local Key Mode) + +For local/development use, the gateway can decrypt content on-the-fly: + +```bash +guppy gateway serve --decryption-key /path/to/key.bin +``` + +``` +1. Client requests: http://localhost:3000/ipfs/ + │ + ▼ +2. Gateway fetches encrypted blocks from network + │ + ▼ +3. Gateway decrypts each block using provided key + - Read IV from block (first 16 bytes) + - Decrypt with key + IV (AES-256-CTR) + │ + ▼ +4. Gateway serves decrypted content to client +``` + +**POC Status:** ✅ Done (`--decryption-key` flag implemented) + +### Option B: Client-Side Decryption (KMS Mode) + +For production with access control, decryption happens client-side: + +``` +1. 
Fetch encrypted content via gateway + - Public: https://w3s.link/ipfs/ + - Local: guppy gateway serve → http://localhost:3000/ipfs/ + │ + ▼ +2. Extract file metadata block + │ + ▼ +3. Extract wrapped DEK from metadata + │ + ▼ +4. Unwrap DEK via KMS (space/encryption/key/decrypt) + → Provide UCAN proof with space/content/decrypt delegation + → KMS validates nb.prefix against file path + │ + ▼ +5. For each encrypted block: + - Read IV from block metadata + - Decrypt with unwrapped DEK + IV (AES-256-CTR) + │ + ▼ +6. Reassemble file from decrypted blocks +``` + +**POC Status:** +- Step 1 (gateway fetch): ✅ Existing infrastructure +- Step 2-4 (metadata + KMS): ❌ Pending +- Step 5 (IV + decryption): ✅ Done +- Step 6 (reassemble): ✅ Standard UnixFS + +## Folder-Level Access Control + +Access control is enforced via the `nb.prefix` caveat on `space/content/decrypt` delegations: + +```typescript +// Grant access to all files under /backups/server1/ +space/content/decrypt + with: did:key:zSpace + nb: { prefix: "/backups/server1/" } + audience: did:key:zRecipient +``` + +**Validation flow** + +1. User requests decryption with delegation containing `nb.prefix` +2. KMS extracts `path` from encrypted metadata block +3. KMS validates: `path.startsWith(delegation.nb.prefix)` +4. 
If valid → unwrap DEK; if invalid → reject + +**Access control examples** + +| Delegation `nb.prefix` | File `path` | Access | +|------------------------|-------------|--------| +| `/backups/` | `/backups/server1/backup.tar` | ✅ Allowed | +| `/backups/server1/` | `/backups/server1/backup.tar` | ✅ Allowed | +| `/backups/server2/` | `/backups/server1/backup.tar` | ❌ Denied | +| (none) | `/backups/server1/backup.tar` | ✅ Space-level access | + +**Backward compatibility** + +- Delegations without `nb.prefix` grant space-level access (all files) +- Files uploaded without `path` field are treated as root (`/`) and accessible with any space-level delegation + +## Capabilities Needed + +| Capability | Status | Notes | +|------------|--------|-------| +| `space/encryption/setup` | Exists | Get space public key for wrapping | +| `space/encryption/key/decrypt` | Exists | Unwrap space DEK | +| `space/content/decrypt` | Exists | Authorization proof for decryption | + +## Key Management + +Guppy SHOULD support two key management modes. + +### Local Key Mode (Development/Testing) + +For development and testing, Guppy MAY use a locally-provided key: + +```yaml +# ~/.storacha/guppy/config.yaml +encryption: + enabled: true + mode: local + key: "base64-encoded-256-bit-key" # Or path to key file +``` + +This mode does NOT provide access control — anyone with the key can decrypt. 
+ +### KMS Mode (Production/Enterprise) + +For production, Guppy SHOULD use the Storacha KMS: + +```yaml +encryption: + enabled: true + mode: kms + kms_url: "https://kms.storacha.network" + kms_did: "did:web:kms.storacha.network" +``` + +| Config | Description | +|--------|-------------| +| `UCAN_KMS_URL` | ucan-kms service endpoint | +| `UCAN_KMS_DID` | ucan-kms service DID for UCAN audience | + +KMS mode enables: +- UCAN-gated decryption +- Folder-level access control via `nb.prefix` +- Key rotation (see below) + +## Key Rotation + +Key rotation requires the **Mutability** feature (see [forge-mutability.md](./forge-mutability.md)) because: +- Rotation creates new metadata CIDs (wrapped DEK changes) +- The catalog/UCN must be updated to point to the new metadata CID +- Without mutability, clients cannot discover the rotated metadata + +Guppy SHOULD support two types of key rotation via CLI commands (KMS mode only): + +### KEK Rotation (Space Key) + +Rotates the space-level Key Encryption Key without re-encrypting content: + +```bash +guppy encryption rotate-kek --space +``` + +**Process:** +1. Generate new KEK in KMS +2. For each encrypted file in space: + - Unwrap DEK with old KEK + - Re-wrap DEK with new KEK + - Create new metadata block (new CID) + - Update catalog entry to point to new metadata CID +3. Content blocks remain unchanged + +**Use case:** Regular security hygiene, suspected KEK compromise + +### DEK Rotation (File Key) + +Rotates a file's Data Encryption Key by re-encrypting the file: + +```bash +guppy encryption rotate-dek --file +``` + +**Process:** +1. Download and decrypt file with old DEK +2. Generate new DEK +3. Re-encrypt all blocks with new DEK + new IVs +4. Create new metadata block with new wrapped DEK +5. Upload encrypted blocks + new metadata +6. 
Update catalog entry to point to new root CID + +**Use case:** Suspected file-level compromise, compliance requirements + +### Local Key Mode + +Key rotation is NOT supported in local key mode: +- No catalog/UCN to track metadata CID changes +- Users managing their own keys are responsible for rotation + +## Revocation + +Access revocation is handled by the `ucan-kms` service: + +1. User revokes a `space/content/decrypt` delegation by CID +2. KMS checks revocation status on every decrypt request +3. Revoked delegations are rejected and the DEK is not unwrapped + +**Guppy CLI:** + +```bash +guppy delegation revoke +``` + +**Note:** Revocation only prevents future key unwrapping. A user who already holds an unwrapped DEK can still decrypt that file. For full revocation, combine with DEK rotation. + +## Implementation Requirements + +| Component | Change | +|-----------|--------| +| `@storacha/capabilities` | Add `prefix` field to `space/content/decrypt` schema | +| `ucan-kms` | Validate prefix in decrypt handler | +| `guppy` | CLI command to mint scoped delegations | + + +## Security Considerations + +1. **Client-side encryption**: Plaintext MUST NOT leave the Guppy client +2. **Key management**: DEK is wrapped with space-specific KEK managed by KMS +3. **UCAN-gated decryption**: A `space/content/decrypt` delegation is REQUIRED to invoke `space/encryption/key/decrypt` +4. **Unique IV per block**: Each block MUST have a unique random IV +5. **Secure random**: IVs MUST be generated using a cryptographically secure random number generator +6. 
**No IV reuse**: Same DEK with same IV = keystream reuse attack + + +## References + +- [Block Encryption POC](https://github.com/storacha/guppy/pull/376) +- [Storacha Encrypt Upload Client](https://github.com/storacha/upload-service/tree/main/packages/encrypt-upload-client) +- [AWS S3 Client-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) From 8cdaec790f43104b5e6e37686e3674ec6a60ea70 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Wed, 11 Mar 2026 14:16:40 -0300 Subject: [PATCH 03/10] rfc: forge mutability --- rfc/forge-mutability.md | 169 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 169 insertions(+) create mode 100644 rfc/forge-mutability.md diff --git a/rfc/forge-mutability.md b/rfc/forge-mutability.md new file mode 100644 index 0000000..ff266cf --- /dev/null +++ b/rfc/forge-mutability.md @@ -0,0 +1,169 @@ +# RFC: Mutability for Forge (Guppy/Piri) + +**Status: Draft — Pending Alignment** + +## Authors + +- [Felipe Forbeck](https://github.com/fforbeck), [Storacha Network](https://storacha.network/) + +## Editors +- [Alan Shaw](https://github.com/alanshaw), [Storacha Network](https://storacha.network/) +- [Hannah Howard](https://github.com/hannahhoward), [Storacha Network](https://storacha.network/) +- [Alex Kinstler](https://github.com/prodalex), [Storacha Network](https://storacha.network/) + + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Introduction + +This RFC proposes the implementation of mutability features for Forge enterprise customers using the Guppy client. These features enable: + +1. **Mutable References**: Stable pointers that update to the latest version of uploaded content +2. 
**Content Catalog**: Track uploaded content (path → CID mappings) across clients + +**Related:** [forge-encryption.md](./forge-encryption.md) — Encryption key rotation depends on mutability to track metadata CID changes. + +## Motivation + +Enterprise Forge customers need: + +- **Mutable references**: Backup workflows need stable names that update to point to the latest backup version +- **Cross-client sync**: Multiple Guppy instances should be able to resolve the same reference +- **On-network state**: Track what has been uploaded without relying on local-only databases + +## Approaches Under Consideration + +### Option A: Simple Catalog (Alex's Proposal) + +**How it works** + +- **Namespace:** Use the Space DID directly, no new naming system needed +- **Catalog:** A simple CBOR file with sorted entries mapping `path → CID`. Chunked for large spaces. +- **Mutability:** Use `clock/head` / `clock/advance` to point to current catalog CID (needs Go impl) +- **Multi-writer:** Optimistic retry: if conflict, re-read catalog and retry + +**What to build** +- Catalog format (CBOR, sorted entries, chunked for large spaces) +- `guppy upload` builds/updates catalog after uploading files +- `guppy ls` resolves catalog, lists entries +- `guppy gateway ` resolves catalog, finds entry, fetches content + +**What NOT to build** +- Pail +- UCN (not needed, space DID is the namespace) +- Merkle clock CRDT merge (not needed for CLI tool) +- Go port of any TS package (build Forge-native) + +**Catalog Format** + +The catalog is a CBOR-encoded block with sorted entries: + +```typescript +interface Catalog { + version: 1 + entries: CatalogEntry[] +} + +interface CatalogEntry { + path: string // File path (e.g., "/backups/server1/backup.tar") + root: CID // Entry point CID: + // - Encrypted files: metadata block CID (contains wrapped DEK + link to encrypted content) + // - Plaintext files: content root CID + size: number // File size in bytes + encrypted?: boolean // True if content is 
encrypted + updated: number // Unix timestamp of last update +} +``` + +**Chunking for large catalogs** +- If catalog exceeds 1MB, split into chunks +- Root catalog block contains links to chunk blocks +- Each chunk contains a sorted subset of entries + +```typescript +interface ChunkedCatalog { + version: 1 + chunks: CID[] // Links to CatalogChunk blocks + totalEntries: number +} + +interface CatalogChunk { + entries: CatalogEntry[] + startPath: string // First path in this chunk (for binary search) + endPath: string // Last path in this chunk +} +``` + +**Capabilities needed** + +| Capability | Status | Notes | +|------------|--------|-------| +| `clock/head` | Needs Go impl | Read catalog pointer (exists in TS, not in go-libstoracha) | +| `clock/advance` | Needs Go impl | Update catalog pointer (exists in TS, not in go-libstoracha) | +| `blob/add` | Exists | Upload catalog + content | + +**Trade-offs** + +| Aspect | Option A (Simple Catalog) | Option B (CRDT) | +|--------|---------------------------|-----------------| +| **Concurrency** | Last write wins | Automatic merge | +| **Use case** | Single writer (CLI) | Multi-writer (teams) | +| **Complexity** | Low (CBOR list) | High (Pail + CRDT) | +| **Implementation** | Native Go | Port TS libraries | +| **Catalog size** | Works for 100k+ files | Optimized for millions | +| **Conflict resolution** | Manual (user re-uploads) | Automatic (CRDT merge) | + +**Recommendation:** Start with Option A for Guppy CLI (single-user tool). Option B becomes valuable when enabling team collaboration with concurrent uploads from multiple clients. 
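
The chunked lookup described under Option A (sorted entries, with `startPath`/`endPath` bounds per chunk) could look roughly like the Go sketch below. It is illustrative only: `CatalogEntry`, `CatalogChunk`, and `Lookup` are hypothetical names, CIDs are kept as plain strings for brevity, and none of this is existing Guppy code.

```go
package main

import "sort"

// CatalogEntry mirrors the entry shape above. Root would be a CID in practice;
// a string is used here to keep the sketch dependency-free (assumption).
type CatalogEntry struct {
	Path string
	Root string
}

// CatalogChunk holds a sorted run of entries plus the path bounds of that run.
type CatalogChunk struct {
	StartPath string
	EndPath   string
	Entries   []CatalogEntry
}

// Lookup finds path in a list of chunks that is sorted by StartPath and whose
// entries are sorted by Path. It binary-searches the chunk list first, then
// the entries inside the selected chunk.
func Lookup(chunks []CatalogChunk, path string) (CatalogEntry, bool) {
	// First chunk whose EndPath is >= path is the only one that can hold it.
	i := sort.Search(len(chunks), func(i int) bool { return chunks[i].EndPath >= path })
	if i == len(chunks) || chunks[i].StartPath > path {
		return CatalogEntry{}, false
	}
	entries := chunks[i].Entries
	j := sort.Search(len(entries), func(j int) bool { return entries[j].Path >= path })
	if j < len(entries) && entries[j].Path == path {
		return entries[j], true
	}
	return CatalogEntry{}, false
}
```

Because both levels are sorted, a lookup costs two binary searches and only the matching chunk block needs to be fetched, which is the property the `startPath`/`endPath` fields are there to support.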
+ +### Option B: Go Ports (Hannah's Proposal) + +**How it works** + +- **Namespace:** UCN Names - ed25519 keypairs that can be delegated and shared +- **State Index:** Pail - sharded Merkle trie for `path → CID` mappings +- **Concurrency:** Merkle clock CRDT enables automatic merge of concurrent writes + +**What to build** +- UCN Go port (Name creation, publish, resolve, grant) +- Pail Go port (put, get, del, entries, diff, merge) + +**What NOT to build** +- Service layer (all CID generation must happen on client) + +**Capabilities needed** + +| Capability | Status | Notes | +|------------|--------|-------| +| `clock/head` | Needs Go impl | Read current value of a Name (exists in TS) | +| `clock/advance` | Needs Go impl | Publish new value to a Name (exists in TS) | +| `blob/add` | Exists | Upload content | + +## Comparison + +| Aspect | Option A | Option B | +|--------|-----------------|-------------------| +| **Namespace** | Space DID (already exists) | UCN Name (new keypair per name) | +| **State Index** | CBOR catalog file | Pail trie (content-addressed KV) | +| **Mutability** | `clock/head` + `clock/advance` | UCN + Merkle clock | +| **Multi-writer** | Last-writer-wins + retry | CRDT merge (concurrent edits merge) | +| **TS ecosystem compat** | No | Yes | +| **Go ports needed** | None | UCN + Pail | + +## Key Questions for Alignment + +1. **Do Forge customers need fine-grained multi-writer?** + - If yes (real-time collaboration), then Option B + - If no (backup/archival, single writer), then Option A + +2. 
**Do we need TypeScript client compatibility?** + - If yes (Console, w3up-client interop), then Option B + - If no (Forge is standalone), then Option A + +## References + +- [Mutability & Privacy in Storacha — Strategy Document](https://www.notion.so/storacha/Mutability-Privacy-in-Storacha-Strategy-Document-3125305b5524807fb4a1ce6a3c9201e8) (internal) +- [Storacha UCN Package](https://github.com/storacha/upload-service/tree/main/packages/ucn) +- [Storacha Pail Package](https://github.com/storacha/pail) +- [UCAN Specification](https://github.com/ucan-wg/spec) From 4d94f49ae034ee998893681261ab89d262376a58 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Mon, 16 Mar 2026 13:49:08 -0300 Subject: [PATCH 04/10] minor updates in the forge-encryption rfc --- rfc/forge-encryption.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md index cf0985e..1e38f0d 100644 --- a/rfc/forge-encryption.md +++ b/rfc/forge-encryption.md @@ -389,13 +389,14 @@ Access revocation is handled by the `ucan-kms` service: guppy delegation revoke ``` -**Note:** Revocation only prevents future decryption. If a user already has the DEK in memory, they can still decrypt. For full revocation, combine with DEK rotation. +**Note:** Revocation only prevents future decryption. If a user already has the DEK in memory, they can still decrypt. For full revocation, combine with DEK rotation. 
*(This behavior should be documented in user-facing docs)* ## Implementation Requirements | Component | Change | |-----------|--------| -| `@storacha/capabilities` | Add `prefix` field to `space/content/decrypt` schema | +| `@storacha/capabilities` (TS) | Add `prefix` field to `space/content/decrypt` schema | +| `go-libstoracha` (Go) | Port `prefix` field to Go capabilities | | `ucan-kms` | Validate prefix in decrypt handler | | `guppy` | CLI command to mint scoped delegations | From 90f11be5cfb1a4f50b4c9a566d13adfedee6e9d2 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Tue, 17 Mar 2026 13:07:58 -0300 Subject: [PATCH 05/10] rfc: update mutability rfc --- rfc/forge-mutability.md | 281 +++++++++++++++++++++++----------------- 1 file changed, 165 insertions(+), 116 deletions(-) diff --git a/rfc/forge-mutability.md b/rfc/forge-mutability.md index ff266cf..2e9e257 100644 --- a/rfc/forge-mutability.md +++ b/rfc/forge-mutability.md @@ -1,6 +1,6 @@ # RFC: Mutability for Forge (Guppy/Piri) -**Status: Draft — Pending Alignment** +**Status: Draft - Pending Alignment** ## Authors @@ -21,9 +21,9 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S This RFC proposes the implementation of mutability features for Forge enterprise customers using the Guppy client. These features enable: 1. **Mutable References**: Stable pointers that update to the latest version of uploaded content -2. **Content Catalog**: Track uploaded content (path → CID mappings) across clients +2. **Content Catalog**: Track uploaded content (path to CID mappings) across clients -**Related:** [forge-encryption.md](./forge-encryption.md) — Encryption key rotation depends on mutability to track metadata CID changes. +**Related:** [forge-encryption.md](./forge-encryption.md) - Encryption key rotation depends on mutability to track metadata CID changes. 
## Motivation @@ -33,133 +33,182 @@ Enterprise Forge customers need: - **Cross-client sync**: Multiple Guppy instances should be able to resolve the same reference - **On-network state**: Track what has been uploaded without relying on local-only databases -## Approaches Under Consideration +## Out of Scope -### Option A: Simple Catalog (Alex's Proposal) +- **All-file-paths indexing**: Storing every individual file path in Pail (e.g., `/backups/mydir/file1.txt`, `/backups/mydir/subdir/file2.txt`). This RFC only tracks the source root path to root CID mapping. Internal file structure remains in UnixFS. -**How it works** +## Approach: UCN + Pail -- **Namespace:** Use the Space DID directly, no new naming system needed -- **Catalog:** A simple CBOR file with sorted entries mapping `path → CID`. Chunked for large spaces. -- **Mutability:** Use `clock/head` / `clock/advance` to point to current catalog CID (needs Go impl) -- **Multi-writer:** Optimistic retry: if conflict, re-read catalog and retry +Forge will use **UCN** + **Pail** for mutable content tracking: -**What to build** -- Catalog format (CBOR, sorted entries, chunked for large spaces) -- `guppy upload` builds/updates catalog after uploading files -- `guppy ls` resolves catalog, lists entries -- `guppy gateway ` resolves catalog, finds entry, fetches content +- **UCN**: Lightweight wrapper around `clock/head` and `clock/advance` for publishing/resolving names +- **Pail**: Sharded Merkle trie for `path -> CID` mappings (scales to millions of files) +- **Clock service**: Already exists at `clock.web3.storage` -**What NOT to build** -- Pail -- UCN (not needed, space DID is the namespace) -- Merkle clock CRDT merge (not needed for CLI tool) -- Go port of any TS package (build Forge-native) +### Why Pail? -**Catalog Format** +Forge directories can contain thousands to millions of files. Pail's sharded structure means only changed shards are uploaded when a file is added or updated. 
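
To make the "only changed shards are uploaded" point concrete, here is a toy model in Go. It is deliberately not Pail's data structure and does not use the `go-pail` API: paths are bucketed into a fixed number of shards by hash, each shard gets a digest standing in for a block CID, and the sketch counts how many digests differ after one update. All names (`Index`, `ShardHashes`, `ChangedShards`) are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

const shardCount = 16

// Index is a toy path -> CID map split into a fixed number of shards.
type Index [shardCount]map[string]string

// shardOf picks a shard from the first byte of the path's SHA-256 digest.
func shardOf(path string) int {
	h := sha256.Sum256([]byte(path))
	return int(h[0]) % shardCount
}

// Put stores path -> cid in the shard selected by the path hash.
func (ix *Index) Put(path, cid string) {
	s := shardOf(path)
	if ix[s] == nil {
		ix[s] = map[string]string{}
	}
	ix[s][path] = cid
}

// ShardHashes returns one deterministic digest per shard, standing in for the
// CID a content-addressed shard block would have.
func (ix *Index) ShardHashes() [shardCount][32]byte {
	var out [shardCount][32]byte
	for s := 0; s < shardCount; s++ {
		m := ix[s]
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys) // map order is random; sort for a stable digest
		h := sha256.New()
		for _, k := range keys {
			fmt.Fprintf(h, "%s=%s\n", k, m[k])
		}
		copy(out[s][:], h.Sum(nil))
	}
	return out
}

// ChangedShards counts shards whose digest differs between two snapshots.
func ChangedShards(a, b [shardCount][32]byte) int {
	n := 0
	for i := range a {
		if a[i] != b[i] {
			n++
		}
	}
	return n
}
```

Updating one path changes exactly one shard digest in this model, so only that shard would need re-uploading. In a real Merkle trie the ancestors of the changed shard, up to the root, change as well, so the cost is a short path of blocks rather than a single one.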
-The catalog is a CBOR-encoded block with sorted entries: +**Note:** Guppy continues to upload content as UnixFS (for traditional retrieval patterns). Pail stores the mapping from source path to the UnixFS root CID, enabling path-based lookups without changing the upload format. -```typescript -interface Catalog { - version: 1 - entries: CatalogEntry[] -} +### Architecture -interface CatalogEntry { - path: string // File path (e.g., "/backups/server1/backup.tar") - root: CID // Entry point CID: - // - Encrypted files: metadata block CID (contains wrapped DEK + link to encrypted content) - // - Plaintext files: content root CID - size: number // File size in bytes - encrypted?: boolean // True if content is encrypted - updated: number // Unix timestamp of last update -} +```mermaid +graph LR + subgraph Guppy + A[Upload file] --> B[Update Pail] + B --> C[Publish via UCN] + end + + C -->|clock/advance| D[Clock Service] + B -->|store blocks| E[Storage Network] + + subgraph Resolve + F[name.Resolve] -->|clock/head| D + F --> G[crdt.Root] + G --> H[pail.Get] + end ``` -**Chunking for large catalogs** -- If catalog exceeds 1MB, split into chunks -- Root catalog block contains links to chunk blocks -- Each chunk contains a sorted subset of entries - -```typescript -interface ChunkedCatalog { - version: 1 - chunks: CID[] // Links to CatalogChunk blocks - totalEntries: number -} - -interface CatalogChunk { - entries: CatalogEntry[] - startPath: string // First path in this chunk (for binary search) - endPath: string // Last path in this chunk -} +### What to Build + +| Component | Status | +|-----------|--------| +| UCN wrapper (`clock/head`, `clock/advance`) | Needs Go impl (~few hundred lines) | +| Pail integration | Library exists: `github.com/storacha/go-pail` | +| Guppy commands | Update `upload`, `ls`, `retrieve` | + +## Integration with Guppy Flows + +### Current Guppy Upload Pipeline + +Guppy's `ExecuteUpload` runs a pipeline of workers: + +1. 
**Scan Worker** - Walks filesystem, creates FSEntry records +2. **DAG Scan Worker** - Creates DAG nodes from files, sets rootCID +3. **Sharding Worker** - Packs nodes into CAR shards +4. **Indexing Worker** - Creates indexes for shards +5. **Shard Upload Worker** - Uploads shards via `space/blob/add` +6. **Index Upload Worker** - Uploads indexes via `space/blob/add` +7. **Post-Process Workers** - Finalizes shards/indexes, calls `upload/add` + +Returns: `rootCID` (the content root) + +### Upload with Mutability (Proposed) + +```mermaid +sequenceDiagram + participant User + participant Guppy + participant Workers + participant Storacha + participant Pail + participant UCN + + User->>Guppy: guppy upload + Guppy->>Workers: ExecuteUpload(uploadID, spaceDID) + Workers->>Workers: Scan FS entries + Workers->>Workers: Create DAG nodes + Workers->>Workers: Pack into CAR shards + Workers->>Storacha: space/blob/add (shards) + Workers->>Storacha: space/blob/add (indexes) + Workers->>Storacha: upload/add + Workers-->>Guppy: rootCID + + Note over Guppy,UCN: NEW: Mutability integration + Guppy->>Pail: crdt.Put(sourcePath, rootCID) + Pail-->>Guppy: eventCID + Guppy->>UCN: name.Publish(eventCID) + UCN-->>Guppy: OK + Guppy-->>User: Uploaded (rootCID) ``` -**Capabilities needed** - -| Capability | Status | Notes | -|------------|--------|-------| -| `clock/head` | Needs Go impl | Read catalog pointer (exists in TS, not in go-libstoracha) | -| `clock/advance` | Needs Go impl | Update catalog pointer (exists in TS, not in go-libstoracha) | -| `blob/add` | Exists | Upload catalog + content | - -**Trade-offs** - -| Aspect | Option A (Simple Catalog) | Option B (CRDT) | -|--------|---------------------------|-----------------| -| **Concurrency** | Last write wins | Automatic merge | -| **Use case** | Single writer (CLI) | Multi-writer (teams) | -| **Complexity** | Low (CBOR list) | High (Pail + CRDT) | -| **Implementation** | Native Go | Port TS libraries | -| **Catalog size** | Works for 
100k+ files | Optimized for millions | -| **Conflict resolution** | Manual (user re-uploads) | Automatic (CRDT merge) | - -**Recommendation:** Start with Option A for Guppy CLI (single-user tool). Option B becomes valuable when enabling team collaboration with concurrent uploads from multiple clients. - -### Option B: Go Ports (Hannah's Proposal) - -**How it works** - -- **Namespace:** UCN Names - ed25519 keypairs that can be delegated and shared -- **State Index:** Pail - sharded Merkle trie for `path → CID` mappings -- **Concurrency:** Merkle clock CRDT enables automatic merge of concurrent writes - -**What to build** -- UCN Go port (Name creation, publish, resolve, grant) -- Pail Go port (put, get, del, entries, diff, merge) - -**What NOT to build** -- Service layer (all CID generation must happen on client) - -**Capabilities needed** - -| Capability | Status | Notes | -|------------|--------|-------| -| `clock/head` | Needs Go impl | Read current value of a Name (exists in TS) | -| `clock/advance` | Needs Go impl | Publish new value to a Name (exists in TS) | -| `blob/add` | Exists | Upload content | - -## Comparison - -| Aspect | Option A | Option B | -|--------|-----------------|-------------------| -| **Namespace** | Space DID (already exists) | UCN Name (new keypair per name) | -| **State Index** | CBOR catalog file | Pail trie (content-addressed KV) | -| **Mutability** | `clock/head` + `clock/advance` | UCN + Merkle clock | -| **Multi-writer** | Last-writer-wins + retry | CRDT merge (concurrent edits merge) | -| **TS ecosystem compat** | No | Yes | -| **Go ports needed** | None | UCN + Pail | - -## Key Questions for Alignment - -1. 
**Do Forge customers need fine-grained multi-writer?** - - If yes (real-time collaboration), then Option B - - If no (backup/archival, single writer), then Option A +### Encrypted Upload with Mutability (Proposed) + +```mermaid +sequenceDiagram + participant User + participant Guppy + participant Workers + participant Storacha + participant KMS + participant Pail + participant UCN + + User->>Guppy: guppy upload --encrypt + Guppy->>Guppy: Generate DEK + Guppy->>Workers: ExecuteUpload with encryption + Workers->>Workers: Encrypt blocks with DEK + Workers->>Storacha: space/blob/add (encrypted shards) + Workers->>Storacha: space/blob/add (indexes) + Workers->>Storacha: upload/add + Workers-->>Guppy: encryptedRootCID + + Guppy->>KMS: Wrap DEK with KEK + KMS-->>Guppy: wrappedDEK + Guppy->>Guppy: Create metadata block (wrappedDEK + encryptedRootCID) + Guppy->>Storacha: space/blob/add (metadata block) + Storacha-->>Guppy: metadataCID + + Note over Guppy,UCN: Mutability stores metadataCID + Guppy->>Pail: crdt.Put(sourcePath, metadataCID) + Pail-->>Guppy: eventCID + Guppy->>UCN: name.Publish(eventCID) + UCN-->>Guppy: OK + Guppy-->>User: Uploaded (encrypted) +``` -2. **Do we need TypeScript client compatibility?** - - If yes (Console, w3up-client interop), then Option B - - If no (Forge is standalone), then Option A +### Retrieve with Mutability - Unified Flow (Proposed) + +```mermaid +sequenceDiagram + participant User + participant Guppy + participant UCN + participant Pail + participant Locator + participant Storage + participant KMS + + User->>Guppy: guppy retrieve + + alt Path is CID (e.g. bafy...) + Guppy->>Guppy: Use CID directly as rootCID + else Path is file path (e.g. 
/backups/db.tar) + Note over Guppy,Pail: NEW: Resolve path via UCN + Pail + Guppy->>UCN: name.Resolve(spaceDID) + UCN-->>Guppy: head events + Guppy->>Storage: Fetch Pail blocks for head + Storage-->>Guppy: Pail blocks + Guppy->>Pail: crdt.Root(head, blocks) + Pail-->>Guppy: pailRoot + Guppy->>Pail: pail.Get(pailRoot, path) + Pail-->>Guppy: rootCID + end + + Guppy->>Locator: Query indexer for rootCID + Locator-->>Guppy: provider locations + Guppy->>Storage: Fetch root block + Storage-->>Guppy: block data + + alt Block is EncryptedMetadata format + Note over Guppy,KMS: Encrypted content detected + Guppy->>Guppy: Extract wrappedDEK + encryptedRootCID + Guppy->>KMS: Unwrap DEK with KEK + KMS-->>Guppy: DEK + Guppy->>Locator: Query indexer for encryptedRootCID + Locator-->>Guppy: provider locations + Guppy->>Storage: Fetch encrypted blocks + Storage-->>Guppy: encrypted blocks + Guppy->>Guppy: Decrypt with DEK + else Block is plaintext UnixFS + Note over Guppy,Storage: Plaintext content + Guppy->>Storage: Fetch remaining blocks + Storage-->>Guppy: file blocks + end + + Guppy-->>User: Write file to output +``` ## References From 4ea02170ead9a33dadf578fff2413a8da1ce70d9 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Wed, 18 Mar 2026 11:17:16 -0300 Subject: [PATCH 06/10] rfc: update privacy rfc --- rfc/forge-encryption.md | 127 ++++++++++++++++++++++++++++------------ 1 file changed, 89 insertions(+), 38 deletions(-) diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md index 1e38f0d..c0ad9ad 100644 --- a/rfc/forge-encryption.md +++ b/rfc/forge-encryption.md @@ -22,6 +22,8 @@ This RFC proposes the implementation of content encryption for Forge enterprise 1. **Encryption at Rest**: Client-side encryption of file content before upload 2. **UCAN-Gated Decryption**: Access control via delegations +**Scope:** File-level encryption only. Directory/folder encryption is not supported (each file is encrypted individually with its own DEK). 
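
As a rough sketch of what "its own DEK" means in practice, the Go code below generates a fresh 256-bit key per file and encrypts each block with AES-256-CTR under a random 16-byte IV, as this RFC describes. The function names are hypothetical, and the output layout (IV prepended to the ciphertext) is an assumption about how the IV is stored inline; this is not the POC implementation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// NewDEK returns a fresh random 256-bit key. One is generated per file.
func NewDEK() ([]byte, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return nil, err
	}
	return dek, nil
}

// EncryptBlock encrypts one block with AES-256-CTR under a fresh random IV.
// Assumed output layout: IV (16 bytes) followed by the ciphertext.
func EncryptBlock(dek, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize+len(plaintext))
	iv := out[:aes.BlockSize]
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	cipher.NewCTR(block, iv).XORKeyStream(out[aes.BlockSize:], plaintext)
	return out, nil
}

// DecryptBlock reverses EncryptBlock: it reads the IV from the first 16 bytes
// and applies the same CTR keystream to the remainder.
func DecryptBlock(dek, data []byte) ([]byte, error) {
	if len(data) < aes.BlockSize {
		return nil, errors.New("block shorter than IV")
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(data)-aes.BlockSize)
	cipher.NewCTR(block, data[:aes.BlockSize]).XORKeyStream(plaintext, data[aes.BlockSize:])
	return plaintext, nil
}
```

Encrypting the same plaintext twice with the same DEK yields different output because the IV is drawn fresh each time, which is what prevents keystream reuse within a file.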
+ ## Motivation Enterprise Forge customers need: @@ -58,6 +60,27 @@ Based on Alan's POC ([storacha/guppy#376](https://github.com/storacha/guppy/pull | UCAN delegation | ❌ Pending | `space/content/decrypt` handling | | Folder access control | ❌ Pending | `nb.prefix` validation | +## Terminology + +| Term | Description | +|------|-------------| +| **DEK** | Data Encryption Key: a 256-bit AES key used to encrypt file content | +| **KEK** | Key Encryption Key: a space-level RSA key managed by KMS, used to wrap DEKs | +| **IV** | Initialization Vector: a 16-byte random value, unique per content block | +| **Content block** | A chunk of encrypted file data (e.g., 1MB), contains IV inline | +| **Encrypted metadata block** | CBOR block containing wrapped DEK, path, KMS info. One per file; its CID is the "metadataCID" | +| **Wrapping** | Encrypting a DEK with the space's KEK (RSA-OAEP) | + +## Encryption vs Access Control Granularity + +| Aspect | Granularity | Description | +|--------|-------------|-------------| +| **Encryption (DEK)** | Per file | Each file has its own DEK | +| **Key wrapping (KEK)** | Per space | All DEKs in a space are wrapped with the same KEK | +| **Access control** | Per file OR per folder | CID-based (`nb.resource`) or path-based (`nb.path`) | + +**Key insight:** Even though encryption is file-level, access control can be folder-level. A delegation with `nb.path: "/backups/"` grants access to decrypt **all files** whose path starts with `/backups/`, each with its own DEK. + ## Encryption Approach: Block-Level ### How It Works @@ -84,9 +107,10 @@ For each file to upload: │ ▼ 3. Wrap DEK with space's public key (RSA-OAEP) + - "Wrapping" = encrypting the DEK with the space's KEK (Key Encryption Key) │ ▼ -4. Store wrapped DEK in file metadata block +4. Store wrapped DEK in encrypted metadata block │ ▼ 5. 
Upload encrypted blocks + metadata @@ -106,26 +130,30 @@ For each file to upload: These requirements apply to all encryption operations, including initial uploads and incremental re-uploads. +**Storage overhead:** Each block stores a 16-byte IV. For 1MB blocks, this is ~0.0015% overhead, negligible in practice. + ### Why Block-Level (vs File-Level) -| Approach | Incremental Uploads | KMS Calls | Metadata | -|----------|---------------------|-----------|----------| -| **File-level** | ❌ Re-upload entire file | 1 per file | 1 per file | -| **Block-level** | ✅ Only changed blocks | 1 per session | IV per block | +| Approach | Incremental Uploads | KMS Calls | DEK | IV | +|----------|---------------------|-----------|-----|-----| +| **File-level** | ❌ Re-upload entire file | 1 per file | 1 per file | 1 per file | +| **Block-level** | ✅ Only changed blocks | 1 per session | 1 per file | 1 per block | -Block-level enables Guppy's existing incremental upload capability to work with encrypted content. +Block-level encryption uses a single DEK per file (stored in the encrypted metadata block), but each block has its own IV (stored inline with the block). This enables Guppy's existing incremental upload capability to work with encrypted content. ## Metadata Format -Encrypted content MUST include a metadata block compatible with `@storacha/encrypt-upload-client`: +Each encrypted **file** has its own encrypted metadata block (1 per file, not per upload). This allows file-level access control and independent key rotation. 
+ +Encrypted content MUST include an encrypted metadata block compatible with `@storacha/encrypt-upload-client`: ```typescript interface EncryptedMetadata { encryptedDataCID: CID // CID of encrypted content - encryptedSymmetricKey: string // Base64-encoded wrapped DEK + encryptedSymmetricKey: string // Base64-encoded wrapped blob (contains path + DEK) space: SpaceDID // Space the content belongs to - path?: string // File path (e.g., "/backups/server1/backup.tar") + path?: string // File path for client display (e.g., "/backups/db.tar") kms: { provider: string // e.g., "storacha" keyId: string // KMS key identifier @@ -134,7 +162,13 @@ interface EncryptedMetadata { } ``` -The `path` field is RECOMMENDED for all new uploads. It enables folder-level access control via `nb.prefix` delegations (see Folder-Level Access Control section). +**Wrapped blob format:** The `encryptedSymmetricKey` contains: +- `wrap(KEK, { path, dek })` — if path is provided +- `wrap(KEK, { dek })` — if no path (backward compatible) + +**Note:** When `path` is provided, it appears in two places: +- **In encrypted metadata block** (plaintext CBOR): For client display and Pail indexing +- **In wrapped blob** (encrypted): For KMS validation, the client can't lie about the path ## Streaming Support @@ -163,7 +197,7 @@ Go's standard library provides native support via: - Store IV in block metadata c. Build UnixFS DAG from encrypted blocks d. Wrap DEK with space public key (RSA-OAEP) - e. Create metadata block with wrapped DEK + file path + e. Create encrypted metadata block with wrapped DEK + file path │ ▼ 4. 
Upload encrypted blocks + metadata (blob/add) @@ -175,7 +209,7 @@ Go's standard library provides native support via: - Step 3b (IV + encryption): ✅ Done - Step 3c (UnixFS DAG): ✅ Done - Step 3d (DEK wrapping): ❌ Pending -- Step 3e (metadata block): ❌ Pending +- Step 3e (encrypted metadata block): ❌ Pending - Step 4 (upload): ✅ Existing Guppy functionality ## Decryption Flow @@ -209,23 +243,23 @@ guppy gateway serve --decryption-key /path/to/key.bin ### Option B: Client-Side Decryption (KMS Mode) -For production with access control, decryption happens client-side: +For production with access control, decryption happens client-side via `guppy retrieve`: ``` -1. Fetch encrypted content via gateway - - Public: https://w3s.link/ipfs/ - - Local: guppy gateway serve → http://localhost:3000/ipfs/ +1. guppy retrieve + - Fetches encrypted content via gateway or directly from network │ ▼ -2. Extract file metadata block +2. Extract encrypted metadata block │ ▼ 3. Extract wrapped DEK from metadata │ ▼ 4. Unwrap DEK via KMS (space/encryption/key/decrypt) + → Client sends wrapped DEK to KMS (KMS does NOT fetch content) → Provide UCAN proof with space/content/decrypt delegation - → KMS validates nb.prefix against file path + → KMS validates nb.resource matches the metadata CID │ ▼ 5. For each encrypted block: @@ -244,36 +278,47 @@ For production with access control, decryption happens client-side: ## Folder-Level Access Control -Access control is enforced via the `nb.prefix` caveat on `space/content/decrypt` delegations: +The existing `space/content/decrypt` capability uses `nb.resource` (CID-based). 
We propose enhancing it with an optional `nb.path` field for path-based access control: ```typescript -// Grant access to all files under /backups/server1/ space/content/decrypt with: did:key:zSpace - nb: { prefix: "/backups/server1/" } + nb: { + resource: CID, // Required: CID of the encrypted metadata block + path: "/backups/" // Optional: directory path (must end with /) + } audience: did:key:zRecipient ``` **Validation flow** -1. User requests decryption with delegation containing `nb.prefix` -2. KMS extracts `path` from encrypted metadata block -3. KMS validates: `path.startsWith(delegation.nb.prefix)` -4. If valid → unwrap DEK; if invalid → reject +1. User requests decryption with `nb.resource` (the metadata CID) +2. KMS validates `nb.resource`: + - `invocation.nb.resource === delegation.nb.resource` +3. KMS unwraps the encrypted blob to get `{ path, dek }` + - The `path` is cryptographically bound to the DEK at encryption time + - KMS doesn't need to fetch content — path is embedded in the wrapped blob +4. If `nb.path` present in delegation, KMS validates: + - Check: `path.startsWith(delegation.nb.path)` + - `nb.path` MUST end with `/` to ensure directory matching (e.g., `/priv/` not `/priv`) +5. 
If all validations pass → return DEK; otherwise → reject **Access control examples** -| Delegation `nb.prefix` | File `path` | Access | -|------------------------|-------------|--------| -| `/backups/` | `/backups/server1/backup.tar` | ✅ Allowed | -| `/backups/server1/` | `/backups/server1/backup.tar` | ✅ Allowed | -| `/backups/server2/` | `/backups/server1/backup.tar` | ❌ Denied | -| (none) | `/backups/server1/backup.tar` | ✅ Space-level access | +| `nb.resource` | `nb.path` | File `path` | Access | +|---------------|-----------|-------------|--------| +| `bafy...abc` | (none) | `/backups/db.tar` | ✅ CID-only access | +| `bafy...abc` | `/backups/` | `/backups/db.tar` | ✅ Path under directory | +| `bafy...abc` | `/priv/` | `/priv/secret.txt` | ✅ Path under directory | +| `bafy...abc` | `/priv/` | `/priv.txt` | ❌ Not under /priv/ directory | +| `bafy...abc` | `/logs/` | `/backups/db.tar` | ❌ Path mismatch | +| `bafy...abc` | `/backups/` | (none) | ❌ No path in file metadata | **Backward compatibility** -- Delegations without `nb.prefix` grant space-level access (all files) -- Files uploaded without `path` field are treated as root (`/`) and accessible with any space-level delegation +If a delegation specifies `nb.path`, the file's metadata MUST contain a `path` field to validate against. Old files without `path` in metadata cannot be accessed using path-scoped delegations, so use a CID-only delegation instead (like we already do today). + +**Design decision:** `nb.path` is a single string, not an array. To grant access to multiple paths, create separate delegations. This enables granular revocation. Revoking access to one path doesn't affect others. ## Capabilities Needed @@ -342,12 +387,18 @@ guppy encryption rotate-kek --space **Process:** 1. Generate new KEK in KMS -2. For each encrypted file in space: +2. 
List encrypted files using `pail.Entries()` (see [go-pail](https://github.com/storacha/go-pail)) + - Each encrypted file has one Pail entry: `path → metadataCID` + - Since encryption is file-level only, each entry corresponds to one encrypted file +3. For each encrypted file in space: - Unwrap DEK with old KEK - Re-wrap DEK with new KEK - - Create new metadata block (new CID) - - Update catalog entry to point to new metadata CID -3. Content blocks remain unchanged + - Create new encrypted metadata block (new CID) + - Update Pail entry to point to new metadata CID +4. Delete old KEK from KMS +5. Content blocks remain unchanged + +**Security note:** Old encrypted metadata blocks remain on the network (IPFS is immutable), but the old wrapped DEKs inside them are useless. The old KEK is deleted from KMS, so they cannot be unwrapped. **Use case:** Regular security hygiene, suspected KEK compromise @@ -363,7 +414,7 @@ guppy encryption rotate-dek --file 1. Download and decrypt file with old DEK 2. Generate new DEK 3. Re-encrypt all blocks with new DEK + new IVs -4. Create new metadata block with new wrapped DEK +4. Create new encrypted metadata block with new wrapped DEK 5. Upload encrypted blocks + new metadata 6. 
Update catalog entry to point to new root CID From ade8ef0f68dcf884042427ec75bce86cbb219457 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Thu, 19 Mar 2026 09:58:24 -0300 Subject: [PATCH 07/10] minor update in encryption rfc --- rfc/forge-encryption.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md index c0ad9ad..86995ff 100644 --- a/rfc/forge-encryption.md +++ b/rfc/forge-encryption.md @@ -89,7 +89,7 @@ Block-level encryption combines file-scoped DEKs with block-scoped IVs: - **1 DEK per file**: Each file gets a unique 256-bit AES key, generated randomly - **Random IV per block**: Each block within the file gets a unique 16-byte IV -- **AES-256-CTR**: Counter mode encryption with unique keystream per block +- **AES-256-CTR**: Counter mode encryption with unique keystream per block. AES-256-CTR does not provide authenticated encryption on its own, but that is acceptable here because blocks are content-addressed. Any tampering with the encrypted data changes the block's CID, which propagates up and changes the root CID, so it won't go unnoticed. - **Incremental uploads**: Only changed blocks need re-encryption ### DEK Lifecycle (Per File) From 3884eb71bf9785f7d7f7da5da6d8e9cfaa64ac44 Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Thu, 19 Mar 2026 11:38:39 -0300 Subject: [PATCH 08/10] using a new command to handle mutability & privacy in guppy --- rfc/forge-encryption.md | 24 +++- rfc/forge-mutability.md | 292 +++++++++++++++++++++++++++++----------- 2 files changed, 228 insertions(+), 88 deletions(-) diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md index 86995ff..0620ac2 100644 --- a/rfc/forge-encryption.md +++ b/rfc/forge-encryption.md @@ -181,8 +181,10 @@ Go's standard library provides native support via: ## Encryption Flow +For encrypted buckets, the flow is triggered by `guppy bucket put` (see [forge-mutability.md](./forge-mutability.md#guppy-bucket-put)): + ``` -1. guppy upload +1. 
guppy bucket put [] │ ▼ 2. Get space public key from KMS (space/encryption/setup) @@ -243,10 +245,10 @@ guppy gateway serve --decryption-key /path/to/key.bin ### Option B: Client-Side Decryption (KMS Mode) -For production with access control, decryption happens client-side via `guppy retrieve`: +For production with access control, decryption happens client-side via `guppy bucket get` (see [forge-mutability.md](./forge-mutability.md#guppy-bucket-get)): ``` -1. guppy retrieve +1. guppy bucket get / --delegation - Fetches encrypted content via gateway or directly from network │ ▼ @@ -334,8 +336,14 @@ Guppy SHOULD support two key management modes. ### Local Key Mode (Development/Testing) -For development and testing, Guppy MAY use a locally-provided key: +For development and testing, Guppy MAY use a locally-provided key. This can be configured either: +**Via `bucket create` flag:** +```bash +guppy bucket create --local-key ./dev-key.bin +``` + +**Or via config.yaml:** ```yaml # ~/.storacha/guppy/config.yaml encryption: @@ -346,6 +354,8 @@ encryption: This mode does NOT provide access control — anyone with the key can decrypt. +See [guppy#376](https://github.com/storacha/guppy/pull/376) for the POC implementation. 
+ ### KMS Mode (Production/Enterprise) For production, Guppy SHOULD use the Storacha KMS: @@ -372,8 +382,8 @@ KMS mode enables: Key rotation requires the **Mutability** feature (see [forge-mutability.md](./forge-mutability.md)) because: - Rotation creates new metadata CIDs (wrapped DEK changes) -- The catalog/UCN must be updated to point to the new metadata CID -- Without mutability, clients cannot discover the rotated metadata +- Pail entries must be updated to point to the new metadata CID +- UCN publishes the updated Pail head so clients can discover the rotated metadata Guppy SHOULD support two types of key rotation via CLI commands (KMS mode only): @@ -466,4 +476,4 @@ guppy delegation revoke - [Block Encryption POC](https://github.com/storacha/guppy/pull/376) - [Storacha Encrypt Upload Client](https://github.com/storacha/upload-service/tree/main/packages/encrypt-upload-client) -- [AWS S3 Client-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) +- [AWS S3 Client-Side Encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) \ No newline at end of file diff --git a/rfc/forge-mutability.md b/rfc/forge-mutability.md index 2e9e257..6c9d702 100644 --- a/rfc/forge-mutability.md +++ b/rfc/forge-mutability.md @@ -76,7 +76,173 @@ graph LR |-----------|--------| | UCN wrapper (`clock/head`, `clock/advance`) | Needs Go impl (~few hundred lines) | | Pail integration | Library exists: `github.com/storacha/go-pail` | -| Guppy commands | Update `upload`, `ls`, `retrieve` | +| Guppy commands | New `bucket` subcommand (`create`, `put`, `get`, `ls`) | + +## CLI Command Reference + +The `bucket` subcommand provides a **mutable storage interface** using Pail + UCN. It tracks `path → CID` mappings and enables path-based content resolution. Encryption is **optional** and controlled via the `--encrypt` flag. 
+
+**Reference:** [w3cli-plugin-bucket](https://github.com/alanshaw/w3cli-plugin-bucket) - Alan's CLI extension that inspired this design.
+
+### `guppy bucket create`
+
+Create a bucket by linking a local folder to a space.
+
+```
+guppy bucket create [--encrypt] [--local-key ]
+```
+
+**Arguments:**
+| Argument | Description |
+|----------|-------------|
+| `` | Space DID (e.g., `did:key:z6Mk...`) |
+| `` | Bucket name in Pail (e.g., `backups`, `db-snapshots`) |
+| `` | Local filesystem folder to track |
+
+**Flags:**
+| Flag | Description |
+|------|-------------|
+| `--encrypt` | Enable encryption using KMS (requires KMS config in `config.yaml`) |
+| `--local-key ` | Enable encryption using a standalone key file (dev/testing, no access control) |
+
+**Example:**
+```bash
+# Create plaintext bucket (mutability only)
+guppy bucket create did:key:z6MkExample backups /Users/alice/my-backups
+
+# Create encrypted bucket with KMS (production)
+guppy bucket create did:key:z6MkExample secrets /Users/alice/secrets --encrypt
+
+# Create encrypted bucket with local key (dev/testing)
+guppy bucket create did:key:z6MkExample dev-secrets /Users/alice/test-data --local-key ./dev-key.bin
+```
+
+**Behavior:**
+1. Links bucket `` to `` in local config
+2. If `--encrypt`: reads KMS config from `~/.storacha/guppy/config.yaml` and calls `space/encryption/setup` to get/create KEK (see [forge-encryption.md](./forge-encryption.md#key-management))
+3. If `--local-key`: uses the provided key file directly (no KMS, no access control - see [guppy#376](https://github.com/storacha/guppy/pull/376))
+4. Stores bucket settings locally (including encryption mode)
+
+### `guppy bucket put`
+
+Upload files to the bucket. If the bucket has encryption enabled, each file is encrypted individually.
+
+```
+guppy bucket put []
+```
+
+**Arguments:**
+| Argument | Description |
+|----------|-------------|
+| `` | Space DID |
+| `` | Bucket name (from `bucket create`) |
+| `` | Optional.
Specific file to upload. If omitted, uploads all files in bucket folder. | + +**Example:** +```bash +# Upload a single file +guppy bucket put did:key:z6MkExample backups ./db.tar + +# Upload all files in bucket folder +guppy bucket put did:key:z6MkExample backups +``` + +**Behavior (per file):** + +*Plaintext bucket:* +1. Build UnixFS DAG from file blocks +2. Upload via `space/blob/add` +3. Store `/` → `rootCID` in Pail +4. Publish updated Pail head via UCN (`clock/advance`) + +*Encrypted bucket:* +1. Generate random DEK (256-bit AES key) +2. For each block: generate IV, encrypt with DEK+IV (AES-256-CTR) +3. Build UnixFS DAG from encrypted blocks +4. Wrap DEK with KEK (RSA-OAEP), include file path in wrapped blob +5. Create encrypted metadata block → `metadataCID` +6. Upload encrypted blocks + metadata via `space/blob/add` +7. Store `/` → `metadataCID` in Pail +8. Publish updated Pail head via UCN (`clock/advance`) + +### `guppy bucket get` + +Retrieve a file from the bucket. For encrypted buckets, decryption requires a delegation. + +``` +guppy bucket get / [--delegation ] +``` + +**Arguments:** +| Argument | Description | +|----------|-------------| +| `` | Space DID | +| `/` | Full path in bucket (e.g., `backups/db.tar`) | +| `` | Local filesystem path to write output | + +**Flags:** +| Flag | Description | +|------|-------------| +| `--delegation ` | Path to delegation file authorizing decryption (required for encrypted buckets) | + +**Example:** +```bash +# Retrieve from plaintext bucket +guppy bucket get did:key:z6MkExample backups/db.tar ./restored-db.tar + +# Retrieve and decrypt from encrypted bucket +guppy bucket get did:key:z6MkExample secrets/config.tar ./config.tar --delegation ./my-delegation.ucan +``` + +**Behavior:** + +*Plaintext bucket:* +1. Resolve `/` via UCN + Pail → get `rootCID` +2. Fetch content blocks +3. Write file to `` + +*Encrypted bucket:* +1. Resolve `/` via UCN + Pail → get `metadataCID` +2. Fetch encrypted metadata block +3. 
Send delegation + `metadataCID` to KMS (`space/encryption/key/decrypt`)
+4. KMS validates delegation, unwraps DEK
+5. Fetch encrypted blocks, decrypt with DEK+IV
+6. Write decrypted file to ``
+
+### `guppy bucket ls`
+
+List files in the bucket.
+
+```
+guppy bucket ls
+```
+
+**Arguments:**
+| Argument | Description |
+|----------|-------------|
+| `` | Space DID |
+| `` | Bucket name |
+
+**Example:**
+```bash
+guppy bucket ls did:key:z6MkExample backups
+```
+
+**Output:**
+```
+FILE                 CID           UPDATED
+backups/db.tar       bafyMeta1...  2026-03-19T10:00:00Z
+backups/config.json  bafyMeta2...  2026-03-19T09:30:00Z
+```
+
+## Multi-Writer Behavior
+
+When multiple Guppy clients write to the same space concurrently:
+
+- **Source-level granularity**: Pail tracks `sourcePath -> CID` mappings. Concurrent updates to *different* source paths merge cleanly via CRDT.
+- **Same source path**: If two clients update the same source path simultaneously, **last-writer-wins** applies to the CID value. The directory structure within each UnixFS root is not merged.
+
+This is acceptable for Forge's backup use case where each client typically owns distinct source paths.
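
The two multi-writer cases above can be illustrated with a toy model in Go. This is only a sketch of the stated semantics, not go-pail's API or its Merkle-clock implementation: the `entry` type, its `Seq` logical timestamp, the CID tie-break, and the `merge` function are all assumptions introduced here for illustration.

```go
package main

import "fmt"

// entry is a hypothetical record: the CID stored for a source path plus a
// logical timestamp used to decide which concurrent write is "last".
type entry struct {
	CID string
	Seq uint64
}

// merge combines two replicas of a sourcePath -> CID index.
// Different paths are unioned; for the same path the higher Seq wins,
// with the CID string as a deterministic tie-break so that
// merge(a, b) and merge(b, a) agree.
func merge(a, b map[string]entry) map[string]entry {
	out := make(map[string]entry, len(a)+len(b))
	for path, e := range a {
		out[path] = e
	}
	for path, e := range b {
		cur, ok := out[path]
		if !ok || e.Seq > cur.Seq || (e.Seq == cur.Seq && e.CID > cur.CID) {
			out[path] = e
		}
	}
	return out
}

func main() {
	clientA := map[string]entry{
		"/data/db":   {CID: "bafyA1", Seq: 1},
		"/data/logs": {CID: "bafyA2", Seq: 2},
	}
	clientB := map[string]entry{
		"/data/db":     {CID: "bafyB1", Seq: 3}, // same path, later write
		"/data/photos": {CID: "bafyB2", Seq: 1}, // different path
	}
	merged := merge(clientA, clientB)
	fmt.Println("paths:", len(merged))
	fmt.Println("/data/db ->", merged["/data/db"].CID)
}
```

Only the CID value for a contested path is replaced; nothing inside the two UnixFS roots is combined, which matches the note that directory structure is not merged.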
## Integration with Guppy Flows @@ -94,71 +260,47 @@ Guppy's `ExecuteUpload` runs a pipeline of workers: Returns: `rootCID` (the content root) -### Upload with Mutability (Proposed) +### Bucket Put Flow ```mermaid sequenceDiagram participant User participant Guppy - participant Workers - participant Storacha - participant Pail - participant UCN - - User->>Guppy: guppy upload - Guppy->>Workers: ExecuteUpload(uploadID, spaceDID) - Workers->>Workers: Scan FS entries - Workers->>Workers: Create DAG nodes - Workers->>Workers: Pack into CAR shards - Workers->>Storacha: space/blob/add (shards) - Workers->>Storacha: space/blob/add (indexes) - Workers->>Storacha: upload/add - Workers-->>Guppy: rootCID - - Note over Guppy,UCN: NEW: Mutability integration - Guppy->>Pail: crdt.Put(sourcePath, rootCID) - Pail-->>Guppy: eventCID - Guppy->>UCN: name.Publish(eventCID) - UCN-->>Guppy: OK - Guppy-->>User: Uploaded (rootCID) -``` - -### Encrypted Upload with Mutability (Proposed) - -```mermaid -sequenceDiagram - participant User - participant Guppy - participant Workers - participant Storacha participant KMS + participant Storacha participant Pail participant UCN - User->>Guppy: guppy upload --encrypt - Guppy->>Guppy: Generate DEK - Guppy->>Workers: ExecuteUpload with encryption - Workers->>Workers: Encrypt blocks with DEK - Workers->>Storacha: space/blob/add (encrypted shards) - Workers->>Storacha: space/blob/add (indexes) - Workers->>Storacha: upload/add - Workers-->>Guppy: encryptedRootCID + User->>Guppy: guppy bucket put [] - Guppy->>KMS: Wrap DEK with KEK - KMS-->>Guppy: wrappedDEK - Guppy->>Guppy: Create metadata block (wrappedDEK + encryptedRootCID) - Guppy->>Storacha: space/blob/add (metadata block) - Storacha-->>Guppy: metadataCID + loop For each file in bucket + alt Encrypted bucket + Guppy->>Guppy: Generate random DEK (256-bit) + Guppy->>Guppy: For each block: generate IV, encrypt with DEK+IV + Guppy->>Guppy: Build UnixFS DAG from encrypted blocks + Guppy->>Storacha: 
space/blob/add (encrypted shards) + + Guppy->>KMS: Wrap DEK with KEK (include file path) + KMS-->>Guppy: wrappedDEK + Guppy->>Guppy: Create metadata block + Guppy->>Storacha: space/blob/add (metadata block) + Storacha-->>Guppy: metadataCID + Guppy->>Pail: crdt.Put(name/filename, metadataCID) + else Plaintext bucket + Guppy->>Guppy: Build UnixFS DAG from blocks + Guppy->>Storacha: space/blob/add (shards) + Storacha-->>Guppy: rootCID + Guppy->>Pail: crdt.Put(name/filename, rootCID) + end + Pail-->>Guppy: eventCID + end - Note over Guppy,UCN: Mutability stores metadataCID - Guppy->>Pail: crdt.Put(sourcePath, metadataCID) - Pail-->>Guppy: eventCID Guppy->>UCN: name.Publish(eventCID) UCN-->>Guppy: OK - Guppy-->>User: Uploaded (encrypted) + Guppy-->>User: Uploaded ``` -### Retrieve with Mutability - Unified Flow (Proposed) +### Bucket Get Flow ```mermaid sequenceDiagram @@ -166,44 +308,31 @@ sequenceDiagram participant Guppy participant UCN participant Pail - participant Locator participant Storage participant KMS - User->>Guppy: guppy retrieve - - alt Path is CID (e.g. bafy...) - Guppy->>Guppy: Use CID directly as rootCID - else Path is file path (e.g. 
/backups/db.tar) - Note over Guppy,Pail: NEW: Resolve path via UCN + Pail - Guppy->>UCN: name.Resolve(spaceDID) - UCN-->>Guppy: head events - Guppy->>Storage: Fetch Pail blocks for head - Storage-->>Guppy: Pail blocks - Guppy->>Pail: crdt.Root(head, blocks) - Pail-->>Guppy: pailRoot - Guppy->>Pail: pail.Get(pailRoot, path) - Pail-->>Guppy: rootCID - end + User->>Guppy: guppy bucket get / [--delegation] - Guppy->>Locator: Query indexer for rootCID - Locator-->>Guppy: provider locations - Guppy->>Storage: Fetch root block - Storage-->>Guppy: block data + Note over Guppy,Pail: Resolve path via UCN + Pail + Guppy->>UCN: name.Resolve(spaceDID) + UCN-->>Guppy: head events + Guppy->>Storage: Fetch Pail blocks for head + Storage-->>Guppy: Pail blocks + Guppy->>Pail: crdt.Root(head, blocks) + Pail-->>Guppy: pailRoot + Guppy->>Pail: pail.Get(pailRoot, name/file-path) + Pail-->>Guppy: CID (rootCID or metadataCID) - alt Block is EncryptedMetadata format - Note over Guppy,KMS: Encrypted content detected - Guppy->>Guppy: Extract wrappedDEK + encryptedRootCID - Guppy->>KMS: Unwrap DEK with KEK + alt Encrypted bucket (--delegation provided) + Guppy->>Storage: Fetch metadata block + Storage-->>Guppy: EncryptedMetadata (wrappedDEK + encryptedRootCID) + Guppy->>KMS: space/encryption/key/decrypt (delegation + metadataCID) KMS-->>Guppy: DEK - Guppy->>Locator: Query indexer for encryptedRootCID - Locator-->>Guppy: provider locations Guppy->>Storage: Fetch encrypted blocks Storage-->>Guppy: encrypted blocks - Guppy->>Guppy: Decrypt with DEK - else Block is plaintext UnixFS - Note over Guppy,Storage: Plaintext content - Guppy->>Storage: Fetch remaining blocks + Guppy->>Guppy: Decrypt with DEK+IV + else Plaintext bucket + Guppy->>Storage: Fetch content blocks (rootCID) Storage-->>Guppy: file blocks end @@ -216,3 +345,4 @@ sequenceDiagram - [Storacha UCN Package](https://github.com/storacha/upload-service/tree/main/packages/ucn) - [Storacha Pail Package](https://github.com/storacha/pail) - 
[UCAN Specification](https://github.com/ucan-wg/spec)
+- [w3cli-plugin-bucket](https://github.com/alanshaw/w3cli-plugin-bucket/blob/main/index.js)
\ No newline at end of file

From ca8049facde0066ed88d97b77ec99a41e11d2127 Mon Sep 17 00:00:00 2001
From: Felipe Forbeck
Date: Thu, 19 Mar 2026 13:37:31 -0300
Subject: [PATCH 09/10] remove bucket cmd

---
 rfc/forge-encryption.md |  12 +--
 rfc/forge-mutability.md | 197 ++++++++++++++++++----------------------
 2 files changed, 96 insertions(+), 113 deletions(-)

diff --git a/rfc/forge-encryption.md b/rfc/forge-encryption.md
index 0620ac2..7ddeb45 100644
--- a/rfc/forge-encryption.md
+++ b/rfc/forge-encryption.md
@@ -181,10 +181,10 @@ Go's standard library provides native support via:

 ## Encryption Flow

-For encrypted buckets, the flow is triggered by `guppy bucket put` (see [forge-mutability.md](./forge-mutability.md#guppy-bucket-put)):
+For encrypted sources, the flow is triggered by `guppy upload` after a source was registered with `--encrypt` (see [forge-mutability.md](./forge-mutability.md#guppy-upload-extended)):

 ```
-1. guppy bucket put []
+1. guppy upload [source-name...]
    │
    ▼
 2. Get space public key from KMS (space/encryption/setup)
@@ -245,10 +245,10 @@ guppy gateway serve --decryption-key /path/to/key.bin

 ### Option B: Client-Side Decryption (KMS Mode)

-For production with access control, decryption happens client-side via `guppy bucket get` (see [forge-mutability.md](./forge-mutability.md#guppy-bucket-get)):
+For production with access control, decryption happens client-side via `guppy retrieve` (see [forge-mutability.md](./forge-mutability.md#guppy-retrieve-extended)):

 ```
-1. guppy bucket get / --delegation
+1. guppy retrieve --delegation
    - Fetches encrypted content via gateway or directly from network
    │
    ▼
@@ -338,9 +338,9 @@ Guppy SHOULD support two key management modes.

 For development and testing, Guppy MAY use a locally-provided key.
This can be configured either: -**Via `bucket create` flag:** +**Via `upload source add` flag:** ```bash -guppy bucket create --local-key ./dev-key.bin +guppy upload source add --name --local-key ./dev-key.bin ``` **Or via config.yaml:** diff --git a/rfc/forge-mutability.md b/rfc/forge-mutability.md index 6c9d702..23999a0 100644 --- a/rfc/forge-mutability.md +++ b/rfc/forge-mutability.md @@ -76,163 +76,143 @@ graph LR |-----------|--------| | UCN wrapper (`clock/head`, `clock/advance`) | Needs Go impl (~few hundred lines) | | Pail integration | Library exists: `github.com/storacha/go-pail` | -| Guppy commands | New `bucket` subcommand (`create`, `put`, `get`, `ls`) | +| Guppy changes | Extend existing `upload source add`, `upload`, `retrieve` commands | ## CLI Command Reference -The `bucket` subcommand provides a **mutable storage interface** using Pail + UCN. It tracks `path → CID` mappings and enables path-based content resolution. Encryption is **optional** and controlled via the `--encrypt` flag. +Mutability and encryption are integrated into existing Guppy commands with minimal changes. This approach leverages the existing `upload source add` → `upload` → `retrieve` flow. **Reference:** [w3cli-plugin-bucket](https://github.com/alanshaw/w3cli-plugin-bucket) - Alan's CLI extension that inspired this design. -### `guppy bucket create` +### `guppy upload source add` (Extended) -Create a bucket by linking a local folder to a space. +Register a source and initialize a Pail bucket. The `--name` flag becomes the bucket name in Pail. 
``` -guppy bucket create [--encrypt] [--local-key ] +guppy upload source add [--name ] [--encrypt] [--local-key ] ``` **Arguments:** | Argument | Description | |----------|-------------| | `` | Space DID (e.g., `did:key:z6Mk...`) | -| `` | Bucket name in Pail (e.g., `backups`, `db-snapshots`) | -| `` | Local filesystem folder to track | +| `` | Local filesystem folder to track | **Flags:** | Flag | Description | |------|-------------| +| `--name ` | Bucket name in Pail (defaults to folder basename) | | `--encrypt` | Enable encryption using KMS (requires KMS config in `config.yaml`) | | `--local-key ` | Enable encryption using a standalone key file (dev/testing, no access control) | **Example:** ```bash -# Create plaintext bucket (mutability only) -guppy bucket create did:key:z6MkExample backups /Users/alice/my-backups +# Register source with mutability (plaintext) +guppy upload source add did:key:z6MkExample /Users/alice/my-backups --name backups -# Create encrypted bucket with KMS (production) -guppy bucket create did:key:z6MkExample secrets /Users/alice/secrets --encrypt +# Register source with mutability + encryption (KMS) +guppy upload source add did:key:z6MkExample /Users/alice/secrets --name secrets --encrypt -# Create encrypted bucket with local key (dev/testing) -guppy bucket create did:key:z6MkExample dev-secrets /Users/alice/test-data --local-key ./dev-key.bin +# Register source with mutability + encryption (local key, dev/testing) +guppy upload source add did:key:z6MkExample /Users/alice/test-data --name dev-secrets --local-key ./dev-key.bin ``` **Behavior:** -1. Links bucket `` to `` in local config -2. If `--encrypt`: reads KMS config from `~/.storacha/guppy/config.yaml` and calls `space/encryption/setup` to get/create KEK (see [forge-encryption.md](./forge-encryption.md#key-management)) -3. If `--local-key`: uses the provided key file directly (no KMS, no access control - see [guppy#376](https://github.com/storacha/guppy/pull/376)) -4. 
Stores bucket settings locally (including encryption mode) +1. Register source locally (existing behavior) +2. Initialize Pail bucket using `--name` (or folder basename) +3. If `--encrypt`: read KMS config from `~/.storacha/guppy/config.yaml` and call `space/encryption/setup` to get/create KEK (see [forge-encryption.md](./forge-encryption.md#key-management)) +4. If `--local-key`: use the provided key file directly (no KMS, no access control - see [guppy#376](https://github.com/storacha/guppy/pull/376)) +5. Store source settings locally (including encryption mode) -### `guppy bucket put` +### `guppy upload` (Extended) -Upload files to the bucket. If the bucket has encryption enabled, each file is encrypted individually. +Upload sources and store path→CID mappings in Pail. ``` -guppy bucket put [] +guppy upload [source-path-or-name...] ``` -**Arguments:** -| Argument | Description | -|----------|-------------| -| `` | Space DID | -| `` | Bucket name (from `bucket create`) | -| `` | Optional. Specific file to upload. If omitted, uploads all files in bucket folder. | +**Existing behavior preserved.** After upload completes, adds: -**Example:** -```bash -# Upload a single file -guppy bucket put did:key:z6MkExample backups ./db.tar - -# Upload all files in bucket folder -guppy bucket put did:key:z6MkExample backups -``` +**Post-upload behavior (per source):** -**Behavior (per file):** +*Plaintext source:* +1. After `ExecuteUpload()` returns `rootCID` +2. Store `` → `rootCID` in Pail +3. Publish updated Pail head via UCN (`clock/advance`) -*Plaintext bucket:* -1. Build UnixFS DAG from file blocks -2. Upload via `space/blob/add` -3. Store `/` → `rootCID` in Pail +*Encrypted source:* +1. During upload: generate DEK, encrypt blocks, wrap DEK with KEK +2. Create encrypted metadata block → `metadataCID` +3. Store `` → `metadataCID` in Pail 4. Publish updated Pail head via UCN (`clock/advance`) -*Encrypted bucket:* -1. Generate random DEK (256-bit AES key) -2. 
For each block: generate IV, encrypt with DEK+IV (AES-256-CTR) -3. Build UnixFS DAG from encrypted blocks -4. Wrap DEK with KEK (RSA-OAEP), include file path in wrapped blob -5. Create encrypted metadata block → `metadataCID` -6. Upload encrypted blocks + metadata via `space/blob/add` -7. Store `/` → `metadataCID` in Pail -8. Publish updated Pail head via UCN (`clock/advance`) +### `guppy retrieve` (Extended) -### `guppy bucket get` - -Retrieve a file from the bucket. For encrypted buckets, decryption requires a delegation. +Retrieve content by CID or by path (via Pail resolution). ``` -guppy bucket get / [--delegation ] +guppy retrieve [--delegation ] ``` **Arguments:** | Argument | Description | |----------|-------------| | `` | Space DID | -| `/` | Full path in bucket (e.g., `backups/db.tar`) | -| `` | Local filesystem path to write output | +| `` | CID or path (e.g., `backups` or `bafyRootCID`) | +| `` | Local filesystem path to write output | **Flags:** | Flag | Description | |------|-------------| -| `--delegation ` | Path to delegation file authorizing decryption (required for encrypted buckets) | +| `--delegation ` | Path to delegation file authorizing decryption (required for encrypted content) | **Example:** ```bash -# Retrieve from plaintext bucket -guppy bucket get did:key:z6MkExample backups/db.tar ./restored-db.tar +# Retrieve by CID (existing behavior) +guppy retrieve did:key:z6MkExample bafyRootCID ./output + +# Retrieve by path (new - resolves via Pail) +guppy retrieve did:key:z6MkExample backups ./restored-backups -# Retrieve and decrypt from encrypted bucket -guppy bucket get did:key:z6MkExample secrets/config.tar ./config.tar --delegation ./my-delegation.ucan +# Retrieve and decrypt encrypted content +guppy retrieve did:key:z6MkExample secrets ./decrypted-secrets --delegation ./my-delegation.ucan ``` **Behavior:** -*Plaintext bucket:* -1. Resolve `/` via UCN + Pail → get `rootCID` -2. Fetch content blocks -3. 
Write file to `` +*If `` is a CID:* +1. Existing behavior (fetch by CID) -*Encrypted bucket:* -1. Resolve `/` via UCN + Pail → get `metadataCID` -2. Fetch encrypted metadata block -3. Send delegation + `metadataCID` to KMS (`space/encryption/key/decrypt`) -4. KMS validates delegation, unwraps DEK -5. Fetch encrypted blocks, decrypt with DEK+IV -6. Write decrypted file to `` +*If `` is a path:* +1. Resolve via UCN + Pail → get CID (rootCID or metadataCID) +2. If `--delegation` provided (encrypted content): + - Fetch encrypted metadata block + - Send delegation + metadataCID to KMS (`space/encryption/key/decrypt`) + - KMS validates delegation, unwraps DEK + - Fetch encrypted blocks, decrypt with DEK+IV +3. Else (plaintext): + - Fetch content blocks directly +4. Write to `` -### `guppy bucket ls` +### `guppy upload source ls` (New) -List files in the bucket. +List sources and their current CIDs from Pail. ``` -guppy bucket ls +guppy upload source ls ``` -**Arguments:** -| Argument | Description | -|----------|-------------| -| `` | Space DID | -| `` | Bucket name | - **Example:** ```bash -guppy bucket ls did:key:z6MkExample backups +guppy upload source ls did:key:z6MkExample ``` **Output:** ``` -FILE CID UPDATED -backups/db.tar bafyMeta1... 2026-03-19T10:00:00Z -backups/config.json bafyMeta2... 2026-03-19T09:30:00Z +NAME CID ENCRYPTED +backups bafyRoot1... no +secrets bafyMeta2... yes ``` ## Multi-Writer Behavior @@ -260,7 +240,7 @@ Guppy's `ExecuteUpload` runs a pipeline of workers: Returns: `rootCID` (the content root) -### Bucket Put Flow +### Upload Flow (with Mutability + Optional Encryption) ```mermaid sequenceDiagram @@ -271,36 +251,35 @@ sequenceDiagram participant Pail participant UCN - User->>Guppy: guppy bucket put [] + User->>Guppy: guppy upload [source-name...] 
- loop For each file in bucket - alt Encrypted bucket + loop For each source + alt Encrypted source (--encrypt was set on source add) Guppy->>Guppy: Generate random DEK (256-bit) Guppy->>Guppy: For each block: generate IV, encrypt with DEK+IV Guppy->>Guppy: Build UnixFS DAG from encrypted blocks Guppy->>Storacha: space/blob/add (encrypted shards) - Guppy->>KMS: Wrap DEK with KEK (include file path) + Guppy->>KMS: Wrap DEK with KEK KMS-->>Guppy: wrappedDEK Guppy->>Guppy: Create metadata block Guppy->>Storacha: space/blob/add (metadata block) Storacha-->>Guppy: metadataCID - Guppy->>Pail: crdt.Put(name/filename, metadataCID) - else Plaintext bucket - Guppy->>Guppy: Build UnixFS DAG from blocks - Guppy->>Storacha: space/blob/add (shards) + Guppy->>Pail: crdt.Put(source-name, metadataCID) + else Plaintext source + Guppy->>Guppy: ExecuteUpload (existing pipeline) Storacha-->>Guppy: rootCID - Guppy->>Pail: crdt.Put(name/filename, rootCID) + Guppy->>Pail: crdt.Put(source-name, rootCID) end Pail-->>Guppy: eventCID end - Guppy->>UCN: name.Publish(eventCID) + Guppy->>UCN: clock/advance(eventCID) UCN-->>Guppy: OK Guppy-->>User: Uploaded ``` -### Bucket Get Flow +### Retrieve Flow (with Path Resolution + Optional Decryption) ```mermaid sequenceDiagram @@ -311,19 +290,23 @@ sequenceDiagram participant Storage participant KMS - User->>Guppy: guppy bucket get / [--delegation] + User->>Guppy: guppy retrieve [--delegation] - Note over Guppy,Pail: Resolve path via UCN + Pail - Guppy->>UCN: name.Resolve(spaceDID) - UCN-->>Guppy: head events - Guppy->>Storage: Fetch Pail blocks for head - Storage-->>Guppy: Pail blocks - Guppy->>Pail: crdt.Root(head, blocks) - Pail-->>Guppy: pailRoot - Guppy->>Pail: pail.Get(pailRoot, name/file-path) - Pail-->>Guppy: CID (rootCID or metadataCID) + alt content-path is a path (not CID) + Note over Guppy,Pail: Resolve path via UCN + Pail + Guppy->>UCN: clock/head(spaceDID) + UCN-->>Guppy: head events + Guppy->>Storage: Fetch Pail blocks for head + 
Storage-->>Guppy: Pail blocks + Guppy->>Pail: crdt.Root(head, blocks) + Pail-->>Guppy: pailRoot + Guppy->>Pail: pail.Get(pailRoot, path) + Pail-->>Guppy: CID (rootCID or metadataCID) + else content-path is a CID + Note over Guppy: Use CID directly + end - alt Encrypted bucket (--delegation provided) + alt --delegation provided (encrypted content) Guppy->>Storage: Fetch metadata block Storage-->>Guppy: EncryptedMetadata (wrappedDEK + encryptedRootCID) Guppy->>KMS: space/encryption/key/decrypt (delegation + metadataCID) @@ -331,7 +314,7 @@ sequenceDiagram Guppy->>Storage: Fetch encrypted blocks Storage-->>Guppy: encrypted blocks Guppy->>Guppy: Decrypt with DEK+IV - else Plaintext bucket + else Plaintext content Guppy->>Storage: Fetch content blocks (rootCID) Storage-->>Guppy: file blocks end From 057dd0c5a4a2e9b85c87803cb404594ff4b7b9de Mon Sep 17 00:00:00 2001 From: Felipe Forbeck Date: Thu, 26 Mar 2026 10:41:06 -0300 Subject: [PATCH 10/10] update mutability RFC to follow the bucket cmd proposal --- rfc/forge-mutability.md | 252 ++++++++++++++++++++++++---------------- 1 file changed, 151 insertions(+), 101 deletions(-) diff --git a/rfc/forge-mutability.md b/rfc/forge-mutability.md index 23999a0..b63745c 100644 --- a/rfc/forge-mutability.md +++ b/rfc/forge-mutability.md @@ -76,91 +76,85 @@ graph LR |-----------|--------| | UCN wrapper (`clock/head`, `clock/advance`) | Needs Go impl (~few hundred lines) | | Pail integration | Library exists: `github.com/storacha/go-pail` | -| Guppy changes | Extend existing `upload source add`, `upload`, `retrieve` commands | +| Guppy changes | New bucket commands: `put`, `get`, `ls`, `rm` | ## CLI Command Reference -Mutability and encryption are integrated into existing Guppy commands with minimal changes. This approach leverages the existing `upload source add` → `upload` → `retrieve` flow. 
+This RFC adopts **bucket semantics** for mutable content, inspired by [Alan Shaw's proposal](https://github.com/storacha/RFC/pull/84#issuecomment-4097051650) and [w3cli-plugin-bucket](https://github.com/alanshaw/w3cli-plugin-bucket).

-**Reference:** [w3cli-plugin-bucket](https://github.com/alanshaw/w3cli-plugin-bucket) - Alan's CLI extension that inspired this design.
+### Design Rationale

-### `guppy upload source add` (Extended)
+The bucket model (`put`/`get`/`ls`/`rm`) is preferred over extending `upload`/`retrieve` because:

-Register a source and initialize a Pail bucket. The `--name` flag becomes the bucket name in Pail.
+1. **Clear semantics**: "put" implies key-value storage with overwrite behavior
+2. **Single command**: `put` combines source registration + upload + Pail update in one action
+3. **Key not name**: The key is an explicit identifier, not a passive attribute
+4. **Familiar pattern**: Matches AWS S3, GCS, and other object storage CLIs
+5. **Mutability first-class**: Every `put` creates/updates a mutable reference
+
+### `guppy put`
+
+Upload content and store a mutable reference (key → CID) in Pail.
``` -guppy upload source add [--name ] [--encrypt] [--local-key ] +guppy put [--encrypt] [--local-key ] ``` **Arguments:** | Argument | Description | |----------|-------------| | `` | Space DID (e.g., `did:key:z6Mk...`) | -| `` | Local filesystem folder to track | +| `` | Mutable reference key (e.g., `backups`, `photos/2026`) | +| `` | Local filesystem path to upload | **Flags:** | Flag | Description | |------|-------------| -| `--name ` | Bucket name in Pail (defaults to folder basename) | | `--encrypt` | Enable encryption using KMS (requires KMS config in `config.yaml`) | -| `--local-key ` | Enable encryption using a standalone key file (dev/testing, no access control) | +| `--local-key ` | Enable encryption using a standalone key file (dev/testing, no access control) | **Example:** ```bash -# Register source with mutability (plaintext) -guppy upload source add did:key:z6MkExample /Users/alice/my-backups --name backups - -# Register source with mutability + encryption (KMS) -guppy upload source add did:key:z6MkExample /Users/alice/secrets --name secrets --encrypt +# Upload and create mutable reference (plaintext) +guppy put did:key:z6MkExample backups /Users/alice/my-backups -# Register source with mutability + encryption (local key, dev/testing) -guppy upload source add did:key:z6MkExample /Users/alice/test-data --name dev-secrets --local-key ./dev-key.bin -``` - -**Behavior:** -1. Register source locally (existing behavior) -2. Initialize Pail bucket using `--name` (or folder basename) -3. If `--encrypt`: read KMS config from `~/.storacha/guppy/config.yaml` and call `space/encryption/setup` to get/create KEK (see [forge-encryption.md](./forge-encryption.md#key-management)) -4. If `--local-key`: use the provided key file directly (no KMS, no access control - see [guppy#376](https://github.com/storacha/guppy/pull/376)) -5. 
Store source settings locally (including encryption mode)
+# Upload with encryption (KMS)
+guppy put did:key:z6MkExample secrets /Users/alice/secrets --encrypt

-### `guppy upload` (Extended)
+# Upload with encryption (local key, dev/testing)
+guppy put did:key:z6MkExample dev-data /Users/alice/test-data --local-key ./dev-key.bin

-Upload sources and store path→CID mappings in Pail.
-
-```
-guppy upload [source-path-or-name...]
+# Hierarchical keys are supported
+guppy put did:key:z6MkExample backups/daily/2026-03-25 /Users/alice/daily-backup
 ```

-**Existing behavior preserved.** After upload completes, adds:
-
-**Post-upload behavior (per source):**
-
-*Plaintext source:*
-1. After `ExecuteUpload()` returns `rootCID`
-2. Store `` → `rootCID` in Pail
-3. Publish updated Pail head via UCN (`clock/advance`)
+**Behavior:**
+1. If `` is not already a registered source, create one automatically
+2. Upload content via existing pipeline → `rootCID`
+3. If `--encrypt`:
+   - Generate DEK, encrypt blocks, wrap DEK with KEK
+   - Create metadata block → `metadataCID`
+   - Store `` → `metadataCID` in Pail
+4. Else (plaintext):
+   - Store `` → `rootCID` in Pail
+5. Publish updated Pail head via UCN (`clock/advance`)

-*Encrypted source:*
-1. During upload: generate DEK, encrypt blocks, wrap DEK with KEK
-2. Create encrypted metadata block → `metadataCID`
-3. Store `` → `metadataCID` in Pail
-4. Publish updated Pail head via UCN (`clock/advance`)
+**Note:** If the key already exists, its value is **overwritten** with the new CID.

-### `guppy retrieve` (Extended)
+### `guppy get`

-Retrieve content by CID or by path (via Pail resolution).
+Retrieve content by key (resolves via Pail) or by CID.
 ```
-guppy retrieve [--delegation ]
+guppy get [output] [--delegation ]
 ```

 **Arguments:**
 | Argument | Description |
 |----------|-------------|
 | `` | Space DID |
-| `` | CID or path (e.g., `backups` or `bafyRootCID`) |
-| `` | Local filesystem path to write output |
+| `` | Pail key (e.g., `backups`) or CID (e.g., `bafyRootCID`) |
+| `[output]` | Local filesystem path to write output (optional, defaults to current dir) |

 **Flags:**
 | Flag | Description |
@@ -169,60 +163,117 @@ guppy retrieve [--delegation ]

 **Example:**
 ```bash
-# Retrieve by CID (existing behavior)
-guppy retrieve did:key:z6MkExample bafyRootCID ./output
+# Retrieve by key (resolves via Pail)
+guppy get did:key:z6MkExample backups ./restored-backups

-# Retrieve by path (new - resolves via Pail)
-guppy retrieve did:key:z6MkExample backups ./restored-backups
+# Retrieve by CID (direct, no Pail lookup)
+guppy get did:key:z6MkExample bafyRootCID ./output

 # Retrieve and decrypt encrypted content
-guppy retrieve did:key:z6MkExample secrets ./decrypted-secrets --delegation ./my-delegation.ucan
+guppy get did:key:z6MkExample secrets ./decrypted-secrets --delegation ./my-delegation.ucan
 ```

 **Behavior:**

-*If `` is a CID:*
-1. Existing behavior (fetch by CID)
+*If `` is a key:*
+1. Resolve via UCN (`clock/head`) + Pail (`crdt.Get`) → get CID
+2. Fetch and write content

-*If `` is a path:*
-1. Resolve via UCN + Pail → get CID (rootCID or metadataCID)
-2. If `--delegation` provided (encrypted content):
-   - Fetch encrypted metadata block
-   - Send delegation + metadataCID to KMS (`space/encryption/key/decrypt`)
-   - KMS validates delegation, unwraps DEK
-   - Fetch encrypted blocks, decrypt with DEK+IV
-3. Else (plaintext):
-   - Fetch content blocks directly
-4. Write to ``
+*If `` is a CID:*
+1. Fetch content directly by CID

-### `guppy upload source ls` (New)
+*If `--delegation` provided (encrypted content):*
+1. Fetch encrypted metadata block
+2. Send delegation + metadataCID to KMS (`space/encryption/key/decrypt`)
+3. KMS validates delegation, unwraps DEK
+4. Fetch encrypted blocks, decrypt with DEK+IV
+5. Write to output

-List sources and their current CIDs from Pail.
+### `guppy ls`
+
+List all keys and their current CIDs from Pail.

 ```
-guppy upload source ls
+guppy ls [prefix]
 ```

+**Arguments:**
+| Argument | Description |
+|----------|-------------|
+| `` | Space DID |
+| `[prefix]` | Optional key prefix to filter results |
+
 **Example:**
 ```bash
-guppy upload source ls did:key:z6MkExample
+# List all keys
+guppy ls did:key:z6MkExample
+
+# List keys with prefix
+guppy ls did:key:z6MkExample backups/
 ```

 **Output:**
 ```
-NAME         CID              ENCRYPTED
+KEY          CID              ENCRYPTED
 backups      bafyRoot1...     no
-secrets      bafyMeta2...     yes
+backups/daily/2026-03-25  bafyRoot2...  no
+secrets      bafyMeta3...     yes
+```
+
+### `guppy rm`
+
+Remove a key from Pail (does not delete the underlying content from storage).
+
 ```
+guppy rm
+```
+
+**Arguments:**
+| Argument | Description |
+|----------|-------------|
+| `` | Space DID |
+| `` | Key to remove |
+
+**Example:**
+```bash
+guppy rm did:key:z6MkExample backups/daily/2026-03-25
+```
+
+**Behavior:**
+1. Remove `` from Pail (`crdt.Del`)
+2. Publish updated Pail head via UCN (`clock/advance`)
+
+**Note:** This only removes the mutable reference. The content remains in storage and can still be accessed by CID.
+
+### Source Management (Optional)
+
+Sources are created automatically by `guppy put`. For power users who want to manage sources explicitly:
+
+```bash
+# List local sources
+guppy source ls
+
+# Remove a local source (cleanup)
+guppy source rm
+```
+
+### Legacy Commands
+
+The following commands remain available for backward compatibility and direct CID-based operations:
+
+| Command | Use Case |
+|---------|----------|
+| `guppy upload ` | Upload without mutable reference |
+| `guppy retrieve ` | Retrieve by CID directly |

 ## Multi-Writer Behavior

 When multiple Guppy clients write to the same space concurrently:

-- **Source-level granularity**: Pail tracks `sourcePath -> CID` mappings. Concurrent updates to *different* source paths merge cleanly via CRDT.
-- **Same source path**: If two clients update the same source path simultaneously, **last-writer-wins** applies to the CID value. The directory structure within each UnixFS root is not merged.
+- **Key-level granularity**: Pail tracks `key -> CID` mappings. Concurrent updates to *different* keys merge cleanly via CRDT.
+- **Same key**: If two clients update the same key simultaneously, **last-writer-wins** applies to the CID value. The directory structure within each UnixFS root is not merged.

-This is acceptable for Forge's backup use case where each client typically owns distinct source paths.
+This is acceptable for Forge's backup use case where each client typically owns distinct keys.

 ## Integration with Guppy Flows

@@ -240,7 +291,7 @@ Guppy's `ExecuteUpload` runs a pipeline of workers:

 Returns: `rootCID` (the content root)

-### Upload Flow (with Mutability + Optional Encryption)
+### Put Flow (with Optional Encryption)

 ```mermaid
 sequenceDiagram
@@ -251,35 +302,33 @@ sequenceDiagram
     participant Pail
     participant UCN

-    User->>Guppy: guppy upload [source-name...]
+    User->>Guppy: guppy put [--encrypt]

-    loop For each source
-        alt Encrypted source (--encrypt was set on source add)
-            Guppy->>Guppy: Generate random DEK (256-bit)
-            Guppy->>Guppy: For each block: generate IV, encrypt with DEK+IV
-            Guppy->>Guppy: Build UnixFS DAG from encrypted blocks
-            Guppy->>Storacha: space/blob/add (encrypted shards)
-
-            Guppy->>KMS: Wrap DEK with KEK
-            KMS-->>Guppy: wrappedDEK
-            Guppy->>Guppy: Create metadata block
-            Guppy->>Storacha: space/blob/add (metadata block)
-            Storacha-->>Guppy: metadataCID
-            Guppy->>Pail: crdt.Put(source-name, metadataCID)
-        else Plaintext source
-            Guppy->>Guppy: ExecuteUpload (existing pipeline)
-            Storacha-->>Guppy: rootCID
-            Guppy->>Pail: crdt.Put(source-name, rootCID)
-        end
-        Pail-->>Guppy: eventCID
+    alt --encrypt flag provided
+        Guppy->>Guppy: Generate random DEK (256-bit)
+        Guppy->>Guppy: For each block: generate IV, encrypt with DEK+IV
+        Guppy->>Guppy: Build UnixFS DAG from encrypted blocks
+        Guppy->>Storacha: space/blob/add (encrypted shards)
+
+        Guppy->>KMS: Wrap DEK with KEK
+        KMS-->>Guppy: wrappedDEK
+        Guppy->>Guppy: Create metadata block
+        Guppy->>Storacha: space/blob/add (metadata block)
+        Storacha-->>Guppy: metadataCID
+        Guppy->>Pail: crdt.Put(key, metadataCID)
+    else Plaintext
+        Guppy->>Guppy: ExecuteUpload (existing pipeline)
+        Storacha-->>Guppy: rootCID
+        Guppy->>Pail: crdt.Put(key, rootCID)
     end

+    Pail-->>Guppy: eventCID
     Guppy->>UCN: clock/advance(eventCID)
     UCN-->>Guppy: OK
-    Guppy-->>User: Uploaded
+    Guppy-->>User: Put complete: ->
 ```

-### Retrieve Flow (with Path Resolution + Optional Decryption)
+### Get Flow (with Key Resolution + Optional Decryption)

 ```mermaid
 sequenceDiagram
@@ -290,19 +339,19 @@ sequenceDiagram
     participant Storage
     participant KMS

-    User->>Guppy: guppy retrieve [--delegation]
+    User->>Guppy: guppy get [output] [--delegation]

-    alt content-path is a path (not CID)
-        Note over Guppy,Pail: Resolve path via UCN + Pail
+    alt key-or-cid is a key (not CID)
+        Note over Guppy,Pail: Resolve key via UCN + Pail
         Guppy->>UCN: clock/head(spaceDID)
         UCN-->>Guppy: head events
         Guppy->>Storage: Fetch Pail blocks for head
         Storage-->>Guppy: Pail blocks
         Guppy->>Pail: crdt.Root(head, blocks)
         Pail-->>Guppy: pailRoot
-        Guppy->>Pail: pail.Get(pailRoot, path)
+        Guppy->>Pail: crdt.Get(pailRoot, key)
         Pail-->>Guppy: CID (rootCID or metadataCID)
-    else content-path is a CID
+    else key-or-cid is a CID
         Note over Guppy: Use CID directly
     end

@@ -324,6 +373,7 @@ sequenceDiagram

 ## References

+- [Alan Shaw's Bucket Semantics Proposal](https://github.com/storacha/RFC/pull/84#issuecomment-4097051650) - Design rationale for `put`/`get`/`ls`/`rm` commands
 - [Mutability & Privacy in Storacha — Strategy Document](https://www.notion.so/storacha/Mutability-Privacy-in-Storacha-Strategy-Document-3125305b5524807fb4a1ce6a3c9201e8) (internal)
 - [Storacha UCN Package](https://github.com/storacha/upload-service/tree/main/packages/ucn)
 - [Storacha Pail Package](https://github.com/storacha/pail)