rfc: mutability & encryption for forge#84

Open
fforbeck wants to merge 10 commits into main from rfc/privacy-mutability-forge

@fforbeck (Member) commented Mar 4, 2026

Forge Mutability & Encryption - Summary

What We're Building

Two complementary features for Forge enterprise customers:

  1. Mutability - Track and update content with stable paths
  2. Encryption - Protect content with client-side encryption and access control

Mutability (📖 Forge Mutability RFC)

Problem: Content on IPFS is immutable (CIDs change when content changes). Enterprise backup workflows need stable references that update to the latest version.

Solution: Extend existing Guppy commands (upload source add, upload, retrieve) with Pail + UCN integration.

How it works:

  • Pail - A sharded data structure that maps paths to CIDs
  • UCN - Publishes updates so other clients can resolve the latest version

Commands:

guppy upload source add <space> <path> --name backups   # Register source, init Pail bucket
guppy upload <space> backups                             # Upload + store path→CID in Pail
guppy retrieve <space> backups ./output                  # Retrieve by path (resolves via Pail)
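
Conceptually, the three commands above maintain a mutable path → CID table next to immutable, content-addressed data. The following is a minimal in-memory sketch in Python, not Guppy's or Pail's actual API: `MutableIndex` and `fake_cid` are hypothetical names, and a SHA-256 digest stands in for a real CID.

```python
import hashlib


def fake_cid(content: bytes) -> str:
    """Stand-in for a real CID: any change to the content changes the digest."""
    return "sha256-" + hashlib.sha256(content).hexdigest()


class MutableIndex:
    """Hypothetical stand-in for a Pail bucket: stable path -> latest CID."""

    def __init__(self):
        self.entries = {}   # path -> CID (the only mutable part)
        self.blocks = {}    # CID -> content (immutable store)

    def upload(self, path: str, content: bytes) -> str:
        cid = fake_cid(content)
        self.blocks[cid] = content   # older versions stay addressable by CID
        self.entries[path] = cid     # the path now points at the new version
        return cid

    def retrieve(self, path: str) -> bytes:
        return self.blocks[self.entries[path]]


index = MutableIndex()
v1 = index.upload("backups", b"monday snapshot")
v2 = index.upload("backups", b"tuesday snapshot")
print(v1 != v2, index.retrieve("backups"))
```

The path `backups` stays stable while the CID behind it changes. Publishing that table to other clients is the part the RFC assigns to UCN, and it is not modelled here.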

Encryption (📖 Forge Encryption RFC)

Problem: Enterprise customers need data protection before content leaves their systems.

Solution: Client-side encryption with UCAN-gated access control.

How it works:

  • Per-file encryption - Each file gets its own key (DEK)
  • KMS integration - Keys are wrapped and managed by Storacha KMS (KEK)
  • UCAN delegations - Fine-grained access control (per-file or per-folder)

Encryption modes

| Mode | Use Case | Access Control |
| --- | --- | --- |
| KMS (production) | Enterprise | UCAN-gated |
| Local key (dev) | Testing | Anyone with key |

How They Work Together

Encryption is an opt-in flag on the existing upload source add command:

# Plaintext source (mutability only)
guppy upload source add did:key:z6Mk... /my-backups --name backups

# Encrypted source (mutability + encryption)
guppy upload source add did:key:z6Mk... /my-secrets --name secrets --encrypt

Key rotation requires mutability because rotating keys creates new metadata CIDs that need to be tracked in Pail.
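
To see why rotation touches Pail, here is a toy envelope-encryption sketch in Python. It assumes each file's metadata block stores the wrapped DEK, so re-wrapping under a new KEK changes the block's bytes and therefore its CID. `toy_wrap` is a deliberately insecure stand-in for the real KMS wrap call, `fake_cid` stands in for real CID computation, and every name and field here is hypothetical.

```python
import hashlib
import hmac
import json
import os


def fake_cid(block: bytes) -> str:
    """Stand-in for a real CID."""
    return "sha256-" + hashlib.sha256(block).hexdigest()


def toy_wrap(kek: bytes, dek: bytes) -> bytes:
    """INSECURE stand-in for a KMS key-wrap: XOR the DEK with a KEK-derived pad.

    XOR is its own inverse, so the same function also unwraps.
    """
    pad = hmac.new(kek, b"toy-wrap", hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(dek, pad))


def metadata_block(path: str, wrapped_dek: bytes, kek_version: int) -> bytes:
    meta = {"path": path, "wrappedDek": wrapped_dek.hex(), "kekVersion": kek_version}
    return json.dumps(meta, sort_keys=True).encode()


def rotate(pail, meta_blocks, old_kek, new_kek, new_version):
    """Re-wrap every file's DEK under the new KEK and repoint its Pail entry."""
    for path, old_cid in list(pail.items()):
        meta = json.loads(meta_blocks[old_cid])
        dek = toy_wrap(old_kek, bytes.fromhex(meta["wrappedDek"]))  # unwrap
        block = metadata_block(path, toy_wrap(new_kek, dek), new_version)
        new_cid = fake_cid(block)
        meta_blocks[new_cid] = block
        pail[path] = new_cid  # the mutable index is what records the new CID


kek_v1, kek_v2 = os.urandom(32), os.urandom(32)
dek = os.urandom(32)  # per-file key
block = metadata_block("secrets/a.txt", toy_wrap(kek_v1, dek), 1)
meta_blocks = {fake_cid(block): block}
pail = {"secrets/a.txt": fake_cid(block)}

before = pail["secrets/a.txt"]
rotate(pail, meta_blocks, kek_v1, kek_v2, 2)
print(before != pail["secrets/a.txt"])
```

In this model only the metadata block is rewritten during rotation. The DEK itself is unchanged, so the file ciphertext would not need to be re-encrypted.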

Implementation Status

| Component | Status |
| --- | --- |
| Block-level encryption (AES-256-CTR) | ✅ POC done |
| Pail library (Go) | ✅ Exists |
| UCN wrapper | 🔨 Needs Go impl |
| KMS integration | 🔨 TODO |
| Guppy command extensions | 🔨 TODO |

Why This Approach?

  • Pail + UCN = battle-tested components, already used in Storacha
  • Per-file encryption = matches industry standard (AWS S3, Azure, GCP)
  • Minimal changes = extends existing commands, encryption is just a flag

Closes storacha/project-tracking#663

@fforbeck fforbeck requested a review from a team March 4, 2026 16:47
@fforbeck fforbeck self-assigned this Mar 4, 2026
@hannahhoward (Member) left a comment

Overall, this is good, but there are some critical missing bits that I think will emerge from working closely with @Peeja, @alanshaw, and the existing Go devs.

Specifically,

  1. "Service layer" -- is this a server? A secondary process on the dev machine? I don't think either of these is a good idea, and the server approach would break a number of design principles about our system (namely that all CIDs should be generated on the client). Personally I think a language port is gonna be WAY faster (especially with AI-aided dev) and produce a way less complex system, so I'd argue strongly for a full port. These aren't complex libraries and I think we could have them ported in a week or two with the AI helping. And then we have a single process for a single machine, way simpler to maintain and reason about.

  2. There's a bit of unspecified confusion about how Pail works in the Forge context, which I think you might need to embed with @Peeja on Guppy to really grok. So Guppy has a notion of "sources" -- i.e. data sources (usually large, deep directories) that get uploaded within a space. Each space has 1..n sources, and when you upload within a space, after the first upload of a source, only the "delta" gets updated -- Guppy knows how to upload just the blocks needed to make a new updated UnixFS root. So with mutability:

  • You have the list of sources which get updated, and you DEFINITELY want that to be represented by Pail + UCN.
  • You have the directory tree structure within the sources itself. This is currently UnixFS and is updated properly each incremental upload.
  • So the real question is about whether to use Pail for the whole directory tree, and I think that's a complicated question that merits further examination

Reasons not to use Pail:

  1. These are extremely big, complicated directories, and Pail hasn't been tested at anything remotely close to the scale of these directories.
  2. The retrieval patterns and general usage for Pail are totally different from those for UnixFS -- so the downstream implications of using Pail for the whole directory tree structure are unknown.

Reasons to use Pail:

  1. Much more fine-grained "multi-writer" capabilities are unlocked if you use Pail for everything. If you used Pail for just the sources list, you'd essentially have last-writer-wins at a per-source level: if source X is in state A, and two different guppies make several changes to the directory tree structure, written as UnixFS, then the directory structure would by default ONLY get the changes of the last client to write. Note: we could apply a smarter merge outside of Pail, similar to the way I merge Markdown files in Clawracha. I actually believe this wouldn't be TOO hard.
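
The difference between the two granularities can be sketched with plain dictionaries. This is an illustrative Python model only: Pail's actual merge behaviour is not modelled, and `merge_per_path` is a hypothetical merge relative to a shared base, not Pail's algorithm.

```python
def last_writer_wins(base, writes):
    """Per-source pointer: each write replaces the whole tree, so only the last survives."""
    state = base
    for tree in writes:
        state = tree
    return state


def merge_per_path(base, writes):
    """Per-path entries: apply each client's changes relative to the shared base."""
    merged = dict(base)
    for tree in writes:
        for path in base.keys() | tree.keys():
            if tree.get(path) != base.get(path):   # this client changed `path`
                if path in tree:
                    merged[path] = tree[path]
                else:
                    merged.pop(path, None)         # this client deleted `path`
    return merged


base = {"a.txt": "cid-a1", "b.txt": "cid-b1"}
client_1 = {"a.txt": "cid-a2", "b.txt": "cid-b1"}   # edits a.txt only
client_2 = {"a.txt": "cid-a1", "b.txt": "cid-b2"}   # edits b.txt only

lww = last_writer_wins(base, [client_1, client_2])
merged = merge_per_path(base, [client_1, client_2])
print(lww)
print(merged)
```

With a per-source pointer, client 1's edit to `a.txt` is lost. With per-path entries both edits survive. Two clients editing the same path still resolve to the later write in this sketch.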

Final sidebar: Current Guppy is also smart enough to only upload diff blocks for Files when they change. Encryption will kill that ability I believe, unless there's some useful way to encode only changes that works for encrypted data. Worth a google.

@fforbeck (Member, Author) commented

After the feedback received and the new POC completed by @alanshaw, I decided to break it into 2 RFCs: rfc/forge-mutability.md and rfc/forge-encryption.md.

@hannahhoward (Member) left a comment

I'm pretty set on using Pail, especially since the Go library exists, but if @alanshaw disagrees I'd be open to using the catalog approach.

@fforbeck fforbeck requested a review from hannahhoward March 17, 2026 16:08
@fforbeck (Member, Author) commented

@hannahhoward & @alanshaw - I've updated the Mutability RFC and addressed Hannah's suggestions.

Comment thread rfc/forge-encryption.md Outdated
1. User requests decryption with delegation containing `nb.prefix`
2. KMS extracts `path` from encrypted metadata block
3. KMS validates: `path.startsWith(delegation.nb.prefix)`
4. If valid → unwrap DEK; if invalid → reject
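
A sketch of steps 3 and 4 in Python, assuming paths are `/`-separated strings. `is_path_allowed` and `unwrap_if_allowed` are hypothetical helpers, not the KMS API. One thing the sketch highlights: a bare `startsWith` would let a delegation for `docs` also match `docs-private/...`, so this version compares whole path segments. Whether that matters depends on how prefixes are issued.

```python
def is_path_allowed(path: str, prefix: str) -> bool:
    """Return True if `path` is `prefix` itself or lies underneath it."""
    path_parts = [p for p in path.split("/") if p]
    prefix_parts = [p for p in prefix.split("/") if p]
    if ".." in path_parts:
        return False   # refuse traversal segments rather than try to normalise them
    return path_parts[: len(prefix_parts)] == prefix_parts


def unwrap_if_allowed(path, prefix, unwrap):
    """Step 4: call `unwrap` only when the path check passes, otherwise reject."""
    if not is_path_allowed(path, prefix):
        raise PermissionError(f"{path!r} is outside delegated prefix {prefix!r}")
    return unwrap()


print(is_path_allowed("docs/a.txt", "docs"))
print(is_path_allowed("docs-private/a.txt", "docs"))
print("docs-private/a.txt".startswith("docs"))
```

Note that an empty prefix matches every path in this sketch, so a real implementation would need to decide whether that is an intended "whole space" delegation.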
Reviewer (Member) commented

How does the service know what the wrapped decryption key is? Does it have to fetch the content? Does it need a retrieval delegation?

@fforbeck (Member, Author) commented

We could combine the path with the DEK before wrapping it, and then use the wrapped DEK in KMS. WDYT?

Comment thread rfc/forge-mutability.md
1. **Mutable References**: Stable pointers that update to the latest version of uploaded content
2. **Content Catalog**: Track uploaded content (path to CID mappings) across clients

**Related:** [forge-encryption.md](./forge-encryption.md) - Encryption key rotation depends on mutability to track metadata CID changes.
Reviewer (Member) commented

Perhaps worth being a bit more explicit about this in the encryption doc. Is it just this? It does not depend on mutability otherwise?

@fforbeck (Member, Author) commented

@alanshaw & @hannahhoward - we talked about starting v1 by storing only the root path in Pail to handle mutability, and I think that makes sense for non-encrypted files. However, for encryption, we need the full file path to handle the key rotation process.

For encrypted uploads, we need filePath -> metadataCID entries because:

  • Each file has its own DEK (wrapped in its encrypted metadata block)
  • KEK rotation needs to iterate all encrypted files to re-wrap DEKs
  • Path-based access control (nb.path) needs to validate file paths

Proposal

| Upload Type | Pail Entry | Rationale |
| --- | --- | --- |
| Raw | rootPath -> rootCID | File structure in UnixFS, traverse from root |
| Encrypted | filePath -> metadataCID | Each file has its own DEK, needs individual entries for rotation & access control |
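
For concreteness, the two rows could produce Pail entries shaped like this. This is a Python sketch with placeholder CIDs: `pail_entries` and its parameters are hypothetical, and the key layout (`rootPath/relativePath`) is an assumption.

```python
def pail_entries(root_path, root_cid, files, encrypted):
    """Build path -> CID entries for one upload.

    `files` maps each relative file path to its metadata CID (used only when encrypted).
    """
    if not encrypted:
        # Raw: one entry; individual files are reached by walking UnixFS from the root.
        return {root_path: root_cid}
    # Encrypted: one entry per file so rotation and access checks can address each DEK.
    return {f"{root_path}/{rel}": meta_cid for rel, meta_cid in sorted(files.items())}


files = {"a.txt": "bafyMetaA", "sub/b.txt": "bafyMetaB"}
raw = pail_entries("backups", "bafyRoot", files, encrypted=False)
enc = pail_entries("secrets", "bafyRoot", files, encrypted=True)
print(raw)
print(enc)
```

The encrypted case creates one entry per file, so the number of Pail entries grows with the size of the directory tree.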

Does this dual strategy make sense, or should we unify?

@alanshaw (Member) commented Mar 20, 2026

RFC: An alternate interface for mutable content in Guppy

My main concern with adding a "name" to a "source" is that it's not an explicit action. It's a passive attribute of the source that we've decided will be used after the data is uploaded to record a name for the uploaded DAG. The name is attached to a collection of static files AND it is being used as a mutable pointer that MAY point to the DAG created by said source.

Let's walk through the proposed interface:

$ guppy upload source add did:key:zSpace "My Stuff" /path/to/alans/stuff

When I type guppy upload did:key:zSpace what do I expect to happen? All my named sources get uploaded and appear in my guppy ls output:

$ guppy upload did:key:zSpace
bafyAlansStuff

$ guppy ls did:key:zSpace
My Stuff    bafyAlansStuff

I delegate access to did:key:zSpace to Felipe.

Felipe runs:

$ guppy upload source add did:key:zSpace "My Stuff" /path/to/felipes/stuff

Then Felipe runs guppy upload did:key:zSpace:

$ guppy upload did:key:zSpace
bafyFelipesStuff

$ guppy ls did:key:zSpace
My Stuff    bafyFelipesStuff

What happens if I want to update "My Stuff" to point back at my stuff, not Felipe's stuff? Can I run guppy upload did:key:zSpace? No, I believe guppy will just think everything is done already. Do I need to add a new source with the same name?

This is an odd situation.

We could make guppy upload did:key:zSpace always update the name to the value we think it should be...but then what happens with old sources that I don't want to update? i.e. Felipe's stuff is the right stuff, but I just typed guppy upload did:key:zSpace to upload a different source I added e.g. "My Pictures" and it overwrites "My Stuff" as well!

One possible solution to prevent issues like this is for guppy upload to ALWAYS take a file path/source directory.

However at this point I find myself questioning the exposure of sources in the CLI UX at all. Why do I need to create a source before uploading it? If I am always going to reference the file path in the upload command, why doesn't it just create or use the source for the directory I provide to guppy upload <space> <path>?

...but if we don't have sources where does the name go? Well, what if we provide a name to guppy upload?

$ guppy upload did:key:zSpace "My Stuff" /path/to/alans/stuff

This feels better: upload is an action, and it looks like we're asking for the name of this upload to be "My Stuff", so I feel it's more conceivable that this will overwrite another upload with the same name. Note that this also allows the same source to be put to multiple paths, e.g.

$ guppy upload did:key:zSpace "Backups/My Stuff/2026-04-20" /path/to/alans/stuff

> it's more conceivable that this will overwrite another upload with the same name

IMHO it's better, and I could probably get behind this. However, the name for this upload is not an attribute; it's more like an ID, a path or a key, i.e. if you upload with the same key it'll overwrite. The problem is we've mixed uploading files and managing buckets. The word "upload" makes it look like the name is an attribute, not an identity, and so does calling it "name" rather than "key".

One way we could easily rectify this would be to switch to using bucket semantics for our CLI commands. i.e. instead of upload we prefer put, and instead of name we use key. I think this more accurately describes the action.

The magic is that when you provide a file path the CLI will automatically create a source and upload that data for you. This is expected, and is no different from say, the AWS CLI.

So then we can distil our command to simply:

$ guppy put <space> <key> <path>

  • It's incredibly short.
  • It accurately describes the action.
  • It combines what previously was two commands (guppy upload source add and guppy upload) into one.
  • It adds mutability as a first class citizen.

Side note, this also allows users to construct buckets that contain data that they didn't directly upload to a space e.g. we might in the future support <path-or-cid>.

Proposal Summary

To summarize, my proposal would be:

  • Add guppy put <space> <key> <path>.
    • It'll add <path> as a source automatically if it does not exist.
    • It'll always set the value of <key> to the root CID of the DAG generated by <path>, even if the source thinks the DAG has been fully uploaded.
  • Change guppy ls to list the keys.
  • Add guppy get <space> <key> to retrieve a value for a key
  • Remove guppy retrieve - mutable references are the default.
  • Remove guppy upload - mutable references are the default.
  • Remove guppy upload source add command in favour of sources being created as a result of a put action.
  • Optionally retain source ls/rm to allow users to manually garbage collect their DB.
    • Move guppy upload source to guppy source.
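
The put/get/ls semantics proposed above, sketched as an in-memory Python model. Nothing here is Guppy code: `Bucket`, `fake_root_cid` and the dict-based "directory" are hypothetical, a truncated SHA-256 digest stands in for the root CID of the DAG, and a single shared object stands in for state that would really be split between each client's local DB and the space.

```python
import hashlib


def fake_root_cid(files: dict) -> str:
    """Stand-in for building a DAG: hash the sorted (path, content) pairs."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0" + files[path] + b"\0")
    return "bafy" + h.hexdigest()[:16]


class Bucket:
    def __init__(self):
        self.keys = {}      # key -> root CID (the mutable part)
        self.sources = {}   # local path -> last root CID built from it

    def put(self, key, path, files):
        # The source is created implicitly on first use of `path`.
        root = fake_root_cid(files)
        self.sources[path] = root
        # Always set the key, even when the source has nothing new to upload.
        self.keys[key] = root
        return root

    def get(self, key):
        return self.keys[key]

    def ls(self):
        return sorted(self.keys.items())


space = Bucket()
alans = {"notes.txt": b"alan"}
felipes = {"notes.txt": b"felipe"}

a = space.put("My Stuff", "/path/to/alans/stuff", alans)
f = space.put("My Stuff", "/path/to/felipes/stuff", felipes)   # overwrites the key
again = space.put("My Stuff", "/path/to/alans/stuff", alans)   # points it back
print(space.ls())
```

Because `put` always sets the key, re-running it on an unchanged directory is enough to point "My Stuff" back at Alan's data, which is the case the source-based interface could not express.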

@fforbeck (Member, Author) commented

@alanshaw - I really like the idea of having new commands to handle the bucket operations; it looks more natural and is easier to reason about, and the key attribute solution is quite elegant. I think we should move in this direction, and I am updating the RFC with this proposal. I also like that it will be closer to S3/GCS bucket operations, which adds less overhead and improves the DX.

@BravoNatalie (Contributor) commented

Like Felipe, I agree that @alanshaw's proposal makes the DX more intuitive and clears up the confusion around name as an identity. That said, I think there are a couple of things worth thinking about before moving forward with this change.

The current two-step approach lets users add sources over time and then run uploads cumulatively, instead of just doing one-off uploads. Guppy also supports different configs per source (like shard size). That might not be a big deal to drop, but it depends on whether users actually use different shard sizes per source, and I'm not sure they do.

Development

Successfully merging this pull request may close these issues.

Plan privacy/mutability story

4 participants