rfc: mutability & encryption for forge by fforbeck · Pull Request #84 · storacha/RFC

fforbeck · 2026-03-04T16:47:27Z

Forge Mutability & Encryption - Summary

What We're Building

Two complementary features for Forge enterprise customers:

Mutability - Track and update content with stable paths
Encryption - Protect content with client-side encryption and access control

Mutability (📖 Forge Mutability RFC)

Problem: Content on IPFS is immutable (CIDs change when content changes). Enterprise backup workflows need stable references that update to the latest version.

Solution: Extend existing Guppy commands (upload source add, upload, retrieve) with Pail + UCN integration.

How it works:

Pail - A sharded data structure that maps paths to CIDs
UCN - Publishes updates so other clients can resolve the latest version

Commands:

guppy upload source add <space> <path> --name backups   # Register source, init Pail bucket
guppy upload <space> backups                             # Upload + store path→CID in Pail
guppy retrieve <space> backups ./output                  # Retrieve by path (resolves via Pail)
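
The roles of Pail and UCN described above can be pictured with a minimal Python sketch. This is not the Pail or UCN API: `fake_cid`, `Bucket`, and the in-memory dict are hypothetical stand-ins, and a truncated SHA-256 digest stands in for a real CID. It only shows why a stable path is needed when the content address changes on every update.

```python
import hashlib


def fake_cid(content: bytes) -> str:
    """Stand-in for a CID: content-addressed, so new content means a new ID."""
    return "sha256-" + hashlib.sha256(content).hexdigest()[:16]


class Bucket:
    """Hypothetical path -> CID map, playing the role Pail plays in the RFC."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}
        self.revision = 0  # bumped on every update; UCN's job would be to publish this

    def put(self, path: str, content: bytes) -> str:
        cid = fake_cid(content)
        self.entries[path] = cid  # the path stays stable, the CID behind it moves
        self.revision += 1
        return cid

    def resolve(self, path: str) -> str:
        return self.entries[path]


bucket = Bucket()
v1 = bucket.put("backups/db.sql", b"dump from Monday")
v2 = bucket.put("backups/db.sql", b"dump from Tuesday")
```

After the second `put`, the path `backups/db.sql` resolves to `v2` even though `v1` and `v2` are different addresses.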

Encryption (📖 Forge Encryption RFC)

Problem: Enterprise customers need data protection before content leaves their systems.

Solution: Client-side encryption with UCAN-gated access control.

How it works:

Per-file encryption - Each file gets its own key (DEK)
KMS integration - Keys are wrapped and managed by Storacha KMS (KEK)
UCAN delegations - Fine-grained access control (per-file or per-folder)

Encryption modes

Mode	Use Case	Access Control
KMS (production)	Enterprise	UCAN-gated
Local key (dev)	Testing	Anyone with key
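
The DEK/KEK layering above is commonly called envelope encryption. Below is a rough sketch of the idea using only the Python standard library. The cipher is a toy SHA-256 keystream standing in for AES-256-CTR, the record layout is invented for illustration, and wrapping the DEK locally is an assumption (in the RFC the KEK is managed by the KMS). None of it is production cryptography.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy CTR-style cipher: XOR data with SHA-256(key | nonce | counter).

    Stand-in for AES-256-CTR so the sketch runs without third-party packages.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt_file(kek: bytes, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)          # fresh data encryption key per file
    nonce = secrets.token_bytes(16)
    wrap_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": keystream_xor(dek, nonce, plaintext),
        "nonce": nonce,
        "wrap_nonce": wrap_nonce,
        "wrapped_dek": keystream_xor(kek, wrap_nonce, dek),  # only a KEK holder can unwrap
    }


def decrypt_file(kek: bytes, record: dict) -> bytes:
    dek = keystream_xor(kek, record["wrap_nonce"], record["wrapped_dek"])
    return keystream_xor(dek, record["nonce"], record["ciphertext"])


kek = secrets.token_bytes(32)  # hypothetical: in the RFC this key never leaves the KMS
message = b"quarterly-report.pdf contents"
record = encrypt_file(kek, message)
```

Because each file carries its own wrapped DEK, access to one file can be granted or revoked without touching the keys of any other file.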

How They Work Together

Encryption is an opt-in flag on the existing upload source add command:

# Plaintext source (mutability only)
guppy upload source add did:key:z6Mk... /my-backups --name backups

# Encrypted source (mutability + encryption)
guppy upload source add did:key:z6Mk... /my-secrets --name secrets --encrypt

Key rotation requires mutability because rotating keys creates new metadata CIDs that need to be tracked in Pail.

Implementation Status

Component	Status
Block-level encryption (AES-256-CTR)	✅ POC done
Pail library (Go)	✅ Exists
UCN wrapper	🔨 Needs Go impl
KMS integration	🔨 TODO
Guppy command extensions	🔨 TODO

Why This Approach?

Pail + UCN = battle-tested components, already used in Storacha
Per-file encryption = matches industry standard (AWS S3, Azure, GCP)
Minimal changes = extends existing commands, encryption is just a flag

Closes storacha/project-tracking#663

hannahhoward

Overall, this is good, but there are some critical missing bits, that I think will emerge from working closely with @Peeja , @alanshaw and the existing go devs.

Specifically,

"Service layer" -- is this a server? a secondary process on the dev machine? I don't think either of these is a good idea, and the server approach would break a number of design principles about our system (namely that all CIDs should be generated on the client). Personally I think a language port is gonna be WAY faster (especially with AI-aided dev) and produce a way less complex system, so I'd argue strongly for a full port. These aren't complex libraries and I think we could have them ported in a week or two with the AI helping. And then we have a single process for a single machine, way simpler to maintain and reason about.
There's a bit of unspecified confusion about how Pail works in the Forge context, that I think you might need to embed with @Peeja on guppy to really grok. So Guppy has a notion of "sources" -- i.e. data sources (usually large, deep directories) that get uploaded within a space. Each space has 1..n sources, and when you upload within a space, after the first upload of a source, only the "delta" gets updated -- Guppy knows how to upload just the blocks needed to make a new updated UnixFS root. So with mutability:

You have the list of sources which get updated, and you DEFINITELY want that to be represented by Pail + UCN.
You have the directory tree structure within the sources itself. This is currently UnixFS and is updated properly each incremental upload.
So the real question is about whether to use Pail for the whole directory tree, and I think that's a complicated question that merits further examination

Reasons not to use Pail:

These are extremely big complicated directories and Pail hasn't been tested at a scale even remotely close to working with these directories
The retrieval patterns and general usage for Pail are totally different from those for UnixFS -- so the downstream change implications of using Pail for the whole directory tree structure are unknown.

Reasons to use Pail:

Much more fine-grained "multi-writer" capabilities are unlocked if you use Pail for everything. If you used Pail for just the sources list, then you'd essentially have last-writer-wins on a per-source level -- if source X is in state A, and two different guppies make several changes to the directory tree structure, written as UnixFS, then the directory structure would by default ONLY get the changes of the last client to write. Note: we could apply a smarter merge outside of Pail, similar to the way I merge Markdown files in Clawracha. I actually believe this wouldn't be TOO hard.
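
To make the multi-writer difference concrete, here is a small Python sketch comparing a whole-tree pointer (last writer wins) with a per-path three-way merge. It is an illustration only: it is not Pail's merge algorithm, and the function names, paths, and CID strings are made up.

```python
def last_writer_wins(base: dict, a: dict, b: dict) -> dict:
    """Whole-tree pointer: whichever client publishes last replaces the other."""
    return dict(b)


def merge_per_path(base: dict, a: dict, b: dict) -> dict:
    """Three-way merge over path -> CID maps; on a true conflict b (the later writer) wins."""
    merged = {}
    for path in set(base) | set(a) | set(b):
        base_v, a_v, b_v = base.get(path), a.get(path), b.get(path)
        # Keep a's value when b left the path alone, otherwise take b's change.
        chosen = a_v if b_v == base_v else b_v
        if chosen is not None:  # None means the path was deleted
            merged[path] = chosen
    return merged


base = {"docs/a.txt": "cid-a1", "docs/b.txt": "cid-b1"}
client_a = {"docs/a.txt": "cid-a2", "docs/b.txt": "cid-b1"}  # edits a.txt
client_b = {"docs/a.txt": "cid-a1", "docs/b.txt": "cid-b1", "docs/c.txt": "cid-c1"}  # adds c.txt

lww = last_writer_wins(base, client_a, client_b)
merged = merge_per_path(base, client_a, client_b)
```

With last-writer-wins, client A's edit to `docs/a.txt` is lost. The per-path merge keeps both A's edit and B's new file, which is the kind of result a smarter merge (inside or outside Pail) would aim for.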

Final sidebar: Current Guppy is also smart enough to only upload diff blocks for Files when they change. Encryption will kill that ability I believe, unless there's some useful way to encode only changes that works for encrypted data. Worth a google.
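
A rough Python sketch of that concern, under stated assumptions: files are split into fixed-size chunks, blocks are identified by their hash, and each encrypted version uses a fresh nonce. The cipher is a toy SHA-256 keystream standing in for AES-256-CTR, and none of the names below come from Guppy.

```python
import hashlib

CHUNK = 16  # tiny chunk size, just for the demo


def block_ids(data: bytes) -> list[str]:
    """Hash each fixed-size chunk; the hash plays the role of a block CID."""
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest() for i in range(0, len(data), CHUNK)]


def blocks_to_upload(old: bytes, new: bytes) -> int:
    """Count blocks in the new version that the old version did not already contain."""
    known = set(block_ids(old))
    return sum(1 for block in block_ids(new) if block not in known)


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy CTR-style cipher (SHA-256 keystream), not real AES."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


old = b"A" * 16 + b"B" * 16 + b"C" * 16 + b"D" * 16
new = b"A" * 16 + b"B" * 16 + b"X" * 16 + b"D" * 16  # only the third chunk changed

plain_delta = blocks_to_upload(old, new)
encrypted_delta = blocks_to_upload(
    toy_encrypt(b"k" * 32, b"nonce-v1", old),
    toy_encrypt(b"k" * 32, b"nonce-v2", new),
)
```

In plaintext only one block differs, while with a fresh nonce per version every ciphertext block is new. Reusing the same key and nonce would keep unchanged blocks identical, but in CTR mode that reveals the XOR of the two plaintexts wherever they differ, so it is not a safe way to recover deduplication.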

fforbeck · 2026-03-11T17:20:19Z

After the feedback received and the new POC completed by @alanshaw, I decided to break it into 2 RFCs:

hannahhoward

I'm pretty set on using Pail, especially since the go library exists, but if @alanshaw disagrees I'd be open to using the catalog approach.

fforbeck · 2026-03-17T16:09:21Z

@hannahhoward & @alanshaw - I've updated the Mutability RFC and addressed Hannah's suggestions.

alanshaw · 2026-03-17T16:04:38Z

+1. User requests decryption with delegation containing `nb.prefix`
+2. KMS extracts `path` from encrypted metadata block
+3. KMS validates: `path.startsWith(delegation.nb.prefix)`
+4. If valid → unwrap DEK; if invalid → reject


How does the service know what the wrapped decryption key is? Does it have to fetch the content? Does it need a retrieval delegation?

We could combine the path with the DEK before wrapping it, and then use the wrapped DEK in KMS. WDYT?
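
For reference, the validation step quoted above can be sketched in Python as follows. `Delegation` and both function names are hypothetical. The first check mirrors `path.startsWith(delegation.nb.prefix)` as written; the second is a stricter segment-aware variant that is an assumption on my part, not something the RFC specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    prefix: str  # mirrors the nb.prefix caveat from the RFC excerpt


def is_allowed_simple(path: str, delegation: Delegation) -> bool:
    """Step 3 as written: a plain string-prefix check."""
    return path.startswith(delegation.prefix)


def is_allowed_by_segment(path: str, delegation: Delegation) -> bool:
    """Stricter variant (assumption, not in the RFC): match whole path segments."""
    prefix = delegation.prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


grant = Delegation(prefix="/reports/2026")
```

A plain prefix check accepts `/reports/2026-drafts/x.pdf` for a grant on `/reports/2026`, because the string starts with the prefix even though it sits in a sibling directory. Whether that matters depends on how prefixes are expected to be written (for example, always with a trailing slash).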

alanshaw · 2026-03-17T16:45:30Z

+1. **Mutable References**: Stable pointers that update to the latest version of uploaded content
+2. **Content Catalog**: Track uploaded content (path to CID mappings) across clients
+
+**Related:** [forge-encryption.md](./forge-encryption.md) - Encryption key rotation depends on mutability to track metadata CID changes.


Perhaps worth being a bit more explicit about this in the encryption doc. Is it just this? It does not depend on mutability otherwise?

fforbeck · 2026-03-18T14:35:21Z

@alanshaw & @hannahhoward - we talked about starting v1 storing only the root path in Pail to handle mutability, I think that makes sense for the non-encrypted files. However, for the encryption, we kind of need the full path to handle the key rotation process.

For encrypted uploads, we need filePath -> metadataCID entries because:

Each file has its own DEK (wrapped in its encrypted metadata block)
KEK rotation needs to iterate all encrypted files to re-wrap DEKs
Path-based access control (nb.path) needs to validate file paths

Proposal

Upload Type	Pail Entry	Rationale
Raw	`rootPath -> rootCID`	File structure in UnixFS, traverse from root
Encrypted	`filePath -> metadataCID`	Each file has its own DEK, needs individual entries for rotation & access control

Does this dual strategy make sense, or should we unify?
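
To illustrate why rotation wants per-file entries, here is a minimal Python sketch of a KEK rotation pass over `filePath -> metadataCID` entries. Everything in it is a stand-in: the metadata fields, the XOR "wrap", and the hash-based `fake_cid` are assumptions for illustration, not the RFC's formats or the KMS API.

```python
import hashlib
import json


def fake_cid(obj: dict) -> str:
    """Stand-in for a CID: hash of the canonical JSON encoding."""
    return "sha256-" + hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:16]


def xor_wrap(kek: bytes, dek: bytes) -> str:
    """Toy key wrap (XOR with the KEK); a real KMS would use an authenticated wrap."""
    return bytes(d ^ k for d, k in zip(dek, kek)).hex()


def xor_unwrap(kek: bytes, wrapped: str) -> bytes:
    return bytes(w ^ k for w, k in zip(bytes.fromhex(wrapped), kek))


def rotate_kek(pail: dict, blocks: dict, old_kek: bytes, new_kek: bytes) -> None:
    """Re-wrap every file's DEK and point each path at its new metadata block."""
    for path, old_cid in list(pail.items()):
        meta = dict(blocks[old_cid])
        dek = xor_unwrap(old_kek, meta["wrapped_dek"])
        meta["wrapped_dek"] = xor_wrap(new_kek, dek)
        new_cid = fake_cid(meta)
        blocks[new_cid] = meta
        pail[path] = new_cid  # the encrypted content is untouched; only metadata moves


old_kek, new_kek = b"\x01" * 32, b"\x02" * 32
dek = b"\xaa" * 32
meta = {"path": "secrets/a.txt", "content_cid": "cid-of-ciphertext", "wrapped_dek": xor_wrap(old_kek, dek)}
blocks = {fake_cid(meta): meta}
pail = {"secrets/a.txt": fake_cid(meta)}
before = pail["secrets/a.txt"]
rotate_kek(pail, blocks, old_kek, new_kek)
```

The loop only works because every encrypted file is reachable from its own entry. With a single `rootPath -> rootCID` entry, rotation would first have to walk the tree to find each metadata block.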

alanshaw · 2026-03-20T10:20:10Z

RFC: An alternate interface for mutable content in Guppy

My main concern with adding a "name" to a "source" is that it's not an explicit action. It's a passive attribute of the source that we've decided will be used after the data is uploaded to record a name for the uploaded DAG. The name is attached to a collection of static files AND it is being used as a mutable pointer that MAY point to the DAG created by said source.

Let's walk through the proposed interface:

$ guppy upload source add did:key:zSpace "My Stuff" /path/to/alans/stuff

When I type guppy upload did:key:zSpace what do I expect to happen? All my named sources get uploaded and appear in my guppy ls output:

$ guppy upload did:key:zSpace
bafyAlansStuff

$ guppy ls did:key:zSpace
My Stuff    bafyAlansStuff

I delegate access to did:key:zSpace to Felipe.

Felipe runs:

$ guppy upload source add did:key:zSpace "My Stuff" /path/to/felipes/stuff

Then Felipe runs guppy upload did:key:zSpace:

$ guppy upload did:key:zSpace
bafyFelipesStuff

$ guppy ls did:key:zSpace
My Stuff    bafyFelipesStuff

What happens if I want to update "My Stuff" to point back at my stuff, not Felipe's stuff? Can I run guppy upload did:key:zSpace? No, I believe guppy will just think everything is done already. Do I need to add a new source with the same name?

This is an odd situation.

We could make guppy upload did:key:zSpace always update the name to the value we think it should be...but then what happens with old sources that I don't want to update? i.e. Felipe's stuff is the right stuff, but I just typed guppy upload did:key:zSpace to upload a different source I added e.g. "My Pictures" and it overwrites "My Stuff" as well!

One possible solution to prevent issues like this is for guppy upload to ALWAYS take a file path/source directory.

However at this point I find myself questioning the exposure of sources in the CLI UX at all. Why do I need to create a source before uploading it? If I am always going to reference the file path in the upload command, why doesn't it just create or use the source for the directory I provide to guppy upload <space> <path>?

...but if we don't have sources where does the name go? Well, what if we provide a name to guppy upload?

$ guppy upload did:key:zSpace "My Stuff" /path/to/alans/stuff

This feels better: upload is an action, it looks like we're asking for the name of this upload to be "My Stuff", and I feel it's more conceivable that this will overwrite another upload with the same name. Note that this also allows the same source to be put to multiple paths, e.g.

$ guppy upload did:key:zSpace "Backups/My Stuff/2026-04-20" /path/to/alans/stuff

it's more conceivable that this will overwrite another upload with the same name

IMHO it's better, and I could probably get behind this. However, the name for this upload is not an attribute, it's more like an ID, a path or a key, i.e. if you upload with the same key it'll overwrite. The problem is we've mixed uploading files and managing buckets. The word "upload" makes it look like the name is an attribute, not an identity. So does calling it "name" rather than "key".

One way we could easily rectify this would be to switch to using bucket semantics for our CLI commands. i.e. instead of upload we prefer put, and instead of name we use key. I think this more accurately describes the action.

The magic is that when you provide a file path the CLI will automatically create a source and upload that data for you. This is expected, and is no different from say, the AWS CLI.

So then we can distil our command to simply:

$ guppy put <space> <key> <path>

It's incredibly short.
It accurately describes the action.
It combines what previously was two commands (guppy upload source add and guppy upload) into one.
It adds mutability as a first-class citizen.

Side note, this also allows users to construct buckets that contain data that they didn't directly upload to a space e.g. we might in the future support <path-or-cid>.

Proposal Summary

To summarize, my proposal would be:

Add guppy put <space> <key> <path>.
- It'll add <path> as a source automatically if it does not exist.
- It'll always set the value of <key> to the root CID of the DAG generated by <path>, even if the source thinks the DAG has been fully uploaded.
Change guppy ls to list the keys.
Add guppy get <space> <key> to retrieve a value for a key
Remove guppy retrieve - mutable references are the default.
Remove guppy upload - mutable references are the default.
Remove guppy upload source add command in favour of sources being created as a result of a put action.
Optionally retain source ls/rm to allow users to manually garbage collect their DB.
- Move guppy upload source to guppy source.
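
The semantics in this summary can be modelled with a short Python sketch. It is a hypothetical in-memory model, not Guppy code: `Space`, its methods, and the hash-based root CID are stand-ins, and it collapses Alan's and Felipe's machines into one object for brevity.

```python
import hashlib


class Space:
    """Hypothetical in-memory model of the proposed put/get/ls semantics."""

    def __init__(self) -> None:
        self.keys: dict[str, str] = {}  # key -> root CID (the mutable part)
        self.sources: set[str] = set()  # local paths registered as sources

    def put(self, key: str, path: str, content: bytes) -> str:
        self.sources.add(path)  # the source is created implicitly if missing
        root = "sha256-" + hashlib.sha256(content).hexdigest()[:16]  # stand-in for a root CID
        self.keys[key] = root   # always re-point the key, even if nothing new was uploaded
        return root

    def get(self, key: str) -> str:
        return self.keys[key]

    def ls(self) -> list[tuple[str, str]]:
        return sorted(self.keys.items())


space = Space()
alans = space.put("My Stuff", "/path/to/alans/stuff", b"alan's files")
felipes = space.put("My Stuff", "/path/to/felipes/stuff", b"felipe's files")
restored = space.put("My Stuff", "/path/to/alans/stuff", b"alan's files")
```

The third `put` is the case from the walkthrough above: the data is already uploaded, yet the key still moves back to Alan's root because `put` sets the key unconditionally.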

fforbeck · 2026-03-25T17:43:23Z

@alanshaw - I really liked the idea of having new commands to handle the bucket operations, that looks more natural and easy to reason about, plus the key attribute solution is quite elegant. I do think we should move into this direction, and I am updating the RFC with this proposal. In addition to that, I like the fact that it will be more similar to the S3/GCS bucket operations, which adds less overhead and improves the DX.

BravoNatalie · 2026-03-26T02:09:40Z

Like Felipe, I agree this makes the DX more intuitive and clears up the confusion around name as an identity. That said, I think there are a couple of things worth thinking about before moving forward with this change.

The current two-step approach lets users add sources over time and then run uploads cumulatively, instead of just doing one-off uploads. Guppy also supports different configs per source (like shard size). That might not be a big deal to drop, but it really depends on whether users actually use different shard sizes per source, and I'm not sure they do.

rfc: mutability & encryption for forge

b6a46df

fforbeck requested a review from a team March 4, 2026 16:47

fforbeck self-assigned this Mar 4, 2026

fforbeck mentioned this pull request Mar 4, 2026

Plan privacy/mutability story storacha/project-tracking#663

Open

hannahhoward reviewed Mar 5, 2026

alanshaw reviewed Mar 5, 2026

Comment thread rfc/forge-mutability-encryption.md

alanshaw reviewed Mar 5, 2026

Comment thread rfc/forge-mutability-encryption.md

fforbeck added 2 commits March 11, 2026 14:16

rfc: forge encryption

a3a48cd

rfc: forge mutability

8cdaec7

fforbeck requested review from alanshaw and hannahhoward March 11, 2026 17:19

hannahhoward requested changes Mar 16, 2026

fforbeck added 2 commits March 16, 2026 13:49

minor updates in the forge-encryption rfc

4d94f49

rfc: update mutability rfc

90f11be

fforbeck requested a review from hannahhoward March 17, 2026 16:08

alanshaw requested changes Mar 17, 2026

rfc: update privacy rfc

4ea0217

fforbeck mentioned this pull request Mar 18, 2026

Forge Encryption (Epic) storacha/project-tracking#674

Open

fforbeck mentioned this pull request Mar 18, 2026

Phase 4: Key Rotation (Requires Mutability) storacha/project-tracking#678

Open

BravoNatalie reviewed Mar 19, 2026

Comment thread rfc/forge-encryption.md Outdated

fforbeck added 3 commits March 19, 2026 09:58

minor update in encryption rfc

ade8ef0

using a new command to handle mutability & privacy in guppy

3884eb7

remove bucket cmd

ca8049f

fforbeck requested review from BravoNatalie and alanshaw March 19, 2026 16:38

fforbeck mentioned this pull request Mar 19, 2026

Forge Mutability (Epic) storacha/project-tracking#681

Open

update mutability RFC to follow the bucket cmd proposal

057dd0c

4 participants
