IPIP-431: Opt-in Extensible CAR Metadata on Trustless Gateway #431
Changes from 5 commits
2eb7b9e
3814b6a
cb1d8b3
1c0fbaa
a7e75d7
68715c4
ed86a0f
5056bde
9170c29
65ffcfc
9d1b61f
93c3c28
eacf51a
62fb207
72ed04c
e1fc296
152f4a6
b6069bf
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -80,7 +80,8 @@ Below response types SHOULD be supported: | |||||||||||
| - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) | ||||||||||||
| - Disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be | ||||||||||||
| returned, implementations MAY support optional CAR content type parameters | ||||||||||||
| (:cite[ipip-0412]) and the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request). | ||||||||||||
| (:cite[ipip-0412]), the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request) | ||||||||||||
| and the optional [CAR metadata block](#car-meta-content-type-parameter). | ||||||||||||
|
|
||||||||||||
| - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) | ||||||||||||
| - A verifiable :cite[ipns-record] (multicodec `0x0300`). | ||||||||||||
|
|
@@ -301,6 +302,32 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as | |||||||||||
| the raw data is already present in the parent block that links to the identity | ||||||||||||
| CID. | ||||||||||||
|
|
||||||||||||
| ## CAR `meta` (content type parameter) | ||||||||||||
|
|
||||||||||||
| The `meta=eof` parameter allows clients to request that the server include additional metadata about the | ||||||||||||
| CAR at the end of the response body. | ||||||||||||
|
|
||||||||||||
| This parameter SHOULD only be used with CAR `version=1`. | ||||||||||||
| Values other than `eof` SHOULD be ignored. | ||||||||||||
|
|
||||||||||||
| When the parameter is not set, the server MUST NOT add any extra CAR blocks to the response. | ||||||||||||
|
|
||||||||||||
| The metadata block is a regular CAR block with the following properties: | ||||||||||||
|
|
||||||||||||
| - CID specifies multicodec `car-metadata` (`0x04ff`), see | ||||||||||||
| [multicodec#334](https://github.com/multiformats/multicodec/pull/334). | ||||||||||||
|
bajtos marked this conversation as resolved.
Outdated
|
||||||||||||
|
|
||||||||||||
| - The payload contains metadata encoded as DAG-CBOR. | ||||||||||||
|
|
||||||||||||
| The metadata MUST include the following fields: | ||||||||||||
|
|
||||||||||||
| - `len` - byte length of the CAR data (excluding the metadata block) | ||||||||||||
|
bajtos marked this conversation as resolved.
Outdated
|
||||||||||||
| - `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block). | ||||||||||||
| - `b3h_sig` - A signature over `<len><b3h><request>` using the server's Ed25519 identity. | ||||||||||||
| - `len` is encoded as `varint`, | ||||||||||||
| - `b3h` is encoded as 32 bytes, | ||||||||||||
| - `request` is the effective query as executed by the gateway, i.e. the request URL path and query string arguments. | ||||||||||||
|
Member
These Please move them to "User benefit" section of the IPIP document and explain how ps. I know other services like dagHouse use different hash functions for getting "CAR CID", putting all bets on Blake3 feels like an unnecessary divergence. Perhaps this could be made bit more future-proof and generic if blake3 is represented as Multihash wrapped in CIDv1+car codec (0x0202)? Just an idea, fine to ignore, given these are specific to SPARK. Either way, this belongs to the "userland benefiting from metadata extensibility" story.
I agree that the Spark use case belongs in Userland section. However, the individual keys of the metadata section, and what the servers must do to implement them, feels like something that should be in the trustless gateway spec. Keys like What's more, for the Spark use case, we do not want gateway operators to know that they are serving a Spark request and not some other request. Since the Spark ones will be incentivised and other request may not be, servers may simply provide a good retrieval service to Spark clients and a poor service to other clients.
Perhaps what we need to do is leave this IPIP to concern the metadata block being appended without any constraints on what can be included in it. Then in a separate place, we define a canonical way to include a key value object in the metadata block and how the server should implement certain useful keys such as
Member
I think it would be ok to suggest a key name convention for generic things like
I think it would be also ok to have a documented convention for passing a hash of the CAR stream (aka CAR CID) – maybe name it
Author
+1 to document For SPARK, we specifically want a Blake3 hash so that we can use inclusion proofs. That's why we want to use a dedicated field
Member
Including content path and CAR export parameters feels generic enough to keep in the spec, but we should not mix content path with car and url parameters as it leads to bugs around things like percent-encoding especially where These should be three separate fields:
@lidel what is the reasoning behind splitting up
Member
I believe that was a conscious design choice, to avoid mixing data selector with details of the transport format (not everything is in URL query params):
@patrickwoodhead that being said, if you want to simplify, this IPIP could go with a single dag-json map named
Author
How about using What is our motivation in SPARK: If two SPARK checker clients submit a metadata block with the same retrieval parameters (CID, subpath, dag params, car params) then we want:
When the client requests
We need the metadata block to describe both the CID requested ( |
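
A minimal sketch of how the bytes covered by `b3h_sig` could be assembled from the fields listed in the draft above. It assumes that `varint` means the unsigned LEB128 encoding used across multiformats and that `<request>` is serialized as UTF-8; neither detail is confirmed by the draft. The function names and the example request path are hypothetical, and computing the Blake3 digest and the Ed25519 signature is left out because both need third-party libraries.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def signed_payload(car_len: int, b3h: bytes, request: str) -> bytes:
    """Concatenate <len><b3h><request>, the input to the `b3h_sig` signature."""
    if len(b3h) != 32:
        raise ValueError("b3h must be exactly 32 bytes")
    # Assumption: the request path and query string are serialized as UTF-8.
    return encode_uvarint(car_len) + b3h + request.encode("utf-8")


# Hypothetical values, for illustration only.
payload = signed_payload(300, bytes(32), "/ipfs/example-cid?dag-scope=entity")
```

Two checker clients that retrieve the same content with the same parameters would build identical payloads under these assumptions, which is the property the SPARK discussion above asks for.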
||||||||||||
|
|
||||||||||||
| ## CAR format parameters and determinism | ||||||||||||
|
|
||||||||||||
| The default header and block order in a CAR format is not specified by IPLD specifications. | ||||||||||||
|
|
||||||||||||
|
bajtos marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| --- | ||
| title: "IPIP-0431: Opt-in Extensible CAR Metadata on Trustless Gateway" | ||
| date: 2023-08-08 | ||
| ipip: proposal | ||
| editors: | ||
| - name: Miroslav Bajtoš | ||
| github: bajtos | ||
| affiliation: | ||
| name: Protocol Labs | ||
| url: https://protocol.ai/ | ||
| relatedIssues: | ||
| - https://github.com/filecoin-project/boost/issues/1597 | ||
| order: 431 | ||
| tags: ['ipips'] | ||
| --- | ||
|
|
||
| ## Summary | ||
|
|
||
| Define an optional enhancement of the CARv1 stream that allows a Gateway server to provide | ||
| additional metadata about the CARv1 response. Introduce a new content type parameter that allows the client | ||
| and the server to signal or negotiate the inclusion of extra metadata. | ||
|
|
||
| ## Motivation | ||
|
|
||
| SPARK is a Filecoin Station module that measures the reputation of Storage Providers by periodically | ||
| retrieving a random CID. Since both SPs and SPARK nodes are permissionless, and Proof of Retrieval | ||
| is an unsolved problem, we need a way to verify that a SPARK node retrieved the given CID from the | ||
| given SP. To enable that, we need the Trustless Gateway serving the retrieval request to include a | ||
| retrieval attestation after the entire response was sent to the client. | ||
|
|
||
| Aside from this specific use case, the IPFS Ecosystem at large has no reliable | ||
| mechanism to signal that a CAR file transmission over HTTP completed successfully. | ||
|
Member
Can you elaborate? We already have a multicodec for CAR. Couldn't you retrieve a CAR file from a gateway by CID e.g. https://w3s.link/ipfs/bagbaierabxhdw7wglmlehzgobjuoq3v3bdv64iagjdhu74ysjvdecrezxldq - you don't need to signal successful transmission if the content hashes to the same CID. If you're using the graph API (
Contributor
I think there are a couple rough edges that motivated the desire to have additional signaling / metadata here:
Member
Where do we get the "CAR CID" from? AFAIK in the majority of real world use cases:
This works only if there is no HTTP middleware in-between client and server, which is never the case. There is always some HTTP middleware or CDN in production. Once you are limited to HTTP semantics, you will cache truncated responses, and the client has to be smart enough to (1) detect that (2) be able to retry in a way that does not hit the same cache. This is why it does not work in places like Rhea/Saturn, where HTTP responses are (last time I checked) cached blindly based only on HTTP semantics without understanding internal Block/DAG structure. |
||
|
|
||
| However, we need this in order to be able to use CARs as a way of serving streaming | ||
| responses for queries. One way of solving this problem is to append an extra block at the end of the | ||
| CAR stream with information that clients can use to check whether all CAR blocks have been received. | ||
|
|
||
| ## Detailed design | ||
|
|
||
| CAR content type | ||
| ([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) | ||
| already supports optional parameters like `version` and `order`, which allow | ||
| an HTTP client to opt in via the `Accept` header and the Gateway to indicate via | ||
| the `Content-Type` header which CAR flavor is returned with the response. | ||
|
|
||
| The proposed solution introduces a new parameter for the CAR content type in HTTP requests | ||
| and responses: `meta`. | ||
|
|
||
| When the CAR content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR | ||
| block with metadata to the response, after it has sent all CAR blocks. | ||
|
|
||
| The metadata format is DAG-CBOR and open to extension, allowing standardized | ||
| userland experimentation similar to the Extensible Data field from IPNS V2. | ||
|
|
||
| See [CAR `meta` (content type parameter)](/http-gateways/trustless-gateway/#car-meta-content-type-parameter) | ||
| in Trustless Gateway specification for more details. | ||
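
A minimal sketch of the client-side opt-in described in this section, using only the Python standard library. The helper names, the gateway host, and the CID placeholder are hypothetical; the only parts taken from the proposal are the `application/vnd.ipld.car` media type and its `version` and `meta` parameters.

```python
import urllib.request


def car_accept_header(version: int = 1, meta: str = "eof") -> str:
    """Build an Accept value that opts into the trailing metadata block."""
    return f"application/vnd.ipld.car; version={version}; meta={meta}"


def build_car_request(gateway: str, cid: str) -> urllib.request.Request:
    """Prepare (but do not send) a CAR request against a trustless gateway."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return urllib.request.Request(url, headers={"Accept": car_accept_header()})


# Hypothetical gateway and CID, for illustration only.
req = build_car_request("https://gateway.example/", "bafyexample")
```

Passing `req` to `urllib.request.urlopen` would perform the retrieval. A client that omits `meta=eof` from the `Accept` header keeps receiving the plain CARv1 stream.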
|
|
||
| ## Design rationale | ||
|
|
||
| The proposal introduces a minimal change allowing Gateways and retrieval clients to explicitly opt | ||
| into receiving an additional metadata block at the end of the CAR response stream. | ||
|
|
||
| The metadata block is designed to be very flexible and able to support new use-cases that may arise | ||
| in the future. | ||
|
|
||
| ### User benefit | ||
|
|
||
| - Clients of trustless gateways can use the fields from the metadata as an attestation that they | ||
| performed the retrieval from the given server. | ||
|
|
||
| - The `len` field in the metadata block allows clients to verify whether they received all CAR | ||
| bytes, which provides a backward-compatible solution for the [CARv1 streaming problem](https://github.com/ipfs/specs/pull/332) until a new CAR version is introduced. | ||
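
A minimal sketch of the completeness check that `len` enables. It assumes the whole response body is available as bytes, that the metadata block is the final varint-length-prefixed section of the stream, and that `claimed_len` has already been decoded from the DAG-CBOR payload of that block (decoding is omitted because it needs a third-party library). All function names are hypothetical.

```python
from typing import Tuple


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at `pos`; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def last_section_offset(stream: bytes) -> int:
    """Walk the length-prefixed sections; return where the last one starts."""
    pos = 0
    last_start = None
    while pos < len(stream):
        start = pos
        length, pos = read_uvarint(stream, pos)
        if pos + length > len(stream):
            raise ValueError("truncated section")
        pos += length
        last_start = start
    if last_start is None:
        raise ValueError("empty stream")
    return last_start


def car_is_complete(stream: bytes, claimed_len: int) -> bool:
    """True when the bytes before the trailing metadata block match `len`."""
    return last_section_offset(stream) == claimed_len
```

A real client would also confirm that the CID of the final section uses the `car-metadata` codec before trusting it, and would treat a `ValueError` from the walk as a truncated response.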
|
|
||
| ### Compatibility | ||
|
Member
@bajtos Let's go extra mile here and elaborate what happens when CAR response with My suggestion is to add some clear statement about expected interop, like "libraries and implementations SHOULD ignore the suffix after 0x00", otherwise we will create a bad UX/DX, where developer tries to debug things with existing tooling and the tooling errors. I imagine we don't want things to fail due to
Author
It's a great idea to think about compatibility with existing & future tooling and clearly describe our thinking. 👍🏻 The most important aspect is avoiding the "0x00 insertion attack" vector. You can find more details in the section Zero-length-block insertion attacks (including the Filecoin-specific logic). I am cross-posting the mitigation I proposed:
When developers use existing tooling, they will never receive a CAR file with the There are two major ways how a CAR with a
I am arguing that (2) is a bug in the tooling, introduced by the change that modified Trustless Gateway requests to opt-into Regarding (1): do you think this will happen frequently enough to justify the effort required to change all libraries you mentioned to start ignoring the Maybe it's actually a good thing that the tooling reports an error because it tells the user they are using the new As an alternative to silently stripping the Thoughts?
https://github.com/ipld/go-car/blob/5c5d432d582564f88fd2124f2fce4f2f3e47a654/cmd/car/inspect.go#L26

    rd, err := carv2.NewReader(inStream, carv2.ZeroLengthSectionAsEOF(true))

https://github.com/ipld/js-car/blob/562c39266edda8422e471b7f83eadc8b7362ea0c/src/decoder.js#L94-L97

    let length = decodeVarint(await reader.upTo(8), reader)
    if (length === 0) {
      throw new Error('Invalid CAR section (zero length)')
    }

I guess I can test how existing tooling handles zero-length blocks and document this behaviour in the IPIP, so that we better understand the current landscape. |
||
|
|
||
| The new feature requires clients to explicitly ask the server to include the extra block via the `Accept` header, | ||
| therefore the change is fully backwards-compatible for all existing gateway clients. | ||
|
|
||
| Gateways receiving requests for the CAR content type can ignore the `meta` parameter if they don't | ||
| support it and return a response with one of the CAR content types they support. This makes the | ||
| proposed change backwards-compatible for existing gateways too. | ||
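
Because a gateway may ignore `meta`, a client cannot assume the trailing block is present. The sketch below assumes that a gateway honoring the request echoes `meta=eof` in the response `Content-Type`, in the same way the proposal describes for other CAR parameters; this echo behavior is an assumption rather than something the draft states outright. The function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_car_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into its media type and parameters."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params: Dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def expects_metadata_block(content_type: str) -> bool:
    """True only when the response declares a CAR with `meta=eof`."""
    media_type, params = parse_car_content_type(content_type)
    return media_type == "application/vnd.ipld.car" and params.get("meta") == "eof"
```

When this returns `False`, the client would parse the body as an ordinary CARv1 stream and skip any metadata checks.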
|
|
||
|
|
||
| ### Security | ||
|
|
||
| The proposed specification change does not introduce any negative security implications. | ||
|
|
||
| ### Alternatives | ||
|
bajtos marked this conversation as resolved.
|
||
|
|
||
| #### HTTP Trailers | ||
|
|
||
| Instead of adding a new content type parameter, we considered sending the additional metadata | ||
| in HTTP response trailers. Unfortunately, HTTP trailers are not widely supported by the ecosystem. | ||
| The Nginx proxy module discards them, the [browser `Fetch API` does not allow JS clients to access trailer | ||
| headers](https://github.com/mdn/browser-compat-data/issues/14703), and neither does the Rust `reqwest` client. | ||
|
|
||
| #### New Content-Type | ||
|
|
||
| We could introduce a new `Content-Type: application/vnd.ipld.car-stream` and | ||
| create a specification of its wire format that wraps CARv1 and includes | ||
| an additional DAG-CBOR manifest at the end. It would be effectively the same CAR | ||
| byte stream, but with a different `Content-Type`. | ||
|
|
||
| Downsides of this solution: | ||
|
|
||
| - maintenance cost: requires duplicating all CAR-related tests and features | ||
| - opportunity cost: creating a new content type increases cognitive | ||
| overhead for everyone working with IPFS over HTTP | ||
| - no backward-compatible interop with existing tools and gateways that only | ||
| speak `application/vnd.ipld.car` | ||
| - distracts us from working on things like large blocks and CARv3 | ||
|
lidel marked this conversation as resolved.
|
||
|
|
||
| ## Test fixtures | ||
|
|
||
| TBD | ||
|
|
||
| Using one CID, request the CAR data using various combinations of content type parameters. | ||
|
Comment on lines
+191
to
+193
Author
Flagging this TODO to show in the PR discussion. |
||
|
|
||
| ### Copyright | ||
|
|
||
| Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). | ||