Skip to content

fix(windows): associate machine-scoped TPM keys with stored certificates#1053

Merged
joshdrake merged 3 commits into
masterfrom
diego/capi-machine-keyset-cert-association
Jun 12, 2026
Merged

fix(windows): associate machine-scoped TPM keys with stored certificates#1053
joshdrake merged 3 commits into
masterfrom
diego/capi-machine-keyset-cert-association

Conversation

@darkfronza

Copy link
Copy Markdown
Contributor

Summary

On Windows, certificates stored for machine-scoped TPM (Microsoft Platform Crypto Provider) keys were left without an associated private key, making them unusable (e.g. for Wi-Fi/EAP-TLS, VPN). This was surfaced by the step-agent's windows-key-scope work, which moves the agent's AK — and therefore its attested endpoint keys — to the local machine keyset.

Root cause

The certificate→key association ran exclusively through CryptFindCertificateKeyProvInfo (discovery), which fails here for two compounding reasons:

  1. It is skipped. kms/platform's transformToTPMKMS sets skip-find-certificate-key=true on Windows by default, so discovery never runs for these certificates.
  2. It cannot find the key anyway. CryptFindCertificateKeyProvInfo does not search the local machine keyset, so it cannot discover a machine-scoped PCP/TPM key even when asked.

Keeping endpoint keys user-scoped isn't an option: tpm.AttestKey requires an attested key to share its AK's scope (an attest-session constraint), and the AK is machine-scoped.

Fix — associate explicitly instead of by discovery

  • tpmkms hands the CAPI layer the exact key when storing to the Windows certificate store: its CNG container name (tpm.ApplicationKeyName, the app--prefixed key name used to persist the key), the TPM provider, and whether the key lives in the local machine keyset.
  • capi sets CERT_KEY_PROV_INFO_PROP_ID directly via setCertificateKeyProvInfo when a container name is supplied. This needs no container enumeration, so it never prompts for a smart card and runs even when skip-find-certificate-key is set; it marks the prov info with CRYPT_MACHINE_KEYSET for machine-scoped keys.
  • The CRYPT_KEY_PROV_INFO struct is corrected to pointer/DWORD fields so it can be used to set the property (it was int-typed and effectively read-only).
  • The discovery fallback (used when no key is named) now searches both the user and machine keysets for machine-location certificates.
  • tpm.ApplicationKeyName is exported as the single source of truth for the persisted application-key name, shared with tpmkms.

Testing

  • GOOS=windows go build ./kms/... ./tpm/... and host go build both clean; go vet shows only a pre-existing unsafe.Pointer line; tpm name tests pass.
  • Verified end-to-end on a Windows host with a TPM: a freshly-issued hardware-attested endpoint certificate now has its private key linked in LocalMachine\My (previously orphaned).

🤖 Generated with Claude Code

On Windows the agent stores attested endpoint certificates through the
platform KMS, which sets skip-find-certificate-key and uses machine-scoped
Platform Crypto Provider (TPM) keys. The certificate-to-key association
relied on CryptFindCertificateKeyProvInfo, which (a) is skipped when
skip-find-certificate-key is set and (b) cannot discover keys in the local
machine keyset. As a result the certificate was stored without a linked
private key, leaving it unusable.

Associate the certificate with its key explicitly instead of by discovery:

- tpmkms hands the CAPI layer the exact key when storing to the Windows
  certificate store — its CNG container name (tpm.ApplicationKeyName, the
  "app-"-prefixed key name used to persist the key), the TPM provider, and
  whether it lives in the local machine keyset.
- capi sets CERT_KEY_PROV_INFO_PROP_ID directly via setCertificateKeyProvInfo
  when a container name is supplied. This needs no container enumeration, so
  it never prompts for a smart card and runs even with
  skip-find-certificate-key set. It also marks the prov info with
  CRYPT_MACHINE_KEYSET for machine-scoped keys.
- The CRYPT_KEY_PROV_INFO struct is corrected to use pointer/DWORD fields so
  it can be used to set the property (it was int-typed and read-only before).
- The discovery fallback now searches both the user and machine keysets for
  machine-location certificates.

tpm.ApplicationKeyName is exported as the single source of truth for the
persisted application-key name, shared with tpmkms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to avoid the new parameter, we know using tpmkms parameter what would be the value of the key-machine-keyset, let's use the same logic, adding the already existing key-scope if needed.

I also have some comments about Microsoft naming.

Comment thread kms/capi/capi.go Outdated
Comment thread kms/capi/capi.go Outdated
Comment thread kms/capi/capi.go
Comment thread kms/capi/capi.go Outdated
Comment thread kms/capi/ncrypt_windows.go Outdated
Comment thread kms/tpmkms/tpmkms.go Outdated
- Drop the new capi "key-machine-keyset" parameter. tpmkms now forwards the
  existing "key-scope", and capi derives the keyset with the same resolution
  tpmkms uses (key-scope, falling back to store-location). Avoids a redundant
  parameter per review.
- Restrict the discovery fallback to a single keyset by scope instead of
  setting both flags (which is the default and searches both anyway).
- Rename the uriAttributes "containerName" field to "keyContainerName" to
  match CRYPT_KEY_PROV_INFO's pwszContainerName ("key container").
- Drop the runtime.KeepAlive calls in setCertificateKeyProvInfo; info keeps
  the strings reachable through the call, matching the other property helpers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@darkfronza

Copy link
Copy Markdown
Contributor Author

Thanks for the review. Pushed b0e2839 addressing the feedback:

  • Dropped the new key-machine-keyset parameter; tpmkms forwards the existing key-scope and capi resolves the keyset itself (same logic as tpmkms.isMachineKey(), falling back to store-location).
  • Discovery fallback now restricts to the matching keyset by scope instead of setting both flags.
  • Renamed the uriAttributes.containerName field to keyContainerName.
  • Removed the runtime.KeepAlive calls in setCertificateKeyProvInfo.

One thing worth a look: the discovery fallback now restricts to the scope's keyset both ways (machine→machine, user→user), whereas before it searched both. It's more precise and matches the "a machine cert shouldn't probe the user container" point, but it's a subtle behavior change for the general (non-tpmkms) StoreCertificate discovery path. Since the tpmkms path now uses explicit association and rarely hits discovery, the risk is low — happy to revert just that part to the both-keysets default if you'd prefer.

The fix is verified working end-to-end on a Windows TPM host (attested endpoint cert now has its private key linked in LocalMachine\My).

@darkfronza darkfronza requested a review from maraino June 11, 2026 23:10

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think we might leave keysetFlags as 0 by default if u.isMachineKeySet() is false. See comment.

Comment thread kms/capi/capi.go Outdated
Comment on lines +955 to +958
keysetFlags := CRYPT_FIND_USER_KEYSET_FLAG
if u.isMachineKeyset() {
keysetFlags = CRYPT_FIND_MACHINE_KEYSET_FLAG
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be ok, but I'm not sure if it will be better to use both by default, I think leaving the keysetFlags to 0 will do it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated on 25d69a9

CryptFindCertificateKeyProvInfo searches both the user and machine containers
when no keyset flag is passed (its documented default, and what the library did
before this change). The discovery fallback now defaults to that both-containers
search and restricts to a single keyset only when the key scope is known:
explicit key-scope=user (new isUserKeyset helper) → user, otherwise the existing
isMachineKeyset check → machine. This preserves the legacy behavior for callers
that don't specify a key scope while still finding machine-scoped keys.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@darkfronza

Copy link
Copy Markdown
Contributor Author

Pushed 25d69a9 to make the key-discovery fallback backward-compatible.

Context: CryptFindCertificateKeyProvInfo searches both the user and machine key containers when no keyset flag is passed — that's its documented default ("The default is to search both the user and machine containers") and what this library did before the PR. My earlier revision defaulted the discovery path to CRYPT_FIND_USER_KEYSET_FLAG, which narrowed that to user-only — a behavior change for existing callers that don't specify a key scope.

Change: the discovery fallback now defaults to 0 (both containers) and only restricts when the key scope is actually known:

var keysetFlags uint32 // 0 = search both containers (CryptFindCertificateKeyProvInfo default)
switch {
case u.isUserKeyset():    // explicit key-scope=user
    keysetFlags = CRYPT_FIND_USER_KEYSET_FLAG
case u.isMachineKeyset(): // key-scope=machine, or store-location=machine fallback
    keysetFlags = CRYPT_FIND_MACHINE_KEYSET_FLAG
}

Behavior:

key-scope store-location search vs. pre-PR
(unset) user (default) both preserved
(unset) machine machine narrowed (intended)
user any user narrowed
machine any machine narrowed

So the common legacy case (no key-scope, default store) keeps the both-containers search, and we only narrow on an explicit signal.

One deliberate asymmetry: the new isUserKeyset() checks only explicit key-scope, while isMachineKeyset() keeps its store-location=machine fallback. That's intentional — adding a user-side fallback would make the defaulted store-location=user over-restrict to user-only and defeat the back-compat goal. The only edge it doesn't cover is an (exotic) machine-store cert whose key is in the user keyset, discovered without a container name; the explicit-association path — which the agent always takes — is unaffected.

For background, this is consistent with the canonical Windows behavior: a certificate's store location and its private key's keyset are independent axes (the link is the cert's CERT_KEY_PROV_INFO_PROP_ID / CRYPT_KEY_PROV_INFO, which has no store field), so it's correct to leave the keyset unspecified — and let discovery search both — unless the caller tells us the scope.

@joshdrake joshdrake merged commit 889cf4e into master Jun 12, 2026
10 of 12 checks passed
@joshdrake joshdrake deleted the diego/capi-machine-keyset-cert-association branch June 12, 2026 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants