feat: add support for GIT LFS checkout#3909

Open
lizrabuya wants to merge 10 commits into main from feat/sup-6529/lfs-checkout

Conversation

@lizrabuya

@lizrabuya lizrabuya commented May 11, 2026

Contributor

Description

Adds the configuration and environment variable for opt-in Git LFS support. When the backend sets BUILDKITE_GIT_LFS_ENABLED=true in the job environment, the agent checks that the git-lfs binary is installed and then performs a git lfs fetch followed by a git lfs checkout.

Changes

Configuration & CLI (clicommand/bootstrap.go, internal/job/config.go)

  • Added a new GitLFSEnabled bool field to ExecutorConfig, backed by env var BUILDKITE_GIT_LFS_ENABLED
  • Exposed it as a --git-lfs-enabled CLI flag on the bootstrap command with usage description
  • Wired the flag through from BootstrapConfig into ExecutorConfig

Checkout behaviour (internal/job/checkout.go)

  • Added a fail-fast binary check: if GitLFSEnabled is true and git-lfs is not on PATH, the checkout fails immediately with a descriptive error before any fetch/checkout work
  • Added git lfs install --local after clone/clean but before fetch, so the LFS filter is registered in the local repo config
  • Added a call to gitLFSFetchCheckout after the main git checkout (and after submodule updates) to explicitly pull down and materialise LFS objects

LFS fetch helper (internal/job/git.go)

  • Added gitLFSFetchCheckout, which runs git lfs fetch then git lfs checkout, wrapping each failure with a distinct error prefix ("git lfs fetch: ..." / "git lfs checkout: ...") so the failing step is identifiable from logs alone
  • gitLFSFetchCheckout supports opt-in retry: 5 attempts with subsecond exponential backoff + jitter, returning a *gitError{Type: gitErrorLFS, WasRetried: true} on exhaustion so the outer checkout retrier doesn't double-retry the same failure

Smudge bypass (internal/job/executor.go)

  • Set GIT_LFS_SKIP_SMUDGE=1 in setUp so LFS objects are never downloaded implicitly during git checkout; the explicit gitLFSFetchCheckout call handles download instead
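
As a rough illustration of the smudge bypass, the sketch below builds a command environment with GIT_LFS_SKIP_SMUDGE=1 so that a plain git checkout writes pointer files only. The helper name withSkipSmudge and the example checkout arguments are assumptions for this example; the PR sets the variable in the executor's setUp rather than per command.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withSkipSmudge returns a copy of env with GIT_LFS_SKIP_SMUDGE forced to 1.
// Any existing value is dropped so the result holds exactly one entry.
func withSkipSmudge(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "GIT_LFS_SKIP_SMUDGE=") {
			out = append(out, kv)
		}
	}
	return append(out, "GIT_LFS_SKIP_SMUDGE=1")
}

func main() {
	// The command is only constructed here, not run.
	cmd := exec.Command("git", "checkout", "-f", "main")
	cmd.Env = withSkipSmudge(os.Environ())
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```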

Tests (internal/job/checkout_test.go)

  • Added TestDefaultCheckoutPhase_GitLFS covering four scenarios: LFS disabled, LFS enabled with binary present, LFS enabled with binary missing (PATH restricted via t.Setenv), and LFS enabled with a failing git lfs command (fake git-lfs script that exits 1)

Windows test caveats

Two of the LFS error-path subtests ("LFS enabled git lfs command fails" and "LFS enabled git lfs fetch fails") are skipped on Windows. They cannot run there for a fundamental reason, not because of flakiness:

  • These tests work by overriding PATH to point at a fake git-lfs script that exits non-zero, so we can assert our error-wrapping behaviour.
  • Git for Windows ships its own git-lfs.exe inside GIT_EXEC_PATH (the Git installation's libexec/git-core directory). When git resolves a subcommand like git lfs ..., it looks in GIT_EXEC_PATH before falling back to PATH.
  • That means our PATH override is bypassed entirely on Windows — the real bundled git-lfs.exe runs, succeeds, and the test fails because we expected our fake to return an error.
  • The Go code being exercised (error wrapping in gitLFSFetchCheckout and git lfs install --local failure handling) is platform-agnostic, so the Linux runs give us the real coverage. Skipping on Windows is a test-harness limitation, not a behavioural gap.

Additionally, the two happy-path subtests (LFS disabled, LFS enabled binary present) use os.MkdirTemp + a best-effort RemoveAll in t.Cleanup rather than t.TempDir(). On Windows, git's child processes (credential helpers, the git-lfs filter-process) can keep file handles open past their parent's exit. t.TempDir()'s cleanup is strict and fails the test on unlinkat ... being used by another process errors; the best-effort variant lets the test pass while still cleaning up what it can.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go tool gofumpt -extra -w .)

Disclosures / Credits

I implemented the main function changes myself, based on the recommendations from the research done for this function. I used Claude to review the code and to help write the tests.

@lizrabuya lizrabuya changed the title Enable git-lfs checkout Add Git LFS Config and Environment Variables May 11, 2026
@lizrabuya lizrabuya changed the base branch from main to feat/git-checkout-features May 11, 2026 06:57
@lizrabuya lizrabuya changed the title Add Git LFS Config and Environment Variables Add Git LFS support May 11, 2026
@lizrabuya lizrabuya force-pushed the feat/sup-6529/lfs-checkout branch from f9e4186 to 255aa11 Compare May 11, 2026 07:10
@lizrabuya lizrabuya changed the title Add Git LFS support Add support for GIT LFS checkout May 11, 2026
@lizrabuya lizrabuya marked this pull request as ready for review May 22, 2026 04:48
@lizrabuya lizrabuya requested review from a team as code owners May 22, 2026 04:48
Comment thread internal/job/git.go
Comment thread internal/job/checkout_test.go
@ozdenyilmaz


@lizrabuya could you ask for a review in #ask-agents or in our project channel tagging Josh/Ming once you address Dahtey's comments?

@lizrabuya lizrabuya requested a review from dahtey-bk June 5, 2026 03:23
@lizrabuya lizrabuya changed the base branch from feat/git-checkout-features to main June 5, 2026 04:33
@lizrabuya lizrabuya force-pushed the feat/sup-6529/lfs-checkout branch 2 times, most recently from 242cb29 to 9a649dd Compare June 5, 2026 06:07
@lizrabuya lizrabuya changed the title Add support for GIT LFS checkout feat: add support for GIT LFS checkout Jun 5, 2026
@lizrabuya lizrabuya added the feature New user-facing feature! label Jun 5, 2026
@quinn-diesel

Contributor

going to tag and throw @buildsworth-bk in here for a review

@buildsworth-bk-app buildsworth-bk-app Bot left a comment

Adds opt-in Git LFS support to the default checkout: a --git-lfs-enabled flag, GIT_LFS_SKIP_SMUDGE=1 so checkout writes pointer files rather than downloading inline, and an explicit git lfs fetch / git lfs checkout after checkout with its own bounded retrier. The double-retry concern from the earlier round looks resolved — exhausting the inner retrier returns WasRetried: true, which breaks the outer checkout retrier, so a persistent LFS failure fails once rather than being re-attempted ~30 times.

Two things to confirm and one small cleanup, all inline. The one I'd most want an answer on is the env var name: the SUP-6529 issue specifies BUILDKITE_GIT_LFS, but this reads BUILDKITE_GIT_LFS_ENABLED — if those don't match what the backend emits, the feature silently never activates.

The PR has no risk: label, so I'm leaving this as a comment rather than an approval.

Want to dig deeper? The full session log is attached to this Buildkite build. Download the session file and open a new pi session with it:

Download the buildsworth logs from build 518, then answer my questions about the findings.

Comment thread clicommand/bootstrap.go
Comment thread internal/job/checkout.go Outdated
Comment on lines +943 to +948
// Fail fast before any git work if git-lfs is required but missing.
if e.GitLFSEnabled {
	if _, err := exec.LookPath("git-lfs"); err != nil {
		return fmt.Errorf("BUILDKITE_GIT_LFS_ENABLED=true but git-lfs binary is not found on PATH: %w", err)
	}
}

Non-blocking: Two things about this fail-fast check. First, the comment says "before any git work", but the mirror update and clone earlier in defaultCheckoutPhase have already run by the time we reach here. Second — and more impactful — the returned error is a plain fmt.Errorf, not a *gitError, so the outer checkout retrier in checkout() doesn't recognise it: errors.As(err, &errGit) fails and it falls through to the default: branch, which removes the checkout dir and retries. A missing git-lfs binary is deterministic, so this re-clones the repo on every attempt (up to ~6×) before finally failing — the opposite of fail-fast. Moving the check above the mirror/clone block and returning something that breaks the outer retrier (e.g. a *gitError{..., WasRetried: true}, the way gitLFSFetchCheckout does on exhaustion) would let it fail once, immediately.

Comment thread internal/job/checkout.go Outdated

@buildsworth-bk-app buildsworth-bk-app Bot left a comment

The interesting part here is the retry interaction. gitLFSFetchCheckout runs its own 5-attempt budget and tags an exhausted failure with WasRetried + gitErrorLFS, so the outer checkout retrier breaks (and gitErrorLFS falls through to return err without wiping the checkout dir) rather than multiplying this into ~30 attempts. I traced that path and it holds up. The fail-fast git-lfs lookup and the split fetch/checkout with distinct error prefixes are sensible.

Three non-blocking notes inline: a redundant GIT_LFS_SKIP_SMUDGE set, an env-var naming question worth confirming against the backend, and an error-path test that pays the full retry backoff. The PR also has no risk: label, so it falls outside AI-only approval.

Want to dig deeper? The full session log is attached to this Buildkite build. Download the session file and open a new pi session with it:

Download the buildsworth logs from build 519, then answer my questions about the findings.

Comment thread internal/job/config.go
Comment thread internal/job/checkout.go Outdated
Comment thread internal/job/checkout_test.go
@mcncl

mcncl commented Jun 16, 2026

Contributor

@buildsworth-bk review please

@buildsworth-bk-app

On it! Follow along with my review here: https://buildkite.com/buildkite/buildsworth-review-pull-request/builds/751

@buildsworth-bk-app buildsworth-bk-app Bot left a comment

Previous Buildsworth feedback looks resolved: the env-var naming question has a backend-confirmation reply, the duplicate GIT_LFS_SKIP_SMUDGE set was removed, and the retry-test runtime note has an explicit author decision. This pass found two checkout-path issues that I think need fixing before merge: the fail-fast binary check uses a different environment than the checkout commands, and LFS still performs an unfiltered fetch for sparse checkouts.

Want to dig deeper? The full session log is attached to this Buildkite build. Download the session file and open a new pi session with it:

Download the buildsworth logs from build 751, then answer my questions about the findings.

Comment thread internal/job/checkout.go
	// Fail fast before any git work if git-lfs is required but missing.
	// This operation only handles default checkout behavior, so it's possible for a custom checkout hook to require git-lfs but not have this check. That's a bit unfortunate, but we can add it to custom hooks later if needed.
	if e.GitLFSEnabled {
		if _, err := exec.LookPath("git-lfs"); err != nil {

Blocking: exec.LookPath reads the agent process PATH, but checkout commands are resolved from e.shell.Env via shell.Command / AbsolutePath, and hooks/plugins can mutate that environment before checkout. That can fail a job where git-lfs was added to the job PATH, or pass this check even though the shell PATH used for git lfs ... cannot resolve it. Can this check use the shell environment too (for example e.shell.AbsolutePath("git-lfs")) and cover that with a test that changes sh.Env rather than only t.Setenv?

Comment thread internal/job/checkout.go
}
}

if e.GitLFSEnabled {

Blocking: sparseCheckoutActive is ignored here, so BUILDKITE_GIT_LFS_ENABLED=true with sparse checkout still runs an unfiltered git lfs fetch. That is the combination SUP-6529 calls out to avoid: sparse checkout users can still download LFS objects outside the sparse set, defeating the sparse checkout and potentially blowing up disk/network usage. Can this branch be gated on !sparseCheckoutActive, or otherwise constrained to the sparse paths, before merging?

Comment thread internal/job/git.go
Comment on lines +189 to +192
if err != nil && args.Retry {
	return &gitError{error: err, Type: gitErrorLFS, WasRetried: true}
}
return err

Contributor


We don't seem to be returning a *gitError if retrying is not set to true here; only the bare err comes back. WasRetried should be set to the value of args.Retry, so that even if retries are disabled, the error's context is not lost.


Labels

feature New user-facing feature!


5 participants