Add a minimal data-parallel distributed example (#2930) #3577

Closed
adityasingh2400 wants to merge 1 commit into ml-explore:main from adityasingh2400:doc-distributed-examples-2930
@adityasingh2400
Contributor

Summary

Closes (partially) #2930 — awni asked for a few short, standalone mlx.distributed examples back in December 2025 and nothing landed in the months since. This PR starts small with a single data-parallel example so we can lock in the structure before adding more.

  • examples/distributed/data_parallel/main.py: ~80-line MLP trained on a synthetic classification task. Each rank gets a different data shard; gradients are averaged with nn.average_gradients (which batches mx.distributed.all_sum calls).
  • examples/distributed/README.md: how to run with mlx.launch -n 2, plus a pointer to the existing distributed docs.

The script also runs unchanged under plain python because mx.distributed ops are no-ops at world size 1, so it doubles as a smoke test.
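
A toy illustration of why the single-process run is equivalent: with one rank, summing over ranks and dividing by the world size returns the local gradient unchanged, and with equal-sized shards the average of the per-rank gradients equals the full-batch gradient. The sketch below does not use MLX at all. Its all_sum and average_gradients are hypothetical pure-Python stand-ins that take a list with one scalar gradient per simulated rank.

```python
def all_sum(per_rank_values):
    """Toy stand-in for a sum all-reduce: every rank ends up with the total."""
    total = sum(per_rank_values)
    return [total] * len(per_rank_values)


def average_gradients(per_rank_grads):
    """Average one scalar gradient per rank using the toy all_sum."""
    world_size = len(per_rank_grads)
    return [g / world_size for g in all_sum(per_rank_grads)]


def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [0.5, -1.0, 2.0, 1.5]
ys = [3.0 * x for x in xs]
w = 0.0

# World size 1: averaging leaves the local gradient untouched.
single = average_gradients([mse_grad(w, xs, ys)])

# World size 2: each simulated rank takes a strided half of the data.
two_rank = average_gradients([mse_grad(w, xs[r::2], ys[r::2]) for r in range(2)])

print(single, two_rank)  # prints [-11.25] [-11.25, -11.25]
```

The equality with the full-batch gradient relies on every rank holding the same number of samples. With uneven shards, a plain mean over ranks would weight the samples on smaller shards more heavily.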

If this structure looks good, I'm happy to send a follow-up PR with a tensor-parallel inference example (and any others you have in mind).

Test plan

  • python3 -m py_compile examples/distributed/data_parallel/main.py
  • mlx.launch -n 2 examples/distributed/data_parallel/main.py (maintainer-side, requires a built MLX with a distributed backend available)
  • python examples/distributed/data_parallel/main.py (single process, should run with any MLX install)

awni requested simple mlx.distributed examples in ml-explore#2930 in
December 2025; nothing has landed since. Add an
examples/distributed/data_parallel example: a small MLP trains on a
synthetic classification task, with gradients averaged across ranks
via nn.average_gradients (built on mx.distributed.all_sum) on every
training step. The README documents the mlx.launch invocation.

A tensor-parallel example can follow in a separate PR if this
structure works for you.

Fixes ml-explore#2930 (data-parallel example; tensor-parallel to follow)
@zcbenz
Collaborator

zcbenz commented May 23, 2026

Closing since this is not even wrong.

@zcbenz zcbenz closed this May 23, 2026