Implement batched matmul for large 1D dot products #3580

Open
Ved235 wants to merge 2 commits into ml-explore:main from Ved235:main

Ved235 commented May 22, 2026

Proposed changes

Addresses issue #3533. Adds routing logic in mlx/ops.cpp that splits a large 1D dot product into chunks, so that the work goes through gemv and parallelizes across the chunks.
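
To illustrate the idea behind the chunking, here is a minimal pure-Python sketch, not the code from this PR. It relies only on the identity that a dot product equals the sum of the dot products of its chunks. The function name chunked_dot, the chunk_size parameter, and the handling of a shorter final chunk are all assumptions made for illustration; the actual routing and chunk sizes live in mlx/ops.cpp.

```python
def chunked_dot(a, b, chunk_size):
    """Dot product of two equal-length sequences via per-chunk partial sums.

    Illustrative only: chunk_size and the treatment of a shorter final
    chunk are assumptions, not values taken from the PR.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Each partial sum depends only on its own chunk, so the chunks can be
    # processed independently of one another before the final reduction.
    partials = [
        sum(x * y for x, y in zip(a[i:i + chunk_size], b[i:i + chunk_size]))
        for i in range(0, len(a), chunk_size)
    ]
    # Second, much smaller reduction over the per-chunk results.
    return sum(partials)


a = list(range(10))
b = [2] * 10
# Same value as the unchunked sum(x * y for x, y in zip(a, b)).
print(chunked_dot(a, b, chunk_size=4))
```

On arrays, the list of chunks corresponds to viewing each vector as a (num_chunks, chunk_size) matrix, which is the shape that lets a matrix-vector kernel spread the rows over parallel workers. With floating-point inputs the chunked result can differ from a single left-to-right sum in the last bits, because the additions are grouped differently.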

Benchmark

import mlx.core as mx
import numpy as np
import time

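def bench(fn, rounds=100, label=""):
    # Warm-up runs so one-time setup cost is excluded from the timings.
    for _ in range(3):
        r = fn()
        mx.eval(r)

    times = []
    for _ in range(rounds):
        mx.eval()  # no arrays are passed, so there is nothing to evaluate here
        t0 = time.perf_counter()
        r = fn()
        mx.eval(r)  # MLX is lazy: force the computation before stopping the timer
        times.append(time.perf_counter() - t0)

    # Report the median, fastest, and slowest of the timed rounds.
    times.sort()
    median = times[len(times) // 2]
    best = times[0]
    worst = times[-1]
    print(f"{label}")
    print(f"median={median*1000:.3f}ms | min={best*1000:.3f}ms | max={worst*1000:.3f}ms")
    return r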

a = mx.random.normal(shape=(50_000_000,), dtype=mx.float32)
b = mx.random.normal(shape=(50_000_000,), dtype=mx.float32)

# NumPy arrays over the same data (not used by the timing below).
a_np = np.array(a, copy=False)
b_np = np.array(b, copy=False)

result = bench(lambda: mx.inner(a, b), label="MLX native")

print(f"mx.inner : {float(result)}")

With this benchmark script, the median time for mx.inner on two 50,000,000-element float32 vectors drops by roughly 8.8x, from

median=15.393ms | min=15.323ms | max=15.769ms

to

median=1.741ms | min=1.682ms | max=1.835ms
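
As a rough sanity check on these numbers, the sketch below turns the two reported medians into a speedup and an effective read bandwidth. It assumes each element of both vectors is read from memory exactly once and ignores caching and launch overhead, so the bandwidth figures are estimates rather than measurements.

```python
# Figures taken from the benchmark script and its output above.
n = 50_000_000            # elements per vector
bytes_per_elem = 4        # float32
median_before_ms = 15.393
median_after_ms = 1.741

speedup = median_before_ms / median_after_ms

# Assumption: both input vectors are read from memory exactly once.
bytes_read = 2 * n * bytes_per_elem


def effective_gb_per_s(ms):
    """Bytes read divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return bytes_read / (ms / 1000) / 1e9


bw_before = effective_gb_per_s(median_before_ms)
bw_after = effective_gb_per_s(median_after_ms)

print(f"speedup: {speedup:.2f}x")
print(f"effective bandwidth before: {bw_before:.1f} GB/s")
print(f"effective bandwidth after:  {bw_after:.1f} GB/s")
```

Comparing the second figure with the memory bandwidth of the machine used for the benchmark gives a sense of how close the chunked path is to being memory-bound; the hardware is not stated here, so that comparison is left open.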

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)
