Add parallel tuning on multiple remote GPUs using Ray by isazi · Pull Request #328 · KernelTuner/kernel_tuner

isazi · 2025-08-13T09:39:21Z

Working on a simple parallel runner that uses Ray to distribute the benchmarking of different configurations to remote Ray workers.
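The general pattern can be sketched without a cluster. The following is a minimal stand-in that swaps Ray for standard-library threads. Each worker is created once with a device ID and the tuning options, then benchmarks configurations pulled from a shared queue. The host flattens the per-worker results at the end. `BenchmarkWorker`, `fake_benchmark`, `tune_in_parallel`, and the `block_size_x` parameter are hypothetical names for illustration. They are not Kernel Tuner's or Ray's API.

```python
import queue
import threading


def fake_benchmark(config, device_id):
    """Hypothetical stand-in for compiling and timing one kernel configuration."""
    # Pretend larger blocks are faster; a real worker would run the kernel on its GPU.
    return {"config": config, "device": device_id, "time_ms": 100.0 / config["block_size_x"]}


class BenchmarkWorker:
    """Hypothetical stateful worker: options are stored once, not sent with every job."""

    def __init__(self, device_id, tuning_options):
        self.device_id = device_id
        self.tuning_options = tuning_options
        self.results = []

    def run(self, jobs):
        # Keep pulling configurations until the shared queue is empty.
        while True:
            try:
                config = jobs.get_nowait()
            except queue.Empty:
                return
            self.results.append(fake_benchmark(config, self.device_id))


def tune_in_parallel(configs, num_devices, tuning_options):
    jobs = queue.Queue()
    for config in configs:
        jobs.put(config)
    # Threads stand in for remote Ray workers, one per (simulated) GPU.
    workers = [BenchmarkWorker(i, tuning_options) for i in range(num_devices)]
    threads = [threading.Thread(target=w.run, args=(jobs,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten the per-worker result lists into one list for the host.
    return [r for w in workers for r in w.results]


configs = [{"block_size_x": b} for b in (32, 64, 128, 256)]
results = tune_in_parallel(configs, num_devices=2, tuning_options={"verbose": False})
best = min(results, key=lambda r: r["time_ms"])
print(best["config"])  # {'block_size_x': 256}
```

In Ray, the closest analogue to `BenchmarkWorker` is an actor class declared with `@ray.remote(num_gpus=1)`, so each actor is pinned to one GPU. This mapping is only an illustration, not a description of the PR's code.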

sonarqubecloud · 2025-08-13T09:39:59Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

stijnh · 2026-01-20T13:32:17Z

The current parallel runner works. I've been able to run on multiple GPUs on DAS6-VU and DAS6-Leiden.

There are several remaining problems:

- The timings are incorrect, as the host assumes that the total time is simply the sum over the individual configurations
- The use of tuning_options needs to be refactored, as currently the entire object is sent to every node for each benchmark job
- Logging information can be improved
- The strategies are not yet parallel-aware (except brute-force)
- A guide needs to be added to the docs explaining how to launch a Ray cluster on DAS6
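To illustrate the first point: when configurations run concurrently, the sum of their individual times overstates the elapsed time. The following minimal sketch assumes each job reports hypothetical `(start, end)` timestamps on a common wall clock. It measures elapsed time as the length of the union of the job intervals instead of their sum.

```python
def summed_time(intervals):
    """What a sequential runner would report: the sum of the job durations."""
    return sum(end - start for start, end in intervals)


def elapsed_time(intervals):
    """Wall-clock time covered by at least one job (length of the union of intervals)."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Four 1-second jobs on two GPUs, two at a time (times in seconds).
jobs = [(0.0, 1.0), (0.0, 1.0), (1.0, 2.0), (1.0, 2.0)]
print(summed_time(jobs))   # 4.0
print(elapsed_time(jobs))  # 2.0
```

Using the union rather than `max(end) - min(start)` also keeps idle gaps between jobs out of the total. Whether idle time should count is a design choice the runner would have to make.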


benvanwerkhoven · 2026-02-16T14:44:45Z

Sometimes with Python you run into an error and think: 'How on Earth has this error not surfaced years ago?'

It seems that observers has defaulted to None instead of an empty list since forever, and miraculously this was never an issue. The code responsible for replacing a None value with an empty list is currently hidden in (and duplicated across) the backends. Because there is now code that (rightfully so) assumes observers is a list just before the backends are created, this is suddenly an issue. The real issue is, of course, that the backends have somehow become responsible for sanitizing user input, which is not what a backend should do.
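The following minimal sketch sanitizes the argument once, at the user-facing entry point, so that no backend has to. `None` becomes an empty list and a list or tuple is copied. Wrapping a single observer object in a list is an extra convenience assumed here. The names `normalize_observers` and `DummyObserver` are hypothetical, and the PR's actual fix ("ensure observers is a list in DeviceInterface") may differ in detail.

```python
class DummyObserver:
    """Hypothetical observer used only for this example."""


def normalize_observers(observers):
    """Return a fresh list of observers, whatever the user passed in (hypothetical helper)."""
    if observers is None:
        return []
    if isinstance(observers, (list, tuple)):
        # Copy so later appends do not mutate the caller's list.
        return list(observers)
    # Assumption: a single observer object is wrapped in a list.
    return [observers]


print(normalize_observers(None))                  # []
print(len(normalize_observers(DummyObserver())))  # 1
```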


sonarqubecloud · 2026-02-17T13:53:00Z

Quality Gate passed

Issues
77 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud


sonarqubecloud · 2026-04-07T15:47:30Z

Quality Gate passed

Issues
80 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

stijnh · 2026-04-20T07:31:30Z

I have tested this on Snellius, and it appears to work fine with multiple GPUs across multiple nodes.

Remaining issues:

- overhead_time can sometimes be negative; this needs some investigation
- Fix the issues flagged by SonarQube
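One plausible source of the negative values is offered here only as a hypothesis to check. If overhead_time is computed as elapsed wall-clock time minus the summed benchmark times, it goes negative as soon as benchmarks overlap on several GPUs. The sketch below uses hypothetical record fields (`worker`, `start`, `end`, `benchmark_time`). It contrasts that global subtraction with a per-worker accounting, where each worker's overhead is its own busy span minus its own benchmark time. The per-worker figure stays non-negative as long as each worker runs one configuration at a time.

```python
from collections import defaultdict


def naive_overhead(wall_time, records):
    """Global wall-clock time minus the sum of all benchmark times."""
    return wall_time - sum(r["benchmark_time"] for r in records)


def per_worker_overhead(records):
    """Sum over workers of (busy span minus benchmark time on that worker)."""
    by_worker = defaultdict(list)
    for r in records:
        by_worker[r["worker"]].append(r)
    total = 0.0
    for jobs in by_worker.values():
        span = max(j["end"] for j in jobs) - min(j["start"] for j in jobs)
        total += span - sum(j["benchmark_time"] for j in jobs)
    return total


# Two workers, each running two jobs back to back; 0.25 s of non-benchmark time per job.
records = [
    {"worker": 0, "start": 0.0, "end": 1.0, "benchmark_time": 0.75},
    {"worker": 0, "start": 1.0, "end": 2.0, "benchmark_time": 0.75},
    {"worker": 1, "start": 0.0, "end": 1.0, "benchmark_time": 0.75},
    {"worker": 1, "start": 1.0, "end": 2.0, "benchmark_time": 0.75},
]
print(naive_overhead(2.0, records))  # -1.0
print(per_worker_overhead(records))  # 1.0
```

Whether this matches how overhead_time is actually computed would need to be checked against the runner code.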

sonarqubecloud · 2026-05-28T13:29:04Z

Quality Gate passed

Issues
57 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

isazi added 15 commits May 23, 2025 11:12

Typo.

37c5338

Add missing parameter to the interface.

6c5b360

Formatting.

a21caf8

First early draft of the parallel runner.

a2328c4

Merge branch 'master' into parallel_runner

c5896ad

Need a dummy DeviceInterface even on the master.

68a569b

Missing device_options in state.

9d0dee4

Flatten the results.

aff21f0

Various bug fixes.

d7e8cae

Add another example for the parallel runner.

b4ff7fa

Merge branch 'master' into parallel_runner

dd4f5ff

Merge branch 'master' into parallel_runner

5cb0243

Merge branch 'master' into parallel_runner

c4f7f32

Merge branch 'master' into parallel_runner

dd4a4ed

Merge branch 'master' into parallel_runner

e322824

isazi self-assigned this Aug 13, 2025

isazi added the enhancement label Aug 13, 2025

isazi marked this pull request as draft August 13, 2025 09:39

Rewrite parallel runner to use stateful actors

426dd2a

stijnh changed the title from "Simple parallel runner" to "Add parallel tuning on multiple remote GPUs using Ray" Jan 19, 2026

stijnh self-assigned this Jan 19, 2026

stijnh added 3 commits January 19, 2026 16:49

Merge branch 'master' into parallel_runner

baf4fd1

Move tuning_options to constructor of ParallelRunner

f585d42

Fix several errors related to parallel runner

ad55ba4

stijnh added 4 commits January 20, 2026 17:33

Extend several strategies with support for parallel tuning: DiffEvo, FF, GA, PSO, all hillclimbers, random

4d8f4f5

Add pcu_bus_id to environment for Nvidia backends

fd41333

Add support eval_all in CostFunc

96e168d

Remove return_raw from CostFunc as it is unused

d7129cd

stijnh and others added 4 commits February 10, 2026 18:15

Change how timings are collected in all runners

d844c19

Add ability to pass lambda factory functions as observers

4a57daa

Add test to check if observer can be passed as lambda function

3ce803f

fix test_runner

3d54317

benvanwerkhoven and others added 3 commits February 16, 2026 15:58

ensure observers is a list in DeviceInterface

d74271b

Print warning in parallel worker when no progress is made for 60 seconds

a728e0c

Add more environment information in HIP backend (device UUID, device properties, driver version, etc)

3c0e6c6

stijnh added 7 commits March 4, 2026 20:47

Change PMTObserver to perform initialization inside register_device instead of `__init__`

32c3d8a

Add UUID to environment for cupy and nvcuda backends

f3d84ff

Change NVMLObserver to perform initialization inside `register_device` instead of `__init__`

0486fa5

Let cache file get device name from runner.get_device_info() instead of `runner.dev.name`

0d5e9d8

Fix wrong variable name in nvml

0225f42

Merge remote-tracking branch 'origin/master' into parallel_runner

5592104

Add simulation_mode=False to ParallelRunner

8d1569d

stijnh force-pushed the parallel_runner branch 3 times, most recently from ee74137 to a769aea April 7, 2026 15:44

Fix test for test_process_cache

c6345b5

stijnh force-pushed the parallel_runner branch from a769aea to c6345b5 April 7, 2026 15:46

stijnh mentioned this pull request Apr 20, 2026

Add AMDSMIObserver that uses amdsmi to measure energy #372 (merged)

stijnh added 3 commits May 28, 2026 09:51

Merge remote-tracking branch 'origin/master' into parallel_runner

a37fe94

Replace parallel_runner by parallel in examples

38a6694

Remove imports of all backends from core.py

9222fae

benvanwerkhoven marked this pull request as ready for review May 28, 2026 13:25

Remove trailing whitespaces in several files

0dad036

isazi wants to merge 68 commits into master from parallel_runner

isazi commented Aug 13, 2025

Uh oh!

sonarqubecloud Bot commented Aug 13, 2025

Uh oh!

stijnh commented Jan 20, 2026 •

edited

Loading

Uh oh!

benvanwerkhoven commented Feb 16, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented Feb 17, 2026

Uh oh!

sonarqubecloud Bot commented Apr 7, 2026

Uh oh!

stijnh commented Apr 20, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

isazi commented Aug 13, 2025

Uh oh!

sonarqubecloud Bot commented Aug 13, 2025

Quality Gate passed

Uh oh!

stijnh commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benvanwerkhoven commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented Feb 17, 2026

Quality Gate passed

Uh oh!

sonarqubecloud Bot commented Apr 7, 2026

Quality Gate passed

Uh oh!

stijnh commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 28, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stijnh commented Jan 20, 2026 •

edited

Loading

benvanwerkhoven commented Feb 16, 2026 •

edited

Loading

stijnh commented Apr 20, 2026 •

edited

Loading