[LoopSplitting][MicroBenchmarks] Benchmarking Loop Splitting Transformation #379
Meinersbur left a comment:
https://llvm.org/docs/AIToolPolicy.html
Contributors are expected to be transparent and label contributions that contain substantial amounts of tool-generated content.
Benchmarks in llvm-test-suite are meant to track improvements in compiler optimizations. With #pragma omp split you have already told the compiler which optimization to apply. There is very little optimization a compiler can do after that; basically it can only apply the same optimization 4 times (to each generated loop individually). There is no expectation that this would improve in future versions of Clang. You are basically illustrating the speed difference #pragma omp split makes, which is useful for programmers considering using it, but it is a waste of time for compiler engineers who want to improve code optimization passes. That is, I don't think we need a benchmark for split upstream.
# Copy this directory to llvm-test-suite/MicroBenchmarks/LoopSplit/
# and add: add_subdirectory(LoopSplit) to MicroBenchmarks/CMakeLists.txt.
#
# Configure test-suite with a Clang that supports -fopenmp and -fopenmp-version=60.
Remove the instructions from the AI that you obviously applied.
Thanks for pointing that out; I will keep it in mind for future reference.
// Kernel: sum 0..(N-1) with split into four segments.
static long run_split() {
  long sum = 0;
long is 32 bits on 32-bit platforms and on 64-bit Windows. It will overflow on 19999999900000000.
static void BM_Split(benchmark::State &state) {
  long x = 0;
  for (auto _ : state)
    benchmark::DoNotOptimize(x += run_split());
Suggested change:
- benchmark::DoNotOptimize(x += run_split());
+ auto x = run_split();
+ benchmark::DoNotOptimize(x);
Understood. Thanks for reviewing and for sharing the doc; I will go through it.
This PR adds benchmarks for the upstream PR implementing the #pragma omp split directive.