Commit b559793

Fix Google Benchmark usage to reveal true improvements by using benchmark::DoNotOptimize; add plots and automate plot generation
1 parent 7856f95 commit b559793

6 files changed

Lines changed: 105 additions & 142 deletions

README.md

Lines changed: 1 addition & 95 deletions
@@ -40,101 +40,7 @@ A high-performance C++ framework for SIMD (Single Instruction Multiple Data) ope
 
 Performance improvements comparing SIMD operations vs. standard operations:
 
-#### Float256 Operations (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.069 ms | 0.273 ms | 3.96x |
-| Subtraction | 0.066 ms | 0.270 ms | 4.09x |
-| Multiplication | 0.072 ms | 0.292 ms | 4.06x |
-| Division | 0.084 ms | 0.701 ms | 8.35x |
-
-#### Double256 Operations (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.064 ms | 0.146 ms | 2.28x |
-| Subtraction | 0.071 ms | 0.151 ms | 2.13x |
-| Multiplication | 0.077 ms | 0.213 ms | 2.77x |
-| Division | 0.130 ms | 0.464 ms | 3.57x |
-
-#### Int128 Operations with int32_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.051 ms | 0.134 ms | 2.63x |
-| Subtraction | 0.063 ms | 0.140 ms | 2.22x |
-| Multiplication | 0.072 ms | 0.196 ms | 2.72x |
-
-#### Int128 Operations with int16_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.055 ms | 0.285 ms | 5.18x |
-| Subtraction | 0.046 ms | 0.267 ms | 5.80x |
-| Multiplication | 0.044 ms | 0.444 ms | 10.09x |
-
-#### Int128 Operations with int8_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.046 ms | 0.517 ms | 11.24x |
-| Subtraction | 0.046 ms | 0.507 ms | 11.02x |
-
-#### Int256 Operations with int32_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.070 ms | 0.273 ms | 3.90x |
-| Subtraction | 0.065 ms | 0.277 ms | 4.26x |
-| Multiplication | 0.070 ms | 0.269 ms | 3.84x |
-
-#### Int256 Operations with int16_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.076 ms | 0.622 ms | 8.18x |
-| Subtraction | 0.067 ms | 0.519 ms | 7.75x |
-| Multiplication | 0.064 ms | 0.870 ms | 13.59x |
-
-#### Int256 Operations with int8_t (100,000 elements)
-
-| Operation | SIMD Time | Plain Time | Speedup Factor |
-|-----------|-----------|------------|----------------|
-| Addition | 0.067 ms | 1.02 ms | 15.22x |
-| Subtraction | 0.071 ms | 1.06 ms | 14.93x |
-
-### Speedup Factor Comparison
-
-```
-Speedup Factors (Higher is better)
--------------------------------------------------------------
-Int256 int8_t Addition | 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪ | 15.22x |
-Int256 int8_t Subtraction | 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪ | 14.93x |
-Int256 int16_t Multiplication| 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪ | 13.59x |
-Int128 int8_t Addition | 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪ | 11.24x |
-Int128 int8_t Subtraction | 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪ | 11.02x |
-Int128 int16_t Multiplication| 🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪ | 10.09x |
-Float256 Division | 🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪ | 8.35x |
-Int256 int16_t Addition | 🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪ | 8.18x |
-Int256 int16_t Subtraction | 🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪ | 7.75x |
-Int128 int16_t Subtraction | 🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 5.80x |
-Int128 int16_t Addition | 🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 5.18x |
-Int256 int32_t Subtraction | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 4.26x |
-Float256 Subtraction | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 4.09x |
-Float256 Multiplication | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 4.06x |
-Float256 Addition | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 3.96x |
-Int256 int32_t Addition | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 3.90x |
-Int256 int32_t Multiplication| 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 3.84x |
-Double256 Division | 🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 3.57x |
-Double256 Multiplication | 🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.77x |
-Int128 int32_t Multiplication| 🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.72x |
-Int128 int32_t Addition | 🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.63x |
-Double256 Addition | 🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.28x |
-Int128 int32_t Subtraction | 🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.22x |
-Double256 Subtraction | 🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪ | 2.13x |
--------------------------------------------------------------
-```
+![Benchmark Speedup](benchmark_results_linux_gcc/consolidated_speedup.png)
 
 ## System Information
 
SIMD.h

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@
 #include <iostream>
 #include <stdint.h>
 #include <array>
+#include <memory>
 #ifdef _WIN32
 #include <malloc.h>
 #elif defined(__linux__)

analyze_benchmarks.py

Lines changed: 82 additions & 38 deletions
@@ -4,6 +4,7 @@
 import pandas as pd
 from pathlib import Path
 import argparse # Add argparse for command line arguments
+import cpuinfo
 
 def parse_benchmark_results(file_path):
     """Parse benchmark results from file."""
@@ -187,60 +188,70 @@ def plot_comparisons(grouped_benchmarks, output_dir):
             plain_times.append(row['time_ms_plain'])
             speedups.append(row['speedup_percent'])
 
-    # Sort the data by data type and operation for better visualization
-    sorted_indices = np.argsort([f"{dt}_{op}" for dt, op in zip(data_types, operations)])
+    # Sort the data by speedup in decreasing order
+    speedup_x = [speedup / 100 + 1 for speedup in speedups]
+    sorted_indices = np.argsort(speedup_x)[::-1] # Sort in decreasing order
     categories = [categories[i] for i in sorted_indices]
     data_types = [data_types[i] for i in sorted_indices]
     operations = [operations[i] for i in sorted_indices]
     simd_times = [simd_times[i] for i in sorted_indices]
     plain_times = [plain_times[i] for i in sorted_indices]
     speedups = [speedups[i] for i in sorted_indices]
+    speedup_x = [speedup_x[i] for i in sorted_indices]
 
     # Create the consolidated comparison plot
     plt.figure(figsize=(18, 10))
     bar_width = 0.35
     x = np.arange(len(categories))
-
-    # Create a bar chart with SIMD and Plain implementations
-    plt.bar(x - bar_width/2, simd_times, bar_width, label='SIMD', color='royalblue')
-    plt.bar(x + bar_width/2, plain_times, bar_width, label='Plain', color='lightcoral')
-
-    plt.xlabel('Benchmark Category', fontsize=12)
-    plt.ylabel('Time (ms)', fontsize=12)
-    plt.title('SIMD vs Plain Performance Comparison', fontsize=14)
-
-    # Add speedup text on top of bars
-    for i in range(len(categories)):
-        speedup = speedups[i]
-        color = 'green' if speedup > 0 else 'red'
-        position = max(simd_times[i], plain_times[i]) + 0.002
-        plt.text(i, position, f"{speedup:.1f}%", ha='center', color=color, weight='bold')
-
-    plt.xticks(x, categories, rotation=45, ha='right', fontsize=10)
-    plt.legend(fontsize=12)
-    plt.tight_layout()
-    plt.grid(axis='y', linestyle='--', alpha=0.7)
-    plt.savefig(output_path / "consolidated_comparison.png", dpi=300)
+
 
     # Also create a speedup chart
     plt.figure(figsize=(18, 10))
-    colors = ['green' if s > 0 else 'red' for s in speedups]
-    plt.bar(x, speedups, color=colors)
-    plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
+    colors = ['limegreen' if s > 0 else 'red' for s in speedups]
+    plt.bar(x, speedup_x, 0.5, color=colors)
 
     plt.xlabel('Benchmark Category', fontsize=12)
-    plt.ylabel('Speedup (%)', fontsize=12)
-    plt.title('SIMD Speedup over Plain Implementation', fontsize=14)
+    plt.ylabel('Speedup', fontsize=12)
+    compiler_text = ("GCC " + gcc_version) if os_platform == 'Linux' else (("MSVC " + msvc_version) if os_platform == 'Windows' else "Unknown Compiler")
+    plt.title(f"SIMD Speedup Over Plain Implementation ({os_platform} {compiler_text})", fontsize=14, weight='bold')
 
     # Add speedup values as text
     for i in range(len(categories)):
-        va = 'bottom' if speedups[i] > 0 else 'top'
-        offset = 2 if speedups[i] > 0 else -2
-        plt.text(i, speedups[i] + offset, f"{speedups[i]:.1f}%", ha='center', va=va, fontsize=10)
+        va = 'bottom' if speedup_x[i] > 1 else 'top'
+        offset = 0.05 if speedup_x[i] > 1 else -0.15
+        plt.text(i, speedup_x[i] + offset, f"{speedup_x[i]:.2f}x", ha='center', va=va, fontsize=10, weight='bold')
+
+    # Format y-axis ticks to show "x" suffix
+    from matplotlib.ticker import FuncFormatter
+    def format_speedup(value, pos):
+        return f"{value:.0f}x"
+    plt.gca().yaxis.set_major_formatter(FuncFormatter(format_speedup))
 
     plt.xticks(x, categories, rotation=45, ha='right', fontsize=10)
     plt.tight_layout()
     plt.grid(axis='y', linestyle='--', alpha=0.7)
+    # Get CPU info including cores, architecture and add it to the plot as a box on top right
+    cpu_info = cpuinfo.get_cpu_info()
+    cpu_name = cpu_info.get('brand_raw', 'N/A')
+    cpu_arch = cpu_info.get('arch_string_raw', 'N/A')
+    cpu_cores = cpu_info.get('count', 'N/A')
+    cpu_freq_actual = cpu_info.get('hz_actual_friendly', 'N/A')
+    cpu_freq_advertised = cpu_info.get('hz_advertised_friendly', 'N/A')
+
+    info_text = (
+        f"CPU: {cpu_name}\n"
+        f"Arch: {cpu_arch}\n"
+        f"Cores: {cpu_cores}\n"
+        f"Freq (Actual): {cpu_freq_actual}\n"
+        f"Freq (Advertised): {cpu_freq_advertised}"
+    )
+
+    # Position the text box on the top right
+    # Adjust x and y coordinates as needed based on your plot's scale
+    # Using axes coordinates (0 to 1 for x and y) for positioning relative to the plot area
+    plt.text(0.80, 0.98, info_text, transform=plt.gca().transAxes,
+             fontsize=9, verticalalignment='top', horizontalalignment='left',
+             bbox=dict(boxstyle='round,pad=0.5', fc='wheat', alpha=0.5))
     plt.savefig(output_path / "consolidated_speedup.png", dpi=300)
 
     # Create a table plot with the data
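
The hunk above replaces the percentage axis with a multiplicative factor (`speedup / 100 + 1`) and orders the bars from largest to smallest. A minimal sketch of that conversion and ordering, written in plain Python rather than NumPy; the helper names `percent_to_factor` and `order_by_speedup` and the sample values are hypothetical and do not come from the repository:

```python
def percent_to_factor(speedup_percent):
    """Convert a speedup in percent (e.g. 100.0) to a factor (e.g. 2.0)."""
    return speedup_percent / 100 + 1


def order_by_speedup(categories, speedups_percent):
    """Return (category, factor) pairs sorted by factor, largest first."""
    factors = [percent_to_factor(s) for s in speedups_percent]
    # Sort indices by factor in decreasing order, then reorder both lists together.
    order = sorted(range(len(factors)), key=lambda i: factors[i], reverse=True)
    return [(categories[i], factors[i]) for i in order]


if __name__ == "__main__":
    # Hypothetical sample values, not measurements from this commit.
    for name, factor in order_by_speedup(["add", "div", "mul"], [100.0, 300.0, 200.0]):
        print(f"{name}: {factor:.2f}x")
```

Because the mapping from percent to factor is monotonic, sorting on either value gives the same order; the factor form only changes how the axis and labels read.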
@@ -303,33 +314,66 @@ def generate_summary_report(grouped_benchmarks, output_dir):
             data_type = group['data_type']
             operation = group['operation']
 
-            f.write(f"## {data_type} {operation}\n\n")
-            f.write("| Variant | SIMD Time (ms) | Plain Time (ms) | Speedup (%) |\n")
+            f.write(f"#### {data_type} {operation}\n\n")
+            f.write("| Variant | SIMD Time (ms) | Plain Time (ms) | Speedup (x) |\n")
             f.write("|---------|---------------|----------------|------------|\n")
 
             for _, row in comp_df.iterrows():
                 simd_time = row['time_ms_simd']
                 plain_time = row['time_ms_plain']
                 speedup = row['speedup_percent']
 
-                f.write(f"| {row['size']} | {simd_time:.3f} | {plain_time:.3f} | {speedup:.2f} |\n")
+                f.write(f"| {row['size']} | {simd_time:.3f} | {plain_time:.3f} | {speedup/100.0 + 1:.2f}x |\n")
 
             f.write("\n")
 
+import platform
+import subprocess
 def main():
+    global gcc_version
+    global msvc_version
+    global os_platform
+
+    os_platform = platform.system()
+
+    if os_platform == 'Linux':
+        output = subprocess.check_output(['gcc', '--version'], stderr=subprocess.STDOUT)
+        output = output.decode('utf-8')
+        gcc_version = re.search(r'(\d+\.\d+\.\d+)', output).group(1)
+        print(f"GCC version: {gcc_version}")
+    elif os_platform == 'Windows':
+        try:
+            result = subprocess.run(
+                [
+                    r"C:\Program Files (x86)\Microsoft Visual Studio\Installer\vswhere.exe",
+                    "-latest",
+                    "-products", "*",
+                    "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
+                    "-property", "catalog_productDisplayVersion"
+                ],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                check=True
+            )
+            msvc_version = result.stdout.strip()
+        except subprocess.CalledProcessError as e:
+            print(f"Error getting MSVC version: {e}")
+            return None
+
     """Main function to run the benchmark analysis."""
     # Parse command line arguments
     parser = argparse.ArgumentParser(description='Analyze SIMD benchmark results.')
-    parser.add_argument('--input', '-i',
+    parser.add_argument('--input_file', '-i',
                         required=True,
                         help='Path to the benchmark results file')
-    parser.add_argument('--output', '-o',
+    parser.add_argument('--output_dir', '-o',
                         required=True,
                         help='Directory to save analysis results')
     args = parser.parse_args()
 
-    input_file = args.input
-    output_dir = args.output
+    input_file = args.input_file
+    output_dir = args.output_dir
 
     print(f"Analyzing benchmarks from: {input_file}")
     print(f"Saving results to: {output_dir}")
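
The Linux branch above calls `.group(1)` directly on the result of `re.search`, which raises `AttributeError` when the pattern does not match, since `re.search` returns `None` in that case. A minimal sketch of a more defensive variant, using the same regular expression; the helper name `parse_gcc_version` and the sample strings are hypothetical and not part of the commit:

```python
import re


def parse_gcc_version(version_output):
    """Return the first 'major.minor.patch' string found, or None if absent."""
    match = re.search(r'(\d+\.\d+\.\d+)', version_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative text shaped like `gcc --version` output, not captured from a real run.
    sample = "gcc (GCC) 12.3.0\nCopyright (C) Free Software Foundation, Inc."
    print(parse_gcc_version(sample))
    print(parse_gcc_version("no version here"))
```

Returning `None` lets the caller fall back to a label such as "Unknown Compiler" instead of aborting the analysis.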

run_tests.bat

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 cmake -S . -B build
 cmake --build build\ --config Release
-.\build\Release\BasicSIMD_Tests.exe
+.\build\Release\BasicSIMD_Tests.exe > test_results.txt
+
+python3 analyze_benchmarks.py --input_file=test_results.txt --output_dir=benchmark_results_windows_msvc/

run_tests.sh

Lines changed: 5 additions & 2 deletions
@@ -7,7 +7,10 @@ cmake -DCMAKE_BUILD_TYPE=Release -S . -B build
 cmake --build build --config Release
 
 # Run the tests
-./build/BasicSIMD_Tests
+./build/BasicSIMD_Tests > test_results.txt
+
+# Generate benchmark analysis
+python3 analyze_benchmarks.py --input_file=test_results.txt --output_dir=benchmark_results_linux_gcc/
 
 # Make the output more readable
-echo "Test execution complete."
+echo "Test execution completed, plots saved to benchmark_results/ directory and README.md updated."
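
Both scripts now call the analysis with `--input_file=...` and `--output_dir=...`, matching the renamed arguments in `analyze_benchmarks.py`. A minimal sketch of how `argparse` maps those flags onto attributes, reusing the argument definitions shown in the diff; the wrapper function `build_parser` is a hypothetical name added here for illustration:

```python
import argparse


def build_parser():
    """Build a parser with the two required options defined in the diff."""
    parser = argparse.ArgumentParser(description='Analyze SIMD benchmark results.')
    parser.add_argument('--input_file', '-i',
                        required=True,
                        help='Path to the benchmark results file')
    parser.add_argument('--output_dir', '-o',
                        required=True,
                        help='Directory to save analysis results')
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv, mirroring run_tests.sh.
    args = build_parser().parse_args([
        '--input_file=test_results.txt',
        '--output_dir=benchmark_results_linux_gcc/',
    ])
    print(args.input_file, args.output_dir)
```

The long option name determines the attribute name, which is why the rename from `--input`/`--output` also required changing `args.input` and `args.output` in `main()`.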
