Update MLIR-AIE Wheel Dependency to use new Placer Logic #115

Open
andrej wants to merge 3 commits into amd:devel from andrej:placer

@andrej andrej commented May 11, 2026

Collaborator

Some downstream work I have in the queue depends on a newer compiler version, but that comes with the new placer logic. It's cleaner to do the upgrade in a separate PR, so I'm proposing to do it now.

@andrej andrej requested review from hunhoffe and jgmelber as code owners May 11, 2026 22:56

@hunhoffe hunhoffe left a comment

Collaborator

Whoops, another PR I missed! Thanks for doing this. Definitely move forward if this is blocking you. I had planned to do the minor release + IRON upgrade once the following landed:

However, I'm also more than fine with an intermediate update landing first, I can do the next update later :)

github-actions Bot commented Jun 29, 2026

Contributor

CI Test Results

b1939ea (2026_06_29_18_01_20)

IRON - CI Summary

Examples

iron/applications/llama_3.2_1b
Test Krackan Status Krackan Phoenix Status Phoenix
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] - - -

Small

iron/operators/axpy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] 174.28 475.16
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] 165.62 430.98
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] 198.56 1126.24
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] 247.96 - -
iron/operators/dequant
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] 188.04 274.20
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] 193.86 270.22
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] 173.88 671.48
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] 172.14 527.36
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] 202.78 370.22
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] 207.18 420.38
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] 195.70 - -
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] 221.98 - -
iron/operators/elementwise_add
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] 183.12 285.54
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] 180.20 296.80
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] 209.16 326.86
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] 233.24 - -
iron/operators/elementwise_mul
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] 194.78 298.48
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] 185.02 261.64
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] 216.60 343.86
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] 319.84 - -
iron/operators/gelu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 198.16 406.16
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 206.00 242.80
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 186.38 331.72
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 206.86 248.02
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 178.98 251.10
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 195.74 439.52
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 163.32 - -
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 237.08 - -
iron/operators/gemm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] 2281.20 - -
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] 268.04 520.90
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] 236.24 484.00
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 48717.20 82017.10
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 28642.24 24713.80
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] 7876.72 - -
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] 2172.96 3827.72
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] 3538.56 5989.74
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] 1492.90 - -
iron/operators/gemv
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] 0.21 0.08
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] 12.36 3.71
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] 22.84 5.83
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] 39.33 9.55
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] 41.84 - -
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] 12.55 3.66
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] 23.89 5.69
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] 40.72 10.31
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] 41.48 - -
iron/operators/layer_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 164.12 405.60
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 194.30 488.18
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 172.96 352.62
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 193.06 363.72
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 171.90 370.44
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 208.42 453.88
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 211.58 - -
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 224.50 - -
iron/operators/mem_copy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] 203.12 417.62
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] 238.82 - -
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] 176.64 538.18
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] 179.94 374.10
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] 209.62 502.78
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] 184.72 594.28
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] 201.58 - -
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] 215.56 612.82
iron/operators/mha
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] - - -
iron/operators/relu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 147.88 324.60
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 156.26 406.10
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 152.34 614.58
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 159.86 659.40
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 175.60 369.48
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 179.32 437.00
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 198.44 - -
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 289.18 - -
iron/operators/rms_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] 180.16 585.88
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] 173.54 407.60
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] 168.54 408.52
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] 162.28 480.32
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] 173.86 338.50
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] 159.56 376.80
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] 185.92 424.34
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] 186.94 501.66
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] 188.44 347.24
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] 176.32 411.58
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] 176.36 350.98
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] 208.24 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] 188.22 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] 197.64 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] 220.72 - -
iron/operators/rope
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] 176.66 377.46
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] 187.16 378.20
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] 170.94 475.26
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] 188.28 - -
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] 194.70 413.96
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] 217.84 318.64
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] 181.70 492.26
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] 196.80 - -
iron/operators/sigmoid
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 141.78 441.72
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 156.94 474.46
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 159.92 538.78
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 167.88 362.78
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 162.44 421.92
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 193.66 406.86
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 148.52 - -
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 257.76 - -
iron/operators/silu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 143.06 809.20
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 168.72 370.76
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 151.72 353.40
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 202.70 - -
iron/operators/softmax
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] 192.48 457.98
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] 179.86 514.28
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] 172.38 457.42
iron/operators/swiglu_decode
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] 3532.00 10232.53
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] 3772.44 18599.83
iron/operators/swiglu_prefill
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] 12436.76 23164.89
iron/operators/tanh
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 188.44 285.08
test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 178.06 375.14
test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 168.92 421.10
test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 186.44 436.02
test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 182.42 406.60
test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 188.84 626.04
test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 173.66 - -
test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 228.86 - -
iron/operators/transpose
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] 193.90 461.72
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] 261.36 1251.96
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] 182.32 426.88
Krackan - Small

IRON

Tested on 2026_06_29_18_01_20 at commit b1939ea.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 174.28 | 0.07 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 165.62 | 0.08 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 198.56 | 0.06 | n/a |
| test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | ✅ 5/5 | 247.96 | 0.06 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 188.04 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 193.86 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 173.88 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 172.14 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 202.78 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 207.18 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | ✅ 5/5 | 195.70 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | ✅ 5/5 | 221.98 | 0.02 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 183.12 | 0.07 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 180.20 | 0.07 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 209.16 | 0.06 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 233.24 | 0.06 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 194.78 | 0.06 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 185.02 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 216.60 | 0.06 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 319.84 | 0.04 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 198.16 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 206.00 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 186.38 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 206.86 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 178.98 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 195.74 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 163.32 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 237.08 | 0.04 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | ✅ 5/5 | 2281.20 | 4.17 | 1640.30 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 268.04 | 0.88 | 37.70 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 236.24 | 0.98 | 41.90 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 48717.20 | 0.52 | 352.65 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 28642.24 | 0.88 | 599.81 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 7876.72 | 3.20 | 2181.83 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 2172.96 | 3.80 | 996.51 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 3538.56 | 0.36 | 19.20 |
| test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | ✅ 5/5 | 1492.90 | 4.54 | 1403.91 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.21 | 0.21 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 12.36 | 12.35 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 22.84 | 22.83 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 39.33 | 39.30 |
| test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | ✅ 5/5 | n/a | 41.84 | 41.82 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 12.55 | 12.54 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 23.89 | 23.87 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 40.72 | 40.70 |
| test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 41.48 | 41.46 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 164.12 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 194.30 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 172.96 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 193.06 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 171.90 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 208.42 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 211.58 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 224.50 | 0.04 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 203.12 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | ✅ 5/5 | 238.82 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 176.64 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 179.94 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 209.62 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 184.72 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | ✅ 5/5 | 201.58 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 215.56 | 0.04 | n/a |

iron/operators/mha

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | ❌ 0/5 | n/a | n/a | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 147.88 | 0.06 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 156.26 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 152.34 | 0.06 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 159.86 | 0.06 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 175.60 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 179.32 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 198.44 | 0.04 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 289.18 | 0.03 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 180.16 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 173.54 | 0.07 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 168.54 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 162.28 | 0.06 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 173.86 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 159.56 | 0.07 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 185.92 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 186.94 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 188.44 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 176.32 | 0.06 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 176.36 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | ✅ 5/5 | 208.24 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | ✅ 5/5 | 188.22 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | ✅ 5/5 | 197.64 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | ✅ 5/5 | 220.72 | 0.04 | n/a |

iron/operators/rope

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 176.66 | 0.57 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 187.16 | 0.54 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 170.94 | 0.59 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | ✅ 5/5 | 188.28 | 0.53 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 194.70 | 0.39 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 217.84 | 0.37 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 181.70 | 0.41 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | ✅ 5/5 | 196.80 | 0.38 | n/a |

iron/operators/sigmoid

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 141.78 | 0.06 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 156.94 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 159.92 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 167.88 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 162.44 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 193.66 | 0.04 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 148.52 | 0.06 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 257.76 | 0.03 | n/a |

iron/operators/silu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 143.06 | 0.06 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 168.72 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 151.72 | 0.06 | n/a |
| test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 202.70 | 0.05 | n/a |

iron/operators/softmax

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 192.48 | 0.69 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 179.86 | 0.78 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 172.38 | 0.78 | n/a |

iron/operators/swiglu_decode

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 3532.00 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 3772.44 | 0.00 | n/a |

iron/operators/swiglu_prefill

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 12436.76 | 0.18 | n/a |

iron/operators/tanh

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 188.44 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 178.06 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 168.92 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 186.44 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 182.42 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 188.84 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 173.66 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 228.86 | 0.04 | n/a |

iron/operators/transpose

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] | ❌ 0/5 | 193.90 | 2.77 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] | ❌ 0/5 | 261.36 | 4.24 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] | ❌ 0/5 | 182.32 | 2.97 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea - 2026-06-29 17:56:42 | 0.10 (+6.02%) | 0.07 (-6.82%) | 0.07 (-6.82%) | 0.05 (-21.41%) | 0.02 (+89.40%) | 224.30 (+27.23%) | 174.28 (+11.42%) | 175.30 (+7.28%) | 122.00 (-5.72%) | 41.91 (+130.52%) |
| 9c70ba8 - 2026-06-29 16:51:59 | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 176.30 (n/a) | 156.42 (n/a) | 163.40 (n/a) | 129.40 (n/a) | 18.18 (n/a) |
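
The parenthesized percentages in these trend rows appear to be the relative change of each metric against the previous commit's row (here b1939ea against 9c70ba8); the report itself does not state the formula, so treat this as an inference from the numbers. A minimal sketch that reproduces the latency deltas above, with `percent_change` and `format_delta` as hypothetical helper names that are not part of the CI tooling:

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change of `current` against `previous`, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def format_delta(current: float, previous: float) -> str:
    """Render a value the way the trend rows show it, e.g. '174.28 (+11.42%)'."""
    return f"{current:.2f} ({percent_change(current, previous):+.2f}%)"


# Latency (mean) for the first axpy test: b1939ea vs. 9c70ba8.
print(format_delta(174.28, 156.42))
# Latency (min) for the same test.
print(format_delta(122.00, 129.40))
```

The bandwidth deltas cannot be reproduced this way from the table alone, presumably because the displayed bandwidth values are rounded to two decimals while the percentages were computed from unrounded measurements.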

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea - 2026-06-29 17:56:42 | 0.10 (+23.53%) | 0.08 (+13.89%) | 0.08 (+12.45%) | 0.05 (-0.42%) | 0.02 (+71.89%) | 232.90 (+0.43%) | 165.62 (-10.09%) | 158.20 (-11.07%) | 125.10 (-19.08%) | 40.38 (+41.03%) |
| 9c70ba8 - 2026-06-29 16:51:59 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 231.90 (n/a) | 184.20 (n/a) | 177.90 (n/a) | 154.60 (n/a) | 28.63 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea - 2026-06-29 17:56:42 | 0.08 (-18.22%) | 0.06 (-8.72%) | 0.06 (-1.40%) | 0.05 (-4.28%) | 0.01 (-46.09%) | 224.80 (+4.51%) | 198.56 (+6.93%) | 208.50 (+1.41%) | 160.50 (+22.24%) | 24.99 (-33.59%) |
| 9c70ba8 - 2026-06-29 16:51:59 | 0.09 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 215.10 (n/a) | 185.70 (n/a) | 205.60 (n/a) | 131.30 (n/a) | 37.63 (n/a) |

test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea - 2026-06-29 17:56:42 | 0.07 (-4.12%) | 0.06 (-3.05%) | 0.06 (+7.63%) | 0.03 (-20.40%) | 0.02 (+29.91%) | 364.70 (+25.63%) | 247.96 (+9.00%) | 215.40 (-7.08%) | 166.70 (+4.32%) | 91.64 (+66.04%) |
| 9c70ba8 - 2026-06-29 16:51:59 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 290.30 (n/a) | 227.48 (n/a) | 231.80 (n/a) | 159.80 (n/a) | 55.19 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (-5.47%) | 0.03 (-12.40%) | 0.03 (-7.40%) | 0.02 (-27.53%) | 0.01 (+11.29%) | 260.00 (+38.00%) | 188.04 (+16.23%) | 180.30 (+7.96%) | 141.70 (+5.83%) | 44.02 (+69.66%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 188.40 (n/a) | 161.78 (n/a) | 167.00 (n/a) | 133.90 (n/a) | 25.94 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.03 (+1.72%) | 0.03 (+6.20%) | 0.03 (+7.06%) | 0.02 (+10.16%) | 0.00 (-7.59%) | 228.80 (-9.24%) | 193.86 (-6.26%) | 194.60 (-6.58%) | 162.40 (-1.69%) | 25.94 (-17.49%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 252.10 (n/a) | 206.80 (n/a) | 208.30 (n/a) | 165.20 (n/a) | 31.44 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (+8.29%) | 0.03 (-0.30%) | 0.03 (-8.42%) | 0.02 (-16.34%) | 0.01 (+84.00%) | 231.00 (+19.50%) | 173.88 (+4.22%) | 185.00 (+9.21%) | 124.90 (-7.69%) | 44.06 (+98.11%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 193.30 (n/a) | 166.84 (n/a) | 169.40 (n/a) | 135.30 (n/a) | 22.24 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (+46.27%) | 0.03 (+20.86%) | 0.03 (+9.35%) | 0.03 (+28.48%) | 0.01 (+74.52%) | 195.70 (-22.16%) | 172.14 (-15.67%) | 187.30 (-8.54%) | 106.30 (-31.64%) | 37.13 (-10.28%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 251.40 (n/a) | 204.12 (n/a) | 204.80 (n/a) | 155.50 (n/a) | 41.39 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (+29.01%) | 0.03 (+3.00%) | 0.03 (+4.02%) | 0.02 (-21.30%) | 0.01 (+80.05%) | 314.30 (+27.04%) | 202.78 (+3.69%) | 191.50 (-3.87%) | 115.90 (-22.47%) | 71.42 (+77.98%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 247.40 (n/a) | 195.56 (n/a) | 199.20 (n/a) | 149.50 (n/a) | 40.13 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (+11.13%) | 0.03 (-8.51%) | 0.02 (-21.97%) | 0.02 (-12.82%) | 0.01 (+41.41%) | 272.20 (+14.71%) | 207.18 (+12.65%) | 224.40 (+28.16%) | 131.80 (-10.03%) | 53.29 (+42.44%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 237.30 (n/a) | 183.92 (n/a) | 175.10 (n/a) | 146.50 (n/a) | 37.41 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.03 (-9.85%) | 0.03 (-8.77%) | 0.03 (-15.24%) | 0.02 (+19.49%) | 0.00 (-47.53%) | 217.00 (-16.31%) | 195.70 (+5.54%) | 206.20 (+17.96%) | 156.30 (+10.93%) | 23.77 (-51.37%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 259.30 (n/a) | 185.42 (n/a) | 174.80 (n/a) | 140.90 (n/a) | 48.88 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.03 (-24.07%) | 0.02 (-11.12%) | 0.02 (-12.32%) | 0.02 (+27.82%) | 0.00 (-62.37%) | 240.10 (-21.77%) | 221.98 (+6.22%) | 232.70 (+14.07%) | 182.80 (+31.70%) | 24.09 (-61.61%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 306.90 (n/a) | 208.98 (n/a) | 204.00 (n/a) | 138.80 (n/a) | 62.76 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 219.00 (n/a) | 183.12 (n/a) | 179.60 (n/a) | 151.30 (n/a) | 30.72 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 229.20 (n/a) | 180.20 (n/a) | 177.00 (n/a) | 140.30 (n/a) | 35.88 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 246.90 (n/a) | 209.16 (n/a) | 236.60 (n/a) | 146.70 (n/a) | 44.02 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 382.80 (n/a) | 233.24 (n/a) | 211.10 (n/a) | 141.70 (n/a) | 89.55 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 214.20 (n/a) | 194.78 (n/a) | 193.60 (n/a) | 171.30 (n/a) | 17.62 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 222.60 (n/a) | 185.02 (n/a) | 186.60 (n/a) | 159.60 (n/a) | 26.26 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.08 (n/a) | 0.06 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 345.00 (n/a) | 216.60 (n/a) | 178.70 (n/a) | 156.20 (n/a) | 78.28 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 393.00 (n/a) | 319.84 (n/a) | 364.40 (n/a) | 224.70 (n/a) | 79.06 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 247.00 (n/a) | 198.16 (n/a) | 185.30 (n/a) | 173.50 (n/a) | 30.91 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 277.70 (n/a) | 206.00 (n/a) | 215.80 (n/a) | 118.80 (n/a) | 57.09 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 226.30 (n/a) | 186.38 (n/a) | 169.00 (n/a) | 157.40 (n/a) | 31.03 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 357.90 (n/a) | 206.86 (n/a) | 169.70 (n/a) | 139.20 (n/a) | 87.33 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 203.30 (n/a) | 178.98 (n/a) | 171.90 (n/a) | 156.60 (n/a) | 22.92 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 353.90 (n/a) | 195.74 (n/a) | 165.90 (n/a) | 126.50 (n/a) | 90.54 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 208.70 (n/a) | 163.32 (n/a) | 167.30 (n/a) | 123.00 (n/a) | 31.84 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 273.80 (n/a) | 237.08 (n/a) | 232.20 (n/a) | 183.90 (n/a) | 37.14 (n/a) |

iron/operators/gemm

test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 4.89 (-0.99%) | 4.17 (-5.49%) | 4.17 (-0.60%) | 3.50 (-12.54%) | 0.50 (+17.75%) | 2683.20 (+14.34%) | 2281.20 (+6.27%) | 2253.80 (+0.60%) | 1924.10 (+1.00%) | 273.57 (+37.07%) | 1922.65 (-0.99%) | 1640.30 (-5.49%) | 1641.38 (-0.60%) | 1378.72 (-12.54%) | 195.70 (+17.75%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 4.94 (n/a) | 4.41 (n/a) | 4.20 (n/a) | 4.01 (n/a) | 0.42 (n/a) | 2346.60 (n/a) | 2146.64 (n/a) | 2240.30 (n/a) | 1905.10 (n/a) | 199.58 (n/a) | 1941.82 (n/a) | 1735.67 (n/a) | 1651.26 (n/a) | 1576.47 (n/a) | 166.20 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 1.18 (-12.71%) | 0.88 (-22.80%) | 0.99 (-24.09%) | 0.60 (-11.79%) | 0.25 (-14.65%) | 370.10 (+13.39%) | 268.04 (+29.40%) | 224.30 (+31.79%) | 187.60 (+14.53%) | 80.05 (+15.20%) | 50.30 (-12.71%) | 37.70 (-22.80%) | 42.08 (-24.09%) | 25.50 (-11.79%) | 10.53 (-14.65%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 1.35 (n/a) | 1.14 (n/a) | 1.30 (n/a) | 0.68 (n/a) | 0.29 (n/a) | 326.40 (n/a) | 207.14 (n/a) | 170.20 (n/a) | 163.80 (n/a) | 69.49 (n/a) | 57.63 (n/a) | 48.83 (n/a) | 55.44 (n/a) | 28.91 (n/a) | 12.34 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 1.27 (-3.23%) | 0.98 (-8.44%) | 1.01 (-15.88%) | 0.67 (+2.15%) | 0.23 (-21.78%) | 331.90 (-2.09%) | 236.24 (+6.46%) | 218.70 (+18.86%) | 174.10 (+3.32%) | 60.73 (-17.14%) | 54.19 (-3.23%) | 41.90 (-8.44%) | 43.14 (-15.88%) | 28.43 (+2.15%) | 9.72 (-21.78%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 1.31 (n/a) | 1.07 (n/a) | 1.20 (n/a) | 0.65 (n/a) | 0.29 (n/a) | 339.00 (n/a) | 221.90 (n/a) | 184.00 (n/a) | 168.50 (n/a) | 73.29 (n/a) | 56.00 (n/a) | 45.76 (n/a) | 51.29 (n/a) | 27.83 (n/a) | 12.42 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.52 (-0.81%) | 0.52 (-0.37%) | 0.52 (-0.08%) | 0.51 (-0.47%) | 0.00 (-28.19%) | 48898.00 (+0.47%) | 48717.20 (+0.37%) | 48631.70 (+0.08%) | 48606.60 (+0.82%) | 135.14 (-27.22%) | 353.45 (-0.81%) | 352.65 (-0.37%) | 353.26 (-0.08%) | 351.34 (-0.47%) | 0.98 (-28.19%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.00 (n/a) | 48668.40 (n/a) | 48539.04 (n/a) | 48595.00 (n/a) | 48211.30 (n/a) | 185.70 (n/a) | 356.35 (n/a) | 353.94 (n/a) | 353.53 (n/a) | 353.00 (n/a) | 1.36 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.88 (+0.03%) | 0.88 (+0.05%) | 0.88 (+0.04%) | 0.88 (+0.44%) | 0.00 (-32.31%) | 28736.40 (-0.44%) | 28642.24 (-0.05%) | 28667.90 (-0.04%) | 28503.10 (-0.03%) | 96.16 (-32.59%) | 602.74 (+0.03%) | 599.81 (+0.05%) | 599.27 (+0.04%) | 597.84 (+0.44%) | 2.02 (-32.32%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.88 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.00 (n/a) | 28863.80 (n/a) | 28656.32 (n/a) | 28678.50 (n/a) | 28511.80 (n/a) | 142.64 (n/a) | 602.55 (n/a) | 599.53 (n/a) | 599.05 (n/a) | 595.20 (n/a) | 2.98 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 3.31 (-3.68%) | 3.20 (-2.48%) | 3.17 (-4.13%) | 3.15 (+0.17%) | 0.07 (-45.67%) | 7998.60 (-0.17%) | 7876.72 (+2.46%) | 7927.50 (+4.31%) | 7610.00 (+3.82%) | 159.66 (-44.04%) | 2257.53 (-3.68%) | 2181.83 (-2.48%) | 2167.13 (-4.13%) | 2147.85 (+0.17%) | 45.11 (-45.67%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 3.43 (n/a) | 3.28 (n/a) | 3.31 (n/a) | 3.14 (n/a) | 0.12 (n/a) | 8012.50 (n/a) | 7687.26 (n/a) | 7600.00 (n/a) | 7330.00 (n/a) | 285.31 (n/a) | 2343.79 (n/a) | 2237.32 (n/a) | 2260.51 (n/a) | 2144.14 (n/a) | 83.03 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 4.23 (+9.93%) | 3.80 (+12.07%) | 4.11 (+23.82%) | 2.81 (-3.65%) | 0.60 (+62.42%) | 2865.20 (+3.80%) | 2172.96 (-9.48%) | 1962.10 (-19.24%) | 1904.10 (-9.03%) | 408.07 (+54.13%) | 1110.17 (+9.93%) | 996.51 (+12.07%) | 1077.41 (+23.82%) | 737.80 (-3.65%) | 158.30 (+62.42%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 3.85 (n/a) | 3.39 (n/a) | 3.32 (n/a) | 2.92 (n/a) | 0.37 (n/a) | 2760.40 (n/a) | 2400.60 (n/a) | 2429.40 (n/a) | 2093.20 (n/a) | 264.75 (n/a) | 1009.89 (n/a) | 889.15 (n/a) | 870.16 (n/a) | 765.79 (n/a) | 97.46 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.43 (-7.29%) | 0.36 (-5.60%) | 0.36 (+3.57%) | 0.31 (-4.96%) | 0.05 (-26.69%) | 4049.50 (+5.21%) | 3538.56 (+5.07%) | 3507.10 (-3.45%) | 2901.10 (+7.86%) | 422.52 (-18.36%) | 23.13 (-7.29%) | 19.20 (-5.60%) | 19.14 (+3.57%) | 16.57 (-4.96%) | 2.45 (-26.69%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.46 (n/a) | 0.38 (n/a) | 0.34 (n/a) | 0.32 (n/a) | 0.06 (n/a) | 3848.80 (n/a) | 3367.78 (n/a) | 3632.40 (n/a) | 2689.60 (n/a) | 517.52 (n/a) | 24.95 (n/a) | 20.34 (n/a) | 18.48 (n/a) | 17.44 (n/a) | 3.34 (n/a) |

test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 5.07 (-21.00%) | 4.54 (-9.54%) | 4.76 (+0.83%) | 3.42 (-13.68%) | 0.65 (-29.72%) | 1944.60 (+15.85%) | 1492.90 (+9.90%) | 1398.20 (-0.83%) | 1311.90 (+26.58%) | 255.86 (+7.94%) | 1566.58 (-21.00%) | 1403.91 (-9.54%) | 1469.91 (+0.83%) | 1056.87 (-13.68%) | 199.32 (-29.72%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 6.42 (n/a) | 5.02 (n/a) | 4.72 (n/a) | 3.96 (n/a) | 0.92 (n/a) | 1678.60 (n/a) | 1358.40 (n/a) | 1409.90 (n/a) | 1036.40 (n/a) | 237.03 (n/a) | 1982.97 (n/a) | 1552.03 (n/a) | 1457.75 (n/a) | 1224.38 (n/a) | 283.59 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.29 (+9.54%) | 0.21 (-8.74%) | 0.19 (-11.56%) | 0.18 (-5.51%) | 0.05 (+57.41%) | 0.29 (+9.54%) | 0.21 (-8.74%) | 0.19 (-11.56%) | 0.18 (-5.51%) | 0.04 (+57.41%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.27 (n/a) | 0.23 (n/a) | 0.22 (n/a) | 0.19 (n/a) | 0.03 (n/a) | 0.26 (n/a) | 0.23 (n/a) | 0.22 (n/a) | 0.19 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 13.17 (-0.34%) | 12.36 (-1.25%) | 12.25 (-5.86%) | 11.76 (+7.94%) | 0.52 (-46.20%) | 13.16 (-0.34%) | 12.35 (-1.25%) | 12.24 (-5.86%) | 11.75 (+7.94%) | 0.51 (-46.20%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 13.21 (n/a) | 12.51 (n/a) | 13.01 (n/a) | 10.89 (n/a) | 0.96 (n/a) | 13.21 (n/a) | 12.51 (n/a) | 13.00 (n/a) | 10.89 (n/a) | 0.96 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 24.39 (-3.25%) | 22.84 (-6.71%) | 22.03 (-10.69%) | 21.64 (-7.99%) | 1.39 (+111.75%) | 24.37 (-3.25%) | 22.83 (-6.71%) | 22.02 (-10.69%) | 21.63 (-7.99%) | 1.39 (+111.75%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 25.21 (n/a) | 24.49 (n/a) | 24.67 (n/a) | 23.52 (n/a) | 0.65 (n/a) | 25.19 (n/a) | 24.47 (n/a) | 24.66 (n/a) | 23.50 (n/a) | 0.65 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 42.89 (+0.12%) | 39.33 (-1.98%) | 39.83 (-0.52%) | 34.80 (-4.87%) | 2.91 (+20.93%) | 42.87 (+0.12%) | 39.30 (-1.98%) | 39.81 (-0.52%) | 34.78 (-4.87%) | 2.91 (+20.93%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 42.84 (n/a) | 40.12 (n/a) | 40.04 (n/a) | 36.58 (n/a) | 2.41 (n/a) | 42.82 (n/a) | 40.10 (n/a) | 40.02 (n/a) | 36.56 (n/a) | 2.41 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 43.31 (-4.76%) | 41.84 (-2.68%) | 42.39 (+0.76%) | 39.85 (-0.54%) | 1.48 (-36.88%) | 43.29 (-4.76%) | 41.82 (-2.68%) | 42.37 (+0.76%) | 39.83 (-0.54%) | 1.48 (-36.88%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 45.48 (n/a) | 42.99 (n/a) | 42.07 (n/a) | 40.07 (n/a) | 2.34 (n/a) | 45.45 (n/a) | 42.97 (n/a) | 42.05 (n/a) | 40.05 (n/a) | 2.34 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 13.41 (-0.38%) | 12.55 (-0.03%) | 13.15 (+1.66%) | 10.65 (-0.50%) | 1.13 (+3.27%) | 13.41 (-0.38%) | 12.54 (-0.03%) | 13.14 (+1.66%) | 10.65 (-0.50%) | 1.13 (+3.27%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 13.46 (n/a) | 12.55 (n/a) | 12.94 (n/a) | 10.71 (n/a) | 1.10 (n/a) | 13.46 (n/a) | 12.54 (n/a) | 12.93 (n/a) | 10.70 (n/a) | 1.10 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 24.81 (-0.29%) | 23.89 (-1.70%) | 23.97 (-2.70%) | 22.63 (-2.28%) | 0.79 (+14.02%) | 24.80 (-0.29%) | 23.87 (-1.70%) | 23.96 (-2.70%) | 22.61 (-2.28%) | 0.79 (+14.02%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 24.88 (n/a) | 24.30 (n/a) | 24.64 (n/a) | 23.16 (n/a) | 0.69 (n/a) | 24.87 (n/a) | 24.28 (n/a) | 24.62 (n/a) | 23.14 (n/a) | 0.69 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 41.93 (-1.47%) | 40.72 (+0.82%) | 40.60 (+2.69%) | 39.84 (+3.60%) | 0.79 (-55.90%) | 41.91 (-1.47%) | 40.70 (+0.82%) | 40.57 (+2.69%) | 39.81 (+3.60%) | 0.79 (-55.90%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 42.56 (n/a) | 40.39 (n/a) | 39.53 (n/a) | 38.45 (n/a) | 1.80 (n/a) | 42.53 (n/a) | 40.37 (n/a) | 39.51 (n/a) | 38.43 (n/a) | 1.80 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 45.55 (+0.57%) | 41.48 (-6.18%) | 42.76 (-4.53%) | 36.88 (-13.69%) | 3.78 (+247.91%) | 45.52 (+0.57%) | 41.46 (-6.18%) | 42.74 (-4.53%) | 36.86 (-13.69%) | 3.77 (+247.91%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 45.29 (n/a) | 44.22 (n/a) | 44.80 (n/a) | 42.73 (n/a) | 1.09 (n/a) | 45.26 (n/a) | 44.19 (n/a) | 44.77 (n/a) | 42.70 (n/a) | 1.08 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 193.20 (n/a) | 164.12 (n/a) | 158.20 (n/a) | 135.90 (n/a) | 23.86 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 266.60 (n/a) | 194.30 (n/a) | 194.60 (n/a) | 128.80 (n/a) | 55.73 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 226.20 (n/a) | 172.96 (n/a) | 167.50 (n/a) | 129.80 (n/a) | 35.01 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 230.50 (n/a) | 193.06 (n/a) | 196.70 (n/a) | 158.40 (n/a) | 26.33 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 213.60 (n/a) | 171.90 (n/a) | 176.70 (n/a) | 127.70 (n/a) | 35.62 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 310.40 (n/a) | 208.42 (n/a) | 207.90 (n/a) | 132.90 (n/a) | 66.45 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 396.60 (n/a) | 211.58 (n/a) | 175.90 (n/a) | 131.80 (n/a) | 105.56 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 303.80 (n/a) | 224.50 (n/a) | 213.60 (n/a) | 191.30 (n/a) | 45.39 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (+6.51%) | 0.04 (-15.40%) | 0.04 (-19.28%) | 0.03 (-36.47%) | 0.02 (+83.50%) | 299.70 (+57.41%) | 203.12 (+26.85%) | 191.20 (+23.91%) | 121.20 (-6.12%) | 67.00 (+166.23%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 190.40 (n/a) | 160.12 (n/a) | 154.30 (n/a) | 129.10 (n/a) | 25.16 (n/a) |

test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.04 (-31.42%) | 0.03 (-11.21%) | 0.03 (-13.08%) | 0.03 (+48.16%) | 0.00 (-80.87%) | 255.40 (-32.49%) | 238.82 (+3.42%) | 241.80 (+15.09%) | 221.00 (+45.78%) | 15.19 (-82.36%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 378.30 (n/a) | 230.92 (n/a) | 210.10 (n/a) | 151.60 (n/a) | 86.13 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (-2.89%) | 0.05 (-12.27%) | 0.05 (-10.28%) | 0.04 (-14.08%) | 0.01 (+9.51%) | 220.00 (+16.34%) | 176.64 (+15.42%) | 171.00 (+11.47%) | 123.10 (+3.01%) | 38.36 (+32.12%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 189.10 (n/a) | 153.04 (n/a) | 153.40 (n/a) | 119.50 (n/a) | 29.03 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-23.43%) | 0.05 (-18.53%) | 0.05 (-11.34%) | 0.04 (-17.25%) | 0.01 (-37.98%) | 232.50 (+20.84%) | 179.94 (+20.89%) | 177.70 (+12.75%) | 140.30 (+30.63%) | 33.24 (+0.80%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 192.40 (n/a) | 148.84 (n/a) | 157.60 (n/a) | 107.40 (n/a) | 32.97 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (+16.70%) | 0.04 (+0.73%) | 0.04 (-11.48%) | 0.02 (-19.80%) | 0.02 (+81.16%) | 349.80 (+24.71%) | 209.62 (+7.92%) | 205.50 (+12.97%) | 133.70 (-14.35%) | 88.20 (+76.93%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 280.50 (n/a) | 194.24 (n/a) | 181.90 (n/a) | 156.10 (n/a) | 49.85 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (-19.50%) | 0.04 (-9.69%) | 0.04 (-3.31%) | 0.04 (+0.50%) | 0.01 (-47.24%) | 217.60 (-0.50%) | 184.72 (+7.92%) | 185.20 (+3.41%) | 152.30 (+24.23%) | 24.27 (-34.17%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 218.70 (n/a) | 171.16 (n/a) | 179.10 (n/a) | 122.60 (n/a) | 36.86 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (-11.14%) | 0.04 (-16.69%) | 0.04 (-16.08%) | 0.04 (-17.20%) | 0.00 (+16.60%) | 219.60 (+20.79%) | 201.58 (+20.55%) | 203.80 (+19.11%) | 167.70 (+12.55%) | 21.16 (+58.71%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 181.80 (n/a) | 167.22 (n/a) | 171.10 (n/a) | 149.00 (n/a) | 13.33 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-1.10%) | 0.04 (-18.64%) | 0.04 (-15.34%) | 0.02 (-39.83%) | 0.01 (+66.16%) | 336.50 (+66.17%) | 215.56 (+30.61%) | 197.20 (+18.15%) | 142.40 (+1.14%) | 72.63 (+194.33%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 202.50 (n/a) | 165.04 (n/a) | 166.90 (n/a) | 140.80 (n/a) | 24.68 (n/a) |

iron/operators/mha

test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) | n/a (n/a) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.21 (n/a) | 0.20 (n/a) | 0.21 (n/a) | 0.20 (n/a) | 0.00 (n/a) | 40988.20 (n/a) | 40927.74 (n/a) | 40913.30 (n/a) | 40866.90 (n/a) | 51.11 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (-8.49%) | 0.05 (-5.27%) | 0.05 (-7.99%) | 0.04 (+16.82%) | 0.01 (-41.63%) | 198.60 (-14.40%) | 180.16 (+3.45%) | 178.90 (+8.69%) | 151.30 (+9.24%) | 18.86 (-46.80%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 232.00 (n/a) | 174.16 (n/a) | 164.60 (n/a) | 138.50 (n/a) | 35.45 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.09 (+14.75%) | 0.07 (+0.03%) | 0.07 (-8.00%) | 0.06 (-10.98%) | 0.01 (+117.18%) | 219.90 (+12.37%) | 173.54 (+2.30%) | 179.00 (+8.68%) | 136.90 (-12.86%) | 33.47 (+108.87%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 195.70 (n/a) | 169.64 (n/a) | 164.70 (n/a) | 157.10 (n/a) | 16.02 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (+14.20%) | 0.05 (-3.31%) | 0.05 (-3.34%) | 0.04 (-15.88%) | 0.01 (+126.05%) | 211.40 (+18.90%) | 168.54 (+5.68%) | 168.10 (+3.45%) | 126.60 (-12.45%) | 30.24 (+133.13%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 177.80 (n/a) | 159.48 (n/a) | 162.50 (n/a) | 144.60 (n/a) | 12.97 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.08 (+10.14%) | 0.06 (+21.87%) | 0.07 (+30.32%) | 0.05 (+91.39%) | 0.01 (-34.85%) | 198.20 (-47.76%) | 162.28 (-25.35%) | 150.90 (-23.25%) | 126.30 (-9.27%) | 29.06 (-69.59%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 379.40 (n/a) | 217.40 (n/a) | 196.60 (n/a) | 139.20 (n/a) | 95.58 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (+30.00%) | 0.05 (+13.38%) | 0.04 (-5.75%) | 0.04 (+26.64%) | 0.01 (+47.82%) | 205.50 (-21.05%) | 173.86 (-11.13%) | 190.80 (+6.06%) | 124.00 (-23.12%) | 33.85 (-12.71%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 260.30 (n/a) | 195.64 (n/a) | 179.90 (n/a) | 161.30 (n/a) | 38.78 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.09 (+11.56%) | 0.07 (+4.03%) | 0.06 (+0.38%) | 0.05 (+13.29%) | 0.01 (+9.67%) | 190.30 (-11.73%) | 159.56 (-4.05%) | 166.70 (-0.36%) | 118.20 (-10.39%) | 27.57 (-14.86%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 215.60 (n/a) | 166.30 (n/a) | 167.30 (n/a) | 131.90 (n/a) | 32.38 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-28.88%) | 0.05 (-14.11%) | 0.04 (-6.66%) | 0.03 (-17.96%) | 0.01 (-36.75%) | 242.50 (+21.92%) | 185.92 (+14.85%) | 183.50 (+7.12%) | 145.60 (+40.54%) | 39.86 (+13.01%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.08 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 198.90 (n/a) | 161.88 (n/a) | 171.30 (n/a) | 103.60 (n/a) | 35.27 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (-12.55%) | 0.05 (-5.04%) | 0.05 (-10.49%) | 0.04 (-0.08%) | 0.01 (-21.64%) | 243.90 (+0.08%) | 186.94 (+3.44%) | 189.90 (+11.71%) | 131.10 (+14.40%) | 43.79 (-9.84%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.08 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 243.70 (n/a) | 180.72 (n/a) | 170.00 (n/a) | 114.60 (n/a) | 48.57 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-19.06%) | 0.05 (-2.47%) | 0.04 (-15.37%) | 0.03 (+56.90%) | 0.01 (-42.52%) | 234.40 (-36.27%) | 188.44 (-7.84%) | 206.40 (+18.15%) | 139.30 (+23.60%) | 40.83 (-57.89%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 367.80 (n/a) | 204.48 (n/a) | 174.70 (n/a) | 112.70 (n/a) | 96.97 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (+20.37%) | 0.06 (-1.67%) | 0.06 (+8.20%) | 0.04 (-29.18%) | 0.02 (+352.41%) | 245.70 (+41.21%) | 176.32 (+8.26%) | 152.60 (-7.57%) | 125.30 (-16.91%) | 51.73 (+441.33%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 174.00 (n/a) | 162.86 (n/a) | 165.10 (n/a) | 150.80 (n/a) | 9.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-8.96%) | 0.05 (+7.16%) | 0.05 (+8.88%) | 0.03 (+11.94%) | 0.01 (-16.70%) | 259.90 (-10.66%) | 176.36 (-8.74%) | 165.50 (-8.16%) | 139.20 (+9.87%) | 49.33 (-19.48%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 290.90 (n/a) | 193.26 (n/a) | 180.20 (n/a) | 126.70 (n/a) | 61.27 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.06 (-3.68%) | 0.04 (-1.09%) | 0.04 (+0.42%) | 0.03 (-19.77%) | 0.01 (+26.48%) | 286.20 (+24.65%) | 208.24 (+3.64%) | 216.70 (-0.41%) | 155.80 (+3.80%) | 54.02 (+53.21%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 229.60 (n/a) | 200.92 (n/a) | 217.60 (n/a) | 150.10 (n/a) | 35.26 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.07 (+4.50%) | 0.05 (-9.06%) | 0.04 (-18.12%) | 0.03 (-0.50%) | 0.01 (+23.12%) | 250.10 (+0.48%) | 188.22 (+12.06%) | 190.10 (+22.09%) | 123.40 (-4.27%) | 53.13 (+13.15%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 248.90 (n/a) | 167.96 (n/a) | 155.70 (n/a) | 128.90 (n/a) | 46.95 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (+19.89%) | 0.05 (+10.99%) | 0.04 (+4.19%) | 0.04 (-2.84%) | 0.01 (+159.16%) | 240.70 (+2.91%) | 197.64 (-8.02%) | 201.90 (-3.99%) | 161.90 (-16.59%) | 34.81 (+113.69%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 233.90 (n/a) | 214.88 (n/a) | 210.30 (n/a) | 194.10 (n/a) | 16.29 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.05 (-3.50%) | 0.04 (-0.35%) | 0.04 (-4.37%) | 0.03 (+6.22%) | 0.01 (-3.86%) | 287.50 (-5.86%) | 220.72 (-0.15%) | 226.60 (+4.57%) | 174.50 (+3.68%) | 45.53 (-11.11%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 305.40 (n/a) | 221.06 (n/a) | 216.70 (n/a) | 168.30 (n/a) | 51.22 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.64 (-14.69%) | 0.57 (-1.44%) | 0.62 (+30.31%) | 0.43 (-6.10%) | 0.09 (-41.84%) | 227.60 (+6.50%) | 176.66 (-1.52%) | 158.40 (-23.26%) | 154.20 (+17.17%) | 31.31 (-27.39%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.75 (n/a) | 0.58 (n/a) | 0.48 (n/a) | 0.46 (n/a) | 0.15 (n/a) | 213.70 (n/a) | 179.38 (n/a) | 206.40 (n/a) | 131.60 (n/a) | 43.11 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.63 (-9.84%) | 0.54 (+11.91%) | 0.55 (+32.02%) | 0.40 (+11.28%) | 0.09 (-39.54%) | 247.20 (-10.14%) | 187.16 (-14.48%) | 179.30 (-24.22%) | 156.20 (+10.94%) | 36.06 (-40.13%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.70 (n/a) | 0.48 (n/a) | 0.42 (n/a) | 0.36 (n/a) | 0.15 (n/a) | 275.10 (n/a) | 218.84 (n/a) | 236.60 (n/a) | 140.80 (n/a) | 60.23 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.69 (+18.93%) | 0.59 (+11.34%) | 0.59 (+6.48%) | 0.46 (+6.02%) | 0.09 (+36.75%) | 215.40 (-5.69%) | 170.94 (-9.58%) | 167.20 (-6.12%) | 141.90 (-15.89%) | 28.20 (+10.65%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.58 (n/a) | 0.53 (n/a) | 0.55 (n/a) | 0.43 (n/a) | 0.07 (n/a) | 228.40 (n/a) | 189.06 (n/a) | 178.10 (n/a) | 168.70 (n/a) | 25.49 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.59 (-3.57%) | 0.53 (+3.71%) | 0.53 (+3.71%) | 0.46 (+15.79%) | 0.05 (-53.89%) | 212.60 (-13.61%) | 188.28 (-5.96%) | 185.10 (-3.59%) | 167.80 (+3.71%) | 16.56 (-58.32%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.61 (n/a) | 0.51 (n/a) | 0.51 (n/a) | 0.40 (n/a) | 0.10 (n/a) | 246.10 (n/a) | 200.22 (n/a) | 192.00 (n/a) | 161.80 (n/a) | 39.73 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.51 (+8.46%) | 0.39 (-11.58%) | 0.36 (-18.19%) | 0.32 (-23.25%) | 0.08 (+245.79%) | 231.80 (+30.30%) | 194.70 (+16.16%) | 206.10 (+22.24%) | 143.60 (-7.77%) | 34.77 (+310.62%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.47 (n/a) | 0.44 (n/a) | 0.44 (n/a) | 0.41 (n/a) | 0.02 (n/a) | 177.90 (n/a) | 167.62 (n/a) | 168.60 (n/a) | 155.70 (n/a) | 8.47 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.46 (-6.64%) | 0.37 (-13.07%) | 0.40 (-4.44%) | 0.19 (-42.99%) | 0.10 (+66.23%) | 382.50 (+75.38%) | 217.84 (+24.62%) | 183.20 (+4.63%) | 161.30 (+7.10%) | 92.55 (+236.91%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.49 (n/a) | 0.43 (n/a) | 0.42 (n/a) | 0.34 (n/a) | 0.06 (n/a) | 218.10 (n/a) | 174.80 (n/a) | 175.10 (n/a) | 150.60 (n/a) | 27.47 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.49 (+1.71%) | 0.41 (+0.51%) | 0.41 (+0.83%) | 0.34 (-0.97%) | 0.06 (+23.84%) | 218.50 (+0.97%) | 181.70 (+0.15%) | 180.50 (-0.82%) | 152.00 (-1.68%) | 28.72 (+21.14%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.48 (n/a) | 0.41 (n/a) | 0.41 (n/a) | 0.34 (n/a) | 0.05 (n/a) | 216.40 (n/a) | 181.42 (n/a) | 182.00 (n/a) | 154.60 (n/a) | 23.71 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.42 (-20.05%) | 0.38 (-3.94%) | 0.40 (+7.66%) | 0.33 (+30.16%) | 0.04 (-64.85%) | 221.20 (-23.17%) | 196.80 (-2.43%) | 186.50 (-7.12%) | 174.00 (+25.09%) | 22.06 (-64.30%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.53 (n/a) | 0.39 (n/a) | 0.37 (n/a) | 0.26 (n/a) | 0.12 (n/a) | 287.90 (n/a) | 201.70 (n/a) | 200.80 (n/a) | 139.10 (n/a) | 61.81 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.84 (-10.06%) | 0.69 (-2.61%) | 0.67 (-6.14%) | 0.59 (+5.11%) | 0.10 (-31.54%) | 223.40 (-4.90%) | 192.48 (+1.19%) | 194.80 (+6.56%) | 156.60 (+11.14%) | 25.60 (-27.39%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.93 (n/a) | 0.71 (n/a) | 0.72 (n/a) | 0.56 (n/a) | 0.14 (n/a) | 234.90 (n/a) | 190.22 (n/a) | 182.80 (n/a) | 140.90 (n/a) | 35.26 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 1.17 (+48.97%) | 0.78 (+12.87%) | 0.69 (+2.45%) | 0.58 (-3.81%) | 0.24 (+231.09%) | 225.60 (+3.96%) | 179.86 (-6.10%) | 190.80 (-2.40%) | 112.40 (-32.86%) | 47.59 (+135.31%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.78 (n/a) | 0.69 (n/a) | 0.67 (n/a) | 0.60 (n/a) | 0.07 (n/a) | 217.00 (n/a) | 191.54 (n/a) | 195.50 (n/a) | 167.40 (n/a) | 20.22 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.97 (-5.84%) | 0.78 (+0.31%) | 0.73 (+1.56%) | 0.61 (+9.84%) | 0.14 (-25.52%) | 215.10 (-8.97%) | 172.38 (-2.47%) | 178.80 (-1.54%) | 135.00 (+6.22%) | 31.16 (-27.75%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 1.03 (n/a) | 0.78 (n/a) | 0.72 (n/a) | 0.55 (n/a) | 0.19 (n/a) | 236.30 (n/a) | 176.74 (n/a) | 181.60 (n/a) | 127.10 (n/a) | 43.13 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.00 (+0.00%) | 0.00 (+13.46%) | 0.00 (+9.09%) | 0.00 (+22.22%) | 0.00 (-66.67%) | 3649.43 (-18.46%) | 3532.00 (-11.57%) | 3507.31 (-8.23%) | 3493.26 (-0.69%) | 66.37 (-83.82%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4475.59 (n/a) | 3994.10 (n/a) | 3821.83 (n/a) | 3517.56 (n/a) | 410.30 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.00 (+0.00%) | 0.00 (+13.40%) | 0.00 (+27.78%) | 0.00 (+5.88%) | 0.00 (-22.38%) | 4675.41 (-2.03%) | 3772.44 (-10.96%) | 3548.84 (-21.32%) | 3531.42 (-0.07%) | 504.91 (-14.82%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4772.10 (n/a) | 4236.73 (n/a) | 4510.67 (n/a) | 3533.78 (n/a) | 592.76 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 0.28 (+0.04%) | 0.18 (-2.30%) | 0.16 (+3.77%) | 0.14 (+6.02%) | 0.05 (-4.08%) | 14518.73 (-5.70%) | 12436.76 (+1.37%) | 13369.40 (-3.63%) | 7613.92 (-0.06%) | 2761.04 (-12.30%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 0.28 (n/a) | 0.18 (n/a) | 0.15 (n/a) | 0.14 (n/a) | 0.06 (n/a) | 15396.52 (n/a) | 12268.40 (n/a) | 13872.79 (n/a) | 7618.14 (n/a) | 3148.14 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 3.49 (+12.51%) | 2.77 (+1.73%) | 2.67 (-2.80%) | 2.22 (+1.63%) | 0.49 (+27.92%) | 236.50 (-1.62%) | 193.90 (-0.98%) | 196.60 (+2.88%) | 150.30 (-11.12%) | 33.24 (+12.53%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 3.10 (n/a) | 2.72 (n/a) | 2.74 (n/a) | 2.18 (n/a) | 0.39 (n/a) | 240.40 (n/a) | 195.82 (n/a) | 191.10 (n/a) | 169.10 (n/a) | 29.54 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 5.23 (-5.14%) | 4.24 (-17.07%) | 4.81 (-9.54%) | 2.82 (-34.27%) | 1.04 (+116.49%) | 371.70 (+52.15%) | 261.36 (+26.51%) | 218.10 (+10.54%) | 200.60 (+5.41%) | 73.27 (+238.95%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 5.51 (n/a) | 5.12 (n/a) | 5.32 (n/a) | 4.29 (n/a) | 0.48 (n/a) | 244.30 (n/a) | 206.60 (n/a) | 197.30 (n/a) | 190.30 (n/a) | 21.62 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 / 2026-06-23 23:08:22 | 3.72 (+20.32%) | 3.12 (+16.02%) | 3.04 (+6.32%) | 2.73 (+32.06%) | 0.43 (+8.15%) | 192.20 (-24.30%) | 170.32 (-14.26%) | 172.50 (-5.94%) | 141.00 (-16.86%) | 22.57 (-32.31%) |
| 4d4b803 / 2026-06-22 17:54:57 | 3.09 (n/a) | 2.69 (n/a) | 2.86 (n/a) | 2.07 (n/a) | 0.40 (n/a) | 253.90 (n/a) | 198.64 (n/a) | 183.40 (n/a) | 169.60 (n/a) | 33.34 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:56:42 | 3.67 (+6.83%) | 2.97 (+3.61%) | 3.06 (+7.20%) | 2.26 (-1.11%) | 0.58 (+22.89%) | 231.70 (+1.13%) | 182.32 (-2.55%) | 171.30 (-6.75%) | 143.00 (-6.35%) | 36.73 (+17.62%) |
| 9c70ba8 / 2026-06-29 16:51:59 | 3.43 (n/a) | 2.86 (n/a) | 2.85 (n/a) | 2.29 (n/a) | 0.47 (n/a) | 229.10 (n/a) | 187.10 (n/a) | 183.70 (n/a) | 152.70 (n/a) | 31.23 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 / 2026-06-23 23:08:22 | 4.36 (+7.95%) | 3.26 (+13.04%) | 3.33 (+22.37%) | 2.38 (+18.97%) | 0.79 (-14.92%) | 220.70 (-15.96%) | 168.40 (-14.68%) | 157.40 (-18.28%) | 120.30 (-7.39%) | 40.78 (-34.38%) |
| 4d4b803 / 2026-06-22 17:54:57 | 4.04 (n/a) | 2.89 (n/a) | 2.72 (n/a) | 2.00 (n/a) | 0.93 (n/a) | 262.60 (n/a) | 197.38 (n/a) | 192.60 (n/a) | 129.90 (n/a) | 62.14 (n/a) |

Krackan - Examples

IRON

Tested on 2026_06_29_18_10_43 at commit b1939ea.

iron/applications/llama_3.2_1b
| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] | ✅ 5/5 | 2.12 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] | ✅ 5/5 | 2.15 | 4.30 |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] | ✅ 5/5 | 2.08 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] | ✅ 5/5 | 2.07 | 4.28 |

Trends:

IRON Trends

iron/applications/llama_3.2_1b

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 18:05:02 | 2.13 (-0.51%) | 2.12 (-0.09%) | 2.12 (-0.33%) | 2.11 (+0.57%) | 0.01 (-44.93%) |
| 9c70ba8 / 2026-06-29 17:01:32 | 2.15 (n/a) | 2.12 (n/a) | 2.12 (n/a) | 2.10 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 18:05:02 | 4.32 (+2.66%) | 4.30 (+3.09%) | 4.31 (+3.21%) | 4.28 (+3.43%) | 0.01 (-47.66%) | 2.25 (-0.66%) | 2.15 (-0.37%) | 2.12 (-0.38%) | 2.10 (-0.47%) | 0.06 (-6.50%) |
| 9c70ba8 / 2026-06-29 17:01:32 | 4.20 (n/a) | 4.18 (n/a) | 4.17 (n/a) | 4.14 (n/a) | 0.02 (n/a) | 2.26 (n/a) | 2.15 (n/a) | 2.13 (n/a) | 2.11 (n/a) | 0.06 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 18:05:02 | 2.10 (-0.05%) | 2.08 (-0.01%) | 2.08 (-0.05%) | 2.06 (+0.00%) | 0.01 (+3.14%) |
| 9c70ba8 / 2026-06-29 17:01:32 | 2.10 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.06 (n/a) | 0.01 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 18:05:02 | 4.30 (+3.11%) | 4.28 (+2.66%) | 4.27 (+2.52%) | 4.27 (+2.72%) | 0.02 (+100.44%) | 2.08 (-1.33%) | 2.07 (-0.29%) | 2.06 (-0.29%) | 2.06 (+0.15%) | 0.01 (-62.34%) |
| 9c70ba8 / 2026-06-29 17:01:32 | 4.17 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.10 (n/a) | 2.07 (n/a) | 2.07 (n/a) | 2.06 (n/a) | 0.02 (n/a) |

Phoenix - Small

IRON

Tested on 2026_06_29_17_47_34 at commit b1939ea.

iron/operators/axpy
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 475.16 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 430.98 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 1126.24 | 0.02 | n/a |

iron/operators/dequant
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 274.20 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 270.22 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 671.48 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 527.36 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 370.22 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 420.38 | 0.01 | n/a |

iron/operators/elementwise_add
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 285.54 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 296.80 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 326.86 | 0.04 | n/a |

iron/operators/elementwise_mul
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 298.48 | 0.04 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 261.64 | 0.05 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 343.86 | 0.04 | n/a |

iron/operators/gelu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 406.16 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 242.80 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 331.72 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 248.02 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 251.10 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 439.52 | 0.02 | n/a |

iron/operators/gemm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 520.90 | 0.43 | 18.42 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 484.00 | 0.47 | 20.26 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 82017.10 | 0.31 | 209.49 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 24713.80 | 1.02 | 695.51 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 3827.72 | 2.35 | 617.08 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 5989.74 | 0.22 | 11.80 |

iron/operators/gemv
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.08 | 0.08 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 3.71 | 3.71 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 5.83 | 5.83 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 9.55 | 9.55 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 3.66 | 3.66 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 5.69 | 5.68 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 10.31 | 10.31 |

iron/operators/layer_norm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 405.60 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 488.18 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 352.62 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 363.72 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 370.44 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 453.88 | 0.02 | n/a |

iron/operators/mem_copy
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 417.62 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 538.18 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 374.10 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 502.78 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 594.28 | 0.01 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 612.82 | 0.01 | n/a |

iron/operators/relu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 324.60 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 406.10 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 614.58 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 659.40 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 369.48 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 437.00 | 0.02 | n/a |

iron/operators/rms_norm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 585.88 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 407.60 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 408.52 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 480.32 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 338.50 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 376.80 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 424.34 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 501.66 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 347.24 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 411.58 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 350.98 | 0.02 | n/a |

iron/operators/rope
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 377.46 | 0.28 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 378.20 | 0.29 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 475.26 | 0.22 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 413.96 | 0.20 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 318.64 | 0.25 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 492.26 | 0.16 | n/a |

iron/operators/sigmoid
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 441.72 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 474.46 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 538.78 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 362.78 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 421.92 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 406.86 | 0.03 | n/a |

iron/operators/silu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 809.20 | 0.02 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 370.76 | 0.03 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 353.40 | 0.03 | n/a |

iron/operators/softmax
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 457.98 | 0.32 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 514.28 | 0.26 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 457.42 | 0.30 | n/a |

iron/operators/swiglu_decode
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 10232.53 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 18599.83 | 0.00 | n/a |

iron/operators/swiglu_prefill
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 23164.89 | 0.09 | n/a |

iron/operators/tanh
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 285.08 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 375.14 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 421.10 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 436.02 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 406.60 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 626.04 | 0.02 | n/a |

iron/operators/transpose
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]❌ 0/5461.721.28n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]❌ 0/51251.961.35n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]❌ 0/5426.881.40n/a

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (-15.16%) | 0.03 (-25.53%) | 0.02 (-32.55%) | 0.02 (-16.99%) | 0.01 (-0.15%) | 549.80 (+20.46%) | 475.16 (+36.19%) | 522.90 (+48.26%) | 301.90 (+17.88%) | 101.57 (+38.81%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 456.40 (n/a) | 348.90 (n/a) | 352.70 (n/a) | 256.10 (n/a) | 73.17 (n/a) |

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (+37.85%) | 0.03 (+32.67%) | 0.02 (-9.91%) | 0.02 (+119.43%) | 0.01 (+32.73%) | 610.40 (-54.42%) | 430.98 (-31.61%) | 494.30 (+11.00%) | 248.00 (-27.44%) | 164.79 (-59.93%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1339.30 (n/a) | 630.14 (n/a) | 445.30 (n/a) | 341.80 (n/a) | 411.31 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (+28.53%) | 0.02 (-8.34%) | 0.02 (-12.42%) | 0.01 (-10.15%) | 0.01 (+50.03%) | 2416.90 (+11.30%) | 1126.24 (+34.70%) | 581.80 (+14.19%) | 367.80 (-22.19%) | 939.87 (+25.82%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 2171.60 (n/a) | 836.08 (n/a) | 509.50 (n/a) | 472.70 (n/a) | 746.98 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-18.43%) | 0.02 (+2.29%) | 0.02 (+15.27%) | 0.01 (+14.10%) | 0.00 (-27.96%) | 432.50 (-12.36%) | 274.20 (-6.42%) | 236.60 (-13.27%) | 227.60 (+22.56%) | 88.67 (-24.93%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 493.50 (n/a) | 293.02 (n/a) | 272.80 (n/a) | 185.70 (n/a) | 118.12 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-25.29%) | 0.02 (+48.65%) | 0.02 (+93.61%) | 0.01 (+193.15%) | 0.00 (-66.66%) | 365.70 (-65.89%) | 270.22 (-51.51%) | 259.00 (-48.34%) | 220.90 (+33.80%) | 56.01 (-82.84%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1072.20 (n/a) | 557.28 (n/a) | 501.40 (n/a) | 165.10 (n/a) | 326.36 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-27.80%) | 0.01 (-41.95%) | 0.01 (-47.56%) | 0.00 (-86.54%) | 0.01 (+32.61%) | 1962.40 (+643.05%) | 671.48 (+208.73%) | 448.10 (+90.68%) | 191.10 (+38.48%) | 734.66 (+1335.72%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 264.10 (n/a) | 217.50 (n/a) | 235.00 (n/a) | 138.00 (n/a) | 51.17 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-32.58%) | 0.01 (-47.26%) | 0.01 (-52.76%) | 0.01 (-65.55%) | 0.01 (-13.58%) | 971.70 (+190.23%) | 527.36 (+114.79%) | 474.50 (+111.74%) | 243.40 (+48.32%) | 272.84 (+264.81%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 334.80 (n/a) | 245.52 (n/a) | 224.10 (n/a) | 164.10 (n/a) | 74.79 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (+5.06%) | 0.02 (+9.68%) | 0.02 (+10.49%) | 0.01 (-11.41%) | 0.01 (+11.23%) | 673.60 (+12.87%) | 370.22 (-6.31%) | 276.80 (-9.48%) | 241.90 (-4.80%) | 183.51 (+13.72%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 596.80 (n/a) | 395.14 (n/a) | 305.80 (n/a) | 254.10 (n/a) | 161.38 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (+108.04%) | 0.01 (+88.07%) | 0.01 (+68.95%) | 0.01 (+204.03%) | 0.00 (+39.97%) | 610.00 (-67.11%) | 420.38 (-52.91%) | 377.40 (-40.80%) | 295.90 (-51.93%) | 118.75 (-77.99%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 1854.70 (n/a) | 892.66 (n/a) | 637.50 (n/a) | 615.60 (n/a) | 539.55 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 404.40 (n/a) | 285.54 (n/a) | 269.30 (n/a) | 225.40 (n/a) | 72.63 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 349.80 (n/a) | 296.80 (n/a) | 287.10 (n/a) | 274.00 (n/a) | 31.15 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 394.60 (n/a) | 326.86 (n/a) | 333.20 (n/a) | 235.20 (n/a) | 61.29 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 386.60 (n/a) | 298.48 (n/a) | 244.80 (n/a) | 239.80 (n/a) | 77.53 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 342.40 (n/a) | 261.64 (n/a) | 244.70 (n/a) | 233.30 (n/a) | 45.41 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 659.20 (n/a) | 343.86 (n/a) | 283.80 (n/a) | 231.10 (n/a) | 178.14 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 569.80 (n/a) | 406.16 (n/a) | 438.10 (n/a) | 230.60 (n/a) | 154.18 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 262.40 (n/a) | 242.80 (n/a) | 253.40 (n/a) | 197.00 (n/a) | 27.09 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 547.60 (n/a) | 331.72 (n/a) | 251.50 (n/a) | 230.30 (n/a) | 139.30 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 270.10 (n/a) | 248.02 (n/a) | 239.60 (n/a) | 230.80 (n/a) | 16.41 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 298.80 (n/a) | 251.10 (n/a) | 239.70 (n/a) | 231.30 (n/a) | 28.24 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 606.50 (n/a) | 439.52 (n/a) | 477.90 (n/a) | 253.20 (n/a) | 154.68 (n/a) |

iron/operators/gemm

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.52 (-19.49%) | 0.43 (+0.22%) | 0.44 (+16.23%) | 0.35 (+0.70%) | 0.06 (-49.20%) | 627.30 (-0.70%) | 520.90 (-3.46%) | 506.10 (-13.96%) | 425.00 (+24.20%) | 75.31 (-34.94%) | 22.20 (-19.49%) | 18.42 (+0.22%) | 18.65 (+16.23%) | 15.04 (+0.70%) | 2.66 (-49.20%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.65 (n/a) | 0.43 (n/a) | 0.38 (n/a) | 0.35 (n/a) | 0.12 (n/a) | 631.70 (n/a) | 539.58 (n/a) | 588.20 (n/a) | 342.20 (n/a) | 115.76 (n/a) | 27.58 (n/a) | 18.38 (n/a) | 16.04 (n/a) | 14.94 (n/a) | 5.23 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.58 (+5.78%) | 0.47 (+6.78%) | 0.47 (+9.10%) | 0.35 (-3.09%) | 0.10 (+22.88%) | 640.40 (+3.19%) | 484.00 (-5.29%) | 474.10 (-8.33%) | 384.30 (-5.48%) | 108.57 (+16.66%) | 24.55 (+5.78%) | 20.26 (+6.78%) | 19.91 (+9.10%) | 14.74 (-3.09%) | 4.32 (+22.88%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.54 (n/a) | 0.44 (n/a) | 0.43 (n/a) | 0.36 (n/a) | 0.08 (n/a) | 620.60 (n/a) | 511.02 (n/a) | 517.20 (n/a) | 406.60 (n/a) | 93.06 (n/a) | 23.21 (n/a) | 18.98 (n/a) | 18.25 (n/a) | 15.21 (n/a) | 3.51 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.31 (+0.16%) | 0.31 (+0.20%) | 0.31 (-0.33%) | 0.30 (+1.96%) | 0.00 (-35.30%) | 83056.30 (-1.93%) | 82017.10 (-0.21%) | 81943.00 (+0.33%) | 80852.90 (-0.16%) | 960.60 (-36.62%) | 212.48 (+0.16%) | 209.49 (+0.20%) | 209.66 (-0.33%) | 206.85 (+1.96%) | 2.45 (-35.30%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.31 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.01 (n/a) | 84687.50 (n/a) | 82190.72 (n/a) | 81670.10 (n/a) | 80980.40 (n/a) | 1515.67 (n/a) | 212.15 (n/a) | 209.08 (n/a) | 210.36 (n/a) | 202.86 (n/a) | 3.79 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 1.06 (+4.06%) | 1.02 (+3.68%) | 1.02 (+0.60%) | 0.99 (+9.28%) | 0.03 (-45.38%) | 25343.00 (-8.49%) | 24713.80 (-3.69%) | 24779.80 (-0.60%) | 23695.70 (-3.90%) | 618.13 (-52.12%) | 725.02 (+4.06%) | 695.51 (+3.68%) | 693.30 (+0.60%) | 677.89 (+9.28%) | 17.77 (-45.38%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 1.02 (n/a) | 0.98 (n/a) | 1.01 (n/a) | 0.91 (n/a) | 0.05 (n/a) | 27693.90 (n/a) | 25659.76 (n/a) | 24928.50 (n/a) | 24658.50 (n/a) | 1291.06 (n/a) | 696.71 (n/a) | 670.84 (n/a) | 689.16 (n/a) | 620.35 (n/a) | 32.54 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 3.49 (+53.56%) | 2.35 (+35.57%) | 2.08 (+29.24%) | 1.42 (+6.90%) | 0.88 (+114.49%) | 5660.50 (-6.46%) | 3827.72 (-21.03%) | 3875.40 (-22.62%) | 2311.30 (-34.88%) | 1386.09 (+27.69%) | 914.60 (+53.56%) | 617.08 (+35.57%) | 545.47 (+29.24%) | 373.45 (+6.90%) | 229.60 (+114.49%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 2.27 (n/a) | 1.74 (n/a) | 1.61 (n/a) | 1.33 (n/a) | 0.41 (n/a) | 6051.30 (n/a) | 4847.08 (n/a) | 5008.50 (n/a) | 3549.20 (n/a) | 1085.54 (n/a) | 595.60 (n/a) | 455.16 (n/a) | 422.07 (n/a) | 349.34 (n/a) | 107.05 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.30 (+4.69%) | 0.22 (-0.64%) | 0.20 (-5.14%) | 0.17 (-7.09%) | 0.06 (+35.11%) | 7520.20 (+7.63%) | 5989.74 (+3.16%) | 6348.80 (+5.41%) | 4109.30 (-4.48%) | 1446.36 (+42.02%) | 16.33 (+4.69%) | 11.80 (-0.64%) | 10.57 (-5.14%) | 8.92 (-7.09%) | 3.13 (+35.11%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.29 (n/a) | 0.22 (n/a) | 0.21 (n/a) | 0.18 (n/a) | 0.04 (n/a) | 6986.90 (n/a) | 5806.54 (n/a) | 6022.80 (n/a) | 4302.00 (n/a) | 1018.41 (n/a) | 15.60 (n/a) | 11.88 (n/a) | 11.14 (n/a) | 9.60 (n/a) | 2.32 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.13 (-0.84%) | 0.08 (+13.28%) | 0.10 (+40.92%) | 0.01 (-66.63%) | 0.05 (+33.70%) | 0.13 (-0.84%) | 0.08 (+13.28%) | 0.10 (+40.92%) | 0.01 (-66.63%) | 0.04 (+33.70%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.13 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.13 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 3.93 (+4.75%) | 3.71 (+3.15%) | 3.74 (+1.05%) | 3.35 (-1.04%) | 0.22 (+23.90%) | 3.93 (+4.75%) | 3.71 (+3.15%) | 3.74 (+1.05%) | 3.34 (-1.04%) | 0.22 (+23.90%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 3.76 (n/a) | 3.59 (n/a) | 3.70 (n/a) | 3.38 (n/a) | 0.18 (n/a) | 3.75 (n/a) | 3.59 (n/a) | 3.70 (n/a) | 3.38 (n/a) | 0.18 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 7.02 (-4.48%) | 5.83 (-6.49%) | 5.61 (-4.41%) | 4.48 (-19.80%) | 1.15 (+45.70%) | 7.02 (-4.48%) | 5.83 (-6.49%) | 5.60 (-4.41%) | 4.48 (-19.80%) | 1.15 (+45.70%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 7.35 (n/a) | 6.24 (n/a) | 5.87 (n/a) | 5.59 (n/a) | 0.79 (n/a) | 7.35 (n/a) | 6.23 (n/a) | 5.86 (n/a) | 5.59 (n/a) | 0.79 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 13.54 (+6.05%) | 9.55 (+1.13%) | 9.37 (+12.07%) | 7.49 (-7.03%) | 2.43 (+21.06%) | 13.54 (+6.05%) | 9.55 (+1.13%) | 9.37 (+12.07%) | 7.48 (-7.03%) | 2.43 (+21.06%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 12.77 (n/a) | 9.45 (n/a) | 8.36 (n/a) | 8.06 (n/a) | 2.01 (n/a) | 12.76 (n/a) | 9.44 (n/a) | 8.36 (n/a) | 8.05 (n/a) | 2.01 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 3.99 (+4.39%) | 3.66 (-1.68%) | 3.60 (-5.37%) | 3.38 (-1.40%) | 0.29 (+73.51%) | 3.99 (+4.39%) | 3.66 (-1.68%) | 3.60 (-5.37%) | 3.38 (-1.40%) | 0.29 (+73.51%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 3.82 (n/a) | 3.72 (n/a) | 3.81 (n/a) | 3.43 (n/a) | 0.17 (n/a) | 3.82 (n/a) | 3.72 (n/a) | 3.81 (n/a) | 3.43 (n/a) | 0.17 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 6.97 (+3.50%) | 5.69 (-9.87%) | 5.31 (-20.97%) | 4.60 (-18.13%) | 1.06 (+85.93%) | 6.97 (+3.50%) | 5.68 (-9.87%) | 5.30 (-20.97%) | 4.60 (-18.13%) | 1.06 (+85.93%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 6.74 (n/a) | 6.31 (n/a) | 6.71 (n/a) | 5.62 (n/a) | 0.57 (n/a) | 6.73 (n/a) | 6.30 (n/a) | 6.71 (n/a) | 5.62 (n/a) | 0.57 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 13.77 (-1.93%) | 10.31 (-1.87%) | 8.58 (+0.73%) | 7.48 (-7.71%) | 3.08 (+4.36%) | 13.76 (-1.93%) | 10.31 (-1.87%) | 8.57 (+0.73%) | 7.48 (-7.71%) | 3.08 (+4.36%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 14.04 (n/a) | 10.51 (n/a) | 8.52 (n/a) | 8.11 (n/a) | 2.95 (n/a) | 14.03 (n/a) | 10.50 (n/a) | 8.51 (n/a) | 8.10 (n/a) | 2.95 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 628.30 (n/a) | 405.60 (n/a) | 407.20 (n/a) | 236.40 (n/a) | 159.84 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 630.00 (n/a) | 488.18 (n/a) | 508.90 (n/a) | 299.70 (n/a) | 119.57 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 467.80 (n/a) | 352.62 (n/a) | 314.30 (n/a) | 284.40 (n/a) | 82.26 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 527.40 (n/a) | 363.72 (n/a) | 386.30 (n/a) | 179.00 (n/a) | 138.87 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 631.80 (n/a) | 370.44 (n/a) | 275.30 (n/a) | 237.80 (n/a) | 165.39 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 536.30 (n/a) | 453.88 (n/a) | 462.30 (n/a) | 368.40 (n/a) | 81.99 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (+10.41%) | 0.02 (+1.95%) | 0.02 (-13.41%) | 0.01 (-16.65%) | 0.01 (+26.26%) | 796.50 (+19.97%) | 417.62 (+4.46%) | 372.00 (+15.49%) | 266.80 (-9.41%) | 218.99 (+39.73%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 663.90 (n/a) | 399.80 (n/a) | 322.10 (n/a) | 294.50 (n/a) | 156.72 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-22.82%) | 0.02 (-19.04%) | 0.02 (-33.63%) | 0.01 (+75.68%) | 0.01 (-34.52%) | 1070.40 (-43.08%) | 538.18 (-14.13%) | 443.50 (+50.70%) | 305.90 (+29.56%) | 314.38 (-55.47%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1880.50 (n/a) | 626.76 (n/a) | 294.30 (n/a) | 236.10 (n/a) | 705.97 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (+9.54%) | 0.02 (+34.37%) | 0.02 (+61.63%) | 0.02 (+259.60%) | 0.01 (-27.85%) | 541.40 (-72.19%) | 374.10 (-50.62%) | 359.20 (-38.12%) | 243.10 (-8.71%) | 131.73 (-80.99%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1946.90 (n/a) | 757.54 (n/a) | 580.50 (n/a) | 266.30 (n/a) | 692.98 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-31.81%) | 0.02 (-16.11%) | 0.02 (-9.09%) | 0.01 (-11.40%) | 0.00 (-56.45%) | 589.30 (+12.87%) | 502.78 (+15.08%) | 523.70 (+10.02%) | 401.70 (+46.61%) | 72.03 (-25.40%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 522.10 (n/a) | 436.90 (n/a) | 476.00 (n/a) | 274.00 (n/a) | 96.55 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-42.01%) | 0.01 (-25.62%) | 0.02 (-7.97%) | 0.01 (-33.48%) | 0.00 (-50.79%) | 939.00 (+50.31%) | 594.28 (+29.59%) | 516.00 (+8.65%) | 402.70 (+72.46%) | 208.52 (+37.08%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 624.70 (n/a) | 458.58 (n/a) | 474.90 (n/a) | 233.50 (n/a) | 152.11 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.02 (-28.54%) | 0.01 (-12.60%) | 0.02 (+16.84%) | 0.01 (-40.73%) | 0.00 (-15.89%) | 1073.90 (+68.75%) | 612.82 (+19.75%) | 480.00 (-14.41%) | 433.10 (+39.94%) | 268.63 (+105.11%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 636.40 (n/a) | 511.74 (n/a) | 560.80 (n/a) | 309.50 (n/a) | 130.97 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-25.48%) | 0.03 (-3.05%) | 0.03 (+19.70%) | 0.00 (-73.95%) | 0.01 (+2.93%) | 1878.80 (+283.90%) | 585.88 (+67.81%) | 270.40 (-16.47%) | 236.90 (+34.14%) | 722.98 (+470.64%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 489.40 (n/a) | 349.14 (n/a) | 323.70 (n/a) | 176.60 (n/a) | 126.69 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (+40.06%) | 0.03 (+19.00%) | 0.03 (+12.95%) | 0.02 (-5.43%) | 0.01 (+197.08%) | 542.00 (+5.74%) | 407.60 (-9.91%) | 395.60 (-11.46%) | 270.80 (-28.61%) | 127.86 (+126.38%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 512.60 (n/a) | 452.44 (n/a) | 446.80 (n/a) | 379.30 (n/a) | 56.48 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-5.39%) | 0.02 (+7.01%) | 0.03 (+36.32%) | 0.01 (-3.53%) | 0.01 (+1.36%) | 622.50 (+3.65%) | 408.52 (-4.98%) | 298.20 (-26.66%) | 250.00 (+5.71%) | 181.06 (+9.62%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 600.60 (n/a) | 429.92 (n/a) | 406.60 (n/a) | 236.50 (n/a) | 165.18 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-23.58%) | 0.02 (-19.40%) | 0.02 (-15.05%) | 0.02 (+8.53%) | 0.00 (-53.88%) | 584.10 (-7.86%) | 480.32 (+15.42%) | 489.40 (+17.73%) | 359.90 (+30.83%) | 88.28 (-41.04%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 633.90 (n/a) | 416.14 (n/a) | 415.70 (n/a) | 275.10 (n/a) | 149.72 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-5.76%) | 0.03 (+15.81%) | 0.02 (+35.14%) | 0.02 (+14.32%) | 0.01 (-17.71%) | 495.20 (-12.52%) | 338.50 (-17.67%) | 328.00 (-25.99%) | 239.70 (+6.11%) | 106.44 (-27.54%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 566.10 (n/a) | 411.16 (n/a) | 443.20 (n/a) | 225.90 (n/a) | 146.90 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.05 (+5.59%) | 0.03 (+29.02%) | 0.03 (+45.17%) | 0.02 (+6.13%) | 0.01 (+10.14%) | 582.20 (-5.79%) | 376.80 (-20.94%) | 355.50 (-31.12%) | 201.40 (-5.31%) | 166.06 (+4.91%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 618.00 (n/a) | 476.62 (n/a) | 516.10 (n/a) | 212.70 (n/a) | 158.28 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-25.14%) | 0.02 (-11.07%) | 0.02 (-5.27%) | 0.01 (+38.36%) | 0.01 (-53.81%) | 549.60 (-27.73%) | 424.34 (-3.62%) | 437.30 (+5.55%) | 287.90 (+33.60%) | 96.22 (-55.89%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 760.50 (n/a) | 440.30 (n/a) | 414.30 (n/a) | 215.50 (n/a) | 218.15 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (-17.55%) | 0.02 (-4.52%) | 0.02 (-0.04%) | 0.02 (+17.34%) | 0.01 (-33.46%) | 579.20 (-14.79%) | 501.66 (-1.47%) | 551.30 (+0.04%) | 295.00 (+21.30%) | 117.57 (-26.92%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 679.70 (n/a) | 509.14 (n/a) | 551.10 (n/a) | 243.20 (n/a) | 160.89 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (+14.39%) | 0.03 (+15.57%) | 0.03 (+18.31%) | 0.02 (+15.76%) | 0.01 (+11.12%) | 482.30 (-13.61%) | 347.24 (-13.94%) | 327.50 (-15.46%) | 240.80 (-12.60%) | 111.49 (-15.79%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 558.30 (n/a) | 403.50 (n/a) | 387.40 (n/a) | 275.50 (n/a) | 132.40 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.04 (+46.02%) | 0.02 (+11.25%) | 0.02 (-25.02%) | 0.02 (+21.53%) | 0.01 (+87.67%) | 517.30 (-17.72%) | 411.58 (-5.12%) | 501.50 (+33.38%) | 234.60 (-31.52%) | 134.05 (+11.48%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 628.70 (n/a) | 433.80 (n/a) | 376.00 (n/a) | 342.60 (n/a) | 120.25 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea / 2026-06-29 17:44:31 | 0.03 (+21.14%) | 0.02 (+34.55%) | 0.02 (+42.31%) | 0.02 (+37.09%) | 0.01 (-1.09%) | 417.30 (-27.06%) | 350.98 (-27.43%) | 364.10 (-29.74%) | 241.10 (-17.46%) | 65.46 (-40.37%) |
| 9c70ba8 / 2026-06-29 16:29:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 572.10 (n/a) | 483.64 (n/a) | 518.20 (n/a) | 292.10 (n/a) | 109.77 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.43 (+12.81%) | 0.28 (+7.13%) | 0.25 (-18.96%) | 0.20 (+313.92%) | 0.09 (-33.59%) | 493.60 (-75.84%) | 377.46 (-44.26%) | 387.50 (+23.37%) | 227.60 (-11.37%) | 109.23 (-85.83%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.38 (n/a) | 0.26 (n/a) | 0.31 (n/a) | 0.05 (n/a) | 0.14 (n/a) | 2043.20 (n/a) | 677.16 (n/a) | 314.10 (n/a) | 256.80 (n/a) | 770.83 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.42 (-17.27%) | 0.29 (-23.43%) | 0.28 (-28.85%) | 0.18 (-7.33%) | 0.10 (-15.22%) | 542.60 (+7.92%) | 378.20 (+29.25%) | 347.70 (+40.54%) | 234.60 (+20.87%) | 126.79 (+4.58%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.51 (n/a) | 0.37 (n/a) | 0.40 (n/a) | 0.20 (n/a) | 0.11 (n/a) | 502.80 (n/a) | 292.60 (n/a) | 247.40 (n/a) | 194.10 (n/a) | 121.24 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.33 (-25.59%) | 0.22 (-26.66%) | 0.19 (-42.26%) | 0.16 (-1.83%) | 0.07 (-44.88%) | 603.10 (+1.88%) | 475.26 (+24.41%) | 523.60 (+73.21%) | 297.80 (+34.39%) | 121.91 (-28.82%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.44 (n/a) | 0.30 (n/a) | 0.33 (n/a) | 0.17 (n/a) | 0.12 (n/a) | 592.00 (n/a) | 382.00 (n/a) | 302.30 (n/a) | 221.60 (n/a) | 171.27 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.27 (-4.51%) | 0.20 (-9.98%) | 0.21 (-14.63%) | 0.12 (-15.78%) | 0.06 (-10.57%) | 625.90 (+18.74%) | 413.96 (+10.77%) | 358.80 (+17.14%) | 272.80 (+4.72%) | 148.93 (+10.12%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.28 (n/a) | 0.22 (n/a) | 0.24 (n/a) | 0.14 (n/a) | 0.07 (n/a) | 527.10 (n/a) | 373.70 (n/a) | 306.30 (n/a) | 260.50 (n/a) | 135.24 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.30 (-5.18%) | 0.25 (+5.04%) | 0.27 (-8.14%) | 0.15 (+14.12%) | 0.06 (-37.99%) | 497.40 (-12.37%) | 318.64 (-14.21%) | 274.00 (+8.86%) | 242.90 (+5.47%) | 102.31 (-41.81%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.32 (n/a) | 0.23 (n/a) | 0.29 (n/a) | 0.13 (n/a) | 0.10 (n/a) | 567.60 (n/a) | 371.40 (n/a) | 251.70 (n/a) | 230.30 (n/a) | 175.84 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.25 (-17.54%) | 0.16 (-18.36%) | 0.16 (-25.28%) | 0.11 (-9.75%) | 0.06 (-25.74%) | 663.10 (+10.81%) | 492.26 (+18.47%) | 474.20 (+33.84%) | 299.60 (+21.30%) | 159.98 (-2.59%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.30 (n/a) | 0.20 (n/a) | 0.21 (n/a) | 0.12 (n/a) | 0.08 (n/a) | 598.40 (n/a) | 415.52 (n/a) | 354.30 (n/a) | 247.00 (n/a) | 164.24 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.56 (+1.19%) | 0.32 (-13.19%) | 0.26 (-18.96%) | 0.23 (+4.02%) | 0.14 (-0.32%) | 560.50 (-3.86%) | 457.98 (+14.35%) | 504.20 (+23.40%) | 234.10 (-1.18%) | 131.49 (-7.97%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.55 (n/a) | 0.37 (n/a) | 0.32 (n/a) | 0.22 (n/a) | 0.14 (n/a) | 583.00 (n/a) | 400.52 (n/a) | 408.60 (n/a) | 236.90 (n/a) | 142.88 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.37 (-24.78%) | 0.26 (-24.45%) | 0.24 (-17.00%) | 0.21 (-7.54%) | 0.06 (-49.08%) | 617.20 (+8.15%) | 514.28 (+24.93%) | 553.10 (+20.47%) | 354.30 (+32.95%) | 102.40 (-24.09%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.49 (n/a) | 0.35 (n/a) | 0.29 (n/a) | 0.23 (n/a) | 0.12 (n/a) | 570.70 (n/a) | 411.66 (n/a) | 459.10 (n/a) | 266.50 (n/a) | 134.91 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.42 (+20.99%) | 0.30 (+1.40%) | 0.27 (-1.12%) | 0.24 (-0.88%) | 0.07 (+66.95%) | 543.00 (+0.89%) | 457.42 (+0.65%) | 480.40 (+1.14%) | 314.50 (-17.35%) | 88.10 (+36.95%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.34 (n/a) | 0.29 (n/a) | 0.28 (n/a) | 0.24 (n/a) | 0.04 (n/a) | 538.20 (n/a) | 454.48 (n/a) | 475.00 (n/a) | 380.50 (n/a) | 64.33 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.00 (+0.00%) | 0.00 (+31.58%) | 0.00 (+200.00%) | 0.00 (+0.00%) | 0.00 (-5.81%) | 19086.94 (-14.52%) | 10232.53 (-33.89%) | 7113.04 (-66.02%) | 5683.33 (-0.60%) | 5756.55 (-30.69%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 22329.93 (n/a) | 15477.94 (n/a) | 20933.44 (n/a) | 5717.79 (n/a) | 8304.94 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.00 (-54.55%) | 0.00 (-36.11%) | 0.00 (-16.67%) | 0.00 (+0.00%) | 0.00 (-82.41%) | 22327.14 (+5.32%) | 18599.83 (+36.72%) | 16450.10 (+15.81%) | 15878.94 (+110.76%) | 3345.65 (-43.67%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 21199.65 (n/a) | 13604.77 (n/a) | 14204.05 (n/a) | 7534.07 (n/a) | 5939.55 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 0.13 (-3.36%) | 0.09 (-17.21%) | 0.09 (-28.05%) | 0.08 (-6.64%) | 0.02 (-10.76%) | 26658.49 (+7.16%) | 23164.89 (+20.11%) | 24060.16 (+39.10%) | 15829.92 (+3.51%) | 4254.76 (-4.95%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 0.14 (n/a) | 0.11 (n/a) | 0.12 (n/a) | 0.08 (n/a) | 0.02 (n/a) | 24876.33 (n/a) | 19286.40 (n/a) | 17297.26 (n/a) | 15293.44 (n/a) | 4476.13 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 2.01 (+38.96%) | 1.28 (+27.68%) | 1.07 (+19.95%) | 0.78 (-5.84%) | 0.51 (+98.53%) | 669.40 (+6.19%) | 461.72 (-15.49%) | 491.70 (-16.63%) | 261.10 (-28.05%) | 165.94 (+52.23%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 1.44 (n/a) | 1.00 (n/a) | 0.89 (n/a) | 0.83 (n/a) | 0.26 (n/a) | 630.40 (n/a) | 546.34 (n/a) | 589.80 (n/a) | 362.90 (n/a) | 109.01 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 2.31 (-5.59%) | 1.35 (+2.51%) | 1.49 (-3.59%) | 0.31 (+5.68%) | 0.76 (-22.83%) | 3374.70 (-5.38%) | 1251.96 (-28.07%) | 702.10 (+3.74%) | 454.70 (+5.94%) | 1211.58 (-26.26%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 2.44 (n/a) | 1.32 (n/a) | 1.55 (n/a) | 0.29 (n/a) | 0.99 (n/a) | 3566.50 (n/a) | 1740.52 (n/a) | 676.80 (n/a) | 429.20 (n/a) | 1643.15 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 — 2026-06-25 20:01:37 | 1.58 (-2.67%) | 1.21 (+5.65%) | 1.29 (+41.89%) | 0.66 (-20.05%) | 0.38 (-2.53%) | 792.90 (+25.08%) | 479.12 (-3.80%) | 406.80 (-29.52%) | 331.10 (+2.76%) | 189.18 (+26.62%) |
| 4bb8427 — 2026-06-23 22:46:49 | 1.63 (n/a) | 1.14 (n/a) | 0.91 (n/a) | 0.83 (n/a) | 0.38 (n/a) | 633.90 (n/a) | 498.02 (n/a) | 577.20 (n/a) | 322.20 (n/a) | 149.40 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b1939ea — 2026-06-29 17:44:31 | 2.22 (+73.02%) | 1.40 (+35.33%) | 1.43 (+34.15%) | 0.77 (+0.98%) | 0.56 (+192.05%) | 683.20 (-0.97%) | 426.88 (-18.16%) | 366.50 (-25.46%) | 235.90 (-42.20%) | 174.56 (+65.98%) |
| 9c70ba8 — 2026-06-29 16:29:16 | 1.28 (n/a) | 1.04 (n/a) | 1.07 (n/a) | 0.76 (n/a) | 0.19 (n/a) | 689.90 (n/a) | 521.62 (n/a) | 491.70 (n/a) | 408.10 (n/a) | 105.17 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 — 2026-06-25 20:01:37 | 1.75 (-15.44%) | 1.32 (-0.50%) | 1.36 (+22.47%) | 0.77 (-19.20%) | 0.36 (-21.75%) | 678.10 (+23.76%) | 430.02 (-0.09%) | 385.10 (-18.34%) | 300.10 (+18.29%) | 146.96 (+19.46%) |
| 4bb8427 — 2026-06-23 22:46:49 | 2.07 (n/a) | 1.32 (n/a) | 1.11 (n/a) | 0.96 (n/a) | 0.46 (n/a) | 547.90 (n/a) | 430.40 (n/a) | 471.60 (n/a) | 253.70 (n/a) | 123.02 (n/a) |

Phoenix - Examples

IRON

Tested on 2026_06_29_17_40_30 at commit b1939ea.

Trends:

IRON Trends
