perf: numba based aggregations for sparse data #4062

Merged
ilan-gold merged 24 commits into main from ig/numba_agg_main
Apr 16, 2026

style

83c0668
scverse-benchmark / benchmark succeeded Apr 16, 2026 in 1h 26m 27s

Benchmark

Benchmark run successful

Details

All benchmarks:

Change Before [87dc1ec] After [83c0668] Ratio Benchmark (Parameter)
5.48G 5.14G 0.94 preprocessing_counts.Agg.peakmem_agg('count_nonzero')
- 4.91G 4.05G 0.82 preprocessing_counts.Agg.peakmem_agg('mean')
4.41G 4.41G 1.00 preprocessing_counts.Agg.peakmem_agg('median')
- 4.91G 4.05G 0.82 preprocessing_counts.Agg.peakmem_agg('sum')
- 5.83G 4.06G 0.70 preprocessing_counts.Agg.peakmem_agg('var')
- 887±5ms 639±0.8ms 0.72 preprocessing_counts.Agg.time_agg('count_nonzero')
- 548±0.4ms 90.5±0.8ms 0.17 preprocessing_counts.Agg.time_agg('mean')
3.30±0.01s 3.31±0.03s 1.00 preprocessing_counts.Agg.time_agg('median')
- 550±0.7ms 88.4±1ms 0.16 preprocessing_counts.Agg.time_agg('sum')
- 1.37±0.01s 134±5ms 0.10 preprocessing_counts.Agg.time_agg('var')
311M 311M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('bmmc', 'counts')
312M 312M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('bmmc', 'counts-off-axis')
4.11G 4.11G 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('lung93k', 'counts')
4.11G 4.11G 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('lung93k', 'counts-off-axis')
378M 378M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('pbmc3k', 'counts')
378M 378M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('pbmc3k', 'counts-off-axis')
288M 288M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced', 'counts')
289M 289M 1.00 preprocessing_counts.FastSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced', 'counts-off-axis')
314M 314M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('bmmc', 'counts')
314M 314M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('bmmc', 'counts-off-axis')
4.45G 4.45G 1.00 preprocessing_counts.FastSuite.peakmem_log1p('lung93k', 'counts')
4.45G 4.45G 1.00 preprocessing_counts.FastSuite.peakmem_log1p('lung93k', 'counts-off-axis')
383M 384M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('pbmc3k', 'counts')
384M 385M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('pbmc3k', 'counts-off-axis')
288M 288M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('pbmc68k_reduced', 'counts')
288M 288M 1.00 preprocessing_counts.FastSuite.peakmem_log1p('pbmc68k_reduced', 'counts-off-axis')
407M 402M 0.99 preprocessing_counts.FastSuite.peakmem_normalize_total('bmmc', 'counts')
402M 401M 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('bmmc', 'counts-off-axis')
4.97G 4.96G 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('lung93k', 'counts')
4.97G 4.96G 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('lung93k', 'counts-off-axis')
474M 473M 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('pbmc3k', 'counts')
473M 472M 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('pbmc3k', 'counts-off-axis')
288M 288M 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('pbmc68k_reduced', 'counts')
288M 288M 1.00 preprocessing_counts.FastSuite.peakmem_normalize_total('pbmc68k_reduced', 'counts-off-axis')
12.7±0.3ms 12.7±0.3ms 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('bmmc', 'counts')
12.6±0.3ms 12.6±0.2ms 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('bmmc', 'counts-off-axis')
2.05±0s 2.06±0.01s 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('lung93k', 'counts')
1.63±0s 1.63±0s 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('lung93k', 'counts-off-axis')
38.2±0.1ms 38.1±0.5ms 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('pbmc3k', 'counts')
28.9±0.9ms 27.6±0.7ms 0.96 preprocessing_counts.FastSuite.time_calculate_qc_metrics('pbmc3k', 'counts-off-axis')
4.63±0.02ms 4.67±0.06ms 1.01 preprocessing_counts.FastSuite.time_calculate_qc_metrics('pbmc68k_reduced', 'counts')
4.72±0.03ms 4.71±0.06ms 1.00 preprocessing_counts.FastSuite.time_calculate_qc_metrics('pbmc68k_reduced', 'counts-off-axis')
1.53±0.02ms 1.55±0.02ms 1.01 preprocessing_counts.FastSuite.time_log1p('bmmc', 'counts')
1.60±0.02ms 1.59±0.03ms 1.00 preprocessing_counts.FastSuite.time_log1p('bmmc', 'counts-off-axis')
647±10ms 632±3ms 0.98 preprocessing_counts.FastSuite.time_log1p('lung93k', 'counts')
633±6ms 635±1ms 1.00 preprocessing_counts.FastSuite.time_log1p('lung93k', 'counts-off-axis')
7.17±0.03ms 7.36±0.5ms 1.03 preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts')
7.25±0.4ms 7.30±0.1ms 1.01 preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts-off-axis')
386±3μs 395±20μs 1.02 preprocessing_counts.FastSuite.time_log1p('pbmc68k_reduced', 'counts')
388±5μs 385±8μs 0.99 preprocessing_counts.FastSuite.time_log1p('pbmc68k_reduced', 'counts-off-axis')
2.82±0.5ms 2.63±0.01ms 0.93 preprocessing_counts.FastSuite.time_normalize_total('bmmc', 'counts')
6.74±0.7ms 7.05±0.4ms 1.05 preprocessing_counts.FastSuite.time_normalize_total('bmmc', 'counts-off-axis')
574±7ms 542±4ms 0.94 preprocessing_counts.FastSuite.time_normalize_total('lung93k', 'counts')
2.73±0.01s 2.75±0.04s 1.01 preprocessing_counts.FastSuite.time_normalize_total('lung93k', 'counts-off-axis')
8.67±0.5ms 8.59±0.2ms 0.99 preprocessing_counts.FastSuite.time_normalize_total('pbmc3k', 'counts')
33.0±1ms 34.6±1ms 1.05 preprocessing_counts.FastSuite.time_normalize_total('pbmc3k', 'counts-off-axis')
549±2μs 546±2μs 0.99 preprocessing_counts.FastSuite.time_normalize_total('pbmc68k_reduced', 'counts')
546±1μs 547±2μs 1.00 preprocessing_counts.FastSuite.time_normalize_total('pbmc68k_reduced', 'counts-off-axis')
399M 399M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_per_cell('pbmc3k', 'random_state')
399M 399M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_per_cell('pbmc3k', 'rng')
336M 336M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_per_cell('pbmc68k_reduced', 'random_state')
335M 336M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_per_cell('pbmc68k_reduced', 'rng')
468M 468M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_total('pbmc3k', 'random_state')
432M 432M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_total('pbmc3k', 'rng')
341M 341M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_total('pbmc68k_reduced', 'random_state')
338M 339M 1.00 preprocessing_counts.PreprocessingCountsRngSuite.peakmem_downsample_total('pbmc68k_reduced', 'rng')
206±0.5ms 206±0.7ms 1.00 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc3k', 'random_state')
73.4±0.6ms 74.2±1ms 1.01 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc3k', 'rng')
25.1±0.07ms 25.3±0.2ms 1.01 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc68k_reduced', 'random_state')
16.6±0.07ms 16.4±0.1ms 0.99 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc68k_reduced', 'rng')
187±2ms 192±9ms 1.03 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc3k', 'random_state')
62.9±0.1ms 63.1±3ms 1.00 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc3k', 'rng')
12.4±0.5ms 12.3±0.07ms 0.99 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc68k_reduced', 'random_state')
7.07±0.3ms 7.10±0.07ms 1.00 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc68k_reduced', 'rng')
431M 431M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_cells('pbmc3k', 'counts')
431M 431M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_cells('pbmc3k', 'counts-off-axis')
299M 300M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_cells('pbmc68k_reduced', 'counts')
300M 300M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_cells('pbmc68k_reduced', 'counts-off-axis')
431M 431M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_genes('pbmc3k', 'counts')
432M 431M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_genes('pbmc3k', 'counts-off-axis')
299M 299M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_genes('pbmc68k_reduced', 'counts')
299M 300M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_filter_genes('pbmc68k_reduced', 'counts-off-axis')
1.11G 1.11G 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_scrublet('pbmc3k', 'counts')
1.11G 1.11G 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_scrublet('pbmc3k', 'counts-off-axis')
530M 528M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_scrublet('pbmc68k_reduced', 'counts')
527M 528M 1.00 preprocessing_counts.PreprocessingCountsSuite.peakmem_scrublet('pbmc68k_reduced', 'counts-off-axis')
56.2±4ms 56.5±0.6ms 1.01 preprocessing_counts.PreprocessingCountsSuite.time_filter_cells('pbmc3k', 'counts')
58.9±0.8ms 59.0±0.6ms 1.00 preprocessing_counts.PreprocessingCountsSuite.time_filter_cells('pbmc3k', 'counts-off-axis')
10.2±1ms 10.4±0.9ms 1.02 preprocessing_counts.PreprocessingCountsSuite.time_filter_cells('pbmc68k_reduced', 'counts')
9.43±0.7ms 10.2±0.7ms 1.08 preprocessing_counts.PreprocessingCountsSuite.time_filter_cells('pbmc68k_reduced', 'counts-off-axis')
52.3±0.7ms 52.1±0.6ms 1.00 preprocessing_counts.PreprocessingCountsSuite.time_filter_genes('pbmc3k', 'counts')
51.2±0.3ms 51.1±0.2ms 1.00 preprocessing_counts.PreprocessingCountsSuite.time_filter_genes('pbmc3k', 'counts-off-axis')
10.7±0.8ms 10.6±0.9ms 0.99 preprocessing_counts.PreprocessingCountsSuite.time_filter_genes('pbmc68k_reduced', 'counts')
10.4±0.8ms 10.6±0.9ms 1.02 preprocessing_counts.PreprocessingCountsSuite.time_filter_genes('pbmc68k_reduced', 'counts-off-axis')
2.78±0.2s 5.70±2s ~2.05 preprocessing_counts.PreprocessingCountsSuite.time_scrublet('pbmc3k', 'counts')
5.53±2s 2.44±0.02s ~0.44 preprocessing_counts.PreprocessingCountsSuite.time_scrublet('pbmc3k', 'counts-off-axis')
560±4ms 555±8ms 0.99 preprocessing_counts.PreprocessingCountsSuite.time_scrublet('pbmc68k_reduced', 'counts')
556±10ms 561±10ms 1.01 preprocessing_counts.PreprocessingCountsSuite.time_scrublet('pbmc68k_reduced', 'counts-off-axis')
440M 440M 1.00 preprocessing_log.PreprocessingSuite.peakmem_highly_variable_genes('pbmc3k', 'off-axis')
492M 491M 1.00 preprocessing_log.PreprocessingSuite.peakmem_highly_variable_genes('pbmc3k', None)
295M 295M 1.00 preprocessing_log.PreprocessingSuite.peakmem_highly_variable_genes('pbmc68k_reduced', 'off-axis')
297M 298M 1.00 preprocessing_log.PreprocessingSuite.peakmem_highly_variable_genes('pbmc68k_reduced', None)
573M 571M 1.00 preprocessing_log.PreprocessingSuite.peakmem_pca('pbmc3k', 'off-axis')
588M 590M 1.00 preprocessing_log.PreprocessingSuite.peakmem_pca('pbmc3k', None)
492M 492M 1.00 preprocessing_log.PreprocessingSuite.peakmem_pca('pbmc68k_reduced', 'off-axis')
494M 499M 1.01 preprocessing_log.PreprocessingSuite.peakmem_pca('pbmc68k_reduced', None)
n/a n/a n/a preprocessing_log.PreprocessingSuite.peakmem_regress_out('pbmc3k', 'off-axis')
n/a n/a n/a preprocessing_log.PreprocessingSuite.peakmem_regress_out('pbmc3k', None)
351M 349M 0.99 preprocessing_log.PreprocessingSuite.peakmem_regress_out('pbmc68k_reduced', 'off-axis')
353M 353M 1.00 preprocessing_log.PreprocessingSuite.peakmem_regress_out('pbmc68k_reduced', None)
1.3G 1.3G 1.00 preprocessing_log.PreprocessingSuite.peakmem_scale('pbmc3k', 'off-axis')
1.5G 1.5G 1.00 preprocessing_log.PreprocessingSuite.peakmem_scale('pbmc3k', None)
341M 342M 1.00 preprocessing_log.PreprocessingSuite.peakmem_scale('pbmc68k_reduced', 'off-axis')
339M 343M 1.01 preprocessing_log.PreprocessingSuite.peakmem_scale('pbmc68k_reduced', None)
34.7±0.6ms 34.5±0.7ms 1.00 preprocessing_log.PreprocessingSuite.time_highly_variable_genes('pbmc3k', 'off-axis')
40.1±0.8ms 40.1±4ms 1.00 preprocessing_log.PreprocessingSuite.time_highly_variable_genes('pbmc3k', None)
16.8±0.2ms 16.7±0.06ms 0.99 preprocessing_log.PreprocessingSuite.time_highly_variable_genes('pbmc68k_reduced', 'off-axis')
16.7±0.3ms 16.7±0.3ms 1.00 preprocessing_log.PreprocessingSuite.time_highly_variable_genes('pbmc68k_reduced', None)
1.94±0.02s 1.95±0.01s 1.01 preprocessing_log.PreprocessingSuite.time_pca('pbmc3k', 'off-axis')
2.15±0.01s 2.15±0.01s 1.00 preprocessing_log.PreprocessingSuite.time_pca('pbmc3k', None)
164±30ms 158±20ms 0.96 preprocessing_log.PreprocessingSuite.time_pca('pbmc68k_reduced', 'off-axis')
88.6±60ms 169±10ms ~1.91 preprocessing_log.PreprocessingSuite.time_pca('pbmc68k_reduced', None)
n/a n/a n/a preprocessing_log.PreprocessingSuite.time_regress_out('pbmc3k', 'off-axis')
n/a n/a n/a preprocessing_log.PreprocessingSuite.time_regress_out('pbmc3k', None)
16.8±0.3ms 17.1±0.6ms 1.02 preprocessing_log.PreprocessingSuite.time_regress_out('pbmc68k_reduced', 'off-axis')
17.0±0.7ms 17.3±0.4ms 1.02 preprocessing_log.PreprocessingSuite.time_regress_out('pbmc68k_reduced', None)
507±2ms 505±0.4ms 1.00 preprocessing_log.PreprocessingSuite.time_scale('pbmc3k', 'off-axis')
542±2ms 550±2ms 1.01 preprocessing_log.PreprocessingSuite.time_scale('pbmc3k', None)
4.77±0.2ms 4.76±0.06ms 1.00 preprocessing_log.PreprocessingSuite.time_scale('pbmc68k_reduced', 'off-axis')
4.81±0.2ms 4.63±0.1ms 0.96 preprocessing_log.PreprocessingSuite.time_scale('pbmc68k_reduced', None)
288M 288M 1.00 tools.ToolsSuite.peakmem_diffmap
294M 294M 1.00 tools.ToolsSuite.peakmem_leiden
375M 373M 0.99 tools.ToolsSuite.peakmem_rank_genes_groups
459M 459M 1.00 tools.ToolsSuite.peakmem_umap
16.5±0.6ms 16.9±0.4ms 1.02 tools.ToolsSuite.time_diffmap
19.8±0.06ms 19.8±0.06ms 1.00 tools.ToolsSuite.time_leiden
59.7±6ms 56.6±6ms 0.95 tools.ToolsSuite.time_rank_genes_groups
328±1ms 340±0.05ms 1.04 tools.ToolsSuite.time_umap
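
Reading the table: the Ratio column is consistent with After divided by Before, so values below 1.00 are improvements, and a leading "-" in the Change column flags them. The "~" prefix appears to mark ratios that the benchmark tool treats as noisy rather than significant (note the large ± on those rows). The sketch below recomputes a few ratios from the Agg rows. The helper names `parse_value` and `ratio` are hypothetical and not part of the benchmark tooling, and the unit multipliers are an assumption (they cancel in the ratio whenever both columns share a unit).

```python
import re

# Assumed multipliers: seconds for timings, decimal bytes for peak memory.
_UNITS = {
    "s": 1.0, "ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ns": 1e-9,
    "k": 1e3, "M": 1e6, "G": 1e9,
}

# A number, an optional "±spread" part, then a unit suffix.
_VALUE_RE = re.compile(r"([0-9.]+)(?:±[0-9.]+)?(s|ms|[\u03bc\u00b5]s|ns|k|M|G)")


def parse_value(text):
    """Parse a cell such as '548±0.4ms', '2.05±0s' or '4.91G' into base units."""
    match = _VALUE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised benchmark value: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def ratio(before, after):
    """Return After / Before, ignoring the ± spread."""
    return parse_value(after) / parse_value(before)


# Before/After cells copied from the preprocessing_counts.Agg rows above.
rows = {
    "time_agg('mean')": ("548±0.4ms", "90.5±0.8ms"),
    "time_agg('sum')": ("550±0.7ms", "88.4±1ms"),
    "time_agg('var')": ("1.37±0.01s", "134±5ms"),
    "peakmem_agg('var')": ("5.83G", "4.06G"),
}

for name, (before, after) in rows.items():
    r = ratio(before, after)
    print(f"{name}: ratio {r:.2f} (Before/After = {1 / r:.1f}x)")
```

The recomputed ratios round to the values in the table (0.17, 0.16, 0.10 and 0.70), which supports the After/Before reading.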