
[WIP] massive mips and loongarch optimization#6662

Open
nihui wants to merge 76 commits into Tencent:master from nihui:mips-opt3

Conversation

@nihui
Member

@nihui nihui commented Apr 9, 2026

No description provided.

@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 94.91028% with 156 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.20%. Comparing base (71b1a61) to head (e910cbb).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/layer/loongarch/convolution_loongarch.cpp 72.79% 117 Missing ⚠️
src/layer/loongarch/convolution_im2col_gemm_int8.h 96.44% 26 Missing ⚠️
src/layer/loongarch/convolution_packed_bf16s.h 98.88% 4 Missing ⚠️
src/layer/loongarch/convolution1d_loongarch.cpp 92.50% 3 Missing ⚠️
src/layer/loongarch/convolution_packed_int8.h 98.87% 3 Missing ⚠️
src/layer/loongarch/binaryop_loongarch.cpp 99.32% 2 Missing ⚠️
src/layer/loongarch/convolution_packed.h 99.72% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6662      +/-   ##
==========================================
- Coverage   93.96%   93.20%   -0.76%     
==========================================
  Files         932      932              
  Lines      299059   332717   +33658     
==========================================
+ Hits       280998   310099   +29101     
- Misses      18061    22618    +4557     

☔ View full report in Codecov by Sentry.


nihui and others added 11 commits April 10, 2026 07:10
Add jj+=12 loop unrolling to pack_B_tile, transpose_pack_B_tile,
transpose_unpack_output_tile, and gemm_transB_packed_tile for all
ii sections (8, 4, 2, 1). MIPS MSA has 32 SIMD registers so
jj+=12 fits well (24 registers for ii+=8, 12 for ii+=4).

Update get_optimal_tile_mnk to align TILE_N to multiples of 12
for better utilization of the new kernel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
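The TILE_N alignment described above can be sketched as a simple round-up to the next multiple of 12, so every jj+=12 inner-loop pass covers a whole tile. This is an illustrative helper, not the actual get_optimal_tile_mnk() implementation; the name align_tile_n is hypothetical.

```cpp
// Illustrative only: round a tile width up to a multiple of 12 so the
// jj+=12 unrolled loop processes whole tiles without a remainder pass.
// Not the real get_optimal_tile_mnk(); the function name is hypothetical.
static int align_tile_n(int tile_n)
{
    return (tile_n + 11) / 12 * 12;
}
```

With 32 MSA registers, a 12-wide column fits the register budget quoted in the commit message (24 registers for the ii+=8 section, 12 for ii+=4), which is why 12 rather than 8 or 16 is the alignment target.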
…ngArch

Integrate bf16 storage support into multiple operators:

MIPS: batchnorm, clip, dropout, selu, erf
LoongArch: batchnorm, clip, dropout

Each operator now declares forward_inplace_bf16s in its header,
sets support_bf16_storage=true in the constructor, dispatches bf16
inputs from forward_inplace, and implements the bf16s path using
the existing bf16s helper headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
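The bf16 storage path above relies on bfloat16 being the top 16 bits of an IEEE-754 float, so widening and narrowing are single shifts. A minimal sketch of that conversion, assuming simple truncation (no rounding), which is how the format is commonly handled:

```cpp
#include <cstdint>
#include <cstring>

// Sketch, assuming truncating conversion: bfloat16 keeps the sign,
// exponent, and top 7 mantissa bits of a float32.
static uint16_t float32_to_bfloat16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return (uint16_t)(u >> 16); // drop the low 16 mantissa bits
}

static float bfloat16_to_float32(uint16_t b)
{
    uint32_t u = (uint32_t)b << 16; // zero-fill the dropped mantissa bits
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

A forward_inplace_bf16s path widens each element with the second helper, applies the fp32 math, then narrows with the first, halving activation storage at the cost of mantissa precision.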
- Add support_bf16_storage = true in constructors for both architectures
- Add crop_pack4_bf16s_msa() for MIPS MSA using int64_t copies (8 bytes)
- Add crop_pack4_bf16s_lsx() for LoongArch LSX using int64_t copies
- Add crop_pack8_lasx() for LoongArch LASX float pack8 (256-bit)
- Add crop_pack8_bf16s_lsx() for LoongArch bf16 pack8 (128-bit LSX, since 8 bf16 lanes fit one LSX register)
- Dispatch to bf16 variants when elemsize matches bf16 packing
- Remove debug fprintf statements from MIPS deconvolution_packed_bf16s.h

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
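The "int64_t copies (8 bytes)" trick in the crop functions above works because one pack4 bf16 element is 4 × 2 bytes = 8 bytes, exactly one 64-bit move. A hedged scalar sketch of that copy core (the function name and signature are illustrative, not the actual crop_pack4_bf16s_msa()):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative core of a pack4 bf16 row copy: each packed element
// (4 bf16 lanes) is 8 bytes, so it moves as a single 64-bit chunk.
// memcpy is used instead of a raw int64_t* cast to stay alignment-safe.
static void crop_row_pack4_bf16(const uint16_t* src, uint16_t* dst, int w)
{
    for (int i = 0; i < w; i++)
        memcpy(dst + i * 4, src + i * 4, sizeof(int64_t)); // one 8-byte move
}
```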
Add interp_bilinear_pack8.h and interp_bicubic_pack8.h implementing
256-bit SIMD (8 floats) resize operations using LASX intrinsics.

Update interp_loongarch.cpp to:
- Include lasxintrin.h and the new pack8 headers under __loongarch_asx
- Add elempack == 8 paths for dims 1, 2, and 3 (nearest, bilinear, bicubic)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
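For the bilinear pack8 path above, each output pixel blends four source pixels with row/column weights, applied identically across the 8 channel lanes that one 256-bit LASX register holds. A scalar sketch of that per-pixel computation (illustrative, not the actual kernel from interp_bilinear_pack8.h):

```cpp
// What the pack8 bilinear path computes per output pixel, written as a
// scalar loop over the 8 channel lanes that LASX processes in one
// 256-bit register. v00..v11 are the four neighbouring packed pixels;
// a0/a1 are horizontal weights, b0/b1 vertical weights (each pair sums to 1).
static void bilinear_blend_pack8(const float* v00, const float* v01,
                                 const float* v10, const float* v11,
                                 float a0, float a1, float b0, float b1,
                                 float* out)
{
    for (int k = 0; k < 8; k++)
        out[k] = b0 * (a0 * v00[k] + a1 * v01[k])
               + b1 * (a0 * v10[k] + a1 * v11[k]);
}
```

The vector version replaces the k loop with fused multiply-add intrinsics on whole registers, which is why elempack == 8 maps so directly onto LASX.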
… approach

- Replace hand-written kernel packing and convolution loops with
  convolution1d_transform_kernel_packed() and convolution1d_packed()
  from convolution1d_packed.h
- Rename weight_data_packed to weight_data_tm to match x86 pattern
- Add LASX (256-bit) support with pack8 out_elempack
- Add NCNN_BF16 support using cast-based approach (bf16->fp32->conv->bf16)
- Add bf16 weight/bias cast in dynamic weight forward path
- Include cpu.h, lasxintrin.h headers for new functionality

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
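The cast-based bf16 approach above (bf16 → fp32 → conv → bf16) can be sketched end to end on a 1-D signal. Everything here is a simplified stand-in: the helper names are hypothetical, the convolution is plain valid-padding stride-1 fp32 code rather than convolution1d_packed(), and bf16 conversion is assumed to be truncating.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers, assuming truncating bf16 conversion.
static float bf16_to_f32(uint16_t b)
{
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
static uint16_t f32_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return (uint16_t)(u >> 16);
}

// Sketch of the bf16->fp32->conv->bf16 pipeline on a 1-D signal.
static std::vector<uint16_t> conv1d_bf16s(const std::vector<uint16_t>& in,
                                          const std::vector<float>& kernel)
{
    // 1) widen bf16 input to fp32
    std::vector<float> x(in.size());
    for (size_t i = 0; i < in.size(); i++)
        x[i] = bf16_to_f32(in[i]);

    // 2) run the existing fp32 convolution (valid padding, stride 1)
    size_t outw = x.size() - kernel.size() + 1;
    std::vector<float> y(outw, 0.f);
    for (size_t i = 0; i < outw; i++)
        for (size_t k = 0; k < kernel.size(); k++)
            y[i] += x[i + k] * kernel[k];

    // 3) narrow the result back to bf16 storage
    std::vector<uint16_t> out(outw);
    for (size_t i = 0; i < outw; i++)
        out[i] = f32_to_bf16(y[i]);
    return out;
}
```

The design trade-off is that the arithmetic stays in the well-tested fp32 kernels and only the storage format changes, at the cost of two extra cast passes over the data.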


3 participants