
opt innerproduct and convolutiondepthwise x86 int8 sse4.1 #6687

Open

Edwardssss wants to merge 2 commits into Tencent:master from Edwardssss:opt-innerproduct-x86-int8-sse41

Conversation

@Edwardssss

Description

This resolves TODO items in src/layer/x86/convolutiondepthwise_x86.cpp and src/layer/x86/innerproduct_x86.cpp.

Use the SSE4.1 intrinsic `_mm_cvtepi8_epi16` (the `pmovsxbw` instruction) for int8 sign extension on x86, replacing the legacy SSE2 pseudo-sign-extension sequence.

This avoids the unpack-and-shift overhead of the SSE2 sequence and noticeably improves int8 inference performance on x86, particularly for models with heavy fully connected or depthwise separable convolution layers. The SSE2 fallback loops are kept behind `#ifndef __SSE4_1__` conditional guards (paired with the `-msse4.1` build flag) to preserve backward compatibility.

Benchmark

  • Environment: Debian 13, 4 threads (Command: ./benchmark/benchncnn 8 4 0)
  • CPU: 12 × AMD Ryzen 5 PRO 4650U with Radeon Graphics
  • Base: master (Clean build) vs PR Branch: opt-innerproduct-x86-int8-sse41
| Model | Master Avg (ms) | PR Avg (ms) | Speedup (%) |
|---|---|---|---|
| squeezenet_int8 | 6.83 | 6.56 | +3.95% |
| mobilenet_int8 | 7.01 | 5.44 | +22.40% |
| googlenet_int8 | 13.18 | 13.03 | +1.14% |
| resnet18_int8 | 9.68 | 9.74 | -0.62% |
| vgg16_int8 | 78.44 | 64.34 | +17.98% |
| resnet50_int8 | 32.63 | 27.72 | +15.05% |
| squeezenet_ssd_int8 | 17.06 | 15.57 | +8.73% |
| mobilenet_ssd_int8 | 11.35 | 11.09 | +2.29% |

I noticed that benchmark/README.md does not seem to have results for a device similar to mine. If useful, I can submit a follow-up PR to add them. :)

@github-actions github-actions Bot added the x86 label Apr 19, 2026
