SIMDe 0.7.6 Released

I’m pleased to announce the availability of the latest release of SIMD Everywhere (SIMDe), version 0.7.6, representing more than two years of work by over 30 developers since version 0.7.2. (I also released 0.7.4 two weeks ago, but it needed a few more fixes; thanks go to the early adopters who helped me out.)

SIMDe is a permissively-licensed (MIT) header-only library which provides fast, portable implementations of SIMD intrinsics for platforms which aren’t natively supported by the API in question.

For example, with SIMDe you can use SSE, SSE2, SSE3, SSE4.1 and 4.2, AVX, AVX2, and many AVX-512 intrinsics on ARM, POWER, WebAssembly, or almost any platform with a C compiler. That includes, of course, x86 CPUs which don’t support the ISA extension in question (e.g., calling AVX-512F functions on a CPU which doesn’t natively support them).

If the target natively supports the SIMD extension in question, there is no performance penalty for using SIMDe. Otherwise, accelerated implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on WebAssembly, etc., are used when available to provide good performance.

SIMDe has already been used to port several packages to additional architectures through either upstream support or distribution packages, particularly on Debian.

What’s new in 0.7.4 / 0.7.6

40 new ARM NEON families implemented
Initial support for ARM SVE API implementation (14 families)
Complete support for x86 F16C API
Initial support for MIPS MSA API
Nearly complete support for WASM SIMD128 C/C++ API
Initial support for the E2K (Elbrus) architecture
Initial support for LoongArch LASX/LSX and optimized implementations of some SSE intrinsics
Many MSVC fixes; MSVC builds are now tested in CI with /arch:AVX, /arch:AVX2, and /arch:AVX512
Minimum meson version is now 0.54

As always, we have an extensive test suite to verify our implementations.

For a complete list of changes, check out the 0.7.4 and 0.7.6 release notes.

Below are some additional highlights:

X86

There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far. Of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).

Completely supported function families

Newly added function families

AVX512CD: 21 of 42 (50.00%)
AVX512VPOPCNTDQ: 18 of 18 💯%!
AVX512_4VNNIW: 6 of 6 💯%!
AVX512_BF16: 9 of 38 (23.68%)
AVX512_BITALG: 24 of 24 💯%!
AVX512_FP16: 2 of 1105 (0.18%)
AVX512_VBMI2: 3 of 150 (2.00%)
AVX512_VNNI: 36 of 36 💯%!
AVX_VNNI: 8 of 16 (50.00%)

Additions to existing families

AVX512F: 579 additional, 856 total of 2660 (31.80%)
AVX512BW: 178 additional, 335 total of 828 (40.46%)
AVX512DQ: 77 additional, 111 total of 399 (27.82%)
AVX512_VBMI: 9 additional, 30 total of 30 💯%!
KNCNI: 113 additional, 114 total of 595 (19.16%)
VPCLMULQDQ: 1 additional, 2 total of 2 💯%!

NEON

SIMDe currently implements 56.46% of the ARM NEON functions (3766 out of 6670). If you don’t count 16-bit floats and poly types, it’s 75.95% (3766 / 4969).

Newly added families

addhn
bcax
cage
cmla
cmla_rot90
cmla_rot180
cmla_rot270
cvtn
fma
fma_lane
fma_n
ld2
ld4_lane
mla_lane
mlal_high_n
mlal_lane
mls_n
mlsl_high_n
mlsl_lane
mull_lane
qdmulh_lane
qdmulh_n
qrdmulh_lane
qrshrn_n
qrshrun_n
qshlu_n
qshrn_n
qshrun_n
recpe
recps
rshrn_n
rsqrte
rsqrts
shll_n
shrn_n
sqadd
sri_n
st2
st2_lane
st3_lane
st4_lane
subhn
subl_high
xar

MSA

Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.

What is coming next

Work on SIMDe is proceeding rapidly, but there are a lot of functions to implement… x86 alone has about 8,000 SIMD functions, and we’ve implemented about 3,000 of them. We will keep adding more functions and improving the implementations we already have.

If you’re interested in using SIMDe but need some specific functions to be implemented first, please file an issue and we may be able to prioritize those functions.

Getting Involved

If you’re interested in helping out, please get in touch. We have a chat room on Matrix/Element, which is fairly active, if you have questions, or of course you can just dive right in on the issue tracker.