I’m pleased to announce the availability of the latest release of SIMD Everywhere (SIMDe), version 0.7.6, representing more than two years of work by over 30 developers since version 0.7.2. (I also released 0.7.4 two weeks ago, but it needed a few more fixes; thanks go to the early adopters who helped me out.)

SIMDe is a permissively-licensed (MIT) header-only library which provides fast, portable implementations of SIMD intrinsics for platforms which aren’t natively supported by the API in question.

For example, with SIMDe you can use SSE, SSE2, SSE3, SSE4.1 and 4.2, AVX, AVX2, and many AVX-512 intrinsics on ARM, POWER, WebAssembly, or almost any platform with a C compiler. That includes, of course, x86 CPUs which don’t support the ISA extension in question (e.g., calling AVX-512F functions on a CPU which doesn’t natively support them).

If the target natively supports the SIMD extension in question there is no performance penalty for using SIMDe. Otherwise, accelerated implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on WebAssembly, etc., are used when available to provide good performance.

SIMDe has already been used to port several packages to additional architectures through either upstream support or distribution packages, particularly on Debian.

What’s new in 0.7.4 / 0.7.6

  • 40 new ARM NEON families implemented
  • Initial support for ARM SVE API implementation (14 families)
  • Complete support for x86 F16C API
  • Initial support for MIPS MSA API
  • Nearly complete support for WASM SIMD128 C/C++ API
  • Initial support for the E2K (Elbrus) architecture
  • Initial support for LoongArch LASX/LSX and optimized implementations of some SSE intrinsics
  • MSVC has many fixes, now compiled in CI using /ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512
  • Minimum meson version is now 0.54

As always, we have an extensive test suite to verify our implementations.

For a complete list of changes, check out the 0.7.4 and 0.7.6 release notes.

Below are some additional highlights:

X86

There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%)

Completely supported functions families

Newly added function families

Additions to existing families

  • AVX512F: 579 additional, 856 total of 2660 (31.80%)
  • AVX512BW: 178 additional, 335 total of 828 (40.46%)
  • AVX512DQ: 77 additional, 111 total of 399 (27.82%)
  • AVX512_VBMI: 9 additional, 30 total of 30 💯%!
  • KNCNI: 113 additional, 114 total of 595 (19.16%)
  • VPCLMULQDQ: 1 additional, 2 total of 2 💯%!

Neon

SIMDe currently implements 56.46% of the ARM NEON functions (3766 out of 6670). If you don’t count 16-bit floats and poly types, it’s 75.95% (3766 / 4969).

Newly added families

  • addhn
  • bcax
  • cage
  • cmla
  • cmla_rot90
  • cmla_rot180
  • cmla_rot270
  • cvtn
  • fma
  • fma_lane
  • fma_n
  • ld2
  • ld4_lane
  • mla_lane
  • mlal_high_n
  • mlal_lane
  • mls_n
  • mlsl_high_n
  • mlsl_lane
  • mull_lane
  • qdmulh_lane
  • qdmulh_n
  • qrdmulh_lane
  • qrshrn_n
  • qrshrun_n
  • qshlu_n
  • qshrn_n
  • qshrun_n
  • recpe
  • recps
  • rshrn_n
  • rsqrte
  • rsqrts
  • shll_n
  • shrn_n
  • sqadd
  • sri_n
  • st2
  • st2_lane
  • st3_lane
  • st4_lane
  • subhn
  • subl_high
  • xar

MSA

Overall, SIMDe implementents 40 of 533 (7.50%) functions from MSA.

What is coming next

Work on SIMDe is proceeding rapidly, but there are a lot of functions to implement… x86 alone has about 8,000 SIMD functions, and we’ve implemented about 3,000 of them. We will keep adding more functions and improving the implementations we already have.

If you’re interested in using SIMDe but need some specific functions to be implemented first, please file an issue and we may be able to prioritize those functions.

Getting Involved

If you’re interested in helping out please get in touch. We have a chat room on Matrix/Element which is fairly active if you have questions, or of course you can just dive right in on the issue tracker.