<!-- SIMD Everywhere: The Blog. Implementations of SIMD instruction sets for systems which don't natively support them. -->
<h1><a href="https://simde-everywhere.github.io/blog/2023/05/19/0.7.6-release">SIMDe 0.7.6 Released</a></h1>
<p><em>2023-05-19, by Michael R. Crusoe</em></p>
<p>I’m pleased to announce the availability of the latest release of <a href="https://github.com/simd-everywhere/simde">SIMD
Everywhere</a> (SIMDe),
<a href="https://github.com/simd-everywhere/simde/releases">version 0.7.6</a>,
representing more than two years of work by over 30 developers since
version 0.7.2. (I also released 0.7.4 two weeks ago, but it needed a few more
fixes; thanks go to the early adopters who helped me out.)</p>
<p>SIMDe is a permissively-licensed (MIT) header-only library which
provides fast, portable implementations of
<a href="https://en.wikipedia.org/wiki/SIMD">SIMD</a> intrinsics for platforms
which aren’t natively supported by the API in question.</p>
<p>For example, with SIMDe you can use
<a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a>, SSE2, SSE3,
SSE4.1 and 4.2, AVX, AVX2, and many AVX-512 intrinsics on
<a href="https://en.wikipedia.org/wiki/ARM_architecture">ARM</a>,
<a href="https://en.wikipedia.org/wiki/IBM_POWER_instruction_set_architecture">POWER</a>,
<a href="https://webassembly.org/">WebAssembly</a>, or almost any platform with a
C compiler. That includes, of course, x86 CPUs which don’t support
the ISA extension in question (<em>e.g.</em>, calling AVX-512F functions on a
CPU which doesn’t natively support them).</p>
<p>If the target natively supports the SIMD extension in question there
is no performance penalty for using SIMDe. Otherwise, accelerated
implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on
WebAssembly, etc., are used when available to provide good
performance.</p>
<p>SIMDe has already been used to port several packages to additional
architectures through either upstream support or distribution
packages, <a href="https://wiki.debian.org/SIMDEverywhere">particularly on
Debian</a>.</p>
<h2 id="whats-new-in-074--076">What’s new in 0.7.4 / 0.7.6</h2>
<ul>
<li>40 new ARM NEON families implemented</li>
<li>Initial support for ARM <a href="https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions">SVE API</a> implementation (<a href="https://github.com/simd-everywhere/simde/issues/609">14 families</a>)</li>
<li>Complete support for x86 <a href="https://en.wikipedia.org/wiki/F16C">F16C</a> API</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/msa.md">Initial support</a> for MIPS MSA API</li>
<li>Nearly complete support for <a href="https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md">WASM SIMD128</a> <a href="https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/wasm_simd128.h">C/C++ API</a></li>
<li>Initial support for the E2K (Elbrus) architecture</li>
<li>Initial support for LoongArch LASX/LSX and optimized implementations of some SSE intrinsics</li>
<li>Many MSVC fixes; CI now builds with <code class="language-plaintext highlighter-rouge">/ARCH:AVX</code>, <code class="language-plaintext highlighter-rouge">/ARCH:AVX2</code>, and <code class="language-plaintext highlighter-rouge">/ARCH:AVX512</code></li>
<li>Minimum Meson version is now 0.54</li>
</ul>
<p>As always, we have an extensive test suite to verify our
implementations.</p>
<p>For a complete list of changes, check out the <a href="https://github.com/simd-everywhere/simde/releases/tag/v0.7.4">0.7.4</a>
and <a href="https://github.com/simd-everywhere/simde/releases/tag/v0.7.6">0.7.6</a> release notes.</p>
<p>Below are some additional highlights:</p>
<h3 id="x86"><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md">X86</a></h3>
<p>There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far.
Of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).</p>
<h4 id="completely-supported-functions-families">Completely supported function families</h4>
<ul>
<li><a href="https://en.wikipedia.org/wiki/MMX_(instruction_set)">MMX</a></li>
<li><a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE2">SSE2</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE3">SSE3</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSSE3">SSSE3</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE4#SSE4.1">SSE4.1</a></li>
<li><a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions">AVX</a></li>
<li><a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2">AVX2</a></li>
<li><a href="https://en.wikipedia.org/wiki/F16C">F16C</a></li>
<li><a href="https://en.wikipedia.org/wiki/FMA_instruction_set">FMA</a></li>
<li><a href="https://en.wikipedia.org/wiki/AVX-512#GFNI">GFNI</a></li>
<li><a href="https://en.wikipedia.org/wiki/CLMUL_instruction_set">CLMUL</a></li>
<li><a href="https://en.wikipedia.org/wiki/XOP_instruction_set">XOP</a></li>
<li><a href="https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions-512-intel-avx-512-instructions/intrinsics-for-arithmetic-operations-1/intrinsics-for-short-vector-math-library-svml-operations.html">SVML</a></li>
</ul>
<h4 id="newly-added-function-families">Newly added function families</h4>
<ul>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512cd">AVX512CD</a>: 21 of 42 (50.00%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512vpopcntdq">AVX512VPOPCNTDQ</a>: 18 of 18 💯%!</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_4vnniw">AVX512_4VNNIW</a>: 6 of 6 💯%!</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_bf16">AVX512_BF16</a>: 9 of 38 (23.68%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_bitalg">AVX512_BITALG</a>: 24 of 24 💯%!</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_fp16">AVX512_FP16</a>: 2 of 1105 (0.18%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_vbmi2">AVX512_VBMI2</a>: 3 of 150 (2.00%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_vnni">AVX512_VNNI</a>: 36 of 36 💯%!</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx_vnni">AVX_VNNI</a>: 8 of 16 (50.00%)</li>
</ul>
<h4 id="additions-to-existing-families">Additions to existing families</h4>
<ul>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512f">AVX512F</a>: 579 additional, 856 total of 2660 (31.80%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512bw">AVX512BW</a>: 178 additional, 335 total of 828 (40.46%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512dq">AVX512DQ</a>: 77 additional, 111 total of 399 (27.82%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_vbmi">AVX512_VBMI</a>: 9 additional, 30 total of 30 💯%!</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#kncni">KNCNI</a>: 113 additional, 114 total of 595 (19.16%)</li>
<li><a href="https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#vpclmulqdq">VPCLMULQDQ</a>: 1 additional, 2 total of 2 💯%!</li>
</ul>
<h3 id="neon"><a href="https://github.com/simd-everywhere/implementation-status/blob/main/neon.md">Neon</a></h3>
<p>SIMDe currently implements 56.46% of the ARM NEON functions (3766 out of 6670). If you don’t count 16-bit floats and poly types, it’s 75.95% (3766 / 4969).</p>
<h4 id="newly-added-families">Newly added families</h4>
<ul>
<li>addhn</li>
<li>bcax</li>
<li>cage</li>
<li>cmla</li>
<li>cmla_rot90</li>
<li>cmla_rot180</li>
<li>cmla_rot270</li>
<li>cvtn</li>
<li>fma</li>
<li>fma_lane</li>
<li>fma_n</li>
<li>ld2</li>
<li>ld4_lane</li>
<li>mla_lane</li>
<li>mlal_high_n</li>
<li>mlal_lane</li>
<li>mls_n</li>
<li>mlsl_high_n</li>
<li>mlsl_lane</li>
<li>mull_lane</li>
<li>qdmulh_lane</li>
<li>qdmulh_n</li>
<li>qrdmulh_lane</li>
<li>qrshrn_n</li>
<li>qrshrun_n</li>
<li>qshlu_n</li>
<li>qshrn_n</li>
<li>qshrun_n</li>
<li>recpe</li>
<li>recps</li>
<li>rshrn_n</li>
<li>rsqrte</li>
<li>rsqrts</li>
<li>shll_n</li>
<li>shrn_n</li>
<li>sqadd</li>
<li>sri_n</li>
<li>st2</li>
<li>st2_lane</li>
<li>st3_lane</li>
<li>st4_lane</li>
<li>subhn</li>
<li>subl_high</li>
<li>xar</li>
</ul>
<h3 id="msa"><a href="https://github.com/simd-everywhere/implementation-status/blob/main/msa.md">MSA</a></h3>
<p>Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.</p>
<h2 id="what-is-coming-next">What is coming next</h2>
<p>Work on SIMDe is proceeding rapidly, but there are a lot of functions
to implement… x86 alone has nearly 7,500 SIMD functions, and we’ve
implemented about 3,000 of them. We will keep adding more functions
and improving the implementations we already have.</p>
<p>If you’re interested in using SIMDe but need some specific functions
to be implemented first, please <a href="https://github.com/simd-everywhere/simde/issues/new">file an
issue</a> and we may
be able to prioritize those functions.</p>
<h2 id="getting-involved">Getting Involved</h2>
<p>If you’re interested in helping out please get in touch. We have <a href="https://gitter.im/simd-everywhere/community">a
chat room on Matrix/Element</a>
which is fairly active if you have questions, or of course you can
just dive right in on <a href="https://github.com/simd-everywhere/simde/issues">the issue
tracker</a>.</p>
<h1><a href="https://simde-everywhere.github.io/blog/2020/06/22/transitioning-to-arm-with-simde">Transitioning SSE/AVX code to NEON with SIMDe</a></h1>
<p><em>2020-06-22, by Evan Nemerson</em></p>
<p>Now that Apple <a href="https://arstechnica.com/gadgets/2020/06/this-is-apples-roadmap-for-moving-the-first-macs-away-from-intel/">has
announced</a>
that they will be moving away from x86 to their own ARM-based CPUs,
lots of people will be stuck with SIMD code targeting x86 ISA
extensions like SSE, SSE2, AVX, etc., which won’t run on Apple’s new
machines.</p>
<p>Arm CPUs do have support for SIMD, but instead of Intel technologies
like SSE and AVX, Arm has
<a href="https://developer.arm.com/architectures/instruction-sets/simd-isas/neon">NEON</a>.
NEON is an improvement over the x86 APIs in a lot of ways, and a
regression in others, but it is undeniably <em>different</em> and you can’t
just recompile your application on Arm and expect it to work.</p>
<p>Or can you? SIMD Everywhere (SIMDe) provides fast, portable,
permissively-licensed (MIT) implementations of the x86 APIs which
allow you to run code designed for x86/x86_64 CPUs pretty much
anywhere, including on Arm (using NEON if available). With almost no
source code changes, you can recompile your x86 SIMD code for Arm (or
POWER, or WebAssembly, etc.).</p>
<p>If NEON is available, SIMDe will even use it to provide the x86
functions. For example, <code class="language-plaintext highlighter-rouge">_mm_add_ps</code> from SSE can be implemented
using NEON’s <code class="language-plaintext highlighter-rouge">vaddq_f32</code> function, so that’s exactly what SIMDe does.
For more complicated functions without direct analogs in NEON, SIMDe
uses the fastest implementation we can. Often that means combining
multiple NEON functions, but in the worst case SIMDe has completely
portable C99 fallbacks.</p>
<p>If you’d like to take SIMDe for a test drive, it is usable <a href="https://godbolt.org/z/wf4t42">on
Compiler Explorer</a>. Compilation is a
bit slow, due to having to transfer large files, but it’s quite
usable.</p>
<p>Before I continue, it’s worth noting that SIMDe has <a href="https://gitter.im/simd-everywhere/community">an active chat
room</a>, <a href="https://groups.google.com/forum/#!forum/simde">a less-active
mailing list</a>, and <a href="https://github.com/simd-everywhere/simde/issues">a
very active issue
tracker</a> where
questions are welcome. If you have any questions, problems, concerns,
etc., please get in touch!</p>
<h2 id="getting-simde">Getting SIMDe</h2>
<p>I mentioned earlier that “almost no source code changes” are required.
Many of you are probably worried about the word “almost”, so let’s
discuss that for a bit.</p>
<p>First, you’ll need to get SIMDe. If you’re on Debian there is a
<a href="https://packages.debian.org/bullseye/libsimde-dev">libsimde-dev</a>
package, or on Fedora/Red Hat/etc. there is a
<a href="https://koji.fedoraproject.org/koji/packageinfo?packageID=31156">simde</a>
package. Both are pretty new, though, so they may not be available to
you yet.</p>
<p>If that doesn’t work for you, you can drop a copy of SIMDe into your
project. If you want to use a git submodule that will work, but the
<a href="https://github.com/simd-everywhere/simde">main repository</a> is pretty
big thanks to all the tests. If you want something a bit smaller we
also have a <a href="https://github.com/simd-everywhere/simde-no-tests/">simde-no-tests
repository</a> which
is a mirror containing only the implementations, updated
automatically whenever SIMDe is updated.</p>
<p>SIMDe is a header-only library, and doesn’t require any build system
integration; simply including the relevant headers is enough. That
said, we do recommend aggressive optimizations (like <code class="language-plaintext highlighter-rouge">-O3</code>), and
enabling OpenMP SIMD (which does not introduce a run-time dependency
on OpenMP) with <code class="language-plaintext highlighter-rouge">-fopenmp-simd</code> on GCC and clang, or <code class="language-plaintext highlighter-rouge">-qopenmp-simd</code>
on ICC. If you do enable OpenMP SIMD, please let SIMDe know by also
passing <code class="language-plaintext highlighter-rouge">-DSIMDE_ENABLE_OPENMP</code> (not necessary if you enable <em>full</em>
OpenMP, i.e. <code class="language-plaintext highlighter-rouge">-fopenmp</code> instead of <code class="language-plaintext highlighter-rouge">-fopenmp-simd</code>).</p>
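<p>Putting those recommendations together, a GCC or clang build line might look something like this (the source file name and SIMDe include path are placeholders for your own):</p>

```shell
cc -O3 -fopenmp-simd -DSIMDE_ENABLE_OPENMP -I/path/to/simde my_simd_code.c -o my_simd_code
```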
<h2 id="source-level-changes">Source-level changes</h2>
<p>As far as source-level changes are concerned, all you need to do is
define the <code class="language-plaintext highlighter-rouge">SIMDE_ENABLE_NATIVE_ALIASES</code> macro, and include a SIMDe
header instead of <code class="language-plaintext highlighter-rouge">*mmintrin.h</code>. SIMDe headers are named according to
the ISA extension they supply, so you don’t need to remember which
letter corresponds to which ISA extension when writing your code, but
if you already have <code class="language-plaintext highlighter-rouge">*mmintrin.h</code> includes scattered around, here is how they map
to SIMDe headers:</p>
<ul>
<li>mmintrin.h → simde/x86/mmx.h</li>
<li>xmmintrin.h → simde/x86/sse.h</li>
<li>emmintrin.h → simde/x86/sse2.h</li>
<li>pmmintrin.h → simde/x86/sse3.h</li>
<li>tmmintrin.h → simde/x86/ssse3.h</li>
<li>smmintrin.h → simde/x86/sse4.1.h</li>
<li>nmmintrin.h → simde/x86/sse4.2.h</li>
</ul>
<p>Starting with AVX, Intel began using immintrin.h to include
everything, so if you’re using immintrin.h, just include the header
for the “greatest” ISA extension you use; for example, if you want
AVX-512F, include simde/x86/avx512f.h.</p>
<p>Let’s take a look at that <code class="language-plaintext highlighter-rouge">SIMDE_ENABLE_NATIVE_ALIASES</code> macro. If you
don’t define it, SIMDe will only define functions in its own <code class="language-plaintext highlighter-rouge">simde_*</code>
namespace. For example, instead of <code class="language-plaintext highlighter-rouge">_mm_add_ps</code> you would need to use
<code class="language-plaintext highlighter-rouge">simde_mm_add_ps</code>. If you <em>do</em> define <code class="language-plaintext highlighter-rouge">SIMDE_ENABLE_NATIVE_ALIASES</code>,
SIMDe will also use a function-like macro to create an alias:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _mm_add_ps(a, b) simde_mm_add_ps(a, b)
</span></code></pre></div></div>
<p>While that works <em>most</em> of the time, there are a few things you’ll
want to be aware of. Perhaps the biggest problem is that Intel
doesn’t use fixed-width types (<code class="language-plaintext highlighter-rouge">int8_t</code>, <code class="language-plaintext highlighter-rouge">int32_t</code>, <code class="language-plaintext highlighter-rouge">uint16_t</code>, etc.)
in their APIs, they instead assume specific characteristics of
standard types which are true on x86 but may not be true on other
platforms. For example, on many Arm platforms, <code class="language-plaintext highlighter-rouge">char</code> is <em>unsigned</em>,
but Intel uses <code class="language-plaintext highlighter-rouge">char</code> to represent a <em>signed</em> 8-bit integer.</p>
<p>SIMDe deals with this by using fixed-width types in our
implementations so they work everywhere, but if your code is using
<code class="language-plaintext highlighter-rouge">char</code> to mean <code class="language-plaintext highlighter-rouge">signed 8-bit integer</code> you may encounter problems when
attempting to use SIMDe functions on some platforms. The good news is
that you can generally just change your code to use <code class="language-plaintext highlighter-rouge">int8_t</code> instead
of <code class="language-plaintext highlighter-rouge">char</code>; it will work exactly the same on x86 (<code class="language-plaintext highlighter-rouge">int8_t</code> is likely
just a typedef to <code class="language-plaintext highlighter-rouge">char</code>), and it will also work on other platforms.</p>
<h2 id="completeness">Completeness</h2>
<p>SIMD APIs are big. Very big. x86/x86_64 alone currently has a bit
over 6,000 functions, of which SIMDe has implemented around 2,000.</p>
<p>That said, most of those are AVX-512F extensions which aren’t widely
used yet. Odds are quite good that you’re only using extensions for
which SIMDe already has complete support (as of v0.5.0, released
2020-06-22):</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/MMX_(instruction_set)">MMX</a></li>
<li><a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE2">SSE2</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE3">SSE3</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSSE3">SSSE3</a></li>
<li><a href="https://en.wikipedia.org/wiki/SSE4#SSE4.1">SSE4.1</a></li>
<li><a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions">AVX</a></li>
<li><a href="https://en.wikipedia.org/wiki/FMA_instruction_set">FMA</a></li>
<li><a href="https://en.wikipedia.org/wiki/AVX-512#GFNI">GFNI</a></li>
</ul>
<p>We also have a very good start on many other extensions, including
AVX2, AVX-512F, AVX-512BW, AVX-512VL, and NEON (portable
implementations of NEON that can run on x86, or anywhere else). Also,
it’s not really a CPU extension, but our implementation of SVML is
coming along nicely.</p>
<p>If SIMDe is missing a particular function you need, please <a href="https://github.com/simd-everywhere/simde/issues">file an
issue</a> and we may be
able to prioritize an implementation. We’re planning to implement all
functions anyway, and doing so in a slightly different order doesn’t
generally create any extra work, so if it would help your project
we’re generally happy to oblige. Of course, if you’re interested in
implementing something yourself instead of waiting for us patches are
always welcome!</p>
<h2 id="debugging">Debugging</h2>
<p>SIMDe can be a fantastic tool for debugging. Not only can you see
inside of the function to understand how it really works, you can also
run the code on your development machine in your native environment
<em>without an emulator</em>. Obviously you’ll eventually want to check
everything at least in an emulator, or preferably on real hardware,
but during development SIMDe can be immensely helpful.</p>
<h2 id="performance">Performance</h2>
<p>Honestly, it’s pretty good. When there is a NEON function that
implements exactly the same functionality there is no cost for using
SIMDe instead of calling the NEON function directly; the compiler
translates it to exactly the same code.</p>
<p>Even when we hit a portable fallback, the compiler is often smart
enough to auto-vectorize the code, especially if you have aggressive
optimizations (think <code class="language-plaintext highlighter-rouge">-O3</code>) enabled. We use compiler-specific
functionality like <a href="https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html">GCC-style vector
extensions</a>
(supported by pretty much every compiler except for MSVC), builtins
like
<a href="http://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-shufflevector"><code class="language-plaintext highlighter-rouge">__builtin_shufflevector</code></a>,
<a href="https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html#Vector-Extensions"><code class="language-plaintext highlighter-rouge">__builtin_shuffle</code></a>,
<a href="http://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-convertvector"><code class="language-plaintext highlighter-rouge">__builtin_convertvector</code></a>,
etc., wherever possible, which generally results in optimal
implementations. Even when we hit portable fallbacks, they are
decorated with pragmas from OpenMP 4 SIMD, Cilk+, or compiler-specific
hints like <a href="https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html">GCC loop-specific
pragmas</a>
or <a href="http://llvm.org/docs/Vectorizers.html#pragma-loop-hint-directives">clang pragma loop hint
directives</a>.
We try very hard to make sure that even the fallbacks are fast.</p>
<p>SIMDe will never make your project slower, only more portable.
Performance likely won’t be as good as a manual rewrite by someone who
knows NEON well, but you can get a port up and running at almost no
cost in terms of developer resources, and once it’s done you’re free
to mix, for example, SSE and NEON code at will. That means you can
gradually port specific portions of your code which are particularly
hot, or where SIMDe doesn’t do a good job (though in that case please
<a href="https://github.com/simd-everywhere/simde/issues">file an issue</a> too),
while leaving areas where SIMDe performance is adequate alone instead
of wasting development time and resources.</p>
<h2 id="still-have-questions">Still have questions?</h2>
<p>The <a href="https://github.com/simd-everywhere/simde/wiki/FAQ">F.A.Q.</a> has
some information which may help.</p>
<p>If that doesn’t answer your question please feel free to ask in <a href="https://gitter.im/simd-everywhere/community">our
chat room</a>, on <a href="https://groups.google.com/forum/#!forum/simde">our
mailing list</a>, or on
<a href="https://github.com/simd-everywhere/simde/issues">our issue tracker</a>;
if you have questions our documentation hasn’t answered, it’s a bug in
our documentation, so don’t worry about using the issue tracker!</p>
<h1><a href="https://simde-everywhere.github.io/blog/announcements/release/2020/06/21/0.5.0-release">SIMDe 0.5.0 Released</a></h1>
<p><em>2020-06-21, by Evan Nemerson</em></p>
<p>I’m pleased to announce the availability of the first release of <a href="https://github.com/simd-everywhere/simde">SIMD
Everywhere</a> (SIMDe),
<a href="https://github.com/simd-everywhere/simde/releases">version 0.5.0</a>,
representing more than three years of work by over a dozen developers.</p>
<p>SIMDe is a permissively-licensed (MIT) header-only library which
provides fast, portable implementations of
<a href="https://en.wikipedia.org/wiki/SIMD">SIMD</a> intrinsics for platforms
which aren’t natively supported by the API in question.</p>
<p>For example, with SIMDe you can use
<a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> on
<a href="https://en.wikipedia.org/wiki/ARM_architecture">ARM</a>,
<a href="https://en.wikipedia.org/wiki/IBM_POWER_instruction_set_architecture">POWER</a>,
<a href="https://webassembly.org/">WebAssembly</a>, or almost any platform with a
C compiler. That includes, of course, x86 CPUs which don’t support
the ISA extension in question (<em>e.g.</em>, calling AVX-512F functions on a
CPU which doesn’t natively support them).</p>
<p>If the target natively supports the SIMD extension in question there
is no performance penalty for using SIMDe. Otherwise, accelerated
implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on
WebAssembly, etc., are used when available to provide good
performance.</p>
<p>SIMDe has already been used to port several packages to additional
architectures through either upstream support or distribution
packages, <a href="https://wiki.debian.org/SIMDEverywhere">particularly on
Debian</a>.</p>
<p>If you’d like to play with SIMDe online, you can do so <a href="https://simde.netlify.app/godbolt/demo">on Compiler
Explorer</a>.</p>
<h2 id="what-is-in-050">What is in 0.5.0</h2>
<p>0.5.0 is SIMDe’s first release. It includes complete
implementations of:</p>
<ul>
<li>MMX</li>
<li>SSE</li>
<li>SSE2</li>
<li>SSE3</li>
<li>SSSE3</li>
<li>SSE4.1</li>
<li>AVX</li>
<li>FMA</li>
<li>GFNI</li>
</ul>
<p>We also have rapidly progressing implementations of many other
extensions including NEON, AVX2, SVML, and several AVX-512 extensions
(AVX-512F, AVX-512BW, AVX-512VL, etc.).</p>
<p>Additionally, we have an extensive test suite to verify our
implementations.</p>
<h2 id="what-is-coming-next">What is coming next</h2>
<p>Work on SIMDe is proceeding rapidly, but there are a lot of functions
to implement… x86 alone has about 6,000 SIMD functions, and we’ve
implemented about 2,000 of them. We will keep adding more functions
and improving the implementations we already have.</p>
<p>Our NEON implementation is being worked on very actively right now
by Sean Maher and Christopher Moore, and is expected to continue
progressing rapidly.</p>
<p>We currently have two Google Summer of Code students working on the
project as well; <a href="https://masterchef2209.wordpress.com/2020/06/17/guide-to-intel-sse4-2-crc-intrinisics-implementation-for-simde/">Hidayat
Khan</a>
is working on finishing up AVX2, and <a href="https://medium.com/@himanshi18037">Himanshi
Mathur</a> is focused on SVML.</p>
<p>If you’re interested in using SIMDe but need some specific functions
to be implemented first, please <a href="https://github.com/simd-everywhere/simde/issues/new">file an
issue</a> and we may
be able to prioritize those functions.</p>
<h2 id="getting-involved">Getting Involved</h2>
<p>If you’re interested in helping out please get in touch. We have <a href="https://gitter.im/simd-everywhere/community">a
chat room on Gitter</a>
which is fairly active if you have questions, or of course you can
just dive right in on <a href="https://github.com/simd-everywhere/simde/issues">the issue
tracker</a>.</p>