fearless_simd/lib.rs
// Copyright 2024 the Fearless_SIMD Authors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! A helper library to make SIMD more friendly.
//!
//! Fearless SIMD exposes safe SIMD with ergonomic multi-versioning in Rust.
//!
//! Fearless SIMD uses "marker values" which serve as proofs of which target features are available on the current CPU.
//! These each implement the [`Simd`] trait, which exposes a core set of SIMD operations that are implemented as
//! efficiently as possible on each target platform.
//!
//! Additionally, there are types for packed vectors of a specific width and element type (such as [`f32x4`]).
//! Fearless SIMD does not currently support vectors narrower than 128 bits.
//! These vector types implement some standard arithmetic traits (e.g. they can be added together using
//! `+`, multiplied by a scalar using `*`, among others), which are implemented as efficiently
//! as possible using SIMD instructions.
//! These can be created in a SIMD context using the [`SimdFrom`] trait, or the
//! [`from_slice`][SimdBase::from_slice] associated function.
//!
//! To call a function with the best available target features and get the associated `Simd`
//! implementation, use the [`dispatch!()`] macro:
//!
//! ```rust
//! use fearless_simd::{Level, Simd, dispatch};
//!
//! #[inline(always)]
//! fn sigmoid<S: Simd>(simd: S, x: &[f32], out: &mut [f32]) { /* ... */ }
//!
//! // The stored level, which you should only construct once in your application.
//! let level = Level::new();
//!
//! dispatch!(level, simd => sigmoid(simd, &[/*...*/], &mut [/*...*/]));
//! ```
//!
//! A few things to note:
//!
//! 1) `sigmoid` is generic over any `Simd` type.
//! 2) The [`dispatch!()`] macro is used to invoke the given function with the target features associated with the supplied [`Level`].
//! 3) The function or closure passed to [`dispatch!()`] should be `#[inline(always)]`.
//!    The performance of the SIMD implementation may be poor if that isn't the case. See [the section on inlining](#inlining) for details.
//!
//! The first parameter to [`dispatch!()`] is the [`Level`].
//! If you are writing an application, you should create this once (using [`Level::new`]), and pass it to any function which wants to use SIMD.
//! This type stores which instruction sets are available for the current process, which is used
//! in the macro to dispatch to the best variant of the supplied function for this process.
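//!
//! As a minimal sketch of that pattern, a component can store the `Level` it is given and
//! dispatch through it later. The `Scaler` type and the `scale` function are hypothetical
//! names used only for illustration; they are not part of this crate.
//!
//! ```rust
//! use fearless_simd::{Level, Simd, dispatch};
//!
//! // Hypothetical kernel: generic over `Simd`, so `dispatch!` can compile it per level.
//! #[inline(always)]
//! fn scale<S: Simd>(_simd: S, data: &mut [f32], factor: f32) {
//!     for x in data.iter_mut() {
//!         *x *= factor;
//!     }
//! }
//!
//! // Hypothetical component which receives the `Level` instead of detecting it itself.
//! struct Scaler {
//!     level: Level,
//! }
//!
//! impl Scaler {
//!     fn run(&self, data: &mut [f32], factor: f32) {
//!         dispatch!(self.level, simd => scale(simd, data, factor));
//!     }
//! }
//!
//! // The application creates the level once and hands it out.
//! let scaler = Scaler { level: Level::new() };
//! let mut data = [1.0, 2.0, 3.0];
//! scaler.run(&mut data, 2.0);
//! assert_eq!(data, [2.0, 4.0, 6.0]);
//! ```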
//!
//! # Inlining
//!
//! Fearless SIMD relies heavily on Rust's inlining support to create functions which have the
//! given target features enabled.
//! As such, most functions which you write when using Fearless SIMD should have the `#[inline(always)]` attribute.
//!
//! As a rule of thumb:
//!
//! - All SIMD functions need `#[inline(always)]`.
//! - Use [`dispatch!()`] when calling SIMD code from non-SIMD code.
//! - Use [`vectorize()`](Simd::vectorize) when calling SIMD from SIMD if you don't want to force inlining.
//!
//! We currently don't have docs explaining why this is the case.
//! You can read [this Zulip conversation](https://xi.zulipchat.com/#narrow/channel/514230-simd/topic/inlining/with/546913433)
//! for an informal, train-of-thought explanation.
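//!
//! As a rough illustration of the underlying mechanism, here is a sketch of the general
//! technique using only the standard library. It is not this crate's actual implementation,
//! and all function names are hypothetical. A function compiled with `#[target_feature]`
//! only passes those features on to code that is inlined into it, which is why the callee
//! is marked `#[inline(always)]`.
//!
//! ```rust
//! // Plain code with no target features of its own.
//! // `#[inline(always)]` lets it inherit the features of whichever function it is inlined into.
//! #[inline(always)]
//! fn sum_squares(xs: &[f32]) -> f32 {
//!     xs.iter().map(|x| x * x).sum()
//! }
//!
//! // An AVX2-enabled entry point; the inlined body of `sum_squares` may use AVX2 here.
//! #[cfg(target_arch = "x86_64")]
//! #[target_feature(enable = "avx2")]
//! unsafe fn sum_squares_avx2(xs: &[f32]) -> f32 {
//!     sum_squares(xs)
//! }
//!
//! // Runtime dispatch from code which has no extra target features enabled.
//! fn sum_squares_dispatch(xs: &[f32]) -> f32 {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if std::arch::is_x86_feature_detected!("avx2") {
//!             // SAFETY: AVX2 support was checked on the line above.
//!             return unsafe { sum_squares_avx2(xs) };
//!         }
//!     }
//!     // Baseline path for every other CPU and architecture.
//!     sum_squares(xs)
//! }
//!
//! assert_eq!(sum_squares_dispatch(&[1.0, 2.0, 3.0]), 14.0);
//! ```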
//!
//! <!--
//! TODO: Also have concrete examples of each of these.
//!
//! TODO: This is a really subtle point, and we do need there to be a well-written explanation available.
//! E.g. We might want names for these, e.g.:
//!
//! # Kernels vs not kernels
//!
//! TODO: Talk about writing versions of functions which can be called in other `S: Simd` functions.
//! -->
//!
//! # WebAssembly
//!
//! WASM SIMD doesn't have runtime feature detection, so you need to compile two versions of your bundle for WASM, one with SIMD and one without,
//! then select the appropriate one for your user's browser. This can be done via [the `wasm-feature-detect`
//! library](https://github.com/GoogleChromeLabs/wasm-feature-detect).
//!
//! You can compile WebAssembly with the SIMD128 feature enabled via the `RUSTFLAGS` environment variable
//! (`RUSTFLAGS="-Ctarget-feature=+simd128"`), or by adding the compiler flags in your [Cargo
//! config.toml](https://doc.rust-lang.org/cargo/reference/config.html):
//!
//! ```toml
//! [target.'cfg(target_arch = "wasm32")']
//! rustflags = ["-Ctarget-feature=+simd128"]
//! rustdocflags = ["-Ctarget-feature=+simd128"]
//! ```
//!
//! If you want to compile both SIMD and non-SIMD versions of your WebAssembly library, your best option right now is to create a shell script
//! that builds it once with the `RUSTFLAGS` specified, and once without. [Cargo currently does not allow specifying compiler flags
//! per-profile.](https://github.com/rust-lang/cargo/issues/10271)
//!
//! ## Relaxed SIMD
//!
//! Fearless SIMD can make use of the [relaxed SIMD](https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md)
//! WebAssembly instructions, if the requisite target feature is enabled. These instructions can return implementation-dependent results
//! depending on what is fastest on the underlying hardware. They are only used for operations where we already give hardware-dependent results.
//!
//! At the time of writing, relaxed SIMD is only supported in Chrome. To make use of it, you'll need to build two versions of your library, one
//! with relaxed SIMD enabled (`RUSTFLAGS="-Ctarget-feature=+simd128,+relaxed-simd"`) and one with it disabled, and then feature-detect at
//! runtime.
//!
//! # Credits
//!
//! This crate was inspired by [`pulp`] and [`std::simd`], among others in the Rust ecosystem, though it makes many decisions differently.
//! It benefited from conversations with Luca Versari, though he is not responsible for any of the mistakes or bad decisions.
//!
//! # Feature Flags
//!
//! The following crate [feature flags](https://doc.rust-lang.org/cargo/reference/features.html#dependency-features) are available:
//!
//! - `std` (enabled by default): Get floating point functions from the standard library (likely using your target's libc).
//!   Also allows using [`Level::new`] on all platforms, to detect which target features are enabled.
//! - `libm`: Use floating point implementations from [libm].
//! - `safe_wrappers`: Include safe wrappers for (some) target feature specific intrinsics,
//!   beyond the basic SIMD operations abstracted on all platforms.
//! - `force_support_fallback`: Force the scalar fallback to be supported, even if your compilation target has a better baseline.
//!
//! At least one of `std` and `libm` is required; `std` overrides `libm`.
//!
//! [`pulp`]: https://crates.io/crates/pulp
// LINEBENDER LINT SET - lib.rs - v3
// See https://linebender.org/wiki/canonical-lints/
// These lints shouldn't apply to examples or tests.
#![cfg_attr(not(test), warn(unused_crate_dependencies))]
// These lints shouldn't apply to examples.
#![warn(clippy::print_stdout, clippy::print_stderr)]
// Targeting e.g. 32-bit means structs containing usize can give false positives for 64-bit.
#![cfg_attr(target_pointer_width = "64", warn(clippy::trivially_copy_pass_by_ref))]
// END LINEBENDER LINT SET
#![cfg_attr(docsrs, feature(doc_cfg))]
#![allow(non_camel_case_types, reason = "TODO")]
#![expect(clippy::unused_unit, reason = "easier for code generation")]
#![no_std]

#[cfg(feature = "std")]
extern crate std;

#[cfg(all(not(feature = "libm"), not(feature = "std")))]
compile_error!("fearless_simd requires either the `std` or `libm` feature");

// Suppress the unused_crate_dependencies lint when both std and libm are specified.
#[cfg(all(feature = "std", feature = "libm"))]
use libm as _;

pub mod core_arch;
mod impl_macros;

mod generated;
mod macros;
mod support;
mod traits;

pub use generated::*;
pub use traits::*;

/// This prelude module re-exports every SIMD trait defined in this library. It's useful for accessing trait methods.
///
/// Only traits are exported through the prelude; types must be imported separately.
pub mod prelude {
    pub use crate::generated::simd_trait::*;
    pub use crate::traits::*;
}

/// Implementations of [`Simd`] for 64 bit ARM.
#[cfg(target_arch = "aarch64")]
pub mod aarch64 {
    pub use crate::generated::Neon;
}

/// Implementations of [`Simd`] for WebAssembly.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub mod wasm32 {
    pub use crate::generated::WasmSimd128;
}

/// Implementations of [`Simd`] on x86 architectures (both 32 and 64 bit).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub mod x86 {
    pub use crate::generated::Avx2;
    pub use crate::generated::Sse4_2;
}

/// The level enum with the specific SIMD capabilities available.
///
/// The contained values serve as a proof that the associated target
/// feature is available.
#[derive(Clone, Copy, Debug)]
#[non_exhaustive]
pub enum Level {
    /// Scalar fallback level, i.e. no supported SIMD features are to be used.
    ///
    /// This can be created with [`Level::fallback`].
    // We only want to compile the fallback implementation if:
    // - We're on a supported architecture, but don't statically support the lowest alternative level; OR
    // - We're on an unsupported architecture; OR
    // - The fallback is forcibly enabled
    #[cfg(any(
        all(target_arch = "aarch64", not(target_feature = "neon")),
        all(
            any(target_arch = "x86", target_arch = "x86_64"),
            not(all(
                target_feature = "sse4.2",
                target_feature = "cmpxchg16b",
                target_feature = "popcnt"
            ))
        ),
        all(target_arch = "wasm32", not(target_feature = "simd128")),
        not(any(
            target_arch = "x86",
            target_arch = "x86_64",
            target_arch = "aarch64",
            target_arch = "wasm32"
        )),
        feature = "force_support_fallback"
    ))]
    Fallback(Fallback),
    /// The Neon instruction set on 64 bit ARM.
    #[cfg(target_arch = "aarch64")]
    Neon(Neon),
    /// The SIMD 128 instructions on 32-bit WebAssembly.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    WasmSimd128(WasmSimd128),
    /// The SSE4.2 instruction set on (32 and 64 bit) x86, plus `popcnt` and `cmpxchg16b`.
    /// Also known as x86-64-v2.
    ///
    /// All production CPUs with SSE4.2 also support the other two extensions, so it is safe to require them.
    // We don't need to support this if the compilation target definitely supports something better.
    #[cfg(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        not(all(
            target_feature = "avx2",
            target_feature = "bmi1",
            target_feature = "bmi2",
            target_feature = "cmpxchg16b",
            target_feature = "f16c",
            target_feature = "fma",
            target_feature = "lzcnt",
            target_feature = "movbe",
            target_feature = "popcnt",
            target_feature = "xsave"
        ))
    ))]
    Sse4_2(Sse4_2),
    /// The x86-64-v3 instruction set on (32 and 64 bit) x86, including AVX2 and FMA.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    Avx2(Avx2),
    // If new variants are added, make sure to handle them in `Level::dispatch`
    // and `dispatch!()`
}

impl Level {
    /// Detects the available features on the current CPU, and returns the best level.
    ///
    /// If no SIMD instruction set is available, a scalar fallback will be used instead.
    ///
    /// This function requires the standard library, to use the
    /// [`is_x86_feature_detected`](std::arch::is_x86_feature_detected)
    /// or [`is_aarch64_feature_detected`](std::arch::is_aarch64_feature_detected) macros.
    /// On wasm32, this requirement does not apply, so the standard library isn't required.
    ///
    /// Note that in most cases, this function should only be called by end-user applications.
    /// Libraries should instead accept a `Level` argument, probably as they are
    /// creating their data structures, and then store the level for any computations.
    /// Libraries which wish to abstract away SIMD usage for their common-case clients
    /// should make their non-`Level` entrypoint match this function's `cfg`; to instead
    /// handle this at runtime, they can use [`try_detect`](Self::try_detect),
    /// handling the `None` case as they deem fit (probably panicking).
    /// This strategy avoids users of the library inadvertently using the fallback level,
    /// even if the requisite target features are available.
    ///
    /// If you are on an embedded device where these macros are not supported,
    /// you should construct the relevant variants yourself, using whatever
    /// way your specific chip supports accessing the current level.
    ///
    /// This value should be passed to [`dispatch!()`].
    #[cfg(any(feature = "std", target_arch = "wasm32"))]
    #[must_use]
    #[expect(
        clippy::new_without_default,
        reason = "The `Level::new()` function is not always available, and we also want to be explicit about when runtime feature detection happens"
    )]
    pub fn new() -> Self {
        #[cfg(target_arch = "aarch64")]
        if std::arch::is_aarch64_feature_detected!("neon") {
            return unsafe { Self::Neon(Neon::new_unchecked()) };
        }
        #[cfg(target_arch = "wasm32")]
        {
            // WASM always either has the SIMD feature compiled in or not.
            #[cfg(target_feature = "simd128")]
            return Self::WasmSimd128(WasmSimd128::new_unchecked());
        }
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            // Feature list sourced from `rustc --print=cfg --target x86_64-unknown-linux-gnu -C target-cpu=x86-64-v3`
            // However, the following features are implied by avx2 and do not need to be spelled out:
            // avx,fxsr,sse,sse2,sse3,sse4.1,sse4.2,ssse3
            // This can be verified by running:
            // rustc --print=cfg --target x86_64-unknown-linux-gnu -C target-feature='+avx2'
            if std::arch::is_x86_feature_detected!("avx2")
                && std::arch::is_x86_feature_detected!("bmi1")
                && std::arch::is_x86_feature_detected!("bmi2")
                && std::arch::is_x86_feature_detected!("cmpxchg16b")
                && std::arch::is_x86_feature_detected!("f16c")
                && std::arch::is_x86_feature_detected!("fma")
                && std::arch::is_x86_feature_detected!("lzcnt")
                && std::arch::is_x86_feature_detected!("movbe")
                && std::arch::is_x86_feature_detected!("popcnt")
                && std::arch::is_x86_feature_detected!("xsave")
            {
                return unsafe { Self::Avx2(Avx2::new_unchecked()) };
            // All x86 CPUs that ever shipped with sse4.2 also have cmpxchg16b and popcnt:
            // Intel Nehalem, AMD Bulldozer and VIA Isaiah II were the first with SSE4.2
            // and have these extensions already.
            } else if std::arch::is_x86_feature_detected!("sse4.2")
                && std::arch::is_x86_feature_detected!("cmpxchg16b")
                && std::arch::is_x86_feature_detected!("popcnt")
            {
                #[cfg(not(all(
                    target_feature = "avx2",
                    target_feature = "bmi1",
                    target_feature = "bmi2",
                    target_feature = "cmpxchg16b",
                    target_feature = "f16c",
                    target_feature = "fma",
                    target_feature = "lzcnt",
                    target_feature = "movbe",
                    target_feature = "popcnt",
                    target_feature = "xsave"
                )))]
                return unsafe { Self::Sse4_2(Sse4_2::new_unchecked()) };
            }
        }
        #[cfg(any(
            all(target_arch = "aarch64", not(target_feature = "neon")),
            all(
                any(target_arch = "x86", target_arch = "x86_64"),
                not(all(
                    target_feature = "sse4.2",
                    target_feature = "cmpxchg16b",
                    target_feature = "popcnt"
                ))
            ),
            all(target_arch = "wasm32", not(target_feature = "simd128")),
            not(any(
                target_arch = "x86",
                target_arch = "x86_64",
                target_arch = "aarch64",
                target_arch = "wasm32"
            )),
        ))]
        {
            return Self::Fallback(Fallback::new());
        }
        #[allow(
            unreachable_code,
            reason = "`is_x86_feature_detected` or equivalents will have returned `true`, or Fallback was used."
        )]
        {
            unreachable!()
        }
    }

    /// Get the target feature level suitable for this run.
    ///
    /// Should be used in libraries if they wish to handle the case where
    /// target features cannot be detected at runtime.
    /// Most users should prefer [`new`](Self::new).
    /// This is discussed in more detail in `new`'s documentation.
    #[allow(clippy::allow_attributes, reason = "Only needed in some cfgs.")]
    #[allow(unreachable_code, reason = "Fallback unreachable in some cfgs.")]
    pub fn try_detect() -> Option<Self> {
        #[cfg(any(feature = "std", target_arch = "wasm32"))]
        return Some(Self::new());
        None
    }

    /// Check whether this is the `Fallback` level; that is, whether no better feature level could
    /// be statically or dynamically detected. This is useful if you have a dedicated scalar version
    /// of your algorithm that runs faster when SIMD isn't supported.
    ///
    /// This method is always available, even in cases where `Fallback` is not; for instance, if
    /// you're targeting a platform that always supports some level of SIMD. In such cases, it will
    /// always return false.
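    ///
    /// For instance, a caller might branch on this before dispatching. This is a minimal
    /// sketch: `sum` and `sum_simd` are hypothetical names, and the plain loop stands in for
    /// a hand-tuned scalar algorithm.
    ///
    /// ```rust
    /// use fearless_simd::{Level, Simd, dispatch};
    ///
    /// // Hypothetical SIMD-oriented kernel, used when some SIMD level is available.
    /// #[inline(always)]
    /// fn sum_simd<S: Simd>(_simd: S, xs: &[f32]) -> f32 {
    ///     xs.iter().sum()
    /// }
    ///
    /// fn sum(level: Level, xs: &[f32]) -> f32 {
    ///     if level.is_fallback() {
    ///         // Dedicated scalar path, skipping the SIMD-shaped kernel entirely.
    ///         let mut total = 0.0_f32;
    ///         for x in xs {
    ///             total += *x;
    ///         }
    ///         total
    ///     } else {
    ///         dispatch!(level, simd => sum_simd(simd, xs))
    ///     }
    /// }
    ///
    /// assert_eq!(sum(Level::new(), &[1.0, 2.0, 3.0]), 6.0);
    /// ```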
    pub fn is_fallback(self) -> bool {
        #[cfg(any(
            all(target_arch = "aarch64", not(target_feature = "neon")),
            all(
                any(target_arch = "x86", target_arch = "x86_64"),
                not(all(
                    target_feature = "sse4.2",
                    target_feature = "cmpxchg16b",
                    target_feature = "popcnt"
                ))
            ),
            all(target_arch = "wasm32", not(target_feature = "simd128")),
            not(any(
                target_arch = "x86",
                target_arch = "x86_64",
                target_arch = "aarch64",
                target_arch = "wasm32"
            )),
            feature = "force_support_fallback"
        ))]
        return matches!(self, Self::Fallback(_));

        #[allow(unreachable_code, reason = "Fallback unreachable in some cfgs.")]
        false
    }

    /// If this is a proof that Neon (or better) is available, access that instruction set.
    ///
    /// This method should be preferred over matching against the `Neon` variant of self,
    /// because if Fearless SIMD gets support for an instruction set which is a superset of Neon,
    /// this method will return a value even if that "better" instruction set is available.
    ///
    /// This can be used in combination with the `safe_wrappers` feature to gain checked access to
    /// the level-specific SIMD capabilities.
    #[cfg(target_arch = "aarch64")]
    #[inline]
    pub fn as_neon(self) -> Option<Neon> {
        #[allow(
            unreachable_patterns,
            reason = "On machines which statically support `neon`, there is only one variant."
        )]
        match self {
            Self::Neon(neon) => Some(neon),
            _ => None,
        }
    }

    /// If this is a proof that SIMD 128 (or better) is available, access that instruction set.
    ///
    /// This method should be preferred over matching against the `WasmSimd128` variant of self,
    /// because if Fearless SIMD gets support for an instruction set which is a superset of SIMD 128,
    /// this method will return a value even if that "better" instruction set is available.
    ///
    /// This can be used in combination with the `safe_wrappers` feature to gain checked access to
    /// the level-specific SIMD capabilities.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    #[inline]
    pub fn as_wasm_simd128(self) -> Option<WasmSimd128> {
        #[allow(
            unreachable_patterns,
            reason = "On machines which statically support `simd128`, there is only one variant."
        )]
        match self {
            Self::WasmSimd128(simd128) => Some(simd128),
            _ => None,
        }
    }

    /// If this is a proof that SSE4.2 (or better) is available, access that instruction set.
    ///
    /// This method should be preferred over matching against the `Sse4_2` variant of self,
    /// because if Fearless SIMD gets support for an instruction set which is a superset of SSE4.2,
    /// this method will return a value even if that "better" instruction set is available.
    ///
    /// This can be used in combination with the `safe_wrappers` feature to gain checked access to
    /// the level-specific SIMD capabilities.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    #[inline]
    pub fn as_sse4_2(self) -> Option<Sse4_2> {
        match self {
            // Safety: The Avx2 struct represents the x86-64-v3 feature set being enabled, which
            // includes the `sse4.2`, `cmpxchg16b`, and `popcnt` features required by Sse4_2.
            Self::Avx2(_avx) => unsafe { Some(Sse4_2::new_unchecked()) },
            #[cfg(not(all(
                target_feature = "avx2",
                target_feature = "bmi1",
                target_feature = "bmi2",
                target_feature = "cmpxchg16b",
                target_feature = "f16c",
                target_feature = "fma",
                target_feature = "lzcnt",
                target_feature = "movbe",
                target_feature = "popcnt",
                target_feature = "xsave"
            )))]
            Self::Sse4_2(sse42) => Some(sse42),
            #[allow(
                unreachable_patterns,
                reason = "This arm is reachable on baseline x86/x86_64."
            )]
            _ => None,
        }
    }

    /// If this is a proof that the x86-64-v3 feature set (or better) is available, access that
    /// instruction set.
    ///
    /// This method should be preferred over matching against the `Avx2` variant of self,
    /// because if Fearless SIMD gets support for an instruction set which is a superset of AVX2,
    /// this method will return a value even if that "better" instruction set is available.
    ///
    /// This can be used in combination with the `safe_wrappers` feature to gain checked access to
    /// the level-specific SIMD capabilities.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    #[inline]
    pub fn as_avx2(self) -> Option<Avx2> {
        #[allow(
            unreachable_patterns,
            reason = "On machines which statically support `avx2`, there is only one variant."
        )]
        match self {
            Self::Avx2(avx2) => Some(avx2),
            _ => None,
        }
    }

    /// Get the strongest statically supported SIMD level.
    ///
    /// That is, if your compilation run ambiently declares that a target feature is enabled,
    /// this method will take that into account.
    /// In most cases, you should use [`Level::new`] or [`Level::try_detect`].
    /// This method is mainly useful for libraries, where:
    ///
    /// 1) Your crate features request that you not use the standard library, i.e. the integrator
    ///    hasn't enabled your `"std"` crate feature (so you can't use [`Level::new`], and
    ///    [`Level::try_detect`] returns `None`); AND
    /// 2) Your caller does not provide a [`Level`]; AND
    /// 3) The library doesn't want to panic when it can't find a SIMD level.
    ///
    /// Note that in these cases, the library should clearly inform the integrator
    /// that it is using a fallback and so not getting optimal performance (e.g. by panicking if
    /// `debug_assertions` are enabled, and emitting a log with the "error" level otherwise).
    /// The messages given should also provide actionable fixes, such as pointing to the
    /// entry-point which provides a `Level`, or your `"std"` feature.
    ///
    /// Note that this is unaffected by the `force_support_fallback` feature.
    /// Instead, you should use [`Level::fallback`] if you require the fallback level.
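    ///
    /// A minimal sketch of such a library-side fallback follows. The function name is
    /// hypothetical, and the reporting is reduced to a `debug_assert!`; a real library would
    /// presumably also log an error in release builds.
    ///
    /// ```rust
    /// use fearless_simd::Level;
    ///
    /// // Hypothetical helper for entry points which are not given a `Level` by the caller.
    /// fn level_or_baseline() -> Level {
    ///     Level::try_detect().unwrap_or_else(|| {
    ///         // Make the degraded configuration loud in debug builds, with an actionable fix.
    ///         debug_assert!(
    ///             false,
    ///             "SIMD level detection is unavailable; pass a `Level` explicitly or enable `std`"
    ///         );
    ///         Level::baseline()
    ///     })
    /// }
    ///
    /// // When runtime detection is available, this is simply the detected level.
    /// let level = level_or_baseline();
    /// let _ = level.is_fallback();
    /// ```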
    pub const fn baseline() -> Self {
        // TODO: How do we possibly test that this method works in all cases?
        // Note that you can use the `check_targets.sh` script to at least ensure that it compiles in all reasonable cases.
        #[cfg(not(any(
            target_arch = "x86",
            target_arch = "x86_64",
            target_arch = "aarch64",
            target_arch = "wasm32"
        )))]
        {
            return Self::Fallback(Fallback::new());
        }
        #[cfg(target_arch = "aarch64")]
        {
            #[cfg(target_feature = "neon")]
            return unsafe { Self::Neon(Neon::new_unchecked()) };
            #[cfg(not(target_feature = "neon"))]
            return Self::Fallback(Fallback::new());
        }
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            #[cfg(all(
                target_feature = "avx2",
                target_feature = "bmi1",
                target_feature = "bmi2",
                target_feature = "cmpxchg16b",
                target_feature = "f16c",
                target_feature = "fma",
                target_feature = "lzcnt",
                target_feature = "movbe",
                target_feature = "popcnt",
                target_feature = "xsave"
            ))]
            return unsafe { Self::Avx2(Avx2::new_unchecked()) };
            #[cfg(all(
                all(
                    target_feature = "sse4.2",
                    target_feature = "cmpxchg16b",
                    target_feature = "popcnt"
                ),
                not(all(
                    target_feature = "avx2",
                    target_feature = "bmi1",
                    target_feature = "bmi2",
                    target_feature = "cmpxchg16b",
                    target_feature = "f16c",
                    target_feature = "fma",
                    target_feature = "lzcnt",
                    target_feature = "movbe",
                    target_feature = "popcnt",
                    target_feature = "xsave"
                ))
            ))]
            return unsafe { Self::Sse4_2(Sse4_2::new_unchecked()) };
            #[cfg(not(all(
                target_feature = "sse4.2",
                target_feature = "cmpxchg16b",
                target_feature = "popcnt"
            )))]
            return Self::Fallback(Fallback::new());
        }
        #[cfg(target_arch = "wasm32")]
        {
            #[cfg(target_feature = "simd128")]
            return Self::WasmSimd128(WasmSimd128::new_unchecked());
            #[cfg(not(target_feature = "simd128"))]
            return Self::Fallback(Fallback::new());
        }
    }

    /// Create a scalar fallback level, which does not explicitly use SIMD instructions.
    ///
    /// This is primarily intended for tests; most users should prefer [`Level::new`] or [`Level::baseline`].
    ///
    /// Note that using the scalar fallback does *not* guarantee that the fallback code path
    /// contains no SIMD instructions. This is because the "ambient" compilation environment may have
    /// SIMD instructions available, which LLVM may use to auto-vectorise that path.
    #[inline]
    #[cfg(feature = "force_support_fallback")]
    pub const fn fallback() -> Self {
        Self::Fallback(Fallback::new())
    }

    /// Dispatch `f` to a context in which the target features that this `Level` proves to be available are [enabled].
    ///
    /// Most users of Fearless SIMD should prefer to use [`dispatch!()`] to
    /// explicitly vectorize a function. That has a better developer experience
    /// than implementing `WithSimd`, and is less likely to miss a vectorization
    /// opportunity.
    ///
    /// This method has two use cases:
    /// 1) To call a manually written implementation of [`WithSimd`].
    /// 2) To ask the compiler to auto-vectorize scalar code.
    ///
    /// For the second case to work, the provided function *must* be attributed with `#[inline(always)]`.
    /// Note also that any calls that function makes to other functions will likely not be auto-vectorized,
    /// unless they are also `#[inline(always)]`.
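    ///
    /// As a minimal sketch of the second use case: `SumSquares` is a hypothetical type written
    /// for this example, and the `with_simd` signature is assumed from the way this method's
    /// body calls it.
    ///
    /// ```rust
    /// use fearless_simd::{Level, Simd, WithSimd};
    ///
    /// // Hypothetical wrapper around the input, since `WithSimd` is implemented on a type.
    /// struct SumSquares<'a>(&'a [f32]);
    ///
    /// impl WithSimd for SumSquares<'_> {
    ///     type Output = f32;
    ///
    ///     // Scalar code; `#[inline(always)]` lets it be compiled with the level's target features.
    ///     #[inline(always)]
    ///     fn with_simd<S: Simd>(self, _simd: S) -> Self::Output {
    ///         self.0.iter().map(|x| x * x).sum()
    ///     }
    /// }
    ///
    /// let total = Level::new().dispatch(SumSquares(&[1.0, 2.0, 3.0]));
    /// assert_eq!(total, 14.0);
    /// ```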
    ///
    /// [enabled]: https://doc.rust-lang.org/reference/attributes/codegen.html#the-target_feature-attribute
    #[inline]
    #[expect(
        unreachable_patterns,
        reason = "Level is `non_exhaustive`, but we are in the crate it's defined."
    )]
    pub fn dispatch<W: WithSimd>(self, f: W) -> W::Output {
        dispatch!(self, simd => f.with_simd(simd))
    }
}

#[cfg(test)]
mod tests {
    use crate::Level;

    const fn assert_is_send_sync<T: Send + Sync>() {}
    /// If this test compiles, we know that [`Level`] is properly `Send` and `Sync`.
    #[test]
    fn level_is_send_sync() {
        assert_is_send_sync::<Level>();
    }
}