#[repr(transparent)]
pub struct bf16(u16);
A 16-bit floating point type implementing the bfloat16 format.
The bfloat16 floating point format is a truncated 16-bit version of the IEEE 754 standard binary32, a.k.a. f32. bf16 has approximately the same dynamic range as f32 by having a lower precision than f16. While f16 has a precision of 11 bits, bf16 has a precision of only 8 bits.
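To make the layout concrete, here is a minimal sketch that uses only the standard library. It assumes that a bfloat16 bit pattern is simply the upper 16 bits of the corresponding binary32 pattern, as the "truncated binary32" description suggests. The helper name bf16_bits_to_f32 is hypothetical and not part of this type's API.

```rust
/// Hypothetical helper (not part of the bf16 API): widen a bfloat16 bit
/// pattern to f32 by placing it in the upper 16 bits of a binary32 pattern.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    // 0x3F80 is 1.0; 0x3F81 is the next representable bfloat16 value.
    let one = bf16_bits_to_f32(0x3F80);
    let next = bf16_bits_to_f32(0x3F81);
    // 7 stored mantissa bits plus the implicit leading bit give 8 bits of
    // precision, so the spacing just above 1.0 is 2^-7.
    assert_eq!(one, 1.0);
    assert_eq!(next - one, 0.0078125);

    // The 8-bit exponent is shared with f32, so the largest finite pattern
    // (0x7F7F) is within 1% of f32::MAX.
    let max = bf16_bits_to_f32(0x7F7F);
    assert!(max.is_finite() && max > 0.99 * f32::MAX);
    println!("{} {} {:e}", one, next, max);
}
```

The spacing of 2^-7 just above 1.0 is the cost of keeping the full f32 exponent range in 16 bits.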
Tuple Fields§
§0: u16
Implementations§
impl bf16
pub fn from_f32(value: f32) -> bf16
Constructs a bf16 value from a 32-bit floating point value.
This operation is lossy. If the 32-bit value is too large to fit, ±∞ will result. NaN values are preserved. Subnormal values that are too tiny to be represented will result in ±0. All other values are truncated and rounded to the nearest representable value.
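The rules above can be sketched on raw bit patterns with the standard library alone. This is an illustration under stated assumptions, not the implementation behind from_f32: the function name f32_to_bf16_bits is hypothetical, and round-to-nearest with ties to even is assumed as the rounding mode.

```rust
/// Hypothetical sketch (not the bf16 API): narrow an f32 to a bfloat16 bit
/// pattern, assuming round-to-nearest with ties to even.
fn f32_to_bf16_bits(value: f32) -> u16 {
    let x = value.to_bits();

    // NaN: keep the upper 16 bits and force one mantissa bit on, so the
    // result is still a NaN even if all surviving mantissa bits were zero.
    if x & 0x7FFF_FFFF > 0x7F80_0000 {
        return ((x >> 16) | 0x0040) as u16;
    }

    // Bit 15 is the first discarded bit. Round up when it is set and either
    // a lower discarded bit is set (above the halfway point) or bit 16 is
    // set (exactly halfway, and the kept value is odd).
    let round_bit = 0x0000_8000u32;
    if (x & round_bit) != 0 && (x & (3 * round_bit - 1)) != 0 {
        (x >> 16) as u16 + 1
    } else {
        (x >> 16) as u16
    }
}

fn main() {
    // 12.5 is exactly representable.
    assert_eq!(f32_to_bf16_bits(12.5), 0x4148);
    // f32::MAX is too large to fit and rounds up to +infinity (0x7F80).
    assert_eq!(f32_to_bf16_bits(f32::MAX), 0x7F80);
    // The smallest f32 subnormal is too tiny and becomes +0.
    assert_eq!(f32_to_bf16_bits(f32::from_bits(1)), 0x0000);
    // NaN stays NaN: exponent all ones, non-zero mantissa.
    let nan = f32_to_bf16_bits(f32::NAN);
    assert!(nan & 0x7F80 == 0x7F80 && nan & 0x007F != 0);
    println!("12.5 -> {:#06x}", f32_to_bf16_bits(12.5));
}
```

Rounding can carry into the exponent field, which is how a value just below the f32 maximum ends up as ±∞ without a separate overflow check.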
pub const fn from_f32_const(value: f32) -> bf16
Constructs a bf16 value from a 32-bit floating point value.
This function is identical to from_f32 except it never uses hardware intrinsics, which allows it to be const. from_f32 should be preferred in any non-const context.
This operation is lossy. If the 32-bit value is too large to fit, ±∞ will result. NaN values are preserved. Subnormal values that are too tiny to be represented will result in ±0. All other values are truncated and rounded to the nearest representable value.
pub fn from_f64(value: f64) -> bf16
Constructs a bf16 value from a 64-bit floating point value.
This operation is lossy. If the 64-bit value is too large to fit, ±∞ will result. NaN values are preserved. 64-bit subnormal values are too tiny to be represented and result in ±0. Exponents that underflow the minimum exponent will result in subnormals or ±0. All other values are truncated and rounded to the nearest representable value.
pub const fn from_f64_const(value: f64) -> bf16
Constructs a bf16 value from a 64-bit floating point value.
This function is identical to from_f64 except it never uses hardware intrinsics, which allows it to be const. from_f64 should be preferred in any non-const context.
This operation is lossy. If the 64-bit value is too large to fit, ±∞ will result. NaN values are preserved. 64-bit subnormal values are too tiny to be represented and result in ±0. Exponents that underflow the minimum exponent will result in subnormals or ±0. All other values are truncated and rounded to the nearest representable value.
pub const fn to_le_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in little-endian byte order.
§Examples
let bytes = bf16::from_f32(12.5).to_le_bytes();
assert_eq!(bytes, [0x48, 0x41]);
pub const fn to_be_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in big-endian (network) byte order.
§Examples
let bytes = bf16::from_f32(12.5).to_be_bytes();
assert_eq!(bytes, [0x41, 0x48]);
pub const fn to_ne_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in native byte order.
As the target platform’s native endianness is used, portable code should use to_be_bytes or to_le_bytes, as appropriate, instead.
§Examples
let bytes = bf16::from_f32(12.5).to_ne_bytes();
assert_eq!(bytes, if cfg!(target_endian = "big") {
    [0x41, 0x48]
} else {
    [0x48, 0x41]
});
pub const fn from_le_bytes(bytes: [u8; 2]) -> bf16
Creates a floating point value from its representation as a byte array in little endian.
§Examples
let value = bf16::from_le_bytes([0x48, 0x41]);
assert_eq!(value, bf16::from_f32(12.5));
pub const fn from_be_bytes(bytes: [u8; 2]) -> bf16
Creates a floating point value from its representation as a byte array in big endian.
§Examples
let value = bf16::from_be_bytes([0x41, 0x48]);
assert_eq!(value, bf16::from_f32(12.5));
pub const fn from_ne_bytes(bytes: [u8; 2]) -> bf16
Creates a floating point value from its representation as a byte array in native endian.
As the target platform’s native endianness is used, portable code likely wants to use from_be_bytes or from_le_bytes, as appropriate, instead.
§Examples
let value = bf16::from_ne_bytes(if cfg!(target_endian = "big") {
    [0x41, 0x48]
} else {
    [0x48, 0x41]
});
assert_eq!(value, bf16::from_f32(12.5));
pub const fn to_f32_const(self) -> f32
pub const fn to_f64_const(self) -> f64
pub const fn is_nan(self) -> bool
Returns true if this value is NaN and false otherwise.
§Examples
let nan = bf16::NAN;
let f = bf16::from_f32(7.0_f32);
assert!(nan.is_nan());
assert!(!f.is_nan());
pub const fn is_infinite(self) -> bool
Returns true if this value is ±∞ and false otherwise.
§Examples
let f = bf16::from_f32(7.0f32);
let inf = bf16::INFINITY;
let neg_inf = bf16::NEG_INFINITY;
let nan = bf16::NAN;
assert!(!f.is_infinite());
assert!(!nan.is_infinite());
assert!(inf.is_infinite());
assert!(neg_inf.is_infinite());
pub const fn is_finite(self) -> bool
Returns true if this number is neither infinite nor NaN.
§Examples
let f = bf16::from_f32(7.0f32);
let inf = bf16::INFINITY;
let neg_inf = bf16::NEG_INFINITY;
let nan = bf16::NAN;
assert!(f.is_finite());
assert!(!nan.is_finite());
assert!(!inf.is_finite());
assert!(!neg_inf.is_finite());
pub const fn is_normal(self) -> bool
Returns true if the number is neither zero, infinite, subnormal, nor NaN.
§Examples
let min = bf16::MIN_POSITIVE;
let max = bf16::MAX;
let lower_than_min = bf16::from_f32(1.0e-39_f32);
let zero = bf16::from_f32(0.0_f32);
assert!(min.is_normal());
assert!(max.is_normal());
assert!(!zero.is_normal());
assert!(!bf16::NAN.is_normal());
assert!(!bf16::INFINITY.is_normal());
// Values between 0 and `min` are subnormal.
assert!(!lower_than_min.is_normal());
pub const fn classify(self) -> FpCategory
Returns the floating point category of the number.
If only one property is going to be tested, it is generally faster to use the specific predicate instead.
§Examples
use std::num::FpCategory;
let num = bf16::from_f32(12.4_f32);
let inf = bf16::INFINITY;
assert_eq!(num.classify(), FpCategory::Normal);
assert_eq!(inf.classify(), FpCategory::Infinite);
pub const fn signum(self) -> bf16
Returns a number that represents the sign of self.
- 1.0 if the number is positive, +0.0 or INFINITY
- −1.0 if the number is negative, −0.0 or NEG_INFINITY
- NAN if the number is NaN
§Examples
let f = bf16::from_f32(3.5_f32);
assert_eq!(f.signum(), bf16::from_f32(1.0));
assert_eq!(bf16::NEG_INFINITY.signum(), bf16::from_f32(-1.0));
assert!(bf16::NAN.signum().is_nan());
pub const fn is_sign_positive(self) -> bool
Returns true if and only if self has a positive sign, including +0.0, NaNs with a positive sign bit and +∞.
§Examples
let nan = bf16::NAN;
let f = bf16::from_f32(7.0_f32);
let g = bf16::from_f32(-7.0_f32);
assert!(f.is_sign_positive());
assert!(!g.is_sign_positive());
// NaN can be either positive or negative
assert!(nan.is_sign_positive() != nan.is_sign_negative());
pub const fn is_sign_negative(self) -> bool
Returns true if and only if self has a negative sign, including −0.0, NaNs with a negative sign bit and −∞.
§Examples
let nan = bf16::NAN;
let f = bf16::from_f32(7.0f32);
let g = bf16::from_f32(-7.0f32);
assert!(!f.is_sign_negative());
assert!(g.is_sign_negative());
// NaN can be either positive or negative
assert!(nan.is_sign_positive() != nan.is_sign_negative());
pub const fn copysign(self, sign: bf16) -> bf16
Returns a number composed of the magnitude of self and the sign of sign.
Equal to self if the sign of self and sign are the same, otherwise equal to -self.
If self is NaN, then NaN with the sign of sign is returned.
§Examples
let f = bf16::from_f32(3.5);
assert_eq!(f.copysign(bf16::from_f32(0.42)), bf16::from_f32(3.5));
assert_eq!(f.copysign(bf16::from_f32(-0.42)), bf16::from_f32(-3.5));
assert_eq!((-f).copysign(bf16::from_f32(0.42)), bf16::from_f32(3.5));
assert_eq!((-f).copysign(bf16::from_f32(-0.42)), bf16::from_f32(-3.5));
assert!(bf16::NAN.copysign(bf16::from_f32(1.0)).is_nan());
pub fn max(self, other: bf16) -> bf16
Returns the maximum of the two numbers.
If one of the arguments is NaN, then the other argument is returned.
§Examples
let x = bf16::from_f32(1.0);
let y = bf16::from_f32(2.0);
assert_eq!(x.max(y), y);
pub fn min(self, other: bf16) -> bf16
Returns the minimum of the two numbers.
If one of the arguments is NaN, then the other argument is returned.
§Examples
let x = bf16::from_f32(1.0);
let y = bf16::from_f32(2.0);
assert_eq!(x.min(y), x);
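The NaN rule stated for max and min is not exercised by the examples above. The sketch below spells it out with f32 as a stand-in, so it runs with the standard library alone. The helper names max_ignoring_nan and min_ignoring_nan are hypothetical, and the assumption is that bf16 follows exactly the rule as written.

```rust
/// Hypothetical helper: maximum of two values where a NaN operand is
/// ignored in favour of the other operand.
fn max_ignoring_nan(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        b
    } else if b.is_nan() {
        a
    } else if a > b {
        a
    } else {
        b
    }
}

/// Hypothetical helper: minimum with the same NaN rule.
fn min_ignoring_nan(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        b
    } else if b.is_nan() {
        a
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    // A single NaN operand never wins.
    assert_eq!(max_ignoring_nan(f32::NAN, 2.0), 2.0);
    assert_eq!(min_ignoring_nan(1.0, f32::NAN), 1.0);
    // With two NaN operands, "the other argument" is itself NaN.
    assert!(max_ignoring_nan(f32::NAN, f32::NAN).is_nan());
    println!("ok");
}
```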
pub fn clamp(self, min: bf16, max: bf16) -> bf16
Restrict a value to a certain interval unless it is NaN.
Returns max if self is greater than max, and min if self is less than min. Otherwise this returns self.
Note that this function returns NaN if the initial value was NaN as well.
§Panics
Panics if min > max, min is NaN, or max is NaN.
§Examples
assert!(bf16::from_f32(-3.0).clamp(bf16::from_f32(-2.0), bf16::from_f32(1.0)) == bf16::from_f32(-2.0));
assert!(bf16::from_f32(0.0).clamp(bf16::from_f32(-2.0), bf16::from_f32(1.0)) == bf16::from_f32(0.0));
assert!(bf16::from_f32(2.0).clamp(bf16::from_f32(-2.0), bf16::from_f32(1.0)) == bf16::from_f32(1.0));
assert!(bf16::NAN.clamp(bf16::from_f32(-2.0), bf16::from_f32(1.0)).is_nan());
pub fn total_cmp(&self, other: &Self) -> Ordering
Returns the ordering between self and other.
Unlike the standard partial comparison between floating point numbers, this comparison always produces an ordering in accordance with the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard. The values are ordered in the following sequence:
- negative quiet NaN
- negative signaling NaN
- negative infinity
- negative numbers
- negative subnormal numbers
- negative zero
- positive zero
- positive subnormal numbers
- positive numbers
- positive infinity
- positive signaling NaN
- positive quiet NaN.
The ordering established by this function does not always agree with the PartialOrd and PartialEq implementations of bf16. For example, they consider negative and positive zero equal, while total_cmp doesn’t.
The interpretation of the signaling NaN bit follows the definition in the IEEE 754 standard, which may not match the interpretation by some of the older, non-conformant (e.g. MIPS) hardware implementations.
§Examples
let mut v: Vec<bf16> = vec![];
v.push(bf16::ONE);
v.push(bf16::INFINITY);
v.push(bf16::NEG_INFINITY);
v.push(bf16::NAN);
v.push(bf16::MAX_SUBNORMAL);
v.push(-bf16::MAX_SUBNORMAL);
v.push(bf16::ZERO);
v.push(bf16::NEG_ZERO);
v.push(bf16::NEG_ONE);
v.push(bf16::MIN_POSITIVE);
v.sort_by(|a, b| a.total_cmp(&b));
assert!(v
    .into_iter()
    .zip(
        [
            bf16::NEG_INFINITY,
            bf16::NEG_ONE,
            -bf16::MAX_SUBNORMAL,
            bf16::NEG_ZERO,
            bf16::ZERO,
            bf16::MAX_SUBNORMAL,
            bf16::MIN_POSITIVE,
            bf16::ONE,
            bf16::INFINITY,
            bf16::NAN
        ]
        .iter()
    )
    .all(|(a, b)| a.to_bits() == b.to_bits()));
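One common way to obtain the sequence listed for total_cmp is to compare the raw bit patterns as signed integers after flipping the non-sign bits of negative values. The sketch below applies that technique to 16-bit patterns using only the standard library. The function name total_cmp_bits is hypothetical, and whether total_cmp is implemented this way is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: totalOrder-style comparison of two 16-bit float
/// bit patterns (sign bit, then exponent, then mantissa).
fn total_cmp_bits(a: u16, b: u16) -> Ordering {
    let mut left = a as i16;
    let mut right = b as i16;
    // For a negative value the arithmetic shift yields all ones, which
    // becomes the mask 0x7FFF: every bit except the sign is flipped, so a
    // larger magnitude compares as a smaller signed integer. For a
    // non-negative value the mask is 0 and nothing changes.
    left ^= (((left >> 15) as u16) >> 1) as i16;
    right ^= (((right >> 15) as u16) >> 1) as i16;
    left.cmp(&right)
}

fn main() {
    // Bit patterns for: NaN, +inf, 1.0, +0.0, -0.0, -1.0, -inf.
    let mut v: Vec<u16> = vec![0x7FC0, 0x7F80, 0x3F80, 0x0000, 0x8000, 0xBF80, 0xFF80];
    v.sort_by(|a, b| total_cmp_bits(*a, *b));
    // Sorted: -inf, -1.0, -0.0, +0.0, 1.0, +inf, NaN.
    assert_eq!(v, [0xFF80, 0xBF80, 0x8000, 0x0000, 0x3F80, 0x7F80, 0x7FC0]);
    println!("{:04x?}", v);
}
```

Because the comparison works purely on integers, −0.0 and +0.0 are distinguished and every NaN has a definite position.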
pub const EPSILON: bf16 = _
bf16 machine epsilon value
This is the difference between 1.0 and the next largest representable number.
pub const MANTISSA_DIGITS: u32 = 8u32
Number of bf16 significant digits in base 2
pub const MAX_10_EXP: i32 = 38i32
Maximum possible bf16 power of 10 exponent
pub const MIN_10_EXP: i32 = -37i32
Minimum possible normal bf16 power of 10 exponent
pub const MIN_EXP: i32 = -125i32
One greater than the minimum possible normal bf16 power of 2 exponent
pub const MIN_POSITIVE: bf16 = _
Smallest positive normal bf16 value
pub const NEG_INFINITY: bf16 = _
bf16 negative infinity (-∞).
pub const MIN_POSITIVE_SUBNORMAL: bf16 = _
Minimum positive subnormal bf16 value
pub const MAX_SUBNORMAL: bf16 = _
Maximum subnormal bf16 value
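The relationship between MIN_POSITIVE, MAX_SUBNORMAL and MIN_POSITIVE_SUBNORMAL can be illustrated on raw bit patterns. This sketch assumes the standard bfloat16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits), under which the three values correspond to the patterns 0x0080, 0x007F and 0x0001. The helper bf16_bits_to_f32 is hypothetical, and the constants themselves are not used.

```rust
use std::num::FpCategory;

/// Hypothetical helper (not part of the bf16 API): widen a bfloat16 bit
/// pattern to f32 by placing it in the upper 16 bits of a binary32 pattern.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    // Exponent field 1, mantissa 0: smallest positive normal value.
    let min_positive = bf16_bits_to_f32(0x0080);
    // Exponent field 0, mantissa all ones: largest subnormal value.
    let max_subnormal = bf16_bits_to_f32(0x007F);
    // Exponent field 0, mantissa 1: smallest positive subnormal value.
    let min_subnormal = bf16_bits_to_f32(0x0001);

    assert_eq!(min_positive.classify(), FpCategory::Normal);
    assert_eq!(max_subnormal.classify(), FpCategory::Subnormal);
    assert_eq!(min_subnormal.classify(), FpCategory::Subnormal);
    assert!(min_subnormal < max_subnormal && max_subnormal < min_positive);

    // The exponent range is shared with f32, so the smallest normal values
    // of the two formats coincide.
    assert_eq!(min_positive, f32::MIN_POSITIVE);
    println!("{:e} {:e} {:e}", min_subnormal, max_subnormal, min_positive);
}
```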
pub const FRAC_1_SQRT_2: bf16 = _
bf16 1/√2
pub const FRAC_2_SQRT_PI: bf16 = _
bf16 2/√π
Trait Implementations§
impl AddAssign<&bf16> for bf16
fn add_assign(&mut self, rhs: &bf16)
Performs the += operation.
impl AddAssign for bf16
fn add_assign(&mut self, rhs: Self)
Performs the += operation.
impl DivAssign<&bf16> for bf16
fn div_assign(&mut self, rhs: &bf16)
Performs the /= operation.
impl DivAssign for bf16
fn div_assign(&mut self, rhs: Self)
Performs the /= operation.
impl MulAssign<&bf16> for bf16
fn mul_assign(&mut self, rhs: &bf16)
Performs the *= operation.
impl MulAssign for bf16
fn mul_assign(&mut self, rhs: Self)
Performs the *= operation.
impl PartialEq for bf16
impl PartialOrd for bf16
impl RemAssign<&bf16> for bf16
fn rem_assign(&mut self, rhs: &bf16)
Performs the %= operation.
impl RemAssign for bf16
fn rem_assign(&mut self, rhs: Self)
Performs the %= operation.
impl SubAssign<&bf16> for bf16
fn sub_assign(&mut self, rhs: &bf16)
Performs the -= operation.
impl SubAssign for bf16
fn sub_assign(&mut self, rhs: Self)
Performs the -= operation.
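Each compound-assignment trait above is implemented twice, once for a by-value right-hand side and once for a reference. The sketch below shows that pattern on a hypothetical stand-in type, Bf16Sketch, using only the standard library. Performing the arithmetic by widening to f32 is an assumption, and the truncating narrowing step is a simplification for brevity rather than the rounding conversion described for from_f32.

```rust
use std::ops::AddAssign;

/// Hypothetical stand-in for bf16, storing the raw 16-bit pattern.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bf16Sketch(u16);

impl Bf16Sketch {
    /// Widen losslessly by placing the pattern in the upper half of an f32.
    fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }

    /// Narrow by truncation (simplification: no rounding is performed).
    fn from_f32_truncating(value: f32) -> Self {
        Bf16Sketch((value.to_bits() >> 16) as u16)
    }
}

impl AddAssign for Bf16Sketch {
    fn add_assign(&mut self, rhs: Self) {
        // Assumed strategy: compute in f32, then narrow the result.
        *self = Bf16Sketch::from_f32_truncating(self.to_f32() + rhs.to_f32());
    }
}

impl AddAssign<&Bf16Sketch> for Bf16Sketch {
    fn add_assign(&mut self, rhs: &Bf16Sketch) {
        // Delegate to the by-value implementation; the type is Copy.
        *self += *rhs;
    }
}

fn main() {
    let mut x = Bf16Sketch::from_f32_truncating(1.5);
    let y = Bf16Sketch::from_f32_truncating(2.25);
    x += y; // by-value right-hand side
    x += &y; // by-reference right-hand side
    assert_eq!(x.to_f32(), 6.0);
    println!("{}", x.to_f32());
}
```

Providing the reference form lets callers write x += &y inside iterator chains over borrowed values without an explicit dereference.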