Crate packed_simd

Portable packed SIMD vectors

This crate is proposed for stabilization as std::packed_simd in RFC2366: std::simd.

The examples available in the examples/ sub-directory of the crate showcase how to use the library in practice.

Introduction

This crate exports Simd<[T; N]>: a packed vector of N elements of type T as well as many type aliases for this type: for example, f32x4, which is just an alias for Simd<[f32; 4]>.

The operations on packed vectors are, by default, "vertical": they are applied to each vector lane in isolation from the others:

use packed_simd::*;

let a = i32x4::new(1, 2, 3, 4);
let b = i32x4::new(5, 6, 7, 8);
assert_eq!(a + b, i32x4::new(6, 8, 10, 12));

Many "horizontal" operations are also provided:

assert_eq!(a.wrapping_sum(), 10);
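
To make the distinction concrete, here is a minimal plain-Rust sketch that models a 4-lane vector as an array. It does not use this crate: Lanes4, vertical_add and horizontal_sum are hypothetical names chosen for illustration, and the model uses wrapping arithmetic only so that it never panics on overflow.

```rust
/// Scalar stand-in for a 4-lane vector such as `i32x4` (not the crate's type).
type Lanes4 = [i32; 4];

/// Vertical operation: output lane `i` depends only on lane `i` of the inputs.
fn vertical_add(a: Lanes4, b: Lanes4) -> Lanes4 {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// Horizontal operation: combines all lanes of one vector into a single scalar.
fn horizontal_sum(a: Lanes4) -> i32 {
    a.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

In this model, vertical_add performs four independent additions, whereas horizontal_sum has to combine values across lanes, which is the kind of cross-lane dependency that real hardware tends to handle less efficiently.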

On virtually all architectures, vertical operations are fast, while horizontal operations are, by comparison, much slower. Consequently, the most portably efficient way to perform a reduction over a slice is to accumulate the results into a vector using vertical operations and to perform a single horizontal operation at the end:

fn reduce(x: &[i32]) -> i32 {
    assert!(x.len() % 4 == 0);
    let mut sum = i32x4::splat(0); // [0, 0, 0, 0]
    for i in (0..x.len()).step_by(4) {
        sum += i32x4::from_slice_unaligned(&x[i..]);
    }
    sum.wrapping_sum()
}

let x = [0, 1, 2, 3, 4, 5, 6, 7];
assert_eq!(reduce(&x), 28);
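
The reduce function above requires the slice length to be a multiple of 4. One possible way to lift that restriction is sketched below in plain Rust, under the assumption that a [i32; 4] accumulator is an acceptable stand-in for i32x4. The function name reduce_any_len is hypothetical and the sketch does not call into this crate: full chunks are accumulated lane by lane, and the leftover elements are added as scalars after the single horizontal step.

```rust
/// Sums a slice of any length with wrapping arithmetic.
/// The `[i32; 4]` accumulator is a scalar model of `i32x4`, not the crate's type.
fn reduce_any_len(x: &[i32]) -> i32 {
    let mut acc = [0i32; 4];

    // Split the slice into full 4-element chunks plus a tail of 0..=3 elements.
    let chunks = x.chunks_exact(4);
    let tail = chunks.remainder();

    // Vertical accumulation: lane `i` only ever sees element `i` of each chunk.
    for chunk in chunks {
        for lane in 0..4 {
            acc[lane] = acc[lane].wrapping_add(chunk[lane]);
        }
    }

    // A single horizontal reduction at the end.
    let mut sum = acc.iter().fold(0i32, |s, &v| s.wrapping_add(v));

    // Scalar handling of the elements that did not fill a whole chunk.
    for &v in tail {
        sum = sum.wrapping_add(v);
    }
    sum
}
```

For the slice [1, 2, 3, 4, 5], the first four elements go through the lane accumulator and the trailing 5 is added as a scalar, giving 15.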

Vector types

The vector type aliases are named according to the following scheme:

{element_type}x{number_of_lanes} == Simd<[element_type; number_of_lanes]>

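where the supported element types are the signed integers (i8, i16, i32, i64, i128, isize), the unsigned integers (u8, u16, u32, u64, u128, usize), the floating-point types (f32, f64), and the masks (m8, m16, m32, m64, m128, msize). Vectors of *const T and *mut T pointers are also available under the aliases cptrx{number_of_lanes} and mptrx{number_of_lanes}.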

Basic operations

// Sets all elements to `0`:
let a = i32x4::splat(0);

// Reads a vector from a slice:
let mut arr = [0, 0, 0, 1, 2, 3, 4, 5];
let b = i32x4::from_slice_unaligned(&arr);

// Reads the 4th element (index 3) of a vector:
assert_eq!(b.extract(3), 1);

// Returns a new vector where the 4th element (index 3) is replaced with `1`:
let a = a.replace(3, 1);
assert_eq!(a, b);

// Writes a vector to a slice:
let a = a.replace(2, 1);
a.write_to_slice_unaligned(&mut arr[4..]);
assert_eq!(arr, [0, 0, 0, 1, 0, 0, 1, 1]);
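
The semantics of these operations can be modelled in plain Rust, as in the sketch below. It is only an illustration: Vec4 and the free functions are hypothetical stand-ins for i32x4 and its methods, not the crate's API. The point it makes is that replace returns a new value and leaves the original untouched, while reading from and writing to a slice touch only the first four elements.

```rust
/// Scalar stand-in for `i32x4` (not the crate's type).
type Vec4 = [i32; 4];

/// Model of reading a vector from the first 4 elements of a slice.
/// Panics, via indexing, if the slice has fewer than 4 elements.
fn from_slice(s: &[i32]) -> Vec4 {
    [s[0], s[1], s[2], s[3]]
}

/// Model of `extract`: reads the lane at `index`.
fn extract(v: Vec4, index: usize) -> i32 {
    v[index]
}

/// Model of `replace`: returns a copy of `v` with one lane changed.
fn replace(v: Vec4, index: usize, value: i32) -> Vec4 {
    let mut out = v;
    out[index] = value;
    out
}

/// Model of writing a vector to the first 4 elements of a slice.
fn write_to_slice(v: Vec4, s: &mut [i32]) {
    s[..4].copy_from_slice(&v);
}
```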

Conditional operations

One often needs to perform an operation on only some lanes of a vector. Vector masks, like m32x4, select the lanes on which an operation is performed:

let a = i32x4::new(1, 1, 2, 2);

// Add `1` to the first two lanes of the vector.
let m = m16x4::new(true, true, false, false);
let a = m.select(a + 1, a);
assert_eq!(a, i32x4::splat(2));

The elements of a vector mask are either true or false. Here true means that a lane is "selected", while false means that a lane is not selected.

All vector masks implement a mask.select(a: T, b: T) -> T method that works on all vectors that have the same number of lanes as the mask. The resulting vector contains the elements of a for those lanes for which the mask is true, and the elements of b otherwise.

The example constructs a mask with the first two lanes set to true and the last two lanes set to false. This selects the first two lanes of a + 1 and the last two lanes of a, producing a vector where the first two lanes have been incremented by 1.

Note: mask select can be used on any vector type that has the same number of lanes as the mask. The example shows this by using m16x4 instead of m32x4. It is typically more performant to use a mask element width equal to the element width of the vectors being operated upon. This is, however, not true for 512-bit wide vectors when targeting AVX-512, where the most efficient masks use only 1 bit per element.
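
The per-lane rule for select can be written out as a scalar model. The sketch below is plain Rust and does not use this crate; select4 is a hypothetical name, with [bool; 4] standing in for a 4-lane mask and [i32; 4] for i32x4.

```rust
/// Scalar model of `mask.select(a, b)` for 4 lanes:
/// takes lane `i` from `a` where the mask is `true`, and from `b` otherwise.
fn select4(mask: [bool; 4], a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = b;
    for i in 0..4 {
        if mask[i] {
            out[i] = a[i];
        }
    }
    out
}
```

Note that both arguments are fully computed before the selection takes place: in the example above, a + 1 is evaluated for all four lanes, and the mask only decides which lanes of that result are kept.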

All vertical comparison operations return masks:

let a = i32x4::new(1, 1, 3, 3);
let b = i32x4::new(2, 2, 0, 0);

// ge: >= (Greater or Equal; see also lt, le, gt, eq, ne).
let m = a.ge(i32x4::splat(2));

if m.any() {
    // all / any / none allow coherent control flow
    let d = m.select(a, b);
    assert_eq!(d, i32x4::new(2, 2, 3, 3));
}
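
As a rough scalar model of what happens here, the sketch below compares two 4-lane arrays lane by lane and then reduces the resulting mask to a single bool. It is plain Rust that does not use this crate; ge4, any4, all4 and none4 are hypothetical names mirroring the ge, any, all and none methods mentioned above.

```rust
/// Scalar model of a vertical `>=` comparison: one `bool` per lane.
fn ge4(a: [i32; 4], b: [i32; 4]) -> [bool; 4] {
    [a[0] >= b[0], a[1] >= b[1], a[2] >= b[2], a[3] >= b[3]]
}

/// `true` if at least one lane of the mask is selected.
fn any4(m: [bool; 4]) -> bool {
    m.iter().any(|&lane| lane)
}

/// `true` if every lane of the mask is selected.
fn all4(m: [bool; 4]) -> bool {
    m.iter().all(|&lane| lane)
}

/// `true` if no lane of the mask is selected.
fn none4(m: [bool; 4]) -> bool {
    !any4(m)
}
```

With a = [1, 1, 3, 3] compared against [2, 2, 2, 2], the mask is [false, false, true, true], so any4 is true while all4 and none4 are both false.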

Conversions

Macros

shuffle

Shuffles vector elements.

Structs

LexicographicallyOrdered

Wrapper over T implementing a lexicographical order via the PartialOrd and/or Ord traits.

Simd

Packed SIMD vector type.

m8

8-bit wide mask.

m16

16-bit wide mask.

m32

32-bit wide mask.

m64

64-bit wide mask.

m128

128-bit wide mask.

msize

isize-wide mask.

Traits

Cast

Numeric cast from Self to T.

FromBits

Safe lossless bitwise conversion from T to Self.

FromCast

Numeric cast from T to Self.

IntoBits

Safe lossless bitwise conversion from Self to T.

Mask

This trait is implemented by all mask types.

SimdArray

Trait implemented by arrays that can be SIMD types.

SimdVector

This trait is implemented by all SIMD vector types.

Type Definitions

cptrx2

A vector with 2 *const T lanes.

cptrx4

A vector with 4 *const T lanes.

cptrx8

A vector with 8 *const T lanes.

f32x2

A 64-bit vector with 2 f32 lanes.

f32x4

A 128-bit vector with 4 f32 lanes.

f32x8

A 256-bit vector with 8 f32 lanes.

f32x16

A 512-bit vector with 16 f32 lanes.

f64x2

A 128-bit vector with 2 f64 lanes.

f64x4

A 256-bit vector with 4 f64 lanes.

f64x8

A 512-bit vector with 8 f64 lanes.

i128x1

A 128-bit vector with 1 i128 lane.

i128x2

A 256-bit vector with 2 i128 lanes.

i128x4

A 512-bit vector with 4 i128 lanes.

i16x2

A 32-bit vector with 2 i16 lanes.

i16x4

A 64-bit vector with 4 i16 lanes.

i16x8

A 128-bit vector with 8 i16 lanes.

i16x16

A 256-bit vector with 16 i16 lanes.

i16x32

A 512-bit vector with 32 i16 lanes.

i32x2

A 64-bit vector with 2 i32 lanes.

i32x4

A 128-bit vector with 4 i32 lanes.

i32x8

A 256-bit vector with 8 i32 lanes.

i32x16

A 512-bit vector with 16 i32 lanes.

i64x2

A 128-bit vector with 2 i64 lanes.

i64x4

A 256-bit vector with 4 i64 lanes.

i64x8

A 512-bit vector with 8 i64 lanes.

i8x2

A 16-bit vector with 2 i8 lanes.

i8x4

A 32-bit vector with 4 i8 lanes.

i8x8

A 64-bit vector with 8 i8 lanes.

i8x16

A 128-bit vector with 16 i8 lanes.

i8x32

A 256-bit vector with 32 i8 lanes.

i8x64

A 512-bit vector with 64 i8 lanes.

isizex2

A vector with 2 isize lanes.

isizex4

A vector with 4 isize lanes.

isizex8

A vector with 8 isize lanes.

m128x1

A 128-bit vector mask with 1 m128 lane.

m128x2

A 256-bit vector mask with 2 m128 lanes.

m128x4

A 512-bit vector mask with 4 m128 lanes.

m16x2

A 32-bit vector mask with 2 m16 lanes.

m16x4

A 64-bit vector mask with 4 m16 lanes.

m16x8

A 128-bit vector mask with 8 m16 lanes.

m16x16

A 256-bit vector mask with 16 m16 lanes.

m16x32

A 512-bit vector mask with 32 m16 lanes.

m32x2

A 64-bit vector mask with 2 m32 lanes.

m32x4

A 128-bit vector mask with 4 m32 lanes.

m32x8

A 256-bit vector mask with 8 m32 lanes.

m32x16

A 512-bit vector mask with 16 m32 lanes.

m64x2

A 128-bit vector mask with 2 m64 lanes.

m64x4

A 256-bit vector mask with 4 m64 lanes.

m64x8

A 512-bit vector mask with 8 m64 lanes.

m8x2

A 16-bit vector mask with 2 m8 lanes.

m8x4

A 32-bit vector mask with 4 m8 lanes.

m8x8

A 64-bit vector mask with 8 m8 lanes.

m8x16

A 128-bit vector mask with 16 m8 lanes.

m8x32

A 256-bit vector mask with 32 m8 lanes.

m8x64

A 512-bit vector mask with 64 m8 lanes.

mptrx2

A vector with 2 *mut T lanes.

mptrx4

A vector with 4 *mut T lanes.

mptrx8

A vector with 8 *mut T lanes.

msizex2

A vector mask with 2 msize lanes.

msizex4

A vector mask with 4 msize lanes.

msizex8

A vector mask with 8 msize lanes.

u128x1

A 128-bit vector with 1 u128 lane.

u128x2

A 256-bit vector with 2 u128 lanes.

u128x4

A 512-bit vector with 4 u128 lanes.

u16x2

A 32-bit vector with 2 u16 lanes.

u16x4

A 64-bit vector with 4 u16 lanes.

u16x8

A 128-bit vector with 8 u16 lanes.

u16x16

A 256-bit vector with 16 u16 lanes.

u16x32

A 512-bit vector with 32 u16 lanes.

u32x2

A 64-bit vector with 2 u32 lanes.

u32x4

A 128-bit vector with 4 u32 lanes.

u32x8

A 256-bit vector with 8 u32 lanes.

u32x16

A 512-bit vector with 16 u32 lanes.

u64x2

A 128-bit vector with 2 u64 lanes.

u64x4

A 256-bit vector with 4 u64 lanes.

u64x8

A 512-bit vector with 8 u64 lanes.

u8x2

A 16-bit vector with 2 u8 lanes.

u8x4

A 32-bit vector with 4 u8 lanes.

u8x8

A 64-bit vector with 8 u8 lanes.

u8x16

A 128-bit vector with 16 u8 lanes.

u8x32

A 256-bit vector with 32 u8 lanes.

u8x64

A 512-bit vector with 64 u8 lanes.

usizex2

A vector with 2 usize lanes.

usizex4

A vector with 4 usize lanes.

usizex8

A vector with 8 usize lanes.