A modern C++20 library for CPU feature detection and SIMD vector operations. This library provides compile-time and runtime detection of SIMD instruction sets (SSE, AVX, AVX-512, AMX) along with a high-level vector abstraction layer for writing portable SIMD code.
- Overview
- Features
- Requirements
- Installation
- Build Options
- Library Architecture
- Usage Examples
- Supported Features
- Integration
- Testing
- License
simd_feature_check solves two fundamental challenges in SIMD programming:
- Feature Detection: Determining which SIMD instruction sets are available on the target CPU, both at compile time and runtime
- Vector Abstraction: Writing portable SIMD code that automatically adapts to the best available instruction set
The library uses CPUID instructions on x86/x86_64 to detect processor capabilities and provides a clean API for querying supported features. The vector abstraction layer automatically selects the optimal implementation based on detected capabilities.
## Features

- Comprehensive detection of 50+ SIMD features including SSE, AVX, AVX-512, and AMX
- Compile-time detection via template metaprogramming
- Runtime detection using CPUID instructions
- Feature-to-string conversion for logging and debugging
- Highest feature detection for capability reporting
- Vector abstraction supporting multiple data types and sizes
- Automatic dispatch to optimal SIMD implementations
- Cross-platform support with architecture detection
- Modern C++20 with concepts for type safety
- Zero-overhead abstractions
## Requirements

- CMake 3.16 or newer
- C++20 compatible compiler (GCC 10+, Clang 10+, MSVC 2019+)
- Git
| Compiler | Minimum Version | Notes |
|---|---|---|
| GCC | 10.0 | Full support |
| Clang | 10.0 | Full support |
| MSVC | 2019 (16.8) | Full support |
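For a consumer project, these requirements translate into a few CMake lines (a sketch; the project name is a placeholder):

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_simd_app LANGUAGES CXX)

# Require C++20 without compiler-specific extensions
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
```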
## Installation

Clone the repository:
```bash
git clone https://github.com/hun756/CPP-Starter-Template.git simd_feature_check
cd simd_feature_check
```

Create a build directory and configure:

```bash
mkdir build && cd build
cmake ..
```

Build the library:

```bash
cmake --build .
```

Run tests:

```bash
ctest
```

## Build Options

The following CMake options are available:
| Option | Default | Description |
|---|---|---|
| BUILD_SHARED_LIBS | OFF | Build shared libraries instead of static |
| BUILD_EXAMPLES | ON | Build example programs |
| BUILD_TESTS | ON | Build and enable tests |
| BUILD_BENCHMARKS | OFF | Build benchmarking programs |
| ENABLE_COVERAGE | OFF | Enable code coverage reporting |
| ENABLE_SANITIZERS | OFF | Enable sanitizers in debug builds |
| ENABLE_PCH | OFF | Enable precompiled headers |
| ENABLE_LTO | OFF | Enable Link Time Optimization |
Example configuration with sanitizers enabled:
```bash
cmake .. -DENABLE_SANITIZERS=ON -DBUILD_EXAMPLES=ON
```

## Library Architecture

The library is organized into several namespaces and components:
The `simd` namespace contains feature detection functionality:
- `simd::Feature` - Enum class listing all detectable features
- `simd::has_feature()` - Runtime feature check
- `simd::compile_time::` - Compile-time feature detection
- `simd::runtime::` - Runtime feature detection
- `simd::FeatureDetector<T>` - Template class for feature introspection
The `vector_simd` namespace provides vector abstractions:
- `Vector<T, N>` - Fixed-size SIMD vector class
- `Mask<T, N>` - Mask type for predicate operations
- Type aliases like `float_v<4>`, `int32_v<8>`, etc.
- Native-width types like `float_vn` for optimal vector width
The library uses a layered architecture:
- `simd/common.hpp` - Architecture and compiler detection macros
- `simd/feature_check.hpp` - Core feature detection implementation
- `simd/registers/types.hpp` - Register type mappings for each ISA
- `simd/vector/vector.hpp` - High-level vector abstraction
- `simd/impl/` - Architecture-specific implementations
## Usage Examples

The simplest way to check for SIMD support:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Check for specific features
    if (simd::has_feature(simd::Feature::SSE)) {
        std::cout << "SSE is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX)) {
        std::cout << "AVX is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX2)) {
        std::cout << "AVX2 is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX512F)) {
        std::cout << "AVX-512 Foundation is supported" << std::endl;
    }

    // Get CPU vendor string
    std::cout << "CPU Vendor: " << simd::get_cpu_vendor() << std::endl;

    // Get the highest supported SIMD feature
    simd::Feature highest = simd::highest_feature();
    std::cout << "Highest SIMD feature: "
              << simd::feature_to_string(highest) << std::endl;
    return 0;
}
```

When you need to know features at compile time for conditional compilation:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Check compile-time availability using boolean constants
    std::cout << "SSE compile-time: "
              << (simd::compile_time::sse ? "Yes" : "No") << std::endl;
    std::cout << "AVX compile-time: "
              << (simd::compile_time::avx ? "Yes" : "No") << std::endl;
    std::cout << "AVX2 compile-time: "
              << (simd::compile_time::avx2 ? "Yes" : "No") << std::endl;

    // Template-based compile-time checks
    std::cout << "SSE2 available: "
              << (simd::compile_time::has<simd::Feature::SSE2>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX-512F available: "
              << (simd::compile_time::has<simd::Feature::AVX512F>() ? "Yes" : "No")
              << std::endl;

    // Get maximum compile-time feature
    std::cout << "Max compile-time feature: "
              << simd::feature_to_string(simd::compile_time::max_feature)
              << std::endl;
    return 0;
}
```

Using compile-time detection for conditional compilation:
```cpp
#include "simd/feature_check.hpp"
#include <cstddef>

// Implementations provided elsewhere in your project
void process_avx512(float* data, size_t size);
void process_avx2(float* data, size_t size);
void process_scalar(float* data, size_t size);

void process_data(float* data, size_t size)
{
    if constexpr (simd::compile_time::has<simd::Feature::AVX512F>()) {
        // This code is only compiled if AVX-512F is available;
        // the compiler can use AVX-512 intrinsics here
        process_avx512(data, size);
    }
    else if constexpr (simd::compile_time::has<simd::Feature::AVX2>()) {
        // Fallback to AVX2
        process_avx2(data, size);
    }
    else {
        // Scalar fallback
        process_scalar(data, size);
    }
}
```

For runtime checks that adapt to the executing CPU:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Runtime checks using template syntax
    std::cout << "SSE runtime: "
              << (simd::runtime::has<simd::Feature::SSE>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX runtime: "
              << (simd::runtime::has<simd::Feature::AVX>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX2 runtime: "
              << (simd::runtime::has<simd::Feature::AVX2>() ? "Yes" : "No")
              << std::endl;

    // Get highest runtime feature
    std::cout << "Highest runtime feature: "
              << simd::feature_to_string(simd::runtime::highest_feature())
              << std::endl;
    return 0;
}
```

The `FeatureDetector` template provides detailed introspection for any feature:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Create detectors for specific features
    using AVXDetector = simd::FeatureDetector<simd::Feature::AVX>;
    using AVX2Detector = simd::FeatureDetector<simd::Feature::AVX2>;
    using AVX512Detector = simd::FeatureDetector<simd::Feature::AVX512F>;

    // AVX feature information
    std::cout << "AVX Feature:" << std::endl;
    std::cout << "  Name: " << AVXDetector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVXDetector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVXDetector::available() ? "Yes" : "No") << std::endl;

    // AVX2 feature information
    std::cout << "AVX2 Feature:" << std::endl;
    std::cout << "  Name: " << AVX2Detector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVX2Detector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVX2Detector::available() ? "Yes" : "No") << std::endl;

    // AVX-512F feature information
    std::cout << "AVX-512F Feature:" << std::endl;
    std::cout << "  Name: " << AVX512Detector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVX512Detector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVX512Detector::available() ? "Yes" : "No") << std::endl;
    return 0;
}
```

Manual dispatch based on detected features:
```cpp
#include "simd/feature_check.hpp"
#include <cstddef>
#include <iostream>

// Scalar implementation (always available)
float* add_vectors_scalar(const float* a, const float* b, float* result,
                          size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}

// AVX implementation (conditionally compiled)
#if SIMD_HAS_AVX
float* add_vectors_avx(const float* a, const float* b, float* result,
                       size_t size)
{
    // AVX-optimized implementation (scalar loop shown as a placeholder)
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
#endif

// AVX-512 implementation (conditionally compiled)
#if SIMD_HAS_AVX512F
float* add_vectors_avx512(const float* a, const float* b, float* result,
                          size_t size)
{
    // AVX-512-optimized implementation (scalar loop shown as a placeholder)
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
#endif

int main()
{
    using AddFunc = float* (*)(const float*, const float*, float*, size_t);
    AddFunc best_impl;

    // Select best implementation at runtime
    if (simd::has_feature(simd::Feature::AVX512F)) {
#if SIMD_HAS_AVX512F
        best_impl = add_vectors_avx512;
#else
        best_impl = add_vectors_scalar;
#endif
    }
    else if (simd::has_feature(simd::Feature::AVX)) {
#if SIMD_HAS_AVX
        best_impl = add_vectors_avx;
#else
        best_impl = add_vectors_scalar;
#endif
    }
    else {
        best_impl = add_vectors_scalar;
    }

    // Use the selected implementation
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    float result[4];
    best_impl(a, b, result, 4);

    std::cout << "Result: [" << result[0] << ", " << result[1]
              << ", " << result[2] << ", " << result[3] << "]" << std::endl;
    return 0;
}
```

The library provides a high-level vector abstraction that automatically uses the best available SIMD instructions:
```cpp
#include "simd/simd.hpp"
#include <array>

int main()
{
    using namespace vector_simd;

    // Create vectors with fixed size
    float_v<4> a{1.0f, 2.0f, 3.0f, 4.0f};
    float_v<4> b{5.0f, 6.0f, 7.0f, 8.0f};

    // Basic arithmetic operations
    float_v<4> sum = a + b;   // Element-wise addition
    float_v<4> diff = a - b;  // Element-wise subtraction
    float_v<4> prod = a * b;  // Element-wise multiplication
    float_v<4> quot = b / a;  // Element-wise division

    // Compound assignment operators
    float_v<4> c{0.0f, 0.0f, 0.0f, 0.0f};
    c += a;
    c *= b;

    // Extract and insert elements
    float val = sum.extract(0);  // Get first element
    sum.insert(0, 10.0f);        // Set first element

    // Store results to memory
    alignas(16) float result[4];
    sum.store(result);          // Unaligned store
    sum.store_aligned(result);  // Aligned store (faster, requires aligned address)

    // Load from memory
    float_v<4> loaded = float_v<4>::load(result);
    float_v<4> aligned = float_v<4>::load_aligned(result);

    // Convert to std::array
    std::array<float, 4> arr = sum.to_array();
    return 0;
}
```

Masks enable conditional operations and predication:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 5.0f, 3.0f, 8.0f};
    float_v<4> b{4.0f, 2.0f, 6.0f, 7.0f};

    // Comparison operations return masks
    auto mask_eq = (a == b);  // Equal comparison
    auto mask_ne = (a != b);  // Not equal
    auto mask_lt = (a < b);   // Less than
    auto mask_le = (a <= b);  // Less than or equal
    auto mask_gt = (a > b);   // Greater than
    auto mask_ge = (a >= b);  // Greater than or equal

    // Select based on mask
    float_v<4> min_vals = float_v<4>::select(mask_lt, a, b);  // Element-wise min
    float_v<4> max_vals = float_v<4>::select(mask_gt, a, b);  // Element-wise max

    // Blend vectors based on mask
    float_v<4> blended = a.blend(b, mask_gt);  // Take from b where a > b
    return 0;
}
```

Comprehensive math operations with automatic SIMD optimization:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 4.0f, 9.0f, 16.0f};
    float_v<4> b{-2.5f, 3.7f, -1.2f, 0.0f};

    // Basic math functions
    float_v<4> abs_val = a.abs();    // Absolute value
    float_v<4> sqrt_val = a.sqrt();  // Square root

    // Trigonometric functions
    float_v<4> angles{0.0f, 1.5708f, 3.14159f, 4.71239f};
    float_v<4> sin_val = angles.sin();  // Sine
    float_v<4> cos_val = angles.cos();  // Cosine
    float_v<4> tan_val = angles.tan();  // Tangent

    // Exponential and logarithmic
    float_v<4> exp_val = a.exp();  // e^x
    float_v<4> log_val = a.log();  // Natural log

    // Rounding functions
    float_v<4> round_vals{1.2f, 3.7f, -2.3f, -4.8f};
    float_v<4> floor_val = round_vals.floor();  // Floor
    float_v<4> ceil_val = round_vals.ceil();    // Ceiling
    float_v<4> round_val = round_vals.round();  // Round
    float_v<4> trunc_val = round_vals.trunc();  // Truncate

    // Reciprocal and reciprocal square root (approximate, fast)
    float_v<4> rcp_val = a.rcp();      // 1/x
    float_v<4> rsqrt_val = a.rsqrt();  // 1/sqrt(x)

    // Fused multiply-add: a * b + c
    float_v<4> c{1.0f, 1.0f, 1.0f, 1.0f};
    float_v<4> fmadd_result = a.fmadd(b, c);

    // Fused multiply-subtract: a * b - c
    float_v<4> fmsub_result = a.fmsub(b, c);

    // Min and max
    float_v<4> min_val = a.min(b);
    float_v<4> max_val = a.max(b);

    // Clamp to range
    float_v<4> lo{0.0f, 0.0f, 0.0f, 0.0f};
    float_v<4> hi{10.0f, 10.0f, 10.0f, 10.0f};
    float_v<4> clamped = b.clamp(lo, hi);
    return 0;
}
```

Efficient memory access patterns:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    alignas(64) float data[16] = {
        1.0f,  2.0f,  3.0f,  4.0f,
        5.0f,  6.0f,  7.0f,  8.0f,
        9.0f,  10.0f, 11.0f, 12.0f,
        13.0f, 14.0f, 15.0f, 16.0f
    };

    // Aligned load (fastest, requires aligned address)
    float_v<4> v1 = float_v<4>::load_aligned(data);

    // Unaligned load (works with any address)
    float_v<4> v2 = float_v<4>::load_unaligned(data + 1);

    // Aligned store
    alignas(64) float result[4];
    v1.store_aligned(result);

    // Unaligned store
    float unaligned_result[4];
    v2.store_unaligned(unaligned_result);

    // Non-temporal store (bypasses cache, useful for write-once data)
    alignas(64) float nt_buffer[4];
    v1.store_nt(nt_buffer);

    // Gather: load non-contiguous elements
    int32_v<4> indices{0, 2, 4, 6};
    float_v<4> gathered = float_v<4>::gather(data, indices);

    // Scatter: store to non-contiguous locations
    float output[16] = {0};
    gathered.scatter(output, indices);

    // Prefetch data into cache
    float_v<4>::prefetch(data + 8);
    return 0;
}
```

Convert between different vector types:
```cpp
#include "simd/simd.hpp"
#include <cstdint>

int main()
{
    using namespace vector_simd;

    // Integer vectors
    int32_v<4> ints{1, 2, 3, 4};

    // Convert to float
    float_v<4> floats = ints.convert<float>();

    // Convert to double
    double_v<4> doubles = ints.convert<double>();

    // Different integer widths
    int16_v<8> shorts{1, 2, 3, 4, 5, 6, 7, 8};
    int32_v<8> expanded = shorts.convert<int32_t>();

    // Saturating arithmetic (prevents overflow)
    uint8_v<16> a{200, 200, 200, 200, 200, 200, 200, 200,
                  200, 200, 200, 200, 200, 200, 200, 200};
    uint8_v<16> b{100, 100, 100, 100, 100, 100, 100, 100,
                  100, 100, 100, 100, 100, 100, 100, 100};
    uint8_v<16> sat_add = a.add_sat(b);  // Saturates at 255
    uint8_v<16> sat_sub = b.sub_sat(a);  // Saturates at 0
    return 0;
}
```

Reduce vector elements to scalar values:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 2.0f, 3.0f, 4.0f};

    // Horizontal sum: 1 + 2 + 3 + 4 = 10
    float sum = a.hsum();

    // Horizontal min: min(1, 2, 3, 4) = 1
    float min_val = a.hmin();

    // Horizontal max: max(1, 2, 3, 4) = 4
    float max_val = a.hmax();

    // Dot product
    float_v<4> b{1.0f, 1.0f, 1.0f, 1.0f};
    float dot = a.dot(b);

    // Reduce operations (alternative names)
    float reduce_sum = a.reduce_add();
    float reduce_min = a.reduce_min();
    float reduce_max = a.reduce_max();
    return 0;
}
```

Use vectors sized for the best available instruction set:
```cpp
#include "simd/simd.hpp"
#include <cstddef>

using namespace vector_simd;

// Process arrays in chunks of native width
void process_array(float* data, size_t size)
{
    size_t i = 0;
    // Process in native-width chunks
    for (; i + float_vn::size_value <= size; i += float_vn::size_value) {
        float_vn v = float_vn::load_aligned(data + i);
        v = v * 2.0f;
        v.store_aligned(data + i);
    }
    // Handle remaining elements with scalar code
    for (; i < size; ++i) {
        data[i] *= 2.0f;
    }
}

int main()
{
    // These types automatically use the optimal width:
    // - SSE2: 4 floats (128-bit)
    // - AVX: 8 floats (256-bit)
    // - AVX-512: 16 floats (512-bit)
    float_vn a{1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float_vn b = a * 2.0f;
    return 0;
}
```

## Supported Features

The library detects the following SIMD features:
- MMX
- SSE, SSE2, SSE3, SSSE3
- SSE4.1, SSE4.2
- AVX
- AVX2
- FMA (Fused Multiply-Add)
- F16C (Half-precision conversion)
- AVX-512F (Foundation)
- AVX-512CD (Conflict Detection)
- AVX-512DQ (Doubleword and Quadword)
- AVX-512BW (Byte and Word)
- AVX-512VL (Vector Length)
- AVX-512IFMA (Integer FMA)
- AVX-512VBMI, VBMI2 (Vector Byte Manipulation)
- AVX-512VNNI (Neural Network)
- AVX-512BITALG (Bit Algorithms)
- AVX-512VPOPCNTDQ (Vector Population Count)
- AVX-512VP2INTERSECT
- AVX-512BF16 (BFloat16)
- AVX-512FP16 (Float16)
- AVX-512_4VNNIW, AVX-512_4FMAPS
- AMX_TILE
- AMX_INT8
- AMX_BF16
- AES, VAES
- PCLMULQDQ, VPCLMULQDQ
- SHA
- POPCNT (Population Count)
- LZCNT (Leading Zero Count)
- BMI1, BMI2 (Bit Manipulation Instructions)
- MOVBE (Move Byte Swap)
- RDRND, RDSEED (Random Number Generation)
- ADX (Multi-Precision Add-Carry)
- PREFETCHW, PREFETCHWT1
- GFNI (Galois Field)
- RDPID (Read Processor ID)
- SGX (Software Guard Extensions)
- CET_IBT, CET_SS (Control-flow Enforcement Technology)
## Integration

Add this to your CMakeLists.txt:
```cmake
add_subdirectory(path/to/simd_feature_check)
target_link_libraries(your_target PRIVATE simd_feature_check::simd_feature_check)
```

After installing the library:

```cmake
find_package(simd_feature_check REQUIRED)
target_link_libraries(your_target PRIVATE simd_feature_check::simd_feature_check)
```

The library defines convenience macros for conditional compilation:
```cpp
#include "simd/common.hpp"

#if SIMD_HAS_AVX2
// AVX2-specific code
#endif

#if SIMD_HAS_AVX512F
// AVX-512-specific code
#endif

// Alternative syntax (equivalent)
#if SIMD_AVX2
// AVX2-specific code
#endif
```

## Testing

Run the test suite:
```bash
cd build
ctest --output-on-failure
```

Build and run tests with verbose output:

```bash
ctest -V
```

Run specific test categories:

```bash
ctest -R simd_features   # Run feature detection tests
ctest -R vector_ops      # Run vector operation tests
```

## License

This project is licensed under the MIT License. See the LICENSE file for details.