A modern C++20 library for CPU feature detection and SIMD vector operations. This library provides compile-time and runtime detection of SIMD instruction sets (SSE, AVX, AVX-512, AMX) along with a high-level vector abstraction layer for writing portable SIMD code.
- Overview
- Features
- Requirements
- Installation
- Build Options
- Library Architecture
- Usage Examples
- Supported Features
- Integration
- Testing
- License
simd_feature_check solves two fundamental challenges in SIMD programming:
- Feature Detection: Determining which SIMD instruction sets are available on the target CPU, both at compile time and runtime
- Vector Abstraction: Writing portable SIMD code that automatically adapts to the best available instruction set
The library uses CPUID instructions on x86/x86_64 to detect processor capabilities and provides a clean API for querying supported features. The vector abstraction layer automatically selects the optimal implementation based on detected capabilities.
## Features

- Comprehensive detection of 50+ SIMD features including SSE, AVX, AVX-512, and AMX
- Compile-time detection via template metaprogramming
- Runtime detection using CPUID instructions
- Feature-to-string conversion for logging and debugging
- Highest feature detection for capability reporting
- Vector abstraction supporting multiple data types and sizes
- Automatic dispatch to optimal SIMD implementations
- Cross-platform support with architecture detection
- Modern C++20 with concepts for type safety
- Zero-overhead abstractions
## Requirements

- CMake 3.16 or newer
- C++20 compatible compiler (GCC 10+, Clang 10+, MSVC 2019+)
- Git
| Compiler | Minimum Version | Notes |
|---|---|---|
| GCC | 10.0 | Full support |
| Clang | 10.0 | Full support |
| MSVC | 2019 (16.8) | Full support |
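For a consumer project, these requirements translate into a few CMake lines (a sketch; the project name is a placeholder):

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_simd_app LANGUAGES CXX)

# Require C++20 without compiler-specific extensions
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
```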
## Installation

Clone the repository:
```bash
git clone https://github.com/hun756/CPP-Starter-Template.git simd_feature_check
cd simd_feature_check
```

Create a build directory and configure:

```bash
mkdir build && cd build
cmake ..
```

Build the library:

```bash
cmake --build .
```

Run tests:

```bash
ctest
```

## Build Options

The following CMake options are available:
| Option | Default | Description |
|---|---|---|
| BUILD_SHARED_LIBS | OFF | Build shared libraries instead of static |
| BUILD_EXAMPLES | ON | Build example programs |
| BUILD_TESTS | ON | Build and enable tests |
| BUILD_BENCHMARKS | OFF | Build benchmarking programs |
| ENABLE_COVERAGE | OFF | Enable code coverage reporting |
| ENABLE_SANITIZERS | OFF | Enable sanitizers in debug builds |
| ENABLE_PCH | OFF | Enable precompiled headers |
| ENABLE_LTO | OFF | Enable Link Time Optimization |
Example configuration with sanitizers enabled:
```bash
cmake .. -DENABLE_SANITIZERS=ON -DBUILD_EXAMPLES=ON
```

## Library Architecture

The library is organized into several namespaces and components:
The `simd` namespace contains feature detection functionality:
- `simd::Feature` - Enum class listing all detectable features
- `simd::has_feature()` - Runtime feature check
- `simd::compile_time::` - Compile-time feature detection
- `simd::runtime::` - Runtime feature detection
- `simd::FeatureDetector<T>` - Template class for feature introspection
The `vector_simd` namespace provides vector abstractions:
- `Vector<T, N>` - Fixed-size SIMD vector class
- `Mask<T, N>` - Mask type for predicate operations
- Type aliases like `float_v<4>`, `int32_v<8>`, etc.
- Native-width types like `float_vn` for optimal vector width
The library uses a layered architecture:
- `simd/common.hpp` - Architecture and compiler detection macros
- `simd/feature_check.hpp` - Core feature detection implementation
- `simd/registers/types.hpp` - Register type mappings for each ISA
- `simd/vector/vector.hpp` - High-level vector abstraction
- `simd/impl/` - Architecture-specific implementations
## Usage Examples

The simplest way to check for SIMD support:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Check for specific features
    if (simd::has_feature(simd::Feature::SSE)) {
        std::cout << "SSE is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX)) {
        std::cout << "AVX is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX2)) {
        std::cout << "AVX2 is supported" << std::endl;
    }
    if (simd::has_feature(simd::Feature::AVX512F)) {
        std::cout << "AVX-512 Foundation is supported" << std::endl;
    }

    // Get CPU vendor string
    std::cout << "CPU Vendor: " << simd::get_cpu_vendor() << std::endl;

    // Get the highest supported SIMD feature
    simd::Feature highest = simd::highest_feature();
    std::cout << "Highest SIMD feature: "
              << simd::feature_to_string(highest) << std::endl;
    return 0;
}
```

When you need to know features at compile time for conditional compilation:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Check compile-time availability using boolean constants
    std::cout << "SSE compile-time: "
              << (simd::compile_time::sse ? "Yes" : "No") << std::endl;
    std::cout << "AVX compile-time: "
              << (simd::compile_time::avx ? "Yes" : "No") << std::endl;
    std::cout << "AVX2 compile-time: "
              << (simd::compile_time::avx2 ? "Yes" : "No") << std::endl;

    // Template-based compile-time checks
    std::cout << "SSE2 available: "
              << (simd::compile_time::has<simd::Feature::SSE2>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX-512F available: "
              << (simd::compile_time::has<simd::Feature::AVX512F>() ? "Yes" : "No")
              << std::endl;

    // Get maximum compile-time feature
    std::cout << "Max compile-time feature: "
              << simd::feature_to_string(simd::compile_time::max_feature)
              << std::endl;
    return 0;
}
```

Using compile-time detection for conditional compilation:
```cpp
#include "simd/feature_check.hpp"
#include <cstddef>

// Implementations provided elsewhere in your project
void process_avx512(float* data, size_t size);
void process_avx2(float* data, size_t size);
void process_scalar(float* data, size_t size);

void process_data(float* data, size_t size)
{
    if constexpr (simd::compile_time::has<simd::Feature::AVX512F>()) {
        // This code is only compiled if AVX-512F is available;
        // the compiler can use AVX-512 intrinsics here
        process_avx512(data, size);
    }
    else if constexpr (simd::compile_time::has<simd::Feature::AVX2>()) {
        // Fallback to AVX2
        process_avx2(data, size);
    }
    else {
        // Scalar fallback
        process_scalar(data, size);
    }
}
```

For runtime checks that adapt to the executing CPU:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Runtime checks using template syntax
    std::cout << "SSE runtime: "
              << (simd::runtime::has<simd::Feature::SSE>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX runtime: "
              << (simd::runtime::has<simd::Feature::AVX>() ? "Yes" : "No")
              << std::endl;
    std::cout << "AVX2 runtime: "
              << (simd::runtime::has<simd::Feature::AVX2>() ? "Yes" : "No")
              << std::endl;

    // Get highest runtime feature
    std::cout << "Highest runtime feature: "
              << simd::feature_to_string(simd::runtime::highest_feature())
              << std::endl;
    return 0;
}
```

The `FeatureDetector` template provides detailed introspection for any feature:
```cpp
#include "simd/feature_check.hpp"
#include <iostream>

int main()
{
    // Create detectors for specific features
    using AVXDetector = simd::FeatureDetector<simd::Feature::AVX>;
    using AVX2Detector = simd::FeatureDetector<simd::Feature::AVX2>;
    using AVX512Detector = simd::FeatureDetector<simd::Feature::AVX512F>;

    // AVX feature information
    std::cout << "AVX Feature:" << std::endl;
    std::cout << "  Name: " << AVXDetector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVXDetector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVXDetector::available() ? "Yes" : "No") << std::endl;

    // AVX2 feature information
    std::cout << "AVX2 Feature:" << std::endl;
    std::cout << "  Name: " << AVX2Detector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVX2Detector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVX2Detector::available() ? "Yes" : "No") << std::endl;

    // AVX-512F feature information
    std::cout << "AVX-512F Feature:" << std::endl;
    std::cout << "  Name: " << AVX512Detector::name() << std::endl;
    std::cout << "  Compile-time support: "
              << (AVX512Detector::compile_time ? "Yes" : "No") << std::endl;
    std::cout << "  Runtime support: "
              << (AVX512Detector::available() ? "Yes" : "No") << std::endl;
    return 0;
}
```

Manual dispatch based on detected features:
```cpp
#include "simd/feature_check.hpp"
#include <cstddef>
#include <iostream>

// Scalar implementation (always available)
float* add_vectors_scalar(const float* a, const float* b, float* result,
                          size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}

// AVX implementation (conditionally compiled)
#if SIMD_HAS_AVX
float* add_vectors_avx(const float* a, const float* b, float* result,
                       size_t size)
{
    // AVX-optimized implementation (scalar loop shown as a placeholder)
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
#endif

// AVX-512 implementation (conditionally compiled)
#if SIMD_HAS_AVX512F
float* add_vectors_avx512(const float* a, const float* b, float* result,
                          size_t size)
{
    // AVX-512-optimized implementation (scalar loop shown as a placeholder)
    for (size_t i = 0; i < size; ++i) {
        result[i] = a[i] + b[i];
    }
    return result;
}
#endif

int main()
{
    using AddFunc = float* (*)(const float*, const float*, float*, size_t);
    AddFunc best_impl;

    // Select best implementation at runtime
    if (simd::has_feature(simd::Feature::AVX512F)) {
#if SIMD_HAS_AVX512F
        best_impl = add_vectors_avx512;
#else
        best_impl = add_vectors_scalar;
#endif
    }
    else if (simd::has_feature(simd::Feature::AVX)) {
#if SIMD_HAS_AVX
        best_impl = add_vectors_avx;
#else
        best_impl = add_vectors_scalar;
#endif
    }
    else {
        best_impl = add_vectors_scalar;
    }

    // Use the selected implementation
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    float result[4];
    best_impl(a, b, result, 4);

    std::cout << "Result: [" << result[0] << ", " << result[1]
              << ", " << result[2] << ", " << result[3] << "]" << std::endl;
    return 0;
}
```

The library provides a high-level vector abstraction that automatically uses the best available SIMD instructions:
```cpp
#include "simd/simd.hpp"
#include <array>

int main()
{
    using namespace vector_simd;

    // Create vectors with fixed size
    float_v<4> a{1.0f, 2.0f, 3.0f, 4.0f};
    float_v<4> b{5.0f, 6.0f, 7.0f, 8.0f};

    // Basic arithmetic operations
    float_v<4> sum = a + b;   // Element-wise addition
    float_v<4> diff = a - b;  // Element-wise subtraction
    float_v<4> prod = a * b;  // Element-wise multiplication
    float_v<4> quot = b / a;  // Element-wise division

    // Compound assignment operators
    float_v<4> c{0.0f, 0.0f, 0.0f, 0.0f};
    c += a;
    c *= b;

    // Extract and insert elements
    float val = sum.extract(0);  // Get first element
    sum.insert(0, 10.0f);        // Set first element

    // Store results to memory
    alignas(16) float result[4];
    sum.store(result);          // Unaligned store
    sum.store_aligned(result);  // Aligned store (faster, requires aligned address)

    // Load from memory
    float_v<4> loaded = float_v<4>::load(result);
    float_v<4> aligned = float_v<4>::load_aligned(result);

    // Convert to std::array
    std::array<float, 4> arr = sum.to_array();
    return 0;
}
```

Masks enable conditional operations and predication:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 5.0f, 3.0f, 8.0f};
    float_v<4> b{4.0f, 2.0f, 6.0f, 7.0f};

    // Comparison operations return masks
    auto mask_eq = (a == b);  // Equal comparison
    auto mask_ne = (a != b);  // Not equal
    auto mask_lt = (a < b);   // Less than
    auto mask_le = (a <= b);  // Less than or equal
    auto mask_gt = (a > b);   // Greater than
    auto mask_ge = (a >= b);  // Greater than or equal

    // Select based on mask
    float_v<4> min_vals = float_v<4>::select(mask_lt, a, b);  // Element-wise min
    float_v<4> max_vals = float_v<4>::select(mask_gt, a, b);  // Element-wise max

    // Blend vectors based on mask
    float_v<4> blended = a.blend(b, mask_gt);  // Take from b where a > b
    return 0;
}
```

Comprehensive math operations with automatic SIMD optimization:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 4.0f, 9.0f, 16.0f};
    float_v<4> b{-2.5f, 3.7f, -1.2f, 0.0f};

    // Basic math functions
    float_v<4> abs_val = a.abs();    // Absolute value
    float_v<4> sqrt_val = a.sqrt();  // Square root

    // Trigonometric functions
    float_v<4> angles{0.0f, 1.5708f, 3.14159f, 4.71239f};
    float_v<4> sin_val = angles.sin();  // Sine
    float_v<4> cos_val = angles.cos();  // Cosine
    float_v<4> tan_val = angles.tan();  // Tangent

    // Exponential and logarithmic
    float_v<4> exp_val = a.exp();  // e^x
    float_v<4> log_val = a.log();  // Natural log

    // Rounding functions
    float_v<4> round_vals{1.2f, 3.7f, -2.3f, -4.8f};
    float_v<4> floor_val = round_vals.floor();  // Floor
    float_v<4> ceil_val = round_vals.ceil();    // Ceiling
    float_v<4> round_val = round_vals.round();  // Round
    float_v<4> trunc_val = round_vals.trunc();  // Truncate

    // Reciprocal and reciprocal square root (approximate, fast)
    float_v<4> rcp_val = a.rcp();      // 1/x
    float_v<4> rsqrt_val = a.rsqrt();  // 1/sqrt(x)

    // Fused multiply-add: a * b + c
    float_v<4> c{1.0f, 1.0f, 1.0f, 1.0f};
    float_v<4> fmadd_result = a.fmadd(b, c);

    // Fused multiply-subtract: a * b - c
    float_v<4> fmsub_result = a.fmsub(b, c);

    // Min and max
    float_v<4> min_val = a.min(b);
    float_v<4> max_val = a.max(b);

    // Clamp to range
    float_v<4> lo{0.0f, 0.0f, 0.0f, 0.0f};
    float_v<4> hi{10.0f, 10.0f, 10.0f, 10.0f};
    float_v<4> clamped = b.clamp(lo, hi);
    return 0;
}
```

Efficient memory access patterns:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    alignas(64) float data[16] = {
        1.0f,  2.0f,  3.0f,  4.0f,
        5.0f,  6.0f,  7.0f,  8.0f,
        9.0f,  10.0f, 11.0f, 12.0f,
        13.0f, 14.0f, 15.0f, 16.0f
    };

    // Aligned load (fastest, requires aligned address)
    float_v<4> v1 = float_v<4>::load_aligned(data);

    // Unaligned load (works with any address)
    float_v<4> v2 = float_v<4>::load_unaligned(data + 1);

    // Aligned store
    alignas(64) float result[4];
    v1.store_aligned(result);

    // Unaligned store
    float unaligned_result[4];
    v2.store_unaligned(unaligned_result);

    // Non-temporal store (bypasses cache, useful for write-once data)
    alignas(64) float nt_buffer[4];
    v1.store_nt(nt_buffer);

    // Gather: load non-contiguous elements
    int32_v<4> indices{0, 2, 4, 6};
    float_v<4> gathered = float_v<4>::gather(data, indices);

    // Scatter: store to non-contiguous locations
    float output[16] = {0};
    gathered.scatter(output, indices);

    // Prefetch data into cache
    float_v<4>::prefetch(data + 8);
    return 0;
}
```

Convert between different vector types:
```cpp
#include "simd/simd.hpp"
#include <cstdint>

int main()
{
    using namespace vector_simd;

    // Integer vectors
    int32_v<4> ints{1, 2, 3, 4};

    // Convert to float
    float_v<4> floats = ints.convert<float>();

    // Convert to double
    double_v<4> doubles = ints.convert<double>();

    // Different integer widths
    int16_v<8> shorts{1, 2, 3, 4, 5, 6, 7, 8};
    int32_v<8> expanded = shorts.convert<int32_t>();

    // Saturating arithmetic (prevents overflow)
    uint8_v<16> a{200, 200, 200, 200, 200, 200, 200, 200,
                  200, 200, 200, 200, 200, 200, 200, 200};
    uint8_v<16> b{100, 100, 100, 100, 100, 100, 100, 100,
                  100, 100, 100, 100, 100, 100, 100, 100};
    uint8_v<16> sat_add = a.add_sat(b);  // Saturates at 255
    uint8_v<16> sat_sub = b.sub_sat(a);  // Saturates at 0
    return 0;
}
```

Reduce vector elements to scalar values:
```cpp
#include "simd/simd.hpp"

int main()
{
    using namespace vector_simd;

    float_v<4> a{1.0f, 2.0f, 3.0f, 4.0f};

    // Horizontal sum: 1 + 2 + 3 + 4 = 10
    float sum = a.hsum();

    // Horizontal min: min(1, 2, 3, 4) = 1
    float min_val = a.hmin();

    // Horizontal max: max(1, 2, 3, 4) = 4
    float max_val = a.hmax();

    // Dot product
    float_v<4> b{1.0f, 1.0f, 1.0f, 1.0f};
    float dot = a.dot(b);

    // Reduce operations (alternative names)
    float reduce_sum = a.reduce_add();
    float reduce_min = a.reduce_min();
    float reduce_max = a.reduce_max();
    return 0;
}
```

Use vectors sized for the best available instruction set:
```cpp
#include "simd/simd.hpp"
#include <cstddef>

using namespace vector_simd;

// Process arrays in chunks of native width
void process_array(float* data, size_t size)
{
    size_t i = 0;
    // Process in native-width chunks
    for (; i + float_vn::size_value <= size; i += float_vn::size_value) {
        float_vn v = float_vn::load_aligned(data + i);
        v = v * 2.0f;
        v.store_aligned(data + i);
    }
    // Handle remaining elements with scalar code
    for (; i < size; ++i) {
        data[i] *= 2.0f;
    }
}

int main()
{
    // These types automatically use the optimal width:
    // - SSE2: 4 floats (128-bit)
    // - AVX: 8 floats (256-bit)
    // - AVX-512: 16 floats (512-bit)
    float_vn a{1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float_vn b = a * 2.0f;
    return 0;
}
```

## Supported Features

The library detects the following SIMD features:
- MMX
- SSE, SSE2, SSE3, SSSE3
- SSE4.1, SSE4.2
- AVX
- AVX2
- FMA (Fused Multiply-Add)
- F16C (Half-precision conversion)
- AVX-512F (Foundation)
- AVX-512CD (Conflict Detection)
- AVX-512DQ (Doubleword and Quadword)
- AVX-512BW (Byte and Word)
- AVX-512VL (Vector Length)
- AVX-512IFMA (Integer FMA)
- AVX-512VBMI, VBMI2 (Vector Byte Manipulation)
- AVX-512VNNI (Neural Network)
- AVX-512BITALG (Bit Algorithms)
- AVX-512VPOPCNTDQ (Vector Population Count)
- AVX-512VP2INTERSECT
- AVX-512BF16 (BFloat16)
- AVX-512FP16 (Float16)
- AVX-512_4VNNIW, AVX-512_4FMAPS
- AMX_TILE
- AMX_INT8
- AMX_BF16
- AES, VAES
- PCLMULQDQ, VPCLMULQDQ
- SHA
- POPCNT (Population Count)
- LZCNT (Leading Zero Count)
- BMI1, BMI2 (Bit Manipulation Instructions)
- MOVBE (Move Byte Swap)
- RDRND, RDSEED (Random Number Generation)
- ADX (Multi-Precision Add-Carry)
- PREFETCHW, PREFETCHWT1
- GFNI (Galois Field)
- RDPID (Read Processor ID)
- SGX (Software Guard Extensions)
- CET_IBT, CET_SS (Control-flow Enforcement Technology)
## Integration

Add this to your CMakeLists.txt:
```cmake
add_subdirectory(path/to/simd_feature_check)
target_link_libraries(your_target PRIVATE simd_feature_check::simd_feature_check)
```

After installing the library:

```cmake
find_package(simd_feature_check REQUIRED)
target_link_libraries(your_target PRIVATE simd_feature_check::simd_feature_check)
```

The library defines convenience macros for conditional compilation:
```cpp
#include "simd/common.hpp"

#if SIMD_HAS_AVX2
// AVX2-specific code
#endif

#if SIMD_HAS_AVX512F
// AVX-512-specific code
#endif

// Alternative syntax (equivalent)
#if SIMD_AVX2
// AVX2-specific code
#endif
```

## Testing

Run the test suite:
```bash
cd build
ctest --output-on-failure
```

Build and run tests with verbose output:

```bash
ctest -V
```

Run specific test categories:

```bash
ctest -R simd_features   # Run feature detection tests
ctest -R vector_ops      # Run vector operation tests
```

## License

This project is licensed under the MIT License. See the LICENSE file for details.