Releases: SciSharp/NumSharp

NumSharp 0.41.0-prerelease

23 Mar 22:05

This prerelease introduces the IL Kernel Generator, a complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation.
The result is faster execution (MatMul alone is 35-100x faster), comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.

Installation

dotnet add package NumSharp --version 0.41.0-prerelease

Or via Package Manager:

Install-Package NumSharp -Version 0.41.0-prerelease

TL;DR

  • IL Kernel Generator: Runtime IL emission replaces 600K lines of Regen templates with 19K lines
  • SIMD everywhere: Vector128/256/512 with runtime detection across all operations
  • 35 new functions: nansum/prod/min/max/mean/var/std, cbrt, floor_divide, left/right_shift, deg2rad, rad2deg, cumprod, count_nonzero, isnan, isfinite, isinf, isclose, invert, reciprocal, square, trunc, plus comparison and logical modules
  • Operators fixed: ==, !=, <, >, <=, >=, &, |, ^
  • np.comparison module: np.equal(), np.not_equal(), np.less(), np.greater(), np.less_equal(), np.greater_equal()
  • np.logical module: np.logical_and(), np.logical_or(), np.logical_not(), np.logical_xor()
  • NDArray<T> operators: Typed &, |, ^ for generic arrays (resolves NDArray<bool> ambiguity)
  • Math functions rewritten: sin, cos, tan, exp, log, sqrt, abs, sign, floor, ceil, etc.
  • 60+ bug fixes: np.negative, np.positive, np.unique, np.dot, np.matmul, np.abs, np.argmax/min, np.mean, np.std/var, np.cumsum, np.nonzero, np.all/any, np.clip, and more
  • MatMul 35-100x faster: Cache-blocked SIMD achieving 20+ GFLOPS
  • Boolean indexing rewrite: SIMD fast path with CountTrue/CopyMasked
  • Axis reductions rewrite: AVX2 gather, NaN-aware, proper keepdims and empty array handling
  • Single-threaded execution: Deterministic and non-blocking; all use of Parallel.* was removed, with SIMD providing the parallelism instead
  • Architecture cleanup: Broadcasting in Shape struct, TensorEngine routing, static ILKernelGenerator
  • np.random aligned (#582): Parameter names match NumPy, Shape overloads added
  • DecimalMath internalized (#588): Removed embedded third-party code
  • NEP50 compliant: NumPy 2.x type promotion rules
  • Benchmark infrastructure: SIMD vs scalar comparison suite
  • DefaultEngine dispatch layer: BinaryOp, BitwiseOp, CompareOp, ReductionOp, UnaryOp
  • +4,200 unit tests: our own, plus tests migrated from Python/NumPy to C#

Contents

Section                   Highlights
Summary                   106 commits, -533K lines, 3,907 tests
IL Kernel Generator       27 files, SIMD V128/256/512
Architecture              Static ILKernelGenerator, TensorEngine routing
New NumPy Functions (35)  nansum, isnan, cumprod, etc.
Critical Bug Fixes        negative, unique, dot, linspace, intp
Operator Rewrites         ==, !=, <, >, &, | now work
Boolean Indexing Rewrite  SIMD fast path, 76 battle tests
Slicing Improvements      Broadcast stride=0 preserved
Performance Improvements  MatMul 35-100x, 20+ GFLOPS
Code Reduction            99% binary, 98% MatMul, 97% Dot
Infrastructure Changes    NativeMemory, static kernels
API Alignment             random() params aligned with NumPy
New Test Files (68)       34 kernel, 8 NumPy, 4 linalg, 76 boolean
Known Issues              52 OpenBugs excluded
Installation              dotnet add package NumSharp

Summary

Metric         Value
Commits        106
Files Changed  558
Lines Added    +72,635
Lines Deleted  -605,976
Net Change     -533K lines
Test Results   3,907 passed, 52 OpenBugs, 11 skipped

Detailed Breakdown

IL Kernel Generator

Runtime IL generation via System.Reflection.Emit.DynamicMethod replaces static Regen templates.

Kernel Files (27 new files)

  • ILKernelGenerator.cs - Core infrastructure, SIMD detection (Vector128/256/512)
  • ILKernelGenerator.Binary.cs - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
  • ILKernelGenerator.MixedType.cs - Mixed-type ops with type promotion
  • ILKernelGenerator.Unary.cs - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
  • ILKernelGenerator.Comparison.cs - ==, !=, <, >, <=, >= returning bool arrays
  • ILKernelGenerator.Reduction.cs - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
  • ILKernelGenerator.Reduction.Axis.Simd.cs - AVX2 gather for axis reductions
  • ILKernelGenerator.Scan.cs - CumSum, CumProd with SIMD
  • ILKernelGenerator.Shift.cs - LeftShift, RightShift
  • ILKernelGenerator.MatMul.cs - Cache-blocked SIMD matrix multiply
  • ILKernelGenerator.Clip.cs, .Modf.cs, .Masking.cs - Specialized ops
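
The cache-blocked multiply in ILKernelGenerator.MatMul.cs is not shown in these notes. As a rough illustration of the general technique only, the Python sketch below tiles a naive triple loop so that each tile of the operands is reused while it is likely still in cache. The function name, block size, and loop order are assumptions made for illustration; the real kernel is emitted as IL and uses SIMD, which this sketch does not attempt.

```python
def matmul_blocked(a, b, block=2):
    """Multiply two row-major matrices (lists of lists) tile by tile.

    `block` is a hypothetical tile edge; a real kernel would size it to the cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile over rows of a
        for p0 in range(0, k, block):      # tile over the shared dimension
            for j0 in range(0, m, block):  # tile over columns of b
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        aip = a[i][p]      # reused across the inner loop
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += aip * b[p][j]
    return c


result = matmul_blocked([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

The innermost loop walks a row of b contiguously, which is the access pattern a vectorized kernel would typically want.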

Execution Paths

  1. SimdFull - Contiguous + SIMD-capable dtype → Vector loop + scalar tail
  2. ScalarFull - Contiguous + non-SIMD dtype (Decimal) → Scalar loop
  3. General - Strided/broadcast → Coordinate-based iteration
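
The three paths above can be pictured with a small Python sketch. It is a conceptual analogy, not NumSharp code: `choose_path`, `add_simd_style`, and the `lanes` parameter are hypothetical names, and the "vector" step is simulated with list slices rather than real SIMD instructions.

```python
from enum import Enum


class Path(Enum):
    SIMD_FULL = "SimdFull"
    SCALAR_FULL = "ScalarFull"
    GENERAL = "General"


def choose_path(contiguous, simd_capable):
    """Pick an execution path from the two properties named in the list above."""
    if not contiguous:
        return Path.GENERAL  # strided or broadcast input
    return Path.SIMD_FULL if simd_capable else Path.SCALAR_FULL


def add_simd_style(a, b, lanes=4):
    """Add two equal-length lists: chunked 'vector' loop, then a scalar tail."""
    n = len(a)
    out = [0] * n
    vec_end = n - n % lanes  # last index covered by full chunks
    for i in range(0, vec_end, lanes):
        # One simulated vector operation over `lanes` elements.
        out[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
    for i in range(vec_end, n):
        # Scalar tail for the leftover elements.
        out[i] = a[i] + b[i]
    return out
```

With 10 elements and 4 lanes, the vector loop handles indices 0-7 and the scalar tail handles indices 8 and 9.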

Infrastructure

  • KernelKey.cs, KernelOp.cs, KernelSignatures.cs - Kernel dispatch
  • SimdMatMul.cs - SIMD matrix multiplication helpers
  • TypeRules.cs - NEP50 type promotion rules

Architecture

Clean separation of concerns:

Component           Design
ILKernelGenerator   Static class (27 partial files), internal to DefaultEngine
TensorEngine        All np.* ops route through abstract methods
Shape.Broadcasting  Pure shape math in Shape struct (456 lines)
ArgMin/ArgMax       Unified IL kernel with NaN-aware + Boolean semantics
DecimalMath         Internal utility (~403 lines) for Sqrt, Pow, ATan2, Exp, Log

Single-Threaded Execution

All computation is single-threaded with no Parallel.For usage. This provides:

  • Deterministic behavior - The same inputs always produce the same outputs in the same order
  • Non-blocking execution - No thread synchronization overhead
  • Simplified debugging - Stack traces are straightforward
  • SIMD compensation - Vector128/256/512 intrinsics provide parallelism at the CPU level

Broadcasting External to Engine

Broadcasting logic (Shape.Broadcasting.cs) is pure shape math with no engine dependencies:

  • Shape.AreBroadcastable() - Check if shapes can broadcast
  • Shape.Broadcast() - Compute broadcast result shape and strides
  • Shape.ResolveReturnShape() - Determine output shape for operations
  • DefaultEngine delegates all broadcasting to Shape.* methods
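
The "pure shape math" involved can be sketched in a few lines of Python. This follows the standard NumPy broadcasting rule and is not NumSharp's implementation; the function names are hypothetical and strides are counted in elements. Broadcast dimensions get stride 0, so the same element is re-read instead of being copied.

```python
def broadcast_shape(a, b):
    """Result shape of broadcasting shape tuples a and b (NumPy rule)."""
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)  # left-pad with 1s
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(out)


def broadcast_strides(shape, strides, target):
    """Strides for viewing an array of `shape` as `target`, without copying."""
    pad = len(target) - len(shape)
    shape = (1,) * pad + tuple(shape)
    strides = (0,) * pad + tuple(strides)
    return tuple(
        0 if dim == 1 and tgt != 1 else stride  # stretched dims get stride 0
        for dim, stride, tgt in zip(shape, strides, target)
    )


shape = broadcast_shape((3, 1), (4,))
```

Here a (3, 1) array and a (4,) array broadcast to (3, 4); the first is viewed with strides (1, 0) and the second with strides (0, 1).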

DecimalMath (#588)

Replaced embedded third-party DecimalEx.cs (~1061 lines) with minimal internal DecimalMath.cs (~403 lines) containing only the functions NumSharp actually uses: Sqrt, Pow, ATan2, Exp, Log, Log10, ATan.

TensorEngine Abstract Methods

Compare, NotEqual, Less, LessEqual, Greater, GreaterEqual, BitwiseAnd, BitwiseOr, BitwiseXor, LeftShift, RightShift, Power(NDArray, NDArray), FloorDivide, Truncate, Reciprocal, Square, Cbrt, Invert, Deg2Rad, Rad2Deg, IsInf, ReduceCumMul, Any, NanSum, NanProd, NanMin, NanMax, BooleanMask

DefaultEngine Dispatch Files (IL kernel integration)

File                          Functions
DefaultEngine.BinaryOp.cs     np.add, np.subtract, np.multiply, np.divide, np.mod, np.power
DefaultEngine.BitwiseOp.cs    np.bitwise_and, np.bitwise_or, np.bitwise_xor, &, |, ^
DefaultEngine.CompareOp.cs    np.equal, np.not_equal, np.less, np.greater, np.less_equal, np.greater_equal
DefaultEngine.ReductionOp.cs  np.sum, np.prod, np.min, np.max, np.mean, np.std, np.var, np.argmax, np.argmin
DefaultEngine.UnaryOp.cs      np.abs, np.negative, np.sqrt, np.sin, np.cos, np.exp, np.log, np.sign, etc.

Implementation Files

Default.Any.cs, Default.BooleanMask.cs, Default.Reduction.Nan.cs, Shape.Broadcasting.cs


New NumPy Functions (35)

NaN-Aware Reductions (7)

Function    Description
np.nansum   Sum ignoring NaN
np.nanprod  Product ignoring NaN
np.nanmin   Minimum ignoring NaN
np.nanmax   Maximum ignoring NaN
np.nanmean  Mean ignoring NaN
np.nanvar   Variance ignoring NaN
np.nanstd   Standard deviation ignoring NaN
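
For readers unfamiliar with the NaN-aware family, the Python sketch below shows the semantics NumPy defines for a flat sequence: NaN entries are dropped before reducing. The edge cases follow NumPy (an all-NaN sum is 0.0, an all-NaN mean is NaN); that NumSharp matches them exactly is an assumption here, and the real functions also accept an axis argument, which this sketch omits.

```python
import math


def _valid(values):
    """Keep only the entries that are not NaN."""
    return [v for v in values if not math.isnan(v)]


def nansum(values):
    return math.fsum(_valid(values))  # empty or all-NaN input sums to 0.0


def nanmean(values):
    kept = _valid(values)
    return math.fsum(kept) / len(kept) if kept else math.nan


def nanvar(values, ddof=0):
    kept = _valid(values)
    if len(kept) - ddof <= 0:
        return math.nan
    mean = math.fsum(kept) / len(kept)
    return math.fsum((v - mean) ** 2 for v in kept) / (len(kept) - ddof)


def nanstd(values, ddof=0):
    return math.sqrt(nanvar(values, ddof))


data = [1.0, math.nan, 3.0]
```

For `data`, the NaN is skipped, so the mean is taken over two elements rather than three.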

Math Operations (8)

Function          Description
np.cbrt           Cube root
np.floor_divide   Floor division (quotient rounded toward negative infinity)
np.reciprocal     Element-wise 1/x
np.trunc          Truncate toward zero
np.invert         Bitwise NOT
np.square         Element-wise square
np.cumprod        Cumulative product
np.count_nonzero  Count non-zero elements
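
Floor division and truncation differ only for negative values, which is a common source of confusion. The Python sketch below illustrates the NumPy-defined behavior of four of these functions on flat lists; it is not NumSharp code, and it ignores dtype handling and division by zero.

```python
import math
from itertools import accumulate
from operator import mul


def floor_divide(a, b):
    """Element-wise quotient rounded toward negative infinity."""
    return [x // y for x, y in zip(a, b)]


def trunc(a):
    """Element-wise rounding toward zero, kept as floats."""
    return [float(math.trunc(x)) for x in a]


def cumprod(a):
    """Running product of the elements."""
    return list(accumulate(a, mul))


def count_nonzero(a):
    return sum(1 for x in a if x != 0)


quotients = floor_divide([7, -7], [2, 2])
truncated = trunc([3.7, -3.7])
```

Note that -7 floor-divided by 2 is -4, while truncating -3.7 gives -3.0.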

Bitwise & Trigonometric (4)

Function        Description
np.left_shift   Bitwise left shift
np.right_shift  Bitwise right shift
np.deg2rad      Degrees to radians
np.rad2deg      Radians to degrees

Logic & Validation (4) - Previously returned null

Function     Description
np.isnan     Test element-wise for NaN
np.isfinite  Test element-wise for finiteness
np.isinf     Test element-wise for infinity
np.isclose   Element-wise comparison within tolerance
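
"Within tolerance" has a specific meaning in NumPy: a and b are close when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-5 and atol=1e-8. The scalar Python sketch below follows that definition. It assumes NumSharp uses the same formula and defaults, which these notes imply through NumPy alignment but do not state outright.

```python
import math


def isclose(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    """Scalar version of NumPy's isclose rule."""
    if math.isnan(a) or math.isnan(b):
        # NaN is close to NaN only when equal_nan is requested.
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):
        # Infinities are close only to an infinity of the same sign.
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)


near = isclose(1.0, 1.0 + 1e-6)
far = isclose(1.0, 1.001)
```

The test is asymmetric because the relative term scales with |b| only, so b is treated as the reference value.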

Operators (2) - Previously returned null

Operator    Description
operator &  Bitwise/logical AND with broadcasting
operator |  Bitwise/logical OR with broadcasting

Comparison Functions (6) - New named AP...

v0.4.0-alpha1

14 Feb 10:14

v0.4.0-alpha1 Pre-release

NumSharp v0.4.0-alpha1

See #538 for information.

NuGet

No NuGet release for this preview version.

What's Changed

  • Enabled NDArray boolean comparisons for LessThan, GreaterThan, and … by @Rikki-Tavi in #395
  • Added data types in np.frombuffer in #425
  • F# in README by @dsyme in #432
  • Added support for user defined decimal precision for np.around() and TensorEngine.Round() by @shashi4u in #453
  • NumSharp.Bitmap support for odd sized bitmaps with odd sized bytes per pixel by @AmbachtIT in #460
  • Fixing the consistency of seed in the random choice. by @bojake in #489
  • (Logics):add high performance logical AND function with axis an… by @zhuoshui-AI in #525
  • Upgrade target frameworks to net8.0;net10.0 by @Nucs in #532
  • Add GitHub Actions CI/CD pipeline by @Nucs in #534
  • Fix: skip Bitmap tests on non-Windows CI by @Nucs in #535
  • docs: relocate website to docs/website/ by @Nucs in #557
  • docs: move docfx_project to docs/website-src by @Nucs in #558
  • feat(docs): upgrade to DocFX v2 modern template by @Nucs in #562

New Contributors

Many of the contributors' merges were piggybacked onto this release, which was probably not entirely intentional.

Full Changelog: 0.20.5...v0.4.0-alpha1

v0.20.5

31 Dec 16:34

  • NDArray.Indexing: Rewrite of the getter mechanism, NDArray getter now supports combining 'NDArray, Slice, string, int, bool' in the same slice.
  • NDArray.Indexing: Added support for indexing with an unmanaged array of indices: ndarray[int* pointer, int length], nd.GetData(int*, int), etc.
  • NDArray.Broadcasting: fixed multiple issues.
  • NDArray.Slicing: Added support for slicing a broadcasted NDArray.
  • Added NPTypeCode.Float as an alias to NPTypeCode.Single
  • Extending NPY and fixing NPZ (Thanks Matthew Moloney)
  • Added NDArray.AsOrMakeGeneric()
  • Added np.nonzero, np.maximum, np.minimum, np.all, np.any
  • Arrays.cs: performance-optimized Arrays.Slice
  • NDArray.FromMultiDimArray: Fixed #367
  • np.clip: Added @out argument
  • Added np.array(IEnumerable) and np.array(IEnumerable, int size) which is faster.
  • np.broadcast_to: added additional overloads.

v0.20.4

05 Oct 16:06

Changes

  • Added np.transpose, np.swapaxes, ndarray.T, np.moveaxis, np.rollaxis, np.size, np.copyto.
  • Added np.ceil, np.arccos, np.floor, np.modf, np.square, np.round, np.sign, np.arcsin, np.arctan.
  • Added np.random.*: beta, gamma, bernoulli, binomial, lognormal, normal, poisson, chisquare, geometric.
  • Added support for np.newaxis, ... (ellipsis) in a slice.
  • Performance optimization for np.array, np.linspace, Randomizer class and all np.random.* methods.

Bug Fixes

  • ndarray.view copying when it shouldn't.
  • A couple of ambiguous methods.

Obsoletion

  • nd.Unsafe.Shape is now obsolete in favor of nd.Shape.

Special thanks to @henon and @deepakkumar1984 for PRing a great portion of this release.

v0.20.3

28 Sep 14:57

Breaking Changes

  • NumSharp.Backends.NPTypeCode moved to NumSharp.NPTypeCode.

v0.10-slice

28 Jul 12:37

release signed assembly v0.10.6.

v0.7 works with TensorFlow.NET

31 Jan 04:04

v0.7-tensorflow

Merge branch 'master' of https://github.com/Oceania2018/NumSharp

v0.6 Supports LAPACK

22 Dec 14:45
727fd41

Merge pull request #162 from dotChris90/master

Extend doc and generated new API docs

v0.5-dtype

05 Dec 02:38

release v0.5

v0.4

22 Nov 02:54

released v0.4