Releases: SciSharp/NumSharp
NumSharp 0.41.0-prerelease
This prerelease introduces the IL Kernel Generator -
A complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation.
This delivers massive performance improvements, comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.
Installation
```
dotnet add package NumSharp --version 0.41.0-prerelease
```
Or via Package Manager:
```
Install-Package NumSharp -Version 0.41.0-prerelease
```
TL;DR
- IL Kernel Generator: Runtime IL emission replaces 600K lines of Regen templates with 19K lines
- SIMD everywhere: Vector128/256/512 with runtime detection across all operations
- 35 new functions: nansum/prod/min/max/mean/var/std, cbrt, floor_divide, left/right_shift, deg2rad, rad2deg, cumprod, count_nonzero, isnan, isfinite, isinf, isclose, invert, reciprocal, square, trunc, plus comparison and logical modules
- Operators fixed: `==`, `!=`, `<`, `>`, `<=`, `>=`, `&`, `|`, `^`
- np.comparison module: np.equal(), np.not_equal(), np.less(), np.greater(), np.less_equal(), np.greater_equal()
- np.logical module: np.logical_and(), np.logical_or(), np.logical_not(), np.logical_xor()
- NDArray<T> operators: typed `&`, `|`, `^` for generic arrays (resolves `NDArray<bool>` ambiguity)
- Math functions rewritten: sin, cos, tan, exp, log, sqrt, abs, sign, floor, ceil, etc.
- 60+ bug fixes: np.negative, np.positive, np.unique, np.dot, np.matmul, np.abs, np.argmax/min, np.mean, np.std/var, np.cumsum, np.nonzero, np.all/any, np.clip, and more
- MatMul 35-100x faster: Cache-blocked SIMD achieving 20+ GFLOPS
- Boolean indexing rewrite: SIMD fast path with CountTrue/CopyMasked
- Axis reductions rewrite: AVX2 gather, NaN-aware, proper keepdims and empty array handling
- Single-threaded execution: deterministic and non-blocking; removed use of `Parallel.*` (SIMD compensates for the lost parallelism)
- Architecture cleanup: broadcasting in the Shape struct, TensorEngine routing, static ILKernelGenerator
- np.random aligned (#582): Parameter names match NumPy, Shape overloads added
- DecimalMath internalized (#588): Removed embedded third-party code
- NEP50 compliant: NumPy 2.x type promotion rules
- Benchmark infrastructure: SIMD vs scalar comparison suite
- DefaultEngine dispatch layer: BinaryOp, BitwiseOp, CompareOp, ReductionOp, UnaryOp
- 4,200+ unit tests, both our own and tests migrated from Python/NumPy to C#.
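The NEP50 promotion rules mentioned above can be illustrated with NumPy itself (Python shown here for brevity; NumSharp's C# `np.*` API mirrors these names):

```python
import numpy as np

# Array-array promotion: the wider dtype wins.
a = np.array([1, 2, 3], dtype=np.int8)
b = np.array([1, 2, 3], dtype=np.int16)
assert (a + b).dtype == np.int16

f32 = np.array([1.0], dtype=np.float32)
f64 = np.array([1.0], dtype=np.float64)
assert (f32 + f64).dtype == np.float64

# Under NEP 50 (NumPy 2.x), Python scalars are "weak": they adopt
# the array's dtype rather than forcing a value-based promotion.
x = np.array([1, 2], dtype=np.float32)
assert (x + 3.0).dtype == np.float32
```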
Contents
| Section | Highlights |
|---|---|
| Summary | 106 commits, -533K lines, 3,907 tests |
| IL Kernel Generator | 27 files, SIMD V128/256/512 |
| Architecture | Static ILKernelGenerator, TensorEngine routing |
| New NumPy Functions (35) | nansum, isnan, cumprod, etc. |
| Critical Bug Fixes | negative, unique, dot, linspace, intp |
| Operator Rewrites | `==`, `!=`, `<`, `>`, `&`, `\|` now work |
| Boolean Indexing Rewrite | SIMD fast path, 76 battle tests |
| Slicing Improvements | Broadcast stride=0 preserved |
| Performance Improvements | MatMul 35-100x, 20+ GFLOPS |
| Code Reduction | 99% binary, 98% MatMul, 97% Dot |
| Infrastructure Changes | NativeMemory, static kernels |
| API Alignment | random() params aligned with NumPy |
| New Test Files (68) | 34 kernel, 8 NumPy, 4 linalg, 76 boolean |
| Known Issues | 52 OpenBugs excluded |
| Installation | dotnet add package NumSharp |
Summary
| Metric | Value |
|---|---|
| Commits | 106 |
| Files Changed | 558 |
| Lines Added | +72,635 |
| Lines Deleted | -605,976 |
| Net Change | -533K lines |
| Test Results | 3,907 passed, 52 OpenBugs, 11 skipped |
Detailed Breakdown
Read More
IL Kernel Generator
Runtime IL generation via System.Reflection.Emit.DynamicMethod replaces static Regen templates.
Kernel Files (27 new files)
- ILKernelGenerator.cs - Core infrastructure, SIMD detection (Vector128/256/512)
- ILKernelGenerator.Binary.cs - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
- ILKernelGenerator.MixedType.cs - Mixed-type ops with type promotion
- ILKernelGenerator.Unary.cs - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
- ILKernelGenerator.Comparison.cs - ==, !=, <, >, <=, >= returning bool arrays
- ILKernelGenerator.Reduction.cs - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
- ILKernelGenerator.Reduction.Axis.Simd.cs - AVX2 gather for axis reductions
- ILKernelGenerator.Scan.cs - CumSum, CumProd with SIMD
- ILKernelGenerator.Shift.cs - LeftShift, RightShift
- ILKernelGenerator.MatMul.cs - Cache-blocked SIMD matrix multiply
- ILKernelGenerator.Clip.cs, .Modf.cs, .Masking.cs - Specialized ops
Execution Paths
- SimdFull - Contiguous + SIMD-capable dtype → Vector loop + scalar tail
- ScalarFull - Contiguous + non-SIMD dtype (Decimal) → Scalar loop
- General - Strided/broadcast → Coordinate-based iteration
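The three-path dispatch above can be sketched as follows (a hypothetical illustration in Python; the function and set names here are not NumSharp's actual identifiers):

```python
# Hypothetical sketch of the three execution-path dispatch.
SIMD_DTYPES = {"float32", "float64", "int32", "int64"}  # illustrative subset

def select_path(contiguous: bool, dtype: str) -> str:
    if contiguous and dtype in SIMD_DTYPES:
        return "SimdFull"    # vector loop + scalar tail
    if contiguous:
        return "ScalarFull"  # e.g. Decimal: plain scalar loop
    return "General"         # strided/broadcast: coordinate-based iteration

assert select_path(True, "float64") == "SimdFull"
assert select_path(True, "decimal") == "ScalarFull"
assert select_path(False, "float64") == "General"
```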
Infrastructure
- KernelKey.cs, KernelOp.cs, KernelSignatures.cs - Kernel dispatch
- SimdMatMul.cs - SIMD matrix multiplication helpers
- TypeRules.cs - NEP50 type promotion rules
Architecture
Clean separation of concerns:
| Component | Design |
|---|---|
| ILKernelGenerator | Static class (27 partial files), internal to DefaultEngine |
| TensorEngine | All np.* ops route through abstract methods |
| Shape.Broadcasting | Pure shape math in Shape struct (456 lines) |
| ArgMin/ArgMax | Unified IL kernel with NaN-aware + Boolean semantics |
| DecimalMath | Internal utility (~403 lines) for Sqrt, Pow, ATan2, Exp, Log |
Single-Threaded Execution
All computation is single-threaded with no Parallel.For usage. This provides:
- Deterministic behavior - Same inputs always produce same outputs in same order
- Non-blocking execution - No thread synchronization overhead
- Simplified debugging - Stack traces are straightforward
- SIMD compensation - Vector128/256/512 intrinsics provide parallelism at the CPU level
Broadcasting External to Engine
Broadcasting logic (Shape.Broadcasting.cs) is pure shape math with no engine dependencies:
- Shape.AreBroadcastable() - Check if shapes can broadcast
- Shape.Broadcast() - Compute broadcast result shape and strides
- Shape.ResolveReturnShape() - Determine output shape for operations
- DefaultEngine delegates all broadcasting to Shape.* methods
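The shape math follows NumPy's broadcasting rule (trailing dimensions must match or be 1). A minimal Python sketch of the same logic, checked against NumPy's reference implementation:

```python
from itertools import zip_longest
import numpy as np

def are_broadcastable(s1, s2):
    # Compare trailing dimensions: each pair must be equal or contain a 1.
    return all(a == b or a == 1 or b == 1
               for a, b in zip(reversed(s1), reversed(s2)))

def broadcast_shape(s1, s2):
    # Pad the shorter shape with 1s on the left, take max per dimension.
    rev = [max(a, b) for a, b in
           zip_longest(reversed(s1), reversed(s2), fillvalue=1)]
    return tuple(reversed(rev))

assert are_broadcastable((3, 1), (1, 4))
assert broadcast_shape((3, 1), (1, 4)) == (3, 4)
# Agrees with NumPy's own computation:
assert broadcast_shape((8, 1, 5), (2, 5)) == np.broadcast_shapes((8, 1, 5), (2, 5))
```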
DecimalMath (#588)
Replaced embedded third-party DecimalEx.cs (~1061 lines) with minimal internal DecimalMath.cs (~403 lines) containing only the functions NumSharp actually uses: Sqrt, Pow, ATan2, Exp, Log, Log10, ATan.
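For intuition, the kind of arbitrary-precision routine such a utility implements (e.g. Sqrt) is typically Newton's method with guard digits. A sketch using Python's decimal module, purely illustrative and not the NumSharp code:

```python
from decimal import Decimal, getcontext

def decimal_sqrt(x: Decimal) -> Decimal:
    # Newton's method: iterate r = (r + x/r) / 2 until r stops changing.
    if x < 0:
        raise ValueError("sqrt of negative Decimal")
    if x == 0:
        return Decimal(0)
    getcontext().prec += 2          # guard digits during iteration
    r = x / 2 if x > 1 else Decimal(1)
    for _ in range(200):
        last = r
        r = (r + x / r) / 2
        if r == last:
            break
    getcontext().prec -= 2
    return +r                        # round back to working precision

assert abs(decimal_sqrt(Decimal(16)) - 4) < Decimal("1e-25")
assert abs(decimal_sqrt(Decimal(2)) ** 2 - 2) < Decimal("1e-25")
```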
TensorEngine Abstract Methods
Compare, NotEqual, Less, LessEqual, Greater, GreaterEqual, BitwiseAnd, BitwiseOr, BitwiseXor, LeftShift, RightShift, Power(NDArray, NDArray), FloorDivide, Truncate, Reciprocal, Square, Cbrt, Invert, Deg2Rad, Rad2Deg, IsInf, ReduceCumMul, Any, NanSum, NanProd, NanMin, NanMax, BooleanMask
DefaultEngine Dispatch Files (IL kernel integration)
| File | Functions |
|---|---|
| DefaultEngine.BinaryOp.cs | np.add, np.subtract, np.multiply, np.divide, np.mod, np.power |
| DefaultEngine.BitwiseOp.cs | np.bitwise_and, np.bitwise_or, np.bitwise_xor, `&`, `\|`, `^` |
| DefaultEngine.CompareOp.cs | np.equal, np.not_equal, np.less, np.greater, np.less_equal, np.greater_equal |
| DefaultEngine.ReductionOp.cs | np.sum, np.prod, np.min, np.max, np.mean, np.std, np.var, np.argmax, np.argmin |
| DefaultEngine.UnaryOp.cs | np.abs, np.negative, np.sqrt, np.sin, np.cos, np.exp, np.log, np.sign, etc. |
Implementation Files
Default.Any.cs, Default.BooleanMask.cs, Default.Reduction.Nan.cs, Shape.Broadcasting.cs
New NumPy Functions (35)
NaN-Aware Reductions (7)
| Function | Description |
|---|---|
| np.nansum | Sum ignoring NaN |
| np.nanprod | Product ignoring NaN |
| np.nanmin | Minimum ignoring NaN |
| np.nanmax | Maximum ignoring NaN |
| np.nanmean | Mean ignoring NaN |
| np.nanvar | Variance ignoring NaN |
| np.nanstd | Standard deviation ignoring NaN |
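These match NumPy's NaN-aware semantics; a quick Python illustration of the reference behavior (NumSharp's equivalent np.* calls mirror this):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

assert np.isnan(np.sum(a))       # plain reductions propagate NaN
assert np.nansum(a) == 4.0       # nansum treats NaN as 0
assert np.nanprod(a) == 3.0      # nanprod treats NaN as 1
assert np.nanmax(a) == 3.0
assert np.nanmean(a) == 2.0      # mean over the 2 non-NaN values only
```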
Math Operations (8)
| Function | Description |
|---|---|
| np.cbrt | Cube root |
| np.floor_divide | Integer division |
| np.reciprocal | Element-wise 1/x |
| np.trunc | Truncate to integer |
| np.invert | Bitwise NOT |
| np.square | Element-wise square |
| np.cumprod | Cumulative product |
| np.count_nonzero | Count non-zero elements |
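The rounding conventions here are worth noting: floor_divide rounds toward negative infinity while trunc rounds toward zero. Illustrated with the NumPy reference behavior these functions follow:

```python
import numpy as np

# floor_divide rounds toward -inf; trunc rounds toward 0.
assert np.array_equal(np.floor_divide([7, -7], 2), [3, -4])
assert np.array_equal(np.trunc([2.7, -2.7]), [2.0, -2.0])

assert np.array_equal(np.cumprod([1, 2, 3, 4]), [1, 2, 6, 24])
assert np.count_nonzero([0, 1, 0, 5]) == 2
assert np.isclose(np.cbrt(27.0), 3.0)
assert np.array_equal(np.reciprocal([0.5, 4.0]), [2.0, 0.25])
```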
Bitwise & Trigonometric (4)
| Function | Description |
|---|---|
| np.left_shift | Bitwise left shift |
| np.right_shift | Bitwise right shift |
| np.deg2rad | Degrees to radians |
| np.rad2deg | Radians to degrees |
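A quick check of the reference semantics (shifts are element-wise on integers; angle conversions scale by pi/180):

```python
import numpy as np

assert np.array_equal(np.left_shift([1, 2], 3), [8, 16])    # x << 3 == x * 8
assert np.array_equal(np.right_shift([8, 16], 2), [2, 4])   # x >> 2 == x // 4
assert np.isclose(np.deg2rad(180.0), np.pi)
assert np.isclose(np.rad2deg(np.pi / 2), 90.0)
```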
Logic & Validation (4) - Previously returned null
| Function | Description |
|---|---|
| np.isnan | Test element-wise for NaN |
| np.isfinite | Test element-wise for finiteness |
| np.isinf | Test element-wise for infinity |
| np.isclose | Element-wise comparison within tolerance |
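The reference semantics these align with: the three predicates partition floats into NaN, infinite, and finite, and isclose compares with the tolerance formula |a - b| <= atol + rtol * |b| (NumPy defaults rtol=1e-05, atol=1e-08):

```python
import numpy as np

x = np.array([1.0, np.inf, np.nan])
assert np.array_equal(np.isnan(x),    [False, False, True])
assert np.array_equal(np.isinf(x),    [False, True, False])
assert np.array_equal(np.isfinite(x), [True, False, False])

# isclose: |a - b| <= atol + rtol * |b|
assert np.isclose(1.0, 1.0 + 1e-9)
assert not np.isclose(1.0, 1.1)
```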
Operators (2) - Previously returned null
| Operator | Description |
|---|---|
| operator & | Bitwise/logical AND with broadcasting |
| operator \| | Bitwise/logical OR with broadcasting |
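Broadcasting applies to these operators just as to arithmetic ones; the NumPy reference behavior (which the C# `&`/`|` operators mirror on boolean arrays):

```python
import numpy as np

a = np.array([[True], [False]])   # shape (2, 1)
b = np.array([True, False])       # shape (2,) -> broadcasts to (2, 2)

assert np.array_equal(a & b, [[True, False], [False, False]])
assert np.array_equal(a | b, [[True, True],  [True, False]])
```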
Comparison Functions (6) - New named AP...
v0.4.0-alpha1
NumSharp v0.4.0-alpha1
See #538 for information.
NuGet
No NuGet release for this preview version.
What's Changed
- Enabled NDArray boolean comparisons for LessThan, GreaterThan, and … by @Rikki-Tavi in #395
- Added data types in np.frombuffer in #425
- F# in README by @dsyme in #432
- Added support for user defined decimal precision for np.around() and TensorEngine.Round() by @shashi4u in #453
- NumSharp.Bitmap support for odd sized bitmaps with odd sized bytes per pixel by @AmbachtIT in #460
- Fixing the consistency of seed in the random choice. by @bojake in #489
- (Logics):add high performance logical AND function with axis an… by @zhuoshui-AI in #525
- Upgrade target frameworks to net8.0;net10.0 by @Nucs in #532
- Add GitHub Actions CI/CD pipeline by @Nucs in #534
- Fix: skip Bitmap tests on non-Windows CI by @Nucs in #535
- docs: relocate website to docs/website/ by @Nucs in #557
- docs: move docfx_project to docs/website-src by @Nucs in #558
- feat(docs): upgrade to DocFX v2 modern template by @Nucs in #562
New Contributors
Many contributors' merges were piggybacked onto this release and were probably not entirely intentional.
- @Rikki-Tavi made their first contribution in #395
- @dsyme made their first contribution in #432
- @shashi4u made their first contribution in #453
- @AmbachtIT made their first contribution in #460
- @bojake made their first contribution in #489
- @zhuoshui-AI made their first contribution in #525
Full Changelog: 0.20.5...v0.4.0-alpha1
v0.20.5
- NDArray.Indexing: Rewrite of the getter mechanism, NDArray getter now supports combining 'NDArray, Slice, string, int, bool' in the same slice.
- NDArray.Indexing: Added support for indexing with an unmanaged array of indices: ndarray[int* pointer, int length], nd.GetData(int*, int), etc.
- NDArray.Broadcasting: fixed multiple issues.
- NDArray.Slicing: Added support for slicing a broadcasted NDArray.
- Added NPTypeCode.Float as an alias to NPTypeCode.Single
- Extending NPY and fixing NPZ (Thanks Matthew Moloney)
- Added NDArray.AsOrMakeGeneric()
- Added np.nonzero, np.maximum, np.minimum, np.all, np.any
- Arrays.cs: performance-optimized Arrays.Slice
- NDArray.FromMultiDimArray: Fixed #367
- np.clip: Added @out argument
- Added np.array(IEnumerable) and np.array(IEnumerable, int size) which is faster.
- np.broadcast_to: added additional overloads.
v0.20.4
Changes
- Added np.transpose, np.swapaxes, ndarray.T, np.moveaxis, np.rollaxis, np.size, np.copyto.
- Added np.ceil, np.arccos, np.floor, np.modf, np.square, np.round, np.sign, np.arcsin, np.arctan.
- Added np.random.*: beta, gamma, bernoulli, binomial, lognormal, normal, poisson, chisquare, geometric.
- Added support for np.newaxis and ... (ellipsis) in a slice.
- Performance optimization for np.array, np.linspace, the Randomizer class, and all np.random.* methods.
Bug Fixes
- ndarray.view was copying when it shouldn't.
- Fixed a couple of ambiguous methods.
Obsoletion
nd.Unsafe.Shape is now obsolete in favor of nd.Shape.
Special thanks to @henon and @deepakkumar1984 for PRing a great portion of this release.
v0.20.3
v0.10-slice
release signed assembly v0.10.6.
v0.7 works with TensorFlow.NET
v0.7-tensorflow Merge branch 'master' of https://github.com/Oceania2018/NumSharp
v0.6 Supports LAPACK
Merge pull request #162 from dotChris90/master Extend doc and generated new API docs
v0.5-dtype
release v0.5
v0.4
released v0.4