# NumSharp 0.41.0-prerelease
This prerelease introduces the IL Kernel Generator, a complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation. This delivers large performance improvements, comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.
## TL;DR
- Operators `==`, `!=`, `<`, `>`, `<=`, `>=`, `&`, `|`, `^` rewritten
- New named comparison API: `np.equal()`, `np.not_equal()`, `np.less()`, `np.greater()`, `np.less_equal()`, `np.greater_equal()`
- New logical functions: `np.logical_and()`, `np.logical_or()`, `np.logical_not()`, `np.logical_xor()`
- `&`, `|`, `^` for generic arrays (resolves `NDArray<bool>` ambiguity)
- No `Parallel.*`: all computation is single-threaded

## Contents
```shell
dotnet add package NumSharp
```

## Summary
### IL Kernel Generator
Runtime IL generation via `System.Reflection.Emit.DynamicMethod` replaces static Regen templates.

#### Kernel Files (27 new files)
- `ILKernelGenerator.cs` - core infrastructure, SIMD detection (Vector128/256/512)
- `ILKernelGenerator.Binary.cs` - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
- `ILKernelGenerator.MixedType.cs` - mixed-type ops with type promotion
- `ILKernelGenerator.Unary.cs` - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
- `ILKernelGenerator.Comparison.cs` - ==, !=, <, >, <=, >= returning bool arrays
- `ILKernelGenerator.Reduction.cs` - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
- `ILKernelGenerator.Reduction.Axis.Simd.cs` - AVX2 gather for axis reductions
- `ILKernelGenerator.Scan.cs` - CumSum, CumProd with SIMD
- `ILKernelGenerator.Shift.cs` - LeftShift, RightShift
- `ILKernelGenerator.MatMul.cs` - cache-blocked SIMD matrix multiply
- `ILKernelGenerator.Clip.cs`, `.Modf.cs`, `.Masking.cs` - specialized ops

#### Execution Paths
#### Infrastructure
- `KernelKey.cs`, `KernelOp.cs`, `KernelSignatures.cs` - kernel dispatch
- `SimdMatMul.cs` - SIMD matrix multiplication helpers
- `TypeRules.cs` - NEP 50 type promotion rules

### Architecture
Clean separation of concerns:
- `ILKernelGenerator`, `DefaultEngine`, `TensorEngine`: `np.*` ops route through abstract methods
- `Shape.Broadcasting` in the `Shape` struct (456 lines)
- `ArgMin`/`ArgMax`
- `DecimalMath`

### Single-Threaded Execution
All computation is single-threaded, with no `Parallel.For` usage.

### Broadcasting External to Engine
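For reference, the broadcasting logic described below is pure shape arithmetic. A minimal Python sketch of the NumPy rule (the function name is illustrative, not NumSharp's API):

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(a, b):
    """Compute the broadcast result of two shapes, NumPy-style:
    align trailing dimensions; each pair must match or contain a 1."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))

print(broadcast_shape((3, 1), (4,)))      # (3, 4)
print(np.broadcast_shapes((3, 1), (4,)))  # (3, 4), NumPy's own answer
```

Because the rule needs only the shapes, it can live in the `Shape` struct with no engine dependency.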
Broadcasting logic (`Shape.Broadcasting.cs`) is pure shape math with no engine dependencies:

- `Shape.AreBroadcastable()` - check if shapes can broadcast
- `Shape.Broadcast()` - compute broadcast result shape and strides
- `Shape.ResolveReturnShape()` - determine output shape for operations
- `DefaultEngine` delegates all broadcasting to `Shape.*` methods

### DecimalMath (#588)
Replaced the embedded third-party `DecimalEx.cs` (~1061 lines) with a minimal internal `DecimalMath.cs` (~403 lines) containing only the functions NumSharp actually uses: Sqrt, Pow, ATan2, Exp, Log, Log10, ATan.

### TensorEngine Abstract Methods
`Compare`, `NotEqual`, `Less`, `LessEqual`, `Greater`, `GreaterEqual`, `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`, `LeftShift`, `RightShift`, `Power(NDArray, NDArray)`, `FloorDivide`, `Truncate`, `Reciprocal`, `Square`, `Cbrt`, `Invert`, `Deg2Rad`, `Rad2Deg`, `IsInf`, `ReduceCumMul`, `Any`, `NanSum`, `NanProd`, `NanMin`, `NanMax`, `BooleanMask`

### DefaultEngine Dispatch Files (IL kernel integration)
- `DefaultEngine.BinaryOp.cs` - `np.add`, `np.subtract`, `np.multiply`, `np.divide`, `np.mod`, `np.power`
- `DefaultEngine.BitwiseOp.cs` - `np.bitwise_and`, `np.bitwise_or`, `np.bitwise_xor`, `&`, `|`, `^`
- `DefaultEngine.CompareOp.cs` - `np.equal`, `np.not_equal`, `np.less`, `np.greater`, `np.less_equal`, `np.greater_equal`
- `DefaultEngine.ReductionOp.cs` - `np.sum`, `np.prod`, `np.min`, `np.max`, `np.mean`, `np.std`, `np.var`, `np.argmax`, `np.argmin`
- `DefaultEngine.UnaryOp.cs` - `np.abs`, `np.negative`, `np.sqrt`, `np.sin`, `np.cos`, `np.exp`, `np.log`, `np.sign`, etc.

### Implementation Files
`Default.Any.cs`, `Default.BooleanMask.cs`, `Default.Reduction.Nan.cs`, `Shape.Broadcasting.cs`

## New NumPy Functions (35)
### NaN-Aware Reductions (7)
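These follow NumPy's NaN-skipping semantics, for example:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
print(np.nansum(a))   # 4.0  (NaN treated as 0)
print(np.nanmax(a))   # 3.0  (NaN ignored)
print(np.nanmean(a))  # 2.0  (mean over the 2 non-NaN values)
```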
`np.nansum`, `np.nanprod`, `np.nanmin`, `np.nanmax`, `np.nanmean`, `np.nanvar`, `np.nanstd`

### Math Operations (8)
`np.cbrt`, `np.floor_divide`, `np.reciprocal`, `np.trunc`, `np.invert`, `np.square`, `np.cumprod`, `np.count_nonzero`

### Bitwise & Trigonometric (4)
`np.left_shift`, `np.right_shift`, `np.deg2rad`, `np.rad2deg`

### Logic & Validation (4) - previously returned `null`

`np.isnan`, `np.isfinite`, `np.isinf`, `np.isclose`

### Operators (2) - previously returned `null`

`operator &`, `operator |`

### Comparison Functions (6) - new named API
- `np.equal` (`==`)
- `np.not_equal` (`!=`)
- `np.less` (`<`)
- `np.greater` (`>`)
- `np.less_equal` (`<=`)
- `np.greater_equal` (`>=`)

### Logical Functions (4) - new named API
`np.logical_and`, `np.logical_or`, `np.logical_not`, `np.logical_xor`

### New Overloads
- `np.power(array, array)`
- `np.repeat(array, NDArray)`
- `np.argmax/argmin(axis, keepdims)`
- `np.convolve`

## Critical Bug Fixes
### Behavioral Fixes
- `np.negative()` (`if val > 0` → `val = -val`)
- `np.positive()` (`abs()`)
- `np.unique()`
- `np.dot(1D, 2D)` (no longer throws `NotSupportedException`)
- `np.dot()` with non-contiguous inputs
- `np.matmul()` broadcast
- `np.linspace()`: `float32` for float inputs, `float64` default
- `np.arange()` with `start >= stop`
- `np.searchsorted()` (`int`)
- `np.shuffle()` `passes` parameter
- `np.moveaxis()`
- `np.argsort()`
- `np.intp`: `int` (always 32-bit) → `nint` (native-sized integer)
- `np.uintp`: `nuint` (native unsigned)
- `np.LogicalNot()`

### Return Type Fixes
- `np.argmax()`/`np.argmin()`: `int` → `long` (large array support)
- `np.abs()`

### Empty Array Handling
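For reference, the NumPy semantics these fixes align to:

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)   # NumPy warns on empty means
    print(np.mean([]))                        # nan
    print(np.mean(np.zeros((0, 3)), axis=0))  # [nan nan nan]
    print(np.mean(np.zeros((0, 3)), axis=1))  # []
    print(np.std([5.0], ddof=1))              # nan, since ddof >= size
```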
- `np.mean([])` → `NaN`
- `np.mean(zeros((0,3)), axis=0)` → `[NaN, NaN, NaN]`
- `np.mean(zeros((0,3)), axis=1)` → `[]`
- `np.std`/`np.var` on a single element → `NaN` with `ddof >= size`

### keepdims Fixes
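The target behavior, in NumPy terms:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
print(np.sum(a, axis=1).shape)                 # (2,)   reduced axis dropped
print(np.sum(a, axis=1, keepdims=True).shape)  # (2, 1) reduced axis kept as size 1
print(np.sum(a, axis=1, keepdims=True))        # [[ 3] [12]]
```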
All reduction functions now properly preserve dimensions when `keepdims=True`: `np.sum`, `np.prod`, `np.mean`, `np.std`, `np.var`, `np.min`, `np.max`, `np.argmin`, `np.argmax`.

## Rewritten Functions (IL kernel migration)
`np.all()`, `np.any()`, `np.sum()`, `np.cumsum()`, `np.cumprod()`, `np.nonzero()`, `np.clip()`

### Math Functions (IL migration)
All migrated from Regen templates to IL kernels with SIMD:
`sin`, `cos`, `tan`, `sinh`, `cosh`, `tanh`, `arcsin`, `arccos`, `arctan`, `arctan2`, `exp`, `exp2`, `expm1`, `log`, `log2`, `log10`, `log1p`, `sqrt`, `abs`, `sign`, `floor`, `ceil`, `round`

## Operator Rewrites
### Comparison Operators (==, !=, <, >, <=, >=)
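These operators now return element-wise boolean arrays, matching NumPy:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a > 1)             # [False  True  True]
print(np.greater(a, 1))  # identical to the operator form
print((a == 2).dtype)    # bool
```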
- Routed through `TensorEngine` with IL kernels
- No longer return `false` (scalar operands)
- (`object op NDArray`)

### Bitwise Operators (&, |, ^)
- Previously returned `null`
- `NDArray<T>` typed operators

### Implicit Scalar Conversion
- `(int)ndarray_float64` would fail
- `Converts.ChangeType` for cross-dtype conversion

## Boolean Indexing Rewrite
Complete rewrite with NumPy-aligned behavior:
### Two Cases Supported
- `arr[mask]` where `mask.shape == arr.shape` → element-wise selection
- `arr[mask]` where `mask` is 1D and `mask.shape[0] == arr.shape[0]` → axis-0 selection

### SIMD Fast Path
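In NumPy terms, the two supported cases (which the SIMD fast path below accelerates) look like:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Case 1: mask.shape == arr.shape -> flattened element-wise selection
print(a[a > 2])                    # [3 4 5]

# Case 2: 1-D mask with len == arr.shape[0] -> axis-0 (row) selection
print(a[np.array([True, False])])  # [[0 1 2]]
```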
- `BooleanMaskFastPath` for contiguous arrays
- `CountTrue(bool*, int)` - SIMD count of true values
- `CopyMasked<T>(src, mask, dest, size)` - SIMD masked copy

## Slicing Improvements
### Broadcast Array Handling
`cumsum` and axis reductions now work correctly on broadcast arrays.

### Empty Slice Handling
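The empty-slice behavior matches NumPy's slice clamping:

```python
import numpy as np

a = np.arange(10)
print(a[100:200])        # []  out-of-range bounds clamp to an empty slice
print(a[100:200].shape)  # (0,)
```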
`a[100:200]` on a 10-element array now returns a proper empty array.

### Contiguous Optimization
`offset=0`, `IsSliced=false` for contiguous slices.

## Performance Improvements
### Code Reduction
#### Massive File Deletions
- `Default.MatMul.2D2D.cs`
- `Default.Dot.NDMD.cs`

#### Deleted Files (76)
- Per-type arithmetic files (`Default.Add.{Type}.cs`, etc.)
- Per-type comparison files (`Default.Equals.{Type}.cs`, etc.)

## Infrastructure Changes
### Memory Allocation
- `Marshal.AllocHGlobal` → `NativeMemory.Alloc`
- `Marshal.FreeHGlobal` → `NativeMemory.Free`
- `AllocationType.AllocHGlobal` → `AllocationType.Native`
- `StackedMemoryPool` migrated to NativeMemory

### DefaultEngine
- `ILKernelGenerator` is a static class (internal to DefaultEngine)
- No `Parallel.For`

### Math Functions
All migrated from Regen templates to `ExecuteUnaryOp`:

- No `DecimalMath` dependency for most operations

## TensorEngine Extensions
New abstract methods (28 total):
- `Compare`, `NotEqual`, `Less`, `LessEqual`, `Greater`, `GreaterEqual`
- `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`, `LeftShift`, `RightShift`
- `Power(NDArray, NDArray)`, `FloorDivide`, `Truncate`, `Reciprocal`, `Square`, `Cbrt`, `Invert`, `Deg2Rad`, `Rad2Deg`, `IsInf`
- `ReduceCumMul`, `Any`, `NanSum`, `NanProd`, `NanMin`, `NanMax`
- `BooleanMask`

### IKernelProvider Methods
- `CountTrue(bool*, int)` - SIMD true count
- `CopyMasked<T>` - SIMD masked copy
- `Variance<T>`, `StandardDeviation<T>` - SIMD two-pass
- `NanSum/Prod/Min/Max` for float/double
- `FindNonZeroStrided<T>` - strided nonzero detection

## API Alignment
- `np.random.random()`: alias of `random_sample()`
- `np.random.standard_normal()`
- `np.random.*` params: `size`, `a`, `b`, `p`, `d0` (NumPy names)
- `np.random.randn/rand/normal`: `Shape` parameter
- `np.minimum/maximum`: `dtype` parameter (not `outType`)
- `np.modf()`

## New Test Files (68)
### Kernel Tests (34)
`BinaryOpTests`, `UnaryOpTests`, `ComparisonOpTests`, `ReductionOpTests`, `AxisReductionSimdTests`, `NonContiguousTests`, `SlicedArrayOpTests`, `NanReductionTests`, `VarStdComprehensiveTests`, `ArgMaxArgMinComprehensiveTests`, `CumSumComprehensiveTests`, `BitwiseOpTests`, `ShiftOpTests`, `DtypeCoverageTests`, `DtypePromotionTests`, `EdgeCaseTests`, `BattleProofTests`, `SimdOptimizationTests`, and more.

### NumPy Ported Tests (8)
`ArgMaxArgMinEdgeCaseTests`, `ClipEdgeCaseTests`, `ClipNDArrayTests`, `CumSumEdgeCaseTests`, `ModfEdgeCaseTests`, `NonzeroEdgeCaseTests`, `PowerEdgeCaseTests`, `VarStdEdgeCaseTests`

### Linear Algebra Battle Tests (4)
`np.dot.BattleTest` (195 tests), `np.matmul.BattleTest` (106 tests), `np.outer.BattleTest` (88 tests)

### Boolean Indexing Battle Tests (76 tests)
`BooleanIndexing.BattleTests.cs` - comprehensive NumPy 2.4.2 alignment covering same-shape masks, axis-0 selection, partial shape match, 0-D indexing, mask assignment, empty masks, shape mismatch errors, non-contiguous arrays, all dtypes, NaN/Infinity, logical operations.

### Random Sampling Tests
`np.random.shuffle.NumPyAligned.Test.cs` (133 tests)

## Breaking Changes
None. This is a drop-in replacement with improved performance and NumPy compatibility.
## Known Issues (OpenBugs)
52 tests marked as `[OpenBugs]` are excluded from CI.

## Installation
Or via Package Manager:
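The command itself did not survive extraction; presumably it is the standard NuGet Package Manager Console form, with the version taken from this release's title (a sketch, not copied from the release):

```powershell
Install-Package NumSharp -Version 0.41.0-prerelease
```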
## Testing
## Feedback
This is a prerelease. Please report any issues at:
https://github.com/SciSharp/NumSharp/issues
**Full Changelog**: See CHANGES.md for complete documentation of all 106 commits.