
sched: add SCHED_MIC, a hybrid-core-aware scheduler derived from ULE #385

Merged: laffer1 merged 1 commit into master from sched_mic on Jun 12, 2026


laffer1 (Member) commented Jun 12, 2026

Summary

Adds SCHED_MIC, a new scheduler derived from ULE that weighs heterogeneous
CPU core classes when placing threads. ULE remains the GENERIC default; SCHED_MIC
is opt-in (options SCHED_MIC, or the sample sys/amd64/conf/MIC config).

On x86 hybrid CPUs it prefers cores in this order:

  1. P-cores / AMD 3D V-Cache CCD cores (physical)
  2. E-cores / AMD compute-CCD cores / AMD mobile "C" cores (physical)
  3. the second SMT thread of a busy core ("hyperthreaded")
  4. Intel LP-E cores (last)

The preference is a soft, tunable bias folded into the existing
cpu_search_lowest() load comparison, applied only on the placement path
(sched_pickcpu). The long-term balancer and work stealing stay class-blind, so
real-load balancing is preserved. On homogeneous hardware and non-x86 arches it
behaves like ULE (byte-identical with kern.sched.smt_busy_penalty=0).

Detection

Per-CPU classification runs at AP startup via smp_rendezvous in an SI_SUB_SMP
SYSINIT (#ifdef SCHED_MIC in mp_x86.c), stored in a new cpu_core_class[]
array that defaults to "performance" (so anything unrecognized behaves like ULE):

| Case | Method | Reliability |
|---|---|---|
| Intel P vs E | CPUID 0x1A core type | architectural |
| Intel LP-E | small core with no L3, gated by kern.sched.detect_lpe | heuristic |
| AMD X3D / C-cores | larger per-CCD L3 (CPUID 0x8000001D) preferred; symmetric/single-CCD → all perf | heuristic |
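A minimal userland sketch of the two Intel rows, assuming the caller has already confirmed the hybrid feature flag (CPUID.07H.0:EDX[15]) and read leaf 0x1A on the CPU being classified. The core-type encodings (EAX[31:24] = 0x20 for an Atom/E-core, 0x40 for a Core/P-core) are Intel's documented values; the enum values, function name, and the `has_l3` input are hypothetical and are not the identifiers used in mp_x86.c (the thread only states that CPU_CLASS_PERF is 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class values; only "perf == 1" is stated in the PR thread. */
enum mic_class { MIC_PERF = 1, MIC_EFF = 2, MIC_LP = 3 };

/* CPUID.1AH:EAX[31:24] is the core type; EAX[23:0] is the native model ID. */
#define HYBRID_CORE_TYPE(eax)	(((eax) >> 24) & 0xff)
#define HYBRID_TYPE_ATOM	0x20	/* E-core */
#define HYBRID_TYPE_CORE	0x40	/* P-core */

/*
 * Classify one Intel CPU from its leaf 0x1A EAX value.  has_l3 says whether
 * the deterministic cache leaves report an L3 for this CPU (hypothetical
 * input); detect_lpe mirrors the kern.sched.detect_lpe gate.
 */
static enum mic_class
intel_classify(uint32_t leaf1a_eax, bool has_l3, bool detect_lpe)
{
	switch (HYBRID_CORE_TYPE(leaf1a_eax)) {
	case HYBRID_TYPE_ATOM:
		/* LP-E heuristic: a small core that sees no L3 at all. */
		if (detect_lpe && !has_l3)
			return (MIC_LP);
		return (MIC_EFF);
	case HYBRID_TYPE_CORE:
	default:
		/* P-core, or an unknown/zero core type: behave like ULE. */
		return (MIC_PERF);
	}
}
```

Defaulting unknown core types to perf matches the PR's stated rule that anything unrecognized behaves like ULE.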

Tunables (kern.sched.*)

Defaults in parentheses: class_weight_eff (64), class_weight_lp (512),
smt_busy_penalty (192; 0 = stock ULE SMT behavior), prefer_compute (swaps the
AMD cache/compute CCD preference), detect_lpe, and a read-only core_class dump.
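To see how these default weights can yield the four-step preference order listed above, here is a minimal model. It assumes, hypothetically, that each weight is added as a flat cost on top of ULE's per-CPU load, which ULE scales at 256 per runnable thread; the struct, enum values, and function names are illustrative, and cpu_search_lowest()'s randomization and priority checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum mic_class { MIC_PERF = 1, MIC_EFF = 2, MIC_LP = 3 };	/* hypothetical */

/* Default tunable values quoted in the PR description. */
static int class_weight_eff = 64;
static int class_weight_lp = 512;
static int smt_busy_penalty = 192;

struct cand {
	int		load;		/* runnable threads on this CPU */
	enum mic_class	cls;
	bool		sibling_busy;	/* other SMT thread of the core is busy */
};

/* Assumed shape of the bias: a flat additive cost per class/SMT state. */
static int
class_cost(const struct cand *c)
{
	int cost = 0;

	if (c->cls == MIC_EFF)
		cost += class_weight_eff;
	else if (c->cls == MIC_LP)
		cost += class_weight_lp;
	if (c->sibling_busy)
		cost += smt_busy_penalty;
	return (cost);
}

/* Return the index of the candidate with the lowest biased load. */
static int
pick_cpu(const struct cand *cands, size_t n)
{
	int best = -1, bload = 0;

	for (size_t i = 0; i < n; i++) {
		int load = cands[i].load * 256 + class_cost(&cands[i]);

		if (best < 0 || load < bload) {
			best = (int)i;
			bload = load;
		}
	}
	return (best);
}
```

Under this model an idle E-core (64) wins over a P-core already running a thread (256), an idle P-core whose sibling is busy (192) loses to an idle E-core but beats an idle LP-E core (512), and enough queued work on a preferred core still pushes threads elsewhere, which is what makes the bias soft.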

Testing

  • Both sched_mic.c and mp_x86.c build clean under -Werror; a MINIMAL-based
    SCHED_MIC kernel links into a complete kernel ELF with all symbols resolved.
  • Intel detection validated against real Alder Lake silicon (Core i7-1260P): 8
    P-core threads → perf, 8 E-cores → eff, no LP-E (E-cores keep L3, so the LP
    heuristic does not misfire).
  • Not yet exercised on hardware: LP-E (needs Meteor Lake / Core Ultra), AMD X3D,
    AMD C-cores, and live placement behavior (needs a booted MIC kernel).

Notes / open questions

  • The long-term balancer is class-blind by default, so under sustained saturation
    it spreads work off the preferred cores to equalize load, softening the
    placement preference. Flipping one sched_lowest() argument in
    sched_balance_group makes the balancer reinforce packing instead — a
    hardware-dependent tradeoff. See scheduler.md for details.
  • scheduler.md is included at the repo root as a design doc; drop it if you'd
    rather it not live in the tree.

🤖 Generated with Claude Code

Summary by Sourcery

Introduce a new optional hybrid-core-aware scheduler (SCHED_MIC) derived from ULE and integrate x86 core classification, tunables, and documentation to prefer performance cores while remaining compatible with existing behavior.

New Features:

  • Add the SCHED_MIC scheduler as an alternative to ULE with hybrid-core-aware thread placement.
  • Classify x86 CPUs into performance, efficiency, and low-power classes using CPUID-based detection for use by the scheduler.
  • Expose new sysctls and a sample amd64 kernel configuration to enable and tune SCHED_MIC behavior.
  • Add a top-level scheduler.md design document describing SCHED_MIC behavior and configuration.

Enhancements:

  • Extend x86 CPUID cache enumeration helpers and SMP topology structures to support hybrid core classification without changing default schedulers.
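The names of the new CPUID_CACHE_* helpers are not shown on this page, so the following is a hedged sketch with hypothetical macro names of the standard size computation they enable. Intel leaf 4 and AMD leaf 0x8000001D share this layout: EAX[4:0] is the cache type (0 ends the list), EAX[7:5] the level, EBX holds ways, physical partitions, and line size, and ECX the number of sets, each field stored as value minus one.

```c
#include <stdint.h>

/* Field layout shared by Intel leaf 4 and AMD leaf 0x8000001D. */
#define CACHE_TYPE(eax)		((eax) & 0x1f)		/* 0 = no more caches */
#define CACHE_LEVEL(eax)	(((eax) >> 5) & 0x7)
#define CACHE_WAYS(ebx)		((((ebx) >> 22) & 0x3ff) + 1)
#define CACHE_PARTS(ebx)	((((ebx) >> 12) & 0x3ff) + 1)
#define CACHE_LINE(ebx)		(((ebx) & 0xfff) + 1)
#define CACHE_SETS(ecx)		((uint64_t)(ecx) + 1)

/* Size in bytes of the cache described by one subleaf's EBX/ECX. */
static uint64_t
cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
	return ((uint64_t)CACHE_WAYS(ebx) * CACHE_PARTS(ebx) *
	    CACHE_LINE(ebx) * CACHE_SETS(ecx));
}
```

For example, a 16-way L3 with 64-byte lines and 32768 sets works out to 16 × 64 × 32768 bytes = 32 MiB; a classifier would walk subleaves until CACHE_TYPE() returns 0 and keep the level-3 entry.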

SCHED_MIC is a copy of the ULE scheduler that weighs heterogeneous CPU
core classes when placing threads. On x86 CPUs with hybrid topologies it
prefers, in order: P-cores / AMD 3D V-Cache CCD cores, then E-cores / AMD
compute-CCD / AMD mobile "C" cores, then the second SMT thread of a busy
core, then Intel LP-E cores last. The preference is a soft, tunable bias
folded into the existing cpu_search_lowest() load comparison, applied only
on the placement path (sched_pickcpu); the long-term balancer and work
stealing stay class-blind. On homogeneous hardware, and on non-x86
architectures, SCHED_MIC behaves like ULE.

ULE remains the default; SCHED_MIC is opt-in (options SCHED_MIC, or the
sample sys/amd64/conf/MIC config).

Core class is detected per-CPU at AP startup via an smp_rendezvous in an
SI_SUB_SMP SYSINIT (#ifdef SCHED_MIC in mp_x86.c) and stored in a new
cpu_core_class[] array, defaulting to "performance" so unrecognized and
non-hybrid CPUs behave like ULE:

  - Intel P vs E: CPUID 0x1A core type (architectural).
  - Intel LP-E:   small core with no L3 (heuristic, kern.sched.detect_lpe).
  - AMD X3D / Cx: larger per-CCD L3 via CPUID 0x8000001D is preferred
                  (heuristic); symmetric/single-CCD parts stay all-perf.

Tunables under kern.sched.*: class_weight_eff, class_weight_lp,
smt_busy_penalty (0 = stock ULE SMT behavior), prefer_compute (swap AMD
cache/compute CCD preference), detect_lpe, and a read-only core_class dump.

The Intel detection path was validated against real Alder Lake silicon
(Core i7-1260P): 8 P-core threads classed perf, 8 E-cores classed eff,
no LP-E (E-cores retain L3, so the LP heuristic does not misfire).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

sourcery-ai Bot commented Jun 12, 2026


Reviewer's Guide

Introduces SCHED_MIC, a new scheduler forked from ULE that biases thread placement based on per-CPU hybrid core classes, with x86-specific CPUID-based detection and new sysctls, while behaving like ULE on homogeneous or non-x86 systems (ULE itself is unchanged). The main implementation adds SCHED_MIC’s core-class-aware placement logic to a copy of the ULE code path in a new sched_mic.c file, wires in per-CPU core classification via mp_x86.c and smp.h, and exposes tunables and debug output via new kern.sched.* sysctls and a scheduler.md design doc.

Flow diagram for SCHED_MIC core classification and placement

```mermaid
flowchart TD
  Boot["AP startup (SI_SUB_SMP)"] --> MicClassify[mic_classify]
  MicClassify --> Rendezvous[smp_rendezvous]
  Rendezvous --> MicProbe[mic_probe_cpu]
  MicProbe --> CpuCoreClass["populate cpu_core_class per CPU (Intel/AMD heuristics)"]

  subgraph Runtime_Placement
    SchedAdd[sched_add] --> SchedPickcpu[sched_pickcpu]
    SchedPickcpu --> SchedLowest["sched_lowest (class_aware=1)"]
    SchedLowest --> CpuSearch[cpu_search_lowest]
    CpuSearch --> ClassCost[sched_class_cost]
    ClassCost -->|uses| CoreClassArray[cpu_core_class]
    CpuSearch --> ChosenCpu["return target CPU id"]
  end

  CpuCoreClass -->|read by| ClassCost
```

File-Level Changes

Change: Add SCHED_MIC scheduler as a ULE-derived, hybrid-core-aware scheduler implementation.
  • Introduce new sys/kern/sched_mic.c by copying sched_ule.c and injecting hybrid core class awareness into cpu_search_lowest()/sched_pickcpu via sched_class_cost and related helpers.
  • Add hybrid-core placement tunables (class_weight_eff, class_weight_lp, smt_busy_penalty, prefer_compute, detect_lpe, core_class) under kern.sched.*, and wire SCHED_MIC into sys/conf/options, files, NOTES, and a sample amd64 MIC kernel config.
  • Keep long-term load balancer and work stealing class-blind while making only the placement path (sched_pickcpu) class-aware, ensuring behavior matches ULE on non-hybrid systems when smt_busy_penalty=0.
  Files: sys/kern/sched_mic.c, sys/conf/options, sys/conf/files, sys/conf/NOTES, sys/amd64/conf/MIC

Change: Classify x86 CPUs into hybrid core classes for SCHED_MIC using CPUID and cache enumeration.
  • Add cpu_core_class[] array and CPU_CLASS_* constants to describe performance, efficiency, and LP core classes, defaulting all CPUs to performance class for non-hybrid behavior.
  • Implement mic_probe_cpu and mic_classify SYSINIT in mp_x86.c (under #ifdef SCHED_MIC) to probe per-CPU hybrid core type and L3 size via CPUID 0x1A and deterministic cache leaves, then derive per-CPU classes for Intel hybrid (P/E/LP-E) and AMD X3D / compute CCD / C-cores.
  • Extend specialreg.h with CPUID_CACHE_* helper macros and CPUID_HYBRID_NATIVE_MODEL_MASK to parse cache descriptors and hybrid model IDs used by the classifier.
  Files: sys/sys/smp.h, sys/x86/x86/mp_x86.c, sys/x86/include/specialreg.h

Change: Document SCHED_MIC’s design and configuration and provide a dedicated kernel config.
  • Add scheduler.md at the repository root describing SCHED_MIC’s goals, core classes, placement algorithm, detection heuristics, and tunables for reviewers and future maintainers.
  • Introduce a new amd64 kernel config MIC that enables options SCHED_MIC and any required settings to build a MIC kernel.
  • Clarify that ULE remains the default scheduler and SCHED_MIC is opt-in, preserving existing behavior by default.
  Files: scheduler.md, sys/amd64/conf/MIC

Possibly linked issues

  • #0: They implement the requested hybrid-core-aware ULE-derived scheduler, including Intel/AMD core classification and tunable core-class placement preferences.


gemini-code-assist Bot left a comment

Code Review

This pull request introduces SCHED_MIC, a hybrid-core-aware scheduler derived from ULE that optimizes thread placement on x86 CPUs with heterogeneous cores by weighing four core priority classes. Feedback on the implementation highlights a critical issue where the global cpu_core_class array is not explicitly initialized, potentially leaving CPUs with an invalid default class of 0 instead of CPU_CLASS_PERF (1) during early returns or AMD core classification. It is recommended to initialize the array to CPU_CLASS_PERF at the start of mic_classify.


Comment thread sys/x86/x86/mp_x86.c (lines +1031 to +1039):

```c
static void
mic_classify(void *arg __unused)
{
	uint64_t maxl3;
	int i;
	bool amd, intel, asymmetric;

	if (mp_ncpus <= 1)
		return;			/* defaults: every CPU is class perf. */
```

Priority: high

The cpu_core_class array is defined globally but is not explicitly initialized to CPU_CLASS_PERF (1) before classification. Since global arrays in C default to 0 (which is not a valid class constant, as CPU_CLASS_PERF is 1), any early returns in mic_classify (e.g., if mp_ncpus <= 1, if the vendor is neither Intel nor AMD, or if the CPU has symmetric L3) will leave the CPUs with a class of 0. Additionally, the AMD classification path only sets non-preferred cores to CPU_CLASS_EFF, leaving the preferred cores at their default value (which would remain 0 instead of CPU_CLASS_PERF).

To ensure correct and robust default behavior across all platforms and vendors, explicitly initialize the entire cpu_core_class array to CPU_CLASS_PERF at the beginning of mic_classify.

```c
static void
mic_classify(void *arg __unused)
{
	uint64_t maxl3;
	int i;
	bool amd, intel, asymmetric;

	for (i = 0; i < MAXCPU; i++)
		cpu_core_class[i] = CPU_CLASS_PERF;

	if (mp_ncpus <= 1)
		return;			/* defaults: every CPU is class perf. */
```

laffer1 (Member, Author) replied:

Thanks, but cpu_core_class is not default-zero; it's statically initialized to CPU_CLASS_PERF for every element at its definition in sys/kern/sched_mic.c:

```c
uint8_t cpu_core_class[MAXCPU] = { [0 ... MAXCPU - 1] = CPU_CLASS_PERF };
```

(This file holds the definition; mp_x86.c only sees the extern, which is likely why it looked uninitialized here.)

Because of that initializer, no path can leave a CPU at class 0:

  • Early returns (mp_ncpus <= 1, non-Intel/AMD vendor, no L3, symmetric L3) leave all CPUs at CPU_CLASS_PERF; that's the intended "behave like ULE" default, and the in-function comments say so.
  • The AMD path only needs to demote the smaller-L3 die to CPU_CLASS_EFF; the preferred die stays CPU_CLASS_PERF from the initializer by design.

So there's no uninitialized-class bug here; an explicit re-init in mic_classify() would be redundant with the static initializer.
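For readers unfamiliar with the `[first ... last]` designator, here is a standalone userland sketch of the same construct. MAXCPU is given a hypothetical value (the kernel's value is per-architecture), and the range designator is a GNU C extension accepted by GCC and Clang, not ISO C.

```c
#include <stdint.h>

#define MAXCPU		256	/* hypothetical; the kernel value differs per arch */
#define CPU_CLASS_PERF	1

/*
 * GNU range designator: every element is set to CPU_CLASS_PERF at compile
 * time, so the array lives in .data with no runtime loop and never holds 0,
 * even before any classification code runs.
 */
uint8_t cpu_core_class[MAXCPU] = { [0 ... MAXCPU - 1] = CPU_CLASS_PERF };
```

The spaces around `...` matter: GCC's documentation notes that without them the designator can be mis-parsed when the bounds are integer literals.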

github-actions Bot commented Jun 12, 2026

Claude finished @laffer1's task


Code Review for SCHED_MIC Scheduler

Reviewing PR #385 for correctness, memory safety, bounds checking, style(9) compliance, and security issues.

Review Summary

✅ Overall Assessment: High Quality Implementation

This is a well-structured scheduler implementation with proper defensive programming practices. The code follows MidnightBSD coding standards and shows careful attention to bounds checking and initialization.

Key Findings

Initialization Issue Resolution

The Gemini review incorrectly flagged an initialization issue. The cpu_core_class array IS properly initialized in sys/kern/sched_mic.c:316:

```c
uint8_t cpu_core_class[MAXCPU] = { [0 ... MAXCPU - 1] = CPU_CLASS_PERF };
```

This means all CPUs default to CPU_CLASS_PERF (1) as intended, not 0.

Bounds Checking

Excellent bounds checking throughout:

  • All array accesses use MAXCPU-bounded indices via CPU_FOREACH(i) macro
  • Static arrays properly sized: cpu_core_class[MAXCPU], mic_hybrid[MAXCPU], etc.
  • No buffer overflow risks identified

Memory Management

  • No malloc/free operations requiring symmetry analysis
  • Uses static global arrays and stack variables appropriately
  • No unsafe string operations (strcpy, sprintf, etc.)

Style(9) Compliance

  • Tab indentation (8-column tabs) throughout
  • Reasonable line lengths (inspected samples under 80 characters)
  • Consistent BSD kernel coding style

Security Assessment

  • No obvious security vulnerabilities
  • Proper input validation in sysctls
  • No user-controllable array indexing beyond validated bounds
  • Safe CPUID usage with proper vendor checks

Technical Details

Architecture

The implementation correctly:

  • Derives from ULE with minimal changes focused on hybrid core awareness
  • Preserves ULE behavior on homogeneous systems (smt_busy_penalty=0)
  • Uses proper #ifdef SCHED_MIC guards in mp_x86.c

Core Classification Logic

sys/x86/x86/mp_x86.c:1032-1079 implements sound detection:

  • Intel: Uses architectural CPUID 0x1A for P/E core detection
  • AMD: Heuristic based on L3 cache size differences for X3D/compute CCDs
  • Safe fallbacks to performance class for unknown configurations
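The AMD heuristic described above (and in the author's reply: only the smaller-L3 die is demoted, everything else stays perf) can be sketched as a userland model. It takes a precomputed per-CPU L3 size instead of executing CPUID 0x8000001D, uses hypothetical names and enum values, and models prefer_compute simply as swapping which die is demoted, as the PR description states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mic_class { MIC_PERF = 1, MIC_EFF = 2 };	/* hypothetical values */

/*
 * l3[i] is the L3 size visible to CPU i.  Every CPU starts as perf; on an
 * asymmetric part, CPUs on the non-preferred die are demoted to eff.
 */
static void
amd_classify(const uint64_t *l3, enum mic_class *cls, size_t n,
    bool prefer_compute)
{
	uint64_t maxl3 = 0, minl3 = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		cls[i] = MIC_PERF;
		if (l3[i] > maxl3)
			maxl3 = l3[i];
		if (l3[i] < minl3)
			minl3 = l3[i];
	}
	if (n == 0 || maxl3 == minl3)
		return;		/* symmetric or single-CCD: all perf, like ULE */

	for (size_t i = 0; i < n; i++) {
		bool cache_die = (l3[i] == maxl3);

		/* Default: demote the compute die; prefer_compute flips it. */
		if (cache_die == prefer_compute)
			cls[i] = MIC_EFF;
	}
}
```

Because only one side is ever written to MIC_EFF, the preferred die keeps its perf default, mirroring the reasoning in the initializer discussion.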

Scheduler Integration

sys/kern/sched_mic.c:714-743 adds class cost calculation:

  • Lightweight addition to existing ULE cpu_search_lowest() path
  • Tunable bias weights via sysctls
  • Maintains load balancing properties of ULE

Recommendations

✅ Approved for Merge

This implementation demonstrates:

  • Careful attention to defensive programming
  • Proper BSD kernel coding standards
  • Safe handling of heterogeneous CPU detection
  • Minimal risk surface area

No critical issues identified. The scheduler should provide meaningful improvements on hybrid x86 systems while maintaining ULE compatibility.


sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • sched_mic.c is almost a full copy of sched_ule.c with a few hybrid-specific changes; consider factoring the shared logic into common helpers or a shared base implementation to avoid divergence and make future scheduler bugfixes/features easier to apply consistently.
  • sched_class_cost() calls smt_sibling_busy() inside the innermost cpu_search_lowest() loop, which re-walks the cg mask for every candidate CPU and can become O(n^2) per search on larger SMT groups; consider caching SMT busy state per group or per CPU (e.g., via precomputed sibling indices or a transient bitmap) to reduce this overhead.
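One way to implement the caching suggestion: count busy CPUs in an SMT group once per search, then derive each candidate's "sibling busy" state in O(1) by subtracting the candidate itself. This is a sketch under stated assumptions: the flat first..last group layout, the `load > 0` definition of busy, and every name here are hypothetical, whereas the kernel would iterate cg_mask and read per-CPU TDQ loads. The cached count is only a snapshot, which should be no weaker than the unlocked per-CPU load reads the search already does.

```c
#include <stdbool.h>

/* Hypothetical SMT group: CPUs first..last share one physical core. */
struct smt_group {
	int first, last;
};

/* Naive form: re-walk the group for every candidate, O(n) per call. */
static bool
sibling_busy_naive(const struct smt_group *g, const int *load, int c)
{
	for (int s = g->first; s <= g->last; s++)
		if (s != c && load[s] > 0)
			return (true);
	return (false);
}

/* Count busy CPUs in the group once per search: O(n) total. */
static int
group_busy_count(const struct smt_group *g, const int *load)
{
	int n = 0;

	for (int s = g->first; s <= g->last; s++)
		if (load[s] > 0)
			n++;
	return (n);
}

/* O(1) per candidate: busy siblings = busy CPUs minus this CPU itself. */
static bool
sibling_busy_cached(int busy_count, const int *load, int c)
{
	return (busy_count - (load[c] > 0 ? 1 : 0) > 0);
}
```

Both forms give the same answer for every CPU in the group; the cached one just moves the group walk out of the per-candidate loop.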

## Individual Comments

### Comment 1
<location path="sys/kern/sched_mic.c" line_range="713" />
<code_context>
+ * The result is folded into "load" before it is summed into the per-group
+ * total, so the bias steers both subtree selection and the final CPU pick.
+ */
+static int
+sched_class_cost(const struct cpu_group *cg, int c, int l)
+{
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the per-CPU effective load computation in cpu_search_lowest() into a dedicated helper to separate cost policy from traversal logic and make the search code easier to follow.

The inner logic of `cpu_search_lowest()` is now carrying a lot of policy (preference bias, randomization, hybrid class cost, SMT tweaks) on top of the topology traversal. You can reduce the cognitive load without changing behavior by extracting the per‑CPU “effective load” calculation into a small helper, so `cpu_search_lowest()` only orchestrates traversal and compares scalars.

For example, you can refactor the leaf CPU loop like this:

```c
/* New helper: encapsulate all per-CPU load/cost heuristics. */
static inline int
cpu_effective_load(const struct cpu_group *cg, int c,
    const struct cpu_search *s, int base_load)
{
	int load, penalty;

	/*
	 * Prefer cs_prefer.  When balancing, do not count the thread already
	 * running there; decrement before scaling, as the existing code does,
	 * so the adjustment actually reaches "load".
	 */
	penalty = 0;
	if (c == s->cs_prefer) {
		if (__predict_false(s->cs_running))
			base_load--;
		penalty = 128;
	}

	load = base_load * 256;

	/* Hybrid/E/LP cost. */
	if (s->cs_class)
		load += sched_class_cost(cg, c, base_load);

	/*
	 * The balancing-time SMT bias ("load >= 128 && (load & 128) != 0")
	 * tests a child group's summed load, so it stays in the child-group
	 * loop of cpu_search_lowest() rather than moving into this helper.
	 */

	/* Randomization. */
	load -= sched_random() % 128;

	return (load - penalty);
}
```

Then the leaf part of `cpu_search_lowest()` becomes much easier to follow:

```c
for (c = cg->cg_last; c >= cg->cg_first; c--) {
	if (!CPU_ISSET(c, &cg->cg_mask))
		continue;
	tdq = TDQ_CPU(c);
	l = TDQ_LOAD(tdq);

	/* Reject this CPU early based on load/pri/mask. */
	if (l > s->cs_load ||
	    (atomic_load_char(&tdq->tdq_lowpri) <= s->cs_pri &&
	     (!s->cs_running || c != s->cs_prefer)) ||
	    !CPU_ISSET(c, s->cs_mask))
		continue;

	load = cpu_effective_load(cg, c, s, l);
	total += load;

	if (load < bload || (load == bload && load < r->csr_load)) {
		bload = load;
		r->csr_cpu = c;
		r->csr_load = load;
	}
}
```

This keeps all existing heuristics (including hybrid-class penalties and SMT behavior) but:

- Moves hybrid/SMT/randomization details into a narrow helper.
- Leaves `cpu_search_lowest()` primarily responsible for traversal, filtering, and comparing scalar costs.
- Makes it easier to reason about or adjust cost policy in one place (`cpu_effective_load` and `sched_class_cost`) without re-reading the tree search logic each time.
</issue_to_address>
