Conversation
SCHED_MIC is a copy of the ULE scheduler that weighs heterogeneous CPU
core classes when placing threads. On x86 CPUs with hybrid topologies it
prefers, in order: P-cores / AMD 3D V-Cache CCD cores, then E-cores / AMD
compute-CCD / AMD mobile "C" cores, then the second SMT thread of a busy
core, then Intel LP-E cores last. The preference is a soft, tunable bias
folded into the existing cpu_search_lowest() load comparison, applied only
on the placement path (sched_pickcpu); the long-term balancer and work
stealing stay class-blind. On homogeneous hardware, and on non-x86
architectures, SCHED_MIC behaves like ULE.
ULE remains the default; SCHED_MIC is opt-in (options SCHED_MIC, or the
sample sys/amd64/conf/MIC config).
Core class is detected per-CPU at AP startup via an smp_rendezvous in an
SI_SUB_SMP SYSINIT (#ifdef SCHED_MIC in mp_x86.c) and stored in a new
cpu_core_class[] array, defaulting to "performance" so unrecognized and
non-hybrid CPUs behave like ULE:
- Intel P vs E: CPUID 0x1A core type (architectural).
- Intel LP-E: small core with no L3 (heuristic, kern.sched.detect_lpe).
- AMD X3D / Cx: larger per-CCD L3 via CPUID 0x8000001D is preferred
(heuristic); symmetric/single-CCD parts stay all-perf.
Tunables under kern.sched.*: class_weight_eff, class_weight_lp,
smt_busy_penalty (0 = stock ULE SMT behavior), prefer_compute (swap AMD
cache/compute CCD preference), detect_lpe, and a read-only core_class dump.
The Intel detection path was validated against real Alder Lake silicon
(Core i7-1260P): 8 P-core threads classed perf, 8 E-cores classed eff,
no LP-E (E-cores retain L3, so the LP heuristic does not misfire).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reviewer's GuideIntroduces SCHED_MIC, a new scheduler forked from ULE that biases thread placement based on per-CPU hybrid core classes, with x86-specific CPUID-based detection and new sysctls, while leaving ULE behavior unchanged on homogeneous or non-x86 systems. The main implementation adds SCHED_MIC’s core-class-aware placement logic to the ULE codepath in a new sched_mic.c file, wires in per-CPU core classification via mp_x86.c and smp.h, and exposes tunables and debug output via new kern.sched.* sysctls and a scheduler.md design doc. Flow diagram for SCHED_MIC core classification and placementflowchart TD
Boot["AP startup (SI_SUB_SMP)"] --> MicClassify[mic_classify]
MicClassify --> Rendezvous[smp_rendezvous]
Rendezvous --> MicProbe[mic_probe_cpu]
MicProbe --> CpuCoreClass["populate cpu_core_class per CPU (Intel/AMD heuristics)"]
subgraph Runtime_Placement
SchedAdd[sched_add] --> SchedPickcpu[sched_pickcpu]
SchedPickcpu --> SchedLowest["sched_lowest (class_aware=1)"]
SchedLowest --> CpuSearch[cpu_search_lowest]
CpuSearch --> ClassCost[sched_class_cost]
ClassCost -->|uses| CoreClassArray[cpu_core_class]
CpuSearch --> ChosenCpu["return target CPU id"]
end
CpuCoreClass -->|read by| ClassCost
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Code Review
This pull request introduces SCHED_MIC, a hybrid-core-aware scheduler derived from ULE that optimizes thread placement on x86 CPUs with heterogeneous cores by weighing four core priority classes. Feedback on the implementation highlights a critical issue where the global cpu_core_class array is not explicitly initialized, potentially leaving CPUs with an invalid default class of 0 instead of CPU_CLASS_PERF (1) during early returns or AMD core classification. It is recommended to initialize the array to CPU_CLASS_PERF at the start of mic_classify.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| static void | ||
| mic_classify(void *arg __unused) | ||
| { | ||
| uint64_t maxl3; | ||
| int i; | ||
| bool amd, intel, asymmetric; | ||
|
|
||
| if (mp_ncpus <= 1) | ||
| return; /* defaults: every CPU is class perf. */ |
There was a problem hiding this comment.
The cpu_core_class array is defined globally but is not explicitly initialized to CPU_CLASS_PERF (1) before classification. Since global arrays in C default to 0 (which is not a valid class constant, as CPU_CLASS_PERF is 1), any early returns in mic_classify (e.g., if mp_ncpus <= 1, if the vendor is neither Intel nor AMD, or if the CPU has symmetric L3) will leave the CPUs with a class of 0. Additionally, the AMD classification path only sets non-preferred cores to CPU_CLASS_EFF, leaving the preferred cores at their default value (which would remain 0 instead of CPU_CLASS_PERF).
To ensure correct and robust default behavior across all platforms and vendors, explicitly initialize the entire cpu_core_class array to CPU_CLASS_PERF at the beginning of mic_classify.
static void
mic_classify(void *arg __unused)
{
uint64_t maxl3;
int i;
bool amd, intel, asymmetric;
for (i = 0; i < MAXCPU; i++)
cpu_core_class[i] = CPU_CLASS_PERF;
if (mp_ncpus <= 1)
return; /* defaults: every CPU is class perf. */There was a problem hiding this comment.
Thanks, but cpu_core_class is not default-zero — it's statically initialized to CPU_CLASS_PERF for every element at its definition in sys/kern/sched_mic.c:
uint8_t cpu_core_class[MAXCPU] = { [0 ... MAXCPU - 1] = CPU_CLASS_PERF };(This file holds the definition; mp_x86.c only sees the extern, which is likely why it looked uninitialized here.)
Because of that initializer, every path is correct without any class-0:
- Early returns (
mp_ncpus <= 1, non-Intel/AMD vendor, no L3, symmetric L3) leave all CPUs atCPU_CLASS_PERF— that's the intended "behave like ULE" default, and the in-function comments say so. - The AMD path only needs to demote the smaller-L3 die to
CPU_CLASS_EFF; the preferred die staysCPU_CLASS_PERFfrom the initializer by design.
So there's no uninitialized-class bug here; an explicit re-init in mic_classify() would be redundant with the static initializer.
|
Claude finished @laffer1's task —— View job Code Review for SCHED_MIC SchedulerReviewing PR #385 for correctness, memory safety, bounds checking, style(9) compliance, and security issues. Review Summary✅ Overall Assessment: High Quality Implementation This is a well-structured scheduler implementation with proper defensive programming practices. The code follows MidnightBSD coding standards and shows careful attention to bounds checking and initialization. Key Findings✅ Initialization Issue ResolutionThe Gemini review incorrectly flagged an initialization issue. The uint8_t cpu_core_class[MAXCPU] = { [0 ... MAXCPU - 1] = CPU_CLASS_PERF };This means all CPUs default to ✅ Bounds CheckingExcellent bounds checking throughout:
✅ Memory Management
✅ Style(9) Compliance
✅ Security Assessment
Technical DetailsArchitectureThe implementation correctly:
Core Classification Logic
Scheduler Integration
Recommendations✅ Approved for Merge This implementation demonstrates:
No critical issues identified. The scheduler should provide meaningful improvements on hybrid x86 systems while maintaining ULE compatibility. |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- sched_mic.c is almost a full copy of sched_ule.c with a few hybrid-specific changes; consider factoring the shared logic into common helpers or a shared base implementation to avoid divergence and make future scheduler bugfixes/features easier to apply consistently.
- sched_class_cost() calls smt_sibling_busy() inside the innermost cpu_search_lowest() loop, which re-walks the cg mask for every candidate CPU and can become O(n^2) per search on larger SMT groups; consider caching SMT busy state per group or per CPU (e.g., via precomputed sibling indices or a transient bitmap) to reduce this overhead.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- sched_mic.c is almost a full copy of sched_ule.c with a few hybrid-specific changes; consider factoring the shared logic into common helpers or a shared base implementation to avoid divergence and make future scheduler bugfixes/features easier to apply consistently.
- sched_class_cost() calls smt_sibling_busy() inside the innermost cpu_search_lowest() loop, which re-walks the cg mask for every candidate CPU and can become O(n^2) per search on larger SMT groups; consider caching SMT busy state per group or per CPU (e.g., via precomputed sibling indices or a transient bitmap) to reduce this overhead.
## Individual Comments
### Comment 1
<location path="sys/kern/sched_mic.c" line_range="713" />
<code_context>
+ * The result is folded into "load" before it is summed into the per-group
+ * total, so the bias steers both subtree selection and the final CPU pick.
+ */
+static int
+sched_class_cost(const struct cpu_group *cg, int c, int l)
+{
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the per-CPU effective load computation in cpu_search_lowest() into a dedicated helper to separate cost policy from traversal logic and make the search code easier to follow.
The inner logic of `cpu_search_lowest()` is now carrying a lot of policy (preference bias, randomization, hybrid class cost, SMT tweaks) on top of the topology traversal. You can reduce the cognitive load without changing behavior by extracting the per‑CPU “effective load” calculation into a small helper, so `cpu_search_lowest()` only orchestrates traversal and compares scalars.
For example, you can refactor the leaf CPU loop like this:
```c
/* New helper: encapsulate all per-CPU load/cost heuristics. */
static inline int
cpu_effective_load(const struct cpu_group *cg, int c,
const struct cpu_search *s, int base_load)
{
int load, penalty;
load = base_load * 256;
/* Prefer cs_prefer, possibly reduced if already running there. */
penalty = 0;
if (c == s->cs_prefer) {
if (__predict_false(s->cs_running))
base_load--; /* matches existing behavior */
penalty = 128;
}
/* Hybrid/E/LP cost. */
if (s->cs_class)
load += sched_class_cost(cg, c, base_load);
/* Balancing-time SMT group bias (for >1 load in threaded group). */
if (__predict_false(s->cs_running) &&
(cg->cg_flags & CG_FLAG_THREAD) &&
base_load >= 128 && (base_load & 128) != 0)
load += 128;
/* Randomization. */
load -= sched_random() % 128;
return load - penalty;
}
```
Then the leaf part of `cpu_search_lowest()` becomes much easier to follow:
```c
for (c = cg->cg_last; c >= cg->cg_first; c--) {
if (!CPU_ISSET(c, &cg->cg_mask))
continue;
tdq = TDQ_CPU(c);
l = TDQ_LOAD(tdq);
/* Reject this CPU early based on load/pri/mask. */
if (l > s->cs_load ||
(atomic_load_char(&tdq->tdq_lowpri) <= s->cs_pri &&
(!s->cs_running || c != s->cs_prefer)) ||
!CPU_ISSET(c, s->cs_mask))
continue;
load = cpu_effective_load(cg, c, s, l);
total += load;
if (load < bload || (load == bload && load < r->csr_load)) {
bload = load;
r->csr_cpu = c;
r->csr_load = load;
}
}
```
This keeps all existing heuristics (including hybrid-class penalties and SMT behavior) but:
- Moves hybrid/SMT/randomization details into a narrow helper.
- Leaves `cpu_search_lowest()` primarily responsible for traversal, filtering, and comparing scalar costs.
- Makes it easier to reason about or adjust cost policy in one place (`cpu_effective_load` and `sched_class_cost`) without re-reading the tree search logic each time.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary
Adds SCHED_MIC, a new scheduler derived from ULE that weighs heterogeneous
CPU core classes when placing threads. ULE remains the GENERIC default; SCHED_MIC
is opt-in (
options SCHED_MIC, or the samplesys/amd64/conf/MICconfig).On x86 hybrid CPUs it prefers cores in this order:
The preference is a soft, tunable bias folded into the existing
cpu_search_lowest()load comparison, applied only on the placement path(
sched_pickcpu). The long-term balancer and work stealing stay class-blind, soreal-load balancing is preserved. On homogeneous hardware and non-x86 arches it
behaves like ULE (byte-identical with
kern.sched.smt_busy_penalty=0).Detection
Per-CPU classification runs at AP startup via
smp_rendezvousin anSI_SUB_SMPSYSINIT (
#ifdef SCHED_MICinmp_x86.c), stored in a newcpu_core_class[]array that defaults to "performance" (so anything unrecognized behaves like ULE):
0x1Acore typekern.sched.detect_lpe0x8000001D) preferred; symmetric/single-CCD → all perfTunables (
kern.sched.*)class_weight_eff(64),class_weight_lp(512),smt_busy_penalty(192,0=stock ULE),
prefer_compute(swap AMD cache/compute CCD),detect_lpe, and aread-only
core_classdump.Testing
sched_mic.candmp_x86.cbuild clean under-Werror; a MINIMAL-basedSCHED_MICkernel links into a complete kernel ELF with all symbols resolved.P-core threads → perf, 8 E-cores → eff, no LP-E (E-cores keep L3, so the LP
heuristic does not misfire).
AMD C-cores, and live placement behavior (needs a booted MIC kernel).
Notes / open questions
it spreads work off the preferred cores to equalize load, softening the
placement preference. Flipping one
sched_lowest()argument insched_balance_groupmakes the balancer reinforce packing instead — ahardware-dependent tradeoff. See
scheduler.mdfor details.scheduler.mdis included at the repo root as a design doc; drop it if you'drather it not live in the tree.
🤖 Generated with Claude Code
Summary by Sourcery
Introduce a new optional hybrid-core-aware scheduler (SCHED_MIC) derived from ULE and integrate x86 core classification, tunables, and documentation to prefer performance cores while remaining compatible with existing behavior.
New Features:
Enhancements: