From 7597a575588981600126787ac263f1abdc58d909 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Thu, 8 Jan 2026 16:55:51 -0700 Subject: [PATCH 01/13] begin creating the FAQ for what arbiter2 is, remove commented out code in pl_tier_table.html --- docs/getting_started/faq.md | 23 ++ .../faq_html/arbiter_penalty_table.html | 264 ++++++++++++++++++ .../images_and_html/pl_tier_table.html | 24 -- 3 files changed, 287 insertions(+), 24 deletions(-) create mode 100644 docs/getting_started/faq_html/arbiter_penalty_table.html diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 7c7ca684..2f23b6a1 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -144,6 +144,29 @@ re-enroll by visiting . If that did not resolve your i ## General High Performance Computing +### What is Arbiter2? +::::{dropdown} Show +:icon: note + +[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah which allows us to monitor non-compute node resources for undesirable behavior. Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. + +When a user goes over defined limits for a given amount of time (see the table below), the amount of resources they have available to them on the login nodes will be throttled for a certain period of time. Additionally, the user will be sent a no-reply email stating they have violated our usage policies. If a user continues to violate the usage policy, they will eventually go to higher penalty states that have more strict throttling. If behavior is corrected, the user will return to a normal status and their usage will not be throttled. 
+ + +| User Status | Resource Limit | Resource Threshold | Penalty Action Upon Threshold Exceeded | Duration of Penalty | +| :--- | :--- | :--- | :--- | :--- | +| **Normal** | **CPU:** 2 Virtual Cores
**Memory:** 2 GB | **Threshold Exceeded:** Any sustained usage above 50% of the limit (e.g., 1 Core / 1 GB) starts badness score accumulation. | Badness score starts to accumulate (typically leads to Penalty1 after a time period, e.g., 10 minutes) | N/A (Default State) | +| **Penalty1** | **CPU:** 2 Virtual Cores (No Throttling)
**Memory:** 2 GB (No Throttling) | **Threshold Exceeded:** Any sustained usage above 50% of the limit continues to increase the badness score. | **Warning:** Processes are *not* throttled, but email notification is sent. | 30 minutes | +| **Penalty2** | **CPU:** 1.2 Virtual Cores (60% of Normal)
**Memory:** 1.2 GB (60% of Normal) | **Threshold Exceeded:** Any sustained usage above 50% of the *new, lower limit* (e.g., 0.6 Cores / 0.6 GB) continues to increase the badness score. | **Throttling:** CPU-intensive processes are slowed. **Termination:** Processes exceeding the Memory limit are killed. | 60 minutes | +| **Penalty3** | **CPU:** 0.4 Virtual Cores (20% of Normal)
**Memory:** 0.4 GB (20% of Normal) | **Threshold Exceeded:** Any sustained usage above 50% of the *new, severely reduced limit* (e.g., 0.2 Cores / 0.2 GB) maintains the severe penalty. | **Severe Throttling/Termination:** Further reduced limits; memory limit violations terminate processes. | 120 minutes | + +```{eval-rst} +.. raw:: html + :file: ./faq_html/arbiter_penalty_table.html +``` + +:::: + ### How can I add users to a Linux group? ::::{dropdown} Show :icon: note diff --git a/docs/getting_started/faq_html/arbiter_penalty_table.html b/docs/getting_started/faq_html/arbiter_penalty_table.html new file mode 100644 index 00000000..d5f394ae --- /dev/null +++ b/docs/getting_started/faq_html/arbiter_penalty_table.html @@ -0,0 +1,264 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Host | Penalty Status | Resource Limit | Resource Threshold | Penalty Action Upon Threshold Exceeded | Duration of Penalty |
| :--- | :--- | :--- | :--- | :--- | :--- |
| login nodes | normal | CPU: 2 Virtual Cores<br>Memory: 2 GB | Threshold Exceeded: Any sustained usage above 50% of the limit (e.g., 1 Core / 1 GB) starts badness score accumulation. | Badness score starts to accumulate (typically leads to Penalty1 after a time period, e.g., 10 minutes) | N/A (Default State) |
| | penalty1 | CPU: 2 Virtual Cores (No Throttling)<br>Memory: 2 GB (No Throttling) | Threshold Exceeded: Any sustained usage above 50% of the limit continues to increase the badness score. | Warning: Processes are not throttled, but email notification is sent. | 30 minutes |
| | penalty2 | CPU: 1.2 Virtual Cores (60% of Normal)<br>Memory: 1.2 GB (60% of Normal) | Threshold Exceeded: Any sustained usage above 50% of the new, lower limit (e.g., 0.6 Cores / 0.6 GB) continues to increase the badness score. | Throttling: CPU-intensive processes are slowed. Termination: Processes exceeding the Memory limit are killed. | 60 minutes |
| | penalty3 | CPU: 0.4 Virtual Cores (20% of Normal)<br>Memory: 0.4 GB (20% of Normal) | Threshold Exceeded: Any sustained usage above 50% of the new, severely reduced limit (e.g., 0.2 Cores / 0.2 GB) maintains the severe penalty. | Severe Throttling/Termination: Further reduced limits; memory limit violations terminate processes. | 120 minutes |
diff --git a/docs/petalibrary/images_and_html/pl_tier_table.html b/docs/petalibrary/images_and_html/pl_tier_table.html index 9ea0ac4b..0808375c 100644 --- a/docs/petalibrary/images_and_html/pl_tier_table.html +++ b/docs/petalibrary/images_and_html/pl_tier_table.html @@ -9,30 +9,6 @@ } /* Light theme variables */ - /* html[data-theme="light"] { - --header-color: #91bff0; - --color-data-integrity: #f2f2f2; - --color-data-limits: #dfdddd;; - --color-data-access: #adabab; - --color-data-transfer: #908f8f; - --color-data-compression: #656464; - --border-color: black; - --text-color: black; - --header-border-weight: 2px; - --border-weight: 2px; - } */ - /* html[data-theme="light"] { - --header-color: #91bff0; - --color-data-integrity: #f2f2f2; - --color-data-limits: #dfdddd; - --color-data-access: #f2f2f2; - --color-data-transfer: #dfdddd; - --color-data-compression: #f2f2f2; - --border-color: black; - --text-color: black; - --header-border-weight: 2px; - --border-weight: 2px; - } */ html[data-theme="light"] { --header-color: #91bff0; --color-data-integrity: #dfdddd; From ca6a91d778a0c72216316d87d328c22f3d5352ff Mon Sep 17 00:00:00 2001 From: b-reyes Date: Mon, 12 Jan 2026 16:33:17 -0700 Subject: [PATCH 02/13] modify items in the arbiter penalty table and expand on Arbiter2 FAQ --- docs/getting_started/faq.md | 17 +- .../faq_html/arbiter_penalty_table.html | 218 +++--------------- 2 files changed, 43 insertions(+), 192 deletions(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 2f23b6a1..351f11a6 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -148,23 +148,24 @@ re-enroll by visiting . If that did not resolve your i ::::{dropdown} Show :icon: note -[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah which allows us to monitor non-compute node resources for undesirable behavior. 
Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. +[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah which allows us to monitor non-compute node resources for undesirable behavior. Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. -When a user goes over defined limits for a given amount of time (see the table below), the amount of resources they have available to them on the login nodes will be throttled for a certain period of time. Additionally, the user will be sent a no-reply email stating they have violated our usage policies. If a user continues to violate the usage policy, they will eventually go to higher penalty states that have more strict throttling. If behavior is corrected, the user will return to a normal status and their usage will not be throttled. +When a user goes over the resource threshold, the user will accrue what is called "badness". When a user accrues a badness of 100, the user will be moved into the next penalty state for a defined amount of time. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. 
If the user adjusts their work so that they are under the threshold and they have not reached a badness of 100, then their badness will reduce. Once a user enters a penalty state, they will stay in that penalty state for a specified duration. If the thresholds are not violated while in a penalty state, then the user will return to a normal state after the penalty duration. +Note that once you return to a normal state, you may not move sequentially through the penalty state, if you have violated the penalties multiple times. For example, you can go from a normal state to a penalty3 state, if you have ... -| User Status | Resource Limit | Resource Threshold | Penalty Action Upon Threshold Exceeded | Duration of Penalty | -| :--- | :--- | :--- | :--- | :--- | -| **Normal** | **CPU:** 2 Virtual Cores
**Memory:** 2 GB | **Threshold Exceeded:** Any sustained usage above 50% of the limit (e.g., 1 Core / 1 GB) starts badness score accumulation. | Badness score starts to accumulate (typically leads to Penalty1 after a time period, e.g., 10 minutes) | N/A (Default State) | -| **Penalty1** | **CPU:** 2 Virtual Cores (No Throttling)
**Memory:** 2 GB (No Throttling) | **Threshold Exceeded:** Any sustained usage above 50% of the limit continues to increase the badness score. | **Warning:** Processes are *not* throttled, but email notification is sent. | 30 minutes | -| **Penalty2** | **CPU:** 1.2 Virtual Cores (60% of Normal)
**Memory:** 1.2 GB (60% of Normal) | **Threshold Exceeded:** Any sustained usage above 50% of the *new, lower limit* (e.g., 0.6 Cores / 0.6 GB) continues to increase the badness score. | **Throttling:** CPU-intensive processes are slowed. **Termination:** Processes exceeding the Memory limit are killed. | 60 minutes | -| **Penalty3** | **CPU:** 0.4 Virtual Cores (20% of Normal)
**Memory:** 0.4 GB (20% of Normal) | **Threshold Exceeded:** Any sustained usage above 50% of the *new, severely reduced limit* (e.g., 0.2 Cores / 0.2 GB) maintains the severe penalty. | **Severe Throttling/Termination:** Further reduced limits; memory limit violations terminate processes. | 120 minutes | + For the threshold values and durations for each penalty state, please refer to the table below. Additionally, the user will be sent a no-reply email stating they have violated our usage policies. If a user continues to violate the usage policy, they will eventually go to higher penalty states that have more strict throttling. If behavior is corrected, the user will return to a normal status and their usage will not be throttled. ```{eval-rst} .. raw:: html :file: ./faq_html/arbiter_penalty_table.html ``` +```{important} +The information above for resource limits, duration of penalty, and the time it takes to get moved into the next penalty state may change over time as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. +``` + + :::: ### How can I add users to a Linux group? diff --git a/docs/getting_started/faq_html/arbiter_penalty_table.html b/docs/getting_started/faq_html/arbiter_penalty_table.html index d5f394ae..19d5a9a7 100644 --- a/docs/getting_started/faq_html/arbiter_penalty_table.html +++ b/docs/getting_started/faq_html/arbiter_penalty_table.html @@ -1,128 +1,4 @@ - - @@ -224,41 +79,36 @@ - - + - + - - - - - + + + + - - - + + + - - + - - - + + + - - + - - - - - - + + + + +
Host | Penalty Status | Resource Limit | Resource Threshold | Penalty Action Upon Threshold Exceeded | What happens when I exceed a resource limit? | Duration of Penalty
penalty1 | CPU: 2 Virtual Cores (No Throttling) Memory: 2 GB (No Throttling)
penalty2 | CPU: 1.2 Virtual Cores (60% of Normal) Memory: 1.2 GB (60% of Normal)
penalty3 | CPU: 0.4 Virtual Cores (20% of Normal) Memory: 0.4 GB (20% of Normal) | Threshold Exceeded: Any sustained usage above 50% of the new, severely reduced limit (e.g., 0.2 Cores / 0.2 GB) maintains the severe penalty. | Severe Throttling/Termination: Further reduced limits; memory limit violations terminate processes. | 120 minutes
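
The badness and penalty-state behavior described in the FAQ text of this patch can be sketched as a small per-minute simulation. This is a rough illustration only, not Arbiter2's actual implementation. The limits and durations are taken from the first revision of the penalty table and may change. The names `UserState`, `cpu_limit`, and `step`, the rate of 10 badness per minute, and the reset of badness to 0 on entering a penalty state are all assumptions made for the example.

```python
from dataclasses import dataclass

# Values from the first revision of the penalty table (subject to change).
NORMAL_CPU_CORES = 2.0
THRESHOLD_FRACTION = 0.5                         # badness accrues above 50% of the limit
LIMIT_FRACTION = {0: 1.0, 1: 1.0, 2: 0.6, 3: 0.2}  # 0 = normal, 1..3 = penalty1..penalty3
PENALTY_MINUTES = {1: 30, 2: 60, 3: 120}
MAX_PENALTY = 3

# Hypothetical rate: chosen so that 10 minutes over the threshold reaches 100.
BADNESS_STEP = 10.0


@dataclass
class UserState:
    penalty: int = 0        # current penalty state (0 means normal)
    badness: float = 0.0    # always kept within [0, 100]
    minutes_left: int = 0   # remaining minutes in the current penalty state


def cpu_limit(state: UserState) -> float:
    """CPU cores available to the user in their current state."""
    return NORMAL_CPU_CORES * LIMIT_FRACTION[state.penalty]


def step(state: UserState, cpu_cores_used: float) -> UserState:
    """Advance the model by one minute of observed CPU usage."""
    threshold = THRESHOLD_FRACTION * cpu_limit(state)
    if cpu_cores_used > threshold:
        state.badness = min(100.0, state.badness + BADNESS_STEP)
    else:
        state.badness = max(0.0, state.badness - BADNESS_STEP)

    if state.badness >= 100.0:
        # Move to the next penalty state and start its duration timer.
        state.penalty = min(MAX_PENALTY, state.penalty + 1)
        state.minutes_left = PENALTY_MINUTES[state.penalty]
        state.badness = 0.0  # assumption: badness restarts after escalation
    elif state.penalty > 0:
        # Serve out the penalty; return to normal when the timer ends.
        state.minutes_left -= 1
        if state.minutes_left <= 0:
            state.penalty = 0
            state.minutes_left = 0
    return state
```

Calling `step` once per simulated minute with the observed core usage shows a user escalating after sustained usage above the threshold and returning to the normal state once the penalty duration passes without further violations. Memory limits could be modeled the same way with a second usage argument.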
From 7a8d6ccfd23e175304bb41799a24db7805bb01ca Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 16 Jan 2026 17:06:22 -0700 Subject: [PATCH 03/13] add an arbiter flowchart to make the process more understandable --- docs/_static/custom.css | 4 + docs/getting_started/faq.md | 27 ++- .../dot_files/arbiter_flowchart.dot | 75 +++++++ .../generated_images/arbiter_flowchart.svg | 185 ++++++++++++++++++ 4 files changed, 283 insertions(+), 8 deletions(-) create mode 100644 graphviz_flowcharts/dot_files/arbiter_flowchart.dot create mode 100644 graphviz_flowcharts/generated_images/arbiter_flowchart.svg diff --git a/docs/_static/custom.css b/docs/_static/custom.css index ad896f80..f882bc2c 100644 --- a/docs/_static/custom.css +++ b/docs/_static/custom.css @@ -92,6 +92,10 @@ html[data-theme=light] .graph#doc-flowchart .node text { fill: black; } +html[data-theme=light] .graph#doc-flowchart .edge text { + fill: black; +} + .bd-content .sd-tab-set .sd-tab-content { padding: 1.5rem; } diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 351f11a6..44e952a5 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -148,23 +148,34 @@ re-enroll by visiting . If that did not resolve your i ::::{dropdown} Show :icon: note -[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah which allows us to monitor non-compute node resources for undesirable behavior. Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. 
+[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. -When a user goes over the resource threshold, the user will accrue what is called "badness". When a user accrues a badness of 100, the user will be moved into the next penalty state for a defined amount of time. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. If the user adjusts their work so that they are under the threshold and they have not reached a badness of 100, then their badness will reduce. Once a user enters a penalty state, they will stay in that penalty state for a specified duration. If the thresholds are not violated while in a penalty state, then the user will return to a normal state after the penalty duration. - -Note that once you return to a normal state, you may not move sequentially through the penalty state, if you have violated the penalties multiple times. For example, you can go from a normal state to a penalty3 state, if you have ... +In general, when a user goes over a defined resource threshold, the user will accrue what is called "badness", which is a value between 0 and 100. 
When a user accrues a badness of 100, the user will be moved into the next penalty state for a defined amount of time and receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration. Once this duration ends, the user has adjusted their work so that they are under the threshold, and they have reached a badness of 0, then their resources will return to a normal state (i.e. no throttling will be applied). For a list of threshold values and durations for each penalty state, see table below. +```{important} +- Arbiter2 keeps track of the penalty state the user was last in. This means that if the user accrues a badness of 100 shortly after they returned to a normal state, they could potentially be moved into a higher penalty state, rather than sequentially going through the penalties. +- Arbiter2 is currently set up to track work across all login nodes. For this reason, the user's state will be the same on all login nodes. +- Resource limits, duration of penalty, and the time it takes to get moved into the next penalty state may change over time as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. +``` - For the threshold values and durations for each penalty state, please refer to the table below. Additionally, the user will be sent a no-reply email stating they have violated our usage policies. If a user continues to violate the usage policy, they will eventually go to higher penalty states that have more strict throttling. If behavior is corrected, the user will return to a normal status and their usage will not be throttled. ```{eval-rst} ..
raw:: html :file: ./faq_html/arbiter_penalty_table.html ``` -```{important} -The information above for resource limits, duration of penalty, and the time it takes to get moved into the next penalty state may change over time as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. -``` +```{eval-rst} +.. raw:: html +
+
+ +.. raw:: html + :file: ../../graphviz_flowcharts/generated_images/arbiter_flowchart.svg + +.. raw:: html + +
+``` :::: diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot new file mode 100644 index 00000000..e72390f9 --- /dev/null +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -0,0 +1,75 @@ +digraph "" { + bgcolor="transparent"; + graph [id="doc-flowchart", rankdir=TB, nodesep=0.5, ranksep=0.4, bgcolor="none"]; + + // Node Styling + node [fontname="Verdana", fontsize="16", color="#CFB87C", style="filled", fillcolor="#121212", penwidth="2", fontcolor="white", shape=ellipse]; + edge [color="#CFB87C", fillcolor="#121212", penwidth="1.5", fontsize="14", fontcolor="#CFB87C"]; + + // --- ROW 0: The solitary top node --- + { rank=source; DoSomething; } + + // --- ROW 1: The Start --- + { rank=same; NS; Exceed; Badness; } + + // --- ROW 4: Logic Gates --- + { rank=same; TooBad; Track; } + + // --- ROW 5: Branching Outcomes --- + { rank=same; Penalty1; NextPen; } + + // --- ROW 6: Post-Penalty Actions --- + { rank=same; SentEmail; DurationTimer; } + { rank=same; DurationEnded; } + + // --- ROW 7: State Management --- + { rank=same; StayIn; BadnessFed; } + + // --- ROW 8: Final Check --- + { rank=same; BadAtZero; } + + // Nodes + NS [label="normal state"] + Exceed [label="Thresholds exceeded?", style="filled,dashed"] + DoSomething [label="If penalty occurrences > 0,\nstart the timer; at zero,\nreduce penalty occurrences\nby 1, repeating until zero.", fontsize="10"] + Badness [label="Badness accumulates"] + TooBad [label="Has badness\nreached 100?", style="filled,dashed"] + Track [label="Do you have >0\npenalty occurrences?", style="filled,dashed"] + Penalty1 [label="penalty1 state\nassigned"] + NextPen [label="Penalty state\nincreased by 1"] + SentEmail [label="Warning email sent\nand resources throttled"] + DurationTimer [label="Penalty duration\ntimer starts"] + DurationEnded [label="Has the penalty\nduration ended?", style="filled,dashed"] + StayIn [label="Remain in\npenalty state"] + BadnessFed 
[label="Badness begins\nto reduce"] + BadAtZero [label="Is badness\nat zero?", style="filled,dashed"] + + // Flow Logic + NS -> Exceed + + // Use constraint=false to point UP to the top row without pulling the chart apart + Exceed -> DoSomething [label=" No"] + Exceed -> Badness [label="Yes"] + + Badness -> TooBad + TooBad -> Badness [label=" No"] + TooBad -> Track [label="Yes", constraint=false] + + Track -> Penalty1 [label=" No"] + Track -> NextPen [label=" Yes"] + + Penalty1 -> SentEmail + NextPen -> SentEmail + + SentEmail -> DurationTimer [constraint=false] + DurationTimer -> DurationEnded + + DurationEnded -> StayIn [label=" No"] + DurationEnded -> BadnessFed [label=" Yes"] + + BadnessFed -> BadAtZero + BadAtZero -> BadnessFed [label=" No"] + + // Loop back to start + BadAtZero -> NS [label=" Yes"] +} \ No newline at end of file diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg new file mode 100644 index 00000000..30bd12f0 --- /dev/null +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -0,0 +1,185 @@ + + + + + + + +If penalty occurrences > 0, +start the timer; at zero, +reduce penalty occurrences +by 1, repeating until zero. + + + + +normal state + + + + +Thresholds exceeded? + + + + + + + + + + +  No + + + + +Badness accumulates + + + + + +Yes + + + + +Has badness +reached 100? + + + + + + + + + + +  No + + + + +Do you have >0 +penalty occurrences? + + + + + +Yes + + + + +penalty1 state +assigned + + + + + + No + + + + +Penalty state +increased by 1 + + + + + + Yes + + + + +Warning email sent +and resources throttled + + + + + + + + + + + + + + +Penalty duration +timer starts + + + + + + + + + +Has the penalty +duration ended? + + + + + + + + + +Remain in +penalty state + + + + + + No + + + + +Badness begins +to reduce + + + + + + Yes + + + + +Is badness +at zero? 
+ + + + + + + + + + +  Yes + + + + + +  No + + + \ No newline at end of file From 96aebfc00acfe6f70254ed2274ad4f2e5602794d Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 16 Jan 2026 17:15:16 -0700 Subject: [PATCH 04/13] fix arbiter flowchart logic --- .../dot_files/arbiter_flowchart.dot | 7 +- .../generated_images/arbiter_flowchart.svg | 203 ++++++++---------- 2 files changed, 94 insertions(+), 116 deletions(-) diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot index e72390f9..0941e549 100644 --- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -23,7 +23,7 @@ digraph "" { { rank=same; DurationEnded; } // --- ROW 7: State Management --- - { rank=same; StayIn; BadnessFed; } + { rank=same; BadnessFed; } // --- ROW 8: Final Check --- { rank=same; BadAtZero; } @@ -40,7 +40,6 @@ digraph "" { SentEmail [label="Warning email sent\nand resources throttled"] DurationTimer [label="Penalty duration\ntimer starts"] DurationEnded [label="Has the penalty\nduration ended?", style="filled,dashed"] - StayIn [label="Remain in\npenalty state"] BadnessFed [label="Badness begins\nto reduce"] BadAtZero [label="Is badness\nat zero?", style="filled,dashed"] @@ -52,7 +51,6 @@ digraph "" { Exceed -> Badness [label="Yes"] Badness -> TooBad - TooBad -> Badness [label=" No"] TooBad -> Track [label="Yes", constraint=false] Track -> Penalty1 [label=" No"] @@ -64,11 +62,10 @@ digraph "" { SentEmail -> DurationTimer [constraint=false] DurationTimer -> DurationEnded - DurationEnded -> StayIn [label=" No"] DurationEnded -> BadnessFed [label=" Yes"] BadnessFed -> BadAtZero - BadAtZero -> BadnessFed [label=" No"] + BadnessFed -> Exceed // Loop back to start BadAtZero -> NS [label=" Yes"] diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg index 30bd12f0..c19e6c87 100644 --- 
a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -1,185 +1,166 @@ - - + --> + + - -If penalty occurrences > 0, -start the timer; at zero, -reduce penalty occurrences -by 1, repeating until zero. + +If penalty occurrences > 0, +start the timer; at zero, +reduce penalty occurrences +by 1, repeating until zero. - -normal state + +normal state - -Thresholds exceeded? + +Thresholds exceeded? - - + + - - -  No + + +  No - -Badness accumulates + +Badness accumulates - - -Yes + + +Yes - -Has badness -reached 100? + +Has badness +reached 100? - - - - - - - -  No + + - -Do you have >0 -penalty occurrences? + +Do you have >0 +penalty occurrences? - - - -Yes + + + +Yes - -penalty1 state -assigned + +penalty1 state +assigned - - - - No + + + + No - -Penalty state -increased by 1 + +Penalty state +increased by 1 - - - - Yes + + + + Yes - -Warning email sent -and resources throttled + +Warning email sent +and resources throttled - - - + + + - - - + + + - -Penalty duration -timer starts + +Penalty duration +timer starts - - - + + + - -Has the penalty -duration ended? + +Has the penalty +duration ended? - - - - - - - -Remain in -penalty state - - - - - - No + + + - - -Badness begins -to reduce + + +Badness begins +to reduce + + + + Yes + + - - - Yes + + - - -Is badness -at zero? + + +Is badness +at zero? 
- - - + + + - - - -  Yes - - - - - -  No + + + +  Yes \ No newline at end of file From da1db6f3170bd9456ac5162d4b6ddd32fbc54cc7 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 16 Jan 2026 17:40:05 -0700 Subject: [PATCH 05/13] add a logical step asking if you went below the threshold as the badness accumulated in the arbiter flow chart --- .../dot_files/arbiter_flowchart.dot | 9 +- .../generated_images/arbiter_flowchart.svg | 220 ++++++++++-------- 2 files changed, 128 insertions(+), 101 deletions(-) diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot index 0941e549..ca0cc301 100644 --- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -13,7 +13,7 @@ digraph "" { { rank=same; NS; Exceed; Badness; } // --- ROW 4: Logic Gates --- - { rank=same; TooBad; Track; } + { rank=same; TooBad; BehFix; Track; } // --- ROW 5: Branching Outcomes --- { rank=same; Penalty1; NextPen; } @@ -33,6 +33,7 @@ digraph "" { Exceed [label="Thresholds exceeded?", style="filled,dashed"] DoSomething [label="If penalty occurrences > 0,\nstart the timer; at zero,\nreduce penalty occurrences\nby 1, repeating until zero.", fontsize="10"] Badness [label="Badness accumulates"] + BehFix [label="Below thresholds?", style="filled,dashed"] TooBad [label="Has badness\nreached 100?", style="filled,dashed"] Track [label="Do you have >0\npenalty occurrences?", style="filled,dashed"] Penalty1 [label="penalty1 state\nassigned"] @@ -49,8 +50,12 @@ digraph "" { // Use constraint=false to point UP to the top row without pulling the chart apart Exceed -> DoSomething [label=" No"] Exceed -> Badness [label="Yes"] - + + Badness -> BehFix Badness -> TooBad + + BehFix -> Badness [label=" No"] + BehFix -> BadnessFed [label=" Yes"] TooBad -> Track [label="Yes", constraint=false] Track -> Penalty1 [label=" No"] diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg 
b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg index c19e6c87..d9d6a5aa 100644 --- a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -1,166 +1,188 @@ - - + --> + + - -If penalty occurrences > 0, -start the timer; at zero, -reduce penalty occurrences -by 1, repeating until zero. + +If penalty occurrences > 0, +start the timer; at zero, +reduce penalty occurrences +by 1, repeating until zero. - -normal state + +normal state - -Thresholds exceeded? + +Thresholds exceeded? - - + + - - -  No + + +  No - -Badness accumulates + +Badness accumulates - - -Yes + + +Yes - -Has badness -reached 100? + +Has badness +reached 100? + + + + + + + +Below thresholds? + + - - + + - - -Do you have >0 -penalty occurrences? + + +Do you have >0 +penalty occurrences? - - - -Yes + + + +Yes + + + + + +  No + + + + +Badness begins +to reduce + + + + + +  Yes - - -penalty1 state -assigned + + +penalty1 state +assigned - - - - No + + + + No - - -Penalty state -increased by 1 + + +Penalty state +increased by 1 - - - - Yes + + + + Yes - - -Warning email sent -and resources throttled + + +Warning email sent +and resources throttled - - - + + + - - - + + + - - -Penalty duration -timer starts + + +Penalty duration +timer starts - - - + + + - - -Has the penalty -duration ended? + + +Has the penalty +duration ended? - - - - - - - -Badness begins -to reduce + + + - - - - Yes + + + + Yes - - - + + + - - -Is badness -at zero? + + +Is badness +at zero? 
- - - + + + - - - -  Yes + + + +  Yes \ No newline at end of file From 239fc7113931ad5ef6c30868b65c6c1d9a9d9c0b Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 16 Jan 2026 17:43:23 -0700 Subject: [PATCH 06/13] add penalty occurances increase to arbiter flow chart --- .../dot_files/arbiter_flowchart.dot | 2 +- .../generated_images/arbiter_flowchart.svg | 159 +++++++++--------- 2 files changed, 81 insertions(+), 80 deletions(-) diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot index ca0cc301..31c3cb4f 100644 --- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -38,7 +38,7 @@ digraph "" { Track [label="Do you have >0\npenalty occurrences?", style="filled,dashed"] Penalty1 [label="penalty1 state\nassigned"] NextPen [label="Penalty state\nincreased by 1"] - SentEmail [label="Warning email sent\nand resources throttled"] + SentEmail [label="Warning email sent, penalty\noccurrences increases by 1,\nand resources throttled"] DurationTimer [label="Penalty duration\ntimer starts"] DurationEnded [label="Has the penalty\nduration ended?", style="filled,dashed"] BadnessFed [label="Badness begins\nto reduce"] diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg index d9d6a5aa..d9a061f2 100644 --- a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -1,86 +1,86 @@ - - + --> + + - -If penalty occurrences > 0, -start the timer; at zero, -reduce penalty occurrences -by 1, repeating until zero. + +If penalty occurrences > 0, +start the timer; at zero, +reduce penalty occurrences +by 1, repeating until zero. - -normal state + +normal state - -Thresholds exceeded? + +Thresholds exceeded? 
- - + + - - -  No + + +  No - -Badness accumulates + +Badness accumulates - - -Yes + + +Yes - -Has badness -reached 100? + +Has badness +reached 100? - - + + - -Below thresholds? + +Below thresholds? - - + + - -Do you have >0 -penalty occurrences? + +Do you have >0 +penalty occurrences? - - -Yes + + +Yes - - -  No + + +  No @@ -90,82 +90,83 @@ - + -  Yes +  Yes - -penalty1 state -assigned + +penalty1 state +assigned - - - No + + + No - -Penalty state -increased by 1 + +Penalty state +increased by 1 - - - Yes + + + Yes - -Warning email sent -and resources throttled + +Warning email sent, penalty +occurrences increases by 1, +and resources throttled - - + + - - + + - -Penalty duration -timer starts + +Penalty duration +timer starts - - + + - -Has the penalty -duration ended? + +Has the penalty +duration ended? - - + + - - - Yes + + + Yes - - + + @@ -180,9 +181,9 @@ - - -  Yes + + +  Yes \ No newline at end of file From 3a422ac9d1823810755a9fd43b05d16dbff14f58 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Wed, 21 Jan 2026 17:53:38 -0700 Subject: [PATCH 07/13] fix logic in arbiter graph and begin working on the arbiter table --- docs/getting_started/faq.md | 4 +- .../faq_html/arbiter_penalty_table.html | 14 +- .../dot_files/arbiter_flowchart.dot | 55 ++-- .../generated_images/arbiter_flowchart.svg | 279 +++++++++--------- 4 files changed, 172 insertions(+), 180 deletions(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 44e952a5..8be87d68 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -150,7 +150,7 @@ re-enroll by visiting . If that did not resolve your i [Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. 
For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. -In general, when a user goes over a defined resource threshold, the user will accrue what is called "badness", which is a value between 0 and 100. When a user accrues a badness of 100, the user will be moved into the next penalty state for a defined amount of time and receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration. Once this duration ends, the user has adjusted their work so that they are under the threshold, and they have reached a badness of 0, then their resources will return to a normal state (i.e. no throttling will be applied). For a list of threshold values and durations for each penalty state, see table below. +In general, when a user goes over a defined resource threshold, the user will accrue what is called "badness", which is a value between 0 and 100. When a user accrues a badness of 100, the user will be moved into the next penalty state (for a defined amount of time), "penalty occurrences" will increment by one, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. 
They will stay in this penalty state for a set duration. Once this duration ends, the user has adjusted their work so that they are under the threshold, and they have reached a badness of 0, then their resources will return to a normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after a defined duration. For a list of threshold values and durations for each penalty state, see the table below. Below we also provide a flowchart for the logic provided here to ```{important} - Arbiter2 keeps track of the penalty state the user was last in. This means that if the user accrues a badness of 100 shortly after they returned to a normal state, they could potentially be moved into a higher penalty state, rather than sequentially going through the penalties. @@ -167,7 +167,7 @@ In general, when a user goes over a defined resource threshold, the user will ac .. raw:: html
-
+
.. raw:: html :file: ../../graphviz_flowcharts/generated_images/arbiter_flowchart.svg diff --git a/docs/getting_started/faq_html/arbiter_penalty_table.html b/docs/getting_started/faq_html/arbiter_penalty_table.html index 19d5a9a7..9476b436 100644 --- a/docs/getting_started/faq_html/arbiter_penalty_table.html +++ b/docs/getting_started/faq_html/arbiter_penalty_table.html @@ -78,7 +78,7 @@ Host Penalty Status - Resource Limit + Resource usage maximum What happens when I exceed a resource limit? Duration of Penalty @@ -88,26 +88,26 @@ login nodes normal - 2 CPU cores
2 GB of RAM - When you exceed either or both thresholds for 10 minutes, . + 4 CPU cores
4 GB of RAM + After utilizing 3 CPU cores and or 3 GB of RAM for 10 minutes, the user will be placed into a penalty1 state and their penalty occurrences will increment by 1. N/A penalty1 - 2 CPU cores
2 GB of RAM - Threshold Exceeded: Any sustained usage above 50% of the limit continues to increase the badness score. + 3 CPU cores
3 GB of RAM + After 10 minutes, above the resource usage threshold, the user will be placed into a penalty2 state and their penalty occurrences will increment by 1. 30 minutes penalty2 2 CPU cores
2 GB of RAM - Threshold Exceeded: Any sustained usage above 50% of the new, lower limit (e.g., 0.6 Cores / 0.6 GB) continues to increase the badness score. + After 10 minutes, above a threshold, the user will be placed into a penalty3 state and their penalty occurrences will increment by 1. 60 minutes penalty3 2 CPU cores
2 GB of RAM - Threshold Exceeded: Any sustained usage above 50% of the new, severely reduced limit (e.g., 0.2 Cores / 0.2 GB) maintains the severe penalty. + After 10 minutes, above a threshold, the user will remain in the penalty3 state and their penalty occurrences will increment by 1. 120 minutes diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot index 31c3cb4f..a6a779a7 100644 --- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -1,77 +1,68 @@ digraph "" { bgcolor="transparent"; - graph [id="doc-flowchart", rankdir=TB, nodesep=0.5, ranksep=0.4, bgcolor="none"]; + graph [id="doc-flowchart", rankdir=TB, nodesep=0.7, ranksep=0.7, bgcolor="none", splines=ortho]; // Node Styling node [fontname="Verdana", fontsize="16", color="#CFB87C", style="filled", fillcolor="#121212", penwidth="2", fontcolor="white", shape=ellipse]; edge [color="#CFB87C", fillcolor="#121212", penwidth="1.5", fontsize="14", fontcolor="#CFB87C"]; // --- ROW 0: The solitary top node --- - { rank=source; DoSomething; } + { rank=source; DoSomething; BadnessRed; BehFix;} // --- ROW 1: The Start --- { rank=same; NS; Exceed; Badness; } - // --- ROW 4: Logic Gates --- - { rank=same; TooBad; BehFix; Track; } - // --- ROW 5: Branching Outcomes --- - { rank=same; Penalty1; NextPen; } - - // --- ROW 6: Post-Penalty Actions --- - { rank=same; SentEmail; DurationTimer; } - { rank=same; DurationEnded; } + { rank=same; Track; TooBad; } - // --- ROW 7: State Management --- - { rank=same; BadnessFed; } + { rank=same; Penalty1; NextPen; } - // --- ROW 8: Final Check --- - { rank=same; BadAtZero; } + // --- ROW 6: Post-Penalty Actions --- + { rank=same; DurationEnded; SentEmail; DurationTimer; } + // { rank=same; DurationEnded; } // Nodes NS [label="normal state"] - Exceed [label="Thresholds exceeded?", style="filled,dashed"] - DoSomething [label="If penalty occurrences > 0,\nstart 
the timer; at zero,\nreduce penalty occurrences\nby 1, repeating until zero.", fontsize="10"] + Exceed [label="Usage thresholds\nexceeded?", style="filled,dashed"] + DoSomething [label="If penalty occurrences > 0,\nstart the timer; at zero,\nreduce penalty occurrences\nby 1, repeating until zero.", fontsize="13"] Badness [label="Badness accumulates"] - BehFix [label="Below thresholds?", style="filled,dashed"] + BehFix [label="Below usage\nthresholds?", style="filled,dashed"] TooBad [label="Has badness\nreached 100?", style="filled,dashed"] Track [label="Do you have >0\npenalty occurrences?", style="filled,dashed"] Penalty1 [label="penalty1 state\nassigned"] NextPen [label="Penalty state\nincreased by 1"] SentEmail [label="Warning email sent, penalty\noccurrences increases by 1,\nand resources throttled"] - DurationTimer [label="Penalty duration\ntimer starts"] + DurationTimer [label="Penalty duration\ntimer counts down"] DurationEnded [label="Has the penalty\nduration ended?", style="filled,dashed"] - BadnessFed [label="Badness begins\nto reduce"] - BadAtZero [label="Is badness\nat zero?", style="filled,dashed"] + BadnessRed [label="Badness reduces"] // Flow Logic NS -> Exceed + NS -> DoSomething // Use constraint=false to point UP to the top row without pulling the chart apart - Exceed -> DoSomething [label=" No"] + Exceed -> NS [xlabel=" No", constraint=false] Exceed -> Badness [label="Yes"] Badness -> BehFix Badness -> TooBad - BehFix -> Badness [label=" No"] - BehFix -> BadnessFed [label=" Yes"] + BehFix -> Badness [xlabel=" No"] + BehFix -> BadnessRed [label="Yes", constraint=false] TooBad -> Track [label="Yes", constraint=false] + TooBad -> Badness [xlabel="No "] - Track -> Penalty1 [label=" No"] - Track -> NextPen [label=" Yes"] + Track -> Penalty1 [xlabel=" No"] + Track -> NextPen [xlabel="Yes "] Penalty1 -> SentEmail NextPen -> SentEmail SentEmail -> DurationTimer [constraint=false] - DurationTimer -> DurationEnded - - DurationEnded -> BadnessFed [label=" 
Yes"] + DurationTimer -> DurationEnded [constraint=false] - BadnessFed -> BadAtZero - BadnessFed -> Exceed + DurationEnded -> NS [xlabel=" Yes"] + DurationEnded -> DurationTimer [xlabel=" No", constraint=false] - // Loop back to start - BadAtZero -> NS [label=" Yes"] + BadnessRed -> Exceed } \ No newline at end of file diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg index d9a061f2..394ce6d2 100644 --- a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -1,189 +1,190 @@ - - + --> + + - -If penalty occurrences > 0, -start the timer; at zero, -reduce penalty occurrences -by 1, repeating until zero. + +If penalty occurrences > 0, +start the timer; at zero, +reduce penalty occurrences +by 1, repeating until zero. - + - -normal state + +Badness reduces - - -Thresholds exceeded? + + +Usage thresholds +exceeded? - - - - + + + + - - - - -  No + + + +Below usage +thresholds? + + + + + +Yes - - -Badness accumulates + + +Badness accumulates - - - - -Yes + + + + +  No - - - -Has badness -reached 100? + + + +normal state - - - - + + + + - - - -Below thresholds? + + + + - + + + + +  No + + - - + + +Yes - - - -Do you have >0 -penalty occurrences? + + + + - - - - -Yes + + + +Has badness +reached 100? - + - - -  No + + - - - -Badness begins -to reduce - - - - - -  Yes + + + +Do you have >0 +penalty occurrences? - - -penalty1 state -assigned + + +penalty1 state +assigned - - - - No + + + + No - - -Penalty state -increased by 1 + + +Penalty state +increased by 1 + + + +Yes   + + - - - Yes + + +No   + + + + + +Yes - - -Warning email sent, penalty -occurrences increases by 1, -and resources throttled + + +Warning email sent, penalty +occurrences increases by 1, +and resources throttled - - - + + + - - - + + + - + - -Penalty duration -timer starts + +Has the penalty +duration ended? 
- - - - + + + + + Yes - - - -Has the penalty -duration ended? + + + +Penalty duration +timer counts down - - - - + + + + + No - + - - - Yes - - - - - - - - - -Is badness -at zero? + + - + - - - - - - - -  Yes + + \ No newline at end of file From 13e23afba5cd68d4527afe2de103ac7e1fa4284f Mon Sep 17 00:00:00 2001 From: b-reyes Date: Thu, 22 Jan 2026 11:59:40 -0700 Subject: [PATCH 08/13] rearrange Arbiter2 FAQ --- docs/getting_started/faq.md | 35 +++- .../faq_html/arbiter_penalty_table.html | 12 +- .../dot_files/arbiter_flowchart.dot | 4 +- .../generated_images/arbiter_flowchart.svg | 178 +++++++++--------- 4 files changed, 123 insertions(+), 106 deletions(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 8be87d68..38b28f61 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -148,20 +148,33 @@ re-enroll by visiting . If that did not resolve your i ::::{dropdown} Show :icon: note -[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. - -In general, when a user goes over a defined resource threshold, the user will accrue what is called "badness", which is a value between 0 and 100. 
When a user accrues a badness of 100, the user will be moved into the next penalty state (for a defined amount of time), "penalty occurrences" will increment by one, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration. Once this duration ends, the user has adjusted their work so that they are under the threshold, and they have reached a badness of 0, then their resources will return to a normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after a defined duration. For a list of threshold values and durations for each penalty state, see the table below. Below we also provide a flowchart for the logic provided here to +[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043) (for a general overview, please see the remaining content in this FAQ). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. ```{important} -- Arbiter2 keeps track of the penalty state the user was last in. 
This means that if the user accrues a badness of 100 shortly after they returned to a normal state, they could potentially be moved into a higher penalty state, rather than sequentially going through the penalties. - Arbiter2 is currently setup to track work across all login nodes. For this reason, the user's state will be the same on all login nodes. -- Resource limits, duration of penalty, and the time it takes to get moved into the next penalty state may change over time as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. +- Hosts that utilize Arbiter2 and Arbiter2 configuration values may change over time, as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. ``` -```{eval-rst} -.. raw:: html - :file: ./faq_html/arbiter_penalty_table.html -``` +For those interested, we will now provide details on how Arbiter2 works, which includes how penalty states are applied. Before we get started, it is important to define some Arbiter2 terms. Please review the following table before proceeding. + +| Term | Description | +| :---------------------- | :--------------------------------------------- | +| badness | A value between 0 and 100 that accrues when a user exceeds defined resource thresholds | +| normal state | | +| Penalty state | | +| Penalty occurrences | | +| Penalty occurrences timer | | +cpu_badness_threshold = 0.75 +mem_badness_threshold = 0.75 +time_to_max_bad = 60 # 10 minutes +time_to_min_bad = 220 # 30 minutes + +In general, when a user goes over the resource threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will then be moved into the penalty state based on the penalty occurrences, and they will receive a no-reply warning email. 
Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration. + +Once the penalty state duration ends, the user will be placed back into the normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after the penalty occurrence timer reaches zero. To improve + + +For a list of threshold values and durations for each penalty state, see the table below. Below we also provide a flowchart for the logic provided here to ```{eval-rst} .. raw:: html @@ -175,6 +188,10 @@ In general, when a user goes over a defined resource threshold, the user will ac .. raw:: html
+
+ +.. raw:: html + :file: ./faq_html/arbiter_penalty_table.html ``` :::: diff --git a/docs/getting_started/faq_html/arbiter_penalty_table.html b/docs/getting_started/faq_html/arbiter_penalty_table.html index 9476b436..b6957aed 100644 --- a/docs/getting_started/faq_html/arbiter_penalty_table.html +++ b/docs/getting_started/faq_html/arbiter_penalty_table.html @@ -77,9 +77,9 @@ Host - Penalty Status + Penalty State Resource usage maximum - What happens when I exceed a resource limit? + Penalty occurrences Duration of Penalty @@ -89,25 +89,25 @@ login nodes normal 4 CPU cores
4 GB of RAM - After utilizing 3 CPU cores and or 3 GB of RAM for 10 minutes, the user will be placed into a penalty1 state and their penalty occurrences will increment by 1. + N/A N/A penalty1 3 CPU cores
3 GB of RAM - After 10 minutes, above the resource usage threshold, the user will be placed into a penalty2 state and their penalty occurrences will increment by 1. + 1 30 minutes penalty2 2 CPU cores
2 GB of RAM - After 10 minutes, above a threshold, the user will be placed into a penalty3 state and their penalty occurrences will increment by 1. + 2 60 minutes penalty3 2 CPU cores
2 GB of RAM - After 10 minutes, above a threshold, the user will remain in the penalty3 state and their penalty occurrences will increment by 1. + 3 or more 120 minutes diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot index a6a779a7..e0d982f5 100644 --- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot +++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot @@ -24,7 +24,7 @@ digraph "" { // Nodes NS [label="normal state"] Exceed [label="Usage thresholds\nexceeded?", style="filled,dashed"] - DoSomething [label="If penalty occurrences > 0,\nstart the timer; at zero,\nreduce penalty occurrences\nby 1, repeating until zero.", fontsize="13"] + DoSomething [label="If penalty occurrences > 0,\nstart penalty occurrences timer;\n at zero, reduce penalty occurrences\nby 1, repeating until zero.", fontsize="13"] Badness [label="Badness accumulates"] BehFix [label="Below usage\nthresholds?", style="filled,dashed"] TooBad [label="Has badness\nreached 100?", style="filled,dashed"] @@ -61,7 +61,7 @@ digraph "" { SentEmail -> DurationTimer [constraint=false] DurationTimer -> DurationEnded [constraint=false] - DurationEnded -> NS [xlabel=" Yes"] + DurationEnded -> NS [xlabel="Yes "] DurationEnded -> DurationTimer [xlabel=" No", constraint=false] BadnessRed -> Exceed diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg index 394ce6d2..75495371 100644 --- a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg +++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg @@ -1,190 +1,190 @@ + --> - + - -If penalty occurrences > 0, -start the timer; at zero, -reduce penalty occurrences -by 1, repeating until zero. + +If penalty occurrences > 0, +start penalty occurrences timer; + at zero, reduce penalty occurrences +by 1, repeating until zero. - -Badness reduces + +Badness reduces - -Usage thresholds -exceeded? 
+ +Usage thresholds +exceeded? - - + + - -Below usage -thresholds? + +Below usage +thresholds? - - -Yes + + +Yes - -Badness accumulates + +Badness accumulates - - -  No + + +  No - -normal state + +normal state - - + + - - + + - - -  No + + +  No - - -Yes + + +Yes - - + + - -Has badness -reached 100? + +Has badness +reached 100? - - + + - -Do you have >0 -penalty occurrences? + +Do you have >0 +penalty occurrences? - -penalty1 state -assigned + +penalty1 state +assigned - - - No + + + No - -Penalty state -increased by 1 + +Penalty state +increased by 1 - - -Yes   + + +Yes   - - -No   + + +No   - - -Yes + + +Yes - -Warning email sent, penalty -occurrences increases by 1, -and resources throttled + +Warning email sent, penalty +occurrences increases by 1, +and resources throttled - - + + - - + + - -Has the penalty -duration ended? + +Has the penalty +duration ended? - - - Yes + + +Yes   - -Penalty duration -timer counts down + +Penalty duration +timer counts down - - - No + + + No - - + + - - + + \ No newline at end of file From 597eb6d6a8fccf100875a77f4eeb24b9ad772307 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Thu, 22 Jan 2026 16:30:48 -0700 Subject: [PATCH 09/13] fill in all of the descriptions for the different terms --- docs/getting_started/faq.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 38b28f61..3931b9b5 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -160,14 +160,14 @@ For those interested, we will now provide details on how Arbiter2 works, which i | Term | Description | | :---------------------- | :--------------------------------------------- | | badness | A value between 0 and 100 that accrues when a user exceeds defined resource thresholds | -| normal state | | -| Penalty state | | -| Penalty occurrences | | -| Penalty occurrences timer | | -cpu_badness_threshold = 0.75 -mem_badness_threshold = 0.75 -time_to_max_bad = 60 
# 10 minutes
-time_to_min_bad = 220 # 30 minutes
+| normal state | The default user state that has the maximum amount of CPU and memory resources|
+| Penalty state | A user state with CPU and memory constraints applied |
+| Penalty occurrences | A variable that is used to determine what penalty state the user should be put in (see the table below for penalty state and penalty occurrences mapping) |
+| Penalty occurrences timer | A variable that defines how long the user must be in the normal state before their penalty occurrences is reduced by 1 |
+| CPU threshold | A threshold percentage of normal-state CPU capacity that triggers badness accumulation. With the value set to `0.75` and `4` CPUs available in the normal state, badness begins accumulating when usage exceeds `3` CPUs (`4 × 0.75`). |
+| Memory threshold | A threshold percentage of normal-state memory (RAM) capacity that triggers badness accumulation. With the value set to `0.75` and `4 GB` of memory available in the normal state, badness begins accumulating when usage exceeds `3 GB` (`4 × 0.75`).
+| Time to max baddness | The amount of time spent over a threshold that will trigger an increase in penalty occurrences. Currently, this value is set to 10 minutes |
+| Time to min baddness | The amount of time spent under all thresholds to go from 100 to 0 badness. Currently, this value is set to 30 minutes |

 In general, when a user goes over the resource threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will then be moved into the penalty state based on the penalty occurrences, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration.
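
The threshold and timing values described in the terms table above can be illustrated with a small sketch. This is not Arbiter2's implementation: it assumes badness grows and shrinks linearly with time, which is consistent with the FAQ's wording for the time to max and time to min badness values but ignores how far above the threshold the usage is. The names `UserStatus`, `over_threshold`, and `step` are hypothetical, and resetting badness to 0 when a penalty is triggered is an assumption.

```python
from dataclasses import dataclass

# Values quoted in the FAQ text at the time of this patch; they may change.
NORMAL_CPU_CORES = 4.0      # CPU cores available in the normal state
NORMAL_MEM_GB = 4.0         # GB of RAM available in the normal state
CPU_THRESHOLD = 0.75        # fraction of normal-state CPU capacity
MEM_THRESHOLD = 0.75        # fraction of normal-state memory capacity
TIME_TO_MAX_BAD_MIN = 10.0  # minutes over a threshold to go from 0 to 100 badness
TIME_TO_MIN_BAD_MIN = 30.0  # minutes under all thresholds to go from 100 to 0


@dataclass
class UserStatus:
    """Hypothetical per-user bookkeeping for this sketch."""
    badness: float = 0.0
    penalty_occurrences: int = 0


def over_threshold(cpu_cores: float, mem_gb: float) -> bool:
    """True when CPU or memory usage exceeds its threshold (3 cores or 3 GB here)."""
    return (cpu_cores > NORMAL_CPU_CORES * CPU_THRESHOLD
            or mem_gb > NORMAL_MEM_GB * MEM_THRESHOLD)


def step(status: UserStatus, cpu_cores: float, mem_gb: float,
         minutes: float = 1.0) -> bool:
    """Advance the model by `minutes`; return True if a penalty is triggered."""
    if over_threshold(cpu_cores, mem_gb):
        # Assumption: linear accrual, reaching 100 after TIME_TO_MAX_BAD_MIN minutes.
        gain = 100.0 * minutes / TIME_TO_MAX_BAD_MIN
        status.badness = min(100.0, status.badness + gain)
    else:
        # Assumption: linear decay, reaching 0 after TIME_TO_MIN_BAD_MIN minutes.
        loss = 100.0 * minutes / TIME_TO_MIN_BAD_MIN
        status.badness = max(0.0, status.badness - loss)

    if status.badness >= 100.0:
        status.penalty_occurrences += 1
        status.badness = 0.0  # assumption: badness starts over after a penalty
        return True
    return False


if __name__ == "__main__":
    user = UserStatus()
    minutes_elapsed = 0
    # Simulate a user holding 3.5 cores and 1 GB of RAM on a login node.
    while not step(user, cpu_cores=3.5, mem_gb=1.0):
        minutes_elapsed += 1
    print(f"penalty after {minutes_elapsed + 1} minutes, "
          f"occurrences={user.penalty_occurrences}")
```

Under these assumptions, sustained usage above either threshold leads to a penalty after the time to max badness, while any time spent below both thresholds pays badness back at one third of that rate.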
From 5d6540d6eaf3dadcb66b0b2d8254179b0fc9fab1 Mon Sep 17 00:00:00 2001
From: b-reyes
Date: Thu, 22 Jan 2026 16:52:08 -0700
Subject: [PATCH 10/13] modify the arbiter flowchart so that the penalty occurences items are wrapped into the email node

---
 docs/getting_started/faq.md | 7 +-
 .../dot_files/arbiter_flowchart.dot | 22 +-
 .../generated_images/arbiter_flowchart.svg | 205 +++++++-----------
 3 files changed, 90 insertions(+), 144 deletions(-)

diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md
index 3931b9b5..0a351fd5 100644
--- a/docs/getting_started/faq.md
+++ b/docs/getting_started/faq.md
@@ -169,12 +169,9 @@ For those interested, we will now provide details on how Arbiter2 works, which i
 | Time to max baddness | The amount of time spent over a threshold that will trigger an increase in penalty occurrences. Currently, this value is set to 10 minutes |
 | Time to min baddness | The amount of time spent under all thresholds to go from 100 to 0 badness. Currently, this value is set to 30 minutes |

-In general, when a user goes over the resource threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will then be moved into the penalty state based on the penalty occurrences, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration.
+In general, when a user goes over a CPU or memory threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will then be moved into the penalty state based on the penalty occurrences, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration.

-Once the penalty state duration ends, the user will be placed back into the normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after the penalty occurrence timer reaches zero. To improve
-
-
-For a list of threshold values and durations for each penalty state, see the table below. Below we also provide a flowchart for the logic provided here to
+Once the penalty state duration ends, the user will be placed back into the normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after the penalty occurrence timer reaches zero. For a list of threshold values and durations for each penalty state, see the table below. Additionally, below we provide a flowchart representation for the logic Arbiter2 uses to put a user in a normal or penalty state.

 ```{eval-rst}
 ..
raw:: html
diff --git a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot
index e0d982f5..025fc686 100644
--- a/graphviz_flowcharts/dot_files/arbiter_flowchart.dot
+++ b/graphviz_flowcharts/dot_files/arbiter_flowchart.dot
@@ -1,6 +1,6 @@
 digraph "" {
     bgcolor="transparent";
-    graph [id="doc-flowchart", rankdir=TB, nodesep=0.7, ranksep=0.7, bgcolor="none", splines=ortho];
+    graph [id="doc-flowchart", rankdir=TB, nodesep=0.7, ranksep=0.8, bgcolor="none", splines=ortho];

     // Node Styling
     node [fontname="Verdana", fontsize="16", color="#CFB87C", style="filled", fillcolor="#121212", penwidth="2", fontcolor="white", shape=ellipse];
@@ -13,13 +13,10 @@ digraph "" {
     { rank=same; NS; Exceed; Badness; }

     // --- ROW 5: Branching Outcomes ---
-    { rank=same; Track; TooBad; }
-
-    { rank=same; Penalty1; NextPen; }
+    { rank=same; TooBad; }

     // --- ROW 6: Post-Penalty Actions ---
     { rank=same; DurationEnded; SentEmail; DurationTimer; }
-    // { rank=same; DurationEnded; }

     // Nodes
     NS [label="normal state"]
@@ -28,10 +25,7 @@ digraph "" {
     Badness [label="Badness accumulates"]
     BehFix [label="Below usage\nthresholds?", style="filled,dashed"]
     TooBad [label="Has badness\nreached 100?", style="filled,dashed"]
-    Track [label="Do you have >0\npenalty occurrences?", style="filled,dashed"]
-    Penalty1 [label="penalty1 state\nassigned"]
-    NextPen [label="Penalty state\nincreased by 1"]
-    SentEmail [label="Warning email sent, penalty\noccurrences increases by 1,\nand resources throttled"]
+    SentEmail [label="Warning email sent, penalty\noccurrences increases by 1,\n penalty state assigned,\nand resources throttled"]
     DurationTimer [label="Penalty duration\ntimer counts down"]
     DurationEnded [label="Has the penalty\nduration ended?", style="filled,dashed"]
     BadnessRed [label="Badness reduces"]
@@ -49,15 +43,9 @@ digraph "" {
     BehFix -> Badness [xlabel=" No"]
     BehFix -> BadnessRed [label="Yes", constraint=false]

-    TooBad -> Track [label="Yes", constraint=false]
     TooBad -> Badness [xlabel="No "]
-
-    Track -> Penalty1 [xlabel=" No"]
-    Track -> NextPen [xlabel="Yes "]
-
-    Penalty1 -> SentEmail
-    NextPen -> SentEmail
-
+    TooBad -> SentEmail [xlabel="Yes "]
+
     SentEmail -> DurationTimer [constraint=false]
     DurationTimer -> DurationEnded [constraint=false]

diff --git a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg
index 75495371..ef275040 100644
--- a/graphviz_flowcharts/generated_images/arbiter_flowchart.svg
+++ b/graphviz_flowcharts/generated_images/arbiter_flowchart.svg
@@ -1,190 +1,151 @@
[Regenerated SVG markup omitted; only node and edge label text survives. Labels removed: "Do you have >0 penalty occurrences?", "penalty1 state assigned", "Penalty state increased by 1". Label changed: "Warning email sent, penalty occurrences increases by 1, penalty state assigned, and resources throttled". Labels kept: "If penalty occurrences > 0, start penalty occurrences timer; at zero, reduce penalty occurrences by 1, repeating until zero.", "Badness reduces", "Usage thresholds exceeded?", "Below usage thresholds?", "Badness accumulates", "normal state", "Has badness reached 100?", "Has the penalty duration ended?", "Penalty duration timer counts down", plus the "Yes" and "No" edge labels.]
\ No newline at end of file

From 34157bb56a8ebafac9a1ea4960954b1b151a2c6f Mon Sep 17 00:00:00 2001
From: b-reyes
Date: Fri, 23 Jan 2026 15:57:57 -0700
Subject: [PATCH 11/13] Finish revising FAQ content, put term table and flow chart in dropdowns, fill in values for the penalty states

---
 docs/getting_started/faq.md | 46 ++++++++++---------
 .../faq_html/arbiter_penalty_table.html | 12 ++---
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md
index 0a351fd5..3eb25ffe 100644
--- a/docs/getting_started/faq.md
+++ b/docs/getting_started/faq.md
@@ -152,41 +152,45 @@ re-enroll by visiting . If that did not resolve your i

 ```{important}
 - Arbiter2 is currently setup to track work across all login nodes. For this reason, the user's state will be the same on all login nodes.
-- Hosts that utilize Arbiter2 and Arbiter2 configuration values may change over time, as we adjust Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information.
+- Chosen configuration values (e.g. thresholds) and where Arbiter2 is deployed may change over time. This is due to adjustments we may need to make for Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information.
 ```

-For those interested, we will now provide details on how Arbiter2 works, which includes how penalty states are applied. Before we get started, it is important to define some Arbiter2 terms. Please review the following table before proceeding.
+For those interested, we will now provide details on how Arbiter2 works, which includes how penalty states are applied. Before we get started, it is important to define some Arbiter2 terms.
Please review the table of terms in the dropdown below before proceeding.
+:::{dropdown} Show Arbiter2 terms
+:icon: note
 | Term | Description |
 | :---------------------- | :--------------------------------------------- |
-| badness | A value between 0 and 100 that accrues when a user exceeds defined resource thresholds |
-| normal state | The default user state that has the maximum amount of CPU and memory resources|
-| Penalty state | A user state with CPU and memory constraints applied |
+| badness | A value between 0 and 100 that accrues when a user exceeds defined resource thresholds. |
+| normal state | The default user state that has the maximum amount of CPU and memory resources.|
+| Penalty state | A user state with CPU and memory constraints applied. |
 | Penalty occurrences | A variable that is used to determine what penalty state the user should be put in (see the table below for penalty state and penalty occurrences mapping) |
-| Penalty occurrences timer | A variable that defines how long the user must be in the normal state before their penalty occurrences is reduced by 1 |
-| CPU threshold | A threshold percentage of normal-state CPU capacity that triggers badness accumulation. With the value set to `0.75` and `4` CPUs available in the normal state, badness begins accumulating when usage exceeds `3` CPUs (`4 × 0.75`). |
-| Memory threshold | A threshold percentage of normal-state memory (RAM) capacity that triggers badness accumulation. With the value set to `0.75` and `4 GB` of memory available in the normal state, badness begins accumulating when usage exceeds `3 GB` (`4 × 0.75`).
-| Time to max baddness | The amount of time spent over a threshold that will trigger an increase in penalty occurrences. Currently, this value is set to 10 minutes |
-| Time to min baddness | The amount of time spent under all thresholds to go from 100 to 0 badness. Currently, this value is set to 30 minutes |
-
-In general, when a user goes over a CPU or memory threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will then be moved into the penalty state based on the penalty occurrences, and they will receive a no-reply warning email. Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration.
+| Penalty occurrences timer | A variable that defines how long the user must be in the normal state before their penalty occurrences are reduced by 1. We currently set this value to `3 hours`. |
+| CPU threshold | A threshold percentage of normal-state CPU capacity that triggers badness accumulation. We set this value to `0.75`. Since there are `4` CPUs available in the normal state, badness begins accumulating when usage exceeds `3` CPUs (`4 × 0.75`). |
+| Memory threshold | A threshold percentage of normal-state memory (RAM) capacity that triggers badness accumulation. We set this value to `0.75`. Since there is `4GB` of memory available in the normal state, badness begins accumulating when usage exceeds `3GB` (`4 × 0.75`). |
+| Time to max baddness | The amount of time spent over a threshold that will result in 100 badness and trigger an increase in penalty occurrences. Currently, this value is set to `10 minutes`. |
+| Time to min baddness | The amount of time spent under all thresholds to go from 100 to 0 badness. Currently, this value is set to `30 minutes`. |
+:::
+In general, when a user goes over a CPU or memory threshold, the user will accrue badness. When a user accrues a badness of 100, the user's penalty occurrences will be incremented by 1, they will be moved into the penalty state corresponding to their penalty occurrences, and they will receive a no-reply warning email.
Once in a penalty state, the amount of resources they have available to them on the host will be reduced (i.e. throttled) based on the penalty state they are in. They will stay in this penalty state for a set duration. -Once the penalty state duration ends, the user will be placed back into the normal state (i.e. no throttling will be applied). If a user is in a normal state and does not accrue badness, their penalty occurrences will reduce by 1 after the penalty occurrence timer reaches zero. For a list of threshold values and durations for each penalty state, see the table below. Additionally, below we provide a flowchart representation for the logic Arbiter2 uses to put a user in a normal or penalty state. +Once the penalty state duration ends, the user will be placed back into the normal state (i.e. no throttling will be applied). If a user is in a normal state, their penalty occurrences will reduce by 1 after the penalty occurrence timer reaches zero. If a user has more than 1 penalty occurrences, the penalty occurrence timer will restart after reaching zero and repeat until the number of penalty occurrences reaches zero. For a list of threshold values and durations for each penalty state, see the table below. Additionally, in the dropdown below we provide a flowchart representation for the logic Arbiter2 uses to put a user in a normal or penalty state. +:::{dropdown} Show flowchart depiction of Arbiter2 +:icon: note ```{eval-rst} -.. raw:: html - -
-
- .. raw:: html :file: ../../graphviz_flowcharts/generated_images/arbiter_flowchart.svg -.. raw:: html +``` +::: -
-
+```{note} +When a user attempts to use more resources than their "Resource usage maximum" they will experience the following: +- If they are using more CPUs, they will see that their CPU usage will automatically be throttled such that their CPU usage is below the maximum +- If they are trying to use more memory, their program will be automatically killed once they go over their memory usage maximum +``` +```{eval-rst} .. raw:: html :file: ./faq_html/arbiter_penalty_table.html ``` diff --git a/docs/getting_started/faq_html/arbiter_penalty_table.html b/docs/getting_started/faq_html/arbiter_penalty_table.html index b6957aed..074bf8dd 100644 --- a/docs/getting_started/faq_html/arbiter_penalty_table.html +++ b/docs/getting_started/faq_html/arbiter_penalty_table.html @@ -88,27 +88,27 @@ login nodes normal - 4 CPU cores
4 GB of RAM + 4 CPU cores
4GB of RAM N/A N/A penalty1 - 3 CPU cores
3 GB of RAM + 3.2 CPU cores
3.2GB of RAM 1 30 minutes penalty2 - 2 CPU cores
2 GB of RAM + 2 CPU cores
2GB of RAM 2 - 60 minutes + 1 hour penalty3 - 2 CPU cores
2 GB of RAM + 1.2 CPU cores
1.2GB of RAM 3 or more - 120 minutes + 2 hours From 6da2a16cd5251f20a27716457f3ef0f6e17a4e50 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 23 Jan 2026 16:07:15 -0700 Subject: [PATCH 12/13] modify CPU note --- docs/getting_started/faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 3eb25ffe..67c41291 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -186,7 +186,7 @@ Once the penalty state duration ends, the user will be placed back into the norm ```{note} When a user attempts to use more resources than their "Resource usage maximum" they will experience the following: -- If they are using more CPUs, they will see that their CPU usage will automatically be throttled such that their CPU usage is below the maximum +- If they are using more CPUs, they will see that their CPU usage will automatically be throttled below the maximum CPU usage - If they are trying to use more memory, their program will be automatically killed once they go over their memory usage maximum ``` From 5ec32febfacba4c3779a0a2e125575d5974aa8e6 Mon Sep 17 00:00:00 2001 From: b-reyes Date: Fri, 23 Jan 2026 16:30:07 -0700 Subject: [PATCH 13/13] add a sentence in the beginning paragraph explaining what happens once processes are flaged and they are moved into a penalty state --- docs/getting_started/faq.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/getting_started/faq.md b/docs/getting_started/faq.md index 67c41291..538aa8de 100644 --- a/docs/getting_started/faq.md +++ b/docs/getting_started/faq.md @@ -148,11 +148,11 @@ re-enroll by visiting . If that did not resolve your i ::::{dropdown} Show :icon: note -[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. 
For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043) (for a general overview, please see the remaining content in this FAQ). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. +[Arbiter2](https://github.com/chpc-uofu/arbiter2) is a tool created by the University of Utah that allows us to monitor non-compute node resources for undesirable behavior. For an in-depth explanation of how Arbiter2 works, please see the official paper ["Arbiter: Dynamically Limiting Resource Consumption on Login Nodes"](https://dl.acm.org/doi/10.1145/3332186.3333043) (for a general overview, please see the remaining content in this FAQ). Currently, Arbiter2 is deployed on low resource hosts, such as login nodes, to detect work that consumes substantial CPU or memory resources. Once processes that consume substantial resources are detected and the user is moved into a penalty state, the user's total available resources on that host will be reduced and the user will be sent a no-reply warning email. Work that can consume substantial resources are items such as installing/compiling software, running software applications, and modification of large files. For a list of all hosts that Arbiter2 is deployed on, see the `Host` column in the table below. ```{important} - Arbiter2 is currently setup to track work across all login nodes. For this reason, the user's state will be the same on all login nodes. -- Chosen configuration values (e.g. thresholds) and where Arbiter2 is deployed may change over time. 
This is due to adjustments we may need to make for Arbiter2 to fit the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. +- Chosen configuration values (e.g. thresholds) and where Arbiter2 is deployed may change over time. This is due to adjustments we may need to make so that Arbiter2 fits the needs of the system and users. Please refer to this FAQ in the future for the most up-to-date information. ``` For those interested, we will now provide details on how Arbiter2 works, which includes how penalty states are applied. Before we get started, it is important to define some Arbiter2 terms. Please review the table of terms in the dropdown below before proceeding.
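
As a rough illustration of the penalty logic described in the FAQ text of this patch series, the Python sketch below works through the threshold arithmetic and the mapping from penalty occurrences to penalty states. It is not Arbiter2's implementation. The numeric values are the ones quoted in the FAQ and the penalty table, but the linear accrual and decay of badness and the names `exceeds_threshold`, `next_badness`, and `penalty_state_for` are assumptions made only for this example.

```python
# Illustrative sketch only; not taken from the Arbiter2 source code.
# Numeric values are the ones quoted in the FAQ and penalty table.
NORMAL_CPUS = 4.0             # CPU cores available in the normal state
NORMAL_MEM_GB = 4.0           # memory (GB) available in the normal state
CPU_THRESHOLD = 0.75          # fraction of normal-state CPU capacity
MEM_THRESHOLD = 0.75          # fraction of normal-state memory capacity
TIME_TO_MAX_BADNESS_MIN = 10  # minutes over a threshold to go from 0 to 100
TIME_TO_MIN_BADNESS_MIN = 30  # minutes under all thresholds to go from 100 to 0

# penalty state -> (CPU cores, memory in GB, penalty duration in minutes)
PENALTY_STATES = {
    "penalty1": (3.2, 3.2, 30),
    "penalty2": (2.0, 2.0, 60),
    "penalty3": (1.2, 1.2, 120),
}


def exceeds_threshold(cpus_used: float, mem_gb_used: float) -> bool:
    """True when usage is above 3 CPUs (4 x 0.75) or 3 GB (4 x 0.75)."""
    return (
        cpus_used > NORMAL_CPUS * CPU_THRESHOLD
        or mem_gb_used > NORMAL_MEM_GB * MEM_THRESHOLD
    )


def next_badness(badness: float, cpus_used: float, mem_gb_used: float,
                 minutes: float = 1.0) -> float:
    """Advance badness by `minutes`, clamped to the range 0..100.

    Assumption: badness changes linearly with time. The FAQ only states
    the total time to reach 100 and the total time to return to 0.
    """
    if exceeds_threshold(cpus_used, mem_gb_used):
        delta = 100.0 / TIME_TO_MAX_BADNESS_MIN * minutes
    else:
        delta = -100.0 / TIME_TO_MIN_BADNESS_MIN * minutes
    return min(100.0, max(0.0, badness + delta))


def penalty_state_for(occurrences: int) -> str:
    """Map penalty occurrences to a state: 1, 2, then 3 or more."""
    if occurrences <= 0:
        return "normal"
    return f"penalty{min(occurrences, 3)}"
```

Under these assumptions, calling `next_badness` once per minute with usage of 3.5 CPUs reaches a badness of 100 after ten calls. At that point the FAQ says penalty occurrences increase by 1, and `penalty_state_for` selects the row of the penalty table that applies.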