This document describes the available metrics in nuPlan to evaluate scenarios and score planners.
In this challenge, the expert-driven trajectory and the planner-proposed trajectories are downsampled at a selected frequency (comparison_frequency = 1Hz). At each sampled time, the planner-proposed trajectory and the expert trajectory are compared up to each selected time horizon in the future (comparison_horizon = [3,5,8]s). Submitted planners must meet the minimum required frequency of 1 Hz and the minimum planning horizon of 6s for this challenge.
Average Displacement Error (ADE) within bound: At each sampled time, ADE is defined as the average of pointwise L2 distances between the sampled planner proposed trajectory and expert trajectory starting at that time up to the selected comparison horizon in the future. ADE is calculated at all sampled times in the scenario. We define ADE score for a scenario based on the mean of all the calculated ADEs in that scenario.
Final Displacement Error (FDE) within bound: At each sampled time, FDE is defined as the L2 distance between the sampled planner proposed trajectory and expert trajectory at the final time available in the sampled trajectories (i.e., at the selected comparison horizon seconds in the future). FDE is calculated at all sampled times in the scenario. We define FDE score for a scenario based on the mean of all the calculated FDEs in that scenario.
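The ADE and FDE definitions above can be sketched as follows. This is a minimal illustration, not nuPlan's implementation: it assumes both trajectories are already downsampled to comparison_frequency, truncated to one comparison horizon, and given as lists of (x, y) points. The function names are hypothetical.

```python
import math


def pointwise_l2(planned, expert):
    """L2 distance at each sampled pose of two equally sampled (x, y) trajectories."""
    if not planned or len(planned) != len(expert):
        raise ValueError("trajectories must be non-empty and equally sampled")
    return [math.dist(p, e) for p, e in zip(planned, expert)]


def ade(planned, expert):
    """Average displacement error at one sampled time."""
    distances = pointwise_l2(planned, expert)
    return sum(distances) / len(distances)


def fde(planned, expert):
    """Final displacement error at one sampled time (last pose in the horizon)."""
    return pointwise_l2(planned, expert)[-1]


def scenario_mean(per_sample_values):
    """Scenario-level value: mean of the errors computed at all sampled times."""
    return sum(per_sample_values) / len(per_sample_values)


planned = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
expert = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(planned, expert))  # 1.0
print(fde(planned, expert))  # 2.0
```

The scenario-level ADE or FDE would then be `scenario_mean` applied to the per-sample values for a given horizon.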
Miss Rate within bound: At each sampled time, if the maximum of pointwise L2 distances between the sampled planner proposed trajectory and sampled expert trajectory up to the selected comparison horizon in the future is greater than its corresponding maximum displacement threshold, we consider the planner proposed trajectory at that time as a miss. This process is done for all sampled times in the scenario. Miss rate is defined as the ratio of sampled times where the corresponding trajectory was marked as a miss over the number of the sampled times. Nuboard visualizes miss rate ratio and a boolean metric (True if the miss rate is below a maximum acceptable threshold, max_miss_rate_threshold = 0.3, for all horizons in comparison_horizon = [3,5,8]s else False).
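As a rough sketch of the miss-rate logic described above, assuming trajectories are lists of (x, y) points already cut to one comparison horizon. The function names are hypothetical, and treating "below" as a strict inequality is an assumption.

```python
import math


def is_miss(planned, expert, max_displacement_threshold):
    """A sampled time is a miss if the largest pointwise L2 distance exceeds the threshold."""
    return max(math.dist(p, e) for p, e in zip(planned, expert)) > max_displacement_threshold


def miss_rate(miss_flags):
    """Ratio of sampled times marked as a miss over the number of sampled times."""
    return sum(miss_flags) / len(miss_flags)


def miss_rate_within_bound(miss_rates_per_horizon, max_miss_rate_threshold=0.3):
    """True only if the miss rate is below the threshold for every horizon (strict: assumption)."""
    return all(rate < max_miss_rate_threshold for rate in miss_rates_per_horizon)
```

Each horizon in comparison_horizon = [3,5,8]s is checked against its corresponding entry of max_displacement_threshold = [6,8,16]m.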
Average Heading Error (AHE) within bound: At each sampled time, average heading error is defined as the average of the absolute heading differences between the sampled planner proposed trajectory and expert trajectory starting at that time up to the selected comparison horizon in the future. Average heading error is calculated at all sampled times in the scenario. We define average heading error score for a scenario based on the mean of all the calculated average heading errors in that scenario.
Final Heading Error (FHE) within bound: At each sampled time, final heading error is defined as the absolute heading difference between the sampled planner proposed trajectory and expert trajectory at the final time available in the sampled trajectories (i.e., at the selected comparison horizon seconds in the future). Final heading error is calculated at all sampled times in the scenario. We define final heading error score for a scenario based on the mean of all the calculated final heading errors in that scenario.
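The two heading errors can be sketched in the same way. The text only specifies an absolute difference; wrapping the difference into [-pi, pi] before taking the absolute value is an assumption made here so that headings near +pi and -pi are treated as close. Function names are hypothetical.

```python
import math


def abs_heading_diff(a, b):
    """Absolute heading difference in radians, wrapped into [0, pi] (wrapping is an assumption)."""
    wrapped = (a - b + math.pi) % (2.0 * math.pi) - math.pi
    return abs(wrapped)


def average_heading_error(planned_headings, expert_headings):
    """Mean absolute heading difference over one comparison horizon."""
    diffs = [abs_heading_diff(p, e) for p, e in zip(planned_headings, expert_headings)]
    return sum(diffs) / len(diffs)


def final_heading_error(planned_headings, expert_headings):
    """Absolute heading difference at the last pose of the comparison horizon."""
    return abs_heading_diff(planned_headings[-1], expert_headings[-1])
```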
| Metric | Thresholds | Visualization |
|---|---|---|
| Average Displacement Error (ADE) within bound | comparison_horizon = [3,5,8]s; comparison_frequency = 1Hz; max_average_l2_error_threshold = 8m | Histogram of: • mean of ADE (m) |
| Final Displacement Error (FDE) within bound | *comparison_horizon = [3,5,8]s; *comparison_frequency = 1Hz; max_final_l2_error_threshold = 8m | Histogram of: • mean of FDE (m) |
| Miss Rate within bound | *comparison_horizon = [3,5,8]s; *comparison_frequency = 1Hz; max_displacement_threshold = [6,8,16]m; max_miss_rate_threshold = 0.3 | Histograms of: • miss rate (ratio) • boolean (True if miss rate is below the max_miss_rate_threshold) |
| Average Heading Error (AHE) within bound | *comparison_horizon = [3,5,8]s; *comparison_frequency = 1Hz; max_average_heading_error_threshold = 0.8 rad | Histogram of: • mean of average heading errors (rad) |
| Final Heading Error (FHE) within bound | *comparison_horizon = [3,5,8]s; *comparison_frequency = 1Hz; max_final_heading_error_threshold = 0.8 rad | Histogram of: • mean of final heading errors (rad) |
'*' : inherited threshold from the related lower level metric
In these challenges, expert and driven ego trajectories are compared to score a scenario.
No at-fault Collisions: A collision is defined as the event of ego's bounding box intersecting another agent's bounding box. If a collision lasts for multiple frames, it still counts as a single collision and the first frame is considered to calculate the transferred kinematic energy during the collision. All collided tracks will be removed from metrics evaluations at future frames after the collision.
We classify collisions into 5 groups: STOPPED_EGO_COLLISION (a collision that happens when ego is stopped), STOPPED_TRACK_COLLISION (a collision that happens when the track is stopped), ACTIVE_FRONT_COLLISION (a collision that happens when the front bumper of active ego hits an active track), ACTIVE_REAR_COLLISION (a collision that happens when an active track hits ego in the rear) and ACTIVE_LATERAL_COLLISION (a collision that happens when an active track and ego hit on the sides). To define the collision score for a scenario, we only consider collisions that should have been prevented if the planner had performed properly. For simplicity, we call these collisions at-fault. STOPPED_TRACK_COLLISION and ACTIVE_FRONT_COLLISION are considered to be at-fault, as is ACTIVE_LATERAL_COLLISION when ego's footprint is not fully in a single lane or lane_connector (e.g. during a lane change). For at-fault collisions, the kinematic energy of the collision at the time of collision is calculated based on the loss of velocity during the collision, which represents the intensity of ego's collision with the other agents. At-fault collisions are separated into 3 groups based on the collided track type: Vulnerable Road Users (VRUs), including pedestrians and bicyclists; vehicles; and objects, including all other predicted track types (traffic cones, generic objects, etc.). See the nuboard histogram tab for statistics on the number of at-fault collisions and min/max/mean of at-fault collision energies (m/s) for each group. This metric contributes to the scenario score as a multiplier based on the number of at-fault collisions that happen in a scenario for each group and the acceptable thresholds (max_violation_threshold_vru = 0, max_violation_threshold_vehicle = 0, max_violation_threshold_object = 1).
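The at-fault rule can be sketched as a small predicate. This is an illustration only: it reads the lane qualifier as applying to ACTIVE_LATERAL_COLLISION alone, and the enum, function name, and `ego_in_multiple_lanes` flag are hypothetical stand-ins rather than nuPlan API.

```python
from enum import Enum, auto


class CollisionType(Enum):
    STOPPED_EGO_COLLISION = auto()
    STOPPED_TRACK_COLLISION = auto()
    ACTIVE_FRONT_COLLISION = auto()
    ACTIVE_REAR_COLLISION = auto()
    ACTIVE_LATERAL_COLLISION = auto()


def is_at_fault(collision_type, ego_in_multiple_lanes):
    """True if the collision counts against the planner.

    ego_in_multiple_lanes: ego's footprint is not fully inside a single
    lane or lane_connector (e.g. during a lane change).
    """
    if collision_type in (
        CollisionType.STOPPED_TRACK_COLLISION,
        CollisionType.ACTIVE_FRONT_COLLISION,
    ):
        return True
    # Lateral collisions are only at-fault while ego straddles lanes.
    return collision_type is CollisionType.ACTIVE_LATERAL_COLLISION and ego_in_multiple_lanes
```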
Drivable area compliance: Ego should drive in the mapped drivable area at all times. Drivable area compliance metric identifies the frames when ego drives outside the drivable area (see related timeseries in the scenario tab in nuboard). Due to over-approximation of ego's bounding box, we allow for a small infringement outside the drivable area (max_violation_threshold = 0.3m). If there exists a frame where the maximum distance of the corners of ego's bounding box from the nearest drivable area is more than the selected threshold, drivable area compliance score is set to 0, otherwise it is set to 1. Nuboard histogram tab visualizes the boolean representation of this metric (True if there is no drivable area violation more than max_violation_threshold else False). This boolean metric contributes to the scenario score as a multiplier.
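A minimal sketch of this scoring rule, assuming the per-frame maximum corner distance to the nearest drivable area (in meters, 0 when all corners are inside) has already been computed elsewhere. The function name is hypothetical.

```python
def drivable_area_compliance_score(max_corner_distances, max_violation_threshold=0.3):
    """Return 0.0 if any frame exceeds the allowed infringement, otherwise 1.0."""
    violated = any(d > max_violation_threshold for d in max_corner_distances)
    return 0.0 if violated else 1.0
```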
Driving direction compliance: This is a metric defined to penalize ego when "it drives into oncoming traffic". The metric computes the movement of ego's center during a 1 second time_horizon along the driving direction defined according to the baselines of ego's lanes or lane-connectors. The score is set to 1 if it does not drive/move against the flow more than driving_direction_compliance_threshold (= 2 m) and 0 if it drives against the flow more than driving_direction_violation_threshold (= 6 m), and 0.5 otherwise.
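The three-level score can be sketched as below. The input is assumed to be the largest distance ego moved against the flow within any 1 second window, computed elsewhere. The function name is hypothetical.

```python
def driving_direction_score(against_flow_m, compliance_threshold=2.0, violation_threshold=6.0):
    """Score 1.0, 0.5 or 0.0 from the distance driven against the flow (meters)."""
    if against_flow_m <= compliance_threshold:
        return 1.0
    if against_flow_m > violation_threshold:
        return 0.0
    return 0.5
```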
Making progress: This metric is defined as a boolean metric based on the "Ego progress along the expert's route ratio" explained later in this document. Its score is set to 1 if the ratio is more than the selected threshold (min_progress_threshold = 0.2), and is set to 0 otherwise. This boolean metric contributes to the scenario score as a multiplier.
Time to Collision (TTC) within bound: TTC is defined as the time required for ego and another track to collide if they continue at their present speed and heading. We only compute time to collision for tracks in front of ego and cross traffic, and lateral tracks on the sides only when ego's footprint is not fully in a single lane or lane_connector (e.g. during a lane change) or when ego is in areas marked as intersection where lanes may merge or extend into multiple lanes. To compute TTC, tracks and ego bounding boxes are projected with a selected time step (time_step_size = 0.1s) up to a selected time horizon in the future (time_horizon = 3.0s) and TTC is defined as the first time the projected bounding boxes intersect. If there is no intersection between the projected bounding boxes up to the selected time horizon, TTC is set to infinity. Nuboard histogram tab visualizes the min of TTC at all frames capped at time_horizon = 3.0s, and a boolean metric (True if TTC is higher than a minimum lower bound, least_min_ttc = 0.95s, else False). The boolean metric contributes to the scenario score in the weighted average function.
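The projection loop can be illustrated with a deliberately simplified sketch. Instead of intersecting oriented bounding boxes, each footprint is approximated by a circle, which is an assumption made only to keep the example short. It also omits the filtering of which tracks are considered. The dictionary keys and function names are hypothetical.

```python
import math


def time_to_collision(ego, track, time_step_size=0.1, time_horizon=3.0):
    """First projected time at which two constant-velocity agents overlap, else infinity.

    ego / track: dicts with position (x, y), velocity (vx, vy) and a
    footprint radius (circle approximation: an assumption of this sketch).
    """
    steps = round(time_horizon / time_step_size)
    for k in range(steps + 1):
        t = k * time_step_size
        dx = (ego["x"] + ego["vx"] * t) - (track["x"] + track["vx"] * t)
        dy = (ego["y"] + ego["vy"] * t) - (track["y"] + track["vy"] * t)
        if math.hypot(dx, dy) <= ego["radius"] + track["radius"]:
            return t
    return math.inf


def ttc_within_bound(ttc_per_frame, least_min_ttc=0.95):
    """True if the minimum TTC over all frames is higher than the lower bound."""
    return min(ttc_per_frame) > least_min_ttc


ego = {"x": 0.0, "y": 0.0, "vx": 10.0, "vy": 0.0, "radius": 1.0}
parked = {"x": 11.5, "y": 0.0, "vx": 0.0, "vy": 0.0, "radius": 1.0}
ttc = time_to_collision(ego, parked)  # overlap first detected at the 1.0 s step
```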
Ego progress along the expert's route ratio: We evaluate progress of the driven ego trajectory in a scenario by comparing its progress along the route that expert takes in that scenario. Expert's route is extracted as a sequence of lanes and lane_connectors it moves along during the scenario. While ego is in the corresponding roadblocks of expert's route, progress per frame is computed based on progress along the baselines of expert's route. Sum/Integral of per frame ego's progress values (overall ego progress) and expert's progress values (overall expert progress) during the scenario are computed to define the ego to expert progress ratio. Ego is not allowed to completely drive backwards in our scenarios. However, due to noise in data, in some scenarios where ego is stopped in the entire scenario the overall progress may take a small negative value down to -score_progress_threshold (score_progress_threshold = 0.1m). We set the ego to expert progress ratio to 0 if the overall ego progress is less than this negative threshold. The ratio is set to 1 in scenarios where expert is not assigned a route (i.e. is in car_park area, etc.). In other cases, the ratio is defined as min(1, max(overall ego progress, score_progress_threshold) / max(overall expert progress, score_progress_threshold)). The ratio contributes to the scenario score in the weighted average function.
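The ratio rules above can be written out as a short function, together with the boolean "Making progress" check that is derived from it. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and checking the missing-route case before the negative-progress case is an ordering the text does not specify.

```python
def progress_ratio(ego_progress, expert_progress, expert_has_route=True,
                   score_progress_threshold=0.1):
    """Ego-to-expert progress ratio, saturated to [0, 1]. Progress values are in meters."""
    if not expert_has_route:
        return 1.0
    if ego_progress < -score_progress_threshold:
        return 0.0
    return min(
        1.0,
        max(ego_progress, score_progress_threshold)
        / max(expert_progress, score_progress_threshold),
    )


def making_progress(ratio, min_progress_threshold=0.2):
    """Boolean 'Making progress' metric derived from the ratio."""
    return ratio > min_progress_threshold
```

For example, an ego that covers 5 m while the expert covers 10 m gets a ratio of 0.5, and a stationary ego with a noisy overall progress of -0.05 m against a stationary expert gets 0.1 / 0.1 = 1.0.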
Speed limit compliance: This metric evaluates if ego's speed exceeds the associated speed limit in the map. The speed limit is queried from the lane ego is associated to. If ego is associated to a lane_connector, speed limit is set as the maximum of the speed limits of the incoming and outgoing lanes of that lane_connector. Speed limit violation at each frame is defined based on the difference between ego's speed and the speed limit, if ego's speed is higher than the speed limit (over-speeding). See nuboard histogram tab for statistics on the number of speed limit violations, a boolean metric (True if there is no speed limit violation else False), and min/max/mean of speed limit violations (m/s). Duration of the intervals when ego drives above the speed limit is considered in determining the mean value. This metric contributes to the scenario score in the weighted average function.
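A minimal sketch of the per-frame violation and the lane_connector speed limit rule, assuming speeds and limits are given in m/s and already associated frame by frame. The function names are hypothetical, and the sketch does not cover how the violations are turned into a score.

```python
def lane_connector_speed_limit(incoming_limits, outgoing_limits):
    """Speed limit of a lane_connector: maximum over its incoming and outgoing lanes."""
    return max(list(incoming_limits) + list(outgoing_limits))


def speed_limit_violations(ego_speeds, speed_limits):
    """Per-frame over-speeding amount (m/s); 0.0 when ego is at or below the limit."""
    return [max(0.0, speed - limit) for speed, limit in zip(ego_speeds, speed_limits)]


def no_speed_limit_violation(violations):
    """Boolean metric: True if ego never exceeds the speed limit."""
    return all(v == 0.0 for v in violations)
```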
Comfort: We measure the comfort of ego's driven trajectory by evaluating minimum and maximum longitudinal accelerations, maximum absolute value of lateral acceleration, maximum absolute value of yaw rate, maximum absolute value of yaw acceleration, maximum absolute value of longitudinal component of jerk, and maximum magnitude of jerk vector. These variables are compared to thresholds with default values determined empirically from examination of a dataset of expert trajectories ( min_lon_accel = -4.05 m/s^2, max_lon_accel = 2.40 m/s^2, max_abs_lat_accel = 4.89 m/s^2, max_abs_yaw_accel = 1.93 rad/s^2, max_abs_yaw_rate = 0.95 rad/s, max_abs_lon_jerk = 4.13 m/s^3, max_abs_mag_jerk = 8.37 m/s^3). Nuboard histogram tab visualizes a boolean metric ego_is_comfortable (True if all the mentioned variables are within the selected thresholds, else False). The boolean metric contributes to the scenario score in the weighted average function.
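The comfort check amounts to comparing extreme values of several time series against the listed thresholds. The sketch below assumes the signals have already been derived from ego's trajectory and treats the bounds as inclusive, which the text does not specify. The function name and argument layout are hypothetical.

```python
COMFORT_THRESHOLDS = {
    "min_lon_accel": -4.05,     # m/s^2
    "max_lon_accel": 2.40,      # m/s^2
    "max_abs_lat_accel": 4.89,  # m/s^2
    "max_abs_yaw_accel": 1.93,  # rad/s^2
    "max_abs_yaw_rate": 0.95,   # rad/s
    "max_abs_lon_jerk": 4.13,   # m/s^3
    "max_abs_mag_jerk": 8.37,   # m/s^3
}


def max_abs(values):
    return max(abs(v) for v in values)


def ego_is_comfortable(lon_accel, lat_accel, yaw_accel, yaw_rate, lon_jerk, jerk_magnitude,
                       thresholds=COMFORT_THRESHOLDS):
    """True if every signal stays within its threshold over the whole scenario."""
    return (
        min(lon_accel) >= thresholds["min_lon_accel"]
        and max(lon_accel) <= thresholds["max_lon_accel"]
        and max_abs(lat_accel) <= thresholds["max_abs_lat_accel"]
        and max_abs(yaw_accel) <= thresholds["max_abs_yaw_accel"]
        and max_abs(yaw_rate) <= thresholds["max_abs_yaw_rate"]
        and max_abs(lon_jerk) <= thresholds["max_abs_lon_jerk"]
        and max_abs(jerk_magnitude) <= thresholds["max_abs_mag_jerk"]
    )
```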
The following metrics are available in nuboard, but are not used in the scenario score.
Mean speed: This metric reports the average speed of trajectory during the scenario.
Displacement error: This metric computes the pointwise L2 distances between the ego driven trajectory and expert at all timepoints in the scenario. A discount factor (discount_factor = 1) is added to reduce the effect of later timepoints in the scenario if needed. Nuboard visualizes timeseries of the error and histogram of max/min/mean values.
Displacement error with yaw: This metric computes the pointwise L2 distances between the ego driven trajectory and expert with a weighted sum of the absolute difference in their headings (heading_diff_weight = 2.5) at all timepoints in the scenario. A discount factor (discount_factor = 1) is added to reduce the effect of later timepoints in the scenario if desired. Nuboard visualizes the error and histogram of max/min/mean values.
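Both displacement error metrics can be sketched with one function. Two assumptions are made here that the text does not confirm: the discount is applied as discount_factor raised to the timepoint index, and the heading difference is wrapped into [-pi, pi]. States are hypothetical (x, y, heading) tuples, and setting heading_diff_weight to 0 gives the plain displacement error.

```python
import math


def displacement_errors_with_yaw(ego_states, expert_states,
                                 heading_diff_weight=2.5, discount_factor=1.0):
    """Per-timepoint L2 error plus weighted absolute heading difference."""
    errors = []
    for t, ((ex, ey, eh), (rx, ry, rh)) in enumerate(zip(ego_states, expert_states)):
        position_error = math.hypot(ex - rx, ey - ry)
        heading_error = abs((eh - rh + math.pi) % (2.0 * math.pi) - math.pi)
        # Assumed discounting scheme: later timepoints are scaled by discount_factor ** t.
        errors.append(discount_factor ** t * (position_error + heading_diff_weight * heading_error))
    return errors
```

With the default discount_factor = 1 every timepoint is weighted equally.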
| Metric | Thresholds | Visualization |
|---|---|---|
| No at-fault collision | max_violation_threshold_vru = 0; max_violation_threshold_vehicle = 0; max_violation_threshold_object = 1 | Histograms of: • number of at-fault collisions • min/max/mean of at-fault collisions energy, if exists (m/s) |
| Drivable Area Compliance | max_violation_threshold = 0.3 m | Histogram of: • boolean (True if there is no drivable area violation more than max_violation_threshold else False) |
| Driving Direction Compliance | driving_direction_compliance_threshold = 2 m; driving_direction_violation_threshold = 6 m; time_horizon = 1 s | Histogram of: • score value |
| Making progress | min_progress_threshold = 0.2 | Histogram of: • boolean (True if progress ratio is more than min_progress_threshold else False) |
| Time to Collision (TTC) within bound | time_step_size = 0.1s; time_horizon = 3.0s; least_min_ttc = 0.95s | Histogram of: • min of TTC (s) • boolean (True if TTC is higher than least_min_ttc, else False) |
| Speed limit compliance | max_compliance_threshold = 1.0; max_overspeed_value_threshold = 2.23 m/s (5mph) | Histogram of: • number of speed limit violations • boolean (True if there is no speed limit violation else False) |
| Ego progress along the expert's route ratio | score_progress_threshold = 0.1 m | Histogram of: • mean of per frame ego's progress values over the scenario (m) • Sum/Integral of per frame ego's progress values (Overall progress) over the scenario (m) • Ratio of ego's progress along the expert route to a desired progress threshold, default set at expert's overall progress. The ratio is saturated at 0 and 1. |
| Comfort | min_lon_accel = -4.05 m/s^2; max_lon_accel = 2.40 m/s^2; max_abs_lat_accel = 4.89 m/s^2; max_abs_yaw_accel = 1.93 rad/s^2; max_abs_yaw_rate = 0.95 rad/s; max_abs_lon_jerk = 4.13 m/s^3; max_abs_mag_jerk = 8.37 m/s^3 | Histogram of: • boolean (True if all metrics are within the acceptable thresholds else False) |
| Displacement error | discount_factor = 1 | Histogram of: • min/max/mean of errors |
| Displacement error with yaw | discount_factor = 1; heading_diff_weight = 2.5 | Histogram of: • min/max/mean of errors |