This directory contains scripts and tools for running PTO Runtime tests.
The PTO Runtime test framework provides a simplified interface for testing runtime implementations. Users only need to provide:
- Kernel configuration (`kernel_config.py`) - defines kernels and the orchestration function
- Golden script (`golden.py`) - defines input generation and expected output computation
The test framework automatically handles compilation, execution, and result validation.
Basic usage:

```bash
python examples/scripts/run_example.py \
    --kernels <kernels_directory> \
    --golden <golden_script_path> \
    --platform <platform_name>
```

Hardware platform example:

```bash
python examples/scripts/run_example.py \
    -k examples/a2a3/host_build_graph/vector_example/kernels \
    -g examples/a2a3/host_build_graph/vector_example/golden.py \
    -p a2a3
```

Simulation platform example:

```bash
python examples/scripts/run_example.py \
    -k examples/a2a3/host_build_graph/vector_example/kernels \
    -g examples/a2a3/host_build_graph/vector_example/golden.py \
    -p a2a3sim
```

| Argument | Short | Description | Default |
|---|---|---|---|
| `--kernels` | `-k` | Kernels directory path (contains `kernel_config.py`) | Required |
| `--golden` | `-g` | `golden.py` script path | Required |
| `--platform` | `-p` | Platform name: `a2a3` or `a2a3sim` | `a2a3` |
| `--device` | `-d` | Device ID | From env var or 0 |
| `--runtime` | `-r` | Runtime implementation name | `host_build_graph` |
| `--verbose` | `-v` | Enable verbose output (equivalent to `--log-level debug`) | False |
| `--silent` | | Enable silent mode (equivalent to `--log-level error`) | False |
| `--log-level` | | Set log level: error, warn, info, debug | info |
| `--clone-protocol` | | Git protocol for cloning pto-isa: ssh or https | ssh |
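For example, flags can be combined in a single invocation (the paths and device ID below are placeholders):

```bash
python examples/scripts/run_example.py \
    -k my_test/kernels \
    -g my_test/golden.py \
    -p a2a3 \
    -d 1 \
    --log-level warn
```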
- `a2a3`: Hardware platform; requires an Ascend device and the CANN toolkit
- `a2a3sim`: Simulation platform; uses thread simulation and only requires gcc/g++
The test framework supports unified logging across Python and C++ Host code with four levels:
- `error` (ERROR): Only show errors; suitable for CI/CD or production
- `warn` (WARNING): Show warnings and errors
- `info` (INFO, default): Show progress and status information
- `debug` (DEBUG): Show detailed debug information, including:
  - Compiler commands
  - Compiler stdout/stderr output
  - Detailed tensor data
  - Intermediate step information
```bash
# Error level - only show errors
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --silent
# or explicitly:
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level error

# Info level (default) - show progress
python examples/scripts/run_example.py -k ./kernels -g ./golden.py

# Debug level - show all debug info
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --verbose
# or explicitly:
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level debug

# Warning level - show warnings and errors
python examples/scripts/run_example.py -k ./kernels -g ./golden.py --log-level warn
```

You can also control logging via an environment variable (lower priority than CLI arguments):

```bash
export PTO_LOG_LEVEL=debug  # Options: error, warn, info, debug
python examples/scripts/run_example.py -k ./kernels -g ./golden.py
```

The log level is determined by (highest to lowest priority):
1. CLI arguments (`--log-level`, `--verbose`, `--silent`)
2. Environment variable (`PTO_LOG_LEVEL`)
3. Default value (`info` / INFO level)
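A minimal sketch of this precedence, assuming a hypothetical `resolve_log_level` helper (not the framework's actual internals):

```python
import os

def resolve_log_level(cli_level=None, verbose=False, silent=False):
    # CLI arguments win outright
    if cli_level:
        return cli_level    # --log-level
    if verbose:
        return "debug"      # --verbose is shorthand for debug
    if silent:
        return "error"      # --silent is shorthand for error
    # Fall back to the environment variable, then the built-in default
    return os.environ.get("PTO_LOG_LEVEL", "info")
```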
- error/warn/info levels: Compiler output is hidden unless compilation fails
- debug level: All compiler output is displayed at DEBUG level
- Compilation failures: Error messages are always shown regardless of log level (see the sketch below)
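A minimal sketch of this gating logic; `run_compiler` is a hypothetical helper, not the framework's actual code:

```python
import subprocess

def run_compiler(cmd: list, log_level: str) -> int:
    result = subprocess.run(cmd, capture_output=True, text=True)
    if log_level == "debug":
        # debug: always surface the full compiler stdout/stderr
        print(result.stdout)
        print(result.stderr)
    elif result.returncode != 0:
        # compilation failures are reported at every log level
        print(result.stderr)
    return result.returncode
```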
The kernels directory must contain a `kernel_config.py` file:

```
kernels/
├── kernel_config.py        # Required: kernel configuration
├── orchestration/
│   └── example_orch.cpp    # Orchestration function implementation
└── aiv/                    # or aic/, depending on core type
    ├── kernel_add.cpp
    ├── kernel_mul.cpp
    └── ...
```
Example `kernel_config.py`:

```python
from pathlib import Path

KERNELS_DIR = Path(__file__).parent

# Kernel list
KERNELS = [
    {
        "func_id": 0,        # Kernel ID
        "core_type": "aiv",  # Core type: aiv or aic
        "source": str(KERNELS_DIR / "aiv/kernel_add.cpp"),  # Kernel source file path
    },
    # More kernels...
]

# Orchestration function configuration
ORCHESTRATION = {
    "source": str(KERNELS_DIR / "orchestration/example_orch.cpp"),
    "function_name": "BuildExampleGraph",  # Orchestration function name
}
```

Example `golden.py`:

```python
import torch

# Output tensor names list (optional, or use the 'out_' prefix convention)
__outputs__ = ["f"]

# Tensor order (required, must match orchestration function parameter order)
TENSOR_ORDER = ["a", "b", "f"]

# Comparison tolerances
RTOL = 1e-5
ATOL = 1e-5

def generate_inputs(params: dict) -> dict:
    """
    Generate input and output tensors.

    Args:
        params: Parameter dictionary (from ALL_CASES)

    Returns:
        Dictionary containing all tensors (inputs + outputs)
    """
    SIZE = 16384
    return {
        "a": torch.full((SIZE,), 2.0, dtype=torch.float32),
        "b": torch.full((SIZE,), 3.0, dtype=torch.float32),
        "f": torch.zeros(SIZE, dtype=torch.float32),  # Output tensor
    }

def compute_golden(tensors: dict, params: dict) -> None:
    """
    Compute expected output (in-place modification).

    Args:
        tensors: Dictionary containing all tensors
        params: Parameter dictionary
    """
    a = tensors["a"]
    b = tensors["b"]
    tensors["f"][:] = (a + b + 1) * (a + b + 2)

# Optional: Multiple test cases
ALL_CASES = {
    "Default": {},
    # "Large": {"size": 1024},  # Other test cases
}
DEFAULT_CASE = "Default"
```
Required functions:

- `generate_inputs(params: dict) -> dict` - Generate input and output tensors
  - Returns: Dictionary with tensor names as keys and torch tensors as values
- `compute_golden(tensors: dict, params: dict) -> None` - Compute expected output values
  - Modifies output tensors in the `tensors` dictionary in-place

Configuration variables:

- `TENSOR_ORDER`: List specifying the order of tensors passed to the orchestration function (required)
- `__outputs__`: Output tensor names list (or use the `out_` prefix convention)
- `ALL_CASES`: Dict of named parameter sets for parameterized tests
- `DEFAULT_CASE`: Name of the default case to run
- `RTOL`: Relative tolerance (default `1e-5`)
- `ATOL`: Absolute tolerance (default `1e-5`)
The test framework supports two methods for identifying output tensors:
Method 1 - declare an explicit `__outputs__` list:

```python
__outputs__ = ["f", "result", ...]
```

Method 2 - use the `out_` prefix to name output tensors:
```python
def generate_inputs(params: dict) -> dict:
    return {
        "a": torch.randn(1024),      # Input
        "b": torch.randn(1024),      # Input
        "out_f": torch.zeros(1024),  # Output (auto-detected)
    }
```

For `host_build_graph`, orchestration sources should include `orchestration_api.h` and use `ChipStorageTaskArgs`:
```cpp
// Assume TENSOR_ORDER = ["a", "b", "f"]
#include "orchestration_api.h"

int BuildExampleGraph(OrchestrationRuntime* runtime, const ChipStorageTaskArgs &orch_args) {
    void* ptr_a = orch_args.tensor(0).data_as<void>();
    void* ptr_b = orch_args.tensor(1).data_as<void>();
    void* ptr_f = orch_args.tensor(2).data_as<void>();

    size_t size_a = orch_args.tensor(0).nbytes();
    size_t size_b = orch_args.tensor(1).nbytes();
    size_t size_f = orch_args.tensor(2).nbytes();

    void* dev_a = device_malloc(runtime, size_a);
    void* dev_b = device_malloc(runtime, size_b);
    void* dev_f = device_malloc(runtime, size_f);

    copy_to_device(runtime, dev_a, ptr_a, size_a);
    copy_to_device(runtime, dev_b, ptr_b, size_b);
    record_tensor_pair(runtime, ptr_f, dev_f, size_f);

    // Build task graph...
    return 0;
}
```

Common environment variables:

```bash
# Set log level (optional, CLI arguments take priority)
export PTO_LOG_LEVEL=debug # Options: error, warn, info, debug
# Optional: Output C++ Host logs to file
export PTO_LOG_FILE=/tmp/pto_runtime.log
```

For the `a2a3` platform:

```bash
# Required
export ASCEND_HOME_PATH=/usr/local/Ascend/cann-8.5.0
# PTO_ISA_ROOT is auto-detected (auto-cloned to examples/scripts/_deps/pto-isa on first run)
# Override if needed:
# export PTO_ISA_ROOT=/path/to/pto-isa
# Optional: choose device via CLI, e.g. `-d 0`
```

For the `a2a3sim` platform, no special platform-specific environment variables are required.
Example test directory layout:

```
my_test/
├── kernels/
│   ├── kernel_config.py
│   ├── orchestration/
│   │   └── my_orch.cpp
│   └── aiv/
│       ├── kernel_add.cpp
│       └── kernel_mul.cpp
└── golden.py
```
Run the test:

```bash
# Hardware platform
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3

# Simulation platform
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3sim

# Verbose output
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py -p a2a3sim -v
```

Expected output on success:

```
=== Building Runtime: host_build_graph (platform: a2a3sim) ===
...
=== Compiling and Registering Kernels ===
Compiling kernel: kernels/aiv/kernel_add.cpp (func_id=0)
...
=== Generating Input Tensors ===
Inputs: ['a', 'b']
Outputs: ['f']
...
=== Launching Runtime ===
...
=== Comparing Results ===
Comparing f: shape=(16384,), dtype=float32
First 10 actual: [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]
First 10 expected: [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]
f: PASS (16384/16384 elements matched)
============================================================
TEST PASSED
============================================================
```

On failure, the mismatch is reported:

```
=== Comparing Results ===
Comparing f: shape=(16384,), dtype=float32
First 10 actual: [40. 40. 40. 40. 40. 40. 40. 40. 40. 40.]
First 10 expected: [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]
TEST FAILED: Output 'f' does not match golden
```

- Hardware Example: examples/a2a3/host_build_graph/vector_example/
- Simulation Example: examples/a2a3/host_build_graph/vector_example/
Use the `-v` flag to enable verbose output:

```bash
python examples/scripts/run_example.py -k ... -g ... -p ... -v
```

This usually happens when:

- Using the wrong platform (`a2a3` vs `a2a3sim`)
- Kernel compilation failed silently

Solutions:

- Verify the correct `-p` parameter is used
- Check that the kernel source files exist
- Use `-v` to view detailed compilation logs
Define `ALL_CASES` and `DEFAULT_CASE` in `golden.py`:

```python
import torch

ALL_CASES = {
    "Small": {"size": 1024},
    "Medium": {"size": 2048},
    "Large": {"size": 4096},
}
DEFAULT_CASE = "Small"

def generate_inputs(params: dict) -> dict:
    size = params["size"]
    return {
        "a": torch.randn(size, dtype=torch.float32),
        "b": torch.randn(size, dtype=torch.float32),
        "out_f": torch.zeros(size, dtype=torch.float32),
    }
```

Then use `--all` to run all cases or `--case Medium` to run a specific one.
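For example (paths are placeholders):

```bash
# Run every case defined in ALL_CASES
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py --all

# Run one named case
python examples/scripts/run_example.py -k my_test/kernels -g my_test/golden.py --case Medium
```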
Yes. The test framework uses PyTorch tensors by default:
```python
import torch

def generate_inputs(params: dict) -> dict:
    return {
        "a": torch.randn(1024),
        "b": torch.randn(1024),
        "out_f": torch.zeros(1024),
    }
```

Use the `--log-level` argument or the `--verbose`/`--silent` flags:

```bash
# Show detailed debug information
python examples/scripts/run_example.py -k ... -g ... --verbose
# Only show errors
python examples/scripts/run_example.py -k ... -g ... --silent
# Or use explicit log level
python examples/scripts/run_example.py -k ... -g ... --log-level debug
# Show warnings and errors
python examples/scripts/run_example.py -k ... -g ... --log-level warn
```

Use the info level (the default). Compiler output is automatically hidden at the error/warn/info levels unless compilation fails:

```bash
# Default behavior - hides compiler output
python examples/scripts/run_example.py -k ... -g ...
```

To see compiler output for debugging, use the debug level:

```bash
python examples/scripts/run_example.py -k ... -g ... --verbose
# or
python examples/scripts/run_example.py -k ... -g ... --log-level debug
```

Set the `PTO_LOG_FILE` environment variable:

```bash
export PTO_LOG_FILE=/tmp/pto_runtime.log
python examples/scripts/run_example.py -k ... -g ...
# View the logs
cat /tmp/pto_runtime.log
```

If you have a custom runtime implementation:

```bash
python examples/scripts/run_example.py \
    -k my_test/kernels \
    -g my_test/golden.py \
    -r my_custom_runtime \
    -p a2a3sim
```

The runtime implementation should be located at: `src/{arch}/runtime/<runtime_name>/`
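For example, assuming `{arch}` is `a2a3`, the hypothetical `my_custom_runtime` above would be expected at:

```
src/a2a3/runtime/my_custom_runtime/
```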
You can use `create_code_runner` directly in Python scripts. It creates a `CodeRunner` configured from the `RUNTIME_CONFIG` in your `kernel_config.py`:

```python
from code_runner import create_code_runner

runner = create_code_runner(
    kernels_dir="my_test/kernels",
    golden_path="my_test/golden.py",
    platform="a2a3sim",
    device_id=0,
)
runner.run()  # Execute the test
```

For issues or suggestions, please submit an Issue or Pull Request.