mlcommons · obaileyw-uncc · Sep 3, 2025 · Sep 9, 2025 · Sep 12, 2025 · Oct 2, 2025
@@ -0,0 +1,89 @@
+# MLPerf™ Tiny Deep Learning Benchmarks for Embedded Devices Benchmarking Tutorial
+This file is a guide to running each of the benchmarks on the STM32 Nucleo reference board and to migrating these benchmarks to new hardware.
+
+## **Part 1** Running the STM32 Nucleo reference implementations
+### System requirements
+A computer with **Python 3.9**, the STM32 Cube IDE, and two USB ports is required to run the benchmarks. The streaming wakeword benchmark also requires the MB1677C interface board, and all energy benchmarks also require the STMicroelectronics energy board.
+
+### Steps for all benchmarks
+Setting up an Anaconda environment for the Python portion of the benchmarks is recommended. To create and activate this environment with `conda`, run:
+
+```sh
+$ conda create -n tinymlperf python=3.9 && conda activate tinymlperf
+```
+
+The same environment can be created with `venv` instead:
+
+```sh
+$ python3.9 -m venv ./tinymlperf && source ./tinymlperf/bin/activate
+```
+
+Once the Python 3.9 environment is activated, use `pip` to install the requirements for both the runner application (which drives the benchmarks from the computer) and ARM Mbed.
+
+```sh
+$ pip install -r "requirements.txt"
+```
+
+`libusb` and `pyusb` are used by the runner to interface with the boards and must be installed to run the benchmarks. On Linux, `libusb` is usually installed by default, but on macOS and Windows it must be manually installed. `libusb` can be installed on macOS using Homebrew.
+
+```sh
+$ brew install libusb
+```
+
+See [this page](https://github.com/pyusb/pyusb/blob/master/docs/faq.rst#how-do-i-install-libusb-on-windows) for information on how to install `libusb` on Windows.
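+
+A quick way to confirm the installation is to ask `pyusb` whether it can load `libusb`. The sketch below is not part of the runner; the function name `libusb_status` is hypothetical, and it assumes the libusb 1.0 backend is the one in use.
+
+```python
+def libusb_status() -> str:
+    """Report whether pyusb is importable and can locate a libusb 1.0 library."""
+    try:
+        import usb.backend.libusb1 as libusb1
+    except ImportError:
+        return "pyusb is not installed"
+    # get_backend() returns None when no libusb 1.0 shared library can be loaded.
+    if libusb1.get_backend() is None:
+        return "pyusb is installed, but libusb was not found"
+    return "pyusb and libusb are available"
+
+
+if __name__ == "__main__":
+    print(libusb_status())
+```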
+
+### Image classification, anomaly detection, keyword spotting and person detection benchmarks
+The image classification, anomaly detection, keyword spotting and person detection benchmarks use the ARM Mbed toolchain to build the binaries for the benchmark device firmware. The Mbed projects must be set up before they can be used; this is automated by the *setup_example.sh* script located in each of the benchmark folders. Run this script from the command line after installing the required Python packages.
+
+Next, build the firmware for the desired benchmark by executing `mbed compile -m NUCLEO_L4R5ZI -t GCC_ARM` in the benchmark directory, then flash the firmware to the board by copying the compiled .bin file to the STM32 Nucleo board. This can be achieved through the command line, through the file explorer, or through the use of the STM32 Cube Programmer application.
+
+To run any of the tests in energy mode, connect the power board and the interface board to the reference board as shown below, and connect the interface board and the power board to the computer via USB. Load the firmware from *[interface/benchmark_interface.elf](interface/benchmark-interface.elf)* to the interface board using the STM32 Cube Programmer. Then, navigate to the `submitter_implemented.h` file in the desired benchmark, modify the line `#define EE_CFG_ENERGY_MODE 0` to `#define EE_CFG_ENERGY_MODE 1`, then recompile the firmware and load the compiled .bin file to the reference board.
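+
+Switching between energy mode and the other modes means editing the same line each time, so the change can be scripted. The following is a minimal sketch, assuming the header contains exactly one `EE_CFG_ENERGY_MODE` define as quoted above; the helper names are hypothetical and not part of the repository.
+
+```python
+import re
+from pathlib import Path
+
+# Matches "#define EE_CFG_ENERGY_MODE 0" or "... 1", tolerating extra spaces or tabs.
+_ENERGY_MODE = re.compile(
+    r"^([ \t]*#[ \t]*define[ \t]+EE_CFG_ENERGY_MODE[ \t]+)[01]\b", re.MULTILINE
+)
+
+
+def set_energy_mode(header_text: str, enabled: bool) -> str:
+    """Return header_text with EE_CFG_ENERGY_MODE set to 1 (enabled) or 0."""
+    new_text, count = _ENERGY_MODE.subn(rf"\g<1>{int(enabled)}", header_text)
+    if count != 1:
+        raise ValueError(f"expected one EE_CFG_ENERGY_MODE define, found {count}")
+    return new_text
+
+
+def set_energy_mode_in_file(header_path: str, enabled: bool) -> None:
+    """Rewrite submitter_implemented.h in place (hypothetical helper)."""
+    path = Path(header_path)
+    path.write_text(set_energy_mode(path.read_text(), enabled))
+```
+
+The firmware still has to be recompiled and flashed after the header is changed.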
+
+The interface board runs at 3.3V, so if the device under test (DUT) runs at any other supply voltage, the logic levels must be shifted. The TXB0108, available on a [breakout board](https://www.adafruit.com/product/395) from Adafruit, supports low-side voltages from 1.2V to 3.6V.
+
+#### Power board (LPM01A)
+![LPM01A Wiring](runner/img/LPM01A.png)
+
+#### Interface board (STM32H573I-DK)
+![STM32H573I-DK Top Wiring](runner/img/STM32H573I-DK-Top.png)
+![STM32H573I-DK Bottom Wiring](runner/img/STM32H573I-DK-Bottom.png)
+
+#### Device under test (L4R5ZI) with level shifter
+![DUT Wiring](runner/img/L4R5Zi.png)
+
+Once the hardware is configured, navigate to the [runner](./runner/) directory and execute the runner. Commands to run specific benchmarks are below.
+* Image Classification
+    * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e`
+    * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p`
+    * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a`
+* Keyword Spotting
+    * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e`
+    * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p`
+    * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a`
+* Visual Wakewords
+    * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e`
+    * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p`
+    * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a`
+* Anomaly Detection
+    * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_ad.yaml --mode=e`
+    * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_ad.yaml --mode=p`
+    * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_ad.yaml --mode=a`
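+
+The commands above differ only in the test script, the device list, and the mode letter. As a sketch of that pattern, the helper below assembles the argument list for one run. The file names and mode letters come from the list above; the short benchmark keys (`ic`, `kws`, `vww`, `ad`) and the function name are hypothetical conveniences, not runner options.
+
+```python
+import shlex
+
+# Mode letter -> test script, as used in the commands listed above.
+TEST_SCRIPTS = {
+    "e": "tests_energy.yaml",
+    "p": "tests_performance.yaml",
+    "a": "tests_accuracy.yaml",
+}
+
+# Hypothetical short benchmark keys -> device list file.
+DEVICE_LISTS = {
+    "ic": "devices_kws_ic_vww.yaml",
+    "kws": "devices_kws_ic_vww.yaml",
+    "vww": "devices_kws_ic_vww.yaml",
+    "ad": "devices_ad.yaml",
+}
+
+
+def runner_command(benchmark: str, mode: str, dataset_path: str) -> list:
+    """Build the argument list for one invocation of the runner's main.py."""
+    return [
+        "python",
+        "main.py",
+        f"--dataset_path={dataset_path}",
+        f"--test_script={TEST_SCRIPTS[mode]}",
+        f"--device_list={DEVICE_LISTS[benchmark]}",
+        f"--mode={mode}",
+    ]
+
+
+if __name__ == "__main__":
+    # Print the command as a shell-quoted string for copy and paste.
+    print(shlex.join(runner_command("ad", "p", "/path/to/datasets/")))
+```
+
+The returned list can be passed to `subprocess.run` from inside the *runner* directory, since `main.py` is addressed by a relative path.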
+
+### Streaming wakeword benchmark
+The streaming wakeword benchmark uses the STM32 Cube SDK and consists of two parts that must be installed independently. First, open the STM32 Cube IDE and import the *[sww_ref_l4r5zi](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/)* project. Open the debug configurations dialog by right-clicking the project name and selecting *Debug As -> Debug Configurations*. On the *Debugger* tab, press *Scan* and select the ST-Link device that is plugged into the computer from the drop-down menu. Then, compile and run the program by pressing the *Debug* button, and press *Resume* when the debugger connects. Disconnect the debugger and disconnect the Nucleo reference board. Finally, connect the reference board, power board and interface board in the energy measurement configuration as shown above.
+
+The interface board has a slot for a micro-SD card. Load the SD card with the WAV files from the *[runner/sww_data_dir](runner/sww_data_dir/)* folder, which contain the wakewords to be streamed. Ensure that the card is formatted as an MS-DOS (FAT32) disk. A 1GB card is plenty for the current benchmarks. **This data must also be stored in a folder named sww01 under the dataset path specified in the next step; otherwise the runner will fail to find it (e.g., in the folder [evaluation/datasets/sww01](evaluation/datasets/sww01/) if the specified dataset path is [evaluation/datasets](evaluation/datasets)).**
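+
+Because a missing or misnamed folder only shows up once the runner starts, it can help to check the layout first. The sketch below is a hypothetical pre-flight check, not part of the runner; it assumes the WAV files sit directly inside the `sww01` folder.
+
+```python
+from pathlib import Path
+
+
+def find_sww_wavs(dataset_path: str) -> list:
+    """Return the WAV files in <dataset_path>/sww01, sorted by path.
+
+    Raises FileNotFoundError if the folder is missing or holds no WAV files.
+    """
+    sww_dir = Path(dataset_path) / "sww01"
+    if not sww_dir.is_dir():
+        raise FileNotFoundError(f"missing folder: {sww_dir}")
+    wavs = sorted(p for p in sww_dir.iterdir() if p.suffix.lower() == ".wav")
+    if not wavs:
+        raise FileNotFoundError(f"no WAV files in {sww_dir}")
+    return wavs
+```
+
+Calling `find_sww_wavs` with the same value that will be passed as `--dataset_path` confirms that the folder name and location match what the text above requires.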
+
+This benchmark runs performance, accuracy and energy tests in a single run, and should be run in energy mode.
+
+```sh
+$ python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_sww.yaml --mode=e
+```
+
+## **Part 2** Modifying reference implementations for new platforms
+Each of the reference implementations contains a submitter-implemented module with API functions that must be modified when migrating the implementation to a new platform. This module is named *submitter_implemented.cpp* in every implementation except the streaming wakeword reference implementation, where it is named *[sww_ref_util_submitter.c](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.c)*. The expected behavior of these functions and their required input and output values are documented in *[sww_ref_util_submitter.h](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.h)* for the streaming wakeword reference implementation and in *submitter_implemented.h* for the other benchmarks.
+
+Once the submitter functions are implemented, verify the baud rate in the file *[device_under_test.py](./runner/device_under_test.py)* is correct and modify the *devices* YAML files in the *runner* folder for the new platform.
+
+## Further information
+More information on using the provided reference implementations, or on porting them to new platforms not covered here, can be found in the README.md files in the *[reference_submissions](reference_submissions/)* folder and its subfolders.
@@ -1,5 +1,5 @@
 # Copy API files since all source files must be within an mbed project.
-cp ../../api . -r
+cp -r ../../api .
 cp ../../main.cpp .
 
 # Create mbed project and checkout to restore the overwritten main.cpp.

@@ -0,0 +1,81 @@
+appdirs==1.4.4
+asn1ate==0.6.0
+beautifulsoup4==4.6.3
+cbor==1.0.0
+certifi==2025.8.3
+cffi==1.17.1
+chardet==3.0.4
+charset-normalizer==3.4.3
+click==8.1.8
+cmake==4.1.0
+cmsis-pack-manager==0.2.10
+colorama==0.3.9
+contourpy==1.3.0
+cryptography==45.0.6
+cycler==0.12.1
+Cython==3.1.3
+ecdsa==0.19.1
+fasteners==0.20
+fonttools==4.57.0
+future==0.16.0
+gitdb==4.0.12
+GitPython==3.1.45
+hidapi==0.14.0.post4
+icetea==1.2.4
+idna==2.7
+importlib_resources==6.4.5
+intelhex==2.3.0
+Jinja2==2.10.3
+joblib==1.4.2
+jsonmerge==1.9.2
+jsonschema==2.6.0
+junit-xml==1.8
+kiwisolver==1.4.7
+lockfile==0.12.2
+lxml==6.0.0
+manifest-tool==1.5.2
+MarkupSafe==2.0.1
+matplotlib==3.9.4
+mbed-cli==1.10.5
+mbed-cloud-sdk==2.0.8
+mbed-flasher==0.10.1
+mbed-greentea==1.7.4
+mbed-host-tests==1.5.10
+mbed-ls==1.7.12
+mbed-os-tools==0.0.15
+milksnake==0.1.6
+ninja==1.13.0
+numpy==1.26.4
+packaging==25.0
+pandas==2.3.2
+pillow==10.4.0
+prettytable==0.7.2
+protobuf==3.5.2.post1
+psutil==5.6.6
+pyasn1==0.2.3
+pycparser==2.22
+pycryptodome==3.23.0
+pyelftools==0.29
+pyparsing==3.1.4
+pyserial==3.4
+python-dateutil==2.9.0.post0
+python-dotenv==1.0.1
+pytz==2025.2
+pyusb==1.2.1
+PyYAML==6.0.2
+requests==2.20.1
+scikit-learn==1.6.1
+scipy==1.13.1
+semver==3.0.4
+six==1.12.0
+smmap==5.0.2
+soupsieve==2.7
+tabulate==0.9.0
+threadpoolctl==3.5.0
+tqdm==4.67.1
+typing_extensions==4.13.2
+tzdata==2025.2
+urllib3==1.24.2
+wcwidth==0.2.13
+yattag==1.16.1
+zipp==3.20.2
@@ -27,11 +27,11 @@ extern "C" {
 #endif
 
 /* Includes ------------------------------------------------------------------*/
-#include "stm32l4xx_hal.h"
 
 /* Private includes ----------------------------------------------------------*/
 /* USER CODE BEGIN Includes */
-
+#include "sww_ref_util.h"
+#include "sww_ref_util_submitter.h"
 /* USER CODE END Includes */
 
 /* Exported types ------------------------------------------------------------*/
@@ -50,53 +50,12 @@ extern "C" {
 /* USER CODE END EM */
 
 /* Exported functions prototypes ---------------------------------------------*/
-void Error_Handler(void);
 
 /* USER CODE BEGIN EFP */
 
 /* USER CODE END EFP */
 
 /* Private defines -----------------------------------------------------------*/
-#define B1_Pin GPIO_PIN_13
-#define B1_GPIO_Port GPIOC
-#define timestamp_Pin GPIO_PIN_13
-#define timestamp_GPIO_Port GPIOF
-#define Processing_Pin GPIO_PIN_9
-#define Processing_GPIO_Port GPIOE
-#define LD3_Pin GPIO_PIN_14
-#define LD3_GPIO_Port GPIOB
-#define STLK_RX_Pin GPIO_PIN_8
-#define STLK_RX_GPIO_Port GPIOD
-#define STLK_TX_Pin GPIO_PIN_9
-#define STLK_TX_GPIO_Port GPIOD
-#define USB_OverCurrent_Pin GPIO_PIN_5
-#define USB_OverCurrent_GPIO_Port GPIOG
-#define USB_PowerSwitchOn_Pin GPIO_PIN_6
-#define USB_PowerSwitchOn_GPIO_Port GPIOG
-#define STLINK_TX_Pin GPIO_PIN_7
-#define STLINK_TX_GPIO_Port GPIOG
-#define STLINK_RX_Pin GPIO_PIN_8
-#define STLINK_RX_GPIO_Port GPIOG
-#define USB_SOF_Pin GPIO_PIN_8
-#define USB_SOF_GPIO_Port GPIOA
-#define USB_VBUS_Pin GPIO_PIN_9
-#define USB_VBUS_GPIO_Port GPIOA
-#define USB_ID_Pin GPIO_PIN_10
-#define USB_ID_GPIO_Port GPIOA
-#define USB_DM_Pin GPIO_PIN_11
-#define USB_DM_GPIO_Port GPIOA
-#define USB_DP_Pin GPIO_PIN_12
-#define USB_DP_GPIO_Port GPIOA
-#define TMS_Pin GPIO_PIN_13
-#define TMS_GPIO_Port GPIOA
-#define TCK_Pin GPIO_PIN_14
-#define TCK_GPIO_Port GPIOA
-#define SWO_Pin GPIO_PIN_3
-#define SWO_GPIO_Port GPIOB
-#define LD2_Pin GPIO_PIN_7
-#define LD2_GPIO_Port GPIOB
-#define WW_DETECTED_Pin GPIO_PIN_8
-#define WW_DETECTED_GPIO_Port GPIOB
 
 /* USER CODE BEGIN Private defines */
 

@@ -8,13 +8,24 @@
 #ifndef INC_SWW_UTIL_H_
 #define INC_SWW_UTIL_H_
 
+// includes
 #include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdarg.h>
 #include <stdint.h>
-#include "sww_model.h"
+#include <ctype.h>
+#include <math.h>
 
+//#include "stm32l4xx_hal.h"
+//#include "arm_math.h"
+#include "feature_extraction.h"
+#include "sww_ref_util_submitter.h"
+
+// needed for running the model and/or initializing inference setup
+//#include "sww_model.h"
+//#include "sww_model_data.h"
+#include "fixed_data.h"
 
 #define EE_FW_VERSION "MLPerf Tiny Firmware V0.1.0"
 
@@ -25,13 +36,6 @@
 #define EE_MODEL_VERSION_IC01 "ic01"
 #define EE_MODEL_VERSION_SWW01 "sww01"
 
-
-
-
-
-#define TH_MODEL_VERSION EE_MODEL_VERSION_SWW01
-
-
 typedef enum { EE_ARG_CLAIMED, EE_ARG_UNCLAIMED } arg_claimed_t;
 typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t;
 
@@ -40,20 +44,15 @@ typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t;
 #define EE_CMD_SIZE 1028u
 #define EE_CMD_DELIMITER " "
 #define EE_CMD_TERMINATOR '%'
-
 #define EE_CMD_NAME "name"
 #define EE_CMD_TIMESTAMP "timestamp"
 
 #define EE_MSG_READY "m-ready\r\n"
 #define EE_MSG_INIT_DONE "m-init-done\r\n"
 #define EE_MSG_NAME "m-name-%s-[%s]\r\n"
 #define EE_MSG_TIMESTAMP "m-lap-us-%lu\r\n"
-
 #define EE_ERR_CMD "e-[Unknown command: %s]\r\n"
 
-#define TH_VENDOR_NAME_STRING "ML Commons"
-
-
 #define SWW_WINLEN_SAMPLES 1024
 #define SWW_WINSTRIDE_SAMPLES 512
 #define SWW_MODEL_INPUT_SIZE 1200
@@ -79,21 +78,19 @@ typedef enum {
 } i2s_state_t;
 
 
-void print_vals_int16(const int16_t *buffer, uint32_t num_vals);
-void print_bytes(const uint8_t *buffer, uint32_t num_bytes);
-void print_vals_float(const float *buffer, uint32_t num_vals);
-void log_printf(LogBuffer *log, const char *format, ...);
+void ee_print_vals_int16(const int16_t *buffer, uint32_t num_vals);
+void ee_print_vals_int8(const int8_t *buffer, uint32_t num_vals);
+void ee_print_bytes(const uint8_t *buffer, uint32_t num_bytes);
+void ee_print_vals_float(const float *buffer, uint32_t num_vals);
+void ee_log_printf(LogBuffer *log, const char *format, ...);
 
-void process_command(char *full_command);
+void ee_process_command(char *full_command);
 void ee_serial_callback(char c);
-void th_timestamp(void);
-void set_processing_pin_high(void);
-void set_processing_pin_low(void);
-void infer_static_wav(char *cmd_args[]);
-
-ai_error aiInit(void);
-void setup_i2s_buffers();
-void compute_lfbe_f32(const int16_t *pSrc, float32_t *pDst, float32_t *pTmp);
-void extract_features_on_chunk(char *cmd_args[]);
+void ee_timestamp(void);
+void ee_set_processing_pin_high(void);
+void ee_set_processing_pin_low(void);
+
+void ee_setup_i2s_buffers();
+void ee_process_chunk_and_cont_capture(void *hsai);
 
 #endif /* INC_SWW_UTIL_H_ */