diff --git a/benchmark/TUTORIAL.md b/benchmark/TUTORIAL.md new file mode 100644 index 00000000..4e86705a --- /dev/null +++ b/benchmark/TUTORIAL.md @@ -0,0 +1,89 @@ +# MLPerf™ Tiny Deep Learning Benchmarks for Embedded Devices Benchmarking Tutorial +This file explains how to run each of the benchmarks on the STM32 Nucleo reference board and how to migrate the benchmarks to new hardware. + ## **Part 1** Running the STM32 Nucleo reference implementations +### System requirements +Running the benchmarks requires a computer with **Python 3.9**, the STM32 Cube IDE and two USB ports. The streaming wakeword benchmark and all energy measurements also require the MB1677C interface board, and energy measurements additionally require the STMicroelectronics energy board. + +### Steps for all benchmarks +An Anaconda environment is recommended for the Python portion of the benchmarks. Create and activate this environment with `conda` by executing + +```sh +$ conda create -n tinymlperf python=3.9 && conda activate tinymlperf +``` + +Alternatively, create and activate an equivalent environment with `venv`. + +```sh +$ python3.9 -m venv ./tinymlperf && source ./tinymlperf/bin/activate +``` + +Once the Python 3.9 environment is activated, use `pip` to install the requirements for both the runner application, which drives the benchmarks from the computer, and the ARM Mbed toolchain. + +```sh +$ pip install -r "requirements.txt" +``` + +The runner uses `libusb` and `pyusb` to communicate with the boards, so both must be installed before running the benchmarks. On Linux, `libusb` is usually installed by default, but on macOS and Windows it must be installed manually. On macOS, `libusb` can be installed with Homebrew. + +```sh +$ brew install libusb +``` + +See [this page](https://github.com/pyusb/pyusb/blob/master/docs/faq.rst#how-do-i-install-libusb-on-windows) for information on how to install `libusb` on Windows. 
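After installing the requirements and `libusb`, you can optionally check that `pyusb` is able to load a libusb 1.0 backend before connecting any boards. The snippet below is a minimal sketch and is not part of the runner. The helper name `libusb_backend_available` is hypothetical. The only pyusb call it relies on is `usb.backend.libusb1.get_backend()`, which returns `None` when the libusb shared library cannot be found.

```python
"""Check whether pyusb can locate a libusb 1.0 backend (illustrative sketch)."""
import importlib.util


def libusb_backend_available() -> bool:
    """Return True if pyusb is installed and its libusb1 backend loads."""
    # pyusb is imported as the top-level package "usb"; stop early if it is missing.
    if importlib.util.find_spec("usb") is None:
        return False
    import usb.backend.libusb1

    # get_backend() returns None when no libusb shared library can be loaded.
    return usb.backend.libusb1.get_backend() is not None


if __name__ == "__main__":
    if libusb_backend_available():
        print("libusb backend found")
    else:
        print("libusb backend NOT found; install libusb as described above")
```

If the check fails on macOS or Windows, the usual cause is that the libusb shared library is not installed or cannot be found on the library search path.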
+ ### Image classification, anomaly detection, keyword spotting and person detection benchmarks +The image classification, anomaly detection, keyword spotting and person detection benchmarks use the ARM Mbed toolchain to build the device firmware. Each Mbed project must be set up before it can be used. The *setup_example.sh* script in each benchmark folder automates this step, so run it from the command line after installing the required Python packages. + +Next, build the firmware for the desired benchmark by executing `mbed compile -m NUCLEO_L4R5ZI -t GCC_ARM` in the benchmark directory. Then flash the firmware by copying the compiled .bin file to the STM32 Nucleo board. Flashing can be done from the command line, through the file explorer, or with the STM32 Cube Programmer application. + +To run any of the tests in energy mode, connect the power board and the interface board to the reference board as shown below, and connect the interface board and the power board to the computer via USB. Load the firmware from *[interface/benchmark_interface.elf](interface/benchmark-interface.elf)* onto the interface board using the STM32 Cube Programmer. Then open the `submitter_implemented.h` file of the desired benchmark and change the line `#define EE_CFG_ENERGY_MODE 0` to `#define EE_CFG_ENERGY_MODE 1`. Finally, recompile the firmware and load the compiled .bin file onto the reference board. + +The interface board runs at 3.3V, so if the device under test (DUT) runs at any other supply voltage, the logic levels must be shifted. The TXB0108, available on a [breakout board](https://www.adafruit.com/product/395) from Adafruit, supports low-side voltages from 1.2V to 3.6V. 
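Editing the define by hand is sufficient. If you switch between energy and performance builds often, however, a small script can make the edit for you. The following is a minimal sketch that assumes the header contains exactly one `#define EE_CFG_ENERGY_MODE 0` or `#define EE_CFG_ENERGY_MODE 1` line. The `set_energy_mode` helper and its command-line usage are hypothetical and are not part of the repository. The firmware still has to be recompiled and reflashed afterwards.

```python
"""Toggle EE_CFG_ENERGY_MODE in a submitter_implemented.h header (sketch)."""
import re
import sys
from pathlib import Path

# Matches e.g. "#define EE_CFG_ENERGY_MODE 0" on a single line, any spacing.
_ENERGY_RE = re.compile(
    r"^([ \t]*#[ \t]*define[ \t]+EE_CFG_ENERGY_MODE[ \t]+)[01]\b", re.MULTILINE
)


def set_energy_mode(header_text: str, enabled: bool) -> str:
    """Return header_text with EE_CFG_ENERGY_MODE set to 1 (enabled) or 0."""
    new_text, count = _ENERGY_RE.subn(
        lambda m: m.group(1) + ("1" if enabled else "0"), header_text
    )
    if count != 1:
        raise ValueError(f"expected one EE_CFG_ENERGY_MODE define, found {count}")
    return new_text


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python set_energy_mode.py path/to/submitter_implemented.h 1
    header = Path(sys.argv[1])
    header.write_text(set_energy_mode(header.read_text(), sys.argv[2] == "1"))
```

The script refuses to write anything if it finds no define or more than one, so a header with an unexpected layout is left untouched.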
+ #### Power board (LPM01A) +![LPM01A Wiring](runner/img/LPM01A.png) + +#### Interface board (STM32H573I-DK) +![STM32H573I-DK Top Wiring](runner/img/STM32H573I-DK-Top.png) +![STM32H573I-DK Bottom Wiring](runner/img/STM32H573I-DK-Bottom.png) + +#### Device under test (L4R5ZI) with level shifter +![DUT Wiring](runner/img/L4R5Zi.png) + +Once the hardware is configured, navigate to the [runner](./runner/) directory and execute the runner. The commands for each benchmark are listed below. +* Image Classification + * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e` + * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p` + * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a` +* Keyword Spotting + * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e` + * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p` + * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a` +* Visual Wakewords + * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_kws_ic_vww.yaml --mode=e` + * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_kws_ic_vww.yaml --mode=p` + * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_kws_ic_vww.yaml --mode=a` +* Anomaly Detection + * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_ad.yaml --mode=e` + * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_ad.yaml --mode=p` + * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_ad.yaml --mode=a` + +### Streaming wakeword benchmark +The streaming wakeword benchmark uses the STM32 Cube SDK and consists of two parts that must be installed independently. First, open the STM32 Cube IDE and import the *[sww_ref_l4r5zi](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/)* project. Open the debug configurations dialog by right-clicking the project name and selecting *Debug As -> Debug Configurations*. On the *Debugger* tab, press *Scan* and select the ST-Link device plugged into the computer from the drop-down menu. Then compile and run the program by pressing the *Debug* button, and press *Resume* when the debugger connects. Afterwards, disconnect the debugger and unplug the Nucleo reference board. Finally, connect the reference board, power board and interface board in the energy measurement configuration shown above. + +The interface board has a micro-SD card slot. Copy the WAV files containing the wakewords to be streamed from the *[runner/sww_data_dir](runner/sww_data_dir/)* folder onto the SD card, which must be formatted as an MS-DOS (FAT32) disk. A 1GB card is more than enough for the current benchmarks. **The same data must also be stored in a folder named sww01 under the dataset path passed to the runner with `--dataset_path`; otherwise, the runner will fail to detect it. For example, if the dataset path is [evaluation/datasets](evaluation/datasets), the data must be in [evaluation/datasets/sww01](evaluation/datasets/sww01/).** + +This benchmark runs the performance, accuracy and energy tests in a single run and should be run in energy mode. 
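The runner fails if the streaming wakeword data is not located under `<dataset_path>/sww01`, so a quick pre-flight check before starting that run can save time. The following is a minimal sketch. `find_sww_wavs` is a hypothetical helper and is not part of the runner. It assumes the WAV files sit directly inside `sww01` rather than in subfolders.

```python
"""Pre-flight check for the streaming wakeword dataset layout (sketch)."""
from pathlib import Path


def find_sww_wavs(dataset_path: str) -> list[Path]:
    """Return the WAV files in <dataset_path>/sww01, raising if the folder is missing."""
    sww_dir = Path(dataset_path) / "sww01"
    if not sww_dir.is_dir():
        raise FileNotFoundError(f"expected the sww data in {sww_dir}")
    # Case-insensitive match on the .wav suffix; only the top level is scanned.
    return sorted(p for p in sww_dir.iterdir() if p.suffix.lower() == ".wav")
```

For example, `find_sww_wavs("evaluation/datasets")` should return a non-empty list before you launch the energy-mode run.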
+ ```sh +python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_sww.yaml --mode=e +``` + +## **Part 2** Modifying reference implementations for new platforms +Each reference implementation contains a submitter-implemented module whose API functions must be modified when migrating that implementation to a new platform. For the streaming wakeword reference implementation, this module is *[sww_ref_util_submitter.c](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.c)*; in all other implementations it is named *submitter_implemented.cpp*. The expected behavior of these functions and their required inputs and outputs are documented in *[sww_ref_util_submitter.h](./reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util_submitter.h)* for the streaming wakeword reference implementation and in *submitter_implemented.h* for the other benchmarks. + +Once the submitter functions are implemented, verify that the baud rate in *[device_under_test.py](./runner/device_under_test.py)* is correct and update the *devices* YAML files in the *runner* folder for the new platform. + +## Further information +More information on how to use the provided reference implementations, or how to port them to platforms not covered here, can be found in the README.md files in the *[reference_submissions](reference_submissions/)* folder and its subfolders. diff --git a/benchmark/reference_submissions/mbed_basic/setup_mbed.sh b/benchmark/reference_submissions/mbed_basic/setup_mbed.sh index c05e8188..0d08ecc8 100755 --- a/benchmark/reference_submissions/mbed_basic/setup_mbed.sh +++ b/benchmark/reference_submissions/mbed_basic/setup_mbed.sh @@ -1,5 +1,5 @@ # Copy API files since all source files must be within an mbed project. 
-cp ../../api . -r +cp -r ../../api . cp ../../main.cpp . # Create mbed project and checkout to restore the overwritten main.cpp. diff --git a/benchmark/reference_submissions/requirements.txt b/benchmark/reference_submissions/requirements.txt new file mode 100644 index 00000000..1a389e68 --- /dev/null +++ b/benchmark/reference_submissions/requirements.txt @@ -0,0 +1,81 @@ +appdirs==1.4.4 +asn1ate==0.6.0 +beautifulsoup4==4.6.3 +cbor==1.0.0 +certifi==2025.8.3 +cffi==1.17.1 +chardet==3.0.4 +charset-normalizer==3.4.3 +click==8.1.8 +cmake==4.1.0 +cmsis-pack-manager==0.2.10 +colorama==0.3.9 +contourpy==1.3.0 +cryptography==45.0.6 +cycler==0.12.1 +Cython==3.1.3 +ecdsa==0.19.1 +fasteners==0.20 +fonttools==4.57.0 +future==0.16.0 +gitdb==4.0.12 +GitPython==3.1.45 +hidapi==0.14.0.post4 +icetea==1.2.4 +idna==2.7 +importlib_resources==6.4.5 +intelhex==2.3.0 +Jinja2==2.10.3 +joblib==1.4.2 +jsonmerge==1.9.2 +jsonschema==2.6.0 +junit-xml==1.8 +kiwisolver==1.4.7 +lockfile==0.12.2 +lxml==6.0.0 +manifest-tool==1.5.2 +MarkupSafe==2.0.1 +matplotlib==3.9.4 +mbed-cli==1.10.5 +mbed-cloud-sdk==2.0.8 +mbed-flasher==0.10.1 +mbed-greentea==1.7.4 +mbed-host-tests==1.5.10 +mbed-ls==1.7.12 +mbed-os-tools==0.0.15 +milksnake==0.1.6 +ninja==1.13.0 +numpy==1.26.4 +packaging==25.0 +pandas==2.3.2 +pillow==10.4.0 +prettytable==0.7.2 +protobuf==3.5.2.post1 +psutil==5.6.6 +pyasn1==0.2.3 +pycparser==2.22 +pycryptodome==3.23.0 +pyelftools==0.29 +pyparsing==3.1.4 +pyserial==3.4 +python-dateutil==2.9.0.post0 +python-dotenv==1.0.1 +pytz==2025.2 +pyusb==1.2.1 +PyYAML==6.0.2 +requests==2.20.1 +scikit-learn==1.6.1 +scipy==1.13.1 +semver==3.0.4 +six==1.12.0 +smmap==5.0.2 +soupsieve==2.7 +tabulate==0.9.0 +threadpoolctl==3.5.0 +tqdm==4.67.1 +typing_extensions==4.13.2 +tzdata==2025.2 +urllib3==1.24.2 +wcwidth==0.2.13 +yattag==1.16.1 +zipp==3.20.2 diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/main.h 
b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/main.h index a671b46e..79c3d65a 100644 --- a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/main.h +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/main.h @@ -27,11 +27,11 @@ extern "C" { #endif /* Includes ------------------------------------------------------------------*/ -#include "stm32l4xx_hal.h" /* Private includes ----------------------------------------------------------*/ /* USER CODE BEGIN Includes */ - +#include "sww_ref_util.h" +#include "sww_ref_util_submitter.h" /* USER CODE END Includes */ /* Exported types ------------------------------------------------------------*/ @@ -50,53 +50,12 @@ extern "C" { /* USER CODE END EM */ /* Exported functions prototypes ---------------------------------------------*/ -void Error_Handler(void); /* USER CODE BEGIN EFP */ /* USER CODE END EFP */ /* Private defines -----------------------------------------------------------*/ -#define B1_Pin GPIO_PIN_13 -#define B1_GPIO_Port GPIOC -#define timestamp_Pin GPIO_PIN_13 -#define timestamp_GPIO_Port GPIOF -#define Processing_Pin GPIO_PIN_9 -#define Processing_GPIO_Port GPIOE -#define LD3_Pin GPIO_PIN_14 -#define LD3_GPIO_Port GPIOB -#define STLK_RX_Pin GPIO_PIN_8 -#define STLK_RX_GPIO_Port GPIOD -#define STLK_TX_Pin GPIO_PIN_9 -#define STLK_TX_GPIO_Port GPIOD -#define USB_OverCurrent_Pin GPIO_PIN_5 -#define USB_OverCurrent_GPIO_Port GPIOG -#define USB_PowerSwitchOn_Pin GPIO_PIN_6 -#define USB_PowerSwitchOn_GPIO_Port GPIOG -#define STLINK_TX_Pin GPIO_PIN_7 -#define STLINK_TX_GPIO_Port GPIOG -#define STLINK_RX_Pin GPIO_PIN_8 -#define STLINK_RX_GPIO_Port GPIOG -#define USB_SOF_Pin GPIO_PIN_8 -#define USB_SOF_GPIO_Port GPIOA -#define USB_VBUS_Pin GPIO_PIN_9 -#define USB_VBUS_GPIO_Port GPIOA -#define USB_ID_Pin GPIO_PIN_10 -#define USB_ID_GPIO_Port GPIOA -#define USB_DM_Pin GPIO_PIN_11 -#define USB_DM_GPIO_Port GPIOA -#define USB_DP_Pin 
GPIO_PIN_12 -#define USB_DP_GPIO_Port GPIOA -#define TMS_Pin GPIO_PIN_13 -#define TMS_GPIO_Port GPIOA -#define TCK_Pin GPIO_PIN_14 -#define TCK_GPIO_Port GPIOA -#define SWO_Pin GPIO_PIN_3 -#define SWO_GPIO_Port GPIOB -#define LD2_Pin GPIO_PIN_7 -#define LD2_GPIO_Port GPIOB -#define WW_DETECTED_Pin GPIO_PIN_8 -#define WW_DETECTED_GPIO_Port GPIOB /* USER CODE BEGIN Private defines */ diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util.h b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util.h index da904fdf..c6886ab1 100644 --- a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util.h +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util.h @@ -8,13 +8,24 @@ #ifndef INC_SWW_UTIL_H_ #define INC_SWW_UTIL_H_ +// includes #include #include #include #include #include -#include "sww_model.h" +#include +#include +//#include "stm32l4xx_hal.h" +//#include "arm_math.h" +#include "feature_extraction.h" +#include "sww_ref_util_submitter.h" + +// needed for running the model and/or initializing inference setup +//#include "sww_model.h" +//#include "sww_model_data.h" +#include "fixed_data.h" #define EE_FW_VERSION "MLPerf Tiny Firmware V0.1.0" @@ -25,13 +36,6 @@ #define EE_MODEL_VERSION_IC01 "ic01" #define EE_MODEL_VERSION_SWW01 "sww01" - - - - -#define TH_MODEL_VERSION EE_MODEL_VERSION_SWW01 - - typedef enum { EE_ARG_CLAIMED, EE_ARG_UNCLAIMED } arg_claimed_t; typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t; @@ -40,7 +44,6 @@ typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t; #define EE_CMD_SIZE 1028u #define EE_CMD_DELIMITER " " #define EE_CMD_TERMINATOR '%' - #define EE_CMD_NAME "name" #define EE_CMD_TIMESTAMP "timestamp" @@ -48,12 +51,8 @@ typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t; #define EE_MSG_INIT_DONE "m-init-done\r\n" #define EE_MSG_NAME "m-name-%s-[%s]\r\n" #define 
EE_MSG_TIMESTAMP "m-lap-us-%lu\r\n" - #define EE_ERR_CMD "e-[Unknown command: %s]\r\n" -#define TH_VENDOR_NAME_STRING "ML Commons" - - #define SWW_WINLEN_SAMPLES 1024 #define SWW_WINSTRIDE_SAMPLES 512 #define SWW_MODEL_INPUT_SIZE 1200 @@ -79,21 +78,19 @@ typedef enum { } i2s_state_t; -void print_vals_int16(const int16_t *buffer, uint32_t num_vals); -void print_bytes(const uint8_t *buffer, uint32_t num_bytes); -void print_vals_float(const float *buffer, uint32_t num_vals); -void log_printf(LogBuffer *log, const char *format, ...); +void ee_print_vals_int16(const int16_t *buffer, uint32_t num_vals); +void ee_print_vals_int8(const int8_t *buffer, uint32_t num_vals); +void ee_print_bytes(const uint8_t *buffer, uint32_t num_bytes); +void ee_print_vals_float(const float *buffer, uint32_t num_vals); +void ee_log_printf(LogBuffer *log, const char *format, ...); -void process_command(char *full_command); +void ee_process_command(char *full_command); void ee_serial_callback(char c); -void th_timestamp(void); -void set_processing_pin_high(void); -void set_processing_pin_low(void); -void infer_static_wav(char *cmd_args[]); - -ai_error aiInit(void); -void setup_i2s_buffers(); -void compute_lfbe_f32(const int16_t *pSrc, float32_t *pDst, float32_t *pTmp); -void extract_features_on_chunk(char *cmd_args[]); +void ee_timestamp(void); +void ee_set_processing_pin_high(void); +void ee_set_processing_pin_low(void); + +void ee_setup_i2s_buffers(); +void ee_process_chunk_and_cont_capture(void *hsai); #endif /* INC_SWW_UTIL_H_ */ diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util_submitter.h b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util_submitter.h new file mode 100644 index 00000000..036fce45 --- /dev/null +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Inc/sww_ref_util_submitter.h @@ -0,0 +1,190 @@ +/* + * submitter_implemented.h + * + * Created on: Sep 3, 2025 + * 
Author: owen + * \file + * \brief Submitter implementations required to perform inference. + * \detail All methods starting with th_ are platform-specific and to be + * implemented by the submitter. All basic I/O, inference and timer APIs must + * be implemented in order for the benchmark to output useful results, but some + * auxiliary methods default to an empty implementation. These methods are + * provided to enable submitter optimizations and are not required for + * submission. + */ + +#ifndef __SWW_REF_UTIL_SUBMITTER_H__ +#define __SWW_REF_UTIL_SUBMITTER_H__ + +#include "stm32l4xx_hal.h" +#include +#include +#include +#include + +// needed for running the model and/or initializing inference setup +#include "sww_model.h" +#include "sww_model_data.h" +#include "fixed_data.h" + +#include "sww_ref_util.h" + +// I/O defines +#define B1_Pin GPIO_PIN_13 +#define B1_GPIO_Port GPIOC +#define timestamp_Pin GPIO_PIN_13 +#define timestamp_GPIO_Port GPIOF +#define Processing_Pin GPIO_PIN_9 +#define Processing_GPIO_Port GPIOE +#define LD3_Pin GPIO_PIN_14 +#define LD3_GPIO_Port GPIOB +#define STLK_RX_Pin GPIO_PIN_8 +#define STLK_RX_GPIO_Port GPIOD +#define STLK_TX_Pin GPIO_PIN_9 +#define STLK_TX_GPIO_Port GPIOD +#define USB_OverCurrent_Pin GPIO_PIN_5 +#define USB_OverCurrent_GPIO_Port GPIOG +#define USB_PowerSwitchOn_Pin GPIO_PIN_6 +#define USB_PowerSwitchOn_GPIO_Port GPIOG +#define STLINK_TX_Pin GPIO_PIN_7 +#define STLINK_TX_GPIO_Port GPIOG +#define STLINK_RX_Pin GPIO_PIN_8 +#define STLINK_RX_GPIO_Port GPIOG +#define USB_SOF_Pin GPIO_PIN_8 +#define USB_SOF_GPIO_Port GPIOA +#define USB_VBUS_Pin GPIO_PIN_9 +#define USB_VBUS_GPIO_Port GPIOA +#define USB_ID_Pin GPIO_PIN_10 +#define USB_ID_GPIO_Port GPIOA +#define USB_DM_Pin GPIO_PIN_11 +#define USB_DM_GPIO_Port GPIOA +#define USB_DP_Pin GPIO_PIN_12 +#define USB_DP_GPIO_Port GPIOA +#define TMS_Pin GPIO_PIN_13 +#define TMS_GPIO_Port GPIOA +#define TCK_Pin GPIO_PIN_14 +#define TCK_GPIO_Port GPIOA +#define SWO_Pin GPIO_PIN_3 
+#define SWO_GPIO_Port GPIOB +#define LD2_Pin GPIO_PIN_7 +#define LD2_GPIO_Port GPIOB +#define WW_DETECTED_Pin GPIO_PIN_8 +#define WW_DETECTED_GPIO_Port GPIOB + +// platform-specific defines +// used for the time-critical register reads and writes on the GPIO and timer +#define TH_VENDOR_NAME_STRING "ML Commons" +#define TH_MODEL_VERSION EE_MODEL_VERSION_SWW01 +#define TH_I2S_OK HAL_OK +#define TH_GPIO_WRITE(__port__, __pin__, __value__) HAL_GPIO_WritePin(__port__, __pin__, __value__) +//#define TH_TIMER16_GET() __HAL_TIM_GET_COUNTER(&htim16) +#if !defined(TH_GPIO_WRITE) || !defined(TH_VENDOR_NAME_STRING) \ + || !defined(TH_I2S_OK) +#error Unmapped macros detected. Make sure all macros in sww_ref_util_submitter.h are defined. +#endif + +// core API functions +/** + * @brief block for a given number of microseconds + */ +void th_delay_us(int delay_len_us); + +/** + * @brief initialize GPIO and necessary peripherals on DUT + */ +void th_hardware_init(void); + +/** + * @brief set or reset a GPIO pin + * @param port Pointer to the port address + * @param pin Pin mask + * @param value 0 or 1 + */ +// void th_gpio_write(void *port, uint16_t pin, uint8_t value); + +/** + * @brief start the sixteen-bit timer peripheral on the device + */ +void th_timer16_start(void); + +/** + * @brief get 16-bit counter value of the timer peripheral + * @retval timer value (uint16_t) + */ +//uint16_t th_timer16_get(void); + + +/** + * @brief receive DMA data + * @param i2s_buffer Pointer to the I2S buffer + * @param size Number of data bytes to receive + * @retval DMA receive success code + */ +uint32_t th_dma_receive(uint8_t *i2s_buffer, uint16_t size); + +/** + * @brief stop DMA capture + * @retval DMA status value + */ +uint32_t th_dma_stop(void); + +/** + * @brief get the state value for the I2S DMA object + */ +uint8_t th_dma_state(void); + +/** + * @brief receive from the device's UART + 
* @param data Pointer to the receive buffer + * @param size Number of data bytes to receive + * @param timeout Timeout window in milliseconds + * @retval UART receive success code + */ +uint32_t th_uart_receive(uint8_t *data, uint16_t size, uint32_t timeout); + +/** + * @brief create and initialize the c-model and generate and map pointers to + * the input and output tensors of the model + * @attention Reference implementation uses the ai_error datatype from STM's + * AI middlewares, return type must be changed to the equivalent + * for the submitter's platform before build + * @retval error encoding from platform ML engine + */ +ai_error th_ai_init(void); + +/** + * @brief run inference engine using the previously set-up ML model + * @attention Reference implementation uses the ai_error datatype from STM's + * AI middlewares, return type must be changed to the equivalent + * for the submitter's platform before build + * @retval error encoding from platform ML engine + */ +ai_error th_ai_run(const void *in_data, void *out_data); + +void th_run_model_on_test_data(char *cmd_args[]); + +void th_infer_static_wav(char *cmd_args[]); + +void th_extract_features_on_chunk(char *cmd_args[]); + +void th_run_extraction(char *cmd_args[]); + +void th_process_chunk_and_cont_streaming(void *hsai); + +void th_compute_lfbe_f32(const int16_t *pSrc, float32_t *pDst, float32_t *pTmp); + +// private functions, originally from main.c +void SystemClock_Config(void); +void Error_Handler(void); + +// These defines and this function are to get printf() working +#ifdef __GNUC__ +#define PUTCHAR_PROTOTYPE int __io_putchar(int ch) +#else +#define PUTCHAR_PROTOTYPE int fputc(int ch, FILE *f) +#endif + +#endif diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/feature_extraction.c b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/feature_extraction.c index 7825390c..d54133df 100644 --- 
a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/feature_extraction.c +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/feature_extraction.c @@ -94,7 +94,7 @@ void test_extraction(const float32_t input_signal[]) // printf("Mag: %3.4f, %3.4f, %3.4f, %3.4f\r\n", // testOutput[0], testOutput[1], testOutput[2], testOutput[3]); printf("Magnitude output\r\n"); - print_vals_float(testOutput, TEST_LENGTH_SAMPLES/2); + ee_print_vals_float(testOutput, TEST_LENGTH_SAMPLES/2); /* Calculates maxValue and returns corresponding BIN value */ arm_max_f32(testOutput, fftSize, &maxValue, &testIndex); diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/main.c b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/main.c index f2b9042a..a2177913 100644 --- a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/main.c +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/main.c @@ -22,18 +22,6 @@ /* Private includes ----------------------------------------------------------*/ /* USER CODE BEGIN Includes */ -#include -#include -#include -#include - -// needed for running the model and/or initializing inference setup -#include "sww_model.h" -#include "sww_model_data.h" -#include "fixed_data.h" - -#include "sww_ref_util.h" - /* USER CODE END Includes */ /* Private typedef -----------------------------------------------------------*/ @@ -52,29 +40,14 @@ /* USER CODE END PM */ /* Private variables ---------------------------------------------------------*/ -UART_HandleTypeDef hlpuart1; -UART_HandleTypeDef huart3; - -SAI_HandleTypeDef hsai_BlockA1; -DMA_HandleTypeDef hdma_sai1_a; - -TIM_HandleTypeDef htim16; - -PCD_HandleTypeDef hpcd_USB_OTG_FS; +extern UART_HandleTypeDef hlpuart1; +extern TIM_HandleTypeDef htim16; /* USER CODE BEGIN PV */ /* USER CODE END PV */ /* Private function prototypes 
-----------------------------------------------*/ -void SystemClock_Config(void); -static void MX_GPIO_Init(void); -static void MX_DMA_Init(void); -static void MX_LPUART1_UART_Init(void); -static void MX_USART3_UART_Init(void); -static void MX_USB_OTG_FS_PCD_Init(void); -static void MX_SAI1_Init(void); -static void MX_TIM16_Init(void); /* USER CODE BEGIN PFP */ /* USER CODE END PFP */ @@ -82,20 +55,6 @@ static void MX_TIM16_Init(void); /* Private user code ---------------------------------------------------------*/ /* USER CODE BEGIN 0 */ -// These defines and this function are to get printf() working -#ifdef __GNUC__ -#define PUTCHAR_PROTOTYPE int __io_putchar(int ch) -#else -#define PUTCHAR_PROTOTYPE int fputc(int ch, FILE *f) -#endif - - -PUTCHAR_PROTOTYPE -{ - HAL_UART_Transmit(&hlpuart1, (uint8_t *)&ch, 1, HAL_MAX_DELAY); - return ch; -} - /* USER CODE END 0 */ /** @@ -113,45 +72,23 @@ int main(void) /* USER CODE END 1 */ - /* MCU Configuration--------------------------------------------------------*/ - - /* Reset of all peripherals, Initializes the Flash interface and the Systick. 
*/ - HAL_Init(); - - /* USER CODE BEGIN Init */ - - /* USER CODE END Init */ - - /* Configure the system clock */ - SystemClock_Config(); + th_hardware_init(); - /* USER CODE BEGIN SysInit */ - - /* USER CODE END SysInit */ - - /* Initialize all configured peripherals */ - MX_GPIO_Init(); - MX_DMA_Init(); - MX_LPUART1_UART_Init(); - MX_USART3_UART_Init(); - MX_USB_OTG_FS_PCD_Init(); - MX_SAI1_Init(); - MX_TIM16_Init(); /* USER CODE BEGIN 2 */ - setup_i2s_buffers(); // allocate memory for I2S reception - HAL_TIM_Base_Start(&htim16); // start timer - th_timestamp(); // Toggle D7 pin on startup + ee_setup_i2s_buffers(); // allocate memory for I2S reception + th_timer16_start(); // start timer + ee_timestamp(); // Toggle D7 pin on startup /* USER CODE END 2 */ /* Infinite loop */ /* USER CODE BEGIN WHILE */ - aiInit(); + th_ai_init(); char ch_from_uart= (char) 0; while (1) { - uart_status = HAL_UART_Receive(&hlpuart1, (uint8_t *)&ch_from_uart, 1, uart_timeout_ms); + uart_status = th_uart_receive((uint8_t *)&ch_from_uart, 1, uart_timeout_ms); if(uart_status == HAL_OK) {// otherwise timeout => no key input ee_serial_callback(ch_from_uart); } @@ -161,384 +98,3 @@ int main(void) } /* USER CODE END 3 */ } - -/** - * @brief System Clock Configuration - * @retval None - */ -void SystemClock_Config(void) -{ - RCC_OscInitTypeDef RCC_OscInitStruct = {0}; - RCC_ClkInitTypeDef RCC_ClkInitStruct = {0}; - - /** Configure the main internal regulator output voltage - */ - if (HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1_BOOST) != HAL_OK) - { - Error_Handler(); - } - - /** Initializes the RCC Oscillators according to the specified parameters - * in the RCC_OscInitTypeDef structure. 
- */ - RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI48|RCC_OSCILLATORTYPE_HSI; - RCC_OscInitStruct.HSIState = RCC_HSI_ON; - RCC_OscInitStruct.HSI48State = RCC_HSI48_ON; - RCC_OscInitStruct.HSICalibrationValue = RCC_HSICALIBRATION_DEFAULT; - RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON; - RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSI; - RCC_OscInitStruct.PLL.PLLM = 2; - RCC_OscInitStruct.PLL.PLLN = 30; - RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV2; - RCC_OscInitStruct.PLL.PLLQ = RCC_PLLQ_DIV2; - RCC_OscInitStruct.PLL.PLLR = RCC_PLLR_DIV2; - if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK) - { - Error_Handler(); - } - - /** Initializes the CPU, AHB and APB buses clocks - */ - RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK - |RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2; - RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK; - RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1; - RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2; - RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1; - - if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_5) != HAL_OK) - { - Error_Handler(); - } -} - -/** - * @brief LPUART1 Initialization Function - * @param None - * @retval None - */ -static void MX_LPUART1_UART_Init(void) -{ - - /* USER CODE BEGIN LPUART1_Init 0 */ - - /* USER CODE END LPUART1_Init 0 */ - - /* USER CODE BEGIN LPUART1_Init 1 */ - - /* USER CODE END LPUART1_Init 1 */ - hlpuart1.Instance = LPUART1; - hlpuart1.Init.BaudRate = 115200; - hlpuart1.Init.WordLength = UART_WORDLENGTH_8B; - hlpuart1.Init.StopBits = UART_STOPBITS_1; - hlpuart1.Init.Parity = UART_PARITY_NONE; - hlpuart1.Init.Mode = UART_MODE_TX_RX; - hlpuart1.Init.HwFlowCtl = UART_HWCONTROL_NONE; - hlpuart1.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE; - hlpuart1.Init.ClockPrescaler = UART_PRESCALER_DIV1; - hlpuart1.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT; - hlpuart1.FifoMode = UART_FIFOMODE_DISABLE; - if (HAL_UART_Init(&hlpuart1) != HAL_OK) - { - 
Error_Handler(); - } - if (HAL_UARTEx_SetTxFifoThreshold(&hlpuart1, UART_TXFIFO_THRESHOLD_1_8) != HAL_OK) - { - Error_Handler(); - } - if (HAL_UARTEx_SetRxFifoThreshold(&hlpuart1, UART_RXFIFO_THRESHOLD_1_8) != HAL_OK) - { - Error_Handler(); - } - if (HAL_UARTEx_DisableFifoMode(&hlpuart1) != HAL_OK) - { - Error_Handler(); - } - /* USER CODE BEGIN LPUART1_Init 2 */ - - /* USER CODE END LPUART1_Init 2 */ - -} - -/** - * @brief USART3 Initialization Function - * @param None - * @retval None - */ -static void MX_USART3_UART_Init(void) -{ - - /* USER CODE BEGIN USART3_Init 0 */ - - /* USER CODE END USART3_Init 0 */ - - /* USER CODE BEGIN USART3_Init 1 */ - - /* USER CODE END USART3_Init 1 */ - huart3.Instance = USART3; - huart3.Init.BaudRate = 115200; - huart3.Init.WordLength = UART_WORDLENGTH_8B; - huart3.Init.StopBits = UART_STOPBITS_1; - huart3.Init.Parity = UART_PARITY_NONE; - huart3.Init.Mode = UART_MODE_TX_RX; - huart3.Init.HwFlowCtl = UART_HWCONTROL_NONE; - huart3.Init.OverSampling = UART_OVERSAMPLING_16; - huart3.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE; - huart3.Init.ClockPrescaler = UART_PRESCALER_DIV1; - huart3.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT; - if (HAL_UART_Init(&huart3) != HAL_OK) - { - Error_Handler(); - } - if (HAL_UARTEx_SetTxFifoThreshold(&huart3, UART_TXFIFO_THRESHOLD_1_8) != HAL_OK) - { - Error_Handler(); - } - if (HAL_UARTEx_SetRxFifoThreshold(&huart3, UART_RXFIFO_THRESHOLD_1_8) != HAL_OK) - { - Error_Handler(); - } - if (HAL_UARTEx_DisableFifoMode(&huart3) != HAL_OK) - { - Error_Handler(); - } - /* USER CODE BEGIN USART3_Init 2 */ - - /* USER CODE END USART3_Init 2 */ - -} - -/** - * @brief SAI1 Initialization Function - * @param None - * @retval None - */ -static void MX_SAI1_Init(void) -{ - - /* USER CODE BEGIN SAI1_Init 0 */ - - /* USER CODE END SAI1_Init 0 */ - - /* USER CODE BEGIN SAI1_Init 1 */ - - /* USER CODE END SAI1_Init 1 */ - hsai_BlockA1.Instance = SAI1_Block_A; - hsai_BlockA1.Init.AudioMode = 
SAI_MODESLAVE_RX; - hsai_BlockA1.Init.Synchro = SAI_ASYNCHRONOUS; - hsai_BlockA1.Init.OutputDrive = SAI_OUTPUTDRIVE_DISABLE; - hsai_BlockA1.Init.FIFOThreshold = SAI_FIFOTHRESHOLD_EMPTY; - hsai_BlockA1.Init.SynchroExt = SAI_SYNCEXT_DISABLE; - hsai_BlockA1.Init.MonoStereoMode = SAI_STEREOMODE; - hsai_BlockA1.Init.CompandingMode = SAI_NOCOMPANDING; - hsai_BlockA1.Init.TriState = SAI_OUTPUT_NOTRELEASED; - if (HAL_SAI_InitProtocol(&hsai_BlockA1, SAI_I2S_STANDARD, SAI_PROTOCOL_DATASIZE_16BIT, 2) != HAL_OK) - { - Error_Handler(); - } - /* USER CODE BEGIN SAI1_Init 2 */ - - /* USER CODE END SAI1_Init 2 */ - -} - -/** - * @brief TIM16 Initialization Function - * @param None - * @retval None - */ -static void MX_TIM16_Init(void) -{ - - /* USER CODE BEGIN TIM16_Init 0 */ - - /* USER CODE END TIM16_Init 0 */ - - /* USER CODE BEGIN TIM16_Init 1 */ - - /* USER CODE END TIM16_Init 1 */ - htim16.Instance = TIM16; - htim16.Init.Prescaler = 120-1; - htim16.Init.CounterMode = TIM_COUNTERMODE_UP; - htim16.Init.Period = 65535; - htim16.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1; - htim16.Init.RepetitionCounter = 0; - htim16.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE; - if (HAL_TIM_Base_Init(&htim16) != HAL_OK) - { - Error_Handler(); - } - /* USER CODE BEGIN TIM16_Init 2 */ - HAL_TIM_Base_MspInit(&htim16); - /* USER CODE END TIM16_Init 2 */ - -} - -/** - * @brief USB_OTG_FS Initialization Function - * @param None - * @retval None - */ -static void MX_USB_OTG_FS_PCD_Init(void) -{ - - /* USER CODE BEGIN USB_OTG_FS_Init 0 */ - - /* USER CODE END USB_OTG_FS_Init 0 */ - - /* USER CODE BEGIN USB_OTG_FS_Init 1 */ - - /* USER CODE END USB_OTG_FS_Init 1 */ - hpcd_USB_OTG_FS.Instance = USB_OTG_FS; - hpcd_USB_OTG_FS.Init.dev_endpoints = 6; - hpcd_USB_OTG_FS.Init.speed = PCD_SPEED_FULL; - hpcd_USB_OTG_FS.Init.phy_itface = PCD_PHY_EMBEDDED; - hpcd_USB_OTG_FS.Init.Sof_enable = ENABLE; - hpcd_USB_OTG_FS.Init.low_power_enable = DISABLE; - hpcd_USB_OTG_FS.Init.lpm_enable = DISABLE; - 
hpcd_USB_OTG_FS.Init.battery_charging_enable = ENABLE; - hpcd_USB_OTG_FS.Init.use_dedicated_ep1 = DISABLE; - hpcd_USB_OTG_FS.Init.vbus_sensing_enable = ENABLE; - if (HAL_PCD_Init(&hpcd_USB_OTG_FS) != HAL_OK) - { - Error_Handler(); - } - /* USER CODE BEGIN USB_OTG_FS_Init 2 */ - - /* USER CODE END USB_OTG_FS_Init 2 */ - -} - -/** - * Enable DMA controller clock - */ -static void MX_DMA_Init(void) -{ - - /* DMA controller clock enable */ - __HAL_RCC_DMAMUX1_CLK_ENABLE(); - __HAL_RCC_DMA1_CLK_ENABLE(); - - /* DMA interrupt init */ - /* DMA1_Channel1_IRQn interrupt configuration */ - HAL_NVIC_SetPriority(DMA1_Channel1_IRQn, 0, 0); - HAL_NVIC_EnableIRQ(DMA1_Channel1_IRQn); - -} - -/** - * @brief GPIO Initialization Function - * @param None - * @retval None - */ -static void MX_GPIO_Init(void) -{ - GPIO_InitTypeDef GPIO_InitStruct = {0}; -/* USER CODE BEGIN MX_GPIO_Init_1 */ -/* USER CODE END MX_GPIO_Init_1 */ - - /* GPIO Ports Clock Enable */ - __HAL_RCC_GPIOE_CLK_ENABLE(); - __HAL_RCC_GPIOC_CLK_ENABLE(); - __HAL_RCC_GPIOH_CLK_ENABLE(); - __HAL_RCC_GPIOF_CLK_ENABLE(); - __HAL_RCC_GPIOB_CLK_ENABLE(); - __HAL_RCC_GPIOD_CLK_ENABLE(); - __HAL_RCC_GPIOG_CLK_ENABLE(); - HAL_PWREx_EnableVddIO2(); - __HAL_RCC_GPIOA_CLK_ENABLE(); - - /*Configure GPIO pin Output Level */ - HAL_GPIO_WritePin(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_SET); - - /*Configure GPIO pin Output Level */ - HAL_GPIO_WritePin(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_RESET); - - /*Configure GPIO pin Output Level */ - HAL_GPIO_WritePin(GPIOB, LD3_Pin|LD2_Pin, GPIO_PIN_RESET); - - /*Configure GPIO pin Output Level */ - HAL_GPIO_WritePin(USB_PowerSwitchOn_GPIO_Port, USB_PowerSwitchOn_Pin, GPIO_PIN_RESET); - - /*Configure GPIO pin Output Level */ - HAL_GPIO_WritePin(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_SET); - - /*Configure GPIO pin : B1_Pin */ - GPIO_InitStruct.Pin = B1_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING; - GPIO_InitStruct.Pull = GPIO_NOPULL; - HAL_GPIO_Init(B1_GPIO_Port, 
&GPIO_InitStruct); - - /*Configure GPIO pin : timestamp_Pin */ - GPIO_InitStruct.Pin = timestamp_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; - GPIO_InitStruct.Pull = GPIO_NOPULL; - GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_MEDIUM; - HAL_GPIO_Init(timestamp_GPIO_Port, &GPIO_InitStruct); - - /*Configure GPIO pin : Processing_Pin */ - GPIO_InitStruct.Pin = Processing_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; - GPIO_InitStruct.Pull = GPIO_NOPULL; - GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_MEDIUM; - HAL_GPIO_Init(Processing_GPIO_Port, &GPIO_InitStruct); - - /*Configure GPIO pins : LD3_Pin LD2_Pin WW_DETECTED_Pin */ - GPIO_InitStruct.Pin = LD3_Pin|LD2_Pin|WW_DETECTED_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; - GPIO_InitStruct.Pull = GPIO_NOPULL; - GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW; - HAL_GPIO_Init(GPIOB, &GPIO_InitStruct); - - /*Configure GPIO pin : USB_OverCurrent_Pin */ - GPIO_InitStruct.Pin = USB_OverCurrent_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_INPUT; - GPIO_InitStruct.Pull = GPIO_NOPULL; - HAL_GPIO_Init(USB_OverCurrent_GPIO_Port, &GPIO_InitStruct); - - /*Configure GPIO pin : USB_PowerSwitchOn_Pin */ - GPIO_InitStruct.Pin = USB_PowerSwitchOn_Pin; - GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; - GPIO_InitStruct.Pull = GPIO_NOPULL; - GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW; - HAL_GPIO_Init(USB_PowerSwitchOn_GPIO_Port, &GPIO_InitStruct); - -/* USER CODE BEGIN MX_GPIO_Init_2 */ -/* USER CODE END MX_GPIO_Init_2 */ -} - -/* USER CODE BEGIN 4 */ - -/* USER CODE END 4 */ - -/** - * @brief This function is executed in case of error occurrence. 
- * @retval None - */ -void Error_Handler(void) -{ - /* USER CODE BEGIN Error_Handler_Debug */ - /* User can add his own implementation to report the HAL error return state */ - __disable_irq(); - while (1) - { - } - /* USER CODE END Error_Handler_Debug */ -} - -#ifdef USE_FULL_ASSERT -/** - * @brief Reports the name of the source file and the source line number - * where the assert_param error has occurred. - * @param file: pointer to the source file name - * @param line: assert_param error line source number - * @retval None - */ -void assert_failed(uint8_t *file, uint32_t line) -{ - /* USER CODE BEGIN 6 */ - /* User can add his own implementation to report the file name and line number, - ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */ - /* USER CODE END 6 */ -} -#endif /* USE_FULL_ASSERT */ diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util.c b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util.c index ea9fdcff..dd5b0db9 100644 --- a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util.c +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util.c @@ -5,61 +5,49 @@ * Author: jeremy */ -#include -#include -#include -#include -#include -#include -#include - -#include "stm32l4xx_hal.h" -#include "arm_math.h" - - #include "sww_ref_util.h" -#include "feature_extraction.h" -#include "main.h" - -// needed for running the model and/or initializing inference setup -#include "sww_model.h" -#include "sww_model_data.h" -#include "fixed_data.h" -// I don't want to move the main declaration out of main.c because it is auto-generated by CubeMX -extern SAI_HandleTypeDef hsai_BlockA1; -extern TIM_HandleTypeDef htim16; +// I don't want to move the main declaration out of main.c because it is +// auto-generated by CubeMX +// extern SAI_HandleTypeDef hsai_BlockA1; +//extern TIM_HandleTypeDef htim16; 
-#define MAX_CMD_TOKENS 8 // maximum number of tokens in a command, including the command and arguments +#define MAX_CMD_TOKENS 8 // maximum number of tokens in a command, including + // the command and arguments // Command buffer (incoming commands from host) char g_cmd_buf[EE_CMD_SIZE + 1]; size_t g_cmd_pos = 0u; // variables for I2S receive uint32_t g_int16s_read = 0; -// chunk should be a 'window-stride' long = 32ms stride * 16kS/s * 2B/sample = 1024 +// chunk should be a 'window-stride' long +// = 32ms stride * 16kS/s * 2B/sample = 1024 // then double because we receive stereo (2 samples per time point) uint32_t g_i2s_chunk_size_bytes = 2048; -uint32_t g_i2s_status = HAL_OK; +uint32_t g_i2s_status = TH_I2S_OK; // two ping-pong byte buffers for DMA transfers from I2S port. int16_t *g_i2s_buffer0 = NULL; int16_t *g_i2s_buffer1 = NULL; -int16_t *g_i2s_current_buff = NULL; // will be either g_i2s_buffer0 or g_i2s_buffer1 -int g_i2s_buff_sel = 0; // 0 for buffer0, 1 for buffer1 -uint8_t *g_gp_buffer = NULL; // general-purpose buffer; for capturing a waveform or activations. +int16_t *g_i2s_current_buff = NULL; // will be either g_i2s_buffer0 or + // g_i2s_buffer1 +int g_i2s_buff_sel = 0; // 0 for buffer0, 1 for buffer1 +uint8_t *g_gp_buffer = NULL; // general-purpose buffer + // for capturing a waveform or activations. 
uint32_t g_gp_buff_bytes = 64000; -int16_t *g_wav_record = NULL; // buffer to store complete waveform -int8_t *g_act_buff = NULL; // jhdbg +int16_t *g_wav_record = NULL; // buffer to store complete waveform +int8_t *g_act_buff = NULL; // jhdbg int8_t *g_model_input; int g_buffer_alloc_success=0; -// length in (16b) samples, but I2S receives stereo, so actual length in time will be 1/2 this +// length in (16b) samples, but I2S receives stereo +// actual length in time will be 1/2 this uint32_t g_i2s_wav_len = 0; uint32_t g_first_frame = 1; -int16_t *g_wav_block_buff = NULL; // hold most recent SWW_WINLEN_SAMPLES for feature extraction +int16_t *g_wav_block_buff = NULL; // hold most recent SWW_WINLEN_SAMPLES + // for feature extraction LogBuffer g_log = { .buffer = {0}, .current_pos = 0 }; i2s_state_t g_i2s_state = Idle; @@ -67,7 +55,8 @@ i2s_state_t g_i2s_state = Idle; uint32_t g_act_idx = 0; #define ACT_BUFF_LEN 40000 -void setup_i2s_buffers() { +void ee_setup_i2s_buffers() +{ // set up variables for I2S receiving g_i2s_buffer0 = (int16_t *)malloc(g_i2s_chunk_size_bytes); g_i2s_buffer1 = (int16_t *)malloc(g_i2s_chunk_size_bytes); @@ -78,43 +67,31 @@ void setup_i2s_buffers() { g_gp_buffer = malloc(g_gp_buff_bytes); - if (!g_i2s_buffer0 || !g_i2s_buffer1 || !g_wav_block_buff || !g_model_input){ + if (!g_i2s_buffer0 || !g_i2s_buffer1 || !g_wav_block_buff || !g_model_input) + { g_buffer_alloc_success = 0; - printf("ERROR: Buffer allocation failed. Many operationw will fail.\r\n"); + printf("ERROR: Buffer allocation failed. Many operations will fail.\r\n"); } - else { + else g_buffer_alloc_success = 1; - } - if( !g_gp_buff_bytes) { - printf("WARNING: general-purpose buffer allocation failed. Wav and activation capture will fail.\r\n"); - } -} -void delay_us(int delay_len_us) { - // there may be a better way to implement this - // this will not give an accurate 1us delay, but - // for longer delays it should be accurate to within 1us. 
- int delay_start = __HAL_TIM_GET_COUNTER(&htim16); - while(__HAL_TIM_GET_COUNTER(&htim16) < delay_start + 1 ){ - ; - } + if (!g_gp_buffer) + printf("WARNING: general-purpose buffer allocation failed. WAV and activation capture will fail.\r\n"); } -void print_vals_int16(const int16_t *buffer, uint32_t num_vals) +void ee_print_vals_int16(const int16_t *buffer, uint32_t num_vals) { const int vals_per_line = 16; char end_char; printf("["); - for(uint32_t i=0;i= num_vals) - { + end_char = (i + j == num_vals - 1) ? ']' : ','; + if (i + j >= num_vals) break; - } printf("%d%c ", buffer[i+j], end_char); } printf("\r\n"); @@ -122,21 +99,19 @@ void print_vals_int16(const int16_t *buffer, uint32_t num_vals) } -void print_vals_int8(const int8_t *buffer, uint32_t num_vals) +void ee_print_vals_int8(const int8_t *buffer, uint32_t num_vals) { const int vals_per_line = 16; char end_char; printf("["); - for(uint32_t i=0;i= num_vals) - { + end_char = (i + j == num_vals - 1) ? ' ' : ','; + if (i + j >= num_vals) break; - } printf("%d%c ", buffer[i+j], end_char); } printf("\r\n"); @@ -145,18 +120,16 @@ void print_vals_int8(const int8_t *buffer, uint32_t num_vals) // printf("]\r\n==== Done ====\r\n"); } -void print_bytes(const uint8_t *buffer, uint32_t num_bytes) +void ee_print_bytes(const uint8_t *buffer, uint32_t num_bytes) { const int vals_per_line = 16; printf("["); - for(uint32_t i=0;i= num_bytes) - { + if (i + j >= num_bytes) break; - } printf("0x%X, ", buffer[i+j]); } printf("\r\n"); @@ -165,20 +138,19 @@ void print_bytes(const uint8_t *buffer, uint32_t num_bytes) } -void print_vals_float(const float *buffer, uint32_t num_vals) +void ee_print_vals_float(const float *buffer, uint32_t num_vals) { const int vals_per_line = 8; - char end_char; // don't add a ',' after the last value, because it breaks JSON + char end_char; // don't add a ',' after the last value + // because a trailing comma breaks JSON printf("["); - for(uint32_t i=0;i= num_vals) - { + if (i + j >= num_vals) break; - }
printf("%3.5e%c ", buffer[i+j], end_char); } printf("\r\n"); @@ -186,7 +158,8 @@ void print_vals_float(const float *buffer, uint32_t num_vals) // printf("]\r\n==== Done ====\r\n"); printf("]\r\n\r\n"); } -void log_printf(LogBuffer *log, const char *format, ...) { +void ee_log_printf(LogBuffer *log, const char *format, ...) +{ va_list args; char temp_buffer[LOG_BUFFER_SIZE]; int written; @@ -201,15 +174,18 @@ void log_printf(LogBuffer *log, const char *format, ...) { va_end(args); // Check if the formatted string fits in the remaining buffer - if (log->current_pos + written >= LOG_BUFFER_SIZE) { + if (log->current_pos + written >= LOG_BUFFER_SIZE) + { // Buffer overflow: Zero out and reset to the beginning memset(log->buffer, 0, LOG_BUFFER_SIZE); log->current_pos = 0; } // Copy the formatted string to the log buffer - if (written > 0) { - size_t bytes_to_copy = (written < LOG_BUFFER_SIZE) ? written : LOG_BUFFER_SIZE - 1; + if (written > 0) + { + size_t bytes_to_copy = (written < LOG_BUFFER_SIZE) ? written + : LOG_BUFFER_SIZE - 1; strncpy(&log->buffer[log->current_pos], temp_buffer, bytes_to_copy); log->current_pos += bytes_to_copy; } @@ -223,143 +199,43 @@ void log_printf(LogBuffer *log, const char *format, ...) { * It is up to the application to then dispatch this command outside the ISR * as soon as possible by calling ee_serial_command_parser_callback(), below. */ -void ee_serial_callback(char c) { - if (c == EE_CMD_TERMINATOR) { - g_cmd_buf[g_cmd_pos] = (char)0; - process_command(g_cmd_buf); - g_cmd_pos = 0; - } else { - g_cmd_buf[g_cmd_pos] = c; - g_cmd_pos = g_cmd_pos >= EE_CMD_SIZE ? 
EE_CMD_SIZE : g_cmd_pos + 1; - } -} - - - - -/* Global handle to reference the instantiated C-model */ -static ai_handle sww_model = AI_HANDLE_NULL; - -/* Global c-array to handle the activations buffer */ -AI_ALIGNED(32) -static ai_i8 activations[AI_SWW_MODEL_DATA_ACTIVATIONS_SIZE]; - -/* Array to store the data of the input tensor */ -AI_ALIGNED(32) -static ai_i8 in_data[AI_SWW_MODEL_IN_1_SIZE]; -/* or static ai_i8 in_data[AI_SWW_MODEL_DATA_IN_1_SIZE_BYTES]; */ - -/* c-array to store the data of the output tensor */ -AI_ALIGNED(32) -static ai_i8 out_data[AI_SWW_MODEL_OUT_1_SIZE]; -/* static ai_i8 out_data[AI_SWW_MODEL_DATA_OUT_1_SIZE_BYTES]; */ - -/* Array of pointer to manage the model's input/output tensors */ -static ai_buffer *ai_input; -static ai_buffer *ai_output; - - -/* - * Bootstrap inference framework - */ -ai_error aiInit(void) { - ai_error err; - - /* Create and initialize the c-model */ - const ai_handle acts[] = { activations }; - err = ai_sww_model_create_and_init(&sww_model, acts, NULL); - - if (err.type != AI_ERROR_NONE) { - ; - }; - - /* Reteive pointers to the model's input/output tensors */ - ai_input = ai_sww_model_inputs_get(sww_model, NULL); - ai_output = ai_sww_model_outputs_get(sww_model, NULL); - - return err; -} - - - -/* - * Run inference - */ -ai_error aiRun(const void *in_data, void *out_data) { - ai_i32 n_batch; - ai_error err; - - /* 1 - Update IO handlers with the data payload */ - ai_input[0].data = AI_HANDLE_PTR(in_data); - ai_output[0].data = AI_HANDLE_PTR(out_data); - - /* 2 - Perform the inference */ - n_batch = ai_sww_model_run(sww_model, &ai_input[0], &ai_output[0]); - if (n_batch != 1) { - err = ai_sww_model_get_error(sww_model); - - }; - - return err; -} - -void run_model_on_test_data(char *cmd_args[]) { -// acquire_and_process_data(in_data); - const int8_t *input_source=NULL; - uint16_t timer_start, timer_stop, timer_diff; - - printf("In run_model. 
about to run model\r\n"); - if (strcmp(cmd_args[1], "class0") == 0) { - input_source = test_input_class0; - } - else if (strcmp(cmd_args[1], "class1") == 0) { - input_source = test_input_class1; - } - else if (strcmp(cmd_args[1], "class2") == 0) { - input_source = test_input_class2; - } - else { - printf("Unknown input tensor name, defaulting to test_input_class0\r\n"); - input_source = test_input_class0; - } - for(int i=0;i= EE_CMD_SIZE ? EE_CMD_SIZE : g_cmd_pos + 1; } - printf("]\r\n"); } -void load_or_print_buff(char *cmd_args[]) { +void ee_load_or_print_buff(char *cmd_args[]) +{ // process the 'db' command // `db load N` -- prepares to load N bytes. // `db ff0055aa` -- loads 5 bytes ([0xff, 0x00, 0x55, 0xaa]) - // `db print [N]` prints N bytes from the buffer, defaulting to the whole thing + // `db print [N]` prints N bytes from the buffer, defaulting to the whole + // thing static int db_state = 0; // 0=idle, 1=after 'db load', waiting for bytes' static int transfer_size = 0; // `db load N` sets transfer_size to N static int bytes_loaded = 0; // bytes loaded since last `db load` - char *byte_buff = (char *)g_i2s_buffer0; // g_i2s_buffer0 is in int16 pointer + char *byte_buff = (char *)g_i2s_buffer0; // g_i2s_buffer0: int16 pointer int buff_size = g_i2s_chunk_size_bytes; - if (cmd_args[1] == NULL) { + if (cmd_args[1] == NULL) printf("Error: db requires a sub-command: 'db load '; 'db print [Nbytes]', 'db '\r\n"); - } - else if (strcmp(cmd_args[1], "load") == 0) { + else if (strcmp(cmd_args[1], "load") == 0) + { transfer_size = atoi(cmd_args[2]); - if (transfer_size == 0) { + if (transfer_size == 0) + { printf("Error: Transfer size (%s) must be valid int; greater than 0.\r\n", cmd_args[2]); printf("Usage: 'db load N'; N>0\r\n"); db_state = 0; @@ -370,26 +246,31 @@ void load_or_print_buff(char *cmd_args[]) { printf("Expecting %d bytes\r\n", transfer_size); return; } - else if (isxdigit((int)cmd_args[1][0])) { // e.g. 
`db ff001234` actually loads the data` + else if (isxdigit((int)cmd_args[1][0])) + { // e.g. `db ff001234` actually loads the data` int num_chars = strlen(cmd_args[1]); uint8_t next_byte = 0; - if (db_state != 1) { + if (db_state != 1) + { printf("Error: Must issue db load command before transmitting data.\r\n"); return; } - if (num_chars % 2 != 0) { + if (num_chars % 2 != 0) + { printf("Error: number of hex digits in data string must be even. Received %d\r\n", num_chars); printf("Still waiting for data\r\n"); return; } char tmp_str[3] = {'\0', '\0', '\0'}; - for (int i=0;i= buff_size || bytes_loaded >= transfer_size) { + if (bytes_loaded >= buff_size || bytes_loaded >= transfer_size) + { db_state = 0; printf("m-load-done\r\n"); return; @@ -405,100 +287,67 @@ void load_or_print_buff(char *cmd_args[]) { } printf("%d bytes received\r\n", bytes_loaded); } - else if (strcmp(cmd_args[1], "getptr") == 0) { + else if (strcmp(cmd_args[1], "getptr") == 0) printf("m-buff-ptr-%d\r\n", bytes_loaded); - } - else if (strcmp(cmd_args[1], "setptr") == 0) { - if (cmd_args[2] != NULL) { + else if (strcmp(cmd_args[1], "setptr") == 0) + { + if (cmd_args[2] != NULL) bytes_loaded = atoi(cmd_args[2]); - } - else { + else printf("Error: setptr requires a numeric argument: 'db setptr 123%%'"); - } } - else if (strcmp(cmd_args[1], "print") == 0) { + else if (strcmp(cmd_args[1], "print") == 0) + { int bytes_to_print = 0; - if (cmd_args[2] != NULL) { + if (cmd_args[2] != NULL) bytes_to_print = atoi(cmd_args[2]); - } - if (bytes_to_print <= 0 || bytes_to_print > buff_size) { + if (bytes_to_print <= 0 || bytes_to_print > buff_size) bytes_to_print = buff_size; - } printf("m-buffer-"); - for(int i=0; i buff_size/2) { + if (vals_to_print <= 0 || vals_to_print > buff_size/2) vals_to_print = buff_size/2; - } - print_vals_int16((int16_t *)byte_buff, vals_to_print); + ee_print_vals_int16((int16_t *)byte_buff, vals_to_print); } - else { + else printf("Error: db: Unrecognized sub-command %s\r\n", 
cmd_args[1]); - } -} -void run_extraction(char *cmd_args[]) { - - // Feature extraction work - float32_t test_out[1024] = {0.0}; - float32_t dsp_buff[1024] = {0.0}; - // this will only operate on the first block_size (1024) elements of the input wav - - uint32_t timer_start, timer_stop; - char *endptr; - uint32_t offset; - - // Optional offset arg. "extract 1024", if cmd_arg[1] is present, convert to long - if (cmd_args[1] != NULL && *cmd_args[1] != '\0') { - offset = strtol(cmd_args[1], &endptr, 10); - } - else { - offset = 0; - } - timer_start = __HAL_TIM_GET_COUNTER(&htim16); - compute_lfbe_f32(test_wav_marvin+offset, test_out, dsp_buff); - timer_stop = __HAL_TIM_GET_COUNTER(&htim16); - - printf("TIM16: compute_lfbe_f32 took (%lu : %lu) = %lu TIM16 cycles\r\n", timer_start, timer_stop, timer_stop-timer_start); - printf("\r\n{\r\n"); - printf("\"Input\": "); - print_vals_int16(test_wav_marvin+offset, 1024); - printf(",\r\n \"Output\": "); - print_vals_float(test_out, 40); - printf("}\r\n"); } -void stop_detection(char *cmd_args[]) { - switch(g_i2s_state) { - // the stopping/idle combination may not be necessary, but it was originally set up - // to go to Stopping then wait for the current transaction to complete before Idle. - // But sometimes the current transaction never completes, leaving the program hung. +void ee_stop_detection(char *cmd_args[]) +{ + switch(g_i2s_state) + { + // the stopping/idle combination may not be necessary, but it was originally + // set up to go to Stopping then wait for the current transaction to + // complete before Idle. But sometimes the current transaction never + // completes, leaving the program hung. 
case Streaming: g_i2s_state = Stopping; - g_i2s_status = HAL_SAI_DMAStop(&hsai_BlockA1); + g_i2s_status = th_dma_stop(); g_i2s_state = Idle; - th_timestamp(); // this timestamp will stop the measurement of power + ee_timestamp(); // this timestamp will stop the measurement of power printf("Streaming stopped.\r\n"); printf("target activations: \r\n"); - print_vals_int8(g_act_buff, g_act_idx); // jhdbg + ee_print_vals_int8(g_act_buff, g_act_idx); // jhdbg g_act_buff = NULL; break; case FileCapture: g_i2s_state = Stopping; - g_i2s_status = HAL_SAI_DMAStop(&hsai_BlockA1); + g_i2s_status = th_dma_stop(); g_i2s_state = Idle; free(g_wav_record); g_wav_record = NULL; @@ -516,73 +365,83 @@ void stop_detection(char *cmd_args[]) { } } -void start_detection(char *cmd_args[]) { - if(g_i2s_state != Idle) { - printf("I2S Rx currently in progress. Ignoring request\r\n"); - } - else { - g_i2s_state = Streaming; +void ee_start_detection(char *cmd_args[]) +{ + if (g_i2s_state != Idle) + printf("I2S Rx currently in progress. Ignoring request\r\n"); + else + { + g_i2s_state = Streaming; - g_act_buff = (int8_t *)g_gp_buffer; - if( !g_act_buff ) { - printf("WARNING: Activation buffer malloc failed. Activation logging will not work.\r\n"); - } - g_int16s_read = 0; // jhdbg -- only needed when we're capturing the waveform in addition to detecting - g_first_frame = 1; // on the first frame of a recording we pulse the detection GPIO to synchronize timing. + g_act_buff = (int8_t *)g_gp_buffer; + if (!g_act_buff) + printf("WARNING: Activation buffer malloc failed. Activation logging will not work.\r\n"); + g_int16s_read = 0; // jhdbg -- only needed when we're capturing the + // waveform in addition to detecting + g_first_frame = 1; // on the first frame of a recording we pulse the + // detection GPIO to synchronize timing. - memset(g_act_buff, 0, g_gp_buff_bytes); - g_act_idx = 0; + memset(g_act_buff, 0, g_gp_buff_bytes); + g_act_idx = 0; - printf("Listening for I2S data ... 
\r\n"); + printf("Listening for I2S data ... \r\n"); - // these memsets are not really needed, but they make it easier to tell - // if the write never happened. - memset(g_i2s_buffer0, 0xFF, g_i2s_chunk_size_bytes); - memset(g_i2s_buffer1, 0xFF, g_i2s_chunk_size_bytes); + // these memsets are not really needed, but they make it easier to tell + // if the write never happened. + memset(g_i2s_buffer0, 0xFF, g_i2s_chunk_size_bytes); + memset(g_i2s_buffer1, 0xFF, g_i2s_chunk_size_bytes); - // first several cycles won't fully populate g_model_input, so initialize - // it with 0s to avoid unpredictable detections at the beginning - memset(g_model_input, 0x00, SWW_MODEL_INPUT_SIZE*sizeof(int8_t)); - memset(g_wav_block_buff, 0x00, SWW_WINLEN_SAMPLES*sizeof(int16_t)); + // first several cycles won't fully populate g_model_input, so + // init with 0s to avoid unpredictable detections at the beginning + memset(g_model_input, 0x00, SWW_MODEL_INPUT_SIZE*sizeof(int8_t)); + memset(g_wav_block_buff, 0x00, SWW_WINLEN_SAMPLES*sizeof(int16_t)); - th_timestamp(); // this timestamp will start the measurement of power - set_processing_pin_low(); // end of processing, used for duty cycle measurement - // pulse processing pin for 1us to align the duty cycle, energy measurements, and detections - set_processing_pin_high(); // end of processing, used for duty cycle measurement - delay_us(1); - set_processing_pin_low(); // end of processing, used for duty cycle measurement + ee_timestamp(); // this timestamp will start the measurement of power + ee_set_processing_pin_low(); // end of processing + // used for duty cycle measurement + // pulse processing pin for 1us to align the duty cycle, energy + // measurements, and detections + ee_set_processing_pin_high(); // start of 1us alignment pulse + th_delay_us(1); + ee_set_processing_pin_low(); // end of 1us alignment pulse - g_i2s_status = HAL_SAI_Receive_DMA(&hsai_BlockA1, (uint8_t *)g_i2s_current_buff, g_i2s_chunk_size_bytes/2); - printf("DMA receive
initiated.\r\n"); + g_i2s_status = th_dma_receive((uint8_t *)g_i2s_current_buff, + g_i2s_chunk_size_bytes/2); + printf("DMA receive initiated.\r\n"); } } -void i2s_capture(char *cmd_args[]) { - if(g_i2s_state != Idle ) { - printf("I2S Rx currently in progress. Ignoring request\r\n"); - return; +void ee_i2s_capture(char *cmd_args[]) +{ + if (g_i2s_state != Idle ) + { + printf("I2S Rx currently in progress. Ignoring request\r\n"); + return; } - if (cmd_args[1]) { + if (cmd_args[1]) + { g_i2s_wav_len = atoi(cmd_args[1]); - if( g_i2s_wav_len > g_gp_buff_bytes/2) { + if (g_i2s_wav_len > g_gp_buff_bytes / 2) + { printf("Requested length %lu exceeds available memory. Capturing %lu samples\r\n", g_i2s_wav_len, g_gp_buff_bytes/2); g_i2s_wav_len = g_gp_buff_bytes/4; } } - else { + else + { g_i2s_wav_len = g_gp_buff_bytes/2; // 2 bytes/sample - printf("No length specified. Capturing %lu samples\r\n", g_i2s_wav_len); + printf("No length specified. Capturing %lu samples\r\n", + g_i2s_wav_len); } g_i2s_state = FileCapture; g_int16s_read = 0; g_wav_record = (int16_t *)g_gp_buffer; // g_gp_buff_bytes bytes - if( !g_wav_record ) { + if (!g_wav_record) printf("WARNING: Recording buffer has no allocated memory. I2S Capture will fail.\r\n"); - } printf("Listening for I2S data ... \r\n"); memset(g_wav_record, 0, g_gp_buff_bytes); // *2 b/c wav_len is int16s // these memsets are not really needed, but they make it easier to tell @@ -590,13 +449,16 @@ void i2s_capture(char *cmd_args[]) { memset(g_i2s_buffer0, 0xFF, g_i2s_chunk_size_bytes); memset(g_i2s_buffer1, 0xFF, g_i2s_chunk_size_bytes); - g_i2s_status = HAL_SAI_Receive_DMA(&hsai_BlockA1, (uint8_t *)g_i2s_current_buff, g_i2s_chunk_size_bytes/2); + g_i2s_status = th_dma_receive((uint8_t *)g_i2s_current_buff, + g_i2s_chunk_size_bytes/2); // you can also check hsai->State - printf("DMA receive initiated. status=%lu, state=%d\r\n", g_i2s_status, hsai_BlockA1.State); + printf("DMA receive initiated. 
status=%lu, state=%d\r\n", g_i2s_status, + th_dma_state()); printf(" Status: 0=OK, 1=Error, 2=Busy, 3=Timeout; State: 0=Reset, 1=Ready, 2=Busy (internal process), 18=Busy (Tx), 34=Busy (Rx)\r\n"); } -void print_help(char *cmd_args[]) { +void ee_print_help(char *cmd_args[]) +{ char help_message[] = "name -- print out an identifying message\r\n" "run_model -- run the NN model. An optional argument class0, class1, or class2 runs the model\r\n" @@ -613,19 +475,23 @@ void print_help(char *cmd_args[]) { printf(help_message); } -void print_and_clear_log(char *cmd_args[]) { +void ee_print_and_clear_log(char *cmd_args[]) +{ printf("Log contents[cp=%u]:\r\n<%s>\r\n", g_log.current_pos, g_log.buffer); memset(g_log.buffer, 0, LOG_BUFFER_SIZE); g_log.current_pos = 0; } -void print_state(char *cmd_args[]) { - printf("g_i2s_status=%lu, SAI state=%d\r\n", g_i2s_status, hsai_BlockA1.State); +void ee_th_print_state(char *cmd_args[]) +{ + printf("g_i2s_status=%lu, SAI state=%d\r\n", g_i2s_status, th_dma_state()); printf(" Status: 0=OK, 1=Error, 2=Busy, 3=Timeout; State: 0=Reset, 1=Ready, 2=Busy (internal process), 18=Busy (Tx), 34=Busy (Rx)\r\n"); - printf("g_i2s_state = %d, g_int16s_read=%lu\r\n", g_i2s_state, g_int16s_read); + printf("g_i2s_state = %d, g_int16s_read=%lu\r\n", g_i2s_state, + g_int16s_read); } -void process_command(char *full_command) { +void ee_process_command(char *full_command) +{ char *cmd_args[MAX_CMD_TOKENS] = {NULL}; printf("Received command: %s\r\n", full_command); @@ -635,446 +501,122 @@ void process_command(char *full_command) { cmd_args[0] = token; // and cmd_args[1:] are the arguments - for(int i=1;i%s\r\n", i, (void *)cmd_args[i], cmd_args[i]); // } - // full_command should be " " (command and args delimited by spaces) + // full_command should be " " (command and args + // delimited by spaces) // put the command and arguments into the array cmd_arg[] - if (strcmp(cmd_args[0], "name") == 0) { + if (strcmp(cmd_args[0], "name") == 0) printf(EE_MSG_NAME, 
EE_DEVICE_NAME, TH_VENDOR_NAME_STRING); - } - else if (strcmp(cmd_args[0], "profile") == 0) { + else if (strcmp(cmd_args[0], "profile") == 0) + { printf("m-profile-[%s]\r\n", EE_FW_VERSION); printf("m-model-[%s]\r\n", TH_MODEL_VERSION); } - else if(strcmp(cmd_args[0], "run_model") == 0) { - run_model_on_test_data(cmd_args); - } - else if(strcmp(cmd_args[0], "extract") == 0) { - run_extraction(cmd_args); - } - else if(strcmp(cmd_args[0], "i2scap") == 0) { - i2s_capture(cmd_args); - } - else if(strcmp(cmd_args[0], "log") == 0) { - print_and_clear_log(cmd_args); - } - else if(strcmp(cmd_args[0], "start") == 0) { - start_detection(cmd_args); - } - else if(strcmp(cmd_args[0], "stop") == 0) { - stop_detection(cmd_args); - } - else if(strcmp(cmd_args[0], "state") == 0) { - print_state(cmd_args); - } - else if(strcmp(cmd_args[0], "db") == 0) { - load_or_print_buff(cmd_args); - } - else if(strcmp(cmd_args[0], "help") == 0) { - print_help(cmd_args); - } - else if(strcmp(cmd_args[0], "timestamp") == 0) { - th_timestamp(); // mostly useful for testing the timestamp code - } + else if (strcmp(cmd_args[0], "run_model") == 0) + th_run_model_on_test_data(cmd_args); + else if (strcmp(cmd_args[0], "extract") == 0) + th_run_extraction(cmd_args); + else if (strcmp(cmd_args[0], "i2scap") == 0) + ee_i2s_capture(cmd_args); + else if (strcmp(cmd_args[0], "log") == 0) + ee_print_and_clear_log(cmd_args); + else if (strcmp(cmd_args[0], "start") == 0) + ee_start_detection(cmd_args); + else if (strcmp(cmd_args[0], "stop") == 0) + ee_stop_detection(cmd_args); + else if (strcmp(cmd_args[0], "state") == 0) + ee_th_print_state(cmd_args); + else if (strcmp(cmd_args[0], "db") == 0) + ee_load_or_print_buff(cmd_args); + else if (strcmp(cmd_args[0], "help") == 0) + ee_print_help(cmd_args); + else if (strcmp(cmd_args[0], "timestamp") == 0) + ee_timestamp(); // mostly useful for testing the timestamp code // These next two are mostly useful for testing - else if(strcmp(cmd_args[0], "proc_hi") == 0) { - 
set_processing_pin_high(); - } - else if(strcmp(cmd_args[0], "proc_lo") == 0) { - set_processing_pin_low(); - } - else if(strcmp(cmd_args[0], "infer_wav") == 0) { - infer_static_wav(cmd_args); - } - else if(strcmp(cmd_args[0], "extract_uart_stream") == 0) { - extract_features_on_chunk(cmd_args); - } - else if(cmd_args[0] == 0) { - printf("Empty command (only a %% read). Type 'help%%' for help\r\n"); // %% => % - } - else { + else if (strcmp(cmd_args[0], "proc_hi") == 0) + ee_set_processing_pin_high(); + else if (strcmp(cmd_args[0], "proc_lo") == 0) + ee_set_processing_pin_low(); + else if (strcmp(cmd_args[0], "infer_wav") == 0) + th_infer_static_wav(cmd_args); + else if (strcmp(cmd_args[0], "extract_uart_stream") == 0) + th_extract_features_on_chunk(cmd_args); + else if (cmd_args[0] == 0) + printf("Empty command (only a %% read). Type 'help%%' for help\r\n"); + else printf("Unrecognized command %s\r\n", full_command); - } printf(EE_MSG_READY); } -void th_timestamp(void) { - HAL_GPIO_WritePin(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_RESET); - delay_us(1); - HAL_GPIO_WritePin(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_SET); +void ee_timestamp(void) +{ + TH_GPIO_WRITE(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_RESET); + th_delay_us(1); + TH_GPIO_WRITE(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_SET); // unsigned long microSeconds = 0ul; // microSeconds = us_ticker_read(); // th_printf(EE_MSG_TIMESTAMP, microSeconds); } -void set_processing_pin_high(void) { - HAL_GPIO_WritePin(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_SET); +void ee_set_processing_pin_high(void) +{ + TH_GPIO_WRITE(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_SET); } -void set_processing_pin_low(void) { - HAL_GPIO_WritePin(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_RESET); +void ee_set_processing_pin_low(void) +{ + TH_GPIO_WRITE(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_RESET); } -void infer_static_wav(char *cmd_args[]) { - // feature_buff is used internally as a 2nd internal 
scratch space, - // in the FFT domain, so it needs to be winlen_samples long, even though - // ultimately it will only hold NUM_MEL_FILTERS values. This can probably - // be improved with a refactored compute_lfbe_f32(). - static float32_t feature_buff[SWW_WINLEN_SAMPLES]; - static float32_t dsp_buff[SWW_WINLEN_SAMPLES]; - int num_steps; // jhdbg - int offset; - uint32_t wav_len=0; - const int16_t *wav_ptr=NULL; - - offset = atoi(cmd_args[1]); - wav_ptr = test_wav_long + offset; - wav_len = test_wav_long_len-offset; - printf("Infering on static wav with offset = %d\r\n", offset); - - num_steps = (wav_len - (SWW_WINLEN_SAMPLES - SWW_WINSTRIDE_SAMPLES))/SWW_WINSTRIDE_SAMPLES; - - // extract the input scale factor from the (file-global) ai_input - float32_t input_scale_factor = *(ai_input[0].meta_info->intq_info->info->scale); - - // initialize model input buffer to 0s. - for(int i=0;i DETECT_THRESHOLD || g_first_frame) { - printf("[%d]: Detection (%d). g_first_frame=%lu\r\n", idx_step, out_data[0], g_first_frame); - log_printf(&g_log, "[%d]: Detection (%d). g_first_frame=%lu\r\n", idx_step, out_data[0], g_first_frame); - g_first_frame = 0; - } - else if( out_data[0] > 100) { - printf("[%d]: Near miss (%d). \r\n", idx_step, out_data[0]); - } - - printf("%d), \r\n", out_data[0]); - } -} -void process_chunk_and_cont_capture(SAI_HandleTypeDef *hsai) { +void ee_process_chunk_and_cont_capture(void *hsai) +{ int reading_complete=0; g_int16s_read += g_i2s_chunk_size_bytes/2; // idle_buffer is the one that will be idle after we switch int16_t* idle_buffer = g_i2s_buff_sel ? g_i2s_buffer1 : g_i2s_buffer0; - g_i2s_buff_sel = g_i2s_buff_sel ^ 1; // toggle between 0/1 => g_i2s_buffer0/1 + g_i2s_buff_sel = g_i2s_buff_sel ^ 1; // toggle between 0/1=>g_i2s_buffer0/1 g_i2s_current_buff = g_i2s_buff_sel ? 
g_i2s_buffer1 : g_i2s_buffer0; - if(g_int16s_read + g_i2s_chunk_size_bytes/2 <= g_i2s_wav_len){ - // there is space left for a full chunk - g_i2s_status = HAL_SAI_Receive_DMA(hsai, (uint8_t *)g_i2s_current_buff, g_i2s_chunk_size_bytes/2); - } - else { - // if there is only space for a partial read - // i.e. (g_int16s_read < g_i2s_wav_len < g_int16s_read + g_i2s_chunk_size_bytes/2) - // don't start the read, b/c you'll overflow the allocated buffer - // that means you'll read less than requested, but avoid a seg-fault. + // check for space for a full chunk + // if there is only space for a partial read + // i.e. (g_int16s_read < g_i2s_wav_len + // < g_int16s_read + g_i2s_chunk_size_bytes/2) + // don't start the read, b/c you'll overflow the allocated buffer + // that means you'll read less than requested, but avoid a seg-fault. + if (g_int16s_read + g_i2s_chunk_size_bytes/2 <= g_i2s_wav_len) + g_i2s_status = th_dma_receive((uint8_t *)g_i2s_current_buff, + g_i2s_chunk_size_bytes/2); + else reading_complete = 1; - } - HAL_GPIO_WritePin(GPIOB, GPIO_PIN_8, GPIO_PIN_SET); + TH_GPIO_WRITE(GPIOB, GPIO_PIN_8, GPIO_PIN_SET); // for 1024 bytes, this memcpy takes about 50 us. 
- memcpy((uint8_t*)(g_wav_record+g_int16s_read-g_i2s_chunk_size_bytes/2), idle_buffer, g_i2s_chunk_size_bytes); + memcpy((uint8_t*)(g_wav_record+g_int16s_read-g_i2s_chunk_size_bytes/2), + idle_buffer, g_i2s_chunk_size_bytes); - if( reading_complete ){ - printf("DMA Receive completed %lu int16s read out of %lu requested\r\n", g_int16s_read, g_i2s_wav_len); - print_vals_int16(g_wav_record, g_int16s_read); + if (reading_complete) + { + printf("DMA Receive completed %lu int16s read out of %lu requested\r\n", + g_int16s_read, g_i2s_wav_len); + ee_print_vals_int16(g_wav_record, g_int16s_read); g_wav_record = NULL; g_i2s_state = Idle; } - HAL_GPIO_WritePin(GPIOB, GPIO_PIN_8, GPIO_PIN_RESET); + TH_GPIO_WRITE(GPIOB, GPIO_PIN_8, GPIO_PIN_RESET); } - - -void extract_features_on_chunk(char *cmd_args[]) { - - - - // feature_buff is used internally as a 2nd internal scratch space, - // in the FFT domain, so it needs to be winlen_samples long, even though - // ultimately it will only hold NUM_MEL_FILTERS values. This can probably - // be improved with a refactored compute_lfbe_f32(). - static float32_t feature_buff[SWW_WINLEN_SAMPLES]; - static float32_t dsp_buff[SWW_WINLEN_SAMPLES]; - static int num_calls=0; - - // extract the input scale factor from the (file-global) ai_input - float32_t input_scale_factor = *(ai_input[0].meta_info->intq_info->info->scale); - - - if( num_calls == 0) { - for(int i=0;i] are old samples to be - // shifted to the beginning of the clip. After this block, - // g_wav_block_buff[0:(winlen-winstride)] is populated - for(int i=SWW_WINSTRIDE_SAMPLES;iintq_info->info->scale); - - // idle_buffer is the one that will be idle after we switch - int16_t *idle_buffer = g_i2s_buff_sel ? g_i2s_buffer1 : g_i2s_buffer0; - g_i2s_buff_sel = g_i2s_buff_sel ^ 1; // toggle between 0/1 => g_i2s_buffer0/1 - g_i2s_current_buff = g_i2s_buff_sel ? 
g_i2s_buffer1 : g_i2s_buffer0; - - g_i2s_status = HAL_SAI_Receive_DMA(hsai, (uint8_t *)g_i2s_current_buff, g_i2s_chunk_size_bytes/2); - - // g_wav_block_buff[SWW_WINSTRIDE_SAMPLES:] are old samples to be - // shifted to the beginning of the clip. After this block, - // g_wav_block_buff[0:(winlen-winstride)] is populated - for(int i=SWW_WINSTRIDE_SAMPLES;i DETECT_THRESHOLD || g_first_frame) { - HAL_GPIO_WritePin(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_RESET); - delay_us(1); - HAL_GPIO_WritePin(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_SET); - g_first_frame = 0; - } - - if ( g_act_idx < (g_gp_buff_bytes/sizeof(g_act_buff[0])) ) { - g_act_buff[g_act_idx++] = out_data[0]; - } - - num_calls++; - set_processing_pin_low(); // end of processing, used for duty cycle measurement -} - -void HAL_SAI_RxCpltCallback(SAI_HandleTypeDef *hsai) { - if( g_i2s_state == FileCapture) { - process_chunk_and_cont_capture(hsai); - } - else if( g_i2s_state == Streaming) { - process_chunk_and_cont_streaming(hsai); - } - else if( g_i2s_state == Stopping) { - printf("Streaming stopped\r\n"); - g_i2s_state = Idle; - } -} - -void compute_lfbe_f32(const int16_t *pSrc, float32_t *pDst, float32_t *pTmp) -{ - const uint32_t block_length=SWW_WINLEN_SAMPLES; - const float32_t inv_block_length=1.0/SWW_WINLEN_SAMPLES; - const uint32_t spec_len = SWW_WINLEN_SAMPLES/2+1; - const float32_t preemphasis_coef = 0.96875; // 1.0 - 2.0 ** -5; - const float32_t power_offset = 52.0; - const uint32_t num_filters = 40; - int i; // for looping - // to maintain continuity in pre-emphasis over segment boundaries - static float32_t last_value = 0.0; - arm_status op_result = ARM_MATH_SUCCESS; - - // convert int16_t pSrc to float32_t. 
range [-32768:32767] => [-1.0,1.0) - // WINLEN - WINSTRIDE of these have already been converted once, so a little speedup - // could probably be gained by factoring this out into process_chunk_and_continue_streaming - for(i=0;i pDst[1:] - arm_sub_f32 (pDst+1, pTmp, pDst+1, block_length-1); - - // apply hamming window to pDst and put results in pTmp. - arm_mult_f32(pDst, hamm_win_1024, pTmp, block_length); - - - /* RFFT based implementation */ - arm_rfft_fast_instance_f32 rfft_s; - op_result = arm_rfft_fast_init_f32(&rfft_s, block_length); - if (op_result != ARM_MATH_SUCCESS) { - printf("Error %d in arm_rfft_fast_init_f32", op_result); - } - arm_rfft_fast_f32(&rfft_s,pTmp,pDst,0); // use config rfft_s; FFT(pTmp) => pDst, ifft=0 - - // Now we need to take the magnitude of the spectrum. For block_length=1024, it will be 513 elements - // we'll use pTmp as an array of block_length/2+1 real values. - // the N/2th element is real and stuck in pDst[1] (where fft[0].imag=0 should be) - // move that to pTmp[block_length/2] - pTmp[block_length/2] = pDst[1]; // real value corresponding to fsamp/2 - pDst[1] = 0; // so now pDst[0,1] = real,imag elements at f=0 (always real, so imag=0) - arm_cmplx_mag_f32(pDst,pTmp,block_length/2); // mag(pDst) => pTmp. pTmp[512] already set. 
- - // powspec = (1 / data_config['window_size_samples']) * tf.square(magspec) - arm_mult_f32(pTmp, pTmp,pDst, spec_len); // pDst[0:513] = pTmp[0:513]^2 - arm_scale_f32(pDst, inv_block_length, pTmp, spec_len); - - - // The original lin2mel matrix is spec_len x num_filters, where each column holds one mel filter, - // lin2mel_packed_x has all the non-zero elements packed together in one 1D array - // _filter_starts are the locations in each *original* column where the non-zero elements start - // _filter_lens is how many non-zero elements are in each original column - // So the i_th filter start in lin2mel_packed at sum(_filter_lens[:i]) - // And the corresponding spectrum segment starts at linear_spectrum[_filter_starts[i]] - int lin2mel_coeff_idx = 0; - /* Apply MEL filters; linear spectrum is now in pTmp[0:spec_len], put mel spectrum in pDst[0:num_filters] */ - for(i=0; i 1e-30) ? pDst[i] : 1e-30; - } - - for(i=0; i 1.0) ? 1.0 : pTmp[i]); - } -} - - diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.c b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.c new file mode 100644 index 00000000..d87865e8 --- /dev/null +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/Core/Src/sww_ref_util_submitter.c @@ -0,0 +1,963 @@ +/* + * submitter_implemented.c + * + * Created on: Sep 3, 2025 + * Author: owen + */ + +#include "sww_ref_util_submitter.h" + +// private variables from main.c +UART_HandleTypeDef hlpuart1; +UART_HandleTypeDef huart3; + +SAI_HandleTypeDef hsai_BlockA1; +DMA_HandleTypeDef hdma_sai1_a; + +TIM_HandleTypeDef htim16; + +PCD_HandleTypeDef hpcd_USB_OTG_FS; + +extern i2s_state_t g_i2s_state; + +// variables from ST middlewares, only for use on ST hardware +/* Global handle to reference the instantiated C-model */ +static ai_handle sww_model = AI_HANDLE_NULL; + +/* Global c-array to handle the activations buffer */ +AI_ALIGNED(32) 
+static ai_i8 activations[AI_SWW_MODEL_DATA_ACTIVATIONS_SIZE]; + +/* Array to store the data of the input tensor */ +AI_ALIGNED(32) +static ai_i8 in_data[AI_SWW_MODEL_IN_1_SIZE]; +/* or static ai_i8 in_data[AI_SWW_MODEL_DATA_IN_1_SIZE_BYTES]; */ + +/* c-array to store the data of the output tensor */ +AI_ALIGNED(32) +static ai_i8 out_data[AI_SWW_MODEL_OUT_1_SIZE]; +/* static ai_i8 out_data[AI_SWW_MODEL_DATA_OUT_1_SIZE_BYTES]; */ + +/* Array of pointers to manage the model's input/output tensors */ +static ai_buffer *ai_input; +static ai_buffer *ai_output; + +// from sw_ref_util, for platform-specific I2S functionality +extern uint32_t g_i2s_chunk_size_bytes; +extern int8_t *g_model_input; +extern uint32_t g_first_frame; +extern LogBuffer g_log; +extern int16_t *g_wav_block_buff; +extern int16_t *g_i2s_buffer0; +extern int16_t *g_i2s_buffer1; +extern int16_t *g_i2s_current_buff; +extern int g_i2s_buff_sel; +extern uint32_t g_gp_buff_bytes; +extern int8_t *g_act_buff; +extern uint32_t g_i2s_status; +extern uint32_t g_act_idx; + +// static function prototypes +static void MX_GPIO_Init(void); +static void MX_DMA_Init(void); +static void MX_LPUART1_UART_Init(void); +static void MX_USART3_UART_Init(void); +static void MX_USB_OTG_FS_PCD_Init(void); +static void MX_SAI1_Init(void); +static void MX_TIM16_Init(void); + +PUTCHAR_PROTOTYPE +{ + HAL_UART_Transmit(&hlpuart1, (uint8_t *)&ch, 1, HAL_MAX_DELAY); + return ch; +} + +/// Core API function implementations +void th_delay_us(int delay_len_us) +{ + // TIM16 ticks at 1 MHz (120 MHz clock / prescaler of 120). The unsigned + // 16-bit difference tolerates counter wrap, so delays must be < 65536 us. + uint16_t delay_start = (uint16_t)__HAL_TIM_GET_COUNTER(&htim16); + while ((uint16_t)(__HAL_TIM_GET_COUNTER(&htim16) - delay_start) + < delay_len_us); +} + +void th_hardware_init(void) +{ + /* MCU Configuration------------------------------------------------------*/ + + /* Reset of all peripherals, Initializes the Flash interface and the + Systick.
*/ + HAL_Init(); + + /* USER CODE BEGIN Init */ + + /* USER CODE END Init */ + + /* Configure the system clock */ + SystemClock_Config(); + + /* USER CODE BEGIN SysInit */ + + /* USER CODE END SysInit */ + + /* Initialize all configured peripherals */ + MX_GPIO_Init(); + MX_DMA_Init(); + MX_LPUART1_UART_Init(); + MX_USART3_UART_Init(); + MX_USB_OTG_FS_PCD_Init(); + MX_SAI1_Init(); + MX_TIM16_Init(); +} + +// implementation of th_timer16_start +void th_timer16_start(void) { HAL_TIM_Base_Start(&htim16); } + +// implementation of th_timer16_get +//uint16_t th_timer16_get(void) { return __HAL_TIM_GET_COUNTER(&htim16); } + +// implementation of th_dma_receive +uint32_t th_dma_receive(uint8_t *i2s_buffer, uint16_t size) +{ + return HAL_SAI_Receive_DMA(&hsai_BlockA1, i2s_buffer, size); +} + +// implementation of th_dma_stop +uint32_t th_dma_stop(void) +{ + return HAL_SAI_DMAStop(&hsai_BlockA1); +} + +// implementation of th_dma_state +uint8_t th_dma_state(void) +{ + return hsai_BlockA1.State; +} + +// implementation of th_uart_receive +uint32_t th_uart_receive(uint8_t *data, uint16_t size, uint32_t timeout) +{ + return HAL_UART_Receive(&hlpuart1, data, size, timeout); +} + +/* + * Bootstrap inference framework + */ +ai_error th_ai_init(void) +{ + ai_error err; + + /* Create and initialize the c-model */ + const ai_handle acts[] = { activations }; + err = ai_sww_model_create_and_init(&sww_model, acts, NULL); + + if (err.type != AI_ERROR_NONE) + { + // model creation failed; don't query its input/output tensors + return err; + } + + /* Retrieve pointers to the model's input/output tensors */ + ai_input = ai_sww_model_inputs_get(sww_model, NULL); + ai_output = ai_sww_model_outputs_get(sww_model, NULL); + + return err; +} + +/* + * Run inference + */ +ai_error th_ai_run(const void *in_data, void *out_data) +{ + ai_i32 n_batch; + ai_error err = { AI_ERROR_NONE, AI_ERROR_CODE_NONE }; // returned on success + + /* 1 - Update IO handlers with the data payload */ + ai_input[0].data = AI_HANDLE_PTR(in_data); + ai_output[0].data = AI_HANDLE_PTR(out_data); + + /* 2 - Perform the inference */ + n_batch = 
ai_sww_model_run(sww_model, &ai_input[0], &ai_output[0]); + if (n_batch != 1) + err = ai_sww_model_get_error(sww_model); + + return err; +} + +// implementation of th_run_model_on_test_data +void th_run_model_on_test_data(char *cmd_args[]) +{ +// acquire_and_process_data(in_data); + const int8_t *input_source=NULL; + uint16_t timer_start, timer_stop, timer_diff; + + printf("In run_model. about to run model\r\n"); + if (strcmp(cmd_args[1], "class0") == 0) + input_source = test_input_class0; + else if (strcmp(cmd_args[1], "class1") == 0) + input_source = test_input_class1; + else if (strcmp(cmd_args[1], "class2") == 0) + input_source = test_input_class2; + else + { + printf("Unknown input tensor name, defaulting to test_input_class0\r\n"); + input_source = test_input_class0; + } + for (int i = 0 ; i < AI_SWW_MODEL_IN_1_SIZE ; i++) + in_data[i] = (ai_i8)input_source[i]; + ee_set_processing_pin_high(); + timer_start = __HAL_TIM_GET_COUNTER(&htim16); + /* Call inference engine */ + th_ai_run(in_data, out_data); + timer_stop = __HAL_TIM_GET_COUNTER(&htim16); + ee_set_processing_pin_low(); + timer_diff = timer_stop-timer_start; + printf("TIM16: th_ai_run took (%u : %u) = %u TIM16 cycles\r\n", timer_start, + timer_stop, timer_diff); + + printf("Output = ["); + for (int i = 0 ; i < AI_SWW_MODEL_OUT_1_SIZE ; i++) + printf("%02d, ", out_data[i]); + printf("]\r\n"); +} + +// implementation of th_infer_static_wav +void th_infer_static_wav(char *cmd_args[]) +{ + // feature_buff is used internally as a 2nd internal scratch space, + // in the FFT domain, so it needs to be winlen_samples long, even though + // ultimately it will only hold NUM_MEL_FILTERS values. This can probably + // be improved with a refactored th_compute_lfbe_f32(). 
+ static float32_t feature_buff[SWW_WINLEN_SAMPLES]; + static float32_t dsp_buff[SWW_WINLEN_SAMPLES]; + int num_steps; // jhdbg + int offset; + uint32_t wav_len=0; + const int16_t *wav_ptr=NULL; + + offset = (cmd_args[1] != NULL) ? atoi(cmd_args[1]) : 0; + wav_ptr = test_wav_long + offset; + wav_len = test_wav_long_len-offset; + printf("Inferring on static wav with offset = %d\r\n", offset); + + num_steps = (wav_len - (SWW_WINLEN_SAMPLES - SWW_WINSTRIDE_SAMPLES)) + / SWW_WINSTRIDE_SAMPLES; + + // extract the input scale factor from the (file-global) ai_input + float32_t input_scale_factor + = *(ai_input[0].meta_info->intq_info->info->scale); + + // initialize model input buffer to 0s. + for (int i = 0 ; i < SWW_MODEL_INPUT_SIZE ; i++) + g_model_input[i] = 0; + + for (int idx_step = 0 ; idx_step < num_steps ; idx_step++) + { + + th_compute_lfbe_f32(wav_ptr+(idx_step*SWW_WINSTRIDE_SAMPLES), + feature_buff, dsp_buff); + + // shift current features in g_model_input[] and add new ones. + for (int i = 0 ; i < SWW_MODEL_INPUT_SIZE - NUM_MEL_FILTERS ; i++) + g_model_input[i] = g_model_input[i+NUM_MEL_FILTERS]; + + for (int i = 0 ; i < NUM_MEL_FILTERS ; i++) + g_model_input[i+SWW_MODEL_INPUT_SIZE-NUM_MEL_FILTERS] + = (int8_t)(feature_buff[i]/input_scale_factor-128); + + for (int i = 0 ; i < AI_SWW_MODEL_IN_1_SIZE ; i++) + in_data[i] = (ai_i8)g_model_input[i]; + + // print out the newest vector of features as int8 + printf("("); + ee_print_vals_int8(g_model_input+SWW_MODEL_INPUT_SIZE-NUM_MEL_FILTERS, + NUM_MEL_FILTERS); + printf(", "); + + /* Call inference engine */ + th_ai_run(in_data, out_data); + + if (out_data[0] > DETECT_THRESHOLD || g_first_frame) + { + printf("[%d]: Detection (%d). g_first_frame=%lu\r\n", idx_step, + out_data[0], g_first_frame); + ee_log_printf(&g_log, "[%d]: Detection (%d). g_first_frame=%lu\r\n", + idx_step, out_data[0], g_first_frame); + g_first_frame = 0; + } + else if (out_data[0] > 100) + printf("[%d]: Near miss (%d).
\r\n", idx_step, out_data[0]); + + printf("%d), \r\n", out_data[0]); + } +} + +// implementation of th_extract_features_on_chunk +void th_extract_features_on_chunk(char *cmd_args[]) +{ + // feature_buff is used internally as a 2nd internal scratch space, + // in the FFT domain, so it needs to be winlen_samples long, even though + // ultimately it will only hold NUM_MEL_FILTERS values. This can probably + // be improved with a refactored th_compute_lfbe_f32(). + static float32_t feature_buff[SWW_WINLEN_SAMPLES]; + static float32_t dsp_buff[SWW_WINLEN_SAMPLES]; + static int num_calls=0; + + // extract the input scale factor from the (file-global) ai_input + float32_t input_scale_factor + = *(ai_input[0].meta_info->intq_info->info->scale); + + + if (num_calls == 0) + for(int i=0;i] are old samples to be + // shifted to the beginning of the clip. After this block, + // g_wav_block_buff[0:(winlen-winstride)] is populated + for (int i = SWW_WINSTRIDE_SAMPLES ; i < SWW_WINLEN_SAMPLES ; i++) + g_wav_block_buff[i-SWW_WINSTRIDE_SAMPLES] = g_wav_block_buff[i]; + + // Now fill in g_wav_block_buff[(winlen-winstride):] with winstride new samples + // no 2* here because UART transmits mono, unlike I2S buffer, which is stereo + for (int i = SWW_WINLEN_SAMPLES - SWW_WINSTRIDE_SAMPLES ; + i < SWW_WINLEN_SAMPLES ; i++) + g_wav_block_buff[i] + = g_i2s_buffer0[i-(SWW_WINLEN_SAMPLES-SWW_WINSTRIDE_SAMPLES)]; + + th_compute_lfbe_f32(g_wav_block_buff, feature_buff, dsp_buff); + + // shift current features in g_model_input[] and add new ones. 
+ for (int i = 0 ; i < SWW_MODEL_INPUT_SIZE - NUM_MEL_FILTERS ; i++) + g_model_input[i] = g_model_input[i + NUM_MEL_FILTERS]; + + for (int i = 0 ; i < NUM_MEL_FILTERS ; i++) + g_model_input[i + SWW_MODEL_INPUT_SIZE - NUM_MEL_FILTERS] + = (int8_t)(feature_buff[i] / input_scale_factor - 128); + + for (int i = 0 ; i < AI_SWW_MODEL_IN_1_SIZE ; i++) + in_data[i] = (ai_i8)g_model_input[i]; + + /* Call inference engine */ + th_ai_run(in_data, out_data); + + num_calls++; + + printf("m-features-["); + for (int i = 0 ; i < NUM_MEL_FILTERS ; i++) + { + printf("%+3d", (int8_t)(feature_buff[i]/input_scale_factor-128)); + if (i < NUM_MEL_FILTERS -1) + printf(", "); + } + printf("]\r\n"); + + printf("m-activations-[%+3d, %+3d, %+3d]\r\n", out_data[0], out_data[1], + out_data[2]); +} + +// implementation of th_run_extraction +// internally-implemented for performance reasons (timer) +void th_run_extraction(char *cmd_args[]) +{ + // Feature extraction work + float32_t test_out[1024] = {0.0}; + float32_t dsp_buff[1024] = {0.0}; + // this will only operate on the first block_size (1024) elements of the + // input wav + + uint32_t timer_start, timer_stop; + char *endptr; + uint32_t offset; + + // Optional offset arg. 
"extract 1024" + // if cmd_arg[1] is present, convert to long + if (cmd_args[1] != NULL && *cmd_args[1] != '\0') + offset = strtol(cmd_args[1], &endptr, 10); + else + offset = 0; + timer_start = __HAL_TIM_GET_COUNTER(&htim16); + th_compute_lfbe_f32(test_wav_marvin+offset, test_out, dsp_buff); + timer_stop = __HAL_TIM_GET_COUNTER(&htim16); + + printf("TIM16: th_compute_lfbe_f32 took (%lu : %lu) = %lu TIM16 cycles\r\n", + timer_start, timer_stop, timer_stop-timer_start); + printf("\r\n{\r\n"); + printf("\"Input\": "); + ee_print_vals_int16(test_wav_marvin+offset, 1024); + printf(",\r\n \"Output\": "); + ee_print_vals_float(test_out, 40); + printf("}\r\n"); +} + +// implementation of th_process_chunk_and_cont_streaming +void th_process_chunk_and_cont_streaming(void *hsai) +{ + + // feature_buff is used internally as a 2nd internal scratch space, + // in the FFT domain, so it needs to be winlen_samples long, even though + // ultimately it will only hold NUM_MEL_FILTERS values. This can probably + // be improved with a refactored th_compute_lfbe_f32(). + static float32_t feature_buff[SWW_WINLEN_SAMPLES]; + static float32_t dsp_buff[SWW_WINLEN_SAMPLES]; + static int num_calls = 0; // jhdbg + + // start of processing, used for duty cycle measurement + ee_set_processing_pin_high(); + + // extract the input scale factor from the (file-global) ai_input + float32_t input_scale_factor + = *(ai_input[0].meta_info->intq_info->info->scale); + + // idle_buffer is the one that will be idle after we switch + int16_t *idle_buffer = g_i2s_buff_sel ? g_i2s_buffer1 : g_i2s_buffer0; + g_i2s_buff_sel = g_i2s_buff_sel ^ 1; // toggle between 0/1=>g_i2s_buffer0/1 + g_i2s_current_buff = g_i2s_buff_sel ? g_i2s_buffer1 : g_i2s_buffer0; + + g_i2s_status = th_dma_receive((uint8_t *)g_i2s_current_buff, + g_i2s_chunk_size_bytes/2); + + // g_wav_block_buff[SWW_WINSTRIDE_SAMPLES:] are old samples to be + // shifted to the beginning of the clip. 
After this block, + // g_wav_block_buff[0:(winlen-winstride)] is populated + for (int i = SWW_WINSTRIDE_SAMPLES ; i < SWW_WINLEN_SAMPLES ; i++) + g_wav_block_buff[i - SWW_WINSTRIDE_SAMPLES] = g_wav_block_buff[i]; + + // Now fill in g_wav_block_buff[(winlen-winstride):] with winstride new samples + // 2* is because the I2S buffer is in stereo + for (int i = SWW_WINLEN_SAMPLES - SWW_WINSTRIDE_SAMPLES + ; i < SWW_WINLEN_SAMPLES ; i++) + g_wav_block_buff[i] + = idle_buffer[2*(i-(SWW_WINLEN_SAMPLES-SWW_WINSTRIDE_SAMPLES))]; + + th_compute_lfbe_f32(g_wav_block_buff, feature_buff, dsp_buff); + + // shift current features in g_model_input[] and add new ones. + for (int i = 0 ; i < SWW_MODEL_INPUT_SIZE - NUM_MEL_FILTERS ; i++) + g_model_input[i] = g_model_input[i+NUM_MEL_FILTERS]; + + for (int i=0 ; i < NUM_MEL_FILTERS ; i++) + g_model_input[i+SWW_MODEL_INPUT_SIZE-NUM_MEL_FILTERS] + = (int8_t)(feature_buff[i]/input_scale_factor-128); + + for (int i=0 ; i < AI_SWW_MODEL_IN_1_SIZE ; i++) + in_data[i] = (ai_i8)g_model_input[i]; + + /* Call inference engine */ + th_ai_run(in_data, out_data); + + if (out_data[0] > DETECT_THRESHOLD || g_first_frame) + { + TH_GPIO_WRITE(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_RESET); + th_delay_us(1); + TH_GPIO_WRITE(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_SET); + g_first_frame = 0; + } + + if (g_act_idx < (g_gp_buff_bytes / sizeof(g_act_buff[0]))) + g_act_buff[g_act_idx++] = out_data[0]; + + num_calls++; + ee_set_processing_pin_low(); // end of processing + // used for duty cycle measurement +} + +// implementation of th_compute_lfbe_f32 +void th_compute_lfbe_f32(const int16_t *pSrc, float32_t *pDst, float32_t *pTmp) +{ + const uint32_t block_length=SWW_WINLEN_SAMPLES; + const float32_t inv_block_length=1.0/SWW_WINLEN_SAMPLES; + const uint32_t spec_len = SWW_WINLEN_SAMPLES/2+1; + const float32_t preemphasis_coef = 0.96875; // 1.0 - 2.0 ** -5; + const float32_t power_offset = 52.0; + const uint32_t num_filters = 40; + int i; // 
for looping + // to maintain continuity in pre-emphasis over segment boundaries + static float32_t last_value = 0.0; + arm_status op_result = ARM_MATH_SUCCESS; + + // convert int16_t pSrc to float32_t. range [-32768:32767] => [-1.0,1.0) + // WINLEN - WINSTRIDE of these have already been converted once, so a + // little speedup could probably be gained by factoring this conversion + // out into th_process_chunk_and_cont_streaming() + for (i = 0 ; i < block_length ; i++) + pDst[i] = ((float32_t)pSrc[i]) / 32768.0; + + // Apply pre-emphasis: zero-pad input by 1 + // then x' = x[1:]-pe_coeff*x[:-1], so len(x')==len(x) + // Start by scaling w/ coeff; pTmp = preemphasis_coef * input + arm_scale_f32(pDst, preemphasis_coef, pTmp, block_length); + // calculate pDst[0] separately since it uses a value from the last segment + pDst[0] = pDst[0] - last_value * preemphasis_coef; + + // in the next frame pDst[SWW_WINSTRIDE_SAMPLES-1] will be 1 sample older + // than the 1st sample, so it will be used in the pre-emphasis for pDst[0] + last_value = pDst[SWW_WINSTRIDE_SAMPLES - 1]; + + // use pDst as a 2nd temp buffer pDst[1:] - pTmp => pDst[1:] + arm_sub_f32 (pDst+1, pTmp, pDst+1, block_length-1); + + // apply hamming window to pDst and put results in pTmp. + arm_mult_f32(pDst, hamm_win_1024, pTmp, block_length); + + + /* RFFT based implementation */ + arm_rfft_fast_instance_f32 rfft_s; + op_result = arm_rfft_fast_init_f32(&rfft_s, block_length); + if (op_result != ARM_MATH_SUCCESS) + printf("Error %d in arm_rfft_fast_init_f32\r\n", op_result); + arm_rfft_fast_f32(&rfft_s,pTmp,pDst,0); // use config rfft_s + // FFT(pTmp) => pDst, ifft=0 + + // Now we need to take the magnitude of the spectrum. + // For block_length=1024, it will be 513 elements + // we'll use pTmp as an array of block_length/2+1 real values.
+ // the N/2th element is real and stuck in pDst[1] (where fft[0].imag=0 + // should be), move that to pTmp[block_length/2] + pTmp[block_length/2] = pDst[1]; // real value corresponding to fsamp/2 + pDst[1] = 0; // so now pDst[0,1] = real,imag elements at f=0 + // (always real, so imag=0) + arm_cmplx_mag_f32(pDst,pTmp,block_length/2); // mag(pDst) => pTmp + // pTmp[512] already set. + + // powspec = (1 / data_config['window_size_samples']) * tf.square(magspec) + arm_mult_f32(pTmp, pTmp,pDst, spec_len); // pDst[0:513] = pTmp[0:513]^2 + arm_scale_f32(pDst, inv_block_length, pTmp, spec_len); + + + // The original lin2mel matrix is spec_len x num_filters, where each column + // holds one mel filter, lin2mel_packed_x has all the non-zero + // elements packed together in one 1D array _filter_starts are the locations + // in each *original* column where the non-zero elements start + // _filter_lens is how many non-zero elements are in each original column + // So the i_th filter start in lin2mel_packed at sum(_filter_lens[:i]) + // And the corresponding spectrum segment starts at + // linear_spectrum[_filter_starts[i]] + int lin2mel_coeff_idx = 0; + /* Apply MEL filters; linear spectrum is now in pTmp[0:spec_len], put mel + spectrum in pDst[0:num_filters] */ + for (i = 0 ; i < num_filters ; i++) + { + arm_dot_prod_f32 (pTmp+lin2mel_513x40_filter_starts[i], + lin2mel_packed_513x40+lin2mel_coeff_idx, + lin2mel_513x40_filter_lens[i], + pDst+i); + + lin2mel_coeff_idx += lin2mel_513x40_filter_lens[i]; + } + + // powspec_max = tf.reduce_max(input_tensor=powspec) + // powspec = tf.clip_by_value(powspec, 1e-30, powspec_max) + // # prevent -infinity on log + for (i = 0 ; i < num_filters ; i++) + pDst[i] = (pDst[i] > 1e-30) ? 
pDst[i] : 1e-30; + + for (i = 0 ; i < num_filters ; i++) + pDst[i] = 10*log10(pDst[i]); + + //log_mel_spec = (log_mel_spec + power_offset - 32 + 32.0) / 64.0 + arm_offset_f32 (pDst, power_offset, pDst, num_filters); + arm_scale_f32(pDst, (1.0/64.0), pTmp, num_filters); + + //log_mel_spec = tf.clip_by_value(log_mel_spec, 0, 1) + for(i = 0 ; i < num_filters ; i++) + pDst[i] = (pTmp[i] < 0.0) ? 0.0 : ((pTmp[i] > 1.0) ? 1.0 : pTmp[i]); +} + +/// Private functions +// private functions, formerly from main.c, mainly STM Cube auto-generated stuff +/** + * @brief System Clock Configuration + * @retval None + */ +void SystemClock_Config(void) +{ + RCC_OscInitTypeDef RCC_OscInitStruct = {0}; + RCC_ClkInitTypeDef RCC_ClkInitStruct = {0}; + + /** Configure the main internal regulator output voltage + */ + if (HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1_BOOST) + != HAL_OK) + { + Error_Handler(); + } + + /** Initializes the RCC Oscillators according to the specified parameters + * in the RCC_OscInitTypeDef structure. 
+ */ + RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI48 + | RCC_OSCILLATORTYPE_HSI; + RCC_OscInitStruct.HSIState = RCC_HSI_ON; + RCC_OscInitStruct.HSI48State = RCC_HSI48_ON; + RCC_OscInitStruct.HSICalibrationValue = RCC_HSICALIBRATION_DEFAULT; + RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON; + RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSI; + RCC_OscInitStruct.PLL.PLLM = 2; + RCC_OscInitStruct.PLL.PLLN = 30; + RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV2; + RCC_OscInitStruct.PLL.PLLQ = RCC_PLLQ_DIV2; + RCC_OscInitStruct.PLL.PLLR = RCC_PLLR_DIV2; + if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK) + { + Error_Handler(); + } + + /** Initializes the CPU, AHB and APB buses clocks + */ + RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK | RCC_CLOCKTYPE_SYSCLK + | RCC_CLOCKTYPE_PCLK1 | RCC_CLOCKTYPE_PCLK2; + RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK; + RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1; + RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2; + RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1; + + if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_5) != HAL_OK) + { + Error_Handler(); + } +} + +/** + * @brief LPUART1 Initialization Function + * @param None + * @retval None + */ +static void MX_LPUART1_UART_Init(void) +{ + /* USER CODE BEGIN LPUART1_Init 0 */ + + /* USER CODE END LPUART1_Init 0 */ + + /* USER CODE BEGIN LPUART1_Init 1 */ + + /* USER CODE END LPUART1_Init 1 */ + hlpuart1.Instance = LPUART1; + hlpuart1.Init.BaudRate = 115200; + hlpuart1.Init.WordLength = UART_WORDLENGTH_8B; + hlpuart1.Init.StopBits = UART_STOPBITS_1; + hlpuart1.Init.Parity = UART_PARITY_NONE; + hlpuart1.Init.Mode = UART_MODE_TX_RX; + hlpuart1.Init.HwFlowCtl = UART_HWCONTROL_NONE; + hlpuart1.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE; + hlpuart1.Init.ClockPrescaler = UART_PRESCALER_DIV1; + hlpuart1.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT; + hlpuart1.FifoMode = UART_FIFOMODE_DISABLE; + if (HAL_UART_Init(&hlpuart1) != HAL_OK) + 
{ + Error_Handler(); + } + if (HAL_UARTEx_SetTxFifoThreshold(&hlpuart1, UART_TXFIFO_THRESHOLD_1_8) + != HAL_OK) + { + Error_Handler(); + } + if (HAL_UARTEx_SetRxFifoThreshold(&hlpuart1, UART_RXFIFO_THRESHOLD_1_8) + != HAL_OK) + { + Error_Handler(); + } + if (HAL_UARTEx_DisableFifoMode(&hlpuart1) != HAL_OK) + { + Error_Handler(); + } + /* USER CODE BEGIN LPUART1_Init 2 */ + + /* USER CODE END LPUART1_Init 2 */ +} + +/** + * @brief USART3 Initialization Function + * @param None + * @retval None + */ +static void MX_USART3_UART_Init(void) +{ + /* USER CODE BEGIN USART3_Init 0 */ + + /* USER CODE END USART3_Init 0 */ + + /* USER CODE BEGIN USART3_Init 1 */ + + /* USER CODE END USART3_Init 1 */ + huart3.Instance = USART3; + huart3.Init.BaudRate = 115200; + huart3.Init.WordLength = UART_WORDLENGTH_8B; + huart3.Init.StopBits = UART_STOPBITS_1; + huart3.Init.Parity = UART_PARITY_NONE; + huart3.Init.Mode = UART_MODE_TX_RX; + huart3.Init.HwFlowCtl = UART_HWCONTROL_NONE; + huart3.Init.OverSampling = UART_OVERSAMPLING_16; + huart3.Init.OneBitSampling = UART_ONE_BIT_SAMPLE_DISABLE; + huart3.Init.ClockPrescaler = UART_PRESCALER_DIV1; + huart3.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT; + if (HAL_UART_Init(&huart3) != HAL_OK) + { + Error_Handler(); + } + if (HAL_UARTEx_SetTxFifoThreshold(&huart3, UART_TXFIFO_THRESHOLD_1_8) + != HAL_OK) + { + Error_Handler(); + } + if (HAL_UARTEx_SetRxFifoThreshold(&huart3, UART_RXFIFO_THRESHOLD_1_8) + != HAL_OK) + { + Error_Handler(); + } + if (HAL_UARTEx_DisableFifoMode(&huart3) != HAL_OK) + { + Error_Handler(); + } + /* USER CODE BEGIN USART3_Init 2 */ + + /* USER CODE END USART3_Init 2 */ +} + +/** + * @brief SAI1 Initialization Function + * @param None + * @retval None + */ +static void MX_SAI1_Init(void) +{ + /* USER CODE BEGIN SAI1_Init 0 */ + + /* USER CODE END SAI1_Init 0 */ + + /* USER CODE BEGIN SAI1_Init 1 */ + + /* USER CODE END SAI1_Init 1 */ + hsai_BlockA1.Instance = SAI1_Block_A; + hsai_BlockA1.Init.AudioMode = 
SAI_MODESLAVE_RX; + hsai_BlockA1.Init.Synchro = SAI_ASYNCHRONOUS; + hsai_BlockA1.Init.OutputDrive = SAI_OUTPUTDRIVE_DISABLE; + hsai_BlockA1.Init.FIFOThreshold = SAI_FIFOTHRESHOLD_EMPTY; + hsai_BlockA1.Init.SynchroExt = SAI_SYNCEXT_DISABLE; + hsai_BlockA1.Init.MonoStereoMode = SAI_STEREOMODE; + hsai_BlockA1.Init.CompandingMode = SAI_NOCOMPANDING; + hsai_BlockA1.Init.TriState = SAI_OUTPUT_NOTRELEASED; + if (HAL_SAI_InitProtocol(&hsai_BlockA1, SAI_I2S_STANDARD, + SAI_PROTOCOL_DATASIZE_16BIT, 2) != HAL_OK) + { + Error_Handler(); + } + /* USER CODE BEGIN SAI1_Init 2 */ + + /* USER CODE END SAI1_Init 2 */ +} + +/** + * @brief TIM16 Initialization Function + * @param None + * @retval None + */ +static void MX_TIM16_Init(void) +{ + /* USER CODE BEGIN TIM16_Init 0 */ + + /* USER CODE END TIM16_Init 0 */ + + /* USER CODE BEGIN TIM16_Init 1 */ + + /* USER CODE END TIM16_Init 1 */ + htim16.Instance = TIM16; + htim16.Init.Prescaler = 120-1; + htim16.Init.CounterMode = TIM_COUNTERMODE_UP; + htim16.Init.Period = 65535; + htim16.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1; + htim16.Init.RepetitionCounter = 0; + htim16.Init.AutoReloadPreload = TIM_AUTORELOAD_PRELOAD_DISABLE; + if (HAL_TIM_Base_Init(&htim16) != HAL_OK) + { + Error_Handler(); + } + /* USER CODE BEGIN TIM16_Init 2 */ + HAL_TIM_Base_MspInit(&htim16); + /* USER CODE END TIM16_Init 2 */ +} + +/** + * @brief USB_OTG_FS Initialization Function + * @param None + * @retval None + */ +static void MX_USB_OTG_FS_PCD_Init(void) +{ + /* USER CODE BEGIN USB_OTG_FS_Init 0 */ + + /* USER CODE END USB_OTG_FS_Init 0 */ + + /* USER CODE BEGIN USB_OTG_FS_Init 1 */ + + /* USER CODE END USB_OTG_FS_Init 1 */ + hpcd_USB_OTG_FS.Instance = USB_OTG_FS; + hpcd_USB_OTG_FS.Init.dev_endpoints = 6; + hpcd_USB_OTG_FS.Init.speed = PCD_SPEED_FULL; + hpcd_USB_OTG_FS.Init.phy_itface = PCD_PHY_EMBEDDED; + hpcd_USB_OTG_FS.Init.Sof_enable = ENABLE; + hpcd_USB_OTG_FS.Init.low_power_enable = DISABLE; + hpcd_USB_OTG_FS.Init.lpm_enable = DISABLE; + 
hpcd_USB_OTG_FS.Init.battery_charging_enable = ENABLE; + hpcd_USB_OTG_FS.Init.use_dedicated_ep1 = DISABLE; + hpcd_USB_OTG_FS.Init.vbus_sensing_enable = ENABLE; + if (HAL_PCD_Init(&hpcd_USB_OTG_FS) != HAL_OK) + { + Error_Handler(); + } + /* USER CODE BEGIN USB_OTG_FS_Init 2 */ + + /* USER CODE END USB_OTG_FS_Init 2 */ +} + +/** + * Enable DMA controller clock + */ +static void MX_DMA_Init(void) +{ + /* DMA controller clock enable */ + __HAL_RCC_DMAMUX1_CLK_ENABLE(); + __HAL_RCC_DMA1_CLK_ENABLE(); + + /* DMA interrupt init */ + /* DMA1_Channel1_IRQn interrupt configuration */ + HAL_NVIC_SetPriority(DMA1_Channel1_IRQn, 0, 0); + HAL_NVIC_EnableIRQ(DMA1_Channel1_IRQn); +} + +/** + * @brief GPIO Initialization Function + * @param None + * @retval None + */ +static void MX_GPIO_Init(void) +{ + GPIO_InitTypeDef GPIO_InitStruct = {0}; + /* USER CODE BEGIN MX_GPIO_Init_1 */ + /* USER CODE END MX_GPIO_Init_1 */ + + /* GPIO Ports Clock Enable */ + __HAL_RCC_GPIOE_CLK_ENABLE(); + __HAL_RCC_GPIOC_CLK_ENABLE(); + __HAL_RCC_GPIOH_CLK_ENABLE(); + __HAL_RCC_GPIOF_CLK_ENABLE(); + __HAL_RCC_GPIOB_CLK_ENABLE(); + __HAL_RCC_GPIOD_CLK_ENABLE(); + __HAL_RCC_GPIOG_CLK_ENABLE(); + HAL_PWREx_EnableVddIO2(); + __HAL_RCC_GPIOA_CLK_ENABLE(); + + /*Configure GPIO pin Output Level */ + HAL_GPIO_WritePin(timestamp_GPIO_Port, timestamp_Pin, GPIO_PIN_SET); + + /*Configure GPIO pin Output Level */ + HAL_GPIO_WritePin(Processing_GPIO_Port, Processing_Pin, GPIO_PIN_RESET); + + /*Configure GPIO pin Output Level */ + HAL_GPIO_WritePin(GPIOB, LD3_Pin|LD2_Pin, GPIO_PIN_RESET); + + /*Configure GPIO pin Output Level */ + HAL_GPIO_WritePin(USB_PowerSwitchOn_GPIO_Port, USB_PowerSwitchOn_Pin, + GPIO_PIN_RESET); + + /*Configure GPIO pin Output Level */ + HAL_GPIO_WritePin(WW_DETECTED_GPIO_Port, WW_DETECTED_Pin, GPIO_PIN_SET); + + /*Configure GPIO pin : B1_Pin */ + GPIO_InitStruct.Pin = B1_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING; + GPIO_InitStruct.Pull = GPIO_NOPULL; + HAL_GPIO_Init(B1_GPIO_Port, 
&GPIO_InitStruct); + + /*Configure GPIO pin : timestamp_Pin */ + GPIO_InitStruct.Pin = timestamp_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; + GPIO_InitStruct.Pull = GPIO_NOPULL; + GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_MEDIUM; + HAL_GPIO_Init(timestamp_GPIO_Port, &GPIO_InitStruct); + + /*Configure GPIO pin : Processing_Pin */ + GPIO_InitStruct.Pin = Processing_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; + GPIO_InitStruct.Pull = GPIO_NOPULL; + GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_MEDIUM; + HAL_GPIO_Init(Processing_GPIO_Port, &GPIO_InitStruct); + + /*Configure GPIO pins : LD3_Pin LD2_Pin WW_DETECTED_Pin */ + GPIO_InitStruct.Pin = LD3_Pin|LD2_Pin|WW_DETECTED_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; + GPIO_InitStruct.Pull = GPIO_NOPULL; + GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW; + HAL_GPIO_Init(GPIOB, &GPIO_InitStruct); + + /*Configure GPIO pin : USB_OverCurrent_Pin */ + GPIO_InitStruct.Pin = USB_OverCurrent_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_INPUT; + GPIO_InitStruct.Pull = GPIO_NOPULL; + HAL_GPIO_Init(USB_OverCurrent_GPIO_Port, &GPIO_InitStruct); + + /*Configure GPIO pin : USB_PowerSwitchOn_Pin */ + GPIO_InitStruct.Pin = USB_PowerSwitchOn_Pin; + GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP; + GPIO_InitStruct.Pull = GPIO_NOPULL; + GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW; + HAL_GPIO_Init(USB_PowerSwitchOn_GPIO_Port, &GPIO_InitStruct); + + /* USER CODE BEGIN MX_GPIO_Init_2 */ + /* USER CODE END MX_GPIO_Init_2 */ +} + +/* USER CODE BEGIN 4 */ + +/* USER CODE END 4 */ + +// interrupt request handler for the I2S DMA +void HAL_SAI_RxCpltCallback(SAI_HandleTypeDef *hsai) +{ + if (g_i2s_state == FileCapture) + ee_process_chunk_and_cont_capture(hsai); + else if( g_i2s_state == Streaming) + th_process_chunk_and_cont_streaming(hsai); + else if( g_i2s_state == Stopping) + { + printf("Streaming stopped\r\n"); + g_i2s_state = Idle; + } +} + +/** + * @brief This function is executed in case of error occurrence. 
+ * @retval None + */ +void Error_Handler(void) +{ + /* USER CODE BEGIN Error_Handler_Debug */ + /* User can add his own implementation to report the HAL error + return state */ + __disable_irq(); + while (1) + { + } +/* USER CODE END Error_Handler_Debug */ +} + +#ifdef USE_FULL_ASSERT +/** + * @brief Reports the name of the source file and the source line number + * where the assert_param error has occurred. + * @param file: pointer to the source file name + * @param line: assert_param error line source number + * @retval None + */ +void assert_failed(uint8_t *file, uint32_t line) +{ + /* USER CODE BEGIN 6 */ + /* User can add his own implementation to report the file name and + line number, ex: printf("Wrong parameters value: file %s on line %d\r\n", + file, line) */ + /* USER CODE END 6 */ +} +#endif /* USE_FULL_ASSERT */ diff --git a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/sww_ref_l4r5zi Debug.launch b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/sww_ref_l4r5zi Debug.launch index f013a7e5..8a56a585 100644 --- a/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/sww_ref_l4r5zi Debug.launch +++ b/benchmark/reference_submissions/streaming_wakeword/sww_ref_l4r5zi/sww_ref_l4r5zi Debug.launch @@ -2,13 +2,16 @@ + + + @@ -37,7 +40,7 @@ - + diff --git a/benchmark/runner/README.md b/benchmark/runner/README.md index ffe879f5..bf756e29 100644 --- a/benchmark/runner/README.md +++ b/benchmark/runner/README.md @@ -83,7 +83,7 @@ The dataset path is the location of the dataset files. If you have used the EEM * Performance: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_performance.yaml --device_list=devices_ad.yaml --mode=p` * Accuracy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_accuracy.yaml --device_list=devices_ad.yaml --mode=a` * Streaming Wakeword - * SWW measures performance, accuracy, and accuracy in one run. It should be run in energy mode. 
+ * SWW measures performance, accuracy, and energy in one run. It should be run in energy mode. + * Energy: `python main.py --dataset_path=/path/to/datasets/ --test_script=tests_energy.yaml --device_list=devices_sww.yaml --mode=e` diff --git a/benchmark/runner/devices_sww.yaml b/benchmark/runner/devices_sww.yaml index 7bd18fdb..53acf5fc 100644 --- a/benchmark/runner/devices_sww.yaml +++ b/benchmark/runner/devices_sww.yaml @@ -19,8 +19,8 @@ type: power baud: 3686400 echo: False - preference: 2 # set to higher preference thatn js220 to use lpm01a - voltage: 1.8 # <-- Voltage for DUT + preference: 1 # set to higher preference than js220 to use lpm01a + voltage: 3.3 # <-- Voltage for DUT usb: 0x0483: 0x5740 # Ensures detection by VID/PID (1155 / 22336) - name: l4r5zi @@ -34,7 +34,7 @@ - name: js220 type: power interface: direct_usb - preference: 1 # set to higher preference thatn lpm01a to use js220 + preference: 2 # set to higher preference than lpm01a to use js220 raw_sampling_rate: 1000000 virtual_sampling_rate: 1000 usb: diff --git a/benchmark/runner/sww_data_dir/marvin_617de221_0.wav b/benchmark/runner/sww_data_dir/marvin_617de221_0.wav new file mode 100755 index 00000000..aa288583 Binary files /dev/null and b/benchmark/runner/sww_data_dir/marvin_617de221_0.wav differ diff --git a/benchmark/runner/sww_data_dir/med_wav_2m_8p.wav b/benchmark/runner/sww_data_dir/med_wav_2m_8p.wav new file mode 100644 index 00000000..2344acac Binary files /dev/null and b/benchmark/runner/sww_data_dir/med_wav_2m_8p.wav differ