TDAAD is a Python package for unsupervised anomaly detection in multivariate time series using Topological Data Analysis (TDA). Website and documentation: https://irt-systemx.github.io/tdaad/
It builds upon two powerful open-source libraries:
- GUDHI for efficient and scalable computation of persistent homology and topological features,
- scikit-learn for core machine learning utilities like `Pipeline` and objects like `EllipticEnvelope`.
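To give a feel for how these two libraries fit together, here is a minimal, self-contained sketch, not TDAAD's internal implementation: sliding windows are turned into persistence diagrams with GUDHI, reduced to a toy feature (total persistence per homology dimension, an illustrative choice), and scored with scikit-learn's `EllipticEnvelope`. All parameter values below are arbitrary assumptions made for illustration.

```python
import numpy as np
import gudhi
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))   # multivariate series: (n_samples, n_features)
window_size = 100

def window_features(window, max_dim=1):
    """Total persistence per homology dimension for one window (toy feature choice)."""
    rips = gudhi.RipsComplex(points=window, max_edge_length=1.0)  # edge length is arbitrary here
    st = rips.create_simplex_tree(max_dimension=max_dim + 1)
    st.persistence()  # compute the persistence diagram
    feats = []
    for dim in range(max_dim + 1):
        intervals = np.asarray(st.persistence_intervals_in_dimension(dim))
        if intervals.size == 0:
            feats.append(0.0)
            continue
        finite = intervals[np.isfinite(intervals[:, 1])]  # drop infinite bars
        feats.append(float((finite[:, 1] - finite[:, 0]).sum()))
    return feats

# One topological feature vector per non-overlapping window.
features = np.array([
    window_features(X[start:start + window_size])
    for start in range(0, len(X) - window_size + 1, window_size)
])

# Windows whose topological signature deviates from the bulk receive low scores.
envelope = EllipticEnvelope(support_fraction=1.0, random_state=0).fit(features)
scores = envelope.score_samples(features)
```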
TDAAD implements the methodology introduced in:
Chazal, F., Levrard, C., & Royer, M. (2024). Topological Analysis for Detecting Anomalies (TADA) in dependent sequences: application to Time Series. Journal of Machine Learning Research, 25(365), 1–49. https://www.jmlr.org/papers/v25/24-0853.html
- Unsupervised anomaly detection in multivariate time series
- Topological embedding using persistent homology
- Scikit-learn–style API (`fit`, `transform`, `score_samples`)
- Configurable embedding dimension, window size, and topological parameters
- Works with NumPy arrays or pandas DataFrames
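As a quick illustration of that estimator-style interface (a sketch only: the exact shape and meaning of the `transform` output are not specified here and should be checked in the API documentation):

```python
import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

X = np.random.randn(500, 4)                # (n_samples, n_features)

detector = TopologicalAnomalyDetector(window_size=50, n_centers_by_dim=3)
detector.fit(X)                            # learn a reference topological profile
embedding = detector.transform(X)          # topological embedding of the series (assumed output)
scores = detector.score_samples(X)         # anomaly scores for X
```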
Install from PyPI (recommended):

```bash
pip install tdaad
```

Or install from source:

```bash
git clone https://github.com/IRT-SystemX/tdaad.git
cd tdaad
pip install .
```

Requirements:
- Python ≥ 3.7
- See `requirements.txt` for the full dependency list
Here’s a minimal example using `TopologicalAnomalyDetector`:

```python
import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Example multivariate time series with shape (n_samples, n_features)
X = np.random.randn(1000, 3)

# Initialize and fit the detector
detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(X)

# Compute anomaly scores
scores = detector.score_samples(X)
```

You can also use a `pandas.DataFrame` instead of a NumPy array; column names will be preserved in the output.
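As a follow-up sketch, here is one way to pass a DataFrame and turn the scores into binary anomaly flags. The 5% quantile threshold and the assumption that lower scores indicate stronger anomalies (the usual scikit-learn convention for `score_samples`) are illustrative choices, not part of the documented API.

```python
import numpy as np
import pandas as pd
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Wrap the series in a DataFrame with named sensors; column names are preserved downstream.
df = pd.DataFrame(np.random.randn(1000, 3), columns=["sensor_a", "sensor_b", "sensor_c"])

detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(df)
scores = detector.score_samples(df)

# Flag the lowest-scoring 5% as anomalous; the threshold is application-specific,
# and "lower score = more anomalous" is assumed from the scikit-learn convention.
threshold = np.quantile(scores, 0.05)
anomaly_mask = scores < threshold
```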
For more advanced usage (e.g. custom embeddings, parameter tuning), see the examples folder or the API documentation.
- TDAAD is designed for multivariate time series (2D inputs) — univariate data is not supported.
- The core detection method relies on sliding-window embeddings and persistent homology to identify structural changes in the signal.
- The key parameters that impact results and runtime are:
  - `window_size` controls the time resolution: larger windows capture slower anomalies, smaller ones detect more localized changes.
  - `n_centers_by_dim` controls the number of reference shapes used per homology dimension (e.g. connected components in H0, loops in H1, ...). Increasing this improves sensitivity but adds computation time.
  - `tda_max_dim` sets the maximum topological feature dimension computed (0 = connected components, 1 = loops, 2 = voids, ...). Higher values increase runtime and memory usage.
- Inputs can be `numpy.ndarray` or `pandas.DataFrame`. Column names are preserved in the output when using DataFrames.
⚙️ You can typically handle ~100 sensors and a few hundred time steps per window on a modern machine.
- Total complexity scales with $O(N \times (w \times p)^{d+2})$, where:
  - $w$ is the time resolution (`window_size`, the number of time steps per window),
  - $p$ is the number of variables (features/sensors),
  - $d$ is the maximum homology dimension (`tda_max_dim`), and
  - $N$ is the total number of sliding windows.
- Note that increasing the maximum homology dimension $d$ raises the exponent, causing exponential growth. The number of centers `n_centers_by_dim` used after the persistent homology computation does not significantly affect the overall complexity.
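To make these orders of magnitude concrete, the short sketch below evaluates the cost term $N \times (w \times p)^{d+2}$ for a few settings. The stride-1 window count and the specific values of $w$, $p$, and $d$ are illustrative assumptions, and constant factors are ignored.

```python
# Rough cost term N * (w * p)**(d + 2) from the complexity note above (constants ignored).
n_samples, p = 10_000, 20          # number of time steps and of sensors (assumed values)
for w in (50, 100):                # window_size
    n_windows = n_samples - w + 1  # assuming a stride of 1 between windows
    for d in (0, 1, 2):            # tda_max_dim
        cost = n_windows * (w * p) ** (d + 2)
        print(f"w={w:>3}  d={d}  relative cost ≈ {cost:.2e}")
```

Running this shows the cost jumping by a factor of roughly $w \times p$ each time $d$ is incremented, which is why `tda_max_dim` should be raised with care.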
To regenerate the documentation, rerun the following commands from the project root, adapting if necessary:

```bash
pip install -r docs/docs_requirements.txt -r requirements.txt
sphinx-apidoc -o docs/source/generated tdaad
sphinx-build -M html docs/source docs/build -W --keep-going
```
This work has been supported by the French government under the "France 2030" program, as part of the SystemX Technological Research Institute within the Confiance.ai project.
TDAAD is developed by IRT SystemX and supported by the European Trustworthy AI Association.


