This project explores probabilistic and streaming algorithms for analyzing clickstream data under memory constraints. It compares approximate methods against exact results to study the trade-offs between efficiency, scalability, and accuracy.
- Bloom Filter
- Count-Min Sketch
- Flajolet–Martin
- DGIM sliding window approximation
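As an illustration of how one of these approximate methods can be checked against exact results, here is a minimal Count-Min Sketch sketch, not the project's actual implementation. It makes several assumptions: the class name `CountMinSketch`, the `width` and `depth` defaults, and the synthetic `clicks` stream are all hypothetical, and it uses salted `hashlib.blake2b` hashes from the standard library in place of `mmh3` so that it runs without extra dependencies.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Fixed-size frequency table: `depth` rows of `width` counters each."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row (assumption: blake2b stands in for mmh3).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(2, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available estimate.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


# Synthetic, deterministic stand-in for a clickstream (hypothetical page IDs).
clicks = [f"page_{(i * i) % 37}" for i in range(2000)]

exact = Counter(clicks)
sketch = CountMinSketch()
for page in clicks:
    sketch.add(page)

# Largest gap between the approximate and the exact count.
worst = max(sketch.estimate(page) - count for page, count in exact.items())
print("largest overestimate:", worst)
```

The sketch uses `width * depth` counters regardless of how many distinct pages appear, and its estimates can exceed the true count but never fall below it. Shrinking `width` trades memory for larger overestimates, which is the kind of trade-off the comparison against exact counts is meant to expose.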
Dataset: Clickstream Data for Online Shopping (UCI Machine Learning Repository)
Tools: Python, NumPy, pandas, matplotlib, seaborn, mmh3