Skip to content

adimyth/apache-streaming-systems-playground

Repository files navigation

Overview

Apache Kafka is a distributed event store & real-time streaming platform. It is an event store and message broker.

Kafka Use Cases

Apache Flink is a distributed stream processing framework for real-time data processing. It processes data as it flows through, rather than waiting to collect all data first.

Other stream processing frameworks are Apache Kafka Streams (primarly works in Java), Apache Spark Streaming, etc.

Apache Flink allows for:

  • Stateful Processing: Remembers information across multiple events.
  • Stateless Processing: Each event is processed independently, no memory of previous events
  • Windowing: Group events into time-based batches for aggregation. Supports tumbling (non-overlapping) and sliding (overlapping) windows.

Stream Processing vs Batch Processing

Batch Processing:

  • Processes large volumes of data at rest.
  • High latency (minutes to hours).
  • Apache Spark is a batch processing framework.
  • Examples: ETL jobs, data warehousing.

Stream Processing:

  • Processes data as it arrives.
  • Low latency (milliseconds to seconds).
  • Apache Flink is a stream processing framework.
  • Examples: Real-time analytics, fraud detection.

How Apache Kafka and Apache Flink work together

Kafka and Flink form a powerful combination for real-time analytics:

  • Kafka acts as the data ingestion and storage layer, collecting events from various sources
  • Flink consumes data from Kafka topics, processes it in real-time, and can write results back to Kafka or other systems
  • This creates a complete streaming pipeline: Ingestion → Processing → Output

E-commerce Real-Time Analytics with Kafka & Flink

About

Imagine an e-commerce store like Amazon. Every second thousands of users can:

  • Browse products
  • Search for products
  • Add products to their cart
  • Purchase products

The system needs to:

  • Track user behavior in real-time (page views, items added to cart, etc.)
  • Detect suspicious activity (fraud, bot traffic, unusual purchase patterns)
  • Real-time analytics (top-selling products, user engagement metrics)
  • AI-powered recommendations (personalized product suggestions)

Setup

Kafka:

  • Handle millions of events per second
  • Store events reliably
  • Multiple systems can consume the same event
  • Decouple producers and consumers

Flink:

  • Process events with sub-second latency
  • Maintains state across events
  • Exact once processing. Example: If we purchase a product, we want to process the event exactly once.

Running the project

1️⃣ Run the Kafka & Flink Services

docker compose up -d

Create Kafka topics

./setup_kafka.sh

2️⃣ Produce Events

# Install dependencies
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Produce events
python data-generator/producer.py

This will produce events to the Kafka topic user-events.

3️⃣ Run the Flink Job Install dependencies

docker exec flink-jobmanager pip install -r /opt/requirements.txt
docker exec flink-taskmanager pip install -r /opt/requirements.txt

Submit the jobs

docker exec flink-jobmanager flink run -py /opt/flink/jobs/product-analytics.py

docker exec flink-jobmanager flink run -py /opt/flink/jobs/fraud-detection.py

4️⃣ View the results

Flink Jobs

Product Analytics Job

Track how each product is performing in real-time

  • How many people viewed the product?
  • How many added it to cart?
  • How many actually bought it?
  • What's the conversion rate?
  • How much revenue did it generate?

Note

Stateless processing: Each window is independent. Windowing: We are grouping events by product_id.

Example Result for a single window (1 min duration):

flink-taskmanager  | 1> {"product_id": "Apple Watch", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 90, "cart_additions": 39, "purchases": 7, "revenue": 3199.8399999999997, "conversion_rate": 0.07777777777777778}
flink-taskmanager  | 2> {"product_id": "shoes-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 226, "cart_additions": 59, "purchases": 20, "revenue": 97828.18623999583, "conversion_rate": 0.08849557522123894}
flink-taskmanager  | 1> {"product_id": "Airpods Max", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 93, "cart_additions": 23, "purchases": 8, "revenue": 6399.84, "conversion_rate": 0.08602150537634409}
flink-taskmanager  | 1> {"product_id": "tablet-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 95, "cart_additions": 20, "purchases": 6, "revenue": 7799.870000000001, "conversion_rate": 0.06315789473684211}
flink-taskmanager  | 1> {"product_id": "Apple Watch Pro", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 112, "cart_additions": 16, "purchases": 6, "revenue": 51953.62847056468, "conversion_rate": 0.05357142857142857}
flink-taskmanager  | 1> {"product_id": "laptop-002", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 101, "cart_additions": 22, "purchases": 8, "revenue": 45999.770000000004, "conversion_rate": 0.07920792079207921}
flink-taskmanager  | 1> {"product_id": "book-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 391, "cart_additions": 69, "purchases": 29, "revenue": 9394.419177383934, "conversion_rate": 0.0741687979539642}
flink-taskmanager  | 1> {"product_id": "Airpods Prod", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 93, "cart_additions": 31, "purchases": 11, "revenue": 4999.75, "conversion_rate": 0.11827956989247312}
flink-taskmanager  | 1> {"product_id": "phone-002", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 91, "cart_additions": 28, "purchases": 6, "revenue": 361574.586116209, "conversion_rate": 0.06593406593406594}
flink-taskmanager  | 1> {"product_id": "phone-002", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 1, "cart_additions": 1, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "book-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 4, "cart_additions": 1, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "tablet-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 1, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "laptop-002", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 4, "cart_additions": 1, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "Airpods Max", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 2, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "Airpods Prod", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 2, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 1> {"product_id": "Apple Watch Pro", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 0, "cart_additions": 2, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "phone-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 106, "cart_additions": 19, "purchases": 5, "revenue": 8099.91, "conversion_rate": 0.04716981132075472}
flink-taskmanager  | 2> {"product_id": "tablet-002", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 92, "cart_additions": 27, "purchases": 8, "revenue": 17999.85, "conversion_rate": 0.08695652173913043}
flink-taskmanager  | 2> {"product_id": "jeans-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 248, "cart_additions": 71, "purchases": 13, "revenue": 2239.72, "conversion_rate": 0.05241935483870968}
flink-taskmanager  | 2> {"product_id": "laptop-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 102, "cart_additions": 19, "purchases": 8, "revenue": 20799.84, "conversion_rate": 0.0784313725490196}
flink-taskmanager  | 2> {"product_id": "book-002", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 342, "cart_additions": 96, "purchases": 36, "revenue": 8438.888277539389, "conversion_rate": 0.10526315789473684}
flink-taskmanager  | 2> {"product_id": "shirt-001", "window_start": "2025-09-21T13:36:00", "window_end": "2025-09-21T13:37:00", "view_count": 250, "cart_additions": 56, "purchases": 15, "revenue": 11336.492395660473, "conversion_rate": 0.06}
flink-taskmanager  | 2> {"product_id": "shoes-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 2, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "book-002", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 4, "cart_additions": 1, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "laptop-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 1, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "shirt-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 3, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "tablet-002", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 1, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "jeans-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 3, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}
flink-taskmanager  | 2> {"product_id": "phone-001", "window_start": "2025-09-21T13:37:00", "window_end": "2025-09-21T13:38:00", "view_count": 1, "cart_additions": 0, "purchases": 0, "revenue": 0.0, "conversion_rate": 0.0}

Similarly for the second window, I got per product the following summary

flink-taskmanager  | 1> {"product_id": "Airpods Max", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 52, "cart_additions": 14, "purchases": 2, "revenue": 1599.96, "conversion_rate": 0.038461538461538464}
flink-taskmanager  | 2> {"product_id": "shoes-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 104, "cart_additions": 38, "purchases": 7, "revenue": 11263.260781608495, "conversion_rate": 0.0673076923076923}
flink-taskmanager  | 1> {"product_id": "laptop-002", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 42, "cart_additions": 10, "purchases": 3, "revenue": 15999.92, "conversion_rate": 0.07142857142857142}
flink-taskmanager  | 1> {"product_id": "Apple Watch", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 52, "cart_additions": 13, "purchases": 1, "revenue": 399.98, "conversion_rate": 0.019230769230769232}
flink-taskmanager  | 1> {"product_id": "tablet-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 63, "cart_additions": 7, "purchases": 2, "revenue": 2999.95, "conversion_rate": 0.031746031746031744}
flink-taskmanager  | 1> {"product_id": "book-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 248, "cart_additions": 60, "purchases": 19, "revenue": 19066.57753044404, "conversion_rate": 0.07661290322580645}
flink-taskmanager  | 1> {"product_id": "Apple Watch Pro", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 63, "cart_additions": 13, "purchases": 3, "revenue": 3199.92, "conversion_rate": 0.047619047619047616}
flink-taskmanager  | 1> {"product_id": "Airpods Prod", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 67, "cart_additions": 12, "purchases": 2, "revenue": 1199.94, "conversion_rate": 0.029850746268656716}
flink-taskmanager  | 1> {"product_id": "phone-002", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 41, "cart_additions": 15, "purchases": 4, "revenue": 564574.1042868246, "conversion_rate": 0.0975609756097561}
flink-taskmanager  | 2> {"product_id": "phone-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 50, "cart_additions": 7, "purchases": 1, "revenue": 1799.98, "conversion_rate": 0.02}
flink-taskmanager  | 2> {"product_id": "book-002", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 275, "cart_additions": 70, "purchases": 15, "revenue": 9091.994571621532, "conversion_rate": 0.05454545454545454}
flink-taskmanager  | 2> {"product_id": "jeans-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 121, "cart_additions": 35, "purchases": 8, "revenue": 959.88, "conversion_rate": 0.06611570247933884}
flink-taskmanager  | 2> {"product_id": "laptop-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 47, "cart_additions": 16, "purchases": 7, "revenue": 20799.84, "conversion_rate": 0.14893617021276595}
flink-taskmanager  | 2> {"product_id": "shirt-001", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 117, "cart_additions": 32, "purchases": 8, "revenue": 599.8000000000001, "conversion_rate": 0.06837606837606838}
flink-taskmanager  | 2> {"product_id": "tablet-002", "window_start": "2025-09-21T13:45:00", "window_end": "2025-09-21T13:46:00", "view_count": 39, "cart_additions": 8, "purchases": 2, "revenue": 4799.96, "conversion_rate": 0.05128205128205128}

You will see new records every minute ✅

Fraud Detection Job

Detect suspicious user behavior in real-time to prevent fraud

  • Identify users making too many purchases quickly
  • Flag unusually high spending amounts
  • Detect rapid successive high-value purchases
  • Alert immediately when patterns are detected

Note

Stateful processing: We need to remember user behaviour across multiple events. No windowing as fraud detection needs to be done immediately.

Exmaple Result:

flink-taskmanager  | 2> {"user_id": "user_056", "alert_type": "high_spending", "reason": "$5399.94 spent in 5 minutes", "risk_score": 0.539994, "timestamp": 1758461139530}
flink-taskmanager  | 2> {"user_id": "user_056", "alert_type": "successive_high_value", "reason": "Multiple high-value purchases", "risk_score": 0.9, "timestamp": 1758461139530}
flink-taskmanager  | 1> {"user_id": "user_018", "alert_type": "high_spending", "reason": "$442650.94 spent in 5 minutes", "risk_score": 1.0, "timestamp": 1758461143339}
flink-taskmanager  | 1> {"user_id": "user_018", "alert_type": "high_spending", "reason": "$455550.13 spent in 5 minutes", "risk_score": 1.0, "timestamp": 1758461145632}
flink-taskmanager  | 1> {"user_id": "user_020", "alert_type": "high_spending", "reason": "$8599.95 spent in 5 minutes", "risk_score": 0.8599950000000001, "timestamp": 1758461155209}
flink-taskmanager  | 1> {"user_id": "user_018", "alert_type": "high_spending", "reason": "$612941.33 spent in 5 minutes", "risk_score": 1.0, "timestamp": 1758461156239}
flink-taskmanager  | 1> {"user_id": "user_018", "alert_type": "successive_high_value", "reason": "Multiple high-value purchases", "risk_score": 0.9, "timestamp": 1758461156239}

Benefits

Scalability:

  1. Adding more TaskManagers increases parallelism.
  2. Increase Kafka partitions to improve throughput.

Fault Tolerance:

  1. Checkpointing: Flink saves its progress every 5 seconds. If a task fails, Flink can recover from the checkpoint.
  2. Automatic recovery: Flink can recover from task failures.
  3. Exactly-once processing: Flink can process events exactly once.

Low Latency:

  1. Results available within 1 minute + watermark delay
  2. Real-time business insights

TODO

  • Make File Sink Work
  • Add more flink job examples. Ex - ML prediction, etc

About

Realtime Stream Processing using Apache Kafka & Apache Flink

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors