Skip to content

daveray-net/blaze-stream

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

project blaze-stream

Experimental financial market data pipeline


  • Not Financial Advice! This is Experimental Software. Use At Your Own Risk!

  • Parser for market data processing using Apache Flink
  • Vibe coded with help from Google Gemini
  • This is a work in progress
  • docker-compose managaged network,
  • react typescript frontend, spring-boot backend, flink-jobmanager, flink-taskmanager, dynamodb
  • rest api middle tier and service api consumer


🛠️ Local Development & Testing Loop

This repository is optimized for "lean and mean" development. You do not need Java, Maven, or Node installed on your host machine; everything executes securely inside Docker environments.

1. Run the Unit & Integration Test Suites

To verify code serialization, data boundary handling (dropping file comments and headers), and aggregation math, fire up the isolated test profile:

docker compose run --rm backend-test

2. Clean Rebuild of All Services

To completely flush image layers and rebuild the production application binaries from a completely scratch state:

docker compose build --no-cache

3. Launch the Stack End-to-End

Spin up your distributed network—including Spring Boot, the Apache Flink JobManager, TaskManagers, Postgres, and DynamoDB Local:

docker compose up

📂 Project Structure


├── docker-compose.yml
│
├── backend
│   ├── Dockerfile
│   ├── pom.xml
│   ├── src
│       ├── main
│       │   ├── java
│       │   │   └── net
│       │   │       └── daveray
│       │   │           ├── BlazeStreamApplication.java
│       │   │           └── flink
│       │   │               │   └── FlinkConfig.java
│       │   │               ├── executor
│       │   │               │   ├── FlinkExecutor.java
│       │   │               │   └── FlinkJobRunner.java
│       │   │               ├── pipeline
│       │   │               │   ├── Orchestrator.java
│       │   │               │   └── StreamingJson.java
│       │   │               ├── pojo
│       │   │               │   └── CandlestickPojo.java
│       │   │               ├── provider
│       │   │               ├── FlinkSourceProvider.java
│       │   │               ├── FlinkSourceProviderFactory.java
│       │   │               ├── file
│       │   │               │   │   ├── FileBacktestSourceProvider.java
│       │   │               │   │   └── FileLineReaderSource.java
│       │   │               │   └── sigr
│       │   │               │       └── SignalRSourceProvider.java
│       │   │               └── transformer
│       │   │                   ├── JsonCandleWrapper.java
│       │   │                   ├── TimeSeriesAggregator.java
│       │   │                   └── Transformer.java
│       │   └── resources
│       │       └── application.yaml
│       └── test
│           ├── java
│           │   └── net
│           │       └── daveray
│           │           └── flink
│           │               ├── JsonCandleWrapperTest.java
│           │               ├── OrchestratorTest.java
│           │               └── TimeSeriesAggregatorTest.java
│           └── resources
│               └── logback-test.xml
│ 
├── data
│   └── NQ_futures_data.tsf
│
├── frontend
│   ├── Dockerfile
│   ├── index.html
│   ├── nginx.conf
│   ├── package.json
│   ├── src
│   │   ├── App.tsx
│   │   ├── main.tsx
│   │   ├── react
│   │   │   ├── jsx
│   │   │   └── jsx-runtime
│   │   └── StreamingData.tsx
│   ├── tsconfig.json
│   └── vite.config.ts
│
├── log4j-console.xml
├── README.md


🏗️ Architecture & Design Patterns

The backend code is designed with enterprise-grade architecture:

  • Pipeline Orchestrator Pattern: Implements decoupled, pluggable Transformer<I, O> interfaces. Operations can be sequentially chained together (Orchestrator.add(step)) within an isolated execution pipeline.
  • Spring IoC Factory Lookup: Features a dynamic FlinkSourceProviderFactory that queries Spring’s ApplicationContext at runtime. This allows you to toggle data sources (e.g., file-backtests vs. live web sockets) purely via configuration changes.
  • Cluster Serialization Safety: Refactored entirely into static inner functions to eliminate NotSerializableException faults across remote cluster nodes.


Flink Ingestion Engine 🚀

A high-performance, distributed stream processing backend built with Spring Boot and Apache Flink (v1.19). This engine orchestrates real-time financial market data streams, normalizes mixed historical time-series cadences, and prepares standard data payloads for automated trade strategy execution.

📊 Time series data processing

During backtesting integration, mixed historical data sources introduced a severe time-series anomaly:

  • Source A (yfinance) natively provides 2-minute candlestick bars.
  • Source B (ProjectX API retrieveBars()) natively outputs 1-minute candlestick bars.

When blended together, raw sequential processing caused downstream calculation errors due to intermediate, unfinalized 1-minute blocks leaking into consumers.

Streaming aggregation

This project implements a custom, look-back state machine using Flink's Managed Keyed State (ValueState).

  1. Dynamic Time-Delta Tracking: It calculates the physical Duration time-gap between the current bar and the previous bar dynamically, avoiding fragile hardcoded timestamp rules.
  2. Stateful Buffering: If a 1-minute bar arrives, it is safely buffered in state. When its companion bar arrives (delta < 2m), they are mathematically aggregated into a single, finalized 2-minute candlestick and immediately emitted.
  3. Standalone Pass-through: If a native 2-minute bar arrives (delta >= 2m), the previous buffered bar is flushed, and the native bar passes through untouched.


📈 Future Horizon Roadmap

  • SignalR Streaming Aggregator: Integrate ProjectX's live SignalR web socket stream and implement a 1-minute tumbling/sliding window function to process live ticks into uniform bars.
  • Execution Layer Gateway: Establish automated API endpoints to pass order signals directly to ProjectX broker placement endpoints.
  • Strategy Engine Port: Migrate the core backtesting mathematical strategy routines from Python over to native, stateful Flink operators for sub-millisecond execution matching.
  • Front End: build out with react to manage streaming market data and live trade order management.




Author: daveray@daveray.net
Date: 2026-05-05
License: Apache 2.0

  • Not Financial Advice! This is Experimental Software. Use At Your Own Risk!

   Copyright 2026 David Ray Reizes <daveray@daveray.net>

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


About

project blaze-stream - experimental financial market data pipeline

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors