Skip to content

HemantBK/Algorithmic-Trading-AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stocex Logo

Stocex — AI-Powered Algorithmic Trading System

Real-time news sentiment analysis + time-series forecasting = actionable trade signals

Python FinBERT TimeGPT Streamlit License

ArchitecturePipelineML ModelsDashboardQuickstartAPIs


The Problem

Retail traders are overwhelmed. Thousands of news articles drop daily, markets move in milliseconds, and manual analysis can't keep up. By the time you've read the headline, the price has already moved.

The Solution

Stocex is an end-to-end algorithmic trading pipeline that automates the entire workflow — from ingesting financial news to generating BUY/SELL/HOLD signals — by combining NLP-based sentiment analysis with transformer-powered price forecasting.

One script. Zero manual intervention. Actionable signals in minutes.


System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          DATA INGESTION LAYER                              │
│                                                                             │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────────────────────┐  │
│   │   NewsAPI     │    │   Yahoo       │    │  S&P 500 Constituents      │  │
│   │  (Headlines)  │    │  Finance      │    │  (Ticker Mapping)          │  │
│   │  100+ daily   │    │  (OHLCV)      │    │  500+ companies            │  │
│   └──────┬───────┘    └──────┬───────┘    └──────────────┬───────────────┘  │
│          │                   │                            │                  │
└──────────┼───────────────────┼────────────────────────────┼──────────────────┘
           │                   │                            │
           ▼                   ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         NLP PROCESSING LAYER                               │
│                                                                             │
│   ┌──────────────────┐    ┌──────────────────┐    ┌───────────────────┐    │
│   │    spaCy NER      │    │     FinBERT       │    │  Fuzzy Matcher   │    │
│   │  en_core_web_sm   │    │  Sentiment Model  │    │  Company→Ticker  │    │
│   │                   │    │                   │    │                  │    │
│   │  Extract ORG      │───▶│  Score: 0.0-1.0   │    │  "Apple"→AAPL   │    │
│   │  entities from    │    │  Label: pos/neg/  │    │  "Microsoft"→   │    │
│   │  headlines        │    │  neutral           │    │  MSFT            │    │
│   └───────────────────┘    └──────────┬────────┘    └───────────────────┘    │
│                                       │                                     │
└───────────────────────────────────────┼─────────────────────────────────────┘
                                        │
                                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       FORECASTING ENGINE                                    │
│                                                                             │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                    TimeGPT-1 (Nixtla)                           │      │
│   │                                                                  │      │
│   │   Input: 30-day historical OHLCV data                           │      │
│   │   Output: 7-12 bar price forecast + confidence intervals        │      │
│   │   Confidence Bands: 80% | 90% | 95%                            │      │
│   │   Frequency: Daily / 5-min intraday                             │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                                                             │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      SIGNAL GENERATION ENGINE                               │
│                                                                             │
│   ┌────────────────────────────────────────────────────────┐                │
│   │                                                        │                │
│   │   Sentiment Score ≥ 0.98  AND  Forecast ↑  →  BUY     │                │
│   │   Sentiment Score ≤ 0.02  AND  Forecast ↓  →  SELL    │                │
│   │   Conflicting / Neutral signals            →  HOLD    │                │
│   │                                                        │                │
│   └────────────────────────────────────────────────────────┘                │
│                                                                             │
│   Risk Controls:                                                            │
│   • High-confidence threshold (0.98) filters weak signals                   │
│   • Volatility-based position sizing                                        │
│   • Forecast confidence intervals as stop-loss boundaries                   │
│   • Minimum news mention count requirement                                  │
│                                                                             │
└─────────────────────────────────────┬───────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       PRESENTATION LAYER                                    │
│                                                                             │
│   ┌─────────────────┐  ┌────────────────┐  ┌──────────────────────────┐    │
│   │   Streamlit      │  │  Power BI       │  │  CSV Data Exports       │    │
│   │   Dashboard      │  │  Analytics      │  │                          │    │
│   │                  │  │                 │  │  • sentiment_summary     │    │
│   │  • Sentiment     │  │  • Advanced     │  │  • news_headlines        │    │
│   │  • News Feed     │  │    drill-down   │  │  • price_history         │    │
│   │  • Forecasts     │  │  • Cross-       │  │  • volatility_data       │    │
│   │  • AI Q&A        │  │    filtering    │  │  • forecast_results      │    │
│   │  • Dark Mode     │  │                 │  │                          │    │
│   └─────────────────┘  └────────────────┘  └──────────────────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Pipeline Deep Dive

The system executes a 7-stage sequential pipeline — each stage feeds into the next:

  NewsAPI          spaCy NER        FinBERT         yFinance        TimeGPT
    │                 │                │                │               │
    ▼                 ▼                ▼                ▼               ▼
┌────────┐     ┌───────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Stage 1 │────▶│  Stage 2  │───▶│ Stage 3  │───▶│ Stage 4  │───▶│ Stage 5  │
│ Fetch   │     │ Extract   │    │ Score    │    │ Get      │    │ Forecast │
│ News    │     │ Companies │    │ Sentiment│    │ Prices   │    │ Prices   │
└────────┘     └───────────┘    └──────────┘    └──────────┘    └────┬─────┘
                                                                     │
                                      ┌──────────────────────────────┘
                                      ▼
                               ┌──────────┐    ┌──────────┐
                               │ Stage 6  │───▶│ Stage 7  │
                               │ Epoch    │    │ Generate │
                               │ Analysis │    │ Signals  │
                               └──────────┘    └──────────┘

Stage 1 — News Ingestion

Pulls 100+ headlines daily from 15 premium financial sources via NewsAPI:

  • Bloomberg, WSJ, CNBC, Reuters, MarketWatch, Seeking Alpha, Barron's, Forbes, Fortune, TechCrunch, Business Insider, Yahoo Finance, Investopedia, The Motley Fool, CNN Business

Queries cover: earnings, mergers, acquisitions, IPOs, Fed decisions, guidance, revenue, buybacks, analyst ratings, and more.

Stage 2 — Named Entity Recognition

Uses spaCy's en_core_web_sm model to extract organization entities (ORG labels) from each headline. Extracted names are fuzzy-matched against the full S&P 500 constituent list to resolve company names to ticker symbols (e.g., "Apple" -> AAPL).

Stage 3 — Sentiment Scoring with FinBERT

Each headline is tokenized and passed through FinBERT (yiyanghkust/finbert-tone), a BERT model fine-tuned on 10,000+ financial texts. Outputs a 3-class probability distribution:

Input:  "Apple reports record Q4 earnings, beats estimates"
Output: { negative: 0.02, neutral: 0.08, positive: 0.90 }
        → Label: POSITIVE | Confidence: 0.90

Sentiment is aggregated per ticker across all mentioning articles. Only tickers exceeding the confidence threshold are forwarded.

Stage 4 — Historical Price Data

Downloads 5-year daily and 30-day intraday (5-min) OHLCV data from Yahoo Finance for each qualifying ticker. Includes resampling logic to convert intraday bars to daily frequency when needed.

Stage 5 — Price Forecasting with TimeGPT

Sends preprocessed time series to Nixtla's TimeGPT-1 — a transformer-based foundation model trained on 100B+ data points. Returns forecasted values with multi-level confidence intervals:

Forecast Horizon: 7-12 bars
Confidence Bands: 80% (±1.28σ) | 90% (±1.64σ) | 95% (±1.96σ)

Stage 6 — Superposed Epoch Analysis

A statistical technique borrowed from geophysics. Detects price spike events (Z-score > 2σ) and overlays price windows around each event to reveal average market behavior patterns before and after sentiment-driven events.

Stage 7 — Signal Generation

Combines sentiment direction + forecast trajectory into actionable signals:

Sentiment Forecast Signal
Positive (score >= 0.98) Price forecast UP BUY
Negative (score <= 0.02) Price forecast DOWN SELL
Neutral / Conflicting Any HOLD

ML Models

FinBERT — Financial Sentiment Analysis

Property Detail
Model yiyanghkust/finbert-tone
Base Architecture BERT (Bidirectional Encoder Representations from Transformers)
Training Data Financial news, analyst reports, earnings calls
Input Raw text, max 512 tokens
Output 3-class softmax: [negative, neutral, positive]
Framework HuggingFace Transformers + PyTorch
Why FinBERT over VADER/TextBlob? Domain-specific — understands that "short squeeze" is bearish context, not just the word "short"

TimeGPT-1 — Time Series Forecasting

Property Detail
Provider Nixtla
Architecture Transformer-based foundation model
Training Data 100B+ time series data points across industries
Input Historical OHLCV series (JSON)
Output Point forecasts + confidence intervals
Advantage Zero-shot forecasting — no fine-tuning required

spaCy NER — Entity Extraction

Property Detail
Model en_core_web_sm
Task Named Entity Recognition
Target Labels ORG (Organizations)
Post-Processing Fuzzy match against S&P 500 constituents list

Dashboard

Interactive Streamlit web dashboard with 5 tabs:

Tab Description
Sentiment Overview Histogram of sentiment scores, top mentioned tickers bar chart, volatility vs. return scatter plot, keyword frequency analysis
News Headlines Filterable news feed with date/ticker selectors
Historical Prices Multi-ticker price charts with 5-year lookback
Forecast 1-hour simulated price forecast with confidence bands
AI Q&A Natural language query interface — ask "highest sentiment?", "most volatile stock?", "return of AAPL?"

Features: Dark/light mode toggle, sidebar filters (sentiment type, min mentions, score range), fully responsive layout.

streamlit run src/app.py

Tech Stack

Layer Technology Purpose
NLP HuggingFace Transformers, PyTorch FinBERT sentiment analysis
NER spaCy (en_core_web_sm) Company name extraction
Forecasting Nixtla TimeGPT-1 API Price prediction
Market Data yfinance Historical & intraday OHLCV
News NewsAPI Financial headline ingestion
Dashboard Streamlit, Altair Interactive visualization
Analytics Power BI Advanced drill-down reporting
Data Processing Pandas, NumPy ETL and analysis
Visualization Matplotlib, Altair Charts and plots
Execution Google Colab Cloud-based pipeline runs

Quickstart

Prerequisites

Installation

# Clone the repository
git clone https://github.com/INFO-698-InfoSci-Capstone/algorithmic-trading-ai.git
cd algorithmic-trading-ai

# Install dependencies
pip install -r requirements.txt

# Download spaCy language model
python -m spacy download en_core_web_sm

Configuration

Set your API keys in src/Stocex.py:

NEWSAPI_KEY = "your_newsapi_key_here"
# TimeGPT key is set in the forecast_with_timegpt() function headers

Run the Pipeline

# Execute the full trading pipeline
python src/Stocex.py

Launch the Dashboard

# Start the interactive dashboard
streamlit run src/app.py

Project Structure

algorithmic-trading-ai/
│
├── src/
│   ├── Stocex.py                  # Core trading pipeline (7-stage engine)
│   └── app.py                     # Streamlit dashboard application
│
├── Notebooks/
│   ├── Capstone_Project_Trading_AI.ipynb   # Full implementation notebook
│   └── Capstone_Project_Trading_AI.pdf     # Notebook export
│
├── analysis/
│   ├── data/
│   │   ├── sentiment_summary.csv           # Per-ticker sentiment scores
│   │   ├── news_headlines.csv              # Daily ingested headlines
│   │   ├── stock_volatility_data.csv       # Volatility metrics
│   │   ├── scatter_volatility_return.csv   # Risk-return profiles
│   │   ├── combined_price_data.csv         # 5-year price history
│   │   ├── newsapi_last_30_days.csv        # 30-day news archive
│   │   ├── news_keyword_frequency.csv      # Keyword extraction results
│   │   └── historical_price_data/          # Per-ticker CSV files
│   └── logs/
│       └── log.md                          # Development logs
│
├── Final Report/
│   ├── Stocex Dashboard.pbix              # Power BI dashboard
│   └── Stocex Dashboard.pdf               # Dashboard export
│
├── Visualization/
│   └── Stocex Visualization.pdf           # Visual analysis report
│
├── Poster/
│   └── stocex_logo.png                    # Project logo
│
├── requirements.txt                        # Python dependencies
├── CONDUCT.md                              # Code of conduct
├── LICENSE                                 # GNU GPLv3
└── README.md                               # You are here

API Integrations

API Endpoint Auth Rate Limit Data
NewsAPI newsapi.org/v2/everything API Key 100 req/day (free) Headlines, descriptions, sources
TimeGPT api.nixtla.io/forecast Bearer Token Varies by plan Point forecasts, confidence intervals
Yahoo Finance via yfinance None Unofficial OHLCV, intraday, historical
S&P 500 List GitHub CSV None Unlimited Company name to ticker mapping

Data Flow

                    ┌─────────────┐
                    │   NewsAPI    │
                    │  100+ daily  │
                    │  headlines   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  spaCy NER  │──── Extract company names
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   FinBERT   │──── Score sentiment (0.0 → 1.0)
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────▼──────┐    │     ┌──────▼──────┐
       │  yFinance   │    │     │  Fuzzy Match │
       │  5yr daily  │    │     │  → S&P 500   │
       │  30d intra  │    │     │    tickers   │
       └──────┬──────┘    │     └─────────────┘
              │            │
       ┌──────▼──────┐    │
       │  TimeGPT-1  │    │
       │  7-12 bar   │    │
       │  forecast   │    │
       └──────┬──────┘    │
              │            │
              └────────────┤
                           │
                    ┌──────▼──────┐
                    │   Signal    │
                    │  Generator  │
                    │             │
                    │ BUY / SELL  │
                    │   / HOLD   │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────▼──────┐ ┌──▼───┐ ┌──────▼──────┐
       │  Streamlit  │ │ CSV  │ │  Power BI   │
       │  Dashboard  │ │Export │ │  Analytics  │
       └─────────────┘ └──────┘ └─────────────┘

Risk Management

Control Implementation
Signal Confidence Filter Only acts on sentiment scores >= 0.98 — filters out noise
Volume Validation Requires minimum article mention count before generating signals
Volatility Analysis Calculates annualized volatility per ticker for position sizing
Forecast Confidence Bands 80/90/95% intervals provide stop-loss and take-profit zones
Epoch Analysis Validates signals against historical event-driven price patterns

Sample Output

📅 News fetched for 2025-04-10 — 87 articles from 15 sources
🧠 spaCy extracted 23 unique company mentions
📊 FinBERT scored sentiment across all headlines

🔍 Top Tickers by Sentiment:
   AAPL  →  Score: 0.987  |  Mentions: 12  |  Sentiment: POSITIVE
   TSLA  →  Score: 0.034  |  Mentions: 8   |  Sentiment: NEGATIVE
   NCLH  →  Score: 0.512  |  Mentions: 3   |  Sentiment: NEUTRAL

📈 TimeGPT Forecasts:
   AAPL  →  Current: $178.32  |  Forecast (7d): $182.15  |  Direction: ↑
   TSLA  →  Current: $245.10  |  Forecast (7d): $238.47  |  Direction: ↓

✅ AAPL: BUY  (positive sentiment + forecast UP)
❌ TSLA: SELL (negative sentiment + forecast DOWN)
🤝 NCLH: HOLD (neutral — conflicting signals)

What I Learned Building This

  • Designed and implemented a multi-stage data pipeline that chains NLP, time-series, and signal generation
  • Integrated 3 ML models (FinBERT, TimeGPT, spaCy NER) into a single cohesive system
  • Built a production-style dashboard with Streamlit for real-time data exploration
  • Applied Superposed Epoch Analysis — a statistical technique from geophysics — to financial data
  • Worked with 5 external APIs and handled rate limiting, auth, and data normalization
  • Practiced end-to-end ML engineering: data ingestion, preprocessing, inference, post-processing, and visualization

Future Roadmap

  • Live paper trading — integrate with Alpaca API for simulated execution
  • WebSocket price feeds — real-time streaming instead of batch polling
  • Database backend — migrate from CSV to PostgreSQL/TimescaleDB
  • Backtesting framework — walk-forward optimization with Sharpe/Sortino/Calmar metrics
  • CI/CD pipeline — GitHub Actions for automated daily runs and testing
  • Alerting system — Slack/email notifications on high-confidence signals
  • Portfolio-level risk — correlation matrix, VaR, maximum drawdown limits
  • Multi-model ensemble — combine TimeGPT with LSTM and Prophet for robust forecasts

License

Licensed under the GNU General Public License v3.0 — see LICENSE for details.


Built by Stocex Team | University of Arizona — School of Information

"Let the AI read the news, so you can read the profits."

About

AI-powered algorithmic trading system that combines FinBERT sentiment analysis, spaCy NER, and TimeGPT forecasting to generate BUY/SELL/HOLD signals from real-time financial news.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors