OpenTextShield (OTS)

Professional SMS Spam & Phishing Detection API Platform

Open source collaborative AI platform for enhanced telecom messaging security and revenue protection, powered by multilingual BERT (mBERT) technology.

🚀 Quick Start

# Prerequisites
# Docker installation is required. Visit https://docs.docker.com/get-docker/ to install Docker.

# Run the following commands.

docker pull telecomsxchange/opentextshield:latest
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:latest

# Access OpenTextShield

- Frontend Interface: http://localhost:8080
- API Documentation: http://localhost:8002/docs
- API Endpoint: http://localhost:8002/predict/

Build from source and deploy OpenTextShield in your environment within minutes:

# Clone the repository
git clone https://github.com/TelecomsXChangeAPi/OpenTextShield.git
cd OpenTextShield

# Start both API and frontend (recommended)
./scripts/start.sh

# Or build using Docker
# Build and run (includes 679MB mBERT model)
docker build -t opentextshield .
docker run -d -p 8002:8002 -p 8080:8080 opentextshield

# Alternative if port 8080 is busy
docker run -d -p 8002:8002 -p 8081:8080 opentextshield

Access Points:

Frontend Interface: http://localhost:8080
API Documentation: http://localhost:8002/docs
API Endpoint: http://localhost:8002/predict/

✨ Key Features

🌍 Multilingual Support: Built on mBERT with coverage for 104+ languages; currently trained on 10 languages for SMS classification.
⚡ Real-time Classification: Professional API with <200ms response time
🔒 Advanced Detection: Spam, phishing, and ham classification
📊 Professional Interface: Research-grade web interface with metrics
🐳 Docker Ready: Complete containerized deployment
🔧 API First: RESTful API with comprehensive documentation
📈 Revenue Protection: Optional revenue assurance features

🛠 API Usage

OpenTextShield provides both a legacy API and TMForum-compliant API endpoints.

Legacy API (Direct Classification)

Quick Test

# Test the legacy API endpoint
curl -X POST "http://localhost:8002/predict/" \
  -H "Content-Type: application/json" \
  -d '{"text":"Your SMS content here","model":"ots-mbert"}'

Response Format

{
  "label": "ham|spam|phishing",
  "probability": 0.95,
  "processing_time": 0.15,
  "model_info": {
    "name": "OTS_mBERT",
    "version": "2.1",
    "author": "TelecomsXChange (TCXC)"
  }
}
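
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the service is reachable at http://localhost:8002 as in the Quick Start, and the helper names (build_predict_request, parse_prediction, classify) are illustrative, not part of OpenTextShield.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # assumption: local Quick Start deployment


def build_predict_request(text, model="ots-mbert", base_url=BASE_URL):
    """Build the POST request for the legacy /predict/ endpoint."""
    payload = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body):
    """Extract (label, probability) from a JSON response body."""
    result = json.loads(body)
    return result["label"], float(result["probability"])


def classify(text, timeout=10.0):
    """Send one message to the API and return (label, probability)."""
    request = build_predict_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read())
```

With the container running, classify("Your SMS content here") should return a tuple such as a label and its probability, matching the response format above.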

TMForum API (TMF922 - AI Inference Job Management)

Create Inference Job

# Create a TMForum-compliant inference job
curl -X POST "http://localhost:8002/tmf-api/aiInferenceJob" \
  -H "Content-Type: application/json" \
  -d '{
    "priority": "normal",
    "input": {
      "inputType": "text",
      "inputFormat": "plain",
      "inputData": {"text": "Free money! Click here now!"}
    },
    "model": {
      "id": "ots-mbert",
      "name": "OpenTextShield mBERT",
      "version": "2.1",
      "type": "bert",
      "capabilities": ["text-classification", "multilingual"]
    },
    "name": "SMS Classification Job"
  }'

Check Job Status

# Check inference job status (replace JOB_ID with actual ID)
curl -X GET "http://localhost:8002/tmf-api/aiInferenceJob/JOB_ID"

Response Format (Completed Job)

{
  "id": "inference-job-123",
  "state": "completed",
  "priority": "normal",
  "input": {
    "inputType": "text",
    "inputFormat": "plain",
    "inputData": {"text": "Free money! Click here now!"}
  },
  "output": {
    "outputType": "classification",
    "outputFormat": "json",
    "outputData": {
      "label": "spam",
      "probability": 0.95
    },
    "confidence": 0.95,
    "outputMetadata": {
      "model_used": "OTS_mBERT",
      "model_version": "2.1",
      "processing_time_seconds": 0.15
    }
  },
  "model": {
    "id": "ots-mbert",
    "name": "OpenTextShield mBERT",
    "version": "2.1",
    "type": "bert",
    "capabilities": ["text-classification", "multilingual"]
  },
  "creationDate": "2024-01-15T10:30:00Z",
  "completionDate": "2024-01-15T10:30:15Z",
  "processingTimeMs": 150,
  "type": "TextClassificationInferenceJob"
}
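
The create-then-poll flow above can be sketched in Python. This is an illustrative outline, not an official client: the function names are hypothetical, fetch_job is a caller-supplied callable that performs the GET on /tmf-api/aiInferenceJob/JOB_ID and returns the parsed JSON, and only the "completed" state is taken from the example response, so any other state is simply treated as "not ready yet".

```python
import time


def build_inference_job(text, name="SMS Classification Job"):
    """Build the request body for POST /tmf-api/aiInferenceJob."""
    return {
        "priority": "normal",
        "input": {
            "inputType": "text",
            "inputFormat": "plain",
            "inputData": {"text": text},
        },
        "model": {
            "id": "ots-mbert",
            "name": "OpenTextShield mBERT",
            "version": "2.1",
            "type": "bert",
            "capabilities": ["text-classification", "multilingual"],
        },
        "name": name,
    }


def extract_result(job):
    """Return (label, probability) for a completed job, otherwise None."""
    if job.get("state") != "completed":
        return None
    data = job["output"]["outputData"]
    return data["label"], data["probability"]


def wait_for_job(fetch_job, job_id, attempts=10, delay_s=0.5):
    """Poll fetch_job(job_id) until the job completes or attempts run out."""
    for attempt in range(attempts):
        result = extract_result(fetch_job(job_id))
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise TimeoutError(f"job {job_id} did not complete after {attempts} polls")
```

Passing the HTTP call in as fetch_job keeps the polling logic independent of any particular HTTP library and makes it easy to exercise with canned responses.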

List Inference Jobs

# List all inference jobs
curl -X GET "http://localhost:8002/tmf-api/aiInferenceJob"

📋 Installation Guide

Requirements

Python 3.12
4GB RAM minimum
Docker (optional)

Local Setup

# Create virtual environment
python3.12 -m venv ots
source ots/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Start the platform
./scripts/start.sh

Docker Deployment

🛡️ Security-Enhanced Docker Options

Option 1: Enhanced Security (Recommended)

# Multi-stage build with non-root user - best balance of security and functionality
docker build -f Dockerfile.secure -t opentextshield:secure .
docker run -d -p 8002:8002 -p 8081:8080 opentextshield:secure

Option 2: Standard Build

# Standard build with security updates
docker build -t opentextshield .
docker run -d -p 8002:8002 -p 8081:8080 opentextshield

Option 3: Maximum Security (Advanced)

# Ultra-secure distroless build - minimal attack surface (API only)
docker build -f Dockerfile.distroless -t opentextshield:distroless .
docker run -d -p 8002:8002 opentextshield:distroless

🏗️ Architecture-Specific Builds

x86_64 (Intel/AMD) Architecture:

# Enhanced security for x86
docker buildx build --platform linux/amd64 -f Dockerfile.secure -t opentextshield:x86-secure .

# Standard x86 build
docker buildx build --platform linux/amd64 -t telecomsxchange/opentextshield:2.1-x86-v2 .

ARM64 (Apple Silicon) Architecture:

# Enhanced security for ARM64
docker buildx build --platform linux/arm64 -f Dockerfile.secure -t opentextshield:arm64-secure .

📦 Pre-built Images

# Latest stable releases
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:latest
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:2.1-x86-v2

# Using Docker Compose (recommended for production)
docker-compose up -d

Container Access:

API: http://localhost:8002
Frontend: http://localhost:8080 (or 8081)
Health: http://localhost:8002/health

Security Benefits:

🔒 Enhanced: 60-80% fewer vulnerabilities, non-root execution, multi-stage builds
🛡️ Distroless: Minimal attack surface, no shell access, maximum security
📦 Smaller images: Optimized builds reduce image size and vulnerabilities

Architecture Support:

ARM64 (Apple Silicon): telecomsxchange/opentextshield:latest
x86_64 (Intel/AMD): telecomsxchange/opentextshield:2.1-x86-v2

🏗 Architecture

Core Components

API Interface (src/api_interface/)

Modern FastAPI application with professional structure
Pydantic models for request/response validation
Comprehensive error handling and logging
Security middleware and CORS support

mBERT Model (src/mBERT/training/model-training/)

Multilingual BERT optimized for SMS classification
Support for 104+ languages with cross-lingual transfer learning
Apple Silicon MLX optimization available

Frontend Interface (frontend/)

Professional research-grade web interface
Real-time system monitoring and metrics
Technical details and performance indicators

Performance

Inference Speed: 54 messages/second (Apple Silicon M1 Pro, single-request)
Dynamic Batching: Coalesces concurrent requests into padded GPU batches; on an NVIDIA T4 (FP16, batch=32) this unlocks hundreds of messages per second (MPS) per instance
Response Time: <200ms typical (single-request); per-message cost drops sharply under load thanks to batching
Languages: 104+ supported via mBERT
Accuracy: Production-ready classification
Tuning: OTS_MAX_BATCH_SIZE, OTS_BATCH_WAIT_MS, OTS_MAX_TEXT_LENGTH, OTS_USE_FP16 env vars
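
To illustrate the dynamic batching idea, here is a minimal asyncio sketch. It is not OpenTextShield's actual batcher: the MicroBatcher class and the predict_batch callable are hypothetical, and the default values used for OTS_MAX_BATCH_SIZE and OTS_BATCH_WAIT_MS are assumptions chosen for the example.

```python
import asyncio
import os

# Env var names come from the tuning list above; the defaults are assumptions.
MAX_BATCH_SIZE = int(os.environ.get("OTS_MAX_BATCH_SIZE", "32"))
BATCH_WAIT_MS = float(os.environ.get("OTS_BATCH_WAIT_MS", "5"))


class MicroBatcher:
    """Collects concurrent requests and runs them through the model together."""

    def __init__(self, predict_batch, max_batch=MAX_BATCH_SIZE, wait_ms=BATCH_WAIT_MS):
        self._predict_batch = predict_batch  # callable: list[str] -> list[result]
        self._max_batch = max_batch
        self._wait_s = wait_ms / 1000.0
        self._queue = asyncio.Queue()

    async def submit(self, text):
        """Queue one message and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def run(self):
        """Worker loop: take one request, then wait briefly for more to join."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]
            deadline = loop.time() + self._wait_s
            while len(batch) < self._max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                # One synchronous model call for the whole batch; a real
                # service would likely run this off the event loop.
                results = self._predict_batch([text for text, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)
```

A larger batch size raises throughput under load, while a longer wait window adds latency for a lone request, which is the trade-off the two environment variables expose.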

🧪 Testing

# Run comprehensive tests
cd src/mBERT/tests
python run_all_tests.py all

# Stress testing
python test_stress.py 1000
python stressTest_20k_mlx_api.py

📚 Research Background

OpenTextShield leverages cutting-edge AI research to provide real-time SMS spam and phishing detection across 104+ languages. Our research focuses on the practical application of multilingual BERT (mBERT) technology for telecom security challenges.

Research Highlights:

Comparative analysis of AI models for SMS classification
Multilingual spam detection using mBERT architecture
Real-time processing optimization for telecom applications
Community-driven approach to dataset expansion

Read Full Research Paper →

🤝 Contributing

Ways to Contribute

🗃️ Dataset Contributions

We need multilingual datasets for training. Required format:

text,label
"Your verification code is 12345",ham
"Win $1000! Click here now!",spam
"Your account is locked. Visit fake-bank.com",phishing

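Before submitting a dataset, it can help to check it against the required format. The sketch below is a hypothetical helper, not a script shipped with the repository; it assumes the header is exactly text,label and that labels are the lowercase values ham, spam, and phishing.

```python
import csv
import io

ALLOWED_LABELS = {"ham", "spam", "phishing"}  # assumption: lowercase only


def validate_dataset(csv_text):
    """Return a list of (line_number, problem) tuples; empty means valid."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != ["text", "label"]:
        return [(1, "header must be exactly: text,label")]
    problems = []
    for row in reader:
        line = reader.line_num
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            problems.append((line, f"expected 2 columns, found {len(row)}"))
        elif not row[0].strip():
            problems.append((line, "empty text"))
        elif row[1].strip() not in ALLOWED_LABELS:
            problems.append((line, f"unknown label: {row[1]!r}"))
    return problems
```

Calling validate_dataset(open("my_dataset.csv", encoding="utf-8").read()) on a file (the filename is a placeholder) returns an empty list when every row is well formed.
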
🔧 Development

API improvements and optimizations
Frontend enhancements
Model training and evaluation
Documentation and testing

🌍 Localization

Translate interface and documentation
Test models in your language
Provide linguistic insights for regional variations

💡 Research & Testing

Performance benchmarking
Security analysis
Integration testing with telecom systems

Getting Started

Fork the repository
Check CONTRIBUTING.md for detailed guidelines
Join discussions in GitHub Issues
Submit Pull Requests with improvements

🔧 Development

Model Training

# Train new mBERT model
cd src/mBERT/training/model-training/
python train_ots_improved.py

# Test model performance
python test_training.py

Frontend Development

# Frontend is a single HTML file with embedded CSS/JS
# Edit frontend/index.html for customizations
# Restart ./scripts/start.sh to see changes

🚀 Production Deployment

Docker Production

# Multi-arch production build
docker buildx build --platform linux/amd64,linux/arm64 -t your-registry/opentextshield .

# Production compose
docker-compose -f docker-compose.prod.yml up -d

Kubernetes

# Example k8s deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opentextshield
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opentextshield
  template:
    spec:
      containers:
      - name: ots
        image: telecomsxchange/opentextshield:latest
        ports:
        - containerPort: 8002
        - containerPort: 8080

📊 Monitoring & Analytics

Health Checks

API Health: GET /health
Model Status: GET /model/status
Prometheus Metrics: GET /metrics — batcher throughput, queue depth, batch-size histogram, inference time
System Metrics: Built-in performance monitoring
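
A deployment script can use the health endpoint to wait until the API is ready. The following is a rough sketch with hypothetical helper names; it assumes the API listens on http://localhost:8002 and that an HTTP 200 from GET /health means the service is healthy, since the response body format is not described here.

```python
import time
import urllib.request


def probe_health(base_url="http://localhost:8002", timeout=2.0):
    """Return True if GET /health answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_until_healthy(probe=probe_health, attempts=30, delay_s=1.0):
    """Call probe() until it succeeds; return False if all attempts fail."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Because the probe is passed in as a callable, the same loop can be pointed at GET /model/status or another base URL without changing the retry logic.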

Logs

API Logs: Structured JSON logging with request tracking
Prediction Logs: Classification results and performance metrics
Error Tracking: Comprehensive error handling and reporting

🔐 Security Features

Input Validation: Pydantic models with strict validation
Rate Limiting: Configurable API rate limits
CORS Protection: Configurable cross-origin policies
Secure Headers: Standard security headers implemented

💼 Enterprise Features

Revenue Protection

Dynamic pricing based on message content analysis
Grey route detection and mitigation
Fraud pattern identification
Premium message routing optimization

Integration APIs

RESTful API with OpenAPI documentation
Webhook support for real-time notifications
Batch processing capabilities
Custom model loading support

📖 Documentation

Installation Guide - Detailed setup instructions
API Documentation - Interactive API explorer
Model Training Guide - Train custom models
Testing Guide - Comprehensive testing suite
Docker Guide - Container deployment options

🌟 About TelecomsXChange (TCXC)

OpenTextShield is pioneered by TelecomsXChange, a leading telecommunications platform provider. TCXC is committed to releasing cutting-edge open-source AI tools for the global telecom community.

Key Initiatives:

First pre-trained open-source mBERT model for SMS classification
Integration with TCXC's SMPP Stack for real-time processing
Community-driven approach to continuous improvement
Revenue protection features for telecom operators

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Additional Resources

Research Paper - Complete academic research
BERT Documentation - Original BERT paper
FastAPI Documentation - API framework docs
MLX Framework - Apple Silicon optimization

⭐ Star this repository if you find it helpful!

Made with ❤️ by the TelecomsXChange team and the open source community.
