OpenTextShield (OTS)

Professional SMS Spam & Phishing Detection API Platform

Open source collaborative AI platform for enhanced telecom messaging security and revenue protection, powered by multilingual BERT (mBERT) technology.

🚀 Quick Start

OpenTextShield - Docker Deployment

# Prerequisites
# Docker installation is required. Visit https://docs.docker.com/get-docker/ to install Docker.

# Run the following commands.

docker pull telecomsxchange/opentextshield:latest
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:latest

# Access OpenTextShield

- Frontend Interface: http://localhost:8080
- API Documentation: http://localhost:8002/docs
- API Endpoint: http://localhost:8002/predict/

Build from source and deploy OpenTextShield in your environment within minutes:

# Clone the repository
git clone https://github.com/TelecomsXChangeAPi/OpenTextShield.git
cd OpenTextShield

# Start both API and frontend (recommended)
./scripts/start.sh

# Or build using Docker
# Build and run (includes 679MB mBERT model)
docker build -t opentextshield .
docker run -d -p 8002:8002 -p 8080:8080 opentextshield

# Alternative if port 8080 is busy
docker run -d -p 8002:8002 -p 8081:8080 opentextshield

Access Points:

✨ Key Features

  • 🌍 Multilingual Support: Built on mBERT with coverage for 104+ languages; currently trained on 10 languages for SMS classification.
  • Real-time Classification: Professional API with a typical response time under 200 ms
  • 🔒 Advanced Detection: Spam, phishing, and ham classification
  • 📊 Professional Interface: Research-grade web interface with metrics
  • 🐳 Docker Ready: Complete containerized deployment
  • 🔧 API First: RESTful API with comprehensive documentation
  • 📈 Revenue Protection: Optional revenue assurance features

🛠 API Usage

OpenTextShield provides two sets of endpoints: a legacy API for direct classification and a TMForum-compliant API for inference job management.

Legacy API (Direct Classification)

Quick Test

# Test the legacy API endpoint
curl -X POST "http://localhost:8002/predict/" \
  -H "Content-Type: application/json" \
  -d '{"text":"Your SMS content here","model":"ots-mbert"}'

Response Format

{
  "label": "ham|spam|phishing",
  "probability": 0.95,
  "processing_time": 0.15,
  "model_info": {
    "name": "OTS_mBERT",
    "version": "2.1",
    "author": "TelecomsXChange (TCXC)"
  }
}
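
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_predict_request`, `parse_prediction`, and `classify` are illustrative and not part of OpenTextShield, and the field names are taken from the request and response shown above.

```python
import json
import urllib.request

API_URL = "http://localhost:8002/predict/"  # endpoint from the Quick Start section


def build_predict_request(text, model="ots-mbert", url=API_URL):
    """Build (but do not send) a POST request for the legacy endpoint."""
    body = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw):
    """Extract (label, probability) from a JSON response body."""
    data = json.loads(raw)
    return data["label"], float(data["probability"])


def classify(text, timeout=5.0):
    """Send the request to a running instance and return (label, probability)."""
    request = build_predict_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_prediction(resp.read().decode("utf-8"))
```

With a container running locally, `classify("Your SMS content here")` should return a label from `ham`, `spam`, or `phishing` together with its probability.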

TMForum API (TMF922 - AI Inference Job Management)

Create Inference Job

# Create a TMForum-compliant inference job
curl -X POST "http://localhost:8002/tmf-api/aiInferenceJob" \
  -H "Content-Type: application/json" \
  -d '{
    "priority": "normal",
    "input": {
      "inputType": "text",
      "inputFormat": "plain",
      "inputData": {"text": "Free money! Click here now!"}
    },
    "model": {
      "id": "ots-mbert",
      "name": "OpenTextShield mBERT",
      "version": "2.1",
      "type": "bert",
      "capabilities": ["text-classification", "multilingual"]
    },
    "name": "SMS Classification Job"
  }'

Check Job Status

# Check inference job status (replace JOB_ID with actual ID)
curl -X GET "http://localhost:8002/tmf-api/aiInferenceJob/JOB_ID"

Response Format (Completed Job)

{
  "id": "inference-job-123",
  "state": "completed",
  "priority": "normal",
  "input": {
    "inputType": "text",
    "inputFormat": "plain",
    "inputData": {"text": "Free money! Click here now!"}
  },
  "output": {
    "outputType": "classification",
    "outputFormat": "json",
    "outputData": {
      "label": "spam",
      "probability": 0.95
    },
    "confidence": 0.95,
    "outputMetadata": {
      "model_used": "OTS_mBERT",
      "model_version": "2.1",
      "processing_time_seconds": 0.15
    }
  },
  "model": {
    "id": "ots-mbert",
    "name": "OpenTextShield mBERT",
    "version": "2.1",
    "type": "bert",
    "capabilities": ["text-classification", "multilingual"]
  },
  "creationDate": "2024-01-15T10:30:00Z",
  "completionDate": "2024-01-15T10:30:15Z",
  "processingTimeMs": 150,
  "type": "TextClassificationInferenceJob"
}
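
For callers working in Python, the job body and the completed-job response above can be handled with two small helpers. This is a sketch under assumptions: `build_inference_job` and `extract_result` are hypothetical names, and only the `completed` state shown above is recognized because no other state values are documented here.

```python
def build_inference_job(text, name="SMS Classification Job"):
    """Return the request body for POST /tmf-api/aiInferenceJob."""
    return {
        "priority": "normal",
        "input": {
            "inputType": "text",
            "inputFormat": "plain",
            "inputData": {"text": text},
        },
        "model": {
            "id": "ots-mbert",
            "name": "OpenTextShield mBERT",
            "version": "2.1",
            "type": "bert",
            "capabilities": ["text-classification", "multilingual"],
        },
        "name": name,
    }


def extract_result(job):
    """Return (label, probability) for a completed job, otherwise None."""
    if job.get("state") != "completed":
        return None
    data = job["output"]["outputData"]
    return data["label"], data["probability"]
```

Serialize the dictionary with `json.dumps` before posting it, and pass the decoded JSON of the status response to `extract_result`.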

List Inference Jobs

# List all inference jobs
curl -X GET "http://localhost:8002/tmf-api/aiInferenceJob"

📋 Installation Guide

Requirements

  • Python 3.12
  • 4GB RAM minimum
  • Docker (optional)

Local Setup

# Create virtual environment
python3.12 -m venv ots
source ots/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Start the platform
./scripts/start.sh

Docker Deployment

🛡️ Security-Enhanced Docker Options

Option 1: Enhanced Security (Recommended)

# Multi-stage build with non-root user - best balance of security and functionality
docker build -f Dockerfile.secure -t opentextshield:secure .
docker run -d -p 8002:8002 -p 8081:8080 opentextshield:secure

Option 2: Standard Build

# Standard build with security updates
docker build -t opentextshield .
docker run -d -p 8002:8002 -p 8081:8080 opentextshield

Option 3: Maximum Security (Advanced)

# Ultra-secure distroless build - minimal attack surface (API only)
docker build -f Dockerfile.distroless -t opentextshield:distroless .
docker run -d -p 8002:8002 opentextshield:distroless

🏗️ Architecture-Specific Builds

x86_64 (Intel/AMD) Architecture:

# Enhanced security for x86
docker buildx build --platform linux/amd64 -f Dockerfile.secure -t opentextshield:x86-secure .

# Standard x86 build
docker buildx build --platform linux/amd64 -t telecomsxchange/opentextshield:2.1-x86-v2 .

ARM64 (Apple Silicon) Architecture:

# Enhanced security for ARM64
docker buildx build --platform linux/arm64 -f Dockerfile.secure -t opentextshield:arm64-secure .

📦 Pre-built Images

# Latest stable releases
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:latest
docker run -d -p 8002:8002 -p 8080:8080 telecomsxchange/opentextshield:2.1-x86-v2

# Using Docker Compose (recommended for production)
docker-compose up -d

Container Access:

Security Benefits:

  • 🔒 Enhanced: 60-80% fewer vulnerabilities, non-root execution, multi-stage builds
  • 🛡️ Distroless: Minimal attack surface, no shell access, maximum security
  • 📦 Smaller images: Optimized builds reduce image size and vulnerabilities

Architecture Support:

  • ARM64 (Apple Silicon): telecomsxchange/opentextshield:latest
  • x86_64 (Intel/AMD): telecomsxchange/opentextshield:2.1-x86-v2

🏗 Architecture

Core Components

API Interface (src/api_interface/)

  • Modern FastAPI application with professional structure
  • Pydantic models for request/response validation
  • Comprehensive error handling and logging
  • Security middleware and CORS support

mBERT Model (src/mBERT/training/model-training/)

  • Multilingual BERT optimized for SMS classification
  • Support for 104+ languages with cross-lingual transfer learning
  • Apple Silicon MLX optimization available

Frontend Interface (frontend/)

  • Professional research-grade web interface
  • Real-time system monitoring and metrics
  • Technical details and performance indicators

Performance

  • Inference Speed: 54 messages/second (Apple Silicon M1 Pro, single-request)
  • Dynamic Batching: Coalesces concurrent requests into padded GPU batches; on an NVIDIA T4 (FP16, batch=32) this unlocks hundreds of messages per second (MPS) per instance
  • Response Time: <200ms typical (single-request); per-message cost drops sharply under load thanks to batching
  • Languages: 104+ supported via mBERT
  • Accuracy: Production-ready classification
  • Tuning: OTS_MAX_BATCH_SIZE, OTS_BATCH_WAIT_MS, OTS_MAX_TEXT_LENGTH, OTS_USE_FP16 env vars
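
To illustrate what coalescing concurrent requests means, here is a generic micro-batching sketch built on `asyncio`. It is not the project's batcher: the class name `MicroBatcher`, the `predict_batch` callable, and the default values for the two environment variables are all assumptions made for the example.

```python
import asyncio
import os

# Variable names come from the tuning list above; the defaults are assumptions.
MAX_BATCH_SIZE = int(os.environ.get("OTS_MAX_BATCH_SIZE", "32"))
BATCH_WAIT_MS = float(os.environ.get("OTS_BATCH_WAIT_MS", "5"))


class MicroBatcher:
    """Coalesce concurrent submit() calls into batches for one model call."""

    def __init__(self, predict_batch, max_batch=MAX_BATCH_SIZE, wait_ms=BATCH_WAIT_MS):
        self._predict_batch = predict_batch  # callable: list of texts -> list of results
        self._max_batch = max_batch
        self._wait_s = wait_ms / 1000.0
        self._queue = asyncio.Queue()

    async def submit(self, text):
        """Queue one message and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def run(self):
        """Worker loop: collect up to max_batch items or until the wait expires."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until the first item arrives
            deadline = loop.time() + self._wait_s
            while len(batch) < self._max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One model call for the whole batch, then fan results back out.
            results = self._predict_batch([text for text, _ in batch])
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)
```

The trade-off is the usual one: a larger batch size or a longer wait raises throughput under load but adds up to the wait time in latency for a lone request.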

🧪 Testing

# Run comprehensive tests
cd src/mBERT/tests
python run_all_tests.py all

# Stress testing
python test_stress.py 1000
python stressTest_20k_mlx_api.py

📚 Research Background

OpenTextShield leverages cutting-edge AI research to provide real-time SMS spam and phishing detection across 104+ languages. Our research focuses on the practical application of multilingual BERT (mBERT) technology for telecom security challenges.

Research Highlights:

  • Comparative analysis of AI models for SMS classification
  • Multilingual spam detection using mBERT architecture
  • Real-time processing optimization for telecom applications
  • Community-driven approach to dataset expansion

Read Full Research Paper →

🤝 Contributing

Ways to Contribute

🗃️ Dataset Contributions

We need multilingual datasets for training. Required format:

text,label
"Your verification code is 12345",ham
"Win $1000! Click here now!",spam
"Your account is locked. Visit fake-bank.com",phishing
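
Before submitting a dataset, it can help to check it against this format. The following is a minimal sketch using Python's `csv` module; `validate_dataset` is a hypothetical helper, not a script shipped with the repository, and it assumes the only valid labels are the three shown above.

```python
import csv
import io

ALLOWED_LABELS = {"ham", "spam", "phishing"}


def validate_dataset(csv_text):
    """Return a list of (line_number, problem) tuples; an empty list means valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["text", "label"]:
        return [(1, "header must be exactly: text,label")]
    problems = []
    for row in reader:
        line = reader.line_num
        if row.get(None):
            # Extra fields usually mean the text contains an unquoted comma.
            problems.append((line, "too many columns (quote text that contains commas)"))
            continue
        if not (row["text"] or "").strip():
            problems.append((line, "empty text"))
        if (row["label"] or "").strip() not in ALLOWED_LABELS:
            problems.append((line, f"unknown label: {row['label']!r}"))
    return problems
```

To check a file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the contents to `validate_dataset`.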

🔧 Development

  • API improvements and optimizations
  • Frontend enhancements
  • Model training and evaluation
  • Documentation and testing

🌍 Localization

  • Translate interface and documentation
  • Test models in your language
  • Provide linguistic insights for regional variations

💡 Research & Testing

  • Performance benchmarking
  • Security analysis
  • Integration testing with telecom systems

Getting Started

  1. Fork the repository
  2. Check CONTRIBUTING.md for detailed guidelines
  3. Join discussions in GitHub Issues
  4. Submit Pull Requests with improvements

🔧 Development

Model Training

# Train new mBERT model
cd src/mBERT/training/model-training/
python train_ots_improved.py

# Test model performance
python test_training.py

Frontend Development

# Frontend is a single HTML file with embedded CSS/JS
# Edit frontend/index.html for customizations
# Restart ./scripts/start.sh to see changes

🚀 Production Deployment

Docker Production

# Multi-arch production build
docker buildx build --platform linux/amd64,linux/arm64 -t your-registry/opentextshield .

# Production compose
docker-compose -f docker-compose.prod.yml up -d

Kubernetes

# Example k8s deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opentextshield
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opentextshield
  template:
    metadata:
      labels:
        app: opentextshield  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: ots
        image: telecomsxchange/opentextshield:latest
        ports:
        - containerPort: 8002
        - containerPort: 8080

📊 Monitoring & Analytics

Health Checks

  • API Health: GET /health
  • Model Status: GET /model/status
  • Prometheus Metrics: GET /metrics — batcher throughput, queue depth, batch-size histogram, inference time
  • System Metrics: Built-in performance monitoring
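
These endpoints can be polled from a simple script. The sketch below assumes they are served on the API port (8002) and answer with HTTP 200 when healthy; neither the response bodies nor the status codes are documented here, so treat both as assumptions. The names `http_status` and `probe` are illustrative.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8002"  # API port from the Quick Start section (assumed)
ENDPOINTS = ("/health", "/model/status", "/metrics")


def http_status(url, timeout=3.0):
    """Return the HTTP status code for a GET, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def probe(base_url=BASE_URL, fetch=http_status):
    """Map each monitoring endpoint to True when it answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ENDPOINTS}
```

Calling `probe()` against a running container returns a dictionary keyed by path; the `fetch` parameter exists so the function can be exercised without a live server.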

Logs

  • API Logs: Structured JSON logging with request tracking
  • Prediction Logs: Classification results and performance metrics
  • Error Tracking: Comprehensive error handling and reporting

🔐 Security Features

  • Input Validation: Pydantic models with strict validation
  • Rate Limiting: Configurable API rate limits
  • CORS Protection: Configurable cross-origin policies
  • Secure Headers: Standard security headers implemented

💼 Enterprise Features

Revenue Protection

  • Dynamic pricing based on message content analysis
  • Grey route detection and mitigation
  • Fraud pattern identification
  • Premium message routing optimization

Integration APIs

  • RESTful API with OpenAPI documentation
  • Webhook support for real-time notifications
  • Batch processing capabilities
  • Custom model loading support

📖 Documentation

🌟 About TelecomsXChange (TCXC)

OpenTextShield is pioneered by TelecomsXChange, a leading telecommunications platform provider. TCXC is committed to releasing cutting-edge open-source AI tools for the global telecom community.

Key Initiative:

  • First pre-trained open-source mBERT model for SMS classification
  • Integration with TCXC's SMPP Stack for real-time processing
  • Community-driven approach to continuous improvement
  • Revenue protection features for telecom operators

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Additional Resources


⭐ Star this repository if you find it helpful!

Made with ❤️ by the TelecomsXChange team and the open source community.