Job Fraud Detection Web Application

A Django-based web application for job posting fraud detection using machine learning. Analyze datasets, explore four ML algorithms, compare performance, and predict fraud risk for individual postings.

Features

Dashboard: Quick access to all major sections with intuitive cards
Dataset Page: View dataset statistics (total, real, fake jobs) and data splits
Algorithms Page: Explore four ML models
- Logistic Regression (TF-IDF baseline)
- Random Forest (ensemble method)
- CNN (1D convolutional neural network)
- LSTM (recurrent neural network)
- Displays: confusion matrix, ROC curves, AUC, F1 scores, data distributions
Comparison Page: Side-by-side performance analysis of all models
Prediction Page: Input a job posting to predict fraud probability
- Fields: title, description, location
- Returns: fraud/legit label and confidence score
- Includes guidelines and sample postings

Project Structure

jobfraud/
├── manage.py                     # Django management script
├── jobfraud/                     # Project configuration
│   ├── settings.py              # Django settings (INSTALLED_APPS, templates, static paths)
│   ├── urls.py                  # Root URL routing
│   ├── wsgi.py                  # WSGI app
│   └── asgi.py
├── core/                         # Main app
│   ├── views.py                 # View functions (dashboard, dataset, algorithms, etc.)
│   ├── urls.py                  # App URL patterns
│   ├── models.py                # Django models (optional)
│   ├── admin.py                 # Django admin registration
│   └── apps.py
├── templates/                    # HTML templates
│   ├── base.html                # Base layout with sidebar navigation
│   ├── dashboard.html           # Dashboard with cards
│   ├── dataset.html             # Dataset stats
│   ├── algorithms.html          # Algorithm details and metrics
│   ├── comparison.html          # Model comparison
│   └── prediction.html          # Fraud prediction form
├── static/                       # Static files
│   └── css/
│       └── styles.css           # Custom CSS styling
├── ml/                          # Machine learning
│   ├── train.py                 # Training script (Logistic Regression, Random Forest, CNN, LSTM)
│   ├── data/                    # Dataset directory (place jobs_dataset.csv here)
│   ├── models/                  # Trained model artifacts (joblib, .h5)
│   ├── metrics/                 # Model metrics JSON files
│   └── plots/                   # Generated plots (confusion matrix, ROC, etc.)
└── .github/                      # GitHub-specific
    └── copilot-instructions.md  # Copilot setup instructions

Screenshots

Installation & Setup

1. Clone or navigate to the project

cd path\to\jobfraud

2. Activate virtual environment

The project uses a Python virtual environment. Activate it:

Windows (PowerShell):

.venv\Scripts\Activate.ps1

Windows (Command Prompt):

.venv\Scripts\activate.bat

3. Install dependencies

pip install django djangorestframework scikit-learn pandas numpy matplotlib seaborn plotly joblib tensorflow-cpu

4. Run migrations

python manage.py migrate

5. Start the development server

python manage.py runserver

The app will be available at http://localhost:8000

Pages Overview

Dashboard (`/`)

Four cards linking to main sections
Quick navigation to dataset, algorithms, comparison, and prediction

Dataset (`/dataset/`)

Total jobs: 17,000 (placeholder; replace with the actual count)
Real jobs vs fake jobs breakdown
Dataset statistics (avg words, median title length, etc.)
List of combined datasets and dataset variants
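
As a rough illustration of how the figures on this page could be derived, the sketch below computes them from rows that use the title, description, and label columns described under "Load Your Dataset". The function name `dataset_summary` is hypothetical, and measuring title length in characters is an assumption; the real views may compute these differently.

```python
import statistics


def dataset_summary(rows):
    """Summarise posting dicts that have 'title', 'description', and 'label' keys."""
    if not rows:
        raise ValueError("rows must not be empty")
    # label 1 = fraudulent, label 0 = legitimate (as in the dataset description)
    fake = sum(1 for row in rows if int(row["label"]) == 1)
    word_counts = [len(row["description"].split()) for row in rows]
    title_lengths = [len(row["title"]) for row in rows]  # assumption: length in characters
    return {
        "total": len(rows),
        "real": len(rows) - fake,
        "fake": fake,
        "avg_description_words": statistics.mean(word_counts),
        "median_title_length": statistics.median(title_lengths),
    }
```

The returned dictionary can be passed to the dataset template as context in place of the placeholder values.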

Algorithms (`/algorithms/`)

Logistic Regression: Fast, interpretable baseline
Random Forest: Ensemble, handles non-linear patterns
CNN: Convolutional layers for text, feature extraction
LSTM: Recurrent network with bidirectional processing

For each model, displays:

Split ratio (e.g., 70/15/15 for train/val/test)
AUC, Accuracy, F1 score
Confusion matrix
ROC curve (placeholder)
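
To make the displayed metrics concrete, here is a minimal plain-Python sketch of how the confusion matrix, accuracy, F1, and AUC relate to each other. The function names are illustrative and not taken from the project; the training script may well rely on scikit-learn's metric functions instead. Label 1 (fraudulent) is treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) with label 1 (fraudulent) as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


def roc_auc(y_true, scores):
    """AUC as the share of (fraud, legit) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy and F1 need hard 0/1 predictions, while AUC is computed from the predicted fraud probabilities, so the two are worth reporting side by side on an imbalanced dataset.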

Comparison (`/comparison/`)

Tabular comparison of AUC, F1, Accuracy across all four models
Key insights (e.g., "tree-based models improve recall on minority class")
Placeholder for Plotly/Chart.js comparison chart

Prediction (`/prediction/`)

Form with three inputs: job title, description, location
Returns fraud probability and risk label (Fraud or Legit)
Right sidebar with:
- How-to guidelines
- Sample job postings (Data Scientist, Software Engineer)
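
The form handling behind this page could be sketched roughly as follows. Both helper names are hypothetical, and requiring a non-empty description is an assumption rather than documented behaviour; the 0.5 threshold and the "Fraud"/"Legit" labels follow the `predict` example in "Integrate Trained Models into Views".

```python
def clean_posting(title, description, location):
    """Normalise whitespace in the three form fields and check the description."""
    fields = {"title": title, "description": description, "location": location}
    # Collapse runs of whitespace; treat a missing field as an empty string.
    cleaned = {name: " ".join((value or "").split()) for name, value in fields.items()}
    if not cleaned["description"]:
        # Assumption: a posting without a description cannot be scored.
        raise ValueError("description is required")
    return cleaned


def format_prediction(probability, threshold=0.5):
    """Turn a fraud probability into the label and score shown on the page."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    label = "Fraud" if probability > threshold else "Legit"
    return {"label": label, "fraud_probability": round(probability, 4)}
```

A view would call `clean_posting` on the submitted values, pass the result to the model, and render the dictionary from `format_prediction` in the template.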

Training & ML Pipeline

Load Your Dataset

Place your CSV file at ml/data/jobs_dataset.csv
Expected columns:
- title: Job title string
- description: Job description text
- location: Location string
- label: 0 (legitimate) or 1 (fraudulent)
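
Before training, it can help to confirm that the CSV matches the expected columns. The check below is a minimal sketch using only the standard library; `validate_dataset` is a hypothetical helper, not part of ml/train.py.

```python
import csv

REQUIRED_COLUMNS = ("title", "description", "location", "label")


def validate_dataset(path):
    """Return a list of problems found in the CSV; an empty list means it looks usable."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [column for column in REQUIRED_COLUMNS if column not in header]
        if missing:
            return ["missing columns: " + ", ".join(missing)]
        # Count data records from 1 (the header row is not counted).
        for index, row in enumerate(reader, start=1):
            label = (row["label"] or "").strip()
            if label not in {"0", "1"}:
                problems.append(f"record {index}: label must be 0 or 1, got {label!r}")
    return problems
```

Calling `validate_dataset("ml/data/jobs_dataset.csv")` and printing the returned list gives a quick report before a longer training run.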

Run Training Script

python ml/train.py

This script will:

Load and split the dataset (70/15/15 by default)
Preprocess text (TF-IDF vectorization)
Train Logistic Regression and Random Forest
Export models and metrics to ml/models/ and ml/metrics/
Print evaluation metrics
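
The steps above can be sketched for the Logistic Regression model as shown below. This is an illustration under stated assumptions, not the contents of ml/train.py: the function names, the stratified split, the fixed random seed, and the metric key names are all assumptions. The artifact file names match those used elsewhere in this README.

```python
import json
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def split_70_15_15(texts, labels, seed=42):
    """Stratified 70/15/15 split into train, validation, and test sets."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed
    )
    # Split the remaining 30% in half: 15% validation, 15% test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


def train_logistic_regression(texts, labels, out_dir="ml"):
    """Fit TF-IDF + Logistic Regression, save artifacts, and return test metrics."""
    # The validation split is left unused here; it is reserved for model selection.
    (x_train, y_train), _val, (x_test, y_test) = split_70_15_15(texts, labels)

    vectorizer = TfidfVectorizer()
    model = LogisticRegression(max_iter=1000)
    # Fit the vectorizer on training text only to avoid leaking test vocabulary.
    model.fit(vectorizer.fit_transform(x_train), y_train)

    x_test_vec = vectorizer.transform(x_test)
    predictions = model.predict(x_test_vec)
    fraud_probs = model.predict_proba(x_test_vec)[:, 1]
    metrics = {
        "accuracy": float(accuracy_score(y_test, predictions)),
        "f1": float(f1_score(y_test, predictions, zero_division=0)),
        "auc": float(roc_auc_score(y_test, fraud_probs)),
    }

    out = Path(out_dir)
    (out / "models").mkdir(parents=True, exist_ok=True)
    (out / "metrics").mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / "models" / "logistic_regression.pkl")
    joblib.dump(vectorizer, out / "models" / "vectorizer.pkl")
    (out / "metrics" / "logistic_regression.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

Here `texts` would be the combined title, description, and location strings and `labels` the 0/1 column from the CSV. A Random Forest could reuse the same split and vectorizer with a different estimator.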

Integrate Trained Models into Views

After training, update core/views.py to load saved models:

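import joblib

# Load the artifacts once at import time so every request reuses them.
# The paths are relative to the directory you run manage.py from.
lr_model = joblib.load('ml/models/logistic_regression.pkl')
vectorizer = joblib.load('ml/models/vectorizer.pkl')

def predict(title, description, location):
    """Return a (label, probability) pair for a single job posting."""
    # Join the three fields into one string for the TF-IDF vectorizer.
    combined = f"{title} {description} {location}"
    X = vectorizer.transform([combined])
    # Column 1 of predict_proba is the probability of label 1 (fraudulent).
    prob = lr_model.predict_proba(X)[0, 1]
    return ("Fraud" if prob > 0.5 else "Legit"), prob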

Customization

Add CNN & LSTM Models

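ml/train.py contains placeholder functions for the CNN and LSTM models. To implement them:

Use TensorFlow/Keras for the neural networks
Define the architecture (embeddings → Conv/LSTM → Dense)
Train on a GPU if available (TensorFlow detects one automatically; note that the tensorflow-cpu package from the install step is CPU-only)
Save as .h5 files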

Example outline:

from tensorflow import keras

# vocab_size, embedding_dim, and the X_*/y_* arrays are placeholders: they come
# from your own tokenization step (padded integer sequences and 0/1 labels).
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, embedding_dim),  # token ids -> dense vectors
    keras.layers.Conv1D(filters=128, kernel_size=5, activation='relu'),  # local n-gram features
    keras.layers.GlobalMaxPooling1D(),  # strongest response per filter
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='sigmoid')  # fraud probability
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
model.save('ml/models/cnn_model.h5')
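
The outline above expects integer-encoded, fixed-length sequences as input. The plain-Python sketch below shows one way to produce them. It is a simplified stand-in for the Keras Tokenizer listed under Development Notes, not that API; the function names, the reserved ids, and padding at the end of each sequence are assumptions made for illustration.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids: padding and out-of-vocabulary words


def build_vocab(texts, max_words=20000):
    """Map the most frequent lowercase words to integer ids starting at 2."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(max_words - 2)  # leave room for PAD and OOV
    return {word: index for index, (word, _) in enumerate(most_common, start=2)}


def encode(texts, vocab, max_len):
    """Convert texts to id lists, truncated or right-padded to exactly max_len."""
    sequences = []
    for text in texts:
        ids = [vocab.get(word, OOV) for word in text.lower().split()][:max_len]
        sequences.append(ids + [PAD] * (max_len - len(ids)))
    return sequences
```

With this encoding, `vocab_size` for the embedding layer would be `len(vocab) + 2`, and the lists returned by `encode` can be converted to a NumPy array before being passed to `model.fit`.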

Update Metrics in Views

After training, replace placeholder values in algorithms and comparison views:

# Load from JSON files saved during training
import json
with open('ml/metrics/logistic_regression.json') as f:
    lr_metrics = json.load(f)
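
For the comparison view, the same idea extends to every metrics file at once. The sketch below reads all JSON files in ml/metrics/ and sorts them by one metric. The helper name `load_comparison_rows` is hypothetical, and the "auc" key is an assumption about what the training script writes.

```python
import json
from pathlib import Path


def load_comparison_rows(metrics_dir="ml/metrics", sort_key="auc"):
    """Load every metrics JSON file and return one row per model, best first."""
    rows = []
    for path in sorted(Path(metrics_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            metrics = json.load(handle)
        # Use the file name (without .json) as the model name.
        rows.append({"model": path.stem, **metrics})
    # Models missing the sort key are placed last.
    return sorted(rows, key=lambda row: row.get(sort_key, 0.0), reverse=True)
```

The returned list can be handed to the comparison template and rendered as table rows.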

Styling

Modify static/css/styles.css to customize colors, fonts, and layout.

Development Notes

Framework: Django 6.0 with Bootstrap 5.3.3
Database: SQLite (default)
ML Libraries: scikit-learn, pandas, numpy, TensorFlow/Keras (optional for CNN/LSTM)
Visualization: Plotly, Matplotlib, Seaborn (plots saved as images/JSON)
Text Processing: TF-IDF (sklearn), Tokenizer (Keras)

Troubleshooting

Port 8000 already in use

python manage.py runserver 8001

Import errors for ML packages

pip install --upgrade scikit-learn pandas numpy tensorflow-cpu

Dataset not found

Place your CSV at ml/data/jobs_dataset.csv and re-run ml/train.py.

Migration errors

python manage.py makemigrations
python manage.py migrate

Future Enhancements

Database Models: Store predictions and user history
API Endpoints: REST API for programmatic access
Real-time Metrics: Dashboard with live model performance tracking
Explainability: SHAP/LIME to explain individual predictions
Deployment: Docker, AWS/GCP cloud deployment
Authentication: User login and role-based access

License

This project is for educational purposes. Adjust licensing as needed.

Support

For issues or questions, refer to Django documentation and scikit-learn guides.
