fandishemussa/Bert-Based-Comment-Classification

BERT-Based Comment Classification

This project demonstrates how to fine-tune BERT (Bidirectional Encoder Representations from Transformers) to classify user comments by their semantic content. This is useful for detecting spam, abuse, sentiment, or topic relevance in forums, customer feedback, and other online platforms.

Features

Fine-tuning of a pretrained BERT model (bert-base-uncased) for text classification tasks

Data preprocessing and tokenization using Hugging Face's transformers and datasets

Training with GPU support and performance tracking via accuracy and loss metrics

Evaluation with classification report and confusion matrix for detailed insight

📁 Project Structure

📆comment-classification-bert
├ 📋comment_classification_task.ipynb
├ 📋README.md
┗ 📂data/ (optional)

🚀 How to Run

1. Clone the Repository

git clone https://github.com/yourusername/comment-classification-bert.git
cd comment-classification-bert

2. Install Dependencies

pip install -r requirements.txt

You can also install manually:

pip install torch transformers datasets scikit-learn matplotlib seaborn

3. Run the Notebook

Launch the notebook in Jupyter or any compatible IDE:

jupyter notebook comment_classification_task.ipynb

🧪 Dataset

The dataset consists of user comments, each labeled with one category. The supported labels are:

toxic

severe_toxic

obscene

threat

insult

identity_hate

non-toxic

You may customize the notebook to load your own dataset in CSV or JSON format with the following structure:

comment,text,label
1,"This is a great product!",non-toxic
2,"You are terrible!",toxic
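A minimal sketch of reading a CSV file in this layout, assuming the three columns shown above and the seven labels listed earlier. The names `LABELS`, `label2id`, and `load_comments` are illustrative and not taken from the notebook, and the label order is an assumption. It uses only the standard library so the parsing step can be checked on its own.

```python
import csv
import io

# Label order is an assumption; the notebook may index labels differently.
LABELS = ["toxic", "severe_toxic", "obscene", "threat",
          "insult", "identity_hate", "non-toxic"]
label2id = {name: i for i, name in enumerate(LABELS)}


def load_comments(handle):
    """Read rows with `text` and `label` columns; return (texts, label_ids)."""
    texts, label_ids = [], []
    for row in csv.DictReader(handle):
        label = row["label"].strip()
        if label not in label2id:
            raise ValueError(f"unknown label: {label!r}")
        texts.append(row["text"])
        label_ids.append(label2id[label])
    return texts, label_ids


# In-memory stand-in for a real file, using the two example rows.
sample = io.StringIO(
    'comment,text,label\n'
    '1,"This is a great product!",non-toxic\n'
    '2,"You are terrible!",toxic\n'
)
texts, label_ids = load_comments(sample)
```

For a file on disk, pass a handle opened with `newline=""` and an explicit encoding, as the `csv` module documentation recommends. Rejecting unknown labels early keeps a typo in the data from silently becoming a wrong class index.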

🧠 Model Details

Model: bert-base-uncased (Hugging Face Transformers)

Fine-tuned for text classification

Optimizer: AdamW

Loss Function: CrossEntropyLoss

Evaluation: Accuracy, Confusion Matrix, Precision/Recall/F1
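The components listed above could be assembled roughly as follows. This is a sketch rather than the notebook's actual code: the function name `build_training_objects`, the label order, and the learning rate of 2e-5 are assumptions. The heavy imports are deferred into the function so the label mapping can be inspected without PyTorch installed.

```python
# Label order is an assumption; adjust it to match your data.
LABELS = ["toxic", "severe_toxic", "obscene", "threat",
          "insult", "identity_hate", "non-toxic"]
id2label = dict(enumerate(LABELS))
label2id = {name: i for i, name in id2label.items()}


def build_training_objects(model_name="bert-base-uncased", learning_rate=2e-5):
    """Return (tokenizer, model, optimizer, loss_fn) for fine-tuning."""
    # Imported lazily: these packages are large and download weights on first use.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),  # size of the classification head
        id2label=id2label,
        label2id=label2id,
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    loss_fn = torch.nn.CrossEntropyLoss()
    return tokenizer, model, optimizer, loss_fn
```

Note that `CrossEntropyLoss` treats the task as single-label: each comment is assigned exactly one of the classes, which matches a dataset with one label per row.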

📊 Results

After training, the model is evaluated on a test set, and results include:

Accuracy score

Confusion matrix (visualized using seaborn)

Classification report with precision, recall, and F1-score
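To make these metrics concrete, here is a small pure-Python sketch of how they are derived from a confusion matrix. It is for illustration only: in practice scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities. The function names and the toy predictions below are made up and are not results from this project.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


# Toy labels for three classes, not real model output.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
scores = per_class_scores(cm)
```

Per-class scores matter here because rare classes can score poorly even when overall accuracy looks high.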

📈 Visualization

The notebook plots the training and validation loss and a confusion matrix to make the results easier to interpret.

🛠️ Customization

Replace the dataset with your own comment data

Adjust the number of epochs, batch size, and learning rate

Add early stopping or additional metrics
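As one way to add early stopping, the sketch below tracks validation loss and signals a stop after a set number of epochs without improvement. The class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical; the notebook does not define them.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.62, 0.48, 0.51, 0.50, 0.47]  # made-up values for illustration
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the model weights from the best epoch would typically be saved so they can be restored after stopping.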

About

BERT-based comment classification model using Hugging Face Transformers. Fine-tunes bert-base-uncased to detect and categorize comments (e.g., toxic, positive). Includes training, evaluation, and visualization in a Jupyter Notebook.
