
🐦 Twitter Sentiment Analysis

📋 Project Overview

This project aims to build a sentiment analysis pipeline that classifies tweets as positive 😊, negative 😠, or neutral 😐. It currently uses a pre-existing dataset for model training and evaluation; the next stage will extend it to analyze real-time tweets fetched through the Twitter Developer API.

The project applies core Natural Language Processing (NLP) techniques, namely text cleaning, tokenization, stopword removal, and stemming, followed by training machine learning models such as Naive Bayes, Support Vector Machines (SVM), and Logistic Regression.
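As a rough illustration of the cleaning, tokenization, stopword removal, and stemming steps, here is a minimal standard-library sketch. It is not the notebook's code: the tiny `STOPWORDS` set and the `simple_stem` suffix stripper are hypothetical stand-ins for NLTK's English stopword list and `PorterStemmer`, so their output will differ from NLTK's.

```python
import re

# Hypothetical, deliberately tiny stopword list (a stand-in for NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "i", "to", "and", "of",
             "this", "it", "in", "on", "for", "my", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # anything that is not a lowercase letter or whitespace


def simple_stem(token: str) -> str:
    """Crude suffix stripping; a simplified stand-in for PorterStemmer, not equivalent to it."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()
    text = URL_RE.sub(" ", text)          # remove URLs
    text = NON_LETTER_RE.sub(" ", text)   # remove punctuation, numbers, and special characters
    tokens = text.split()                 # simple whitespace tokenization
    return [simple_stem(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("Great update, thanks!!! 100% worth it https://t.co/xyz"))
    # ['great', 'update', 'thank', 'worth']
```

To stay closer to the project's stack, you could swap in `set(nltk.corpus.stopwords.words("english"))` (after `nltk.download("stopwords")`) and `nltk.stem.PorterStemmer().stem`.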


🚀 Project Status

  • ✅ Current Stage:
    • Model training, evaluation, and testing completed using a pre-existing dataset.
    • Preliminary visualizations (e.g., word clouds, confusion matrices) completed.
  • 🔧 Next Steps:
    • Integration with the Twitter Developer API for real-time data 🕒.
    • Deployment as a web application using Flask or Streamlit for user interaction.
    • Continuous model improvement with live data.

✨ Features

  1. 🔄 Data Preprocessing:
    • Cleaning tweets by removing URLs 🌐, punctuation ❗, numbers 🔢, and special characters.
    • Tokenization and stopword removal for feature extraction 🛠️.
    • Text normalization using stemming.
  2. 📊 Visualization:
    • Word clouds for positive 💬 and negative 🔴 sentiments.
    • Confusion matrices for model evaluation.
  3. 🧠 Model Training:
    • Supports multiple machine learning models:
      • 🟡 Naive Bayes
      • 🟢 Support Vector Machines (SVM)
      • 🔵 Logistic Regression
    • Evaluation metrics include accuracy ✅, precision 🎯, recall, F1-score 🏆, and ROC-AUC curves 📈.
  4. 🌟 Future Plans:
    • Real-time tweet collection and analysis.
    • Interactive web-based interface for sentiment prediction.
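To show what the Naive Bayes option computes under the hood, here is a from-scratch multinomial Naive Bayes sketch with Laplace (add-alpha) smoothing over token lists. The project itself relies on scikit-learn, where `MultinomialNB` on top of a `CountVectorizer` is a common route; the `NaiveBayesSketch` class and the toy training data below are hypothetical and only illustrate the math.

```python
import math
from collections import Counter


class NaiveBayesSketch:
    """Multinomial Naive Bayes over token lists, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayesSketch":
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        # Per-class token frequencies and the shared vocabulary.
        self.token_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        # log P(class), estimated from class frequencies in the training labels.
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        return self

    def _log_likelihood(self, token: str, c: str) -> float:
        # log P(token | class) with add-alpha smoothing over the vocabulary.
        count = self.token_counts[c][token]
        return math.log((count + self.alpha) / (self.totals[c] + self.alpha * len(self.vocab)))

    def predict_one(self, tokens: list[str]) -> str:
        scores = {}
        for c in self.classes:
            # Sum log-probabilities to avoid underflow; tokens unseen in training are skipped.
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(t, c) for t in tokens if t in self.vocab
            )
        return max(scores, key=scores.get)

    def predict(self, docs: list[list[str]]) -> list[str]:
        return [self.predict_one(tokens) for tokens in docs]


if __name__ == "__main__":
    # Hypothetical toy data: already-preprocessed token lists with labels.
    train_docs = [["love", "great"], ["great", "happy"], ["hate", "awful"],
                  ["awful", "sad"], ["okay", "meh"], ["meh", "fine"]]
    train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
    model = NaiveBayesSketch().fit(train_docs, train_labels)
    print(model.predict([["great", "love"], ["awful"], ["meh"]]))
    # ['positive', 'negative', 'neutral']
```

Working in log space is the standard trick here: multiplying many small probabilities quickly underflows, while summing their logarithms does not.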

🛠️ Technologies Used

  • Languages and Libraries:
    • Python 🐍, NumPy, pandas, scikit-learn, Matplotlib, Seaborn, NLTK, WordCloud
  • Visualizations:
    • Confusion matrices, word clouds, ROC curves 📉
  • Future Integration:
    • Twitter Developer API for live tweet collection 🐦
    • Flask/Streamlit for web deployment 🌐

💻 Getting Started

📋 Prerequisites

  • Install Python 3.x 🐍
  • Install the required dependencies:
    pip install -r requirements.txt
    (The requirements.txt file includes dependencies like scikit-learn, NLTK, WordCloud, etc.)

⚙️ Setup Instructions

  1. 🗂️ Clone the Repository:

    git clone https://github.com/shreyadata804/twitter-sentiment-analysis.git
    cd twitter-sentiment-analysis
  2. 📂 Data Preparation:

    • Pre-existing Dataset:
      • Place the dataset in the data/ folder.
      • Run the notebook or script to clean, preprocess, and train models.
    • Real-Time Tweets (Future Plan):
      • Obtain API keys from the Twitter Developer Platform.
      • Configure the tweepy library to fetch live tweets.
  3. 🚀 Run the Notebook: Open and execute the Jupyter Notebook to preprocess data, train the model, and evaluate results.
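The evaluation step boils down to a confusion matrix and the per-class metrics derived from it. The sketch below computes them in plain Python so the arithmetic is visible; in practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` report these same quantities. The function names and the six example labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes (the same convention as scikit-learn)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def classification_metrics(y_true, y_pred, labels):
    """Return (confusion matrix, per-class precision/recall/F1, overall accuracy)."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as `label` but actually another class
        fn = sum(matrix[i]) - tp                 # actually `label` but predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
    return matrix, report, accuracy


if __name__ == "__main__":
    # Hypothetical true and predicted labels for six tweets.
    y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
    y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
    matrix, report, accuracy = classification_metrics(y_true, y_pred, ["neg", "neu", "pos"])
    print(matrix)  # [[2, 0, 0], [0, 1, 1], [1, 0, 1]]
```

Here the "neg" class has precision 2/3 (one "pos" tweet was misread as negative) but recall 1.0, which is why looking at per-class metrics alongside overall accuracy (4/6) is useful.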


💡 Example Usage

  1. Analyzing Preloaded Dataset:

    • Run the notebook to process the dataset and generate predictions.
  2. Future Use Case with Real-time Tweets:

    • Fetch live tweets using the Twitter Developer API.
    • Pass the tweets through the preprocessing and model pipeline.
    • Obtain sentiment predictions.
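The planned real-time flow (fetch, preprocess, predict) might look roughly like the sketch below. It assumes Tweepy 4.x and its API v2 `Client.search_recent_tweets` method; `fetch_recent_tweets`, `analyze`, and the `TWITTER_BEARER_TOKEN` environment variable are hypothetical names, and whether recent search is available depends on your Twitter/X API access level.

```python
import os
from typing import Callable, Iterable


def fetch_recent_tweets(query: str, max_results: int = 10) -> list[str]:
    """Return the text of recent tweets matching `query` via Tweepy's API v2 Client.

    Assumptions: tweepy 4.x is installed, a bearer token is stored in the
    hypothetical TWITTER_BEARER_TOKEN environment variable, and your API access
    level allows recent search. The API accepts max_results between 10 and 100.
    """
    import tweepy  # lazy import so analyze() below works without tweepy installed

    client = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])
    response = client.search_recent_tweets(query=query, max_results=max_results)
    # response.data is None when no tweets match.
    return [tweet.text for tweet in (response.data or [])]


def analyze(
    texts: Iterable[str],
    preprocess: Callable[[str], list[str]],
    classify: Callable[[list[str]], str],
) -> list[tuple[str, str]]:
    """Pass each text through preprocessing, then the classifier; return (text, label) pairs."""
    return [(text, classify(preprocess(text))) for text in texts]


if __name__ == "__main__":
    # Offline demo with stand-in functions. With live data you might call
    # analyze(fetch_recent_tweets("python lang:en"), my_preprocess, my_model_predict),
    # where my_preprocess and my_model_predict are hypothetical names for your own pipeline.
    demo = analyze(
        ["great day", "awful service"],
        str.split,
        lambda tokens: "positive" if "great" in tokens else "negative",
    )
    print(demo)  # [('great day', 'positive'), ('awful service', 'negative')]
```

Keeping `analyze` independent of where the text comes from means the same preprocessing and model code can run on the preloaded dataset today and on live tweets later.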

📅 Project Roadmap

  1. ✅ Completed:
    • Training and evaluating models using a pre-existing dataset.
    • Initial visualizations and exploratory data analysis.
  2. 🔧 Ongoing:
    • Integration with the Twitter Developer API for real-time data.
    • Deploying the model via Flask/Streamlit.
  3. 🚀 Future Enhancements:
    • Model optimization for better accuracy with large-scale real-time data.
    • Addition of advanced NLP techniques like BERT for sentiment classification.

📣 Acknowledgements

  • Sentiment140 Dataset for initial analysis.
  • Python libraries like NLTK and Scikit-learn for NLP and machine learning tasks.
  • The Twitter Developer API, planned for live data integration.

🀝 Contributing

Contributions are welcome! Feel free to fork the project and submit pull requests.


📞 Contact

For questions or suggestions, contact: