RRonium/SentimentStream

SentimentStream: NLP Sentiment Scorer

Overview

SentimentStream is a C++ console application that analyzes the emotional tone of text. It tokenizes the input and looks each token up in a sentiment dictionary, a simple rule-based form of natural language processing (NLP). The project demonstrates advanced C++ concepts, with a focus on memory efficiency and object-oriented design.

Key Features

  • Dual Scoring Logic: Uses Polymorphism to provide both simple word counts and weighted intensity analysis.
  • Memory Optimization: Implements Move Semantics to handle large text bodies without expensive deep-copy operations.
  • High-Speed Lookup: Utilizes std::unordered_map for $O(1)$ average time complexity during dictionary lookups.
  • Robust Error Handling: Custom exception classes manage missing files or invalid input scenarios.
  • Generic Calculations: Uses Templates to calculate average scores across different numeric precisions.
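
As an illustration of the generic-calculation feature, here is a minimal sketch of what a calculateAverage<T> template could look like. The signature (a std::vector<T> parameter) and the choice to throw std::invalid_argument on empty input are assumptions; the project's actual implementation in Utils.h may differ.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical signature: the README names calculateAverage<T> but does not
// show its parameters, so a std::vector<T> input is assumed here.
template <typename T>
T calculateAverage(const std::vector<T>& scores) {
    if (scores.empty()) {
        // Assumption: empty input is reported as an error rather than returning 0.
        throw std::invalid_argument("calculateAverage: no scores supplied");
    }
    // Seeding with T{} makes the sum accumulate in T instead of int.
    const T sum = std::accumulate(scores.begin(), scores.end(), T{});
    return sum / static_cast<T>(scores.size());
}
```

The result has the same precision as the element type, so calculateAverage<int> performs integer division while calculateAverage<double> keeps the fractional part.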

Technical Architecture

1. Memory Management (Buffer Class)

The Buffer class manages raw text data using dynamic memory allocation. It follows the Rule of Five (destructor, copy constructor, copy assignment, move constructor, move assignment), so resources are managed correctly during copy and move operations. With std::move, the program transfers ownership of the text data instead of deep-copying it, which keeps the cost of passing large text bodies around low.
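
The Rule of Five described above can be sketched as follows. This is a simplified illustration, not the project's Buffer.h: the member names (data_, size_) and the accessors c_str() and size() are assumptions.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Illustrative Buffer that owns a heap-allocated, null-terminated char array.
class Buffer {
public:
    explicit Buffer(const char* text = "")
        : size_(std::strlen(text)), data_(new char[size_ + 1]) {
        std::memcpy(data_, text, size_ + 1);  // copy including the terminator
    }

    ~Buffer() { delete[] data_; }  // 1. destructor

    Buffer(const Buffer& other)  // 2. copy constructor: deep copy
        : size_(other.size_), data_(new char[other.size_ + 1]) {
        std::memcpy(data_, other.c_str(), size_ + 1);
    }

    Buffer& operator=(const Buffer& other) {  // 3. copy assignment (copy and swap)
        Buffer copy(other);
        std::swap(size_, copy.size_);
        std::swap(data_, copy.data_);
        return *this;
    }

    Buffer(Buffer&& other) noexcept  // 4. move constructor: steal the pointer
        : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    Buffer& operator=(Buffer&& other) noexcept {  // 5. move assignment
        if (this != &other) {
            delete[] data_;
            size_ = other.size_;
            data_ = other.data_;
            other.size_ = 0;
            other.data_ = nullptr;
        }
        return *this;
    }

    // A moved-from Buffer holds nullptr, so fall back to an empty string.
    const char* c_str() const { return data_ ? data_ : ""; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;  // declared before data_ so it is initialized first
    char* data_;
};
```

Constructing Buffer b(std::move(a)) copies only a pointer and a size, regardless of how much text a held; a is left empty but still safe to destroy or reassign.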

2. Rule-Based Scoring (Scorer Hierarchy)

The application uses an abstract base class Scorer to define the interface for sentiment analysis:

  • WordCountScorer: Provides a basic sentiment count (+1 for positive, -1 for negative).
  • WeightedScorer: Provides a nuanced score based on the specific "weight" assigned to a word in the dictionary (e.g., "excellent" is +5, "good" is +1).
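
A minimal sketch of how such a hierarchy might be laid out is shown below. The class names come from this README, but the score() signature, the Dictionary alias, and the use of a token vector are assumptions made for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Assumed dictionary shape: word -> signed integer weight.
using Dictionary = std::unordered_map<std::string, int>;

// Abstract interface; the score() signature is hypothetical.
class Scorer {
public:
    virtual ~Scorer() = default;
    virtual int score(const std::vector<std::string>& tokens,
                      const Dictionary& dict) const = 0;
};

// Counts +1 for each positive word and -1 for each negative word.
class WordCountScorer : public Scorer {
public:
    int score(const std::vector<std::string>& tokens,
              const Dictionary& dict) const override {
        int total = 0;
        for (const auto& token : tokens) {
            const auto it = dict.find(token);
            if (it == dict.end()) continue;  // unknown words are neutral
            if (it->second > 0) ++total;
            else if (it->second < 0) --total;
        }
        return total;
    }
};

// Sums the dictionary weight of every known word.
class WeightedScorer : public Scorer {
public:
    int score(const std::vector<std::string>& tokens,
              const Dictionary& dict) const override {
        int total = 0;
        for (const auto& token : tokens) {
            const auto it = dict.find(token);
            if (it != dict.end()) total += it->second;
        }
        return total;
    }
};
```

Calling code can hold either scorer through a const Scorer& or a std::unique_ptr<Scorer> and switch strategies at run time without changing the analysis loop.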

3. Data Structures

  • std::unordered_map: Stores the sentiment dictionary as a hash table, giving average O(1) lookup per word.
  • std::stringstream: Used for efficient tokenization of the input buffer.
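
Whitespace tokenization with a string stream can be sketched as below. The tokenize() helper is hypothetical, and the normalization step (lowercasing and dropping non-letters so that "Good," matches a dictionary entry "good") is an assumption; the README does not say how punctuation or case is handled.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits text on whitespace and normalizes each word.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    std::string word;
    while (stream >> word) {  // operator>> skips any run of whitespace
        std::string cleaned;
        for (unsigned char ch : word) {
            // Assumption: keep letters only, folded to lowercase.
            if (std::isalpha(ch)) {
                cleaned += static_cast<char>(std::tolower(ch));
            }
        }
        if (!cleaned.empty()) {
            tokens.push_back(std::move(cleaned));
        }
    }
    return tokens;
}
```

Each resulting token can then be passed to find() on the std::unordered_map dictionary, so scoring a text of n words takes O(n) time on average.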

File Structure

  • main.cpp: Orchestrates user input, file I/O, and the analysis flow.
  • Buffer.h / Buffer.cpp: Handles dynamic memory allocation and move semantics.
  • Scorer.h: Contains the inheritance hierarchy for scoring algorithms.
  • Utils.h: Includes the calculateAverage<T> template and custom exceptions.
  • dictionary.txt: The sentiment reference file.
  • input.txt: Automatically generated file storing the latest analyzed text.
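
To show how dictionary.txt and the custom exceptions might fit together, here is a hedged sketch of a dictionary loader. The file format (one "word weight" pair per line), the exception name FileNotFoundException, and the functions parseDictionary() and loadDictionary() are all assumptions; the README does not specify any of them.

```cpp
#include <fstream>
#include <istream>
#include <sstream>  // lets callers parse from an in-memory std::istringstream
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical custom exception for a missing or unreadable file.
class FileNotFoundException : public std::runtime_error {
public:
    explicit FileNotFoundException(const std::string& path)
        : std::runtime_error("Could not open file: " + path) {}
};

// Assumed format: whitespace-separated "word weight" pairs, e.g. "excellent 5".
std::unordered_map<std::string, int> parseDictionary(std::istream& in) {
    std::unordered_map<std::string, int> dict;
    std::string word;
    int weight = 0;
    while (in >> word >> weight) {  // stops at end of input or a malformed pair
        dict[word] = weight;
    }
    return dict;
}

// Opens the file and delegates to parseDictionary; throws if it cannot be opened.
std::unordered_map<std::string, int> loadDictionary(const std::string& path) {
    std::ifstream file(path);
    if (!file) {
        throw FileNotFoundException(path);
    }
    return parseDictionary(file);
}
```

Separating the parsing from the file handling lets the parser be exercised on an in-memory stream, while main.cpp can catch the exception and report a missing dictionary.txt to the user.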

Getting Started

Prerequisites

  • A C++ compiler supporting the C++17 standard or later (e.g., g++ or clang++).

Compilation

From the terminal, compile all source files together:

g++ -std=c++17 main.cpp Buffer.cpp -o SentimentStream

About

A program that analyses the nature of user-defined text.
