Skip to content

lCaptNemol/NewsDraft

Repository files navigation

Internal Newsletter Formatter

*Created using Replit's AI Agent and Cursor AI

Overview

This application is a Streamlit-based web tool designed to transform raw Word documents containing case studies, updates, or briefings into clean, professionally formatted internal newsletters using AI processing. The system extracts content from uploaded Word documents, processes it using a Large Language Model (LLM), and generates a formatted newsletter as output.

User Preferences

Preferred communication style: Simple, everyday language.

System Architecture

The application follows a straightforward architecture with the following components:

  1. Frontend: Streamlit web interface for document uploading and displaying processed content
  2. Document Processing: Python-docx based document parsing and formatting
  3. AI Processing: Integration with Groq LLM API for content enhancement
  4. Configuration Management: ConfigParser-based settings management

The application is designed as a single-page web application where users can upload Word documents, view the extracted content, and generate a formatted newsletter.

Key Components

1. Streamlit Web Interface (app.py)

The main application entry point that provides:

  • File upload functionality for Word documents
  • Progress tracking for document processing
  • Display of both original and AI-processed content
  • Newsletter generation and download capabilities

2. Document Processor (document_processor.py)

Handles all Word document operations:

  • Extracting content from uploaded Word documents
  • Identifying document structure based on headings
  • Creating new formatted Word documents for the newsletter output

3. LLM Processor (llm_processor.py)

Manages communication with the Groq LLM API:

  • Loads API configuration from config file or environment variables
  • Sends content to the LLM for processing
  • Parses and returns the enhanced content

4. Configuration (config.ini)

Stores API configurations:

  • API key (can be set through environment variable)
  • Model selection (default: llama3-70b-8192)
  • API endpoint URL

Data Flow

  1. Input: User uploads a Word document through the Streamlit interface
  2. Extraction: The document processor extracts content and structure from the uploaded file
  3. Processing: The extracted content is sent to the LLM for enhancement and formatting
  4. Display: Both original and processed content are displayed in the UI
  5. Output: A formatted newsletter is generated as a downloadable Word document

External Dependencies

Python Libraries

  • streamlit: Web application framework
  • python-docx: Word document processing
  • requests: HTTP requests for API communication
  • configparser: Configuration management

External Services

  • Groq API: LLM service for content processing
    • Uses the llama3-70b-8192 model by default
    • Requires an API key (can be set in config.ini or as an environment variable GROQ_API_KEY)

Deployment Strategy

The application is configured for deployment on Replit with:

  1. Runtime Environment:

    • Python 3.11
    • Nix channel: stable-24_05
  2. Deployment Configuration:

    • Target: autoscale
    • Port: 5000
    • Command: streamlit run app.py --server.port 5000
  3. Streamlit Configuration:

    • Headless mode enabled
    • Listening on all interfaces (0.0.0.0)

Development Notes

Incomplete Implementations

The current codebase has some incomplete functions:

  • The extract_content_from_docx function in document_processor.py has an exception handler without implementation
  • The process_content_with_llm function in llm_processor.py is incomplete when API key is missing
  • The create_newsletter_docx function is referenced but not fully implemented

Potential Enhancements

  1. Add error handling for missing API keys
  2. Implement caching for LLM responses to reduce API costs
  3. Add user authentication if sensitive information is processed
  4. Implement template selection for different newsletter formats
  5. Add content summarization options for lengthy documents

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages