*Created using Replit's AI Agent and Cursor AI
This application is a Streamlit-based web tool designed to transform raw Word documents containing case studies, updates, or briefings into clean, professionally formatted internal newsletters using AI processing. The system extracts content from uploaded Word documents, processes it using a Large Language Model (LLM), and generates a formatted newsletter as output.
Preferred communication style: Simple, everyday language.
The application follows a straightforward architecture with the following components:
- Frontend: Streamlit web interface for document uploading and displaying processed content
- Document Processing: Python-docx based document parsing and formatting
- AI Processing: Integration with Groq LLM API for content enhancement
- Configuration Management: ConfigParser-based settings management
The application is designed as a single-page web application where users can upload Word documents, view the extracted content, and generate a formatted newsletter.
The main application entry point that provides:
- File upload functionality for Word documents
- Progress tracking for document processing
- Display of both original and AI-processed content
- Newsletter generation and download capabilities
Handles all Word document operations:
- Extracting content from uploaded Word documents
- Identifying document structure based on headings
- Creating new formatted Word documents for the newsletter output
Manages communication with the Groq LLM API:
- Loads API configuration from config file or environment variables
- Sends content to the LLM for processing
- Parses and returns the enhanced content
Stores API configurations:
- API key (can be set through environment variable)
- Model selection (default: llama3-70b-8192)
- API endpoint URL
- Input: User uploads a Word document through the Streamlit interface
- Extraction: The document processor extracts content and structure from the uploaded file
- Processing: The extracted content is sent to the LLM for enhancement and formatting
- Display: Both original and processed content are displayed in the UI
- Output: A formatted newsletter is generated as a downloadable Word document
streamlit: Web application frameworkpython-docx: Word document processingrequests: HTTP requests for API communicationconfigparser: Configuration management
- Groq API: LLM service for content processing
- Uses the llama3-70b-8192 model by default
- Requires an API key (can be set in config.ini or as an environment variable GROQ_API_KEY)
The application is configured for deployment on Replit with:
-
Runtime Environment:
- Python 3.11
- Nix channel: stable-24_05
-
Deployment Configuration:
- Target: autoscale
- Port: 5000
- Command:
streamlit run app.py --server.port 5000
-
Streamlit Configuration:
- Headless mode enabled
- Listening on all interfaces (0.0.0.0)
The current codebase has some incomplete functions:
- The
extract_content_from_docxfunction indocument_processor.pyhas an exception handler without implementation - The
process_content_with_llmfunction inllm_processor.pyis incomplete when API key is missing - The
create_newsletter_docxfunction is referenced but not fully implemented
- Add error handling for missing API keys
- Implement caching for LLM responses to reduce API costs
- Add user authentication if sensitive information is processed
- Implement template selection for different newsletter formats
- Add content summarization options for lengthy documents