This tool scrapes LinkedIn posts and generates new, engaging content using the Llama 3 LLM served through the Groq API. It learns from successful posts via few-shot learning and generates posts that match their style. Using natural language processing (NLP), it understands text and context, keeping the generated content relevant and personalized across multiple languages.

This project consists of two main components:
- LinkedIn Scraper: Collects posts from LinkedIn profiles
- Post Generator: Analyzes existing posts and generates new ones using an LLM (Llama 3 via Groq)
The system uses few-shot learning techniques to understand what makes LinkedIn posts engaging, and generates new content following similar patterns.
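The few-shot idea can be sketched as follows: scraped posts are embedded directly in the prompt as examples, so the LLM imitates their tone and structure. This is an illustrative sketch, not the project's actual prompt code; the example data and wording are made up.

```python
# Minimal few-shot prompt construction: real scraped posts become
# in-prompt examples the model should imitate. Example data is illustrative.
examples = [
    {"text": "Excited to share my first open-source contribution!", "tags": "Personal"},
    {"text": "5 lessons from debugging a production outage...", "tags": "Technology"},
]

def build_prompt(topic: str, examples: list) -> str:
    prompt = f"Write a LinkedIn post about: {topic}\n\n"
    prompt += "Match the tone and style of these examples:\n"
    for i, ex in enumerate(examples, 1):
        prompt += f"\nExample {i} ({ex['tags']}):\n{ex['text']}\n"
    return prompt
```

The resulting string is what gets sent to the LLM; the more (and better) examples are included, the closer the output matches the target style.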
LinkedIn Post Scraping:
- Automated login and data collection
- Extraction of post text, engagement metrics (likes, comments, shares)
- Profile activity analysis
- Data export to JSON
Post Analysis:
- Automatic language detection (English, Indonesian, Mixed)
- Content tagging and categorization
- Length analysis (Short, Medium, Long)
- Engagement metrics processing
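The length analysis above can be sketched as a simple line-count bucketing; the thresholds mirror the Short (1-5), Medium (6-10), and Long (11-15) ranges used later in this README. This is a hedged sketch of the idea, not necessarily the exact logic in preprocess.py.

```python
# Bucket a post by line count, matching the Short/Medium/Long
# ranges described in this README (illustrative implementation).
def categorize_length(text: str) -> str:
    lines = len(text.strip().split("\n"))
    if lines <= 5:
        return "Short"
    elif lines <= 10:
        return "Medium"
    return "Long"
```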
AI-Powered Post Generation:
- Topic-based content creation
- Multi-language support (English, Indonesian, Mixed)
- Configurable length and style
- Example-based learning from successful posts
Project Structure:
linkedin-post-generator/
├── data/
│ ├── processed_posts.json # Enriched post data with metadata
│ └── raw_posts.json # Raw scraped posts
├── linkedin-scraper/
│ ├── credentials.py # LinkedIn login credentials (not included in repo)
│ ├── linkedin-scraper.py # Scraping logic
│ └── [output files] # Generated during scraping
├── .env # Environment variables for API keys
├── few_shot.py # Few-shot learning implementation
├── llm_helper.py # LLM integration utilities
├── main.py # Application entry point
├── post_generator.py # Post generation logic
├── preprocess.py # Data preprocessing utilities
└── requirements.txt # Python dependencies
Clone the repository:
git clone https://github.com/caernations/linkedin-post-generator
cd linkedin-post-generator
Install dependencies:
pip install -r requirements.txt
Create a .env file with your API key:
GROQ_API_KEY=your_groq_api_key_here
For LinkedIn scraping, create a credentials.py file in the linkedin-scraper directory:
username = "your_linkedin_email"
password = "your_linkedin_password"
Update the profile URL in linkedin-scraper.py:
profile_url = 'https://www.linkedin.com/in/target_profile/'
Run the scraper:
python linkedin-scraper/linkedin-scraper.py
The scraper will:
- Login to LinkedIn using your credentials
- Navigate to the specified profile
- Scroll through their activity to load posts
- Extract and save post data to JSON files
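The final step, normalizing engagement counts and saving them to JSON, can be sketched as below. LinkedIn displays counts like "1,234" or "2.1K", so they need to be parsed into integers before export. The field names and parse_count helper are illustrative; the real scraper's schema may differ.

```python
# Sketch of the save step: normalize LinkedIn-style engagement counts
# and write the collected posts to a JSON file. Names are illustrative.
import json

def parse_count(raw: str) -> int:
    """Turn a LinkedIn-style count like '1,234' or '2.1K' into an int."""
    raw = raw.strip().upper().replace(",", "")
    if raw.endswith("K"):
        return int(float(raw[:-1]) * 1_000)
    return int(raw or 0)

posts = [
    {"text": "Hello LinkedIn!", "likes": parse_count("2.1K"), "comments": parse_count("45")},
]
with open("raw_posts.json", "w", encoding="utf-8") as f:
    json.dump(posts, f, ensure_ascii=False, indent=2)
```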
To enrich raw posts with metadata:
python preprocess.py
This will:
- Extract language, line count and tags from each post
- Unify and categorize tags
- Save the processed data to data/processed_posts.json
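The enrichment step can be sketched as below: each raw post gets a line count and a naive language guess. The hint-word heuristic and function names here are illustrative assumptions, not necessarily what preprocess.py actually does.

```python
# Hedged sketch of per-post enrichment: line count plus a naive
# language guess based on common Indonesian function words.
INDONESIAN_HINTS = {"yang", "dan", "untuk", "dengan", "saya"}

def detect_language(text: str) -> str:
    words = set(text.lower().split())
    hits = len(words & INDONESIAN_HINTS)
    if hits == 0:
        return "English"
    # Mostly non-hint words alongside some Indonesian -> treat as Mixed
    return "Mixed" if len(words) > 3 * hits else "Indonesian"

def enrich(post: dict) -> dict:
    post["line_count"] = post["text"].strip().count("\n") + 1
    post["language"] = detect_language(post["text"])
    return post
```

A production version would likely use a proper language-detection library, but this shows the shape of the metadata being added.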
To generate a LinkedIn post:
from post_generator import generate_post
# Parameters:
# - Length: "Short" (1-5 lines), "Medium" (6-10 lines), "Long" (11-15 lines)
# - Language: "English", "Indonesia", "Mixed"
# - Tag: Topic category like "Personal", "Technology", etc.
post = generate_post("Medium", "Mixed", "Software Engineering")
print(post)

Start the main application:
streamlit run main.py
To run this project, ensure you have Python 3.8+ installed and the following Python packages:
- selenium – for web automation and scraping
- beautifulsoup4 – for parsing HTML content
- pandas – for data manipulation and analysis
- python-dateutil – for flexible date parsing
- langchain==0.2.14 – for building with LLMs
- langchain-core==0.2.39 – core components for LangChain
- langchain-community==0.2.12 – community-contributed LangChain modules
- langchain-groq==0.1.9 – integration with Groq LLMs
- streamlit==1.35.0 – for building interactive frontends
- python-dotenv – to manage environment variables
- The LinkedIn scraper requires a valid LinkedIn account
- Excessive scraping may lead to LinkedIn rate limiting or account restrictions
- For better post generation, collect at least 50-100 posts as examples
- The quality of generated posts depends on the quality and quantity of scraped examples
- This project uses Llama3-70B via the Groq API