- Query Classifier: Routes queries to appropriate analysis paths
- Country Analyzer: Analyzes patterns across different countries
- Shark Profiler: Profiles individual shark investment strategies
- Industry Analyzer: Identifies hot industries and success patterns
- Pitch Evaluator: Evaluates specific pitches and ideas
- ML Analyzer: Advanced machine learning predictions
- Success Predictor: Predicts deal success probability
- Recommendation Engine: Generates actionable recommendations
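As a rough illustration of the routing idea, the sketch below maps a query to one of the analysis paths with simple keyword matching. The function name `classify_query`, the route labels, and the keyword lists are all hypothetical; the actual classifier in `langgraph_workflow.py` may work quite differently (for example, by asking the LLM).

```python
# Hypothetical routing rules, checked in order; the first match wins.
# These are NOT taken from the project code.
ROUTING_RULES = [
    ("pitch_evaluation", ("pitch", "equity")),
    ("industry_analysis", ("industr", "sector")),
    ("shark_profile", ("sharks", "investor")),
    ("country_analysis", ("countr", "india", "australia")),
]


def classify_query(query):
    """Return the label of the first rule whose keyword occurs in the query."""
    text = query.lower()
    for route, keywords in ROUTING_RULES:
        if any(keyword in text for keyword in keywords):
            return route
    # No rule matched: fall back to a catch-all route.
    return "general"
```

Ordered rules keep the behavior predictable when a query mentions several topics at once, at the cost of having to choose the priority by hand.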
- Machine Learning Models: Random Forest classifier for success prediction
- Interactive Visualizations: Plotly charts for data exploration
- Real-time Analysis: Live processing of pitch queries
- Comprehensive Reports: Detailed markdown reports with insights
- Natural Language Queries: Ask questions in plain English
- File Upload Support: Upload pitch decks, business plans, or data files
- Chat History: Track all previous analyses
- Export Functionality: Download reports and analysis data
Sharktank_GPT_Streamlit/
├── streamlit_app.py # Main Streamlit application
├── langgraph_workflow.py # LangGraph workflow implementation
├── groq_integration.py # Groq LLM integration
├── advanced_analysis.py # ML and advanced analytics
├── config.py # Configuration settings
├── requirements.txt # Python dependencies
├── README.md # This file
├── Shark Tank US dataset.csv # US dataset
├── Shark Tank India.csv # India dataset
├── Shark Tank Australia dataset.csv # Australia dataset
└── shark_tank_merged.csv # Merged dataset
- Python 3.8 or higher
- Required CSV dataset files in the project directory
- Groq API key (get from https://console.groq.com/)
1. Clone or download the project files:
   git clone https://github.com/yourusername/Sharktank_GPT_Streamlit.git
   cd Sharktank_GPT_Streamlit
2. Install dependencies:
   pip install -r requirements.txt
3. Set up environment variables by creating a .env file in the project root:
   GROQ_API_KEY=your_groq_api_key_here
   LANGFUSE_SECRET_KEY=sk-lf-your_secret_key_here # Optional
   LANGFUSE_PUBLIC_KEY=pk-lf-your_public_key_here # Optional
   LANGFUSE_BASE_URL=https://cloud.langfuse.com # Optional
   LANGFUSE_ENABLED=false # Optional
4. Ensure all CSV dataset files are in the project root directory:
   - Shark Tank US dataset.csv
   - Shark Tank India.csv
   - Shark Tank Australia dataset.csv
   - shark_tank_merged.csv
5. Run the application:
   streamlit run streamlit_app.py
6. Open your browser to http://localhost:8501
1. Create a virtual environment:
   python -m venv venv
2. Activate the virtual environment:
   - On Windows:
     venv\Scripts\activate
   - On Mac/Linux:
     source venv/bin/activate
3. Install dependencies:
   pip install -r requirements.txt
4. Follow steps 3-6 from Option 1
The app requires environment variables to be set. Create a .env file in the root directory:
GROQ_API_KEY=your_groq_api_key_here
LANGFUSE_SECRET_KEY=sk-lf-... # Optional, for observability
LANGFUSE_PUBLIC_KEY=pk-lf-... # Optional, for observability
LANGFUSE_BASE_URL=https://cloud.langfuse.com # Optional
LANGFUSE_ENABLED=false # Set to true to enable Langfuse
Important: Never commit your .env file to version control. It's already in .gitignore.
Langfuse Setup (Optional): To enable observability with Langfuse:
- Sign up at https://cloud.langfuse.com
- Create a new project
- Get your API keys from Settings → API Keys
- Add them to your .env file and set LANGFUSE_ENABLED=true
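A minimal sketch of how the variables above could be read and validated in Python. The helper names `load_settings` and `_clean` are hypothetical and not taken from `config.py`. The sketch assumes the values are already present in the environment; locally, that would typically happen by calling `load_dotenv()` from python-dotenv beforehand.

```python
import os


def _clean(value):
    """Strip surrounding whitespace and one pair of matching quotes."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1].strip()
    return value


def load_settings(env=None):
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    groq_key = _clean(env.get("GROQ_API_KEY", ""))
    if not groq_key:
        # The Groq key is the only variable documented as required.
        raise RuntimeError("GROQ_API_KEY is not set")
    return {
        "groq_api_key": groq_key,
        # Anything other than "true" (case-insensitive) leaves Langfuse off.
        "langfuse_enabled": _clean(env.get("LANGFUSE_ENABLED", "false")).lower() == "true",
        "langfuse_base_url": _clean(env.get("LANGFUSE_BASE_URL", "https://cloud.langfuse.com")),
        "langfuse_secret_key": _clean(env.get("LANGFUSE_SECRET_KEY", "")),
        "langfuse_public_key": _clean(env.get("LANGFUSE_PUBLIC_KEY", "")),
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to test, and stripping stray spaces and quotes guards against a common copy-and-paste mistake with API keys.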
Edit config.py to customize:
- Analysis thresholds
- Visualization colors
- File upload limits
- Report settings
- Pitch Analysis:
  "I want to pitch a food tech startup asking for $500k for 15% equity"
- Industry Research:
  "What are the most successful industries in Shark Tank?"
- Shark Comparison:
  "Compare investment patterns between US and India sharks"
- Success Prediction:
  "Analyze my pitch: AI-powered fitness app, $1M ask, 20% equity"
- Upload CSV files with pitch data
- Upload text files with business descriptions
- Upload markdown files with pitch decks
- Probability Score: 0-100% success likelihood
- Confidence Level: Model confidence in prediction
- Success Level: High/Medium/Low classification
- ML Models: Random Forest with feature importance
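To make the relationship between the probability score and the High/Medium/Low classification concrete, here is a small sketch. The thresholds (0.7 and 0.4) and the function names are assumptions chosen for illustration; the app's actual cut-offs may differ and may live in `config.py`.

```python
def success_level(probability, high=0.7, low=0.4):
    """Map a probability in [0, 1] to a High/Medium/Low label.

    The default thresholds are illustrative assumptions, not project values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high:
        return "High"
    if probability >= low:
        return "Medium"
    return "Low"


def format_prediction(probability):
    """Render a probability as a percentage together with its level."""
    return f"{probability:.0%} ({success_level(probability)})"
```

With these assumed thresholds, `format_prediction(0.82)` yields `82% (High)`.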
- Country Analysis: Success rates by country
- Industry Trends: Hot industries and success patterns
- Shark Profiles: Individual investment strategies
- Gender Analysis: Investment patterns by gender
- Equity Analysis: Optimal equity ranges
- Valuation Checks: Reasonable ask amounts
- Industry Risks: Sector-specific challenges
- Market Factors: External risk considerations
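Equity and valuation checks start from the valuation a pitch implies: an ask of A dollars for E percent equity implies a valuation of A × 100 / E. The sketch below extracts both numbers from a plain-English query and computes that figure. The function names and the regular expressions are hypothetical and only handle simple forms such as `$500k` or `$1M`; this is not the app's actual parser.

```python
import re

# Suffix multipliers for amounts such as "$500k" or "$1M".
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_pitch_terms(query):
    """Return (ask_in_dollars, equity_percent) found in the query, or None.

    Spelled-out units such as "million" are not handled in this sketch.
    """
    ask = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)([kKmM])?\b", query)
    equity = re.search(r"(\d+(?:\.\d+)?)\s*%", query)
    if ask is None or equity is None:
        return None
    number = float(ask.group(1).replace(",", ""))
    multiplier = _MULTIPLIERS.get((ask.group(2) or "").lower(), 1)
    return number * multiplier, float(equity.group(1))


def implied_valuation(ask, equity_percent):
    """Valuation implied by offering equity_percent of the company for ask."""
    if not 0 < equity_percent <= 100:
        raise ValueError("equity_percent must be in (0, 100]")
    return ask * 100 / equity_percent
```

For example, "$500k for 15% equity" implies a valuation of roughly $3.33M, and "$1M ask, 20% equity" implies $5M. Comparing that figure with typical deals in the dataset is one way an ask could be judged reasonable.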
- Interactive Charts: Plotly-powered visualizations
- Country Comparison: Success rates and metrics
- Industry Analysis: Performance by sector
- Shark Profiles: Investment patterns
- Trend Analysis: Historical patterns
- Executive summary with key metrics
- Detailed country and industry analysis
- Shark investment profiles
- Success factors and risk assessment
- Actionable recommendations
- Downloadable markdown format
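A report with the sections listed above could be assembled as a plain markdown string, which is also what makes it easy to offer as a download. The sketch below is an assumption about the general shape only; `build_report` and its parameters are hypothetical, and the values shown in the usage note are placeholders rather than real results.

```python
def build_report(title, metrics, recommendations):
    """Build a small markdown report from key metrics and recommendations."""
    lines = [f"# {title}", "", "## Executive Summary", ""]
    # One bullet per key metric, in the order given.
    lines += [f"- **{name}**: {value}" for name, value in metrics.items()]
    lines += ["", "## Recommendations", ""]
    # Numbered list of actionable recommendations.
    lines += [f"{i}. {text}" for i, text in enumerate(recommendations, start=1)]
    return "\n".join(lines) + "\n"
```

A call such as `build_report("Pitch Analysis", {"Success probability": "82%"}, ["Lower the equity offered"])` returns text that can be written to a `.md` file or passed to a download button.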
- GitHub account with repository set up
- Streamlit Cloud account (sign up at https://share.streamlit.io)
- Groq API key
- Initialize git repository (if not already done):
  git init
- Add all files:
  git add .
- Commit changes:
  git commit -m "Ready for Streamlit deployment"
- Create a new repository on GitHub (if not exists):
  - Go to https://github.com and sign in
  - Click "+" icon → "New repository"
  - Repository name: Sharktank_GPT_Streamlit
  - Choose Public (for free Streamlit Cloud) or Private
  - DO NOT initialize with README, .gitignore, or license
  - Click "Create repository"
- Connect and push to GitHub:
  git remote add origin https://github.com/YOUR_USERNAME/Sharktank_GPT_Streamlit.git
  git branch -M main
  git push -u origin main
  Note: If you get authentication errors, use a GitHub Personal Access Token:
  - GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
  - Generate new token with repo permissions
  - Use token as password when pushing
- Go to https://share.streamlit.io and sign in with GitHub
- Click "New app"
- Configure the app:
  - Select your repository: Sharktank_GPT_Streamlit
  - Select branch: main (or master)
  - Main file path: streamlit_app.py
- Add Secrets (API keys):
  - Go to app settings → Secrets
  - Add your environment variables:
    GROQ_API_KEY = "your_groq_api_key_here"
    LANGFUSE_SECRET_KEY = "sk-lf-..." # Optional
    LANGFUSE_PUBLIC_KEY = "pk-lf-..." # Optional
    LANGFUSE_BASE_URL = "https://cloud.langfuse.com"
    LANGFUSE_ENABLED = "false"
- Click "Deploy" and wait 2-5 minutes
- Your app will be live at: https://your-app-name.streamlit.app
Make sure your CSV files are committed to the repository:
- Shark Tank US dataset.csv
- Shark Tank India.csv
- Shark Tank Australia dataset.csv
- shark_tank_merged.csv
These files should be in the root directory of your repository and are required for the app to function.
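A quick way to confirm the datasets are in place before running or deploying is a small check like the one below. The helper name `missing_datasets` is hypothetical; the file names are the ones listed in this README.

```python
from pathlib import Path

# Dataset files this README lists as required in the project root.
REQUIRED_DATASETS = (
    "Shark Tank US dataset.csv",
    "Shark Tank India.csv",
    "Shark Tank Australia dataset.csv",
    "shark_tank_merged.csv",
)


def missing_datasets(root="."):
    """Return the required dataset file names not found under root."""
    root = Path(root)
    return [name for name in REQUIRED_DATASETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```

Note that a check run on Windows or macOS can pass even when the letter case of a file name is wrong, because those file systems are usually case-insensitive, while a Linux host treats names as case-sensitive.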
Your requirements.txt file is already configured with all necessary dependencies. Streamlit Cloud will automatically install them.
- Streamlit Cloud provides a free tier with 1GB RAM
- For heavy ML workloads, consider upgrading to a paid tier
- The app loads datasets at startup, so initial load may take a few seconds
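The app uses .env files locally, but on Streamlit Cloud, use the Secrets feature instead. The python-dotenv package only loads a local .env file and does not read Streamlit secrets. On Streamlit Cloud, root-level secrets are also exposed as environment variables, so code that reads them with os.getenv should work in both places.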
To update your deployed app:
- Make changes to your code
- Commit and push to GitHub:
  git add .
  git commit -m "Update description"
  git push origin main
- Streamlit Cloud will automatically redeploy
- You can also manually trigger redeploy from the app settings
- Import Errors:
  pip install --upgrade -r requirements.txt
- Groq API Errors:
  - Check your internet connection
  - Verify the API key in .env file or Streamlit Cloud secrets
  - Get your API key from https://console.groq.com/
- File Not Found Errors:
  - Verify CSV files are in the repository root directory
  - Check file names match exactly (case-sensitive)
  - Ensure files are committed to GitHub
- API Key Errors:
  - Verify secrets are set correctly in Streamlit Cloud
  - Check that keys don't have extra spaces or quotes
  - Ensure .env file exists locally with correct keys
- Memory Issues:
  - Dataset files might be too large for free tier
  - Consider using data caching (already implemented in the code)
  - Close other applications if running locally
- Port Already in Use:
  streamlit run streamlit_app.py --server.port 8502
- Virtual Environment Issues:
  - Ensure virtual environment is activated
  - Reinstall packages: pip install -r requirements.txt
  - Verify Python version: python --version (should be 3.8+)
- Git Authentication Issues:
  - Use GitHub Personal Access Token instead of password
  - Or use SSH: git@github.com:YOUR_USERNAME/Sharktank_GPT_Streamlit.git
- Go to your app in Streamlit Cloud
- Click the menu (three dots) in the top right
- Select "Manage app"
- View logs for error messages
If CSV files are too large (>100MB):
- Consider using Git LFS:
  git lfs install && git lfs track "*.csv"
- Or upload datasets to cloud storage and load from URL
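To find out whether any dataset actually crosses the limit before pushing, a short check along these lines may help. The helper name `oversized_files` is hypothetical, and the 100 MiB default reflects GitHub's per-file limit for regular (non-LFS) files.

```python
from pathlib import Path

# GitHub blocks regular (non-LFS) files larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root=".", pattern="*.csv", limit=GITHUB_FILE_LIMIT_BYTES):
    """Return sorted names of files under root matching pattern above limit bytes."""
    return sorted(
        path.name
        for path in Path(root).glob(pattern)
        if path.is_file() and path.stat().st_size > limit
    )
```

An empty result from `oversized_files()` in the project root suggests the CSV files can be committed normally; any names returned are candidates for Git LFS or external storage.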
DO:
- Use Streamlit Secrets for all API keys on Streamlit Cloud
- Keep your .gitignore updated
- Never commit .env files
- Review your code before pushing to GitHub
- Use environment variables instead of hardcoded values
DON'T:
- Hardcode API keys in your code
- Commit sensitive data to GitHub
- Share your repository secrets publicly
- Use production API keys in development
- Python 3.8 or higher
- 4GB RAM recommended
- Internet connection for Groq API
- Modern web browser
All required packages are listed in requirements.txt:
- streamlit
- pandas
- numpy
- plotly
- langgraph
- langchain
- langchain-groq
- groq
- scikit-learn
- xgboost
- shap
- python-dotenv
- langfuse
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.
For issues or questions:
- Check the troubleshooting section above
- Review the code comments
- Open an issue on GitHub
- Check Streamlit Cloud documentation: https://docs.streamlit.io/streamlit-cloud
- Visit Streamlit Community: https://discuss.streamlit.io
- Real-time data updates
- Additional ML models
- API integration
- Mobile app version
- Advanced NLP features
- Custom dashboard creation
Built with LangGraph, Streamlit, and Python