ProfiteerIO is a full-stack, end-to-end e-commerce intelligence platform that empowers users to predict optimal product pricing and boost purchasing behavior using real-time data pipelines, machine learning, and a seamless UI experience.
- Demo
- Introduction
- Hackathon Win
- Architecture
- Dataset
- Installation
- Prediction
- UI Dashboard
- Features
- Future Work
- Contributing
- License
For a quick demo, check out the YouTube video showcasing ProfiteerIO in action.
Setting the right price for your products can be a challenge for e-commerce businesses. Price too high, and you risk losing customers; price too low, and you sacrifice profits. So, how do you find the sweet spot in real-time, adapting to shifting market dynamics, product trends, and customer behavior?
Introducing ProfiteerIO, a powerful tool designed to help e-commerce businesses maximize profits by predicting sales under different pricing scenarios.
With ProfiteerIO, you can:
- Seamlessly sync catalog sales data from diverse sources (CSV, JSON, and more) into a robust database or data warehouse using Airbyte.
- Harness the power of AI by training models to predict purchase volumes based on product attributes and pricing strategies with MindsDB.
- Visualize product performance and forecast future sales trends through an intuitive React dashboard.
- Deploy your entire analytics stack on Kubernetes to any cloud provider (AWS, GCP, Azure) with just a few simple configurations using Helm.
ProfiteerIO was the 1st place winner of the Airbyte+MindsDB Hackathon. The project was recognized for its innovative use of Airbyte and MindsDB to create a full-stack e-commerce intelligence platform.
The entire stack is deployed via a Helm chart with modular subcharts for Airbyte, MindsDB, Postgres, and the app components, enabling One-Click Deployment on Kubernetes.
Below is an architecture diagram, followed by a breakdown of the platform's key components and how they tie together in a single deployment.
| STACK | ROLE |
|---|---|
| FastAPI | Synthesizes realistic e-commerce sales data based on price, rating, and time dynamics. Also exposes REST APIs for analytics and prediction, powering the frontend and Airbyte source. |
| PostgreSQL | Central datastore for all synthesized sales data, analytics results, and ML inputs/outputs. Tuned for fast filtering and aggregation. |
| Airbyte | Periodically extracts generated data from the FastAPI source and loads it into PostgreSQL. Supports scheduled and manual syncs. |
| MindsDB | Connects to PostgreSQL to train and serve machine learning models that forecast purchase volume based on key product features. |
| React | A user-friendly UI that visualizes dashboards, enables catalog interactions, and supports live predictions using slider inputs. |
| NGINX | Reverse proxy that routes incoming requests to the backend services, enforces secure access, and handles CORS. |
- Synthetic Dataset Generation: FastAPI uses configurable rules to simulate sales events. Parameters like price sensitivity, time-of-day behavior, and product ratings are baked into the generation algorithm.
- Ingestion & Storage (EL): Airbyte extracts data from FastAPI and loads it into PostgreSQL. You can trigger syncs manually or schedule them hourly/daily.
- ML Model Training: MindsDB connects directly to PostgreSQL, continuously retraining or updating its prediction models. These models consider fields like price, rating, and hour-of-day to estimate expected purchases.
- API & Analytics: FastAPI exposes analytics endpoints to serve dashboard stats and product insights. For predictions, it acts as a bridge between the React frontend and MindsDB’s trained models.
- User Interaction: React queries the backend to render the Dashboard (metrics), Catalog (product management), and Prediction page (interactive sliders for pricing simulation).
The E-Commerce Dynamics Dataset is a meticulously crafted synthetic dataset designed to emulate real-world e-commerce sales behavior. The dataset simulates sales activities across a product catalog over a user-specified time period (in hours).
Each record encapsulates a snapshot of product performance at a given timestamp, factoring in dynamic pricing, consumer engagement (views, cart additions, purchases), and contextual influences like time-of-day demand fluctuations. The synthesis logic leverages probabilistic models and domain-inspired heuristics to ensure realism and variability.
- Consumer Behavior Simulation: Views, cart additions, and purchases are modeled using a multi-stage funnel, influenced by product ratings, category popularity, and time-of-day demand patterns.
- Category Popularity Integration: Products are assigned to categories with predefined popularity scores (derived from CategoryPopularity and ratings), impacting visibility and engagement.
- Rating Evolution: Product ratings (ranging from 2.0 to 5.0) evolve stochastically over time, simulating shifts in consumer sentiment.
- Time-of-Day Sensitivity: Demand fluctuates across four time buckets: Overnight (22:00–06:00), Morning (06:00–12:00), Afternoon (12:00–18:00), and Evening (18:00–22:00), mimicking realistic shopping patterns.
- Granular Temporal Resolution: Sales data is generated hourly, with timestamps in ISO format, enabling fine-grained analysis of temporal trends.
- Dynamic Pricing Mechanisms: Products follow one of three pricing strategies (aggressive, moderate, or standard), bounded by minimum and maximum price constraints, reflecting real-world pricing variability.
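The four time-of-day buckets listed above can be expressed as a small lookup. The following is a minimal sketch, not the project's actual code: the function names are hypothetical, and it assumes each bucket is a half-open interval (for example, 06:00 belongs to Morning and 22:00 to Overnight).

```python
from datetime import datetime


def time_of_day_bucket(hour: int) -> str:
    """Map an hour of the day (0-23) to one of the four demand buckets.

    Assumption: bucket boundaries are half-open, e.g. Morning is [06:00, 12:00).
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 22:
        return "Evening"
    return "Overnight"  # 22:00-06:00 wraps around midnight


def bucket_for_timestamp(iso_timestamp: str) -> str:
    """Derive the bucket from an ISO-formatted timestamp string."""
    return time_of_day_bucket(datetime.fromisoformat(iso_timestamp).hour)
```

For example, `bucket_for_timestamp("2024-01-01T23:00:00")` falls in the Overnight bucket under these assumptions.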
Each record in the dataset contains the following fields:
- `timestamp`: ISO-formatted timestamp of the sales snapshot.
- `product_id`: Unique identifier for the product.
- `product_name`: Descriptive name of the product.
- `product_image`: Reference to the product’s image (if applicable).
- `category`: Product category, linked to a popularity score.
- `category_popularity`: Popularity score of the category (0 to 1).
- `base_price`: Reference price for the product.
- `current_price`: Dynamically determined price based on the pricing strategy.
- `price_strategy`: Pricing approach (aggressive, moderate, or standard).
- `popularity_factor`: Composite score reflecting category popularity, rating, and device-specific adjustments.
- `time_of_day_bucket`: Time bucket influencing demand (Overnight, Morning, Afternoon, Evening).
- `views`: Estimated product views, driven by popularity and time factors.
- `cart_adds`: Number of times the product was added to carts, based on views and rating.
- `purchases`: Number of completed purchases, influenced by cart additions and price sensitivity.
- `rating`: Current product rating, subject to periodic updates.
The dataset is generated using a modular Python-based synthesis logic, incorporating:
- Pricing Logic: Prices are determined using strategy-specific distributions (e.g., Gaussian for moderate pricing, uniform for aggressive), constrained by product-defined bounds.
- Engagement Funnel: Views are estimated from base view counts, adjusted by popularity and time-of-day factors. Cart additions and purchases follow, with conversion rates tied to ratings and price sensitivity.
- Popularity Modeling: A `popularity_factor` is calculated as a function of category popularity, product rating, and a randomized device-specific multiplier.
- Stochastic Rating Updates: Ratings are initialized (or inherited from `base_rating`) and periodically adjusted with a 10% probability, reflecting dynamic consumer feedback.

- The dataset is generated using the `generate_sales_data` function, which accepts a `SalesRequest` object containing the number of hours and product metadata.
- The dataset is synthetic, offering flexibility for experimentation without privacy or proprietary data concerns.
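The synthesis steps described above can be sketched roughly as follows. This is an illustrative approximation, not the code behind `generate_sales_data`: all function names, conversion rates, the Gaussian spread, the rating step size, and the behavior of the standard strategy are assumptions chosen only to show the shape of the logic.

```python
import random


def sample_price(strategy, base_price, min_price, max_price, rng):
    """Draw a price for one snapshot, clamped to the product's bounds."""
    if strategy == "aggressive":
        price = rng.uniform(min_price, max_price)  # uniform, per the description
    elif strategy == "moderate":
        # Gaussian around the base price; the spread is a hypothetical choice.
        price = rng.gauss(base_price, (max_price - min_price) / 6)
    else:
        price = base_price  # assumption: "standard" stays at the base price
    return round(min(max(price, min_price), max_price), 2)


def simulate_funnel(base_views, popularity_factor, time_factor, rating, price_ratio):
    """Views -> cart adds -> purchases, with hypothetical conversion rates."""
    views = int(base_views * popularity_factor * time_factor)
    cart_rate = 0.02 + 0.02 * (rating / 5.0)  # higher rating, more cart adds
    # Higher price relative to base lowers the purchase rate (price sensitivity).
    buy_rate = min(max(0.5 - 0.3 * (price_ratio - 1.0), 0.0), 1.0)
    cart_adds = int(views * cart_rate)
    purchases = int(cart_adds * buy_rate)
    return views, cart_adds, purchases


def update_rating(rating, rng, change_probability=0.10, step=0.1):
    """Adjust the rating with 10% probability, keeping it within 2.0-5.0."""
    if rng.random() < change_probability:
        rating += rng.choice((-step, step))
    return round(min(max(rating, 2.0), 5.0), 1)
```

Passing a seeded `random.Random` instance as `rng` keeps a generated dataset reproducible, which is convenient when comparing model runs.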
Tip
The recommended way to run the project is to use the provided Helm chart for deployment on Kubernetes. However, if you prefer to run it locally, there are instructions at the end of this section.
Note
If you are using the Helm chart, please ensure you have Docker, kubectl, Minikube, Helm, and Git installed and running on your machine.
Clone the repository to your local machine using the following command:

git clone https://www.github.com/qreater/profiteer-io.git

Navigate to the repository directory, then follow the instructions below to build the Docker images for each component (frontend and backend). Make sure you are in the Minikube context for Docker.
cd devops/
docker build -t profiteer-io-backend .. -f ./Dockerfile.backend
docker build -t profiteer-io-frontend .. -f ./Dockerfile.frontend

The Helm chart includes all the necessary components, including Airbyte, MindsDB, PostgreSQL, and the FastAPI backend, as well as an NGINX reverse proxy for routing requests to the appropriate services.
Warning
Ensure you configure the values.yaml file in the helm chart according to your requirements. The default configuration is set up for local development, but you can modify it for production use. The deployment uses about 7Gi of memory and 2 CPUs for the overall stack, with MindsDB demanding the most resources.
cd devops/helm/
kubectl create namespace profiteer
helm install profiteer-io . --namespace profiteer

Once the deployment is complete, you can port-forward the services individually, or the NGINX service to access the frontend and backend. K9s is a great tool to visualize the services and their ports.
After this, the stack will be up and running. You can then follow the application flow, starting with the Airbyte connection to the FastAPI source.
Clone the repository to your local machine using the following command:

git clone https://www.github.com/qreater/profiteer-io.git

Make sure you have PostgreSQL, Airbyte, MindsDB, and Docker accessible on your machine. You can use Docker to run PostgreSQL and Airbyte locally, or use cloud versions of these services.
Setting up FastAPI Service
Create a .env file in the /backend directory and add the following environment variables, configure as needed:
MINDSDB_URL=http://localhost:47334
API_KEY=OPEN_SESAME
DB_NAME=mindsdb
DB_USER=mindsdb
DB_PASSWORD=mindsdb
DB_HOST=localhost
DB_PORT=5432
TABLE_NAME=Sales_Data

Create a virtual environment, then follow the instructions below to install the dependencies and run the backend. Ensure you have Python 3.12 and pip installed on your machine.
cd backend
pip install poetry
poetry install
poetry run uvicorn main:app --reload --port 8000

Now, follow the technical workflow to set up the Airbyte connection to the FastAPI source. Sync the data to PostgreSQL and set up the MindsDB connection to the PostgreSQL database.
Then proceed to train the model using the SQL commands provided in the Prediction section.
Airbyte Connection
Set up a connection between the Sales API and Postgres in the Airbyte UI. Use the following configuration for the connection to the FastAPI source:
{
"hours": 48,
"products": [
{
"category": "Laptops",
"max_price": 1400,
"min_price": 1000,
"base_price": 1200,
"product_id": "P001",
"base_rating": 4.5,
"product_name": "UltraBook X1",
"product_image": "https://example.com/images/ultrabook-x1.jpg",
"price_strategy": "aggressive"
},
...Add more products as needed
]
}

Set up MindsDB Connection
Log in to MindsDB and create a new connection to the Postgres database. Use the following credentials, change them if necessary.
CREATE DATABASE profiteer_data
WITH ENGINE = 'postgres',
PARAMETERS = {
"host": "postgresql.profiteer",
"port": 5432,
"user": "mindsdb",
"password": "mindsdb",
"database": "mindsdb",
"schema": "public"
};

Train the model
Use the following command to train the model in MindsDB:
CREATE PREDICTOR mindsdb.sales_forecast
FROM profiteer_data
(SELECT
views AS input_views,
cart_adds,
purchases,
(current_price::float / NULLIF(base_price, 0)) AS price_ratio,
popularity_factor
FROM "Sales_Data")
PREDICT purchases
WINDOW 24
HORIZON 1;

Try predicting purchases
Use the following command to try the prediction in MindsDB:
SELECT * FROM mindsdb.sales_forecast
WHERE input_views = 24416.75
AND cart_adds = 1521.19
AND price_ratio = 0.95
AND popularity_factor = 1.51;
DESCRIBE model sales_forecast;

Setting up React Frontend
Create a .env file in the /frontend directory and add the following environment variables, configure as needed:
VITE_API_URL=http://localhost:8000/
VITE_API_KEY=OPEN_SESAME

Now install the dependencies and run the frontend. Ensure you have Node.js and npm installed on your machine.
cd frontend
npm install
npm run dev

MindsDB is used to predict the number of purchases based on the features provided in the dataset. The prediction model is trained using the purchases column as the target variable and the selected features as input variables. The model uses a time window of 24 hours and a horizon of 1 hour to make predictions.
Tip
The prediction model is designed to be unbiased and robust, so that predictions are not skewed toward any specific product or category.
| Field Name | Description |
|---|---|
| `views` | The views received by products |
| `cart_adds` | The number of times products were added to the cart |
| `popularity_factor` | The average popularity of the product categories, calculated with current price and ratings |
| `price_ratio` | The average ratio of current price to base price, rounded to the nearest whole number |
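As a rough illustration of how a client could turn these four features into a prediction request, the sketch below builds the same kind of `SELECT` used against `mindsdb.sales_forecast`. The helper names are hypothetical, and `run_query` assumes MindsDB's HTTP SQL endpoint is reachable at `/api/sql/query` under the configured `MINDSDB_URL`; the project's FastAPI bridge may work differently.

```python
import json
from urllib import request


def price_ratio(current_price, base_price):
    """Ratio of current to base price; None when base_price is 0 (like NULLIF)."""
    if base_price == 0:
        return None
    return current_price / base_price


def build_prediction_query(input_views, cart_adds, current_price, base_price,
                           popularity_factor, model="mindsdb.sales_forecast"):
    """Build the prediction SELECT; float() casts keep the inputs numeric."""
    ratio = price_ratio(current_price, base_price)
    if ratio is None:
        raise ValueError("base_price must be non-zero")
    return (
        f"SELECT * FROM {model} "
        f"WHERE input_views = {float(input_views)} "
        f"AND cart_adds = {float(cart_adds)} "
        f"AND price_ratio = {round(ratio, 4)} "
        f"AND popularity_factor = {float(popularity_factor)};"
    )


def run_query(sql, mindsdb_url="http://localhost:47334"):
    """Send the query to MindsDB over HTTP (assumed endpoint: /api/sql/query)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    req = request.Request(
        f"{mindsdb_url}/api/sql/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `build_prediction_query(24416.75, 1521.19, 1140, 1200, 1.51)` yields a query with `price_ratio = 0.95`, matching the example values used when trying the prediction in MindsDB.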
The UI dashboard is built using React and provides a user-friendly interface to visualize the predictions made by MindsDB. The dashboard includes the following features:
- Product Catalog: A list of products with their details, including product name, image, category, and current price.
- Product Details and Prediction: A detailed view of each product, including its predicted purchases based on the price set.
- Overall Statistics: A summary of the overall statistics, including the total purchases, top products and total revenue.
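The overall statistics can be derived from the sales records with a simple aggregation. The sketch below is one possible way to compute them, not the backend's actual analytics code: the function name is hypothetical, and it assumes revenue per record is `purchases * current_price`.

```python
from collections import defaultdict


def summarize_sales(records, top_n=3):
    """Aggregate total purchases, total revenue, and top products by purchases.

    Assumption: each record is a dict with product_id, purchases, current_price.
    """
    total_purchases = 0
    total_revenue = 0.0
    purchases_by_product = defaultdict(int)
    for row in records:
        total_purchases += row["purchases"]
        total_revenue += row["purchases"] * row["current_price"]
        purchases_by_product[row["product_id"]] += row["purchases"]
    # Rank products by total purchases, highest first.
    top_products = sorted(
        purchases_by_product.items(), key=lambda item: item[1], reverse=True
    )[:top_n]
    return {
        "total_purchases": total_purchases,
        "total_revenue": total_revenue,
        "top_products": top_products,
    }
```

In a deployed stack the same figures would more likely come from a SQL aggregation in PostgreSQL; the in-memory version here only makes the definitions explicit.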
- Scheduled Data Syncs: Seamlessly sync your catalog sales data from various sources into a robust database or data warehouse using Airbyte.
- AI-Powered Predictions: Leverage MindsDB to train models that predict purchase volumes based on product attributes and pricing strategies.
- Interactive Dashboard: Visualize product performance and forecast future sales trends through an intuitive React dashboard.
- One-Click Deployment: Deploy your entire analytics stack on Kubernetes to any cloud provider (AWS, GCP, Azure) with just a few simple configurations using Helm.
- Modular Architecture: The architecture is designed to be modular, allowing for easy integration of new components and services as needed.
- Stock Management: Integrate stock management features to track inventory levels and optimize stock replenishment based on sales predictions.
- Broader Category Support: Expand the dataset to include a wider range of product categories and attributes, enhancing the model's predictive capabilities.
- Real-Time Data Ingestion: Implement near-real-time data ingestion capabilities to ensure that the dataset is always up-to-date with the latest sales data.
- World Events: Integrate world events and trends into the dataset to better understand their impact on sales and consumer behavior.
We welcome contributions to this project! If you have any suggestions or improvements, please feel free to open an issue or submit a pull request. Here are the tools and utilities we used to build this project, and we encourage you to use them as well:
| CATEGORY | TOOL |
|---|---|
| Dev Cycle | GitHub Issues + Pull Requests |
| CI/CD | GitHub Actions for PR test runs |
| Design | Figma for UI/UX mockups, logo |
| Frontend Style | prettier & eslint for linting and formatting |
| Backend Style | black for Python code formatting |
| Deployments | Docker, Kubernetes, Helm |
| Asset Generation | DALL·E for product imagery and creative assets |
Tip
This project adheres to modern developer workflows and automation principles. It includes CI pipelines, standardized formatting tools, and a collaborative GitHub-based review process.
This project is licensed under the MIT License. See the LICENSE file for more details.



