This project aims to optimize the supply chain by predicting weekly sales for various stores using a machine learning approach. We used RandomForestRegressor to model the sales data based on features such as store characteristics, promotions, and external factors. This project provides insights into the drivers of sales and helps in making data-driven decisions for optimizing inventory and supply chain operations.
The objective of this project is to predict the Weekly_Sales for retail stores using historical data, including features such as promotions, store type, holidays, and seasonal factors. By developing a machine learning model, we aim to:
- Provide accurate sales predictions for inventory management.
- Analyze the importance of different features in predicting sales.
- Visualize sales trends and key metrics to derive insights.
The project uses four datasets:
- Features.csv: Contains additional information about the stores, such as promotional events and weather conditions.
- Stores.csv: Contains store-related information, including store type and assortment type.
- Train.csv: Historical sales data for training the model, including Weekly_Sales for each store.
- Test.csv: Test data for evaluating the model.
Loaded the datasets using Pandas and merged them to create a unified dataset containing all the relevant information for each store and date.
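A minimal sketch of the loading and merging step follows. It assumes all three tables share a `Store` column and that `Train.csv` and `Features.csv` also share `Date` (and `IsHoliday` when present). The `load_and_merge` helper, the join keys, and the toy rows are hypothetical and may differ from what `Supply_Chain_Opt.py` actually does.

```python
import io

import pandas as pd


def load_and_merge(train: pd.DataFrame, features: pd.DataFrame, stores: pd.DataFrame) -> pd.DataFrame:
    """Join sales rows with per-date features and per-store attributes.

    Hypothetical keys: every table has a Store column; train and features
    also share Date (and IsHoliday, if both contain it).
    """
    # Join on every assumed key the two frames share, so a shared IsHoliday
    # flag is not duplicated as IsHoliday_x / IsHoliday_y.
    keys = [c for c in ("Store", "Date", "IsHoliday")
            if c in train.columns and c in features.columns]
    merged = train.merge(features, on=keys, how="left")
    # Attach store-level attributes (type, size, ...) to every row.
    merged = merged.merge(stores, on="Store", how="left")
    merged["Date"] = pd.to_datetime(merged["Date"])
    return merged


# Toy stand-ins for pd.read_csv("Train.csv"), etc. (illustrative values only).
TRAIN_CSV = "Store,Date,Weekly_Sales,IsHoliday\n1,2010-02-05,100.0,False\n2,2010-02-05,200.0,False\n"
FEATURES_CSV = "Store,Date,Temperature,IsHoliday\n1,2010-02-05,40.0,False\n2,2010-02-05,50.0,False\n"
STORES_CSV = "Store,Type,Size\n1,A,1000\n2,B,2000\n"

merged = load_and_merge(
    pd.read_csv(io.StringIO(TRAIN_CSV)),
    pd.read_csv(io.StringIO(FEATURES_CSV)),
    pd.read_csv(io.StringIO(STORES_CSV)),
)
print(merged)
```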
- Merging Datasets: Merged the features, stores, and train datasets.
- Handling Missing Values: Filled missing values for numeric columns with their median and for categorical columns with their mode.
- Feature Engineering: Extracted features such as Year, Month, Week, Day, and DayOfWeek from the Date column for better analysis.
- Encoding Categorical Variables: One-hot encoded categorical variables to make them suitable for machine learning models.
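The preprocessing steps above can be sketched as a single function. This is an illustration under assumptions: the hypothetical `preprocess` helper treats every non-numeric, non-datetime column as categorical, uses ISO week numbers for `Week`, and drops the raw `Date` column once the calendar features are extracted.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, derive calendar features, and one-hot encode."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])

    # Numeric columns (bool excluded): fill gaps with the column median.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Categorical columns (anything not numeric/bool/datetime): fill with the mode.
    cat_cols = [c for c in df.columns
                if not (is_numeric_dtype(df[c]) or is_datetime64_any_dtype(df[c]))]
    for col in cat_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Calendar features extracted from Date.
    df["Year"] = df["Date"].dt.year
    df["Month"] = df["Date"].dt.month
    df["Week"] = df["Date"].dt.isocalendar().week.astype(int)  # ISO week number
    df["Day"] = df["Date"].dt.day
    df["DayOfWeek"] = df["Date"].dt.dayofweek  # Monday=0 ... Sunday=6

    # One-hot encode categoricals, then drop the raw timestamp.
    df = pd.get_dummies(df, columns=cat_cols)
    return df.drop(columns="Date")


# Toy merged table with one missing number and one missing category.
raw = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Temperature": [40.0, np.nan, 50.0],
    "Type": ["A", None, "B"],
    "Weekly_Sales": [100.0, 150.0, 200.0],
})
processed = preprocess(raw)
print(processed)
```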
Used scikit-learn's RandomForestRegressor, trained on the processed data to predict Weekly_Sales from the store, promotion, holiday, and calendar features.
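A sketch of the training step on synthetic data, assuming a random 80/20 hold-out split. The column names and generated values are stand-ins for the processed table, and the hyperparameters are illustrative rather than the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed feature table (hypothetical columns).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "Size": rng.integers(30_000, 220_000, n),
    "Week": rng.integers(1, 53, n),
    "IsHoliday": rng.integers(0, 2, n),
})
# Target depends on store size and the holiday flag, plus noise.
y = 0.1 * X["Size"] + 2_000 * X["IsHoliday"] + rng.normal(0, 500, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

For weekly sales, splitting by date (training on earlier weeks, testing on later ones) avoids leaking future information into training; the random split here only keeps the example short.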
The model was evaluated using:
- Mean Squared Error (MSE): To measure the average squared difference between actual and predicted sales.
- R-squared (R²): To measure the proportion of variance explained by the model.
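Both metrics can be written out directly: MSE is the mean of the squared residuals, and R² is 1 - SS_res / SS_tot, where SS_res sums squared residuals and SS_tot sums squared deviations of the actual values from their mean. The hand-rolled functions below are for illustration only; `sklearn.metrics.mean_squared_error` and `r2_score` compute the same quantities. The actual and predicted values are made-up numbers.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse(y_true, y_pred) -> float:
    """Mean of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: share of the variance in y_true explained by y_pred."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(mse(actual, predicted), mean_squared_error(actual, predicted))
print(r_squared(actual, predicted), r2_score(actual, predicted))
```

Every residual here is ±10, so MSE is 100, and with SS_tot = 12,500 the R² works out to 1 - 400 / 12,500 = 0.968.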
We plotted the top 10 features that contributed most to the model's predictions, providing insights into the factors driving sales.
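A sketch of the top-10 feature importance plot, using scikit-learn's impurity-based `feature_importances_` on synthetic data in which only the hypothetical features `f0` and `f1` influence the target. With the real data, the index would hold the processed column names instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 12 hypothetical features, only f0 and f1 drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 12)), columns=[f"f{i}" for i in range(12)])
y = 3 * X["f0"] + 2 * X["f1"] + rng.normal(scale=0.1, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances indexed by feature name, largest first.
importances = pd.Series(model.feature_importances_, index=X.columns)
top10 = importances.sort_values(ascending=False).head(10)

# Horizontal bars, most important feature at the top.
ax = top10.sort_values().plot.barh(title="Top 10 feature importances")
ax.set_xlabel("Mean decrease in impurity")
plt.tight_layout()
```

Call `plt.show()` or `plt.savefig(...)` to display or save the chart. Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` is an alternative if that is a concern.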
Used Plotly to enhance visualizations for deeper insights into sales trends and feature impacts. The visualizations include:
- Sales Trend Over Time: A line plot showing Weekly_Sales over the entire timeframe, which helps understand seasonal patterns and trends.
- Sales Distribution by Store Type: A boxplot depicting the distribution of sales by different store types to identify performance variations.
- Python: Core language used for analysis and modeling.
- Pandas: For data manipulation and preprocessing.
- Scikit-Learn: For machine learning modeling and evaluation.
- Seaborn and Plotly: For data visualization.
- Matplotlib: For plotting basic visualizations.
To run this project, install the following dependencies:

```bash
pip install pandas numpy scikit-learn matplotlib seaborn plotly
```

- Clone the repository.
- Ensure you have the required datasets in the appropriate directory.
- Run the Python script:

```bash
python Supply_Chain_Opt.py
```

The model's performance is reported with the following metrics:
- Mean Squared Error (MSE): The error metric used to measure the average squared difference between predicted and actual sales.
- R-squared (R²): Indicates how well the features explain the variance in sales.
The model achieved reasonable accuracy in predicting weekly sales, with the feature importance analysis highlighting key drivers of sales such as promotions, store type, and seasonal factors.
- Hyperparameter Tuning: Improve the model by tuning hyperparameters of the RandomForestRegressor.
- Additional Features: Incorporate external data like economic indicators or regional events to improve model accuracy.
- Optimization: Use optimization techniques to enhance supply chain management based on sales predictions.
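For the hyperparameter-tuning item, one possible approach is a grid search with time-ordered cross-validation. This sketch assumes rows are sorted chronologically (otherwise `TimeSeriesSplit` folds are not meaningful), uses synthetic data, and the parameter grid is hypothetical rather than a recommended setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for date-sorted features and Weekly_Sales.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

# Hypothetical grid over a few RandomForestRegressor hyperparameters.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}

# Each fold trains on earlier rows and validates on the rows that follow.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best settings and their CV MSE
```

`RandomizedSearchCV` accepts the same arguments plus `n_iter` and is cheaper when the grid grows large.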