Data Scientist & ML Engineer with an Engineering degree in Applied Mathematics and Statistics from the Institute of Technology of Cambodia (ITC). I specialize in end-to-end ML system development — from feature engineering to scalable model deployment — with real-world impact across energy systems, macroeconomic forecasting, NLP, and policy analytics.
- 🔋 Currently engineering Battery Energy Storage Systems (BESS) data pipelines at SchniecTech Group
- 📈 Built a hybrid inflation forecasting model (XGBoost + LSTM + SARIMAX) for the Cambodian Ministry of Planning
- 🎓 Former Data Science Instructor — taught ML, EDA, and Power BI to the next generation of analysts
- 🌏 Passionate about applying AI to Southeast Asian development challenges
| Area | Tools | Focus |
|---|---|---|
| 🔬 Research & Experiments | Jupyter Notebook · Google Colab | EDA, model prototyping, thesis work |
| 🏭 Production Pipelines | Python · Docker · PostgreSQL | Data ingestion, transformation, ETL |
| 📊 Dashboards & Reports | Power BI · Plotly · Data Studio | Business insights, automated reporting |
| 🔋 Energy Data Systems | Python · EMS/SCADA · Pandas | Real-time grid analytics, SOC monitoring |
| 🗣️ NLP Experiments | Hugging Face · spaCy · LoRA | Tokenization, fine-tuning, Khmer NLP |
| 🎓 Teaching & Mentoring | Python · Power BI · Slides | Curriculum design, student projects |
Mar 2023 ──● First commit — started exploring data science on GitHub
Apr 2023 ──● Uploaded early EDA notebooks on tourism & agriculture data
Jun 2023 ──● Published Laptop Price Prediction project (SVR winner!)
Sep 2023 ──● Hotel Reservation DB — first SQL schema design on GitHub
Jan 2024 ──● Joined Sunrise Institute — started teaching & documenting
Mar 2024 ──● Crop Yield Prediction — XGBoost + Random Forest pipeline
Jul 2024 ──● Cambodia Tourism Forecasting — ARIMA / SARIMA / Prophet
Oct 2024 ──● Deep dive into NLP — tokenization & Khmer language corpus
Feb 2025 ──● Ministry of Planning — Inflation Forecasting Thesis begins
Apr 2025 ──● Hybrid ML deployed: XGBoost + LSTM + SARIMAX live
Dec 2025 ──● SchniecTech — EMS/SCADA real-time data engineering begins
Apr 2026 ──◉ Today — actively building, learning, and contributing 🚀
| Language | Level | Context |
|---|---|---|
| 🇰🇭 Khmer | Native | Mother tongue |
| 🇬🇧 English | Professional | Academic, work, research writing |
| 🇫🇷 French | Basic | Aii Language Center (2019–2022) |
SchniecTech Group · Dec 2025 – Present
- Analyzed high-frequency EMS/SCADA time-series data (Active Power, Frequency, SOC, Voltage, Reactive Power)
- Built real-time interactive dashboards for grid and plant performance monitoring
- Conducted anomaly detection for power fluctuations, voltage deviations, and system faults
- Optimized battery charge/discharge cycles by monitoring State of Charge (SOC) patterns
- Automated daily operational reports for engineering and management decision-making
Ministry of Planning · Feb 2025 – Oct 2025
- Engineered a hybrid inflation forecasting system (XGBoost + LSTM + SARIMAX) to enhance national economic projections
- Built end-to-end data pipelines for macroeconomic indicators, including cleaning, feature engineering, and stationarity testing
- Analyzed global commodity factors (oil, gold) and domestic sector drivers affecting Cambodian inflation
- Delivered interactive dashboards and policy reports adopted in economic planning decisions
Sunrise Institute · Jan 2024 – Feb 2025
- Taught EDA, statistics, ML, forecasting, and data visualization using Python and Power BI
- Designed hands-on mini-projects bridging theory with real-world datasets
- Mentored students on data storytelling and insight communication to technical and non-technical audiences
A hybrid ML system to forecast national inflation with improved accuracy over traditional models.
- Approach: XGBoost + LSTM + SARIMAX ensemble — combining classical time series with deep learning
- Data: Macroeconomic indicators, global oil & gold prices, domestic sector indices
- Impact: Improved forecast accuracy for national economic planning at the Ministry of Planning
- Tools: Python · TensorFlow · Statsmodels · XGBoost · Pandas · Plotly
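
The README does not say how the three models' forecasts are combined. One common option, shown here purely as an assumption, is to weight each model by the inverse of its validation MSE. The function names and the toy numbers below are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def inverse_mse_weights(
    actual: Sequence[float], val_preds: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Weight each model by 1/MSE on a validation window, normalized to sum to 1."""
    eps = 1e-12  # guards against division by zero for a perfect fit
    inverse = {name: 1.0 / (mse(actual, preds) + eps) for name, preds in val_preds.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(forecasts: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model forecasts over a shared horizon."""
    horizon = len(next(iter(forecasts.values())))
    if any(len(series) != horizon for series in forecasts.values()):
        raise ValueError("all forecasts must cover the same horizon")
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Toy validation window (illustrative values only, not real inflation data).
actual = [2.0, 2.5, 3.0]
validation = {
    "xgboost": [2.1, 2.4, 3.2],
    "lstm": [1.8, 2.9, 2.7],
    "sarimax": [2.0, 2.6, 3.1],
}
weights = inverse_mse_weights(actual, validation)
combined = blend({"xgboost": [3.0], "lstm": [4.0], "sarimax": [3.5]}, weights)
```

Models that tracked the validation window more closely receive larger weights, so a single poorly fitting component has limited influence on the blended forecast.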
Real-time analytics platform for Battery Energy Storage System operations.
- Approach: High-frequency signal analysis (SOC, Frequency, Active Power) with anomaly detection
- Impact: Enabled proactive grid instability detection and optimized charge/discharge efficiency
- Tools: Python · Pandas · Power BI · EMS/SCADA data · Matplotlib
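
The anomaly detection method used on the BESS signals is not specified here. As a minimal sketch, assuming a trailing-window z-score rule, a reading can be flagged when it deviates strongly from the recent past. The function name, window size, threshold, and sample readings are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rolling_zscore_anomalies(
    values: Sequence[float],
    window: int = 5,
    threshold: float = 3.0,
    min_sigma: float = 1e-9,
) -> List[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    if window < 2:
        raise ValueError("window must be at least 2")
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        # Floor the spread so a perfectly flat window does not divide by zero.
        sigma = max(pstdev(history), min_sigma)
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Illustrative grid-frequency readings in Hz with one injected dip.
frequency_hz = [50.00, 50.01, 49.99, 50.00, 50.01, 50.00, 49.40, 50.00, 50.01]
anomalies = rolling_zscore_anomalies(frequency_hz, window=5, threshold=3.0)
```

A flagged reading stays in the trailing window for the next few steps and inflates the spread, so a production version would likely exclude flagged points or use a robust statistic such as the median absolute deviation.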
Time-series forecasting of tourist arrivals with post-COVID-19 recovery trend analysis.
- Approach: Evaluated ARIMA, SARIMA, Prophet, and LSTM; selected the best performer via MSE and residual diagnostics
- Impact: Identified seasonal recovery patterns → delivered data-driven recommendations to policymakers
- Tools: Python · ARIMA · SARIMA · Prophet · LSTM · Pandas · Matplotlib
ML system predicting agricultural yield and recommending optimal crop selection.
- Approach: Random Forest & XGBoost on soil and weather features; recommendation engine built on top
- Metrics: Evaluated using R², MAE, RMSE
- Tools: Python · Scikit-learn · XGBoost · Pandas · NumPy · Seaborn
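
For reference, the three metrics named above can be computed from scratch as follows. This is a dependency-free sketch for illustration. The project itself lists Scikit-learn, which provides equivalent functions, and the function name and sample values here are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², MAE, and RMSE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
    }


# Illustrative yields (arbitrary units), not project data.
scores = regression_metrics([2.0, 3.0, 4.0, 5.0], [2.5, 3.0, 3.5, 5.0])
```

MAE and RMSE are in the units of the target, while R² is unitless, which is why the three are often reported together.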
Regression pipeline to predict laptop prices from hardware specifications.
- Approach: Compared Linear Regression, Random Forest, and SVR; SVR delivered the best accuracy
- Pipeline: Web scraping → EDA → Feature Engineering → Model Training → Evaluation
- Tools: Python · BeautifulSoup · Scikit-learn · Pandas · Matplotlib · Seaborn
Relational database system for end-to-end hotel operations management.
- Scope: Room booking, client management, staff scheduling, payment confirmation
- Design: ER diagrams, normalized relational schemas, primary/foreign key constraints
- Tools: MySQL · SQL · ERD Design · Relational Modeling
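
To illustrate the primary/foreign key constraints mentioned above, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The two tables, their columns, and the helper functions are hypothetical and are not the project's actual schema.

```python
import sqlite3

# Hypothetical two-table slice of a hotel schema; names are illustrative only.
SCHEMA = """
CREATE TABLE room (
    room_id     INTEGER PRIMARY KEY,
    room_number TEXT NOT NULL UNIQUE
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    room_id    INTEGER NOT NULL REFERENCES room(room_id),
    check_in   TEXT NOT NULL,  -- ISO dates (YYYY-MM-DD) compare correctly as text
    check_out  TEXT NOT NULL,
    CHECK (check_out > check_in)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def try_booking(conn: sqlite3.Connection, room_id: int, check_in: str, check_out: str) -> bool:
    """Insert a booking; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking (room_id, check_in, check_out) VALUES (?, ?, ?)",
                (room_id, check_in, check_out),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_db()
with conn:
    conn.execute("INSERT INTO room (room_id, room_number) VALUES (1, '101')")
```

With this schema, a booking for a room that does not exist, or one whose check-out precedes its check-in, is rejected by the database itself rather than by application code.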
🏫 Institute of Technology of Cambodia (ITC) — 2020 – 2025
Engineering Degree in Data Science · Major: Applied Mathematics and Statistics
📄 Thesis: Analysis and Forecasting of Inflation in Cambodia
🌐 Aii Language Center — 2019 – 2022
English · French (Basic)
