Data Scientist & ML Engineer with an Engineering degree in Applied Mathematics and Statistics from the Institute of Technology of Cambodia (ITC). I specialize in end-to-end ML system development, from feature engineering to scalable model deployment, with real-world impact across energy systems, macroeconomic forecasting, NLP, and policy analytics.
- Currently engineering Battery Energy Storage Systems (BESS) data pipelines at SchniecTech Group
- Built a hybrid inflation forecasting model (XGBoost + LSTM + SARIMAX) for the Cambodian Ministry of Planning
- Former Data Science Instructor: taught ML, EDA, and Power BI to the next generation of analysts
- Passionate about applying AI to Southeast Asian development challenges
| Area | Tools | Focus |
|---|---|---|
| Research & Experiments | Jupyter Notebook · Google Colab | EDA, model prototyping, thesis work |
| Production Pipelines | Python · Docker · PostgreSQL | Data ingestion, transformation, ETL |
| Dashboards & Reports | Power BI · Plotly · Data Studio | Business insights, automated reporting |
| Energy Data Systems | Python · EMS/SCADA · Pandas | Real-time grid analytics, SOC monitoring |
| NLP Experiments | Hugging Face · SpaCy · LoRA | Tokenization, fine-tuning, Khmer NLP |
| Teaching & Mentoring | Python · Power BI · Slides | Curriculum design, student projects |
Mar 2023 ─── First commit: started exploring data science on GitHub
Apr 2023 ─── Uploaded early EDA notebooks on tourism & agriculture data
Jun 2023 ─── Published Laptop Price Prediction project (SVR winner!)
Sep 2023 ─── Hotel Reservation DB: first SQL schema design on GitHub
Jan 2024 ─── Joined Sunrise Institute; started teaching & documenting
Mar 2024 ─── Crop Yield Prediction: XGBoost + Random Forest pipeline
Jul 2024 ─── Cambodia Tourism Forecasting: ARIMA / SARIMA / Prophet
Oct 2024 ─── Deep dive into NLP: tokenization & Khmer language corpus
Feb 2025 ─── Ministry of Planning: Inflation Forecasting Thesis begins
Apr 2025 ─── Hybrid ML deployed: XGBoost + LSTM + SARIMAX live
Dec 2025 ─── SchniecTech: EMS/SCADA real-time data engineering begins
Apr 2026 ─── Today: actively building, learning, and contributing
| Language | Level | Context |
|---|---|---|
| 🇰🇭 Khmer | Native | Mother tongue |
| 🇬🇧 English | Professional | Academic, work, research writing |
| 🇫🇷 French | Basic | Aii Language Center (2019–2022) |
Dec 2025 – Present
- Analyzed high-frequency EMS/SCADA time-series data (Active Power, Frequency, SOC, Voltage, Reactive Power)
- Built real-time interactive dashboards for grid and plant performance monitoring
- Conducted anomaly detection for power fluctuations, voltage deviations, and system faults
- Optimized battery charge/discharge cycles by monitoring State of Charge (SOC) patterns
- Automated daily operational reports for engineering and management decision-making
Feb 2025 – Oct 2025
- Engineered a hybrid inflation forecasting system (XGBoost + LSTM + SARIMAX) to enhance national economic projections
- Built end-to-end data pipelines for macroeconomic indicators including cleaning, feature engineering & stationarity testing
- Analyzed global commodity factors (oil, gold) and domestic sector drivers affecting Cambodian inflation
- Delivered interactive dashboards and policy reports adopted in economic planning decisions
Jan 2024 – Feb 2025
- Taught EDA, statistics, ML, forecasting, and data visualization using Python and Power BI
- Designed hands-on mini-projects bridging theory with real-world datasets
- Mentored students on data storytelling and insight communication to technical and non-technical audiences
A hybrid ML system to forecast national inflation with improved accuracy over traditional models.
- Approach: XGBoost + LSTM + SARIMAX ensemble combining classical time series with deep learning
- Data: Macroeconomic indicators, global oil & gold prices, domestic sector indices
- Impact: Improved forecast accuracy for national economic planning at the Ministry of Planning
- Tools: Python, TensorFlow, Statsmodels, XGBoost, Pandas, Plotly
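One common way to combine several forecasters is to weight each model by the inverse of its validation error. The sketch below shows that idea only; it is an assumption for illustration, not the combination rule used in the Ministry of Planning system, and the function names (`inverse_error_weights`, `blend_forecasts`) and all numbers are hypothetical.

```python
def inverse_error_weights(validation_mse):
    """Turn per-model validation MSEs into weights that sum to 1."""
    inverse = {name: 1.0 / mse for name, mse in validation_mse.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend_forecasts(forecasts, weights):
    """Weighted average of the per-model forecasts at each horizon step."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Made-up validation errors and forecasts, purely for illustration.
validation_mse = {"xgboost": 0.5, "lstm": 1.0, "sarimax": 1.0}
forecasts = {
    "xgboost": [2.0, 2.2],
    "lstm": [3.0, 3.2],
    "sarimax": [4.0, 4.2],
}

weights = inverse_error_weights(validation_mse)
blended = blend_forecasts(forecasts, weights)
```

A model with lower validation error gets a larger share of the blend; stacking with a meta-learner would be another reasonable option.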
Real-time analytics platform for Battery Energy Storage System operations.
- Approach: High-frequency signal analysis (SOC, Frequency, Active Power) with anomaly detection
- Impact: Enabled proactive grid instability detection and optimized charge/discharge efficiency
- Tools: Python, Pandas, Power BI, EMS/SCADA data, Matplotlib
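As a rough illustration of anomaly detection on a high-frequency signal, the sketch below flags readings that deviate strongly from a trailing window (a rolling z-score). The technique, the function name `rolling_zscore_anomalies`, the window size, the threshold, and the sample values are all assumptions for the example, not the platform's actual detector or plant data.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Synthetic grid-frequency readings in Hz (illustrative only).
frequency_hz = [50.00, 50.01, 49.99, 50.02, 49.98, 50.00, 49.40, 50.01]
flagged = rolling_zscore_anomalies(frequency_hz, window=5, threshold=3.0)
```

Here only the dip to 49.40 Hz (index 6) is flagged. The reading after it is not, because the outlier inflates the window's standard deviation; a median-based variant would be less sensitive to that effect.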
Time-series forecasting of tourist arrivals with post-COVID-19 recovery trend analysis.
- Approach: Evaluated ARIMA, SARIMA, Prophet, and LSTM; selected best performer via MSE and residual diagnostics
- Impact: Identified seasonal recovery patterns and delivered data-driven recommendations to policymakers
- Tools: Python, ARIMA, SARIMA, Prophet, LSTM, Pandas, Matplotlib
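Selecting a model by holdout MSE can be sketched as follows. The helper names and every number are hypothetical, so the winner in this toy example says nothing about which model performed best in the project; residual diagnostics are not covered here.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(actual, candidate_forecasts):
    """Score each candidate on the holdout set and return the lowest-MSE one."""
    scores = {
        name: mean_squared_error(actual, predicted)
        for name, predicted in candidate_forecasts.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Made-up holdout arrivals and forecasts (illustrative only).
holdout = [100.0, 120.0, 90.0]
candidates = {
    "model_a": [110.0, 110.0, 100.0],
    "model_b": [102.0, 118.0, 91.0],
    "model_c": [95.0, 125.0, 95.0],
}

best_name, mse_by_model = select_best_model(holdout, candidates)
```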
ML system predicting agricultural yield and recommending optimal crop selection.
- Approach: Random Forest & XGBoost on soil and weather features; recommendation engine built on top
- Metrics: Evaluated using R², MAE, RMSE
- Tools: Python, Scikit-learn, XGBoost, Pandas, NumPy, Seaborn
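For reference, the three metrics named above can be computed directly from their definitions. This is a dependency-free sketch with made-up yields; the function name `regression_metrics` is hypothetical, and in practice Scikit-learn's metric functions would be the usual choice.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute R², MAE, and RMSE from paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up yields in tonnes per hectare (illustrative only).
actual_yield = [2.0, 4.0, 6.0, 8.0]
predicted_yield = [3.0, 3.0, 7.0, 7.0]
metrics = regression_metrics(actual_yield, predicted_yield)
```

MAE and RMSE are in the same units as the target, while R² is unitless, which is why the three are often reported together.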
Regression pipeline to predict laptop prices from hardware specifications.
- Approach: Compared Linear Regression, Random Forest, and SVR; SVR delivered the best accuracy
- Pipeline: Web scraping → EDA → Feature Engineering → Model Training → Evaluation
- Tools: Python, BeautifulSoup, Scikit-learn, Pandas, Matplotlib, Seaborn
Relational database system for end-to-end hotel operations management.
- Scope: Room booking, client management, staff scheduling, payment confirmation
- Design: ER diagrams, normalized relational schemas, primary/foreign key constraints
- Tools: MySQL, SQL, ERD Design, Relational Modeling
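The role of primary and foreign keys in a booking schema can be shown with a tiny example. The project itself uses MySQL; this sketch swaps in Python's built-in `sqlite3` so it runs without a server, and the table and column names are hypothetical, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE room (
    room_id     INTEGER PRIMARY KEY,
    room_number TEXT NOT NULL UNIQUE
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    client_id  INTEGER NOT NULL REFERENCES client(client_id),
    room_id    INTEGER NOT NULL REFERENCES room(room_id),
    check_in   TEXT NOT NULL,
    check_out  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO client (client_id, full_name) VALUES (1, 'Example Guest')")
conn.execute("INSERT INTO room (room_id, room_number) VALUES (1, '101')")
conn.execute(
    "INSERT INTO booking (client_id, room_id, check_in, check_out) "
    "VALUES (1, 1, '2024-01-01', '2024-01-03')"
)

# A booking that points at a non-existent client violates the foreign key.
try:
    conn.execute(
        "INSERT INTO booking (client_id, room_id, check_in, check_out) "
        "VALUES (99, 1, '2024-02-01', '2024-02-02')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

booking_count = conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0]
```

Keeping clients, rooms, and bookings in separate tables linked by keys is what lets the database reject a booking for a client that does not exist.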
Institute of Technology of Cambodia (ITC), 2020–2025
Engineering Degree in Data Science · Major: Applied Mathematics and Statistics · Thesis: Analysis and Forecasting of Inflation in Cambodia
Aii Language Center, 2019–2022
English Β· French (Basic)
