Rotha-101/Rotha-101

CHEA ROTHA

Building end-to-end intelligent systems — from raw data to real-world impact



🧠 About Me

Data Scientist & ML Engineer with an Engineering degree in Applied Mathematics and Statistics from the Institute of Technology of Cambodia (ITC). I specialize in end-to-end ML system development — from feature engineering to scalable model deployment — with real-world impact across energy systems, macroeconomic forecasting, NLP, and policy analytics.

  • 🔋 Currently engineering Battery Energy Storage Systems (BESS) data pipelines at SchniecTech Group
  • 📈 Built a hybrid inflation forecasting model (XGBoost + LSTM + SARIMAX) for the Cambodian Ministry of Planning
  • 🎓 Former Data Science Instructor — taught ML, EDA, and Power BI to the next generation of analysts
  • 🌏 Passionate about applying AI to Southeast Asian development challenges

🛠️ Tech Stack

💻 Programming & Tools

Python · SQL · R · Java · C# · React · MATLAB

🤖 Machine Learning & AI

TensorFlow · Keras · Scikit-learn · XGBoost · LightGBM

📊 Data & Visualization

Pandas · NumPy · Plotly · Matplotlib · Power BI · Tableau

🗣️ NLP

Hugging Face · spaCy · NLTK

🗄️ Databases & DevOps

MySQL · PostgreSQL · MongoDB · Docker · Git · VS Code · Jupyter


🧩 Coding Habits & Work Style

| Area | Tools | Focus |
| --- | --- | --- |
| 🔬 Research & Experiments | Jupyter Notebook · Google Colab | EDA, model prototyping, thesis work |
| 🏭 Production Pipelines | Python · Docker · PostgreSQL | Data ingestion, transformation, ETL |
| 📊 Dashboards & Reports | Power BI · Plotly · Data Studio | Business insights, automated reporting |
| 🔋 Energy Data Systems | Python · EMS/SCADA · Pandas | Real-time grid analytics, SOC monitoring |
| 🗣️ NLP Experiments | Hugging Face · spaCy · LoRA | Tokenization, fine-tuning, Khmer NLP |
| 🎓 Teaching & Mentoring | Python · Power BI · Slides | Curriculum design, student projects |


🗓️ My GitHub Journey

Mar 2023  ──●  First commit — started exploring data science on GitHub
Apr 2023  ──●  Uploaded early EDA notebooks on tourism & agriculture data
Jun 2023  ──●  Published Laptop Price Prediction project (SVR winner!)
Sep 2023  ──●  Hotel Reservation DB — first SQL schema design on GitHub
Jan 2024  ──●  Joined Sunrise Institute — started teaching & documenting
Mar 2024  ──●  Crop Yield Prediction — XGBoost + Random Forest pipeline
Jul 2024  ──●  Cambodia Tourism Forecasting — ARIMA / SARIMA / Prophet
Oct 2024  ──●  Deep dive into NLP — tokenization & Khmer language corpus
Feb 2025  ──●  Ministry of Planning — Inflation Forecasting Thesis begins
Apr 2025  ──●  Hybrid ML deployed: XGBoost + LSTM + SARIMAX live
Dec 2025  ──●  SchniecTech — EMS/SCADA real-time data engineering begins
Apr 2026  ──◉  Today — actively building, learning, and contributing 🚀

🔧 Tools & Environments I Love

Jupyter · VS Code · Google Colab · Anaconda · Git · GitHub · Docker · Power BI · Tableau · Looker Studio · Google Sheets · Excel


🌐 Languages

| Language | Level | Context |
| --- | --- | --- |
| 🇰🇭 Khmer | Native | Mother tongue |
| 🇬🇧 English | Professional | Academic, work, research writing |
| 🇫🇷 French | Basic | Aii Language Center (2019–2022) |

💼 Professional Experience

🔋 Battery Energy Storage System Engineer & Data Analyst — SchniecTech Group

Dec 2025 – Present

  • Analyzed high-frequency EMS/SCADA time-series data (Active Power, Frequency, SOC, Voltage, Reactive Power)
  • Built real-time interactive dashboards for grid and plant performance monitoring
  • Conducted anomaly detection for power fluctuations, voltage deviations, and system faults
  • Optimized battery charge/discharge cycles by monitoring State of Charge (SOC) patterns
  • Automated daily operational reports for engineering and management decision-making

📈 Data Scientist — Ministry of Planning, Cambodia

Feb 2025 – Oct 2025

  • Engineered a hybrid inflation forecasting system (XGBoost + LSTM + SARIMAX) to enhance national economic projections
  • Built end-to-end data pipelines for macroeconomic indicators, covering cleaning, feature engineering, and stationarity testing
  • Analyzed global commodity factors (oil, gold) and domestic sector drivers affecting Cambodian inflation
  • Delivered interactive dashboards and policy reports adopted in economic planning decisions

🎓 Instructor, Data Science — Sunrise Institute

Jan 2024 – Feb 2025

  • Taught EDA, statistics, ML, forecasting, and data visualization using Python and Power BI
  • Designed hands-on mini-projects bridging theory with real-world datasets
  • Mentored students on data storytelling and insight communication to technical and non-technical audiences

🚀 Featured Projects

🌡️ Inflation Forecasting in Cambodia (Thesis)

A hybrid ML system to forecast national inflation with improved accuracy over traditional models.

  • Approach: XGBoost + LSTM + SARIMAX ensemble — combining classical time series with deep learning
  • Data: Macroeconomic indicators, global oil & gold prices, domestic sector indices
  • Impact: Improved forecast accuracy for national economic planning at the Ministry of Planning
  • Tools: Python · TensorFlow · Statsmodels · XGBoost · Pandas · Plotly
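
This README does not say how the three models' forecasts are combined. One common option is to weight each model by the inverse of its validation error, and the sketch below assumes that scheme. The function names and all numbers are hypothetical placeholders, not thesis code or results.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def inverse_error_weights(actual, forecasts):
    """Weight each model by 1/RMSE on a validation window, normalized to sum to 1."""
    # The small epsilon avoids division by zero if a model fits the window exactly.
    inv = {name: 1.0 / (rmse(actual, preds) + 1e-12) for name, preds in forecasts.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def blend(forecasts, weights):
    """Weighted average of the per-model forecasts at each time step."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in forecasts) for t in range(horizon)]


# Made-up validation data (illustrative only, not real inflation figures).
actual = [2.0, 2.2, 2.5, 2.4]
validation_forecasts = {
    "xgboost": [2.1, 2.3, 2.4, 2.5],
    "lstm": [1.8, 2.0, 2.8, 2.1],
    "sarimax": [2.0, 2.1, 2.6, 2.4],
}
weights = inverse_error_weights(actual, validation_forecasts)
blended = blend(validation_forecasts, weights)
```

In practice the weights would be fitted on one validation window and applied to forecasts for a later period. Here the same window is reused only to keep the example short.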

⚡ BESS / EMS Time-Series Analytics

Real-time analytics platform for Battery Energy Storage System operations.

  • Approach: High-frequency signal analysis (SOC, Frequency, Active Power) with anomaly detection
  • Impact: Enabled proactive grid instability detection and optimized charge/discharge efficiency
  • Tools: Python · Pandas · Power BI · EMS/SCADA data · Matplotlib
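
The anomaly detection method is not specified here. As one possible illustration, the sketch below flags samples that sit far from the mean of a trailing window (a rolling z-score). The window size, threshold, function name, and frequency values are all assumptions chosen for the example, not settings from the actual platform.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(values):
        # Only score a sample once a full window of earlier samples exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(x)
    return anomalies


# Synthetic grid-frequency samples in Hz (illustrative, not SCADA data).
frequency_hz = [50.00, 50.01, 49.99, 50.02, 49.98, 50.01, 49.40, 50.00, 50.01]
flagged = rolling_zscore_anomalies(frequency_hz)
```

Note that a flagged sample still enters the window, which inflates the standard deviation for the next few samples. A production version might exclude flagged points or use a robust statistic such as the median absolute deviation.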

🌏 Tourism Forecasting in Cambodia

Time-series forecasting of tourist arrivals with post-COVID-19 recovery trend analysis.

  • Approach: Evaluated ARIMA, SARIMA, Prophet, and LSTM; selected best performer via MSE and residual diagnostics
  • Impact: Identified seasonal recovery patterns and delivered data-driven recommendations to policymakers
  • Tools: Python · ARIMA · SARIMA · Prophet · LSTM · Pandas · Matplotlib
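
Selecting a model by holdout MSE can be sketched as follows. To stay dependency-free, the example uses two simple stand-in forecasters (naive and seasonal naive) instead of ARIMA, SARIMA, Prophet, or LSTM. The series is synthetic and the helper names are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(train, horizon):
    """Repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, season=4):
    """Repeat the value observed at the same position in the last full season."""
    return [train[-season + (h % season)] for h in range(horizon)]


def select_best(train, test, candidates):
    """Score every candidate on the holdout and return (best_name, scores)."""
    scores = {name: mse(test, fn(train, len(test))) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic quarterly arrivals with a repeating seasonal shape (illustrative).
series = [100, 140, 90, 160, 105, 150, 95, 170, 110, 155, 100, 175]
train, test = series[:8], series[8:]
best_name, scores = select_best(
    train,
    test,
    {"naive": naive_forecast, "seasonal_naive": seasonal_naive_forecast},
)
```

The same `select_best` loop would work with fitted statistical or neural models, as long as each candidate exposes a function taking the training series and a horizon. Residual diagnostics, which the project also used, are not covered by this sketch.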

🌾 Crop Yield Prediction & Recommendation System

ML system predicting agricultural yield and recommending optimal crop selection.

  • Approach: Random Forest & XGBoost on soil and weather features; recommendation engine built on top
  • Metrics: Evaluated using R², MAE, RMSE
  • Tools: Python · Scikit-learn · XGBoost · Pandas · NumPy · Seaborn
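
For reference, the three metrics named above can be computed as in this minimal sketch. The project presumably used library implementations such as those in Scikit-learn. The plain-Python versions and the sample values below are illustrative assumptions only.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up yields and predictions (illustrative, not project data).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
metrics = {"mae": mae(y_true, y_pred), "rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred)}
```

MAE and RMSE are in the units of the target, with RMSE penalizing large errors more heavily, while R² is unitless and measures the share of variance explained.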

💻 Laptop Price Prediction

Regression pipeline to predict laptop prices from hardware specifications.

  • Approach: Compared Linear Regression, Random Forest, and SVR — SVR delivered best accuracy
  • Pipeline: Web scraping → EDA → Feature Engineering → Model Training → Evaluation
  • Tools: Python · BeautifulSoup · Scikit-learn · Pandas · Matplotlib · Seaborn

🏨 Hotel Reservation System Database

Relational database system for end-to-end hotel operations management.

  • Scope: Room booking, client management, staff scheduling, payment confirmation
  • Design: ER diagrams, normalized relational schemas, primary/foreign key constraints
  • Tools: MySQL · SQL · ERD Design · Relational Modeling
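
The actual schema is not reproduced in this README. To illustrate how primary and foreign key constraints protect booking data, the sketch below builds a tiny three-table schema. It uses Python's built-in SQLite instead of MySQL so that it runs without a server, and every table and column name is a hypothetical stand-in.

```python
import sqlite3

# Hypothetical, simplified schema: one client can have many bookings,
# and each booking refers to exactly one room.
SCHEMA = """
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE room (
    room_id INTEGER PRIMARY KEY,
    room_number TEXT NOT NULL UNIQUE
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(client_id),
    room_id INTEGER NOT NULL REFERENCES room(room_id),
    check_in TEXT NOT NULL,
    check_out TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO client VALUES (1, 'Example Guest')")
conn.execute("INSERT INTO room VALUES (1, '101')")
conn.execute("INSERT INTO booking VALUES (1, 1, 1, '2024-01-10', '2024-01-12')")


def try_orphan_booking(connection):
    """Try to book for a client that does not exist; return True if rejected."""
    try:
        connection.execute(
            "INSERT INTO booking VALUES (2, 999, 1, '2024-02-01', '2024-02-02')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_orphan_booking(conn)
```

Staff scheduling and payment confirmation would follow the same pattern, with each new table referencing the entity it depends on.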

🎓 Education

🏫 Institute of Technology of Cambodia (ITC) · 2020 – 2025

Engineering Degree in Data Science · Major: Applied Mathematics and Statistics

📄 Thesis: Analysis and Forecasting of Inflation in Cambodia

🌐 Aii Language Center · 2019 – 2022

English · French (Basic)


📬 Get In Touch


Gmail · LinkedIn · GitHub · WhatsApp · Location

"Turning complex data into decisions that matter."

