Atlanta, USA
Last updated: 2026-06-26
This repository provides a structured Azure Machine Learning (Azure ML) overview from foundational concepts to advanced MLOps and production operations. It covers the conceptual model, the mathematics, what happens in the backend, the minimum lifecycle stages, and how each piece looks in practice.
It is organized to answer the most common questions first:
- Workspace: the main Azure ML container where data, code, jobs, models, and endpoints live.
- Job: a single executable ML task, such as training, evaluation, or data preparation.
- Environment: the runtime definition that keeps dependencies, packages, and base images reproducible.
- Model: the trained artifact that captures learned patterns from data.
- Endpoint: the serving interface that exposes a model for online or batch predictions.
- MLOps: the operational practices that connect training, deployment, monitoring, and governance.
These definitions are used throughout the guide so the training path stays consistent from the first concept to the deployment examples.
Use the structured training navigation at docs/index.md.
GitHub Pages deployment is wired via deploy-github-pages.yml.
In GitHub, set Settings -> Pages -> Source = GitHub Actions.
This documentation site now builds with MkDocs (Material theme).
Local preview:
pip install mkdocs mkdocs-material mkdocs-mermaid2-plugin pymdown-extensions
mkdocs serve
Production build:
mkdocs build --strict
Note on math rendering: GitHub Markdown renders inline math with $...$ and block math with $$...$$. All formulas below use that syntax so they display correctly on GitHub.
- 1) What Azure Machine Learning Is
- 2) Minimum End-to-End ML Stages
- 3) Core Math Foundations
- 4) What Happens in the Backend
- 5) Azure ML Conceptual Architecture
- 6) Basic-to-Advanced Learning Path
- 7) Deployment Patterns
- 8) Monitoring, Reliability, and Model Risk
- 9) Security and Governance Baseline
- 10) Practical Outcome for This Repository
Azure ML is a managed platform to design, train, deploy, and monitor machine learning systems.
At a high level, it provides:
- Workspace: central control plane for assets and runs.
- Compute: managed clusters/instances for training and inference.
- Data assets: versioned references to data sources.
- Model assets: versioned trained artifacts.
- Environments: versioned, reproducible runtime definitions (base image + dependencies).
- Jobs: a unit of executable work (training, sweep, pipeline, command).
- Pipelines: reusable workflow graphs for ML tasks.
- Endpoints: managed online or batch serving interfaces.
- Monitoring: drift, performance, and operational telemetry.
graph TD
WS[Workspace] --> D[Data Assets]
WS --> E[Environments]
WS --> C[Compute]
WS --> J[Jobs / Pipelines]
D --> J
E --> J
C --> J
J --> M[Registered Model]
M --> EP[Endpoint - Online / Batch]
EP --> MON[Monitoring & Drift]
MON -.retrain trigger.-> J
The minimum lifecycle is:
- Problem framing (objective, constraints, KPI).
- Data ingestion and preparation (quality, labels, features).
- Training and validation (experimentation + model selection).
- Registration and packaging (model + environment).
- Deployment (online or batch endpoint).
- Monitoring and iteration (accuracy, latency, drift, retraining).
These stages map directly to Azure ML assets and jobs, enabling reproducibility and governance.
flowchart LR
A[1 Problem Framing] --> B[2 Data Prep]
B --> C[3 Train & Validate]
C --> D[4 Register & Package]
D --> E[5 Deploy]
E --> F[6 Monitor]
F -- drift / decay --> C
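Given dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, training solves the empirical risk minimization problem:

$$
\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(x_i),\, y_i\big)
$$

Where:
- $f_{\theta}$ is the model.
- $\mathcal{L}$ is the loss function.
- $N$ is the number of training examples.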
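As a rough illustration of minimizing an average loss over $N$ training examples, the sketch below fits a one-feature linear model with batch gradient descent. The linear form of $f_{\theta}$, the squared loss, the learning rate, the step count, and the toy dataset are all assumptions chosen for the example, not anything prescribed by Azure ML.

```python
def predict(theta, x):
    """Assumed model: f_theta(x) = w * x + b, with theta = (w, b)."""
    w, b = theta
    return w * x + b


def empirical_risk(theta, data):
    """Average squared loss over the dataset: (1/N) * sum (f_theta(x_i) - y_i)^2."""
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)


def fit(data, lr=0.05, steps=2000):
    """Minimize the empirical risk with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the average squared loss with respect to w and b.
        grad_w = sum(2 * (predict((w, b), x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict((w, b), x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy dataset generated from y = 2x + 1 (illustrative values only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
theta = fit(data)
```

On this noiseless toy data the fitted parameters approach $w = 2$ and $b = 1$, and the empirical risk approaches zero. Real training jobs would normally use a library optimizer rather than a hand-written loop.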
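- MSE (regression):

$$
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

- Binary cross-entropy (classification):

$$
\text{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \big[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \big]
$$

Where $\hat{y}_i = f_{\theta}(x_i)$ is the model prediction; for binary cross-entropy it is the predicted probability of the positive class.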
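The two losses named above can be written in a few lines of plain Python. This is a minimal sketch using the standard definitions of mean squared error and binary cross-entropy. The `eps` clipping parameter is an assumption added here to avoid `log(0)`; it is not part of the mathematical definition.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so log() never receives 0 (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

For instance, a classifier that always predicts probability 0.5 has a binary cross-entropy of $\ln 2$ regardless of the labels, which makes a convenient sanity check.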
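- L2 (Ridge) adds $\lambda \lVert\theta\rVert_2^2$.
- L1 (Lasso) adds $\lambda \lVert\theta\rVert_1$.

These reduce overfitting and improve generalization for production reliability. The regularized objective becomes:

$$
\theta^{*} = \arg\min_{\theta} \left[ \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(x_i),\, y_i\big) + \lambda\, \Omega(\theta) \right]
$$

where $\Omega(\theta)$ is $\lVert\theta\rVert_2^2$ for L2 or $\lVert\theta\rVert_1$ for L1, and $\lambda \ge 0$ sets the penalty strength.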
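To make the penalty terms concrete, the following sketch computes the L2 and L1 penalties for a parameter vector and adds one of them to a data loss value. The function names and the `kind` switch are hypothetical helpers for illustration only.

```python
def l2_penalty(theta, lam):
    """Ridge penalty: lam * sum of squared parameters."""
    return lam * sum(t * t for t in theta)


def l1_penalty(theta, lam):
    """Lasso penalty: lam * sum of absolute parameter values."""
    return lam * sum(abs(t) for t in theta)


def regularized_objective(data_loss, theta, lam, kind="l2"):
    """Data loss plus the chosen penalty (hypothetical helper)."""
    if kind == "l2":
        penalty = l2_penalty(theta, lam)
    elif kind == "l1":
        penalty = l1_penalty(theta, lam)
    else:
        raise ValueError(f"unknown penalty kind: {kind}")
    return data_loss + penalty
```

With `theta = [3.0, -4.0]` and `lam = 0.1`, the L2 penalty is 0.1 × 25 = 2.5 and the L1 penalty is 0.1 × 7 = 0.7, so the same parameters are penalized more heavily under L2 when individual weights are large.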
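- Classification: precision, recall, and their harmonic mean (F1), where $TP$, $FP$, and $FN$ are the true positive, false positive, and false negative counts:

$$
\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

- Regression: root mean squared error:

$$
\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}
$$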
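The evaluation metrics listed above can be computed directly from labels and predictions. The sketch below uses the standard definitions. Returning 0.0 when a denominator is zero is a convention assumed here; other tools may raise an error or emit a warning instead.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1; returns 0.0 where a ratio is undefined (assumed convention)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for regression outputs."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))
```

For labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 1, 0]` there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3.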
When a job is submitted:
- Azure ML resolves the job spec (code, environment, inputs, outputs).
- Compute is allocated or attached.
- Container image/environment is pulled or built.
- Data references are mounted/downloaded to runtime.
- Script/notebook command executes and logs metrics/artifacts.
- Outputs (model, metrics, logs) are persisted in workspace-linked storage.
- Lineage links are created across data, code snapshot, environment, and model.
This backend process is what enables repeatability, auditability, and regulated deployment workflows.
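The lineage idea in the steps above can be illustrated with content hashes: if the code snapshot, environment, and data references are each fingerprinted, any change to one of them yields a different record for the resulting model. The sketch below is a hypothetical illustration of that principle only. It does not reproduce how Azure ML stores lineage, and every field name in it is an assumption.

```python
import hashlib
import json


def digest(obj):
    """Stable SHA-256 fingerprint of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def lineage_record(code_snapshot, environment, data_refs, model_name):
    """Link a model name to fingerprints of its code, environment, and data (hypothetical schema)."""
    return {
        "model": model_name,
        "code_sha256": digest(code_snapshot),
        "environment_sha256": digest(environment),
        # Sort data references so their listing order does not change the fingerprint.
        "data_sha256": digest(sorted(data_refs)),
    }


# Illustrative inputs; all names and versions here are made up for the example.
record = lineage_record(
    code_snapshot={"train.py": "print('train')"},
    environment={"image": "python:3.11", "packages": ["scikit-learn"]},
    data_refs=["training-data:1"],
    model_name="demo-model",
)
```

Two runs with identical inputs produce identical records, which is the property that makes a run repeatable and auditable after the fact.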
sequenceDiagram
participant U as User / CLI / SDK
participant CP as Control Plane (Workspace)
participant C as Compute
participant S as Storage / Registry
U->>CP: Submit job spec (code + env + data refs)
CP->>C: Allocate / attach compute
CP->>C: Pull or build environment image
S->>C: Mount / download data
C->>C: Execute training script, log metrics
C->>S: Persist model, metrics, logs
CP->>CP: Record lineage (data, code, env, model)
- Workspace metadata
- Asset registry
- Access and role-based governance
- Experiment/run history
- Storage accounts / data lake connectivity
- Compute execution nodes
- Model inference containers/endpoints
- CI/CD for ML (MLOps)
- Monitoring and alerts
- Responsible AI checks
- Security and compliance controls
- Understand ML lifecycle and Azure ML workspace components.
- Run first training experiment on compute instance.
- Track metrics and compare runs.
- Create reusable training pipelines.
- Use data/model versioning and model registry.
- Deploy managed online endpoint with scaling and auth.
- Build full MLOps with CI/CD, approvals, and staged promotion.
- Implement feature engineering pipelines and retraining triggers.
- Add drift detection, canary releases, and rollback strategy.
- Apply responsible AI practices and governance policies.
- Real-time (Online Endpoint): low-latency scoring for APIs/apps.
- Batch Endpoint: scheduled/asynchronous large-scale scoring.
- Edge/Hybrid: deploy packaged models where connectivity is limited.
Trade-off dimensions:
- Latency vs throughput
- Cost vs availability
- Accuracy vs interpretability
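For online deployments, the canary release and rollback strategy mentioned in the learning path can be sketched as a small traffic-shifting rule: move a few percentage points of traffic to the new deployment at a time, and send everything back to the stable one if the canary's error rate exceeds a budget. The deployment names (`blue`, `green`), the 2% error budget, and the 10-point step are hypothetical values for illustration.

```python
def shift_traffic(traffic, stable, canary, step):
    """Move up to `step` percentage points from the stable to the canary deployment."""
    moved = min(step, traffic[stable])
    updated = dict(traffic)
    updated[stable] -= moved
    updated[canary] = updated.get(canary, 0) + moved
    return updated


def next_traffic(traffic, stable, canary, canary_error_rate,
                 max_error_rate=0.02, step=10):
    """Roll back if the canary exceeds its error budget; otherwise shift more traffic."""
    if canary_error_rate > max_error_rate:
        # Rollback: route all traffic to the stable deployment.
        return {stable: 100, canary: 0}
    return shift_traffic(traffic, stable, canary, step)
```

Starting from `{"blue": 100, "green": 0}`, a healthy canary moves the split to 90/10, while a canary error rate of 5% resets it to 100/0. A production setup would also consider latency and model quality signals before each step.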
Production ML requires both software and statistical observability:
- Operational: CPU/memory, request rate, p95 latency, error rate.
- Model quality: precision/recall/F1, calibration, AUC, RMSE.
- Data quality: schema violations, missingness, outliers.
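- Drift:
  - Covariate drift: input feature distribution changes over time, $P_{t_0}(X) \neq P_{t_1}(X)$.
  - Concept drift: target relationship changes, so the same inputs map to different outcomes, $P_{t_0}(Y \mid X) \neq P_{t_1}(Y \mid X)$.

These equations indicate distributional change between time windows $t_0$ and $t_1$. In practice, teams detect drift with distance measures or hypothesis tests. For example, the Population Stability Index (PSI) across $B$ bins:

$$
\text{PSI} = \sum_{b=1}^{B} (a_b - e_b) \ln \frac{a_b}{e_b}
$$

where $e_b$ and $a_b$ are the expected (reference window) and actual (current window) proportions of observations in bin $b$.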
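A minimal sketch of a PSI calculation follows, assuming the common form $\text{PSI} = \sum_b (a_b - e_b) \ln(a_b / e_b)$ over pre-binned counts from a reference window and a current window. The `eps` floor for empty bins is an assumption added to keep the logarithm defined, and choosing the bin edges is left out of the example.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index from per-bin counts of two time windows."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both windows must use the same bins")
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    value = 0.0
    for e_count, a_count in zip(expected_counts, actual_counts):
        # Convert counts to proportions; floor at eps so empty bins stay finite.
        e = max(e_count / total_e, eps)
        a = max(a_count / total_a, eps)
        value += (a - e) * math.log(a / e)
    return value
```

Identical distributions give a PSI of 0, and the value grows as the current window diverges from the reference. For example, reference counts `[50, 50]` against current counts `[25, 75]` give $0.25 \ln 3 \approx 0.27$. Alert thresholds vary by team and should be validated against the specific model and data.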
Minimum practices:
- Private networking and controlled ingress/egress.
- Managed identity for compute and data access.
- Secret handling via Key Vault.
- RBAC and least-privilege permissions.
- Model/data lineage with versioned assets.
- Approval gates for production deployment.
This repository is positioned as an Azure ML learning hub covering:
- End-to-end conceptual understanding.
- Mathematical grounding for ML training and evaluation.
- Azure ML backend/runtime behavior.
- Minimum and advanced operational stages for real deployments.