aws-samples · dlptv · Feb 26, 2026 · Feb 26, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,6 @@
 
+.vscode/
+
 examples/conda-env-kernel-image/.idea/conda-env-kernel-image.iml
 examples/conda-env-kernel-image/.idea/inspectionProfiles/profiles_settings.xml
 examples/conda-env-kernel-image/.idea/modules.xml

diff --git a/examples/pytorch-computer-vision/Dockerfile b/examples/pytorch-computer-vision/Dockerfile
@@ -0,0 +1,23 @@
+# PyTorch Computer Vision Custom Image for SageMaker
+# Works with: Studio V2 and Unified Studio (JupyterLab & Code Editor)
+#
+# Single image for all platforms -- entrypoint is configured via AppImageConfig:
+#   Studio V2 JupyterLab:       ["jupyter-lab"]
+#   Unified Studio JupyterLab:  ["entrypoint-sagemaker-ui-jupyter-server"]
+#   Code Editor (both):         ["entrypoint-code-editor"]
+
+FROM public.ecr.aws/sagemaker/sagemaker-distribution:3.5.1-cpu
+
+ARG NB_USER="sagemaker-user"
+ARG NB_UID=1000
+ARG NB_GID=100
+
+USER root
+
+COPY requirements.txt /tmp/requirements.txt
+RUN pip install --no-cache-dir -r /tmp/requirements.txt && \
+    rm /tmp/requirements.txt
+
+USER $NB_USER
+
+# Do NOT add ENTRYPOINT -- configured via AppImageConfig at attach time
diff --git a/examples/pytorch-computer-vision/README.md b/examples/pytorch-computer-vision/README.md
@@ -0,0 +1,297 @@
+# PyTorch Computer Vision Custom Image
+
+A single custom image with PyTorch and computer vision libraries that works across SageMaker Studio V2 and Unified Studio — JupyterLab and Code Editor on both platforms.
+
+## What's included
+
+The base image ([SageMaker Distribution](https://github.com/aws/sagemaker-distribution) 3.5.1) already includes PyTorch, OpenCV, NumPy, and Pandas. This example adds:
+
+- **albumentations** - Image augmentation library
+- **timm** - Pre-trained vision models (ViT, EfficientNet, ConvNeXt, etc.)
+- **ultralytics** - YOLOv8 for object detection and segmentation
+
+## How custom images work
+
+SageMaker uses a chain of resources to go from a Docker image to a selectable environment in the Studio console:
+
+```
+Dockerfile → ECR image → SageMaker Image → Image Version → AppImageConfig → Domain attachment
+```
+
+- **ECR image**: Your Docker image, pushed to Amazon ECR.
+- **SageMaker Image**: A named resource that acts as a pointer. It can have multiple versions.
+- **Image Version**: Links a specific ECR image digest to the SageMaker Image. When you rebuild and push, you create a new version.
+- **AppImageConfig**: Tells SageMaker how to _run_ the container — which entrypoint to use, what arguments to pass. This is where you configure the IDE (JupyterLab vs Code Editor) and the platform (Studio V2 vs Unified Studio).
+- **Domain attachment**: An `update-domain` call that makes the image available to users when they create a new space.
+
+You create the SageMaker Image and AppImageConfig once. After that, updating your image is just: rebuild, push, create a new Image Version, restart your space.
+
+## Dockerfile rules
+
+The Dockerfile in this example is minimal by design. Here are the constraints that matter:
+
+**Use the `sagemaker-distribution` base image.** This is required for Unified Studio — it ships the `entrypoint-sagemaker-ui-jupyter-server` and `entrypoint-code-editor` scripts. For Studio V2 only, other base images work, but `sagemaker-distribution` is the only one that covers all platforms.
+
+**Do NOT set ENTRYPOINT.** The entrypoint varies by platform and IDE. By leaving it unset in the Dockerfile, each `AppImageConfig` specifies the right one at launch time. This is what allows a single image to work across all four combinations.
+
+**Pin numpy to match the base image.** The base image ships with `numpy==1.26.4`. Many pre-installed packages (the SageMaker Python SDK, autogluon, catboost) have `numpy<2` upper bounds. If one of your dependencies pulls in numpy 2.x as a transitive dependency, the build will succeed but imports will fail at runtime with cryptic errors. Always pin:
+
+```
+numpy==1.26.4   # must match base image
+```
+
+**Don't override AWS SDK packages.** Do not include `boto3`, `botocore`, or `sagemaker` in your `requirements.txt`. They come from the base image with specific version constraints. Overriding them can break the SageMaker SDK and extension loading.
+
+**Switch to root for installs, then back.** The base image runs as `sagemaker-user` (UID 1000). You need `USER root` to install packages, then `USER $NB_USER` at the end. SageMaker expects the container to run as a non-root user.
+
+## Key design: single image, multiple platforms
+
+A single Docker image works across all four platform-IDE combinations. The only difference is the `AppImageConfig`, which sets the entrypoint:
+
+| Platform | IDE | Entrypoint | Why this one |
+|----------|-----|-----------|-------------|
+| Studio V2 | JupyterLab | `jupyter-lab` | Raw JupyterLab binary. Needs explicit `--ServerApp.ip=0.0.0.0` and `--ServerApp.token=` because the SageMaker health check connects over the network without authentication. |
+| Studio V2 | Code Editor | `entrypoint-code-editor` | Wrapper script from the base image. Handles configuration automatically. |
+| Unified Studio | JupyterLab | `entrypoint-sagemaker-ui-jupyter-server` | Wrapper that configures Unified Studio's auth model (`MD_IAM`). Using `jupyter-lab` here causes `InternalFailure`. |
+| Unified Studio | Code Editor | `entrypoint-code-editor` | Same wrapper as Studio V2 — it detects the environment and adapts. |
+
+All three entrypoints ship with the `sagemaker-distribution` base image. Code Editor uses the same config on both platforms. JupyterLab does not — this is the detail most likely to cause a failed deployment.
+
+## Prerequisites
+
+- **AWS CLI v2** installed and configured
+- **Docker** installed and running locally
+- **A SageMaker execution role** — needs permissions to pull from ECR (`ecr:GetDownloadUrlForLayer`, `ecr:BatchGetImage`, `ecr:BatchCheckLayerAvailability`). To find your existing role: SageMaker console → Domains → select your domain → Domain settings → the "Execution role" field. It looks like `arn:aws:iam::<account>:role/service-role/AmazonSageMaker-ExecutionRole-...`
+- **A SageMaker domain** (Studio V2, Unified Studio, or both)
+
+## Setup
+
+Run this block first. The `aws sts` call is last because it is slow and can cause shells to drop pasted lines that follow it:
+
+```bash
+export AWS_REGION=us-west-2          # change to your region
+export ECR_REPO=sagemaker-custom-images/pytorch-cv
+export IMAGE_NAME=pytorch-cv
+export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+echo "Account: $AWS_ACCOUNT_ID  Region: $AWS_REGION  Repo: $ECR_REPO"
+```
+
+Verify the output shows the correct values before proceeding.
+
+## Build and push
+
+Run from the `examples/pytorch-computer-vision/` directory:
+
+```bash
+# Create ECR repository (skip if it already exists)
+aws ecr create-repository --repository-name $ECR_REPO --region $AWS_REGION || true
+
+# Build (--platform linux/amd64 is required when building on Apple Silicon)
+docker build --platform linux/amd64 -t $IMAGE_NAME .
+
+# Login to ECR
+aws ecr get-login-password --region $AWS_REGION | \
+    docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
+
+# Tag and push
+docker tag $IMAGE_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest
+docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest
+```
+
+## Register with SageMaker
+
+This creates the SageMaker Image, an Image Version pointing to your ECR image, and three AppImageConfigs (one per entrypoint):
+
+```bash
+export ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/YourSageMakerRole"  # change to your execution role
+
+# Create SageMaker Image — a named resource that holds versions
+aws sagemaker create-image \
+    --image-name $IMAGE_NAME \
+    --role-arn $ROLE_ARN \
+    --region $AWS_REGION
+
+# Create Image Version — links the ECR digest to the SageMaker Image
+aws sagemaker create-image-version \
+    --image-name $IMAGE_NAME \
+    --base-image $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest \
+    --region $AWS_REGION
+
+# AppImageConfig for Studio V2 JupyterLab
+# ip=0.0.0.0: health check connects over the network, not localhost
+# token=: health check doesn't pass auth tokens
+# no-browser: container has no display
+aws sagemaker create-app-image-config \
+    --app-image-config-name ${IMAGE_NAME}-jl \
+    --jupyter-lab-app-image-config '{
+        "ContainerConfig": {
+            "ContainerEntrypoint": ["jupyter-lab"],
+            "ContainerArguments": ["--ServerApp.base_url=/jupyterlab/default", "--ServerApp.ip=0.0.0.0", "--ServerApp.token=", "--no-browser"]
+        }
+    }' \
+    --region $AWS_REGION
+
+# AppImageConfig for Code Editor (works on both Studio V2 and Unified Studio)
+aws sagemaker create-app-image-config \
+    --app-image-config-name ${IMAGE_NAME}-ce \
+    --code-editor-app-image-config '{
+        "ContainerConfig": {
+            "ContainerEntrypoint": ["entrypoint-code-editor"]
+        }
+    }' \
+    --region $AWS_REGION
+
+# AppImageConfig for Unified Studio JupyterLab
+# Uses a different entrypoint that handles Unified Studio's auth model (MD_IAM)
+aws sagemaker create-app-image-config \
+    --app-image-config-name ${IMAGE_NAME}-unified-jl \
+    --jupyter-lab-app-image-config '{
+        "ContainerConfig": {
+            "ContainerEntrypoint": ["entrypoint-sagemaker-ui-jupyter-server"]
+        }
+    }' \
+    --region $AWS_REGION
+```
+
+## Attach to a domain
+
+These `update-domain` calls make the image selectable when users create a new space. Note: `CustomImages` replaces the existing list — if you have other custom images, include them all.
+
+### Studio V2
+
+To find your domain ID: SageMaker console → Domains → select your domain → the `d-xxxxxxxxxxxx` in the URL or the "Domain ID" field.
+
+```bash
+export STUDIO_DOMAIN_ID=d-xxxxxxxxxxxx   # change to your Studio domain ID
+
+aws sagemaker update-domain \
+    --domain-id $STUDIO_DOMAIN_ID \
+    --region $AWS_REGION \
+    --default-user-settings '{
+        "JupyterLabAppSettings": {
+            "CustomImages": [{
+                "ImageName": "'"$IMAGE_NAME"'",
+                "AppImageConfigName": "'"${IMAGE_NAME}-jl"'"
+            }]
+        },
+        "CodeEditorAppSettings": {
+            "CustomImages": [{
+                "ImageName": "'"$IMAGE_NAME"'",
+                "AppImageConfigName": "'"${IMAGE_NAME}-ce"'"
+            }]
+        }
+    }'
+```
+
+### Unified Studio
+
+The SageMaker domain is provisioned automatically by the DataZone Tooling environment. To find it: DataZone console → your domain → your project → Tooling environment → Provisioned resources → `sageMakerDomainId`.
+
+```bash
+export UNIFIED_DOMAIN_ID=d-xxxxxxxxxxxx    # change to your Unified Studio domain ID
+
+aws sagemaker update-domain \
+    --domain-id $UNIFIED_DOMAIN_ID \
+    --region $AWS_REGION \
+    --default-user-settings '{
+        "JupyterLabAppSettings": {
+            "CustomImages": [{
+                "ImageName": "'"$IMAGE_NAME"'",
+                "AppImageConfigName": "'"${IMAGE_NAME}-unified-jl"'"
+            }]
+        },
+        "CodeEditorAppSettings": {
+            "CustomImages": [{
+                "ImageName": "'"$IMAGE_NAME"'",
+                "AppImageConfigName": "'"${IMAGE_NAME}-ce"'"
+            }]
+        }
+    }'
+```
+
+## Updating your image
+
+After changing `requirements.txt` or the Dockerfile, rebuild and push a new version:
+
+```bash
+docker build --platform linux/amd64 -t $IMAGE_NAME .
+docker tag $IMAGE_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest
+docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest
+
+aws sagemaker create-image-version \
+    --image-name $IMAGE_NAME \
+    --base-image $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${ECR_REPO}:latest \
+    --region $AWS_REGION
+```
+
+You do not need to re-create the AppImageConfig or re-attach to the domain. Restart any running spaces to pick up the new version.
+
+## Validate
+
+After launching a space with your custom image:
+
+```bash
+python validate.py
+```
+
+## Troubleshooting
+
+### Common failures
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| `InternalFailure`, no further detail | Image built on ARM (Apple Silicon) without `--platform linux/amd64` | Rebuild with `docker build --platform linux/amd64` |
+| `InternalFailure` on Studio V2 JupyterLab | Missing `--ServerApp.ip=0.0.0.0` or `--ServerApp.token=` in ContainerArguments | Add both args to the `-jl` AppImageConfig |
+| `InternalFailure` on Unified Studio JupyterLab | Used `jupyter-lab` entrypoint instead of `entrypoint-sagemaker-ui-jupyter-server` | Create a separate AppImageConfig with the correct entrypoint |
+| App stuck in "Pending" on Unified Studio, then fails | Missing VPC interface endpoints (Unified Studio domains are VpcOnly) | Create endpoints for `sagemaker.api`, `sts`, `sagemaker.runtime`, and `logs` in the domain's VPC with its security group |
+| `ImportError` at runtime for numpy, catboost, or sagemaker SDK | numpy upgraded to 2.x by a transitive dependency | Pin `numpy==1.26.4` in `requirements.txt` |
+| Image builds but packages fail to import | Overriding AWS SDK packages (boto3, botocore, sagemaker) | Remove them from `requirements.txt` — use the base image versions |
+| `docker push` goes to wrong ECR repo in zsh | zsh interprets `$VAR:latest` as history modifier `:l` + literal `atest` | Use `${VAR}:latest` (curly braces) |
+
+### Reading container logs
+
+When an app fails, check CloudWatch:
+
+- **Log group**: `/aws/sagemaker/studio`
+- **Log stream**: `<domain-id>/<space-name>/<app-type>/default`
+
+What to look for:
+- `Environment is Environment.STUDIO_IAM` — confirms Studio V2 detected correctly
+- `Environment is Environment.MD_IAM` — confirms Unified Studio detected correctly
+- `Serving notebooks from local directory` — JupyterLab started successfully
+- `received signal 15` — SageMaker killed the container after health check timeout
+
+## Clean up
+
+Delete resources in reverse order to avoid dependency errors:
+
+```bash
+# 1. Delete any running apps and spaces first (via the Studio console or CLI)
+
+# 2. Remove the image from domain(s)
+aws sagemaker update-domain \
+    --domain-id $STUDIO_DOMAIN_ID \
+    --region $AWS_REGION \
+    --default-user-settings '{
+        "JupyterLabAppSettings": { "CustomImages": [] },
+        "CodeEditorAppSettings": { "CustomImages": [] }
+    }'
+
+# 3. Delete AppImageConfigs
+aws sagemaker delete-app-image-config --app-image-config-name ${IMAGE_NAME}-jl --region $AWS_REGION
+aws sagemaker delete-app-image-config --app-image-config-name ${IMAGE_NAME}-ce --region $AWS_REGION
+aws sagemaker delete-app-image-config --app-image-config-name ${IMAGE_NAME}-unified-jl --region $AWS_REGION
+
+# 4. Delete image version and image
+aws sagemaker delete-image-version --image-name $IMAGE_NAME --version-number 1 --region $AWS_REGION
+aws sagemaker delete-image --image-name $IMAGE_NAME --region $AWS_REGION
+
+# 5. Delete the ECR repository if no longer needed
+aws ecr delete-repository --repository-name $ECR_REPO --region $AWS_REGION --force
+```
+
+## Related resources
+
+- [Custom image specs for Studio V2](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-specs.html)
+- [Dockerfile specs for Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/byoi-specifications.html)
+- [SageMaker Distribution on GitHub](https://github.com/aws/sagemaker-distribution)
diff --git a/examples/pytorch-computer-vision/requirements.txt b/examples/pytorch-computer-vision/requirements.txt
@@ -0,0 +1,7 @@
+albumentations==1.4.3
+timm==1.0.3
+ultralytics==8.2.0
+
+# Pin numpy to the version in the base image to avoid breaking
+# sagemaker SDK, autogluon, catboost, and other pre-installed packages
+numpy==1.26.4
diff --git a/examples/pytorch-computer-vision/validate.py b/examples/pytorch-computer-vision/validate.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+"""Verify that all computer vision packages are installed correctly."""
+
+import sys
+
+
+def check_package(name, import_name=None):
+    if import_name is None:
+        import_name = name.replace("-", "_")
+    try:
+        module = __import__(import_name)
+        version = getattr(module, "__version__", "unknown")
+        print(f"  [OK] {name}: {version}")
+        return True
+    except ImportError as e:
+        print(f"  [FAIL] {name}: {e}")
+        return False
+
+
+def main():
+    print("=" * 50)
+    print("PyTorch Computer Vision Image Validation")
+    print("=" * 50)
+
+    ok = True
+
+    print("\n[Base image packages]")
+    ok &= check_package("torch")
+    ok &= check_package("cv2", "cv2")
+    ok &= check_package("numpy")
+    ok &= check_package("pandas")
+
+    print("\n[Added packages]")
+    ok &= check_package("albumentations")
+    ok &= check_package("timm")
+    ok &= check_package("ultralytics")
+
+    print("\n" + "=" * 50)
+    if ok:
+        print("VALIDATION PASSED")
+        return 0
+    else:
+        print("VALIDATION FAILED")
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())