diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index ecc2fea..64727c4 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -48,7 +48,8 @@
       "skills": [
         "./skills/git-workflow/SKILL.md",
         "./skills/tidd/SKILL.md",
-        "./skills/intent-engineering/SKILL.md"
+        "./skills/intent-engineering/SKILL.md",
+        "./skills/serverless-migration-advisor/SKILL.md"
       ]
     },
     {
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e8adfa5..e3d651e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
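+- **serverless-migration-advisor** (workflow): Upstream advisor for migrating AWS always-on architectures to a serverless + Spot pattern. Runs a 5-phase interview → target-architecture mapping → staged-checklist report. Requires AWS Docs citations and cites the validated serverless-autoresearch ($3.94 / 48 experiments) and serverless-openclaw ($1/month) case studies. Implementation how-to is delegated to `sagemaker-spot-training` and similar skills. (#3)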
+
 ## [0.2.0-beta] - 2026-04-07
 
 ### Added
diff --git a/README.md b/README.md
index ecf231a..9853d72 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,7 @@ Git workflow and collaboration
 | [git-workflow](plugins/workflow/skills/git-workflow) | Guide to Git branching strategy and commit conventions |
 | [tidd](plugins/workflow/skills/tidd) | TiDD (Ticket Driven Development): hook that enforces the No Ticket, No Commit principle |
 | [intent-engineering](plugins/workflow/skills/intent-engineering) | Create and manage Intent Documents (INTENT.md): documents project intent around Why/What/Not/Learnings |
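+| [serverless-migration-advisor](plugins/workflow/skills/serverless-migration-advisor) | Upstream advisor for migrating AWS always-on architectures to a serverless + Spot pattern. Evaluates trade-offs, flags risks, and produces a staged migration plan, then delegates to implementation skills (sagemaker-spot-training, etc.). |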
 
 ### Documentation
 
diff --git a/issues/3-serverless-migration/HANDOFF.md b/issues/3-serverless-migration/HANDOFF.md
new file mode 100644
index 0000000..5b224dc
--- /dev/null
+++ b/issues/3-serverless-migration/HANDOFF.md
@@ -0,0 +1,224 @@
+# Issue #3: Serverless Migration Skill — Handoff Document
+
+> **To the agent picking this up:** this is a handoff from the
+> `serverless-autoresearch` project. Read this end-to-end before writing any
+> code. The source repository (see §3) contains the lessons that should power
+> this skill — extract, don't reinvent. Write `SPEC.md` and `PLAN.md` in this
+> directory once you've decided scope and architecture.
+
+## 1. Goal
+
+Build a Claude Code skill (or skill family) that helps users **migrate an
+existing, always-on architecture to a serverless + Spot pattern on AWS**, with
+particular emphasis on the lessons proven out in the source project.
+
+The skill should let a user describe their current workload (batch training,
+ETL job, long-running service, ML inference, etc.) and receive:
+
+1. A feasibility assessment (is this workload a good serverless+Spot candidate?)
+2. A concrete migration plan (target AWS services, cost/time estimate, risks)
+3. Drop-in IaC / code patterns where applicable
+4. Pitfalls to avoid, drawn from real experience
+
+## 2. Why this skill
+
+The source project (`serverless-autoresearch`) demonstrated that a workload
+most people assume needs a dedicated H100 for 8 hours can actually run on
+Spot for **229 seconds at $0.16**, matching the reference result. The same
+technique transfers to many batch-style workloads. The lessons are concrete
+and hard-won; putting them behind a skill lets other teams apply them without
+re-learning the same failures.
+
+## 3. Source project — what to mine
+
+Repository: `/Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/`
+(GitHub: `roboco-io/serverless-autoresearch`, main branch)
+
+### 3.1 Primary sources (read these first)
+
+| Path | What it contains | Why it matters |
+|------|-----------------|----------------|
+| `docs/insights.md` | 15 numbered lessons from 48 real Spot experiments | **Core knowledge base**. Every insight is battle-tested. Several directly contradict intuition (e.g. #3/#12 vs #13). |
+| `docs/comparison-report.md` | Sequential-dedicated vs parallel-Spot architecture analysis | Migration narrative: before/after |
+| `docs/spot-capacity-guide.md` | How to find Spot capacity by region, quota flow | First-mile problem for every Spot migration |
+| `docs/gpu-cost-analysis.md` | P5/P6 pricing, per-experiment cost math | Concrete cost templates |
+| `experiments/003-h100-comparison/results-summary.md` | Full multi-phase experiment report | End-to-end worked example |
+| `experiments/001-baseline-l40s/`, `002-optimization-l40s/` | Earlier experiment reports | Shows the incremental path — useful for the skill's suggested migration stages |
+| `README.md` | Project summary, 48-experiment program, $3.94 total | Headline numbers for the skill's "what's possible" pitch |
+
+### 3.2 Code patterns (reusable snippets)
+
+| Path | Pattern demonstrated |
+|------|---------------------|
+| `src/pipeline/batch_launcher.py` | Submitting N parallel SageMaker Spot jobs with `use_spot_instances=True`, `max_wait`, retry-safe structure |
+| `src/pipeline/result_collector.py` | Polling multiple async SageMaker jobs, extracting metrics from CloudWatch logs |
+| `src/pipeline/orchestrator.py` | Generation-loop coordination across parallel Spot jobs |
+| `src/sagemaker/entry_point.py`, `train_wrapper.py` | Container entry point that works under SageMaker's S3 input-channel convention |
+| `infrastructure/setup_iam.sh` | Minimal IAM role for SageMaker training (execution + S3 access) |
+| `infrastructure/requirements-train.txt` | Pinned deps compatible with SageMaker PyTorch DLC 2.8.0 |
+| `config.yaml.example` | Config template: profile, region, instance type, Spot flags, time budget |
+| `Makefile` | `dry-run`, `run-single`, `run`, `cost` commands — the UX template the skill should replicate |
+
+### 3.3 Diagrams (for reference / inspiration, don't copy verbatim)
+
+- `docs/architecture.svg` — system architecture
+- `docs/comparison-diagrams.svg` — sequential vs parallel visual
+
+## 4. Core knowledge to surface
+
+Distilled from the source project's `docs/insights.md`, the skill must teach
+or apply at minimum these principles:
+
+### 4.1 Spot-specific operational knowledge
+- **Region/AZ matters more than instance size** — Spot placement scores
+  vary 1–9 across regions for the same SKU. Always check
+  `aws ec2 get-spot-placement-scores` before committing to a region.
+- **Larger instances can be cheaper on Spot** — less demand. Don't assume
+  smaller = cheaper.
+- **Quota is the first-mile problem** — g-family auto-approves in minutes,
+  p-family (H100, B200) needs days of lead time.
+- **Spot interrupt risk is workload-length-dependent** — under ~30 min,
+  Spot is almost free. Beyond that, interrupt-handling matters.
+- **Billable ≠ wall clock** — SageMaker Spot only bills training time,
+  not Spot wait, not startup. This is the real cost model.
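
A minimal sketch of the region check from the first bullet follows. It assumes boto3's EC2 client method `get_spot_placement_scores`, the API behind `aws ec2 get-spot-placement-scores`. The helper name `rank_regions` is hypothetical and pagination is ignored. The client is passed in so the ranking logic can be exercised without AWS credentials.

```python
from collections import defaultdict
from typing import Any, Iterable


def rank_regions(ec2_client: Any, instance_types: Iterable[str],
                 regions: Iterable[str], target_capacity: int = 1) -> list:
    """Return (region, best_score) pairs sorted best-first.

    Hypothetical helper around EC2 GetSpotPlacementScores. Scores are on a
    1-10 scale; several entries for one region are reduced to the best one.
    Pagination (NextToken) is not handled in this sketch.
    """
    resp = ec2_client.get_spot_placement_scores(
        InstanceTypes=list(instance_types),
        TargetCapacity=target_capacity,
        RegionNames=list(regions),
        SingleAvailabilityZone=False,  # score whole regions, not single AZs
    )
    best = defaultdict(int)
    for entry in resp.get("SpotPlacementScores", []):
        best[entry["Region"]] = max(best[entry["Region"]], entry["Score"])
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `rank_regions(boto3.client("ec2"), ["p5.48xlarge"], ["us-east-1", "us-west-2"])` would list the candidate regions best-first. The top entry is the region to request quota in first.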
+
+### 4.2 Cost & time patterns
+- **HUGI (Hurry Up and Get Idle)** — only pay when compute is active;
+  let the platform release the instance the instant the job ends. Always-on
+  servers lose to HUGI on any bursty workload.
+- **Parallelism cuts wall clock, not cost** — N-way parallel Spot incurs the
+  same billable time as sequential, at roughly 1/N the wall clock.
+- **Startup overhead matters at short durations** — SageMaker startup is
+  ~3 min. For 5-min jobs, that's 60% overhead. Either amortize (batch more
+  work per job) or choose a platform with lower cold-start (Lambda, Fargate
+  Spot, Batch).
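
The three patterns above reduce to simple arithmetic. This sketch makes two assumptions. First, as §4.1 states, only training time is billable. Second, every job pays the same fixed startup delay (~3 min for SageMaker). `estimate` and `RunEstimate` are illustrative names, not part of the source project.

```python
import math
from dataclasses import dataclass


@dataclass
class RunEstimate:
    billable_min: float    # minutes billed (training time only, per §4.1)
    wall_clock_min: float  # minutes until the last job finishes
    overhead_ratio: float  # startup minutes per minute of useful work


def estimate(n_jobs: int, job_min: float, parallelism: int,
             startup_min: float = 3.0) -> RunEstimate:
    """Illustrative HUGI / parallelism / startup-overhead arithmetic."""
    if n_jobs < 1 or parallelism < 1 or job_min <= 0:
        raise ValueError("n_jobs, parallelism and job_min must be positive")
    waves = math.ceil(n_jobs / parallelism)  # rounds of concurrent jobs
    return RunEstimate(
        billable_min=n_jobs * job_min,  # same regardless of parallelism
        wall_clock_min=waves * (startup_min + job_min),
        overhead_ratio=startup_min / job_min,
    )
```

Take four 5-minute jobs. `estimate(4, 5, 1)` and `estimate(4, 5, 4)` both bill 20 minutes, but wall clock drops from 32 to 8 minutes. The 0.6 overhead ratio is the 60% figure above.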
+
+### 4.3 Migration-specific pitfalls (from experience)
+- **Hyperparameters and batch size interact across hardware** (insight
+  #13). When migrating a workload, don't assume "same config" — LR-like
+  settings that depended on the old hardware's memory limits may need to
+  be re-examined after the move.
+- **Cheap-GPU-evolved configs often transfer to expensive GPUs** (insight
+  #14). For iteration-heavy workloads, run exploration on cheap Spot, then
+  promote the winner. Don't start on the expensive tier.
+- **Transient CUDA / Spot errors happen** at a ~5% rate on H100 Spot. Any
+  migration plan needs automatic retry baked in; one failure out of 20 is
+  normal, not a signal to abandon Spot.
+- **Kernel/accelerator support varies by GPU** (FA3 sm_90 only, etc.). A
+  serverless migration across GPU types needs a fallback path.
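
Why is one failure in 20 normal? With independent jobs at a ~5% failure rate, the chance of at least one failure in a 20-job batch is 1 - 0.95^20 ≈ 64%. A minimal retry wrapper is sketched below. `TransientError` is a hypothetical marker for errors you have decided are safe to retry (Spot reclaim, flaky CUDA init); no AWS SDK raises it.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def p_at_least_one_failure(failure_rate: float, n_jobs: int) -> float:
    """Chance that a batch of independent jobs sees one or more failures."""
    return 1.0 - (1.0 - failure_rate) ** n_jobs


def run_with_retry(job: Callable[[], T], attempts: int = 3) -> T:
    """Run `job`, retrying only TransientError, at most `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[TransientError] = None
    for _ in range(attempts):
        try:
            return job()
        except TransientError as exc:
            last_error = exc  # retry; any other exception propagates at once
    assert last_error is not None
    raise last_error
```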
+
+### 4.4 The concrete achievement to cite
+
+The source project reproduced Karpathy's upstream autoresearch result
+(val_bpb ~0.998), reaching **val_bpb = 0.9951 on a single 229-second H100 Spot
+run costing $0.16**, versus the upstream's estimated **$7–24 over 8 hours**.
+Total program: 48 experiments across L40S + H100 for **$3.94**. These are
+the numbers the skill should be able to quote as the "what's actually
+possible" anchor.
+
+## 5. Suggested scope (the agent decides)
+
+The ask — "help migrate any architecture to serverless" — is broad. A
+plausible scoping, for the agent to accept, refine, or reject:
+
+**Tier 1 (must-have):** batch/training/ETL workloads → Spot + S3 + managed
+service (SageMaker Training / AWS Batch / Fargate Spot). This is where the
+source project's experience is deepest.
+
+**Tier 2 (extend if time permits):** always-on API → Lambda / API Gateway
+or ECS Fargate. Source project doesn't cover this directly but HUGI
+principles apply.
+
+**Tier 3 (probably out of scope):** full legacy monolith decomposition,
+data layer migration (RDS → DynamoDB), event-driven re-architecture.
+These are bigger than a single skill.
+
+Reasonable options for skill structure:
+
+- **Option A — single skill** `serverless-migration-advisor`. One SKILL.md
+  with workload-type branches internally. Simpler to maintain, harder to
+  load selectively.
+- **Option B — orchestrator + sub-skills**, like the `aws-well-architected`
+  family (see `issues/2-aws-well-architected-review/SPEC.md`). Main skill
+  routes to `serverless-migration-batch`, `serverless-migration-api`, etc.
+  Matches the existing plugin repo pattern.
+- **Option C — complement an existing skill**. The source project's
+  CLAUDE.md mentions a `sagemaker-spot-training` skill at
+  `github.com/roboco-io/claude-skills`. If that skill already covers the
+  training-workload case, this new skill should focus on migration framing
+  (before-state analysis, service selection, cost/risk call-out) rather
+  than reimplement SageMaker-specific how-to.
+
+**Recommendation:** investigate Option C first. If the existing skill is
+hands-on-how-to, this new skill should be the upstream migration advisor
+that eventually *delegates* to it. Don't duplicate.
+
+Category placement in the plugin repo: either `development/` or
+`workflow/`. `workflow/` fits "migration process" framing better.
+
+## 6. Non-goals
+
+- **Not a cost calculator** — skill should cite cost *patterns and ranges*
+  from real experiments, not compute live AWS prices.
+- **Not a Terraform generator** — emit *snippets and structural guidance*,
+  not a full IaC suite.
+- **Not a multi-cloud story** — AWS-only, matching source project scope.
+- **Not silent about limits** — must surface the source project's actual
+  failure modes (CUDA errors, quota delays, FA3 compatibility, etc.),
+  not only the wins.
+
+## 7. Success criteria
+
+A user arrives with "I have a nightly batch job running on an EC2 H100 that
+costs $N/month and I want it cheaper." The skill should, within one
+interaction:
+
+1. Classify the workload (batch-training / batch-ETL / inference / long-running)
+2. Identify the target serverless pattern (SageMaker Spot / Batch / Fargate Spot / Lambda)
+3. Estimate cost savings range (citing source-project precedents where relevant)
+4. Flag top 3 risks specific to that workload type (Spot interrupts, cold start, quota, etc.)
+5. Produce a staged migration plan: validation-on-cheap-Spot → production-on-target
+6. Cite the source project's insights (numbered) where the advice originates — traceable, not black-box.
+
+A secondary marker: if the user has an IaC file (Terraform, CDK, CloudFormation),
+the skill can read it and produce a *diff-style* migration sketch rather than
+starting from a blank page.
+
+## 8. Implementation constraints
+
+Follow the plugin repo's conventions (`plugins/tools/plugins/CLAUDE.md`):
+
+- YAML frontmatter + Markdown in every SKILL.md
+- Progressive disclosure via `references/` subdirectory
+- Update `.claude-plugin/marketplace.json` `plugins` array when adding the plugin
+- Keep SKILL.md under ~500 lines; offload detail to `references/`
+- Add the plugin under `plugins/{category}/.claude-plugin/plugin.json`
+
+## 9. Deliverables (for the agent)
+
+In this directory (`issues/3-serverless-migration/`):
+
+1. **`SPEC.md`** — your refined scope, architecture, skill structure,
+   interview/output contracts. Follow the style of `issues/2-aws-well-architected-review/SPEC.md`.
+2. **`PLAN.md`** — implementation steps, ordered, with rough effort estimates.
+3. (optional) **`RESEARCH.md`** — your investigation of the existing
+   `sagemaker-spot-training` skill (referenced in §5 Option C) and how
+   this new skill should complement or supersede it.
+
+Then proceed with implementation in `plugins/{chosen-category}/skills/...`.
+
+## 10. Open questions for the agent to resolve
+
+- Does `github.com/roboco-io/claude-skills` have a live `sagemaker-spot-training`
+  skill? If yes, what's its exact scope? (Drives §5 Option A vs C.)
+- Should the skill interview the user (like `aws-well-architected`) or work
+  from a single declarative input (paste your Terraform)? Probably both.
+- How to represent "staged migration" — as a checklist skill output? A
+  generated issue tree? A Terraform-diff format?
+- Where do the numbered source-project insights live in the skill? Inline
+  citations in output, or an appended `references/source-insights.md`?
+
+---
+
+*Handoff prepared 2026-04-18 from serverless-autoresearch commit `5435b37`.*
+*Source project contact: `/Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/`.*
diff --git a/issues/3-serverless-migration/PLAN.md b/issues/3-serverless-migration/PLAN.md
new file mode 100644
index 0000000..0c23161
--- /dev/null
+++ b/issues/3-serverless-migration/PLAN.md
@@ -0,0 +1,226 @@
+# Issue #3: Serverless Migration Advisor - Implementation Plan
+
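+> **Prerequisite**: start only after `SPEC.md` is approved.
+> **Implementation location**: `plugins/workflow/skills/serverless-migration-advisor/`.
+> **Total estimated effort**: ~2.5 days (1 person): 1 day of research + 1 day writing the body + 0.5 day for references and verification.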
+
+## 0. Prerequisites
+
+- [x] Read HANDOFF.md thoroughly
+- [x] SPEC.md written
+- [x] User approval of SPEC.md / PLAN.md / RESEARCH.md (**required before Stage 1 starts**)
+- [x] Interface of the existing sagemaker-spot-training skill reviewed
+- [x] Paths to both validation projects secured (autoresearch, openclaw)
+- [x] First pass of official AWS docs collected (see RESEARCH.md)
+
+---
+
+## 1. Implementation Stages
+
+### Stage 1: Strengthen the research (0.5 day)
+
+Goal: complete the official-doc snapshot in `RESEARCH.md` to the point where it can be structured into references/*.md.
+
+**Tasks:**
+
+1. **Excerpt all 9 AWS Serverless Lens design principles**
+   - Source: https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/design-principles.html
+   - For each principle: a one-line summary plus a mapping to where this skill uses it.
+2. **Aurora Serverless v2 / DynamoDB trade-offs**
+   - Aurora Serverless v2: ACU scaling behavior, no cold start, cost floor set by the minimum ACU.
+   - DynamoDB: On-Demand vs Provisioned, single-table design trade-offs.
+3. **EventBridge / SQS / Step Functions** comparison
+   - Event inversion ("invert") guarantees, retries, DLQs, billing units.
+4. **S3 Express One Zone**: for low-latency batch workloads.
+5. **Lambda SnapStart** (Java/Python): a cold-start mitigation.
+6. **Re-review the validated case studies**
+   - `serverless-autoresearch/docs/insights.md`: freeze the numbering of the 15 insights.
+   - `serverless-openclaw/`: cost structure, Lambda Container 1.35 s cold start, EventBridge pre-warming logic.
+
+**Deliverable**: an updated `RESEARCH.md` in the same directory as this plan.
+
+---
+
+### Stage 2: Generate the skill skeleton (0.5 day)
+
+**Tasks:**
+
+1. Create the directory:
+   ```
+   plugins/workflow/skills/serverless-migration-advisor/
+   ├── SKILL.md
+   └── references/
+   ```
+2. Draft `SKILL.md` (YAML frontmatter + a 5-phase skeleton):
+   - `name: serverless-migration-advisor`
+   - `description:` should include the trigger keywords "서버리스 이행" (serverless migration), "EC2에서 Lambda로" (EC2 to Lambda), "Spot 이행" (Spot migration), and "serverless migration".
+3. Create the 13 `references/` files as empty skeletons:
+   - Each file starts with a one-line frontmatter plus `> Snapshot date: 2026-04-18`.
+   - Contents are filled in during Stages 3-5.
+4. Update `plugin.json` and `marketplace.json`:
+   - Add `./skills/serverless-migration-advisor/SKILL.md` to the workflow plugin's `skills` array.
+5. Update the plugin list in `README.md`.
+
+**Verification**: `npm test` passes (marketplace, plugin-json, skills, and integrity checks).
+
+---
+
+### Stage 3: Fill in the trade-off references (0.5 day)
+
+In priority order:
+
+1. **`tradeoffs-compute.md`** ← expands on SPEC §4.1-4.5.
+   - One section each for Lambda, SageMaker Spot, Fargate Spot, EC2 Spot, and Batch.
+   - Table + "Implications" + links to the official URLs.
+2. **`tradeoffs-spot.md`**
+   - Capacity (placement score), price, interruption behavior (`terminate`/`stop`/`hibernate`), HUGI, the definition of billable time.
+   - An example `aws ec2 get-spot-placement-scores` command.
+3. **`serverless-lens.md`**
+   - One-line summary of each of the 9 design principles.
+   - A "How this skill uses it" mapping table.
+4. **`tradeoffs-data-layer.md`**: for Tier 3.
+5. **`tradeoffs-event-driven.md`**: for event-driven redesigns.
+
+**Rule**: cite at least 3 official URLs per file, with one citation line per claim (`> AWS Docs: …`).
+
+---
+
+### Stage 4: Fill in the migration-pattern references (0.5 day)
+
+1. **`patterns-tier1-batch.md`**
+   - EC2 long-running → SageMaker Managed Spot.
+   - EMR → AWS Batch + Fargate/EC2 Spot.
+   - Cron on EC2 → EventBridge Scheduler + Lambda/Batch.
+   - For each pattern: a before/after diagram (text), a checklist template, and autoresearch Insight citations.
+2. **`patterns-tier2-api.md`**
+   - ALB + EC2 → API Gateway + Lambda.
+   - Always-on ECS service → mix of Fargate and Fargate Spot.
+   - WebSocket → API Gateway WebSocket + Lambda.
+   - Cite the openclaw case study.
+3. **`patterns-tier3-monolith.md`**
+   - Strangler Fig (link to the official AWS Prescriptive Guidance).
+   - Branch by Abstraction.
+   - Warning: "Tier 3 has no validated case study; a pilot is mandatory".
+4. **`patterns-tier3-data.md`**
+   - RDS → Aurora Serverless v2 (a migration that keeps the same engine).
+   - RDS → DynamoDB (requires a data-model redesign; CDC-based).
+   - Adoption strategy for S3 Express One Zone.
+
+---
+
+### Stage 5: Case-study and insight references (0.25 day)
+
+1. **`case-study-autoresearch.md`**
+   - Project link, commit hash, key-numbers table.
+   - "Range strings" that this skill's output may quote (e.g. "H100 Spot 229s $0.16").
+2. **`case-study-openclaw.md`**
+   - Project link, architecture summary, cost structure.
+   - Anchor for "Tier 2 always-on API achieved at $1/month".
+3. **`source-insights.md`**
+   - Number the 15 autoresearch insights plus the key openclaw insights.
+   - Numbering rule: keep the initial order; additions only ever receive new, higher numbers.
+
+---
+
+### Stage 6: Interview bank + SKILL.md body (0.5 day)
+
+1. **`interview-bank.md`**
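+   - Complete the question sets for Phase 1 and Phase 3.
+   - Specify the per-Tier branching rules.
+   - Provide an AskUserQuestion JSON template for each question.
+2. **Complete the `SKILL.md` body**
+   - Execution order of the 5 phases.
+   - Which references file each phase consults.
+   - Report template (SPEC §5).
+   - Delegation logic.
+   - Stay at or under 500 lines; push details into references.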
+
+---
+
+### Stage 7: Testing and integration (0.25 day)
+
+1. **Unit tests**
+   - `src/__tests__/skills.test.ts` validates this skill automatically (frontmatter, file existence, line count).
+   - `integrity.test.ts`: marketplace ↔ plugin.json ↔ SKILL.md consistency.
+   - `npm test` green.
+2. **Integration check**
+   - Local install: `/plugin marketplace add <path>` → `/plugin install workflow@roboco-plugins`.
+   - Run three sample scenarios:
+     - Tier 1: "EC2 H100 nightly batch → migrate to Spot"
+     - Tier 2: "Always-on ALB + EC2 API → migrate to Lambda"
+     - Tier 3: "Decompose a monolith: RDS → DynamoDB"
+   - Confirm that each scenario produces a correct report.
+3. **Delegation behavior**
+   - In the Tier 1 scenario, confirm that the `sagemaker-spot-training` skill is triggered, or at least recommended.
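
A sketch like the following can serve as a quick local pre-check before `npm test`. It approximates the frontmatter and line-count checks described for `skills.test.ts`. It is a hypothetical Python stand-in, not the repository's TypeScript test. It parses the frontmatter naively (single-line `key: value` pairs only) and assumes `name` should match the skill directory.

```python
from pathlib import Path

MAX_LINES = 500  # SKILL.md budget from the plugin conventions


def check_skill(skill_md: Path) -> list:
    """Return a list of problems in one SKILL.md; an empty list means OK."""
    problems = []
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines > {MAX_LINES}")
    if not lines or lines[0].strip() != "---":
        return problems + ["missing YAML frontmatter"]
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        return problems + ["unterminated YAML frontmatter"]
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # naive: only single-line `key: value` pairs
            fields[key.strip()] = value.strip()
    if fields.get("name") != skill_md.parent.name:
        problems.append("frontmatter name does not match the skill directory")
    if not fields.get("description"):
        problems.append("missing description")
    return problems
```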
+
+---
+
+### Stage 8: Documentation and release (0.25 day)
+
+1. Update `CHANGELOG.md` (`0.3.0-beta` or whichever version is decided).
+2. Final check of the plugin list table in `README.md`.
+3. Commit (TiDD-compliant, linked to issue #3):
+   - `feat(workflow): add serverless-migration-advisor skill (#3)`
+4. Open a PR.
+
+---
+
+## 2. Risks and Mitigations
+
+| Risk | Impact | Mitigation |
+|-------|------|------|
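+| Serverless Lens doc links break or get revised | Citation traceability is lost | Record a `Snapshot date` in `references/serverless-lens.md`; review every 6 months |
+| Tier 3 scope balloons | Schedule overrun, lower skill quality | Stick to "principles only" guidance; defer details to official AWS links |
+| Too many AskUserQuestion prompts hurt UX | Users drop off | Keep Phase 1 to 4 questions and Phase 3 to an average of 4-5 via branching |
+| Overlap with sagemaker-spot-training | Both skills ask the same questions | This skill stops once the service is chosen; implementation is always delegated |
+| Overlap with aws-well-architected | Confusion between review and migration | State at the top of this skill's SKILL.md: "this skill is for migration; WA is for review" |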
+
+---
+
+## 3. Decision Points (During Implementation)
+
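+- **IaC static-analysis depth**: Phase 2 parses Terraform/CDK. The first version is regex-based and extracts only patterns such as `resource "aws_instance"`; adopting a parser library is deferred.
+- **Report slug collision rule**: a function that appends a `-2`, `-3` suffix when `YYYY-MM-DD-<topic>` already exists.
+- **Language**: the report defaults to the language detected in the user's conversation. The SKILL.md body and references are written in Korean.
+- **Case-study citation frequency**: each report cites at least 1 Case, at least 2 Insights, and at least 3 AWS Docs links.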
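
The first two decision points are small enough to sketch. The sketch assumes a regex pass over Terraform source is enough for version 1. That pass will miss resources defined inside modules or synthesized by CDK. `extract_resources` and `report_slug` are hypothetical helper names.

```python
import re
from datetime import date

# Matches Terraform block headers such as: resource "aws_instance" "trainer" {
RESOURCE_RE = re.compile(
    r'^\s*resource\s+"(aws_[a-z0-9_]+)"\s+"([A-Za-z0-9_-]+)"', re.MULTILINE)


def extract_resources(tf_source: str) -> list:
    """Return (resource_type, resource_name) pairs found by the regex."""
    return RESOURCE_RE.findall(tf_source)


def report_slug(topic: str, existing: set, today: date) -> str:
    """Build YYYY-MM-DD-<topic>, appending -2, -3, ... until it is unused."""
    base = f"{today.isoformat()}-{topic}"
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"
```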
+
+---
+
+## 4. Deliverables Checklist
+
+- [x] `plugins/workflow/skills/serverless-migration-advisor/SKILL.md`
+- [x] `references/tradeoffs-compute.md`
+- [x] `references/tradeoffs-spot.md`
+- [x] `references/tradeoffs-data-layer.md`
+- [x] `references/tradeoffs-event-driven.md`
+- [x] `references/serverless-lens.md`
+- [x] `references/patterns-tier1-batch.md`
+- [x] `references/patterns-tier2-api.md`
+- [x] `references/patterns-tier3-monolith.md`
+- [x] `references/patterns-tier3-data.md`
+- [x] `references/interview-bank.md`
+- [x] `references/case-study-autoresearch.md`
+- [x] `references/case-study-openclaw.md`
+- [x] `references/source-insights.md`
+- [x] `.claude-plugin/marketplace.json` updated
+- [x] `README.md` list updated
+- [x] `CHANGELOG.md` updated
+- [x] `npm test` green (197/197)
+- [x] Read-through validation of the 3 scenarios passed (Stage H.2; running them through a local install is a manual check scheduled after the PR)
+
+---
+
+## 5. Acceptance Criteria
+
+The 6 metrics in SPEC §12, plus:
+
+- [x] All of `npm test` green (197/197)
+- [x] SKILL.md at or under 500 lines (226 lines)
+- [x] Every references/*.md states a `Snapshot date`
+- [x] Every trade-off claim carries an `[AWS Docs]`, `[Insight #N]`, or `[Case: …]` label (spot-check: 5/5 resolved)
+- [x] All 3 per-Tier samples passed read-through validation (actual report generation happens in the manual post-PR scenarios)
+- [x] `sagemaker-spot-training` delegation path works (Delegation Map checked)
+
+---
+
+*Plan created: 2026-04-18. Implementation starts from this file once SPEC is approved.*
diff --git a/issues/3-serverless-migration/RESEARCH.md b/issues/3-serverless-migration/RESEARCH.md
new file mode 100644
index 0000000..99176d5
--- /dev/null
+++ b/issues/3-serverless-migration/RESEARCH.md
@@ -0,0 +1,499 @@
+# Issue #3: Trade-off Research Snapshot Based on Official AWS Docs
+
+> **Collected on**: 2026-04-18
+> **Purpose**: primary source set for writing the skill's `references/tradeoffs-*.md`.
+> **Rule**: the skill does not re-quote AWS Docs verbatim; summaries and implications are written from this snapshot.
+
+## 1. Sources and URLs
+
+| Document | URL | Status |
+|------|-----|------|
+| AWS Well-Architected Serverless Lens | https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/welcome.html | First pass done (introduction only) |
+| AWS Lambda quotas | https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html | Done |
+| ECS Fargate capacity providers | https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html | Done |
+| SageMaker Managed Spot Training | https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html | Done |
+| EC2 Spot Instance interruptions | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html | Done |
+| AWS Batch with Spot | https://docs.aws.amazon.com/batch/latest/userguide/spot.html | Done |
+| Serverless Lens 9 design principles | https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/design-principles.html | **Needs more (Stage 1)** |
+| Lambda SnapStart | https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html | **Needs more (Stage 1)** |
+| Aurora Serverless v2 | https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html | **Needs more (Stage 1)** |
+| DynamoDB Capacity Modes | https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html | **Needs more (Stage 1)** |
+| S3 Express One Zone | https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html | **Needs more (Stage 1)** |
+| Step Functions Express vs Standard | https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html | **Needs more (Stage 1)** |
+
+---
+
+## 2. Lambda: Trade-off Facts
+
+**Source**: Lambda quotas page.
+
+### 2.1 Quantitative Limits
+
+| Item | Value | Increasable |
+|------|-----|-------------|
+| Concurrent executions quota | 1,000 | Yes (up to tens of thousands) |
+| Durable executions | 1,000,000 | Yes (up to millions) |
+| Function memory | 128-10,240 MB (1 MB increments) | No |
+| Function timeout | 900 s | No |
+| Environment variables (total) | 4 KB | No |
+| Resource-based policy | 20 KB | No |
+| Layers | 5 per function | No |
+| Concurrency scaling rate | 1,000 execution environments per function every 10 s | No |
+| Request payload (synchronous) | 6 MB | No |
+| Response payload (synchronous) | 6 MB | No |
+| Streamed response | 200 MB | No |
+| Request and response (asynchronous) | 1 MB | No |
+| Streaming bandwidth | Unthrottled for the first 6 MB, then 2 MB/s | No |
+| .zip deployment package | 50 MB (upload) / 250 MB (unzipped, including layers) | No |
+| Container image | 10 GB (unzipped) | No |
+| Container image settings | 16 KB | No |
+| `/tmp` | 512 MB-10,240 MB (1 MB increments) | No |
+| File descriptors | 1,024 (Managed Instances: 4,096) | No |
+| Threads/processes | 1,024 (Managed Instances: Bottlerocket default) | No |
+| Memory equivalent to 1 vCPU | 1,769 MB | - |
+
+### 2.2 Trade-off Implications
+
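+- **Nothing longer than 15 minutes**: long batches need Step Functions decomposition or Batch/Fargate.
+- **6 MB payload limit**: large responses need streaming (200 MB) or S3 presigned URLs.
+- **Memory is tied to CPU**: 1,769 MB equals 1 vCPU. Raising memory to cut latency also raises CPU (and cost, proportionally).
+- **Default concurrency of 1,000**: does not line up with API Gateway's default of 10,000 RPS. Required concurrency is roughly RPS × average duration, so at 10,000 RPS any function averaging more than 100 ms exceeds 1,000. Confirm with load testing beforehand.
+- **Quota increases**: new accounts start with reduced quotas that grow automatically with usage. Request increases explicitly before going to production.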
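
The implications above can be turned into a pre-flight check. The limits are copied from the §2.1 snapshot. The concurrency estimate uses Little's law: in-flight invocations ≈ requests per second × duration. `Workload` and `lambda_fit` are hypothetical names for illustration.

```python
from dataclasses import dataclass

# Values from the §2.1 snapshot (2026-04-18); re-check the quotas page before use.
MAX_TIMEOUT_S = 900
MAX_SYNC_PAYLOAD_MB = 6
DEFAULT_CONCURRENCY = 1000
MB_PER_VCPU = 1769


@dataclass
class Workload:
    duration_s: float  # typical duration of one invocation
    payload_mb: float  # largest synchronous request or response body
    peak_rps: float    # peak requests per second


def lambda_fit(w: Workload) -> list:
    """List the §2.1 limits this workload would run into on Lambda."""
    issues = []
    if w.duration_s > MAX_TIMEOUT_S:
        issues.append("timeout: split with Step Functions or move to Batch/Fargate")
    if w.payload_mb > MAX_SYNC_PAYLOAD_MB:
        issues.append("payload: stream the response or return an S3 presigned URL")
    # Little's law: concurrent executions ~= arrival rate x duration.
    needed = w.peak_rps * w.duration_s
    if needed > DEFAULT_CONCURRENCY:
        issues.append(f"concurrency: ~{needed:.0f} needed vs default {DEFAULT_CONCURRENCY}")
    return issues


def approx_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share for a memory setting (1 vCPU at 1,769 MB)."""
    return memory_mb / MB_PER_VCPU
```

For example, `Workload(duration_s=0.2, payload_mb=1, peak_rps=10_000)` is flagged for concurrency (~2,000 needed). That is exactly the API Gateway mismatch described above.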
+
+---
+
+## 3. SageMaker Managed Spot Training: Trade-off Facts
+
+**Source**: Managed Spot Training page.
+
+### 3.1 Quantitative Facts
+
+- **Savings**: up to **90%** compared with on-demand.
+- **Formula**: `(1 - BillableTimeInSeconds / TrainingTimeInSeconds) × 100`.
+- **Example**: Billable=100, Training=500 → 80% savings.
+- **Constraint**: `MaxWaitTimeInSeconds > MaxRuntimeInSeconds`.
+- **Without checkpoints**: built-in and Marketplace algorithms are limited to `MaxWaitTime ≤ 3600s`.
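
This sketch covers the formula and the constraint above. It assumes the SageMaker Python SDK `Estimator` keyword arguments `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri`. The helpers only build and validate values; they never call SageMaker. The 3600 s cap applies only to built-in and Marketplace algorithms without checkpointing, so it is not enforced here.

```python
from typing import Optional


def spot_savings_pct(billable_s: float, training_s: float) -> float:
    """(1 - BillableTimeInSeconds / TrainingTimeInSeconds) x 100, per §3.1."""
    if training_s <= 0:
        raise ValueError("training_s must be positive")
    return (1 - billable_s / training_s) * 100


def spot_estimator_kwargs(max_run_s: int, max_wait_s: int,
                          checkpoint_s3_uri: Optional[str] = None) -> dict:
    """Spot-related keyword arguments for a SageMaker Python SDK Estimator."""
    if max_wait_s <= max_run_s:
        raise ValueError("max_wait must be greater than max_run")
    kwargs = {"use_spot_instances": True, "max_run": max_run_s, "max_wait": max_wait_s}
    if checkpoint_s3_uri:
        # SageMaker syncs the container's checkpoint directory with this URI,
        # so an interrupted job can resume instead of starting over (§3.3).
        kwargs["checkpoint_s3_uri"] = checkpoint_s3_uri
    return kwargs
```

The returned dict is meant to be unpacked into an estimator constructor alongside the usual role, image, and instance settings.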
+
+### 3.2 State Transitions
+
+- No interruption: `Starting → Downloading → Training → Uploading`
+- Resumed after one interruption: `Starting → Downloading → Training → Interrupted → Starting → Downloading → Training → Uploading`
+- Two interruptions + MaxWait exceeded: `Stopped: MaxWaitTimeExceeded`
+- Spot capacity never acquired: `Starting → Stopping → Stopped: MaxWaitTimeExceeded`
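
A hypothetical helper can diagnose a finished job by reading its status list against the sequences above. It takes statuses written in this section's notation. In a `DescribeTrainingJob` response they would come from `SecondaryStatusTransitions[*]["Status"]`, and the exact strings the API returns may differ from this notation.

```python
def summarize_spot_run(statuses: list) -> dict:
    """Summarize a status sequence written as in §3.2 (hypothetical helper)."""
    final = statuses[-1] if statuses else ""
    return {
        "interruptions": statuses.count("Interrupted"),
        # In the §3.2 sequences, a run that never reaches Training never got capacity.
        "acquired_capacity": "Training" in statuses,
        "hit_max_wait": "MaxWaitTimeExceeded" in final,
        "reached_upload": final == "Uploading",
    }
```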
+
+### 3.3 Checkpointing
+
+SageMaker syncs a local checkpoint path with S3 and, on restart, restores it from S3 to the local path. Checkpointing is **recommended** for anything but short jobs.
+
+### 3.4 Compatibility with Automatic Model Tuning
+
+Managed Spot can also be used with Hyperparameter Tuning jobs.
+
+---
+
+## 4. Fargate Spot: Trade-off Facts
+
+**Source**: ECS Fargate capacity providers page.
+
+### 4.1 Core Behavior
+
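+- **2-minute warning**: a task state change event is sent to EventBridge and SIGTERM is sent to the container.
+- **`stopTimeout`**: 30 s by default, up to 120 s. This is the grace period before SIGKILL.
+- **Insufficient capacity**: when no Fargate Spot capacity is available, task launch is delayed. It does **not fall back to On-Demand automatically**.
+- **Service + Spot**: on interruption, the scheduler tries to launch replacement tasks.
+- **Standalone task**: work is halted until capacity returns.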
+
+### 4.2 Example Interruption Event
+
+```json
+{
+  "detail-type": "ECS Task State Change",
+  "detail": {
+    "stoppedReason": "Your Spot Task was interrupted.",
+    "stopCode": "SpotInterruption",
+    ...
+  }
+}
+```
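
A consumer of these events needs only `detail-type` and `detail.stopCode` to tell Spot interruptions apart from other stops. The EventBridge pattern below is a sketch under that assumption; `source` is `aws.ecs` for ECS events. `is_spot_interruption` is a hypothetical consumer-side helper.

```python
import json

# Sketch of an EventBridge rule pattern matching only Spot interruptions.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"stopCode": ["SpotInterruption"]},
}

# EventBridge takes the pattern as a JSON string.
EVENT_PATTERN_JSON = json.dumps(SPOT_INTERRUPTION_PATTERN)


def is_spot_interruption(event: dict) -> bool:
    """Consumer-side check mirroring the pattern above."""
    return (event.get("detail-type") == "ECS Task State Change"
            and event.get("detail", {}).get("stopCode") == "SpotInterruption")
```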
+
+### 4.3 Trade-off Implications
+
+- Fargate Spot is safest when run as a service. For batch tasks, pair AWS Batch with Fargate Spot.
+- A Capacity Provider Strategy that mixes `FARGATE` and `FARGATE_SPOT` weights is recommended.
+- If SIGTERM is not handled, data can be corrupted or lost.
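
To see what a mixed strategy means in task counts, here is a simplified model of `base` and `weight` in an ECS `capacityProviderStrategy`. It is a planning approximation only; ECS's own placement may round differently. `split_tasks` is a hypothetical helper.

```python
def split_tasks(desired: int, strategy: list) -> dict:
    """Approximate task counts per capacity provider (planning aid only).

    Simplified model: each provider's `base` is filled first, then the
    remainder is split by `weight`; tasks left over from rounding go to
    the providers with the largest fractional share.
    """
    counts = {s["capacityProvider"]: 0 for s in strategy}
    remaining = desired
    for s in strategy:
        take = min(s.get("base", 0), remaining)
        counts[s["capacityProvider"]] += take
        remaining -= take
    total_weight = sum(s.get("weight", 0) for s in strategy)
    if remaining > 0 and total_weight > 0:
        shares = [(s["capacityProvider"], remaining * s.get("weight", 0) / total_weight)
                  for s in strategy]
        for name, share in shares:
            counts[name] += int(share)
        leftover = remaining - sum(int(share) for _, share in shares)
        by_fraction = sorted(shares, key=lambda x: x[1] - int(x[1]), reverse=True)
        for name, _ in by_fraction[:leftover]:
            counts[name] += 1
    return counts


STRATEGY = [
    {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
    {"capacityProvider": "FARGATE_SPOT", "weight": 3},
]
```

With `base=1` on `FARGATE` and weights of 1:3, nine desired tasks come out as 3 on-demand and 6 Spot. A Spot interruption wave on its own therefore does not take the service to zero.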
+
+---
+
+## 5. EC2 Spot: Trade-off Facts
+
+**Source**: Spot Instance interruptions page.
+
+### 5.1 Interruption Causes
+
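+- **Capacity**: EC2 needs the capacity back (the main cause), including for hardware maintenance and retirement.
+- **Price**: a maximum price is set and the Spot price rises above it. **Setting a maximum price increases interruption frequency**.
+- **Constraints**: constraints such as a launch group or Availability Zone group can no longer be met.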
+
+### 5.2 Interruption Behaviors
+
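+- `terminate` (default): the instance is terminated.
+- `stop`: only for persistent requests. EBS volumes are preserved, and only EC2 can start the instance again.
+- `hibernate`: hibernation begins immediately (no 2-minute warning). Requires a supported instance family and AMI.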
+
+### 5.3 Signals
+
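+- **2-minute warning**: EventBridge event + IMDSv2 metadata (`/meta-data/spot/instance-action`).
+- **Rebalance recommendation**: an early signal when interruption risk rises; it can be acted on before the 2-minute warning.
+- **Recommended polling**: check the metadata every 5 seconds.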
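
A minimal polling loop for the 2-minute warning follows, using only the standard library. It assumes IMDSv2 is reachable at the usual link-local address and, for simplicity, fetches a fresh token on every call. A 404 on `spot/instance-action` means no action is pending. `fetch` is injectable so the loop can be tested off-instance. Function names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"


def imds_get(path: str, timeout: float = 1.0) -> Optional[str]:
    """GET a metadata path via IMDSv2; None if it returns 404."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(f"{IMDS}/meta-data/{path}",
                                 headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def pending_spot_action(fetch: Callable[[str], Optional[str]] = imds_get) -> Optional[dict]:
    """Return the parsed instance-action document, or None if nothing is pending."""
    body = fetch("spot/instance-action")
    return json.loads(body) if body else None


def wait_for_interruption(on_action, fetch=imds_get, interval_s=5.0, sleep=time.sleep) -> dict:
    """Poll every `interval_s` seconds (5 s per §5.3) until an action appears."""
    while True:
        action = pending_spot_action(fetch)
        if action:
            on_action(action)  # e.g. checkpoint and drain (hypothetical callback)
            return action
        sleep(interval_s)
```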
+
+### 5.4 Operational Recommendations
+
+- ASG (Auto Scaling Group) with multiple instance types and multiple AZs.
+- Checkpoints persisted to S3/DynamoDB.
+- Validate ahead of time with AWS FIS.
+- Trace interruptions after the fact with `BidEvictedEvent` (CloudTrail).
+
+---
+
+## 6. AWS Batch with Spot: Trade-off Facts
+
+**Source**: AWS Batch with Spot page (including its AI-generated summary).
+
+### 6.1 Allocation Strategies
+
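+| Strategy | Behavior | Spot suitability |
+|------|------|-------------|
+| `BEST_FIT` | Picks the best-fitting, lowest-cost instance type | Low (high interruption rate) |
+| `BEST_FIT_PROGRESSIVE` | Moves up to larger instance types when needed | Medium |
+| `SPOT_CAPACITY_OPTIMIZED` | Minimizes the likelihood of interruption | **Standard choice** |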
+
+### 6.2 Recommended Configuration
+
+- `retryStrategy.attempts` = 2-3
+- `evaluateOnExit` to distinguish retry causes
+- A SIGTERM handler
+- Checkpointing and recovery
+- Spot compute environment first, with an On-Demand fallback in the job queue
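
Below is a sketch of the recommended `retryStrategy`, written in the style of the AWS Batch documentation examples for Spot. It retries when the status reason says the EC2 host went away and exits otherwise. `decide` is a rough local model of how `evaluateOnExit` rules are matched: first match wins, with glob-style `*`. It is meant for reasoning about rules, not as a reimplementation of Batch.

```python
import fnmatch
from typing import Optional

# Hypothetical job-definition fragment: retry reclaimed-host failures only.
RETRY_STRATEGY = {
    "attempts": 3,
    "evaluateOnExit": [
        {"onStatusReason": "Host EC2*", "action": "RETRY"},
        {"onReason": "*", "action": "EXIT"},
    ],
}


def decide(strategy: dict, status_reason: str = "", reason: str = "",
           exit_code: Optional[int] = None) -> str:
    """Return the action of the first rule whose conditions all match."""
    for rule in strategy["evaluateOnExit"]:
        checks = []
        if "onStatusReason" in rule:
            checks.append(fnmatch.fnmatchcase(status_reason, rule["onStatusReason"]))
        if "onReason" in rule:
            checks.append(fnmatch.fnmatchcase(reason, rule["onReason"]))
        if "onExitCode" in rule:
            checks.append(exit_code is not None
                          and fnmatch.fnmatchcase(str(exit_code), rule["onExitCode"]))
        if checks and all(checks):
            return rule["action"]
    return "RETRY"  # no rule matched: the job is retried up to `attempts`
```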
+
+### 6.3 Workload Fit
+
+- **Good fit for Spot**: batch, ML training, CI/CD (fault-tolerant, retryable).
+- **Poor fit for Spot**: production APIs, databases, jobs with strict SLAs.
+
+---
+
+## 7. Serverless Lens: First-Pass Excerpt
+
+**Source**: Serverless Applications Lens welcome page.
+
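+- Published: **2022-07-14**.
+- Scope: design, deployment, and architecture of serverless workloads.
+- Relationship to the core WA Framework: complementary; it covers only serverless-specific best practices.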
+
+### 7.1 Still to Collect (Stage 1)
+
+Excerpt the 9 design principles from this URL:
+- https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/design-principles.html
+
+Also snapshot the serverless-specific questions and best practices for each of the 5 pillars (Operational Excellence / Security / Reliability / Performance / Cost):
+- https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/the-pillars-of-the-well-architected-framework.html
+
+---
+
+## 8. Additional Collection Priorities (Stage 1)
+
+> ✅ **Resolved** (2026-04-18): every preview item in §8.1-8.5 has been replaced by a full snapshot in §13 (SnapStart), §14 (Aurora Serverless v2), §15 (DynamoDB), §16 (S3 Express One Zone), and §17 (Step Functions). §8.6 (EventBridge/SQS/Kinesis) is carried over to Stage C, where the AWS Docs will be re-collected while writing `tradeoffs-event-driven.md`.
+
+---
+
+## 9. Citation Format Rules (for the skill's references/)
+
+Record every fact in the following format:
+
+```markdown
+> **Fact**: The maximum Lambda function timeout is 900 seconds (15 minutes).
+> **Source**: [AWS Docs - Lambda quotas §Function configuration](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html) (Snapshot 2026-04-18)
+> **Implication for the skill**: work longer than 15 minutes is decomposed with Step Functions or delegated to Batch/Fargate.
+```
+
+---
+
+## 10. Case Study Cross-References
+
+### 10.1 serverless-autoresearch
+
+- Path: `/Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/`
+- Key materials:
+  - `docs/insights.md`: 15 insights (numbering frozen).
+  - `docs/comparison-report.md`: sequential vs parallel Spot.
+  - `docs/spot-capacity-guide.md`: region selection.
+  - `experiments/003-h100-comparison/results-summary.md`: H100 validation.
+- Constants used in this RESEARCH:
+  - Total cost of the 48 experiments: $3.94.
+  - Single H100 Spot run: 229 s, $0.16.
+  - Upstream comparison: $7-24 over 8 h.
+  - Karpathy reproduction: val_bpb 0.9951 vs the original ~0.998.
+
+### 10.1.1 Numbered Insights (stable references)
+
+Fixed-number table for citation from the skill's `references/case-study-autoresearch.md`. Titles are copied verbatim from the headers in `docs/insights.md`.
+
+| # | Title | One-line lesson | Tier usage |
+|---|-------|-----------------|------------|
+| 1 | Spot Capacity Varies Dramatically by Region | 동일 인스턴스 타입도 리전마다 placement score 1~9 편차 → `aws ec2 get-spot-placement-scores` 필수 | Tier 1 배치 |
+| 2 | Larger Instances Can Be Cheaper on Spot | g7e.8xlarge가 g7e.2xlarge보다 싼 경우 존재 — 사이즈=비용 가정 금지, Spot price history 직접 확인 | Tier 1 배치 |
+| 3 | DEVICE_BATCH_SIZE ≠ Token Throughput (hardware-dependent — see also #13) | TOTAL_BATCH_SIZE 고정 시 DEVICE_BATCH_SIZE만 올려도 토큰 처리량은 불변, 오히려 val_bpb 악화 (L40S/SDPA) | Tier 1 배치 |
+| 4 | Flash Attention 3 is GPU-Architecture Specific | FA3 커널은 Hopper/Ampere만 지원, Ada Lovelace(L40S)는 런타임 CUDA 오류 → 아키텍처별 fallback 필수 | Tier 1 배치 |
+| 5 | SageMaker Startup Overhead is Significant | 잡당 ~3분 시작 오버헤드 → 5분 훈련 잡의 60%가 오버헤드. 단일 잡에 실험 병합 또는 warm pool | Tier 1 배치 |
+| 6 | Quota Management is a First-Class Concern | GPU Spot 쿼터 기본값 0, g7e 자동승인 / p5·p6는 수동 검토. 마이그레이션 전 다중 리전 쿼터 사전 요청 | Tier 1 배치 |
+| 7 | SageMaker Profiler Doesn't Support All Instance Types | g7e는 `ValidationException: Profiler is currently not supported` → Estimator에 `disable_profiler=True` | Tier 1 배치 |
+| 8 | The Parallel Evolution Approach Works | 4 병렬 실험 $0.066, ~10분 wall clock — autonomous 파이프라인 검증됨 | Tier 1 배치 |
+| 9 | PyArrow Version Matters | DLC의 pyarrow 23.x와 로컬 이전 버전 불일치 시 parquet `Repetition level histogram size mismatch`. `pyarrow>=21.0.0` 필수 | Tier 1 배치 |
+| 10 | config.yaml Should Never Be in Git | 역할 ARN·프로필·리전 등 환경별·민감 정보 포함 → gitignore + `.example` 템플릿 | Tier 1 배치 (운영 원칙) |
+| 11 | Spot GPUs Are Valid Proxies for Large-Scale Training | L40S Spot HPO 결과가 H100 프로덕션에 전이 (랭킹·아키텍처 결정). 절대 BPB·최적 BS는 미전이 | Tier 1 배치 |
+| 12 | DEVICE_BATCH_SIZE ≠ More Training (L40S-specific; reversed on H100) | BS 64→128이 L40S에서는 악화, H100/FA3에서는 개선 — 하드웨어별 상반된 방향 | Tier 1 배치 |
+| 13 | Batch Size × LR × Hardware Interact — Evolved LRs Can Be BS-Specific | BS 고정한 LR 진화는 BS-조건부 최적일 뿐. 하드웨어·BS 변경 시 LR 재방문 필요. 단일 가정 점검이 20실험 탐색보다 **100× 비용효율** | Tier 1 배치 |
+| 14 | Cheap-GPU-Evolved LRs Transfer to Expensive GPUs — Sometimes Better Than Re-Evolving | L40S($0.40 24실험)에서 찾은 LR을 H100에 옮겨 upstream baseline 이하 달성. Phase-2 H100 재탐색이 오히려 악화 | Tier 1 배치 |
+| 15 | Serverless Spot Can Match or Beat Dedicated H100 Results at 44–150× Lower Cost | 229초 single Spot run ($0.16)으로 Karpathy upstream H100 8h ($7-24)와 동등하거나 더 낮은 val_bpb 달성 | Tier 1 배치 (핵심 서사) |
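+
+표 #15의 배수(44–150×)는 아래처럼 표의 수치만으로 재현할 수 있다. 입력값은 표에서 옮긴 것이고, 정수 반올림은 표기 방식에 대한 가정이다.
+
+```python
+# Insight #15의 비용 배수를 표의 수치로 다시 계산하는 스케치 (입력값은 표 #15에서 옮김)
+SPOT_RUN_COST_USD = 0.16               # 229초 single Spot run
+UPSTREAM_COST_RANGE_USD = (7.0, 24.0)  # Karpathy upstream H100 8h 비용 범위
+
+
+def cost_multipliers(spot_cost: float, upstream_range: tuple) -> tuple:
+    """upstream 비용 범위를 Spot 1회 비용으로 나눈 (하한, 상한) 배수를 반올림해 반환."""
+    low, high = upstream_range
+    return round(low / spot_cost), round(high / spot_cost)
+
+
+print(cost_multipliers(SPOT_RUN_COST_USD, UPSTREAM_COST_RANGE_USD))  # → (44, 150)
+```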
+
+**Source commit (autoresearch)**: `5435b374fb5daae5eee95e3e8eb9292caacf94f8`
+**Source path**: `docs/insights.md`
+**Extraction date**: 2026-04-18
+
+### 10.2 serverless-openclaw
+
+- 경로: `https://github.com/serithemage/serverless-openclaw`
+- 본 RESEARCH에서 사용된 상수:
+  - 월 목표 비용: **under $1-2/month** (Free Tier 시 $0.23).
+  - Lambda Container 콜드 스타트: **1.35s** (warm 0.12s).
+  - ECS Fargate Spot 컴퓨트 절감: **70%**.
+  - API Gateway 선택으로 ALB 고정비 **$18-25/월** 제거.
+  - EventBridge scheduled pre-warming으로 **0s first response**.
+  - Primary: Lambda Container, Fallback: ECS Fargate Spot.
+
+### 10.2.1 Numbered Principles (stable references)
+
+스킬 `references/case-study-openclaw.md`에서 인용할 고정 번호 테이블. autoresearch의 numeric insight (§10.1.1)와 구분하기 위해 **O1-O6** prefix 사용.
+
+| # | Principle from openclaw | One-line lesson | Tier usage |
+|---|------------------------|-----------------|------------|
+| O1 | Lambda Container + dual compute fallback | 기본 Lambda Container(zero-idle, 1.35s cold start), 15분 초과·고부하는 ECS Fargate Spot fallback(~70% 컴퓨트 절감) | Tier 2 API |
+| O2 | API Gateway over ALB | ALB 고정비 **$18-25/월** 제거 → per-request 청구 모델로 전환 | Tier 2 API |
+| O3 | EventBridge scheduled pre-warming | 액티브 시간대 cron으로 컨테이너 주기 호출, 월 ~$0.07 추가로 first-response 콜드스타트 페널티 제거 | Tier 2 API |
+| O4 | DynamoDB + S3 session persistence for stateless Lambda | DynamoDB에 대화/설정, S3에 세션 백업(동시성 제어) → Stateless Lambda에서 대화 지속성 확보 | Tier 2 API |
+| O5 | Free-tier first cost target | 개인 사용 **$1-2/월** (Free Tier 내 $0.23) 목표 — 전 구성요소의 zero-idle·per-request 원칙 적용 결과 | Tier 2 API |
+| O6 | CloudFront + S3 for web UI | 정적 호스팅으로 서버 비용 제거 — 웹 UI를 S3 버킷 + CloudFront 배포로 서빙하여 EC2/Lambda 런타임 없이 0 idle 비용 달성 | Tier 2 API |
+
+*주: TASKS.md 템플릿의 O5(CloudFront+S3)는 O6으로 이동, Free-tier 목표를 O5로 승격 — 비용 설계 원칙이 케이스 스터디 인용에서 더 빈번하기 때문.*
+
+**Source**: https://github.com/serithemage/serverless-openclaw (README, 2026-04-18 snapshot)
+
+---
+
+## 11. Open issues (after Stage A)
+
+~~1. Serverless Lens 9 설계원칙 텍스트 원문 수집~~ → §12에서 해결: 2026-04-18 기준 공식 페이지 수록 원칙은 7개 (9개 기대치와 불일치, §12 수집 노트 참조)
+~~2. Aurora Serverless v2 / DynamoDB / S3 Express / Step Functions / EventBridge 공식문서 스냅샷~~ → §14/§15/§16/§17에서 해결 (EventBridge는 Stage C 이월)
+~~3. Lambda SnapStart 한계·지원 런타임~~ → §13에서 해결
+4. AWS prescriptive guidance의 Strangler Fig / CDC migration URL 수집 (Stage 4).
+5. §8.6 EventBridge / SQS / Kinesis 상세 스냅샷 — Stage C `tradeoffs-event-driven.md` 작성 시 수집 예정.
+
+---
+
+## 12. Serverless Lens — Design Principles
+
+**출처**: https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/general-design-principles.html (Snapshot 2026-04-18)
+
+> **수집 노트**: 원래 TASKS에서 "9 design principles"를 기대했으나, 2026-04-18 기준 AWS Serverless Lens 공식문서의 `general-design-principles.html` 페이지는 **7개 원칙**만 공식 수록함. `design-principles.html` 엔드포인트는 빈 페이지로 리다이렉트됨. 아래 표는 현행 공식 문서 기준 7개를 그대로 인용함.
+
+### 12.1 Principles 표
+
+| # | Title (원문) | Summary (원문 1문장) | 본 스킬에서의 활용 |
+|---|-------------|---------------------|-------------------|
+| 1 | Speedy, simple, singular | Functions are concise, short, single-purpose, and their environment may live up to their request lifecycle. | Phase 2 워크로드 특성 평가 기준 (함수 단위 분해 가능성) |
+| 2 | Think concurrent requests, not total requests | Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency. | Phase 3 RPS → Lambda 동시성 쿼터 매핑 |
+| 3 | Share nothing | Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage is not guaranteed. | Phase 2 상태 저장성 평가 (S3/DynamoDB 위임 트리거) |
+| 4 | Assume no hardware affinity | Underlying infrastructure may change. Use code or dependencies that are hardware-agnostic. | Phase 4 타겟 런타임 선정 (GPU/특수 CPU 의존 워크로드는 비적합) |
+| 5 | Orchestrate your application with state machines, not functions | Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows. | Phase 4 Step Functions 도입 권고 근거 |
+| 6 | Use events to trigger transactions | Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. | Phase 4 EventBridge/SQS 기반 이벤트 드리븐 전환 근거 |
+| 7 | Design for failures and duplicates | Operations triggered from requests or events must be idempotent, as failures can occur and a given request or event can be delivered more than once. | Phase 4 멱등성 요구 (Spot 인터럽트 재시도와 결합) |
+
+### 12.2 함의
+
+- **원칙 1,2,3**: Tier 1 배치 / Tier 2 API 모두 Lambda 적합성 판단의 3대 필터.
+- **원칙 4**: GPU/특수 라이브러리 의존 워크로드 → Fargate·EC2 Spot·SageMaker로 분기.
+- **원칙 5,6**: Phase 4에서 Step Functions + EventBridge 조합을 "기본 권고"로 삼는 근거.
+- **원칙 7**: Tier 1 Spot 재시도 전략과 자연스럽게 연결. 멱등성은 Spot 이식성의 선결 조건.
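+
+원칙 7의 멱등성 요구를 최소한으로 보이면 아래와 같다. `IdempotentProcessor`와 in-memory 집합은 설명용 가상의 이름이며, 실제로는 DynamoDB 조건부 쓰기 같은 영속 저장소에 이벤트 ID를 기록한다고 가정한다. 처리와 기록 사이에 장애가 나면 재처리될 수 있으므로 부수효과 자체도 멱등하게 설계하는 편이 안전하다.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class IdempotentProcessor:
+    """이벤트 ID 기준으로 중복 전달을 걸러내는 가상의 처리기."""
+    processed_ids: set = field(default_factory=set)   # 실제로는 DynamoDB 등 영속 저장소라고 가정
+    side_effects: list = field(default_factory=list)  # 부수효과(예: 결제 요청) 기록용
+
+    def handle(self, event: dict) -> str:
+        event_id = event["id"]
+        if event_id in self.processed_ids:
+            # 재전달(at-least-once)된 이벤트는 부수효과 없이 종료
+            return "duplicate-skipped"
+        self.side_effects.append(event["payload"])
+        self.processed_ids.add(event_id)  # 처리 완료 후 ID 기록
+        return "processed"
+
+
+proc = IdempotentProcessor()
+print(proc.handle({"id": "evt-1", "payload": "charge"}))  # processed
+print(proc.handle({"id": "evt-1", "payload": "charge"}))  # duplicate-skipped
+```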
+
+---
+
+## 13. Lambda SnapStart — 트레이드오프 사실
+
+**출처**: https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html (Snapshot 2026-04-18)
+
+### 13.1 정량
+
+| 항목 | 값 |
+|------|-----|
+| 지원 런타임 | **Java 11+, Python 3.12+, .NET 8+** (Node.js, Ruby, OS-only, 컨테이너 이미지 미지원) |
+| 복원 지연 | "as low as sub-second" — 저지연 최적 시나리오에서 1초 미만 |
+| 적용 단위 | **게시된 버전(published version) 또는 버전을 가리키는 alias** (unqualified `$LATEST` 불가) |
+| 추가 비용 | **Java: 무료** (요청·실행시간·메모리만 청구). **Python/.NET: 캐시 + 복원 비용** 추가 (메모리 기반, 리전 단가) |
+| 최소 캐시 청구 | Python/.NET: **최소 3시간분** (함수가 active 상태 유지 시 지속 과금) |
+| 스냅샷 보존 (Java) | **14일 미호출 시 Inactive** → 다음 호출 시 재초기화 필요 (`SnapStartNotReadyException`) |
+| 미지원 조합 | Provisioned Concurrency, Amazon EFS, 512MB 초과 ephemeral storage |
+| 지원 리전 | 모든 상용 리전 (아시아 태평양(뉴질랜드), 아시아 태평양(타이베이) 제외) |
+
+### 13.2 제외 / 주의 사항
+
+- **VPC ENI 수명주기**: 스냅샷에 포함되지 않음. 복원 시 ENI 재연결 지연 가능.
+- **고유성(Uniqueness) 함정**: 스냅샷 시점의 난수·UUID·TLS 세션 키·시드 값이 모든 복원 인스턴스에 복제됨 → 보안 크리티컬. 초기화 단계가 아닌 **핸들러 내부에서 생성** 권장.
+- **런타임 훅**: `beforeCheckpoint` / `afterRestore` 훅으로 uniqueness 및 커넥션 재확립 처리 (Java CRaC API, Python/.NET 전용 훅).
+- **네트워크 커넥션**: DB/Redis/HTTP 커넥션 상태는 복원 후 **보장되지 않음**. AWS SDK 커넥션은 자동 재개.
+- **임시 데이터**: 캐시된 타임스탬프/임시 자격증명은 핸들러에서 새로 갱신.
+- **SDK 자격증명**: SnapStart 활성 시 Lambda는 access-key 환경변수 대신 `AWS_CONTAINER_CREDENTIALS_FULL_URI` 사용 (복원 전 만료 방지).
+- **Provisioned Concurrency 상호배타**: 엄격한 콜드스타트 SLA가 필요하면 PC, 그 외는 SnapStart.
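+
+Python 런타임에서 uniqueness 함정을 피하는 훅 구성을 스케치하면 아래와 같다. `snapshot_restore_py`의 `register_before_snapshot` / `register_after_restore` 데코레이터는 AWS 문서의 Python 런타임 훅 예시를 따른 것으로 가정했고, 로컬 실행을 위해 모듈이 없으면 no-op으로 대체한다. `_state`와 `session_token`은 가상의 예시 상태다.
+
+```python
+# SnapStart 훅 스케치 (Python 3.12+ 관리형 런타임 가정)
+import secrets
+
+try:
+    # Lambda Python 런타임이 제공한다고 가정한 훅 라이브러리
+    from snapshot_restore_py import register_after_restore, register_before_snapshot
+except ImportError:
+    # 로컬 실행용 no-op 대체: 데코레이트한 함수를 그대로 돌려준다
+    def register_before_snapshot(func):
+        return func
+
+    def register_after_restore(func):
+        return func
+
+_state = {"session_token": None}  # 가상의 예시 상태
+
+
+@register_before_snapshot
+def before_snapshot():
+    # 스냅샷에 고유값이 박제되지 않도록 비운다
+    _state["session_token"] = None
+
+
+@register_after_restore
+def after_restore():
+    # 복원된 실행 환경마다 새 엔트로피로 다시 생성한다
+    _state["session_token"] = secrets.token_hex(16)
+
+
+def handler(event, context):
+    # 훅이 호출되지 않은 환경(로컬 등)에서도 핸들러 안에서 생성하도록 보강
+    if _state["session_token"] is None:
+        _state["session_token"] = secrets.token_hex(16)
+    return {"token_length": len(_state["session_token"])}
+```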
+
+### 13.3 함의
+
+- **Tier 2 API (Java/Python/.NET)**: 콜드 스타트가 SLA에 영향을 주는 사용자 대면 API에 1순위 권고.
+- **네트워크 의존 초기화**: VPC 안에서 RDS/Redis 커넥션을 초기화하는 함수는 SnapStart 효과가 제한적 — ENI 연결이 병목.
+- **uniqueness 함정**: 금융/인증 계열에서는 스냅샷 복원 후 반드시 fresh entropy 재생성 훅 의무화.
+- **Java 이점**: 별도 비용 없이 활성화 가능 → Spring Boot Lambda 이식의 표준 권고.
+
+---
+
+## 14. Aurora Serverless v2 — 트레이드오프 사실
+
+**출처**: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html (Snapshot 2026-04-18)
+
+### 14.1 정량
+
+| 항목 | 값 |
+|------|-----|
+| ACU 범위 | 엔진/플랫폼 버전별 `0.5-128` → `0.5-256` → **`0-256`** (최신 Aurora MySQL 3.08.0+ / PostgreSQL 13.15+, 14.12+, 15.7+, 16.3+) |
+| ACU 정의 | 1 ACU ≈ **2 GiB RAM + 상응 CPU·네트워킹** |
+| ACU 증분 | **0.5 ACU** 단위, 초 단위 연속 측정 |
+| 스케일 반응성 | 수 초 이내, 온라인 스케일 (downtime 없음) |
+| Auto-pause | min=0 설정 시 idle 후 자동 pause → 새 커넥션 도착 시 자동 resume (재개에 수 초 소요, 스토리지 비용은 계속 청구) |
+| 콜드 스타트 | v1 대비 **제거** — 지속 실행 인스턴스. 단 auto-pause → resume 시 수 초 재개 지연 가능 |
+| 최소 용량 청구 | 각 writer/reader 인스턴스별 `min ACU × 가동시간` (인스턴스 2개(writer 1 + reader 1) × min=1 → 최소 2 ACU 상시 과금) |
+| Provisioned 호환 | 동일 클러스터 내 Provisioned + Serverless v2 **혼합** 가능 (리더·라이터 모두). 인스턴스 클래스 변경으로 상호 전환 |
+| Multi-AZ | 지원 (Provisioned 클러스터와 동일한 failover 메커니즘) |
+| Global Database | 지원 (v2 전용 리전 복제) |
+| RDS Proxy | 지원 (Lambda ↔ Aurora 연결 풀링 최적 조합) |
+| 미지원 | Database Activity Streams, Cluster Cache Management (Aurora PG), Aurora Auto Scaling (reader 인스턴스로 대체) |
+| Promotion Tier | 0-1: writer와 동일 용량 자동 추적 / 2-15: 독립 스케일 |
+
+### 14.2 v1 대비 차이
+
+- v1: 콜드 스타트 존재, 자동 pause/resume, ACU 2배수 스텝, 수 분 단위 스케일.
+- v2: 인스턴스 지속, 0.5 ACU 단위, 초 단위 스케일, Global Database 호환.
+
+### 14.3 함의
+
+- **Tier 3 RDS → Aurora 이행의 기본 경로**: DB 엔진 변경 없이 서버리스 과금 모델 도입.
+- **바닥 비용 함정**: min ACU > 0 이면 idle 시에도 과금. 완전 zero-idle 원할 경우 min=0 + auto-pause 활용 (단, resume 지연 감수).
+- **burst 트래픽 대응**: 0.5 ACU 단위 스케일로 Lambda 동시성 급증에 연동 가능.
+- **혼합 운영**: 레거시 Provisioned 리더 유지 + Serverless v2 라이터 도입 점진 전환 가능.
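+
+바닥 비용 함정을 숫자로 확인하는 계산 스케치. ACU-시간 단가는 리전별로 다르므로 인자로 받으며, 예시의 `0.12`는 가상의 값이다. 월 730시간 가정과 스토리지·I/O 비용 제외도 단순화를 위한 가정이다.
+
+```python
+HOURS_PER_MONTH = 730  # 월 평균 시간 근사 (가정)
+
+
+def monthly_floor_cost(instance_count: int, min_acu: float, price_per_acu_hour: float,
+                       hours: float = HOURS_PER_MONTH) -> float:
+    """모든 writer/reader 인스턴스가 min ACU로 idle 상태일 때의 월 최소 컴퓨트 비용.
+
+    min_acu=0 + auto-pause로 계속 pause 상태라면 컴퓨트 비용은 0 (스토리지·I/O는 별도 과금).
+    """
+    if instance_count < 1 or min_acu < 0:
+        raise ValueError("instance_count >= 1 and min_acu >= 0 required")
+    return instance_count * min_acu * price_per_acu_hour * hours
+
+
+# 인스턴스 2개(writer 1 + reader 1) × min=1 ACU, 가상의 단가 $0.12/ACU-시간
+print(f"{monthly_floor_cost(2, 1.0, 0.12):.2f}")  # → 175.20
+```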
+
+---
+
+## 15. DynamoDB Capacity Modes — 트레이드오프 사실
+
+**출처**: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html (Snapshot 2026-04-18)
+
+### 15.1 On-Demand vs Provisioned
+
+| 항목 | On-Demand | Provisioned |
+|------|-----------|-------------|
+| 청구 단위 | 요청당 (Read Request Unit / Write Request Unit) | 시간당 용량 예약 (RCU/WCU × hour) |
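+| 용량 단위 정의 | 1 RRU = 최대 4KB strongly consistent read 1회 또는 eventually consistent read 2회 · 1 WRU = 최대 1KB write 1회 | 1 RCU = 초당 최대 4KB strongly consistent read 1회 또는 eventually consistent read 2회 · 1 WCU = 초당 최대 1KB write 1회 |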
+| 스케일링 | 자동, 신규 테이블도 즉시 4,000 writes/sec + 12,000 reads/sec 지원 | Auto Scaling으로 min/max 범위 내 자동 조정 (반응 수 분) |
+| 피크 대응 | 이전 피크의 **2배**까지 즉시 허용. 30분 이내 2배 초과 시 throttle 가능 (pre-warming으로 사전 증가 가능) | max 이상 throttled, burst capacity (5분) 완충 |
+| 바닥 비용 | 0 (호출 없으면 $0) | min RCU/WCU × 가동시간 |
+| Reserved Capacity | 불가 | 1년 **최대 54%** / 3년 **최대 77%** 할인 (100 RCU/WCU 단위) |
+| 모드 전환 | On-Demand → Provisioned: 언제든 | Provisioned → On-Demand: 24시간당 최대 4회 |
+| 기본 쿼터 | 테이블당 max 40,000 RRU + 40,000 WRU | 테이블당 40,000 RCU / 40,000 WCU, 계정(리전)당 합산 80,000 RCU / 80,000 WCU |
+| 선택적 max 설정 | **per-table max** 설정 가능 (비용 폭주 방지) | 해당 없음 (프로비저닝 용량이 곧 상한) |
+| 적합 워크로드 | 예측 불가·스파이키·서버리스·신규 앱 (**기본 권고**) | 안정·예측 가능·지속 고부하 (Reserved 활용 시 비용 절감) |
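+
+On-Demand 요청 단위 환산을 스케치하면 아래와 같다. 항목 크기를 4KB(읽기) / 1KB(쓰기) 경계로 올림하는 규칙을 따르며, 트랜잭션 요청(단위 2배)은 다루지 않는다는 가정이다. 함수 이름은 설명용 가상의 이름이다.
+
+```python
+import math
+
+
+def read_request_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
+    """비트랜잭션 읽기 1회의 RRU: 4KB 단위 올림, eventually consistent는 절반."""
+    blocks = max(1, math.ceil(item_size_bytes / 4096))
+    return float(blocks) if strongly_consistent else blocks / 2
+
+
+def write_request_units(item_size_bytes: int) -> int:
+    """비트랜잭션 쓰기 1회의 WRU: 1KB 단위 올림."""
+    return max(1, math.ceil(item_size_bytes / 1024))
+
+
+# 6KB 항목 읽기(강한/최종 일관성)와 2.5KB 항목 쓰기
+print(read_request_units(6 * 1024), read_request_units(6 * 1024, strongly_consistent=False),
+      write_request_units(2560))  # → 2.0 1.0 3
+```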
+
+### 15.2 함의
+
+- **Tier 2 API 기본 권고**: On-Demand — Lambda 동시성 확장과 자연 매칭, idle 시 $0.
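+- **예측 가능 고부하**: Provisioned(+ Reserved Capacity: 1년 최대 54% / 3년 최대 77% 할인)로 지속 부하 구간의 단가를 On-Demand보다 낮출 수 있음. 절감폭은 사용률·리전 단가에 따라 달라지므로 실측 비교 필요.
+- **GSI 용량**: GSI는 베이스 테이블의 capacity mode를 상속 (GSI별 모드 혼합 불가). Provisioned 모드에서는 GSI마다 RCU/WCU를 별도로 지정해 조회 빈도 차이를 반영.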
+- **전환 정책**: 24시간 전환 제한은 마이그레이션 테스트 기간 동안 고려해야 함.
+
+---
+
+## 16. S3 Express One Zone — 트레이드오프 사실
+
+**출처**: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html (Snapshot 2026-04-18)
+**보조 출처**: https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html
+
+### 16.1 특성
+
+| 항목 | 값 |
+|------|-----|
+| AZ 스코프 | **단일 AZ** (설계 가용성 99.95%, 가용성 SLA 99.9%). AZ 내 다중 디바이스에 중복 저장하지만 AZ 간 복제 없음 |
+| 지연 시간 | 한 자리 ms (S3 Standard 대비 ~10x 빠름) |
+| 처리량 | 버킷당 **읽기 200,000 TPS / 쓰기 100,000 TPS** (쿼터 증가 가능) |
+| 버킷 타입 | **Directory bucket** (S3 general purpose bucket과 별도 스키마) |
+| 버킷 네이밍 | `{base-name}--{zone-id}--x-s3` (예: `my-bucket--usw2-az1--x-s3`), 3-63자 |
+| 스토리지 구조 | **계층형 디렉토리** (slash delimiter로 폴더 자동 생성), general purpose bucket의 flat prefix 구조와 다름 |
+| 요청 가격 | Standard 대비 **약 50% 저렴** (per-request) |
+| 스토리지 가격 | Standard 대비 GB당 단가가 **수 배 높음** (리전별 차이). **비용 이점의 주 원천은 요청 단가** |
+| 일관성 | Strong read-after-write (기본) |
+| 암호화 | SSE-S3(자동) 또는 SSE-KMS. **SSE-C 미지원** |
+| ACL | 항상 bucket-owner-enforced (ACL 비활성) |
+| Block Public Access | **항상 On** (해제 불가) |
+| 데이터 전송 | 동일 AZ 내 EC2/ECS/Lambda에서 호출 시 DTO 비용 없음 |
+| 쿼터 | 계정당 리전별 디렉토리 버킷 **100개** (증가 가능) |
+| 기능 제한 | 버저닝 없음, Lifecycle 일부, CRR/SRR 없음, Intelligent-Tiering·Glacier 불가 |
+
+### 16.2 권장 워크로드
+
+- **적합**: 비디오 편집/크리에이티브, ML 훈련 데이터 random access, 실시간 분석, 인터랙티브 앱, shuffle/spill, 분석 중간 결과.
+- **부적합**: 장기 아카이브, 다중 리전 재해복구, 규제 요구 다내구성 저장, 외부 공개 CDN 오리진, ACL 필요 워크로드.
+
+### 16.3 함의
+
+- **Tier 1 배치 워크로드 shuffle storage**: autoresearch nanoGPT 훈련 데이터 로딩 가속 후보. 단, EC2/Fargate Spot과 **동일 AZ** 배치 필수.
+- **단일 AZ 제약**: AZ 장애 시 데이터 유실 가능 → 원본은 Standard/Glacier에 별도 보관, 중간 결과만 Express.
+- **비용 역전 함정**: 저빈도 접근 시 Standard 대비 **총비용 증가** (요청 단가 절감은 요청 빈도에 비례).
+- **네이밍 규칙**: `--{zone-id}--x-s3` 패턴 필수. IaC 템플릿/버킷명 생성 로직에 강제.
+- **CreateSession auth 모델**: 객체 오퍼레이션 전 `s3express:CreateSession` 필요 → SDK 버전·IAM 정책 업그레이드 필요.
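+
+네이밍 규칙을 IaC 쪽에서 강제하는 검증 로직을 스케치하면 아래와 같다. 정규식은 `{base-name}--{zone-id}--x-s3` 패턴을 단순화한 가정이며 Local Zone용 zone ID 형식은 다루지 않는다. 동일 AZ 여부는 계정마다 매핑이 다른 AZ 이름(`us-west-2a`)이 아니라 AZ ID(`usw2-az1`)로 비교해야 한다.
+
+```python
+import re
+
+# 가정: base-name은 소문자·숫자·하이픈(양끝은 영숫자), zone-id는 "usw2-az1" 형태의 AZ ID
+_DIRECTORY_BUCKET = re.compile(
+    r"^(?P<base>[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)--(?P<zone>[a-z0-9]+-az[0-9]+)--x-s3$"
+)
+
+
+def parse_directory_bucket(name: str) -> tuple:
+    """디렉토리 버킷 이름을 (base-name, zone-id)로 분해. 규칙 위반 시 ValueError."""
+    if not 3 <= len(name) <= 63:
+        raise ValueError(f"bucket name length must be 3-63: {name!r}")
+    match = _DIRECTORY_BUCKET.match(name)
+    if match is None:
+        raise ValueError(f"not a directory bucket name: {name!r}")
+    return match.group("base"), match.group("zone")
+
+
+def colocated(bucket_name: str, compute_az_id: str) -> bool:
+    """컴퓨트가 버킷과 같은 AZ ID에 있는지 확인 (AZ 이름이 아닌 AZ ID로 비교)."""
+    return parse_directory_bucket(bucket_name)[1] == compute_az_id
+
+
+print(parse_directory_bucket("my-bucket--usw2-az1--x-s3"))  # ('my-bucket', 'usw2-az1')
+print(colocated("my-bucket--usw2-az1--x-s3", "usw2-az2"))   # False
+```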
+
+---
+
+## 17. Step Functions — Standard vs Express
+
+**출처**: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html (Snapshot 2026-04-18)
+
+### 17.1 비교
+
+| 항목 | Standard | Async Express | Sync Express |
+|------|----------|---------------|--------------|
+| 최대 실행 시간 | **1년** | 5분 | 5분 (콘솔 `StartSyncExecution`은 60s 만료, SDK/CLI는 5분까지) |
+| 실행 시맨틱 | **Exactly-once** (내부 상태 영속) | **At-least-once** (상태 비영속, 중복 가능) | **At-most-once** (상태 비영속, 재시도 없음) |
+| 실행 이력 | API로 조회, 콘솔 시각적 디버깅, **90일 보관** (30일로 단축 가능) | Step Functions 미포착 → CloudWatch Logs 활성화 필수 | CloudWatch Logs 활성화 필수 |
+| 처리량 | state transition rate 제한 (account quota) | 초당 수만~수십만 실행 | account 용량 제한과 **분리** (자동 스케일) |
+| 청구 모델 | **per state transition** | per execution × duration × memory | per execution × duration × memory |
+| 지원 통합 | 모든 서비스 통합 + `.sync`, `.waitForTaskToken` | `.sync`, `.waitForTaskToken` 미지원 | `.sync`, `.waitForTaskToken` 미지원 |
+| Distributed Map / Activities | 지원 | 미지원 | 미지원 |
+| Idempotency | 동일 이름 재실행 시 자동 idempotent 응답 | 자동 관리 **없음** — 동명 동시 실행 가능 | 자동 관리 없음, 예외 시 재실행 없음 |
+
+### 17.2 함의
+
+- **Tier 1 배치 분해**: 15분 초과 배치는 Standard 워크플로로 분해. Spot 재시도 로직을 상태기계에 명시하고 exactly-once 보장 활용.
+- **Tier 2 API 이벤트 후처리**: Async Express — 짧은 fan-out·스트리밍 이벤트, API 응답 후처리.
+- **Tier 2 API 동기 마이크로서비스**: Sync Express — API Gateway 뒤 실시간 워크플로, at-most-once 수용 가능할 때.
+- **at-least-once 함정**: Async Express는 중복 실행 가능 → 멱등성 설계(§12 원칙 7) 전제 필수. 비멱등 작업(예: 결제)은 Standard 선택.
+- **5분 한계**: Express로는 장시간 워크플로 불가. Standard로 분할하거나 `StartExecution`으로 체인.
+- **실행 이력**: 감사·디버깅 필요 시 Standard (90일 retention, 30일로 축소 요청 가능). 비용 우선이면 Express + 명시적 CloudWatch Logs.
+- **Workflow type immutable**: state machine 생성 후 Standard↔Express 변경 불가 → 설계 초기 결정.
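+
+Tier 1 배치를 Standard 워크플로로 감싸 재시도를 상태기계에 명시하는 형태를 스케치하면 아래와 같다. `batch:submitJob.sync` 통합은 Standard에서만 쓸 수 있다는 §17.1 표의 제약과 맞물린다. 큐·잡 정의 이름과 재시도 수치는 가상의 예시이며, `States.TaskFailed` 재시도는 Spot 회수 외의 잡 실패도 다시 실행한다는 점에 유의한다.
+
+```python
+import json
+
+
+def spot_batch_state_machine(job_queue: str, job_definition: str) -> dict:
+    """Spot 배치 잡을 .sync로 실행하고 실패 시 재시도하는 Standard 워크플로 정의(ASL)."""
+    return {
+        "Comment": "Spot batch job with retries (sketch)",
+        "StartAt": "RunBatchJob",
+        "States": {
+            "RunBatchJob": {
+                "Type": "Task",
+                # .sync: 잡 완료까지 대기 (Standard 전용 통합 패턴)
+                "Resource": "arn:aws:states:::batch:submitJob.sync",
+                "Parameters": {
+                    "JobName": "spot-training",
+                    "JobQueue": job_queue,
+                    "JobDefinition": job_definition,
+                },
+                # 잡 실패를 지수 백오프로 최대 3회 재시도
+                "Retry": [
+                    {
+                        "ErrorEquals": ["States.TaskFailed"],
+                        "IntervalSeconds": 60,
+                        "MaxAttempts": 3,
+                        "BackoffRate": 2.0,
+                    }
+                ],
+                # 재시도 소진 후에는 실패 상태로 이동
+                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "JobFailed"}],
+                "End": True,
+            },
+            "JobFailed": {"Type": "Fail", "Error": "BatchJobFailed", "Cause": "Retries exhausted"},
+        },
+    }
+
+
+definition = spot_batch_state_machine("spot-queue", "train-job:1")
+print(json.dumps(definition, indent=2))
+```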
+
+---
+
+*본 리서치 작성: 2026-04-18. SPEC/PLAN과 쌍을 이루며 implementation Stage 1에서 보강.*
diff --git a/issues/3-serverless-migration/SPEC.md b/issues/3-serverless-migration/SPEC.md
new file mode 100644
index 0000000..b8a8fae
--- /dev/null
+++ b/issues/3-serverless-migration/SPEC.md
@@ -0,0 +1,424 @@
+# Issue #3: Serverless Migration Advisor 스킬 스펙 문서
+
+> **스코프**: 기존 always-on 아키텍처를 AWS 서버리스 + Spot 패턴으로 이행하기 위한 **업스트림 어드바이저** 스킬.
+> 세부 how-to는 카테고리 내 타 스킬(예: `sagemaker-spot-training`)로 위임한다.
+> **검증 근거**: `serverless-autoresearch`(Tier 1, $3.94/48실험), `serverless-openclaw`(Tier 2, ~$1/월).
+
+## 1. 개요
+
+### 1.1 목적
+
+기존 AWS 워크로드(배치/ETL/API/이벤트 기반/모놀리스)를 서버리스 + Spot 조합으로 이행할 때,
+트레이드오프를 **AWS 공식문서 기반 근거**와 **실 검증 사례**로 명시해
+사용자가 의사결정·리스크 평가·단계별 이행 계획을 세우도록 돕는다.
+
+### 1.2 핵심 가치
+
+- **공식문서 근거 인용**: 모든 트레이드오프 주장은 AWS Docs 섹션 또는 Serverless Lens 원칙으로 traceable.
+- **실 검증 사례 연결**: 두 오픈소스 프로젝트 인사이트를 번호화하여 인용 가능.
+- **심층 인터뷰 기반**: AskUserQuestion으로 워크로드 제약·위험 허용도를 단계적으로 수집.
+- **위임 명확화**: 서비스 확정 후 구현 스킬로 핸드오프. 중복 지식 방지.
+- **전 티어 지원**: Tier 1(배치) / Tier 2(API) / Tier 3(모놀리스·데이터) 모두 원칙+패턴 수준으로 다룸.
+
+---
+
+## 2. 아키텍처
+
+### 2.1 스킬 배치
+
+```text
+plugins/workflow/skills/serverless-migration-advisor/
+├── SKILL.md                                # 메인 지침 (500줄 이하)
+└── references/
+    ├── tradeoffs-compute.md                # Lambda/Fargate/Batch/SageMaker/EC2 Spot 공식 트레이드오프
+    ├── tradeoffs-spot.md                   # Spot 용량/인터럽트/HUGI/billable 정의
+    ├── tradeoffs-data-layer.md             # RDS / Aurora Serverless v2 / DynamoDB / S3 Express
+    ├── tradeoffs-event-driven.md           # EventBridge / SQS / Kinesis / Step Functions
+    ├── serverless-lens.md                  # AWS Well-Architected Serverless Lens 원칙 요약 + 링크
+    ├── patterns-tier1-batch.md             # 배치/훈련/ETL 이행 패턴
+    ├── patterns-tier2-api.md               # 상시형 API / 웹 이행 패턴
+    ├── patterns-tier3-monolith.md          # Strangler Fig, 모놀리스 분해
+    ├── patterns-tier3-data.md              # RDS → DynamoDB, CDC 전이
+    ├── interview-bank.md                   # Phase별 AskUserQuestion 질문/옵션 풀
+    ├── case-study-autoresearch.md          # 48실험 $3.94, H100 229s $0.16
+    ├── case-study-openclaw.md              # Lambda + Fargate Spot ~$1/월
+    └── source-insights.md                  # 번호화 검증 인사이트 (Insight #N 형식 인용)
+```
+
+### 2.2 카테고리 위치
+
+`plugins/workflow/` (이행 프로세스 지향). `development/` 카테고리의 `sagemaker-spot-training`(how-to)과 역할 분리.
+
+### 2.3 메인 스킬 역할
+
+1. **워크로드 분류기**: 인터뷰로 Tier 1/2/3 및 세부 타입 결정.
+2. **트레이드오프 서피스**: AWS Docs + 검증 사례 기반으로 각 후보 서비스의 한계 명시.
+3. **의사결정 보조**: 비용/지연/관리성/벤더 락인 축으로 후보 비교.
+4. **단계별 이행 계획 생성**: 검증-가능한-스테이지 체크리스트.
+5. **구현 스킬 위임**: 타겟 확정 후 구현 how-to 스킬로 핸드오프.
+
+---
+
+## 3. 동작 흐름 (5 Phase)
+
+### Phase 1 — 워크로드 분류 인터뷰
+
+AskUserQuestion 질문:
+- Q1. 워크로드 타입: `배치/훈련` / `ETL` / `상시 API` / `스케줄 작업` / `이벤트 기반` / `모놀리스` / `기타`
+- Q2. 현재 실행 환경: `EC2 24/7` / `EC2 + Auto Scaling` / `ECS/EKS` / `EMR` / `온프레` / `기타`
+- Q3. 실행 빈도: `상시` / `일 수회` / `일 1회` / `주·월 단위` / `이벤트 기반 희소`
+- Q4. 작업 단위 지속 시간: `<1분` / `1~15분` / `15분~1시간` / `1~8시간` / `>8시간`
+
+### Phase 2 — IaC 스캔 (선택)
+
+사용자가 Terraform / CDK / CloudFormation 파일 경로를 제공하면 정적 분석하여 현 리소스 요약.
+제공되지 않으면 스킵하고 Phase 3으로.
+
+### Phase 3 — 제약·리스크 심층 인터뷰
+
+AskUserQuestion 질문 (워크로드 타입별로 질문 집합 분기 — [interview-bank.md](references/interview-bank.md)):
+
+**공통:**
+- Q5. 월간 목표 비용 범위
+- Q6. RTO / RPO 요구
+- Q7. 규정 준수: `PCI-DSS` / `HIPAA` / `GDPR` / `SOC2` / `없음`
+
+**Tier 1 (배치) 특화:**
+- Spot 인터럽트 허용도: `허용 (재시도 가능)` / `제한적 (체크포인트 있음)` / `불가`
+- 작업 최대 허용 wall-clock
+
+**Tier 2 (API) 특화:**
+- 콜드 스타트 허용도 p99: `<500ms` / `<2s` / `<5s` / `허용`
+- 지속 연결 필요 여부: WebSocket / SSE / gRPC streaming
+
+**Tier 3 (모놀리스/데이터) 특화:**
+- 다운타임 윈도우
+- 데이터 일관성 요구: `strong` / `eventual 허용`
+- 벤더 락인 허용도
+
+### Phase 4 — 타겟 아키텍처 추천
+
+references/ 참조하여 후보 서비스 매핑:
+
+| 워크로드 | Primary | Secondary | 제외 사유 예 |
+|---------|---------|-----------|-------------|
+| 배치 훈련 | SageMaker Managed Spot | AWS Batch (Spot) | Lambda 15분 한계 |
+| ETL (<15분) | Lambda | Step Functions + Lambda | Batch는 오버헤드 과다 |
+| ETL (>15분) | AWS Batch (Spot) | Step Functions + Fargate | Lambda 불가 |
+| 상시 API (버스트) | Lambda + API Gateway | Fargate (On-Demand) | ALB 고정비 $18~25/월 |
+| 상시 API (대용량 지속) | Fargate | Fargate + Fargate Spot 혼합 | Lambda 6MB 페이로드 한계 |
+| 이벤트 처리 | EventBridge + Lambda | Step Functions | - |
+| 모놀리스 | Strangler Fig 분해 후 적용 | - | 직접 이행 불가 |
+
+각 매핑에는 **공식문서 인용**과 **Insight #N** 레이블 동반.
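+
+위 매핑 표의 규칙만 코드로 옮기면 아래와 같다. `Workload`, `recommend`, `kind` 문자열 값은 설명용 가상의 이름이며, 실제 스킬은 Phase 3 제약(콜드 스타트, 규정 준수 등)까지 함께 반영한다고 가정한다. 15분 경계는 표의 `<15분`을 따라 900초 미만만 Lambda로 본다.
+
+```python
+from dataclasses import dataclass
+
+LAMBDA_MAX_SECONDS = 900  # Lambda 최대 실행 시간 15분 (§4.1)
+
+
+@dataclass
+class Workload:
+    kind: str              # 가상의 분류값: batch-training / etl / api-burst / api-sustained / event / monolith
+    duration_seconds: int  # 작업 단위 지속 시간 (Phase 1 Q4)
+
+
+def recommend(w: Workload) -> tuple:
+    """Phase 4 매핑 표의 (Primary, Secondary)를 그대로 반환하는 단순화 스케치."""
+    if w.kind == "batch-training":
+        return ("SageMaker Managed Spot", "AWS Batch (Spot)")
+    if w.kind == "etl":
+        # 표의 "<15분" 경계를 따라 900초 미만만 Lambda로 본다
+        if w.duration_seconds < LAMBDA_MAX_SECONDS:
+            return ("Lambda", "Step Functions + Lambda")
+        return ("AWS Batch (Spot)", "Step Functions + Fargate")
+    table = {
+        "api-burst": ("Lambda + API Gateway", "Fargate (On-Demand)"),
+        "api-sustained": ("Fargate", "Fargate + Fargate Spot 혼합"),
+        "event": ("EventBridge + Lambda", "Step Functions"),
+        "monolith": ("Strangler Fig 분해 후 적용", "-"),
+    }
+    if w.kind not in table:
+        raise ValueError(f"unknown workload kind: {w.kind}")
+    return table[w.kind]
+
+
+print(recommend(Workload("etl", 45 * 60)))  # ('AWS Batch (Spot)', 'Step Functions + Fargate')
+```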
+
+### Phase 5 — 리포트 생성
+
+`docs/serverless-migration/YYYY-MM-DD-{topic}.md` 생성.
+아래 §5 리포트 구조 참조.
+
+---
+
+## 4. AWS 공식문서 트레이드오프 스냅샷 (2026-04-18 기준)
+
+스킬은 아래 사실들을 `references/tradeoffs-*.md`에 보존하고, 변경 시 수동 리뷰.
+
+### 4.1 Lambda (트레이드오프)
+
+| 항목 | 값 | 출처 |
+|------|-----|------|
+| 최대 메모리 | 10,240 MB | [Lambda quotas §Function config] |
+| 최대 실행 시간 | 900초 (15분) | 동일 |
+| 동시 실행 기본 쿼터 | 1,000 | 동일 |
+| zip 패키지 | 50MB(zipped) / 250MB(unzipped) | 동일 |
+| 컨테이너 이미지 | 최대 10 GB | 동일 |
+| /tmp | 512MB~10,240MB | 동일 |
+| 동기 페이로드 | 요청·응답 각 6MB | 동일 |
+| 스트리밍 응답 | 최대 200MB | 동일 |
+| 1 vCPU 등가 | 1,769 MB | 동일 |
+
+**함의 (스킬이 전달):**
+- 15분 초과 작업은 Lambda 불가 → Step Functions 또는 Batch로 분해·위임.
+- 지속 연결(WebSocket long-lived) 필요 시 Fargate 권장. Lambda는 API Gateway WebSocket으로 메시지 단위만.
+- CPU는 설정 메모리에 비례해 할당되므로, CPU 바운드 지연을 줄이려면 메모리를 올려야 하고 이는 ms당 단가 상승으로 이어지는 trade-off. 실행 시간이 그만큼 줄면 총비용은 비슷하게 유지될 수 있음.
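+
+메모리↔CPU 연동 trade-off를 계산으로 보이면 아래와 같다. 1,769MB = 1 vCPU 비례는 표의 값이고, GB-초 단가 `HYPOTHETICAL_PRICE_PER_GB_SECOND`는 가상의 값이다(요청 요금 제외). CPU 바운드 작업이 메모리 2배에서 실행 시간이 절반이 된다고 가정하면 실행시간 요금은 같다.
+
+```python
+LAMBDA_MB_PER_VCPU = 1_769                       # 1 vCPU 등가 메모리 (§4.1 표)
+LAMBDA_MIN_MB, LAMBDA_MAX_MB = 128, 10_240       # 설정 가능한 메모리 범위
+HYPOTHETICAL_PRICE_PER_GB_SECOND = 0.0000166667  # 가상의 예시 단가 (리전·아키텍처별 상이)
+
+
+def approx_vcpus(memory_mb: int) -> float:
+    """메모리에 비례하는 vCPU 몫의 선형 근사."""
+    if not LAMBDA_MIN_MB <= memory_mb <= LAMBDA_MAX_MB:
+        raise ValueError("memory_mb must be between 128 and 10240")
+    return memory_mb / LAMBDA_MB_PER_VCPU
+
+
+def compute_cost(memory_mb: int, duration_ms: float,
+                 price_per_gb_second: float = HYPOTHETICAL_PRICE_PER_GB_SECOND) -> float:
+    """호출 1회의 실행시간 요금(GB-초 × 단가). 요청 요금은 제외."""
+    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
+    return gb_seconds * price_per_gb_second
+
+
+# CPU 바운드 가정: 메모리 2배 → vCPU 2배 → 실행 시간 절반
+small = compute_cost(1_769, 2_000)
+large = compute_cost(3_538, 1_000)
+print(approx_vcpus(1_769), approx_vcpus(3_538), small == large)  # → 1.0 2.0 True
+```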
+
+### 4.2 SageMaker Managed Spot Training
+
+| 항목 | 값 |
+|------|-----|
+| 비용 절감 | on-demand 대비 최대 90% |
+| 절감률 공식 | `(1 - BillableTime / TrainingTime) × 100` |
+| 필수 조건 | `MaxWaitTime > MaxRuntime` |
+| 체크포인트 미사용 시 | 내장·마켓플레이스 알고리즘 `MaxWaitTime ≤ 3600s` |
+| 상태 전이 | `Starting → Downloading → Training → (Interrupted → Starting) → Uploading` |
+
+**함의:**
+- 인터럽트 허용 워크로드에 한함. 체크포인트 없으면 1시간 한도.
+- Billable은 **Training 시간만** 포함 (Starting/Downloading 제외). HUGI 원칙의 AWS 공식 구현.
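+
+절감률 공식과 §4.2 표의 필수 조건을 점검 함수로 옮기면 아래와 같다. 입력에는 `DescribeTrainingJob` 응답의 `TrainingTimeInSeconds` / `BillableTimeInSeconds` 값을 넣는 것을 가정했고, 함수 이름과 예시 수치는 가상의 것이다.
+
+```python
+def managed_spot_savings_pct(training_time_s: float, billable_time_s: float) -> float:
+    """SageMaker 문서의 절감률 공식: (1 - BillableTime / TrainingTime) × 100."""
+    if training_time_s <= 0:
+        raise ValueError("training_time_s must be positive")
+    return (1 - billable_time_s / training_time_s) * 100
+
+
+def check_spot_stopping_condition(max_run_s: int, max_wait_s: int,
+                                  has_checkpoint: bool, builtin_or_marketplace: bool) -> list:
+    """§4.2 표의 조건을 그대로 옮긴 사전 점검. 위반 사유 목록을 반환(빈 목록이면 통과)."""
+    problems = []
+    if max_wait_s <= max_run_s:
+        problems.append("MaxWaitTime must exceed MaxRuntime")
+    if builtin_or_marketplace and not has_checkpoint and max_wait_s > 3600:
+        problems.append("built-in/marketplace algorithm without checkpoints: MaxWaitTime <= 3600s")
+    return problems
+
+
+# 가상의 예시: 학습 3,600초 중 과금 1,200초
+print(round(managed_spot_savings_pct(3_600, 1_200), 1))
+print(check_spot_stopping_condition(3_600, 7_200, has_checkpoint=False, builtin_or_marketplace=True))
+```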
+
+### 4.3 Fargate Spot
+
+| 항목 | 값 |
+|------|-----|
+| 인터럽트 경고 | 2분 |
+| 신호 | SIGTERM (EventBridge + 컨테이너) |
+| `stopTimeout` | 기본 30초, 최대 120초 |
+| 인터럽트 후 복구 | 서비스: 용량이 확보되면 스케줄러가 대체 태스크 기동 / 단독 태스크(RunTask): 자동 재기동 없음 |
+| On-Demand fallback | **자동 아님** (사용자가 `capacityProviderStrategy`로 혼합 필요) |
+
+**함의:**
+- 단일 태스크 + Fargate Spot = 가용성 위험. 서비스 + `desiredCount ≥ 2` 또는 용량공급자 혼합 필수.
+- 2분 내 정리가 가능해야 하므로 SIGTERM 핸들러 의무.
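+
+SIGTERM 핸들러의 최소 형태를 스케치하면 아래와 같다. 2분 경고 시점에 SIGTERM을 받고 `stopTimeout` 경과 후 강제 종료된다는 §4.3 표의 동작을 전제로 하며, `work_loop`와 `checkpoint` 콜백은 가상의 이름이다. 마지막 부분은 `signal.raise_signal`로 신호를 흉내 내는 로컬 시뮬레이션이다(POSIX 전용).
+
+```python
+import signal
+import threading
+
+stop_requested = threading.Event()
+
+
+def _on_sigterm(signum, frame):
+    # Fargate Spot 2분 경고 시점의 SIGTERM: 새 작업 수락을 멈추도록 플래그만 세운다
+    stop_requested.set()
+
+
+signal.signal(signal.SIGTERM, _on_sigterm)
+
+
+def work_loop(items, checkpoint):
+    """플래그가 서면 루프를 멈추고, stopTimeout(최대 120초) 안에 checkpoint를 호출한다."""
+    done = []
+    for item in items:
+        if stop_requested.is_set():
+            break
+        done.append(item)  # 실제 작업 자리 (가상의 처리)
+    checkpoint(done)
+    return done
+
+
+# 로컬 시뮬레이션 (POSIX 전용): 자기 자신에게 SIGTERM을 보낸다
+saved = []
+signal.raise_signal(signal.SIGTERM)
+print(work_loop(range(5), saved.extend), stop_requested.is_set())  # → [] True
+```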
+
+### 4.4 EC2 Spot
+
+| 항목 | 값 |
+|------|-----|
+| 인터럽트 사유 | Capacity / Price / Constraint |
+| 인터럽트 동작 | `terminate` (기본) / `stop` / `hibernate` |
+| 경고 | 2분, EventBridge + IMDSv2 메타데이터 |
+| Rebalance recommendation | 인터럽트 전 선제 신호 |
+| 최대가 지정 시 | 인터럽트 빈도 **증가** |
+| 검증 방법 | AWS FIS로 인터럽트 주입 테스트 |
+
+**함의:**
+- 최대가 설정은 역효과. 기본(On-Demand 가격 상한)이 최선.
+- 리밸런스 권고 신호는 2분 인터럽트 경고보다 먼저 도착할 수 있어 더 긴 대응 여유를 주지만, 도착이 보장되지는 않음.
+- CloudTrail `BidEvictedEvent`로 사후 감지.
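+
+인스턴스 내부에서 2분 경고와 리밸런스 권고를 IMDSv2로 폴링하는 스케치. 경로 `/latest/meta-data/spot/instance-action`, `/latest/meta-data/events/recommendations/rebalance`와 토큰 헤더 이름은 EC2 문서 기준으로 가정했으며, 신호가 없으면 404가 반환된다고 본다. 함수 이름은 가상의 것이고, 폴링 주기(AWS 문서 권장 5초)는 호출 측 루프에서 정한다.
+
+```python
+import json
+import urllib.error
+import urllib.request
+from typing import Optional
+
+IMDS_BASE = "http://169.254.169.254/latest"
+SPOT_ACTION_PATH = "/meta-data/spot/instance-action"
+REBALANCE_PATH = "/meta-data/events/recommendations/rebalance"
+
+
+def imds_token(ttl_seconds: int = 300) -> str:
+    """IMDSv2 세션 토큰 발급 (EC2 안에서만 동작)."""
+    req = urllib.request.Request(f"{IMDS_BASE}/api/token", method="PUT",
+                                 headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)})
+    with urllib.request.urlopen(req, timeout=2) as resp:
+        return resp.read().decode()
+
+
+def imds_get(path: str, token: str) -> Optional[str]:
+    """메타데이터 조회. 404(신호 없음)이면 None."""
+    req = urllib.request.Request(f"{IMDS_BASE}{path}", headers={"X-aws-ec2-metadata-token": token})
+    try:
+        with urllib.request.urlopen(req, timeout=2) as resp:
+            return resp.read().decode()
+    except urllib.error.HTTPError as err:
+        if err.code == 404:
+            return None
+        raise
+
+
+def parse_instance_action(body: Optional[str]) -> Optional[dict]:
+    """instance-action 본문을 {"action", "time"}으로 변환."""
+    if body is None:
+        return None
+    data = json.loads(body)
+    return {"action": data["action"], "time": data["time"]}
+
+
+# 가상의 응답 본문으로 파싱만 확인 (IMDS 호출은 EC2 안에서만 동작)
+sample = '{"action": "terminate", "time": "2026-04-18T08:22:00Z"}'
+print(parse_instance_action(sample))
+```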
+
+### 4.5 AWS Batch with Spot
+
+| 할당 전략 | 특성 | 추천 |
+|-----------|------|------|
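+| `BEST_FIT` | 작업 요구에 가장 잘 맞는 최저가 인스턴스 타입만 사용, 용량 부족 시 대기 | Spot 비추천 (인터럽트률 ↑) |
+| `BEST_FIT_PROGRESSIVE` | 용량 부족 시 요구사항을 충족하는 추가 인스턴스 타입으로 확장 | 중간 |
+| `SPOT_CAPACITY_OPTIMIZED` | 인터럽트 가능성이 가장 낮은 Spot 풀 선택 | Spot 표준 |
+| `SPOT_PRICE_CAPACITY_OPTIMIZED` | 인터럽트 가능성이 낮은 풀 중 최저가 풀 선택 | **Spot 권장** (AWS 문서는 대부분의 경우 `SPOT_CAPACITY_OPTIMIZED`보다 이 전략을 권장) |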
+
+권장 패턴:
+- `retryStrategy.attempts ≥ 2` + `evaluateOnExit`로 재시도 사유 구분.
+- 큐 우선순위: Spot compute-env 우선, On-Demand compute-env fallback.
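+
+위 권장 패턴을 잡 정의의 `retryStrategy`로 옮긴 스케치. `Host EC2*` 상태 사유 패턴은 Spot 회수로 호스트가 종료된 경우를 잡기 위해 흔히 쓰이는 예시로 가정했으며, 실제 `statusReason` 문자열은 운영 중 확인이 필요하다. `evaluateOnExit` 조건은 위에서부터 순서대로 평가된다.
+
+```python
+import json
+
+
+def spot_retry_strategy(attempts: int = 3) -> dict:
+    """Spot 회수로 호스트가 사라진 경우만 재시도하고 나머지는 즉시 종료하는 retryStrategy."""
+    if not 2 <= attempts <= 10:  # Batch 허용 범위 1-10 중 Spot용 하한 2
+        raise ValueError("attempts must be between 2 and 10")
+    return {
+        "attempts": attempts,
+        "evaluateOnExit": [
+            # 가정: Spot 회수 시 statusReason이 "Host EC2"로 시작
+            {"onStatusReason": "Host EC2*", "action": "RETRY"},
+            # 그 밖의 실패(애플리케이션 오류 등)는 재시도하지 않음
+            {"onReason": "*", "action": "EXIT"},
+        ],
+    }
+
+
+print(json.dumps({"retryStrategy": spot_retry_strategy()}, indent=2))
+```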
+
+### 4.6 AWS Well-Architected Serverless Lens
+
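+- `references/serverless-lens.md`에 설계원칙 요약 보관 (2026-04-18 기준 공식 페이지 수록분 7개, RESEARCH §12 참조).
+- **원칙↔스킬 출력 매핑**:
+  - "Orchestrate your application with state machines, not functions" / "Use events to trigger transactions" → 타겟 서비스 추천 근거 (Step Functions·EventBridge)
+  - "Assume no hardware affinity" → GPU 의존 워크로드의 Fargate·EC2 Spot·SageMaker 분기 근거
+  - "Design for failures and duplicates" → Tradeoff Dossier 리스크 섹션 (멱등성·재시도)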
+
+---
+
+## 5. 출력 리포트 스키마
+
+```markdown
+# Serverless Migration Plan — {workload-name}
+
+**생성 일시**: YYYY-MM-DD HH:MM
+**분류**: Tier {1|2|3} / {sub-type}
+**검증 사례 참조**: {autoresearch | openclaw | 없음}
+
+## Executive Summary
+
+| 항목 | AS-IS (현재) | TO-BE (타겟) | 변화 |
+|------|--------------|--------------|------|
+| 월간 비용 | $X | $Y | -Z% |
+| 관리 오버헤드 | … | 0 | … |
+| 콜드 스타트 p99 | 0ms | Xs | 신규 리스크 |
+| Spot 인터럽트 리스크 | N/A | X% 예상 | 신규 리스크 |
+| 규정 준수 | … | … | … |
+
+## Workload Classification
+- 타입, 실행 빈도, 지속 시간, 트래픽 패턴 요약.
+
+## Target Architecture
+- Primary: {서비스}
+- Alternatives: {서비스}
+- 선택 근거: [AWS Docs §…], [Insight #N]
+
+## Tradeoff Dossier
+
+| 리스크 | 근거 | 완화 전략 |
+|--------|------|-----------|
+| Spot 인터럽트 | [AWS Docs: Spot Interruptions] | `max_wait`, 체크포인트 |
+| 콜드 스타트 p99 | [Lambda Quotas] | Provisioned Concurrency |
+| … | … | … |
+
+## Staged Migration Checklist
+
+- [ ] **Stage 0 — 사전 준비**
+  - [ ] 서비스 쿼터 확인 / 증액 요청
+  - [ ] Spot placement score 조사 (배치 워크로드만)
+  - [ ] IAM 역할 최소 권한 정의
+- [ ] **Stage 1 — 저위험 검증**
+  - [ ] 데이터 샘플로 dry-run
+  - [ ] 비용 측정
+- [ ] **Stage 2 — 파일럿 이행**
+  - [ ] 트래픽 X% 분기
+  - [ ] 관측·알람 구축
+- [ ] **Stage 3 — 전환**
+  - [ ] DNS/로드밸런서 절체
+  - [ ] 이전 리소스 폐기
+
+## Delegation
+
+> 타겟이 **SageMaker Managed Spot**으로 확정되었습니다.
+> 세부 구현은 `sagemaker-spot-training` 스킬로 이어 진행해주세요.
+
+## Citations
+
+- AWS Docs: Lambda Quotas — §Function config
+- AWS Docs: SageMaker Managed Spot Training
+- AWS Docs: Fargate Capacity Providers
+- AWS Docs: EC2 Spot Interruptions
+- AWS Docs: AWS Batch with Spot
+- AWS Well-Architected Serverless Lens (2022-07-14)
+- Source Project Insight #1, #5, #13 (`source-insights.md`)
+- 검증 사례: `case-study-{autoresearch|openclaw}.md`
+```
+
+---
+
+## 6. 인터뷰 시스템 상세
+
+### 6.1 원칙
+
+- **질문당 2-4 옵션** + 자동 "기타".
+- **단계별 분기**: Phase 1 응답에 따라 Phase 3 질문 집합이 달라짐 (interview-bank.md).
+- **Why/How 설명**: 트레이드오프 질문에는 description에 "왜 이 질문이 중요한지" 기재.
+
+### 6.2 예시 (Phase 3, Tier 2 API 선택 시)
+
+```
+Q: 콜드 스타트 p99 허용 한도는?
+  1. <500ms — Provisioned Concurrency + warm-up 필요
+  2. <2s — Lambda 기본 (Node/Python) 충족 가능
+  3. <5s — Lambda 컨테이너 (Java/대형 의존성) 허용
+  4. 허용 — 비용 최적화 우선
+```
+
+### 6.3 인터뷰 스킵 조건
+
+- IaC 파일에서 핵심 정보가 추출 가능하고 사용자가 "기본값 사용"을 선택한 경우.
+
+---
+
+## 7. Delegation 및 스킬 간 호출
+
+### 7.1 원칙
+
+본 스킬은 **문서화 + 의사결정**까지. 구현 how-to는 다음 스킬로 위임:
+
+| 타겟 확정 서비스 | 위임 스킬 | 상태 |
+|-----------------|-----------|------|
+| SageMaker Managed Spot Training | `sagemaker-spot-training` | 존재 |
+| AWS Lambda | (향후 `lambda-deployment`) | 미존재 |
+| AWS Batch | (향후 `aws-batch-workflow`) | 미존재 |
+| ECS Fargate / Fargate Spot | (향후 `fargate-service`) | 미존재 |
+
+미존재 스킬로 위임 시: 리포트 "Delegation" 섹션에 "이 단계는 아직 전용 스킬이 없습니다. [링크된 AWS Docs]를 참고하세요." 표기.
+
+### 7.2 컨텍스트 핸드오프
+
+리포트 파일(`docs/serverless-migration/{date}-{topic}.md`)이 다음 세션의 컨텍스트 소스가 됨. 위임 스킬은 해당 파일을 읽어 Phase 4-5의 결정사항을 승계.
+
+---
+
+## 8. 검증 사례 인용 규칙
+
+### 8.1 두 검증 사례
+
+| 사례 | 티어 | 핵심 숫자 | 참조 파일 |
+|------|------|-----------|-----------|
+| serverless-autoresearch | Tier 1 | 48실험 $3.94, H100 229s $0.16 vs 상용 $7~24/8h | `case-study-autoresearch.md` |
+| serverless-openclaw | Tier 2 | ~$1/월, Lambda 1.35s 콜드 스타트, Fargate Spot fallback | `case-study-openclaw.md` |
+
+### 8.2 인용 라벨
+
+- `[AWS Docs]` — AWS 공식 문서 출처
+- `[Insight #N]` — source-insights.md의 번호화 항목
+- `[Case: autoresearch/openclaw]` — 사례 근거
+
+세 라벨은 리포트·SKILL.md 본문·references 전체에 일관 적용.
+
+---
+
+## 9. Tier 3 취급 방침
+
+HANDOFF는 Tier 3를 out-of-scope로 제안했으나 사용자 결정으로 **동등 취급**.
+단 실 검증 사례가 없으므로:
+
+1. **원칙 수준 패턴만 제시**: Strangler Fig, Branch by Abstraction, CDC (Change Data Capture) 기반 이전.
+2. **공식문서 링크로 상세 회피**: 스킬은 의사결정 질문과 AWS 공식 가이드 URL 제공까지.
+3. **리스크 경고 명시**: 리포트 상단에 "Tier 3는 검증 사례 없음 — 파일럿 필수" 주의.
+
+---
+
+## 10. 산출물 목록
+
+| 파일 | 설명 |
+|------|------|
+| `plugins/workflow/skills/serverless-migration-advisor/SKILL.md` | 메인 스킬 (500줄 이하) |
+| `plugins/workflow/skills/serverless-migration-advisor/references/*.md` | 13개 참조 파일 |
+| `plugins/workflow/.claude-plugin/plugin.json` | (기존) workflow 플러그인 메타데이터 |
+| `.claude-plugin/marketplace.json` | `workflow` 항목 `skills` 배열에 본 스킬 추가 |
+| `README.md` | Workflow 카테고리 테이블에 본 스킬 추가 |
+| `docs/serverless-migration/` | 런타임 리포트 저장소 (사용 시 생성) |
+
+---
+
+## 11. 비목표(Non-Goals)
+
+HANDOFF §6 계승:
+
+- **실시간 AWS 요금 계산 아님** — 실험 기반 범위만 인용.
+- **Terraform 생성기 아님** — 스니펫·diff 제안까지.
+- **멀티클라우드 아님** — AWS 전용.
+- **침묵 금지** — 실패 모드(CUDA 오류, 쿼터 지연, FA3 호환성 등) 반드시 노출.
+- **구현 how-to 아님** — 타 스킬로 위임.
+
+---
+
+## 12. 성공 기준 (HANDOFF §7 매핑)
+
+사용자가 "야간 배치 작업을 EC2 H100에서 돌리는데 월 $N 비용 — 싸게 하고 싶다"고 시작했을 때 **한 번의 대화**에서:
+
+1. ✅ 워크로드 분류 (Phase 1)
+2. ✅ 타겟 서버리스 패턴 식별 (Phase 4 매핑)
+3. ✅ 비용 절감 범위 추정 (검증 사례 인용)
+4. ✅ Top-3 리스크 플래그 (Tradeoff Dossier)
+5. ✅ 단계별 체크리스트 (Staged Migration Checklist)
+6. ✅ Insight #N / AWS Docs 인용 — traceable
+
+**보조 지표**: IaC 제공 시 diff-style 제안 가능 (Phase 2).
+
+---
+
+## 13. 인터뷰 요약 (설계 결정)
+
+| 항목 | 결정 |
+|------|------|
+| 스킬 구조 | 단일 오케스트레이터 |
+| sagemaker-spot-training 관계 | Upstream advisor + delegation |
+| 카테고리 | workflow/ |
+| 범위 | Tier 1 + Tier 2 + Tier 3 (Tier 3는 원칙 수준) |
+| 인터뷰 방식 | 단계별 AskUserQuestion, 5 Phase |
+| 출력 형식 | 테이블형 리포트 + 단계별 체크리스트 |
+| AWS 문서 참조 | Serverless Lens 포함 폭넓게 |
+| 인용 라벨 | `[AWS Docs]` / `[Insight #N]` / `[Case: …]` |
+| 검증 사례 | autoresearch + openclaw |
+| 런타임 AWS 호출 | 기본 OFF (요청 시만) |
+| 중복 관리 | aws-well-architected 스킬과 역할 분리 (리뷰 vs 이행) |
+
+---
+
+## 14. 오픈 질문 (구현 단계에서 결정)
+
+- **IaC 파서**: Terraform HCL 파싱은 tree-sitter? 간단한 정규식? 첫 버전은 정규식으로 필드 추출, 한계 명시.
+- **리포트 파일명 충돌**: 같은 날 여러 마이그레이션 계획 시 슬러그 중복 방지 규칙 (`-2`, `-3` 접미사).
+- **검증 사례 업데이트**: autoresearch / openclaw 커밋 해시 기록, 6개월마다 리뷰 권고.
+- **Serverless Lens 버전 추적**: 현재 2022-07-14. AWS 개정 시 `references/serverless-lens.md` 스냅샷 날짜 갱신.
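+
+리포트 파일명 충돌 규칙(`-2`, `-3` 접미사)을 스케치하면 아래와 같다. 슬러그 규칙(소문자 영숫자 외 문자를 하이픈으로 치환, 한글 등 비ASCII는 제거)과 `report_path`라는 이름은 가상의 가정이며 구현 단계에서 확정할 사항이다.
+
+```python
+import re
+import tempfile
+from datetime import date
+from pathlib import Path
+
+
+def report_path(root: Path, topic: str, day: date) -> Path:
+    """YYYY-MM-DD-{slug}.md 경로를 만들고, 충돌 시 -2, -3 ... 접미사를 붙인다."""
+    # 가정: 소문자 영숫자 외 문자는 하이픈으로 치환(비ASCII 제거), 빈 슬러그는 "report"
+    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-") or "report"
+    base = f"{day.isoformat()}-{slug}"
+    candidate = root / f"{base}.md"
+    suffix = 2
+    while candidate.exists():
+        candidate = root / f"{base}-{suffix}.md"
+        suffix += 1
+    return candidate
+
+
+with tempfile.TemporaryDirectory() as tmp:
+    first = report_path(Path(tmp), "Nightly H100 Batch", date(2026, 4, 18))
+    first.write_text("# plan\n")
+    second = report_path(Path(tmp), "Nightly H100 Batch", date(2026, 4, 18))
+    print(first.name, second.name)
+```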
+
+---
+
+*설계 기준 일자: 2026-04-18.*
+*근거 HANDOFF: `issues/3-serverless-migration/HANDOFF.md` (serverless-autoresearch commit 5435b37).*
diff --git a/issues/3-serverless-migration/TASKS.md b/issues/3-serverless-migration/TASKS.md
new file mode 100644
index 0000000..d859c46
--- /dev/null
+++ b/issues/3-serverless-migration/TASKS.md
@@ -0,0 +1,1074 @@
+# Serverless Migration Advisor — Implementation Tasks
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Ship `plugins/workflow/skills/serverless-migration-advisor/` — an upstream advisor skill that interviews users about always-on workloads and outputs a traceable serverless+Spot migration plan, citing AWS Docs and two validated case studies.
+
+**Architecture:** Single orchestrator SKILL.md + 13 focused `references/*.md` files. 5-Phase interaction: classification interview → IaC scan → tradeoff interview → target-arch mapping → report generation. Delegates implementation how-to to `sagemaker-spot-training` and future sibling skills.
+
+**Tech Stack:** Markdown (SKILL + references), YAML frontmatter, existing Vitest harness (`src/__tests__/skills.test.ts`, `integrity.test.ts`, `marketplace.test.ts`, `plugin-json.test.ts`). No runtime code.
+
+---
+
+## File Structure
+
+```
+plugins/workflow/
+├── .claude-plugin/plugin.json                           # MODIFY — bump version if needed
+└── skills/
+    └── serverless-migration-advisor/                    # CREATE
+        ├── SKILL.md                                     # CREATE — ≤500 lines, 5-Phase orchestrator
+        └── references/                                  # CREATE
+            ├── tradeoffs-compute.md                     # Lambda/Fargate/Batch/SageMaker/EC2 Spot limits
+            ├── tradeoffs-spot.md                        # Spot capacity/interrupt/HUGI/billable
+            ├── tradeoffs-data-layer.md                  # RDS/Aurora Serverless v2/DynamoDB/S3 Express
+            ├── tradeoffs-event-driven.md                # EventBridge/SQS/Kinesis/Step Functions
+            ├── serverless-lens.md                       # WA Serverless Lens 9 design principles
+            ├── patterns-tier1-batch.md                  # Batch/ETL/training migration patterns
+            ├── patterns-tier2-api.md                    # Always-on API migration patterns
+            ├── patterns-tier3-monolith.md               # Strangler Fig / decomposition
+            ├── patterns-tier3-data.md                   # RDS→DynamoDB / CDC migration
+            ├── interview-bank.md                        # AskUserQuestion templates by phase
+            ├── case-study-autoresearch.md               # Tier 1 case study
+            ├── case-study-openclaw.md                   # Tier 2 case study
+            └── source-insights.md                       # Numbered insights from both cases
+
+.claude-plugin/marketplace.json                          # MODIFY — add skill to workflow.skills
+README.md                                                # MODIFY — add to Workflow skills table
+CHANGELOG.md                                             # MODIFY — new entry
+```
+
+**Global test command** (use after every stage): `npm test`
+**Global dev-loop**: `npx vitest` (watch mode)
+
+---
+
+## Stage A — Research Top-Up (prerequisite to writing references)
+
+Goal: Fill gaps in `RESEARCH.md` so references/ can be authored without further lookups.
+
+### Task A1: Serverless Lens 9 design principles
+
+**Files:**
+- Modify: `issues/3-serverless-migration/RESEARCH.md` (append §12)
+
+- [ ] **Step 1: WebFetch design-principles page**
+
+Run (as tool call):
+```
+WebFetch({
+  url: "https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/design-principles.html",
+  prompt: "List each of the 9 serverless design principles with the exact title and a one-sentence summary. Preserve numbering."
+})
+```
+Expected: 9 numbered principles.
+
+- [ ] **Step 2: Append §12 to RESEARCH.md**
+
+Append a new section:
+
+```markdown
+## 12. Serverless Lens — 9 Design Principles (verbatim titles, paraphrased summaries)
+
+| # | Title | Summary | 본 스킬에서의 활용 |
+|---|-------|---------|-------------------|
+| 1 | … | … | Phase 4 타겟 선택 근거 |
+| … | … | … | … |
+```
+
+Fill from fetch result. Each row must cite the principle title verbatim.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add issues/3-serverless-migration/RESEARCH.md
+git commit -m "research(serverless-migration): add WA Serverless Lens 9 design principles (#3)"
+```
+
+### Task A2: Lambda SnapStart + Aurora Serverless v2 + DynamoDB + S3 Express + Step Functions
+
+**Files:**
+- Modify: `issues/3-serverless-migration/RESEARCH.md` (append §13-17)
+
+- [ ] **Step 1: Parallel WebFetch (5 calls in one message)**
+
+Fetch these URLs in parallel:
+- `https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html` — "Extract SnapStart supported runtimes, restore latency range, excluded features (VPC ENI lifecycle, uniqueness pitfalls), and cost model."
+- `https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html` — "Extract ACU range, scaling granularity, cold-start behavior vs v1, minimum capacity pricing floor, compatibility with provisioned."
+- `https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html` — "Compare On-Demand vs Provisioned capacity: billing units, auto-scaling, when to prefer each."
+- `https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html` — "Extract S3 Express One Zone characteristics: AZ scope, latency claims, pricing model differences, directory bucket naming rules, workloads recommended vs not."
+- `https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html` — "Compare Standard vs Express workflows: max duration, execution history, pricing, at-least-once vs exactly-once, throughput."
+
+- [ ] **Step 2: Append §13-17 to RESEARCH.md**
+
+Each section follows the citation format already established in §2-6: the source URL at the top, a fact table, then an implications bullet list.
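+
+Before committing, a rough check can confirm that each new section carries an official URL. This sketch assumes `## 13.` through `## 17.` headings and URLs under `https://docs.aws.amazon.com/`. It only checks that a URL appears somewhere in the section, not that it sits at the top:
+
+```bash
+# Print the number of each section 13-17 that has no docs.aws.amazon.com URL.
+sections_missing_url() {
+  awk '
+    function flush() { if (sec != "" && !found) print sec }
+    /^## / {
+      flush()                                   # close out the previous section
+      sec = ""; found = 0
+      if ($0 ~ /^## 1[3-7]\./) sec = substr($0, 4, 2)
+      next
+    }
+    sec != "" && /https:\/\/docs\.aws\.amazon\.com\// { found = 1 }
+    END { flush() }
+  ' "$1"
+}
+
+# Usage: empty output from sections_missing_url issues/3-serverless-migration/RESEARCH.md means all five pass.
+```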
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add issues/3-serverless-migration/RESEARCH.md
+git commit -m "research(serverless-migration): add SnapStart/Aurora/DynamoDB/S3-Express/Step-Functions (#3)"
+```
+
+### Task A3: Case study fact extraction — autoresearch
+
+**Files:**
+- Modify: `issues/3-serverless-migration/RESEARCH.md` (expand §10.1)
+
+- [ ] **Step 1: Read autoresearch insights.md fully**
+
+Run:
+```
+Read({ file_path: "/Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/docs/insights.md" })
+```
+
+- [ ] **Step 2: Read comparison-report.md and experiments/003-h100-comparison/results-summary.md**
+
+Parallel reads.
+
+- [ ] **Step 3: Rewrite §10.1 with numbered insight titles (1-15)**
+
+Each row:
+```markdown
+| # | Title | One-line lesson | Tier usage |
+|---|-------|-----------------|------------|
+| 1 | Spot Capacity Varies Dramatically by Region | Placement scores for the same instance type range from 1 to 9 across regions | Tier 1 batch |
+| … | … | … | … |
+```
+
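+At the end of §10.1, record the autoresearch commit the insights were extracted from as `(Snapshot: <hash>)`, where `<hash>` is the output of `git rev-parse HEAD` run inside the autoresearch checkout.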
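+
+A small helper can produce that marker directly. This is a sketch with an illustrative function name. Because of `--short`, it prints an abbreviated hash like the `5435b37` used elsewhere in this plan:
+
+```bash
+# Emit "(Snapshot: <short-hash>)" for the git checkout at $1.
+snapshot_marker() {
+  local hash
+  hash=$(git -C "$1" rev-parse --short HEAD) || return 1
+  printf '(Snapshot: %s)\n' "$hash"
+}
+
+# Usage: snapshot_marker /Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch
+```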
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add issues/3-serverless-migration/RESEARCH.md
+git commit -m "research(serverless-migration): extract 15 autoresearch insights with tier mapping (#3)"
+```
+
+### Task A4: Case study fact extraction — openclaw
+
+**Files:**
+- Modify: `issues/3-serverless-migration/RESEARCH.md` (expand §10.2)
+
+- [ ] **Step 1: Fetch openclaw README + architecture docs**
+
+```
+WebFetch({
+  url: "https://github.com/serithemage/serverless-openclaw",
+  prompt: "Extract the architecture: compute choices (Lambda Container primary, ECS Fargate Spot fallback), cost breakdown ($1-2/month, Free Tier $0.23), cold start numbers (1.35s), pre-warming strategy (EventBridge scheduled), API Gateway vs ALB savings ($18-25/month)."
+})
+```
+
+- [ ] **Step 2: Expand §10.2**
+
+Add a principles-extracted table:
+
+```markdown
+| # | Principle from openclaw | Tier usage |
+|---|------------------------|------------|
+| O1 | Lambda Container + dual compute fallback | Tier 2 API |
+| O2 | API Gateway over ALB to eliminate $18-25/mo baseline | Tier 2 API |
+| O3 | EventBridge scheduled pre-warming during active hours | Tier 2 API |
+| O4 | S3 session persistence for stateless Lambda | Tier 2 API |
+| O5 | CloudFront + S3 for web UI | Tier 2 API |
+```
+
+Use the `O1`-`O5` prefix to distinguish these from autoresearch's numeric insight IDs.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add issues/3-serverless-migration/RESEARCH.md
+git commit -m "research(serverless-migration): extract openclaw architecture principles (#3)"
+```
+
+---
+
+## Stage B — Skill Scaffold
+
+Goal: create the directory structure and minimal stub files (frontmatter or a header plus `TBD`), so that the integrity tests can run incrementally while content is filled in.
+
+### Task B1: Create skill directory + SKILL.md stub
+
+**Files:**
+- Create: `plugins/workflow/skills/serverless-migration-advisor/SKILL.md`
+
+- [ ] **Step 1: Verify parent directory**
+
+Run:
+```bash
+ls plugins/workflow/skills/
+```
+Expected: `git-workflow  intent-engineering  tidd` (no `serverless-migration-advisor` yet).
+
+- [ ] **Step 2: Write SKILL.md stub**
+
+Write to `plugins/workflow/skills/serverless-migration-advisor/SKILL.md`:
+
+```markdown
+---
+name: serverless-migration-advisor
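+description: Use when migrating an AWS always-on architecture (EC2/ALB/ECS/RDS) to serverless + Spot patterns. Classifies the workload, evaluates tradeoffs, and produces a staged migration plan. Implementation how-to is delegated to follow-up skills such as sagemaker-spot-training. Example triggers - "서버리스 이행", "EC2에서 Lambda로", "Spot 이행", "serverless migration", "ALB에서 API Gateway", "비용 절감 이행".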
+---
+
+# Serverless Migration Advisor
+
+TBD — filled in Stage F.
+```
+
+- [ ] **Step 3: Run tests (expect new-skill failures until integrity is updated)**
+
+Run:
+```bash
+npm test
+```
+Expected: `skills.test.ts` may pass, since only the frontmatter can be checked at this point. `integrity.test.ts` will likely fail because marketplace.json is not yet updated. **Record the exact failure.** If only the integrity test fails, proceed to B2. If `skills.test.ts` fails, fix the frontmatter.
+
+### Task B2: Create 13 reference stubs
+
+**Files:**
+- Create all under `plugins/workflow/skills/serverless-migration-advisor/references/`:
+
+- [ ] **Step 1: Create all 13 stubs in one batch**
+
+For each of the following filenames, create the file with the stub content:
+
+| File | Description line |
+|------|-------|
+| tradeoffs-compute.md | Official tradeoffs for Lambda / Fargate / Batch / SageMaker / EC2 Spot |
+| tradeoffs-spot.md | Spot capacity, interruptions, HUGI, and the billable-time definition |
+| tradeoffs-data-layer.md | RDS / Aurora Serverless v2 / DynamoDB / S3 Express |
+| tradeoffs-event-driven.md | EventBridge / SQS / Kinesis / Step Functions |
+| serverless-lens.md | AWS Well-Architected Serverless Lens: 9 design principles |
+| patterns-tier1-batch.md | Migration patterns for batch, training, and ETL |
+| patterns-tier2-api.md | Migration patterns for always-on APIs and web |
+| patterns-tier3-monolith.md | Strangler Fig-based monolith decomposition |
+| patterns-tier3-data.md | RDS→DynamoDB, CDC-based transition |
+| interview-bank.md | AskUserQuestion question bank per Phase |
+| case-study-autoresearch.md | Tier 1 validated case: serverless-autoresearch |
+| case-study-openclaw.md | Tier 2 validated case: serverless-openclaw |
+| source-insights.md | Numbered validated insights (targets of Insight #N citations) |
+
+Each stub content:
+
+```markdown
+> **Snapshot date**: 2026-04-18
+> **Description**: {description line from table above}
+
+TBD — content filled in Stage C/D/E.
+```
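+
+Step 1 can be done in one shell pass. This is a minimal sketch. The descriptions below are English renderings of the table, so paste the table's exact text if the two differ. The optional argument exists only to allow a dry run into a temp directory:
+
+```bash
+# Create references/<name> stubs with the Snapshot/Description header.
+create_reference_stubs() {
+  local refs="${1:-plugins/workflow/skills/serverless-migration-advisor}/references"
+  local name desc
+  mkdir -p "$refs"
+  while IFS='|' read -r name desc; do
+    printf '> **Snapshot date**: 2026-04-18\n> **Description**: %s\n\nTBD — content filled in Stage C/D/E.\n' \
+      "$desc" > "$refs/$name"
+  done <<'EOF'
+tradeoffs-compute.md|Official tradeoffs for Lambda / Fargate / Batch / SageMaker / EC2 Spot
+tradeoffs-spot.md|Spot capacity, interruptions, HUGI, and the billable-time definition
+tradeoffs-data-layer.md|RDS / Aurora Serverless v2 / DynamoDB / S3 Express
+tradeoffs-event-driven.md|EventBridge / SQS / Kinesis / Step Functions
+serverless-lens.md|AWS Well-Architected Serverless Lens: 9 design principles
+patterns-tier1-batch.md|Migration patterns for batch, training, and ETL
+patterns-tier2-api.md|Migration patterns for always-on APIs and web
+patterns-tier3-monolith.md|Strangler Fig-based monolith decomposition
+patterns-tier3-data.md|RDS→DynamoDB, CDC-based transition
+interview-bank.md|AskUserQuestion question bank per Phase
+case-study-autoresearch.md|Tier 1 validated case: serverless-autoresearch
+case-study-openclaw.md|Tier 2 validated case: serverless-openclaw
+source-insights.md|Numbered validated insights (targets of Insight #N citations)
+EOF
+}
+```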
+
+- [ ] **Step 2: Verify all 13 files exist**
+
+Run:
+```bash
+ls plugins/workflow/skills/serverless-migration-advisor/references/ | wc -l
+```
+Expected: `13`
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/workflow/skills/serverless-migration-advisor/
+git commit -m "scaffold(serverless-migration-advisor): create skill + 13 reference stubs (#3)"
+```
+
+### Task B3: Register skill in marketplace.json
+
+**Files:**
+- Modify: `.claude-plugin/marketplace.json`
+
+- [ ] **Step 1: Read current marketplace.json**
+
+Run:
+```
+Read({ file_path: "/Users/dohyunjung/Workspace/roboco-io/tools/plugins/.claude-plugin/marketplace.json" })
+```
+
+Find the `workflow` entry in `plugins[]`.
+
+- [ ] **Step 2: Append skill path to workflow.skills array**
+
+Edit the workflow entry's `skills` array to include:
+```
+"./skills/serverless-migration-advisor/SKILL.md"
+```
+Append it to the end of the array. This matches the current convention: the existing entries are in insertion order, not alphabetical order.
+
+- [ ] **Step 3: Run tests**
+
+Run:
+```bash
+npm test
+```
+Expected: all green. The stub is far below the line limit, so it cannot cause a line-count complaint from `skills.test.ts`; inspect the file that the test reports. If `integrity.test.ts` still fails, verify that the registered path exactly matches the SKILL.md location.
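+
+If the failure is unclear, a grep-based check like the one below separates "not registered" from "path does not resolve". This is a sketch. It assumes that `./skills/...` entries resolve against `plugins/workflow/`; the real resolution rule is whatever `integrity.test.ts` implements.
+
+```bash
+# Succeeds (prints "ok") when $3 appears as a quoted string in the manifest
+# and the file exists under the plugin root.
+check_skill_registered() {
+  local manifest="$1" plugin_root="$2" skill_path="$3"
+  grep -qF "\"$skill_path\"" "$manifest" || { echo "not registered: $skill_path"; return 1; }
+  [ -f "$plugin_root/${skill_path#./}" ] || { echo "missing file: $plugin_root/${skill_path#./}"; return 1; }
+  echo ok
+}
+
+# Usage:
+# check_skill_registered .claude-plugin/marketplace.json plugins/workflow ./skills/serverless-migration-advisor/SKILL.md
+```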
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add .claude-plugin/marketplace.json
+git commit -m "scaffold(serverless-migration-advisor): register in marketplace (#3)"
+```
+
+---
+
+## Stage C — Authoring `tradeoffs-*.md` (5 files)
+
+Each task in this stage has the same shape: write the file from the RESEARCH.md sources, run the tests, then commit. Keep each file ≤300 lines.
+
+### Task C1: tradeoffs-compute.md
+
+**Files:**
+- Modify: `plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-compute.md`
+
+- [ ] **Step 1: Author content**
+
+Structure:
+```markdown
+> Snapshot date: 2026-04-18
+
+# Compute Service Tradeoffs
+
+## AWS Lambda
+- Quota table (copy from RESEARCH §2.1 verbatim)
+- Tradeoff implications (bullet list, 5-7 items)
+- When to choose / avoid (table)
+- Cite: [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+
+## SageMaker Managed Spot Training
+- Quota + formula (RESEARCH §3)
+- State transitions table
+- Cite: [AWS Docs — SageMaker Managed Spot Training](…)
+
+## AWS Fargate + Fargate Spot
+## Amazon EC2 Spot
+## AWS Batch (with Spot)
+```
+
+Every quantitative claim MUST carry an inline `[AWS Docs]` citation.
+Every pattern claim drawn from a case study MUST carry an `[Insight #N]` or `[Case: autoresearch/openclaw]` citation.
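+
+The SageMaker section's savings formula from the Managed Spot Training docs works well as a worked calculation: savings = (1 - BillableTimeInSeconds / TrainingTimeInSeconds) × 100. The sketch below assumes both values were already read from `aws sagemaker describe-training-job`. The sample numbers are illustrative, not case-study results.
+
+```bash
+# Percent saved by Managed Spot, from billable vs. total training seconds.
+spot_savings_pct() {
+  awk -v b="$1" -v t="$2" 'BEGIN {
+    if (t <= 0) { print "training time must be > 0" > "/dev/stderr"; exit 1 }
+    printf "%.1f\n", (1 - b / t) * 100
+  }'
+}
+
+# Usage: spot_savings_pct 300 1000   # prints 70.0
+```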
+
+- [ ] **Step 2: Run tests**
+
+```bash
+npm test
+```
+Expected: green.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-compute.md
+git commit -m "docs(serverless-migration-advisor): tradeoffs-compute from AWS Docs + cases (#3)"
+```
+
+### Task C2: tradeoffs-spot.md
+
+**Files:**
+- Modify: `references/tradeoffs-spot.md`
+
+- [ ] **Step 1: Author content**
+
+Sections:
+1. **Placement scores** — `aws ec2 get-spot-placement-scores` command + interpretation table (1-10 scoring meaning).
+2. **Interruption behaviors** — `terminate` / `stop` / `hibernate` comparison.
+3. **Interruption signals** — EventBridge event + IMDSv2 metadata example curl.
+4. **HUGI principle** — billable vs wall-clock definition, cite SageMaker formula.
+5. **Do / Don't** — e.g., "Don't set maximum price; it increases interruption rate." [AWS Docs — Spot interruptions §Price]
+6. **Testing** — AWS FIS tutorial link.
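+
+The IMDSv2 check in item 3 can be sketched as follows, and it must run on the instance itself. `spot/instance-action` returns 404 until an interruption is scheduled. After that it returns a JSON document such as `{"action": "terminate", "time": "2017-09-18T08:22:00Z"}`. The helper names are illustrative, and `sed` stands in for `jq` to keep the sketch dependency-free.
+
+```bash
+# Fetch an IMDSv2 session token (valid for 6 hours).
+imds_token() {
+  curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
+    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
+}
+
+# Print the instance-action JSON, or nothing when no interruption is scheduled
+# (-f makes curl treat the 404 as a failure with no output).
+spot_instance_action() {
+  local token
+  token=$(imds_token) || return 1
+  curl -sf -H "X-aws-ec2-metadata-token: $token" \
+    "http://169.254.169.254/latest/meta-data/spot/instance-action" || true
+}
+
+# Extract the "action" value (stop | terminate | hibernate) from stdin.
+parse_action() {
+  sed -n 's/.*"action"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
+}
+
+# Usage (on the instance): spot_instance_action | parse_action
+```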
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/tradeoffs-spot.md && git commit -m "docs(serverless-migration-advisor): tradeoffs-spot with AWS FIS testing guide (#3)"
+```
+
+### Task C3: tradeoffs-data-layer.md
+
+**Files:**
+- Modify: `references/tradeoffs-data-layer.md`
+
+- [ ] **Step 1: Author content**
+
+Sections:
+1. **RDS vs Aurora Serverless v2** — ACU range, scaling, cold-start, minimum cost floor.
+2. **DynamoDB** — On-Demand vs Provisioned, consistency model, single-table tradeoff.
+3. **S3 Standard vs S3 Express One Zone** — latency claim, AZ scope, pricing, directory bucket naming.
+4. **Migration patterns preview** (pointer to patterns-tier3-data.md).
+
+Each subsection cites the RESEARCH.md §14-16 facts.
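+
+For the directory-bucket naming item in section 3, a rough validator helps reviewers catch mistakes in examples. Directory bucket names take the form `<base-name>--<zone-id>--x-s3`, for example `amzn-s3-demo-bucket--usw2-az1--x-s3`. The regex below only approximates the documented rules, so check it against the S3 docs before quoting it.
+
+```bash
+# Return 0 when $1 looks like a valid S3 directory bucket name.
+is_directory_bucket_name() {
+  local name="$1"
+  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
+  printf '%s\n' "$name" \
+    | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?--[a-z0-9]+(-[a-z0-9]+)+--x-s3$'
+}
+```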
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/tradeoffs-data-layer.md && git commit -m "docs(serverless-migration-advisor): tradeoffs-data-layer (#3)"
+```
+
+### Task C4: tradeoffs-event-driven.md
+
+**Files:**
+- Modify: `references/tradeoffs-event-driven.md`
+
+- [ ] **Step 1: Author content**
+
+Sections:
+1. **EventBridge** — routing, filtering, schema registry, usage cost.
+2. **SQS** — FIFO vs Standard, long polling, DLQ, visibility timeout.
+3. **Kinesis** — sharding, ordering, data retention.
+4. **Step Functions Standard vs Express** — max duration (1 year vs 5 min), execution semantics (Standard: exactly-once; Express: at-least-once for asynchronous and at-most-once for synchronous executions), pricing, use cases.
+5. **Decision matrix** — when to choose each.
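+
+The decision matrix in item 5 can start from the hard limits in item 4. This minimal sketch encodes only run duration and the need for exactly-once execution; pricing and throughput are left out. The inputs are hypothetical. `STANDARD` and `EXPRESS` are the state machine `type` values that Step Functions uses.
+
+```bash
+# Choose a workflow type from the longest expected run (seconds) and whether
+# exactly-once execution is required ("yes" or "no").
+choose_sfn_type() {
+  local max_seconds="$1" needs_exactly_once="$2"
+  if [ "$max_seconds" -gt 31536000 ]; then
+    echo UNSUPPORTED                # longer than the Standard 1-year limit
+  elif [ "$max_seconds" -gt 300 ] || [ "$needs_exactly_once" = "yes" ]; then
+    echo STANDARD                   # beyond the Express 5 minutes, or exactly-once needed
+  else
+    echo EXPRESS
+  fi
+}
+
+# Usage: choose_sfn_type 120 no   # prints EXPRESS
+```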
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/tradeoffs-event-driven.md && git commit -m "docs(serverless-migration-advisor): tradeoffs-event-driven (#3)"
+```
+
+### Task C5: serverless-lens.md
+
+**Files:**
+- Modify: `references/serverless-lens.md`
+
+- [ ] **Step 1: Author content**
+
+Structure:
+```markdown
+> Snapshot date: 2026-04-18
+> Lens publication: 2022-07-14
+
+# AWS Well-Architected Serverless Applications Lens
+
+## 9 Design Principles (verbatim titles)
+
+| # | Principle | Summary | Where this skill uses it |
+|---|-----------|---------|---------------|
+(fill from RESEARCH §12)
+
+## 5 Pillars — Serverless-Specific Best Practices
+(one line each, with a link)
+
+## How This Skill Uses the Lens
+- In Phase 4 mapping, cite principle N as the rationale for each recommended target service
+- In the report's Tradeoff Dossier, tag each risk with the relevant principle
+```
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/serverless-lens.md && git commit -m "docs(serverless-migration-advisor): serverless-lens 9 principles mapping (#3)"
+```
+
+---
+
+## Stage D — Authoring `patterns-*.md` (4 files)
+
+### Task D1: patterns-tier1-batch.md
+
+**Files:**
+- Modify: `references/patterns-tier1-batch.md`
+
+- [ ] **Step 1: Author content**
+
+Three patterns, each with a standard template:
+1. **EC2 long-running → SageMaker Managed Spot**
+2. **EMR → AWS Batch (SPOT_CAPACITY_OPTIMIZED)**
+3. **Cron on EC2 → EventBridge Scheduler + Lambda or Batch**
+
+Template per pattern:
+````markdown
+### Pattern N.M: {Before} → {After}
+
+**Applicable when:**
+- …
+
+**Before (AS-IS):**
+```text
+(ASCII diagram)
+```
+
+**After (TO-BE):**
+```text
+(ASCII diagram)
+```
+
+**Key changes:**
+- …
+
+**Tradeoffs surfaced:**
+- …
+
+**Cost range (from case study):**
+- {range} [Case: autoresearch]
+
+**Migration checklist skeleton:**
+- [ ] Stage 0: …
+- [ ] Stage 1: …
+- [ ] Stage 2: …
+- [ ] Stage 3: …
+
+**Delegate to:** `sagemaker-spot-training` / `aws-batch-workflow` (future)
+
+**Citations:**
+- [AWS Docs — …]
+- [Insight #N]
+````
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/patterns-tier1-batch.md && git commit -m "docs(serverless-migration-advisor): patterns-tier1-batch (#3)"
+```
+
+### Task D2: patterns-tier2-api.md
+
+**Files:**
+- Modify: `references/patterns-tier2-api.md`
+
+- [ ] **Step 1: Author content**
+
+Patterns:
+1. **ALB + EC2 → API Gateway + Lambda** (cite openclaw O2)
+2. **Always-on ECS service → mixed Fargate + Fargate Spot** (cite openclaw O1)
+3. **Always-on WebSocket → API Gateway WebSocket + Lambda**
+4. **Java monolith on EC2 → Lambda + SnapStart**
+
+Same template as D1.
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/patterns-tier2-api.md && git commit -m "docs(serverless-migration-advisor): patterns-tier2-api (#3)"
+```
+
+### Task D3: patterns-tier3-monolith.md
+
+**Files:**
+- Modify: `references/patterns-tier3-monolith.md`
+
+- [ ] **Step 1: Author content**
+
+Must include a prominent warning at top:
+
+```markdown
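+> **⚠️ Tier 3 scope warning**: Neither validated case (autoresearch, openclaw) directly validates a Tier 3 migration. This document provides only links to official AWS patterns and principle-level guidance. A pilot is mandatory before any large-scale project.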
+```
+
+Patterns:
+1. **Strangler Fig Application** (cite AWS prescriptive guidance)
+2. **Branch by Abstraction**
+3. **Moving to database-per-service**
+
+Each pattern: principle + AWS docs link + "do / don't" list. No before/after diagrams (not validated).
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/patterns-tier3-monolith.md && git commit -m "docs(serverless-migration-advisor): patterns-tier3-monolith with validation warning (#3)"
+```
+
+### Task D4: patterns-tier3-data.md
+
+**Files:**
+- Modify: `references/patterns-tier3-data.md`
+
+- [ ] **Step 1: Author content**
+
+Same Tier 3 warning at top.
+
+Patterns:
+1. **RDS → Aurora Serverless v2** (keeps the same engine; lowest risk)
+2. **RDS → DynamoDB** (requires access-pattern redesign; CDC-based transition)
+3. **Add S3 Express One Zone** for high-IO batch reads
+
+Cite AWS DMS (Database Migration Service) where applicable.
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/patterns-tier3-data.md && git commit -m "docs(serverless-migration-advisor): patterns-tier3-data (#3)"
+```
+
+---
+
+## Stage E — Case Studies + Insights + Interview Bank
+
+### Task E1: case-study-autoresearch.md
+
+**Files:**
+- Modify: `references/case-study-autoresearch.md`
+
+- [ ] **Step 1: Author content**
+
+Structure:
+```markdown
+> Snapshot date: 2026-04-18
+> Source: github.com/roboco-io/serverless-autoresearch @ commit 5435b37
+> Local path: /Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/
+
+# Case Study — serverless-autoresearch (Tier 1)
+
+## Headline
+- 48 Spot experiments / $3.94 total cost
+- H100 Spot: 229 s / $0.16 (vs. $7-24 per 8 h on commercial offerings)
+- Reproduction of Karpathy's autoresearch: val_bpb 0.9951 (original ~0.998)
+
+## Architecture
+(ASCII diagram: user → batch_launcher → N SageMaker Spot jobs → result_collector)
+
+## Quotable Statements (for skill output)
+- "This workload can be reproduced 20-100× more cheaply than commercial alternatives [Case: autoresearch]"
+- "H100 Spot interruption rate ~5% [Case: autoresearch] [Insight #N]"
+
+## Applicable workloads
+- Batch training, HPO, parallel comparison of multiple configs.
+
+## Not applicable
+- A single long continuous training run (checkpoint overhead), ultra-low-latency inference.
+
+## Cross-references
+- patterns-tier1-batch.md §1.1
+- source-insights.md #1-15
+```
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/case-study-autoresearch.md && git commit -m "docs(serverless-migration-advisor): case-study-autoresearch (#3)"
+```
+
+### Task E2: case-study-openclaw.md
+
+**Files:**
+- Modify: `references/case-study-openclaw.md`
+
+- [ ] **Step 1: Author content**
+
+Structure parallels E1:
+```markdown
+# Case Study — serverless-openclaw (Tier 2)
+
+## Headline
+- $1-2/month ($0.23 within the Free Tier)
+- Lambda Container cold start: 1.35 s
+- Adopting API Gateway removes the ALB fixed cost of $18-25/month
+- ECS Fargate Spot fallback cuts compute cost by 70%
+
+## Architecture
+(ASCII: API Gateway → Lambda Container (primary) ↔ S3 session; fallback → ECS Fargate Spot; EventBridge scheduled pre-warming)
+
+## Quotable Statements
+## Applicable / Not applicable
+## Cross-references (patterns-tier2-api.md O1-O5)
+```
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/case-study-openclaw.md && git commit -m "docs(serverless-migration-advisor): case-study-openclaw (#3)"
+```
+
+### Task E3: source-insights.md
+
+**Files:**
+- Modify: `references/source-insights.md`
+
+- [ ] **Step 1: Author content**
+
+```markdown
+> Snapshot date: 2026-04-18
+
+# Source Project Insights (Numbered — stable references)
+
+## From serverless-autoresearch (Insight #1-#15)
+
+### Insight #1 — Spot Capacity Varies Dramatically by Region
+… (one paragraph from autoresearch/docs/insights.md §1)
+**Tier:** 1
+**Cited by:** tradeoffs-spot.md, patterns-tier1-batch.md §1.1
+
+### Insight #2 — …
+(continue for all 15)
+
+## From serverless-openclaw (Insight #O1-#O5)
+
+### Insight #O1 — Dual Compute (Lambda primary + Fargate Spot fallback)
+…
+**Tier:** 2
+**Cited by:** patterns-tier2-api.md §1.2
+```
+
+**Numbering rule:** autoresearch insights are numeric `#1-#15` (stable). openclaw principles are prefixed `#O1-#O5`. Future additions only extend (never renumber).
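+
+A check like the sketch below can partly enforce the numbering rule. It catches malformed IDs and duplicates in the `### Insight #` headings. It cannot detect a renumbering that keeps every ID unique; that would need a diff against the previous commit.
+
+```bash
+# Print "ok" when every "### Insight #" heading uses #N or #ON and no ID repeats.
+check_insight_ids() {
+  local file="$1" bad dups
+  bad=$(grep '^### Insight #' "$file" | grep -Ev '^### Insight #O?[0-9]+( |$)')
+  dups=$(grep -Eo '^### Insight #O?[0-9]+' "$file" | sed 's/^### Insight //' | sort | uniq -d)
+  if [ -n "$bad$dups" ]; then
+    [ -n "$bad" ] && printf 'malformed: %s\n' "$bad"
+    [ -n "$dups" ] && printf 'duplicate: %s\n' "$dups"
+    return 1
+  fi
+  echo ok
+}
+```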
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/source-insights.md && git commit -m "docs(serverless-migration-advisor): numbered source-insights reference (#3)"
+```
+
+### Task E4: interview-bank.md
+
+**Files:**
+- Modify: `references/interview-bank.md`
+
+- [ ] **Step 1: Author content**
+
+Structure:
+````markdown
+> Snapshot date: 2026-04-18
+
+# Interview Bank — AskUserQuestion Templates
+
+## Phase 1: Classification (always asked)
+
+### Q1 — Workload type
+```json
+{
+  "question": "현재 워크로드의 주 타입을 선택해주세요.",
+  "header": "워크로드",
+  "multiSelect": false,
+  "options": [
+    { "label": "배치/훈련", "description": "..." },
+    ...
+  ]
+}
+```
+(Provide full JSON for Q1-Q4 of Phase 1)
+
+## Phase 3 (branches by Phase 1)
+
+### Branch: batch/training
+- Q_B1: Spot interruption tolerance
+- Q_B2: Maximum acceptable wall-clock per job
+- Q_B3: Whether checkpointing can be supported
+
+### Branch: always-on API
+- Q_A1: Acceptable cold-start p99
+- Q_A2: Persistent-connection requirement (WebSocket / SSE)
+- Q_A3: Traffic pattern (constant / bursty / periodic)
+
+### Branch: ETL
+- Q_E1: Data volume per run
+- Q_E2: Duration per unit of work
+- Q_E3: State store (S3 / RDS / DynamoDB)
+
+### Branch: event-driven
+- Q_V1: Event sources
+- Q_V2: Is ordered processing required?
+- Q_V3: Are duplicates acceptable?
+
+### Branch: monolith (Tier 3)
+- Warn first: "This skill covers Tier 3 at the principle level only."
+- Q_M1: Service-boundary identification stage (early / mid / late)
+- Q_M2: Acceptable downtime window
+- Q_M3: Data consistency requirements
+
+## Common follow-ups (asked for every tier)
+- Q_C1: Monthly target cost
+- Q_C2: Compliance (PCI/HIPAA/GDPR/SOC2/none)
+- Q_C3: RTO / RPO
+````
+
+Provide full JSON for all four Phase 1 questions (Q1-Q4) and for at least one question per branch as a template. The remaining questions may use a minimal schema.
+
+- [ ] **Step 2: Run tests; Commit**
+
+```bash
+npm test && git add references/interview-bank.md && git commit -m "docs(serverless-migration-advisor): interview-bank with 5 branches (#3)"
+```
+
+---
+
+## Stage F — Write SKILL.md Body
+
+### Task F1: Full SKILL.md
+
+**Files:**
+- Modify: `plugins/workflow/skills/serverless-migration-advisor/SKILL.md`
+
+- [ ] **Step 1: Draft SKILL.md (≤500 lines)**
+
+Structure (preserve frontmatter from Stage B):
+
+````markdown
+---
+name: serverless-migration-advisor
+description: ... (from Task B1)
+---
+
+# Serverless Migration Advisor
+
+**Role:** Upstream advisor for migrating existing AWS workloads to serverless + Spot patterns. It classifies the workload, evaluates tradeoffs, and produces a staged plan. Implementation how-to is delegated to follow-up skills.
+
+**What this skill is not:**
+- Not aws-well-architected, which reviews an existing architecture for Pillar compliance. This skill focuses on **migration**.
+- Not a real-time AWS pricing calculator. It quotes only experiment-based ranges.
+- Not a Terraform generator. Output stops at snippets and checklists.
+
+## When to use
+Trigger keywords: "서버리스 이행", "EC2에서 Lambda로", "Spot 이행", "ALB에서 API Gateway", "월 비용 절감 이행", "serverless migration", "batch on Spot".
+
+## Execution order (5 Phases)
+
+### Phase 1 — Workload classification interview
+Call AskUserQuestion sequentially with the four Phase 1 JSON questions in references/interview-bank.md.
+
+### Phase 2 — IaC scan (optional)
+If the user provides an IaC path (Terraform/CDK/CFN), scan it statically. Skip this phase if there are no files.
+(First version: use regular expressions to extract `resource "aws_instance"`, `aws_rds_cluster`, `aws_ecs_service`, and similar.)
+
+### Phase 3 — Constraints and risk interview
+Branch on the Phase 1 answers (interview-bank.md §Phase 3).
+
+### Phase 4 — Target architecture mapping
+(Inline the mapping table, identical to SPEC §3 Phase 4.)
+
+For each mapping recommendation:
+1. Load the matching pattern from references/patterns-tier{N}-*.md.
+2. Check risks and constraints in references/tradeoffs-*.md.
+3. Quote numeric ranges from references/case-study-*.md.
+
+### Phase 5 — Report generation
+Save to docs/serverless-migration/YYYY-MM-DD-{topic}.md.
+(Inline the report template, following the SPEC §5 schema.)
+
+## Citation label rules (mandatory)
+- `[AWS Docs]`: official AWS documentation source (with URL)
+- `[Insight #N]`: an entry in source-insights.md (numbers are stable)
+- `[Case: autoresearch|openclaw]`: evidence from a validated case
+
+Every quantitative claim and every recommendation rationale must carry at least one of the three labels above.
+
+## Delegation
+| Target service | Delegate skill |
+|------------|----------|
+| SageMaker Managed Spot Training | `sagemaker-spot-training` (exists) |
+| Lambda / API Gateway | None; provide AWS Docs links |
+| AWS Batch | None; provide AWS Docs links |
+| ECS Fargate + Fargate Spot | None; provide AWS Docs links |
+
+## Tier 3 caution
+Tier 3 (monolith decomposition / DB replacement) has **no validated case**.
+Copy the warning at the top of patterns-tier3-*.md into the report as well.
+
+## Config file (optional)
+If `.serverless-migration.yaml` exists at the project root, use its defaults:
+```yaml
+default_region: us-west-2
+report_dir: docs/serverless-migration/
+language: ko
+```
+
+## Failure modes (must be surfaced; SPEC §11)
+- CUDA / FA3 compatibility [Insight #4, #13]
+- Spot quota increase delays (H100: days) [Insight from spot-capacity-guide]
+- New cold-start risk [AWS Docs — Lambda quotas]
+- Availability risk of single-task Fargate Spot [AWS Docs — Fargate capacity providers]
+
+## References (Progressive Disclosure)
+(List all 13 files, one line each.)
+````
+
+**Verify line count:**
+```bash
+wc -l plugins/workflow/skills/serverless-migration-advisor/SKILL.md
+```
+Expected: ≤500.
+
+- [ ] **Step 2: Run tests**
+
+```bash
+npm test
+```
+Expected: green.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/workflow/skills/serverless-migration-advisor/SKILL.md
+git commit -m "feat(serverless-migration-advisor): write 5-Phase SKILL.md body (#3)"
+```
+
+---
+
+## Stage G — Integration
+
+### Task G1: README.md update
+
+**Files:**
+- Modify: `README.md`
+
+- [ ] **Step 1: Locate Workflow skills section**
+
+Run:
+```
+Grep({ pattern: "### Workflow", path: "README.md", output_mode: "content", "-n": true })
+```
+
+- [ ] **Step 2: Add skill row**
+
+Append (or insert alphabetically) in the Workflow table:
+
+```markdown
+| [serverless-migration-advisor](plugins/workflow/skills/serverless-migration-advisor) | Upstream advisor for migrating AWS always-on architectures to serverless + Spot patterns. Produces tradeoffs, risks, and a staged plan, then delegates to implementation skills. |
+```
+
+- [ ] **Step 3: Run tests; Commit**
+
+```bash
+npm test && git add README.md && git commit -m "docs: list serverless-migration-advisor in README (#3)"
+```
+
+### Task G2: CHANGELOG.md update
+
+**Files:**
+- Modify: `CHANGELOG.md`
+
+- [ ] **Step 1: Add Unreleased entry (or new version section)**
+
+Insert at top of Unreleased section:
+
+```markdown
+### Added
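+- Added the `serverless-migration-advisor` skill: an upstream advisor for migrating AWS always-on architectures to serverless + Spot patterns. Flow: 5-Phase interview → target architecture mapping → staged checklist report. Grounded in AWS Docs and the serverless-autoresearch and serverless-openclaw validated cases. (#3)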
+```
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add CHANGELOG.md
+git commit -m "docs(changelog): note serverless-migration-advisor skill (#3)"
+```
+
+### Task G3: Update issues/3-serverless-migration/PLAN.md status
+
+**Files:**
+- Modify: `issues/3-serverless-migration/PLAN.md`
+
+- [ ] **Step 1: Check implementation checkboxes (§4)**
+
+Mark every deliverable checkbox as `[x]`.
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add issues/3-serverless-migration/PLAN.md
+git commit -m "docs(issues/3): check off PLAN implementation deliverables (#3)"
+```
+
+---
+
+## Stage H — Verification
+
+### Task H1: Full test suite
+
+- [ ] **Step 1: Run full test suite**
+
+```bash
+npm test
+```
+Expected: all tests pass.
+
+- [ ] **Step 2: Check skill line count**
+
+```bash
+wc -l plugins/workflow/skills/serverless-migration-advisor/SKILL.md
+```
+Expected: ≤500.
+
+- [ ] **Step 3: Check each reference line count (sanity)**
+
+```bash
+wc -l plugins/workflow/skills/serverless-migration-advisor/references/*.md
+```
+Expected: each file well under 500 lines, with the Stage C files at their ≤300 target. No test enforces this, but staying small keeps progressive disclosure working.
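+
+Alongside the line counts, a heuristic lint can flag likely misses of the citation rule. This sketch only examines lines that contain a dollar amount or a percentage and no `[AWS Docs`, `[Insight #`, or `[Case: ` label. It therefore misses claims phrased without `$` or `%` and may flag template rows. Treat hits as prompts for review rather than as failures.
+
+```bash
+# List lines with "$<digit>" or "<digit>%" that carry none of the three labels.
+uncited_quant_lines() {
+  grep -nE '\$[0-9]|[0-9]%' "$@" | grep -vE '\[AWS Docs|\[Insight #|\[Case: '
+}
+
+# Usage: uncited_quant_lines plugins/workflow/skills/serverless-migration-advisor/references/*.md
+```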
+
+### Task H2: Scenario rehearsal (manual)
+
+These scenarios run inside Claude Code and are not automated. For each scenario below, invoke the skill (if the plugin is installed locally) or walk through its steps by hand. Then verify that the outputs match the expectations.
+
+- [ ] **Scenario 1 (Tier 1):** "I run 8-hour batch training jobs on EC2 H100 every day and spend $1,800 a month. Can this move to serverless?"
+  - Expected Phase 1 classification: `배치/훈련` (batch/training), not an always-on API.
+  - Expected Phase 4 target: SageMaker Managed Spot Training.
+  - Expected report cites: [Case: autoresearch], Insight #1, #5, AWS Docs (SageMaker Managed Spot + Spot interruptions).
+  - Expected delegation: "continue with the sagemaker-spot-training skill".
+
+- [ ] **Scenario 2 (Tier 2):** "We run a REST API with highly variable traffic on ALB + EC2 Auto Scaling. $800 a month."
+  - Expected Phase 1: `상시 API` (always-on API).
+  - Expected Phase 4 target: Lambda + API Gateway (primary), Fargate + Fargate Spot (fallback).
+  - Expected report cites: [Case: openclaw] O1-O2, AWS Docs (Lambda quotas + Fargate capacity providers).
+  - Expected delegation: AWS Docs links, stating explicitly that no dedicated skill exists.
+
+- [ ] **Scenario 3 (Tier 3):** "Spring Boot monolith + RDS PostgreSQL. Can this move to serverless?"
+  - Expected Phase 1: `모놀리스` (monolith).
+  - Expected response: Tier 3 warning + Strangler Fig pattern + mandatory-pilot warning.
+  - Expected target: decompose first, then migrate in stages.
+  - Expected: no suggestion to jump directly to Lambda.
+
+Document any deviations in `issues/3-serverless-migration/VERIFICATION.md`.
+
+### Task H3: PR preparation
+
+- [ ] **Step 1: Branch / commit state**
+
+```bash
+git log --oneline main..HEAD
+```
+Expected: clean history of Stage A-G commits.
+
+- [ ] **Step 2: Prepare PR description**
+
+Draft the PR with:
+- Link to Issue #3
+- Link to SPEC.md, PLAN.md, TASKS.md, RESEARCH.md
+- Scenario H2 results
+- Test output snippet (`npm test` green)
+
+- [ ] **Step 3: (User decision) Create PR or keep as local draft**
+
+User must explicitly approve PR creation. Do not push without confirmation.
+
+---
+
+## Self-Review (run before marking plan complete)
+
+Check each spec requirement against a task:
+
+| SPEC section | Implemented by task |
+|--------------|---------------------|
+| §2.1 Directory layout | B1, B2 |
+| §3 Phase 1-5 flow | F1 |
+| §4.1 Lambda tradeoffs | C1 |
+| §4.2 SageMaker Managed Spot | C1 |
+| §4.3 Fargate Spot | C1, C2 |
+| §4.4 EC2 Spot | C2 |
+| §4.5 AWS Batch | C1 |
+| §4.6 Serverless Lens | C5 |
+| §5 Report schema | F1 (inline template) |
+| §6 Interview system | E4, F1 |
+| §7 Delegation | F1, D1-D2 |
+| §8 Case study citation rules | E1, E2, E3 |
+| §9 Tier 3 treatment | D3, D4, F1 |
+| §10 Deliverables list | all Stage B-G |
+| §11 Non-goals | F1 (explicit section) |
+| §12 Success criteria (6 bullets) | H2 scenarios |
+
+**Placeholder scan:** this plan contains no `TBD` outside the Task B1/B2 stubs, which are explicitly filled in later stages. Every step with code shows the code. Every verification command shows its expected output.
+
+**Type consistency:** N/A (no code).
+
+**Naming consistency:**
+- `serverless-migration-advisor` (skill name) — used verbatim everywhere.
+- `[AWS Docs]` / `[Insight #N]` / `[Case: …]` — citation labels consistent across all tasks.
+- `references/` paths use exact filenames from §File Structure.
+
+---
+
+## Execution Handoff
+
+Plan saved to `issues/3-serverless-migration/TASKS.md`. Two execution options:
+
+**1. Subagent-Driven (recommended)** — dispatch a fresh subagent per Stage (A/B/C/D/E/F/G/H), review between stages. Best for quality gates.
+
+**2. Inline Execution** — execute tasks in this session with checkpoints after each Stage. Faster but less review surface.
+
+Awaiting user decision.
diff --git a/plugins/workflow/skills/serverless-migration-advisor/SKILL.md b/plugins/workflow/skills/serverless-migration-advisor/SKILL.md
new file mode 100644
index 0000000..5eb972e
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/SKILL.md
@@ -0,0 +1,226 @@
+---
+name: serverless-migration-advisor
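+description: Use when migrating an AWS always-on architecture (EC2/ALB/ECS/RDS) to serverless + Spot patterns. Classifies the workload, evaluates tradeoffs, and produces a staged migration plan. Implementation how-to is delegated to follow-up skills such as sagemaker-spot-training. Example triggers - "서버리스 이행", "EC2에서 Lambda로", "Spot 이행", "serverless migration", "ALB에서 API Gateway", "비용 절감 이행".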
+---
+
+# Serverless Migration Advisor
+
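+**Role:** The **upstream advisor** for migrating existing AWS workloads (EC2/ALB/ECS/RDS/monoliths) to serverless + Spot patterns. It covers workload classification → tradeoff evaluation → target architecture mapping → staged migration plan generation. Implementation how-to is **delegated** to follow-up skills.
+
+## What this skill is not
+
+- Not `aws-well-architected`, which performs a **Pillar compliance review** of an existing architecture. This skill focuses on **migration**.
+- Not a real-time AWS pricing calculator. It quotes only cost ranges grounded in **validated cases**.
+- Not a Terraform/CDK generator. It provides snippets and **checklists** only.
+- Not multi-cloud. AWS only.
+- Not an implementation how-to. Once the target is settled, it **delegates** to `sagemaker-spot-training` and similar skills.
+
+## When to use
+
+Trigger keywords: 서버리스 이행, EC2에서 Lambda로, Spot 이행, ALB에서 API Gateway, 월 비용 절감 이행, serverless migration, batch on Spot, 콜드 스타트 이행.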
+
+---
+
+## Execution order (5 Phases)
+
+### Phase 1 — Workload Classification (always run)
+
+Call AskUserQuestion sequentially with the Q1-Q4 JSON from [interview-bank.md](references/interview-bank.md) §Phase 1:
+
+- Q1 Workload type: batch/training / always-on API / ETL / event-driven / monolith (Tier 3)
+- Q2 Current environment
+- Q3 Execution frequency
+- Q4 Single run duration
+
+The answers determine the **Tier (1/2/3)** and **sub-type**, e.g., `Tier 1 / ML training / daily / 8h`.
+
+If any answer indicates **Tier 3**, display the Branch E notice immediately:
+
+> ⚠️ Tier 3 migration has **no validated case**. Only principle-level guidance and official AWS links are provided. A pilot is mandatory. See [patterns-tier3-monolith.md](references/patterns-tier3-monolith.md) and [patterns-tier3-data.md](references/patterns-tier3-data.md).
+
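+### Phase 2 — IaC Scan (optional)
+
+If the user provides IaC file paths (Terraform `.tf`, CDK `.ts/.py`, CloudFormation `.yaml`):
+
+1. Get the file list with Glob.
+2. Read the files and extract resources with regular expressions (first-version limits: `resource "aws_..."` and CDK L2 construct name patterns only).
+3. Build a summary table of the current architecture (instance types, counts, estimated monthly cost range).
+
+If there is no IaC or the user chooses to skip, go to Phase 3.
+
+> **IaC parser limits**: The regex-based parser only extracts resource references. Cross-references, variable substitution, and module recursion are not supported. If deeper analysis is needed, ask the user for manual input.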
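+
+The first-version Terraform extraction can be sketched with `grep` and `sed` alone. This illustrates the regex approach and its limits; it is not the skill's actual parser. It reads only `resource "aws_..." "<name>"` declaration lines and ignores `data` blocks, modules, and variables.
+
+```bash
+# Print "<resource-type> <name>" for each aws_* resource declaration.
+scan_tf_resources() {
+  grep -hoE '^[[:space:]]*resource[[:space:]]+"aws_[a-z0-9_]+"[[:space:]]+"[^"]+"' "$@" \
+    | sed -E 's/^[[:space:]]*resource[[:space:]]+"([^"]+)"[[:space:]]+"([^"]+)"/\1 \2/'
+}
+
+# Collapse the scan into "<resource-type> <count>" rows for the summary table.
+count_by_type() {
+  awk '{ n[$1]++ } END { for (t in n) print t, n[t] }' | sort
+}
+
+# Usage: scan_tf_resources infra/*.tf | count_by_type
+```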
+
+### Phase 3 — In-Depth Constraint and Risk Interview
+
+Depending on the Phase 1 Q1 answer, run one of the branches in [interview-bank.md](references/interview-bank.md) §Phase 3:
+
+- **Branch A (Batch/Training)**: A1 Spot tolerance / A2 wall-clock / A3 checkpointing
+- **Branch B (Always-on API)**: B1 cold-start p99 / B2 persistent connections / B3 traffic pattern
+- **Branch C (ETL)**: C1 data volume / C2 duration / C3 state store
+- **Branch D (Event-driven)**: D1 event sources (multiSelect) / D2 ordering guarantee / D3 duplicate tolerance
+- **Branch E (Monolith / Tier 3)**: E1 bounded-context identification stage / E2 downtime / E3 data consistency
+
+Common follow-ups (all branches): C_Cost monthly budget / C_Compliance regulatory requirements / C_RTORPO.
+
+### Phase 4 — Target Architecture Mapping
+
+Map the interview results, plus the IaC scan if one was run, to [patterns-tier{1|2|3}-*.md](references/):
+
+| Classification | Primary | Secondary | Delegation |
+|---------------|---------|-----------|------------|
+| Tier 1 / ML training / checkpointable | SageMaker Managed Spot | AWS Batch + Spot | `sagemaker-spot-training` (exists) |
+| Tier 1 / ETL (<15min) | Lambda + S3 + EventBridge | - | None (AWS Docs) |
+| Tier 1 / ETL (>15min) | AWS Batch + Spot (Fargate/EC2) | Step Functions + decomposed Lambda | None |
+| Tier 2 / API (cold starts acceptable) | API Gateway + Lambda | API Gateway + Fargate Spot | None |
+| Tier 2 / API (p99 <500ms required) | Lambda + Provisioned Concurrency / SnapStart | Fargate (On-Demand baseline) | None |
+| Tier 2 / WebSocket | API Gateway WebSocket + Lambda | ALB + Fargate | None |
+| Tier 2 / Java monolith | Lambda + SnapStart | Fargate (hybrid) | None |
+| Tier 3 / monolith | Strangler Fig + API Gateway routing | - | None (principles only) |
+| Tier 3 / RDS→Aurora v2 | Aurora Serverless v2 (same engine) | - | None |
+| Tier 3 / RDS→DynamoDB | DynamoDB + DMS CDC | Keep the existing RDS (recommended) | None (high risk) |
+
+Each mapping recommendation **must**:
+
+1. Load the pattern details from [patterns-tier{N}-*.md](references/).
+2. Check risks and constraints in [tradeoffs-{compute|spot|data-layer|event-driven}.md](references/).
+3. Cite numeric ranges from [case-study-*.md](references/).
+
+### Phase 5 — Report Generation
+
+Create the file `docs/serverless-migration/YYYY-MM-DD-{topic-slug}.md`. If the slug already exists, append a `-2`, `-3` suffix. If the user customizes the report directory, use that path instead.
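The naming and collision rule can be sketched as follows. The slug rule (lowercase, runs of non-alphanumerics collapsed to `-`) and the injectable `exists` check are assumptions made for the example; only the `YYYY-MM-DD-{topic-slug}.md` pattern and the `-2`, `-3` suffixes come from the text above.

```python
from __future__ import annotations

import re
from collections.abc import Callable
from datetime import date
from pathlib import Path


def slugify(topic: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return slug or "report"


def report_path(
    topic: str,
    report_dir: str | Path = "docs/serverless-migration",
    today: date | None = None,
    exists: Callable[[Path], bool] = Path.exists,
) -> Path:
    """Return the first free YYYY-MM-DD-{slug}.md path, adding -2, -3, ... on collisions."""
    day = (today or date.today()).isoformat()
    base = Path(report_dir) / f"{day}-{slugify(topic)}"
    candidate = Path(f"{base}.md")
    n = 2
    while exists(candidate):
        candidate = Path(f"{base}-{n}.md")
        n += 1
    return candidate


print(report_path("ML training (daily)", today=date(2026, 4, 18)))
```

Passing `exists` as a parameter keeps the collision logic testable without touching the filesystem; the default checks real files.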
+
+Report schema (based on SPEC §5):
+
+```markdown
+# Serverless Migration Plan — {workload-name}
+
+**Generated at**: YYYY-MM-DD HH:MM
+**Classification**: Tier {1|2|3} / {sub-type}
+**Validated case reference**: {autoresearch | openclaw | none}
+
+## Executive Summary
+| Item | AS-IS | TO-BE | Change |
+|------|-------|-------|------|
+| Monthly cost | $X | $Y | -Z% |
+| Operational overhead | … | … | … |
+| Cold start p99 | … | … | … |
+| Spot interruption risk | … | … | … |
+
+## Workload Classification
+(Summary of Phase 1 answers)
+
+## Target Architecture
+- Primary: {service}
+- Alternative: {service}
+- Rationale: [AWS Docs §…], [Insight #N], [Case: …]
+
+## Tradeoff Dossier
+| Risk | Evidence | Mitigation |
+|--------|------|-----------|
+(3-5 tradeoffs from the Phase 4 mapping)
+
+## Staged Migration Checklist
+- [ ] Stage 0: Preparation (quotas, placement scores, IAM)
+- [ ] Stage 1: Low-risk validation (cheap Spot, sample data)
+- [ ] Stage 2: Pilot (partial traffic, observability in place)
+- [ ] Stage 3: Cutover (DNS/LB switchover, decommission old resources)
+
+## Delegation
+> The target is confirmed as **{service}**.
+> Continue the detailed implementation with the `{next-skill-name}` skill. (If no such skill exists, provide the AWS Docs URL.)
+
+## Citations
+- [AWS Docs — …](URL) × N
+- [Insight #N] × N
+- [Case: {autoresearch|openclaw}]
+```
+
+---
+
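+## Citation Label Rules (mandatory)
+
+Every tradeoff claim and every recommendation rationale must carry at least one label:
+
+- **[AWS Docs]**: official AWS documentation source (with URL)
+- **[Insight #N]**: an entry in [source-insights.md](references/source-insights.md) (N is 1-15 or O1-O6)
+- **[Case: autoresearch|openclaw]**: a validated case study
+
+Turn any unlabeled claim **into a question**, or delete it. Collect every label in the Citations section at the bottom of the report.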
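One way to enforce this rule mechanically is a regex check over the report's claims. This is a hypothetical sketch, not the skill's validator; the pattern is intentionally loose so that combined forms used in the references (`[Insight #4, #13]`, `[Case: autoresearch + Insight #11]`) also count as labeled.

```python
from __future__ import annotations

import re

# Loose patterns for the three label families: [AWS Docs ...], [Insight #...], [Case: ...].
LABEL_RE = re.compile(r"\[(?:AWS Docs[^\]]*|Insight #[^\]]+|Case: [^\]]+)\]")


def unlabeled_claims(claims: list[str]) -> list[str]:
    """Claims with no citation label: turn these into questions or delete them."""
    return [claim for claim in claims if not LABEL_RE.search(claim)]


def collect_labels(claims: list[str]) -> list[str]:
    """Distinct labels in first-seen order, for the report's Citations section."""
    seen: dict[str, None] = {}
    for claim in claims:
        for label in LABEL_RE.findall(claim):
            seen.setdefault(label, None)
    return list(seen)


claims = [
    "Up to 90% savings vs. on-demand [AWS Docs]",
    "Region choice matters more than instance size [Insight #1]",
    "Lambda is always cheaper",
    "44-150x cheaper reproduction [Case: autoresearch] [Insight #15]",
]
print(unlabeled_claims(claims))
```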
+
+---
+
+## Delegation Map
+
+| Target service | Delegate skill | Status |
+|-----------|----------|------|
+| SageMaker Managed Spot Training | `sagemaker-spot-training` | Exists (`plugins/development/skills/`) |
+| AWS Lambda | (future `lambda-deployment`) | Not yet available; provide AWS Docs links |
+| AWS Batch | (future `aws-batch-workflow`) | Not yet available |
+| ECS Fargate / Fargate Spot | (future `fargate-service`) | Not yet available |
+
+When delegating to a skill that does not exist yet, state in the report's Delegation section: "There is no dedicated skill for this step yet. See [AWS Docs: {URL}]."
+
+---
+
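+## Tier 3 Caution (repeated for emphasis)
+
+Tier 3 has **no validated case study**. Copy the warning at the top of [patterns-tier3-monolith.md](references/patterns-tier3-monolith.md) / [patterns-tier3-data.md](references/patterns-tier3-data.md) directly below the report's Executive Summary:
+
+> ⚠️ **Tier 3 migration warning**: this advice is based on official AWS patterns and principle-level guidance; there is no validated reference implementation at comparable scale. **A pilot with 1-2 bounded contexts is mandatory**.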
+
+---
+
+## Configuration File (optional)
+
+If `.serverless-migration.yaml` exists at the project root, use it for defaults (otherwise decide during the interview):
+
+```yaml
+default_region: us-west-2
+report_dir: docs/serverless-migration/
+language: ko  # or en
+include_iac_scan: true  # false skips Phase 2
+```
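A sketch of how the optional file could feed the interview, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not shown). Keys absent from the file come back as `None`, meaning "decide during the interview"; the helper names are hypothetical.

```python
from __future__ import annotations

KNOWN_KEYS = ("default_region", "report_dir", "language", "include_iac_scan")


def resolve_settings(file_values: dict | None) -> dict:
    """Every known key; None means 'not configured, decide during the interview'."""
    file_values = file_values or {}
    unknown = set(file_values) - set(KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    if file_values.get("language") not in (None, "ko", "en"):
        raise ValueError("language must be 'ko' or 'en'")
    return {key: file_values.get(key) for key in KNOWN_KEYS}


def pending_questions(settings: dict) -> list[str]:
    """Settings the interview still has to ask about."""
    return [key for key, value in settings.items() if value is None]


def skip_iac_scan(settings: dict) -> bool:
    """Phase 2 is skipped only when the file explicitly sets include_iac_scan: false."""
    return settings["include_iac_scan"] is False


settings = resolve_settings({"default_region": "us-west-2", "include_iac_scan": False})
print(pending_questions(settings), skip_iac_scan(settings))
```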
+
+---
+
+## Failure-Mode Disclosure (mandatory; SPEC §11 Non-goal "no silence")
+
+The report's Tradeoff Dossier or Citations must mention each of the following:
+
+- CUDA / FA3 GPU architecture compatibility: `[Insight #4, #13]`
+- Spot quota delays (days for H100/B200): `[Insight #6]`
+- New cold-start risk: `[AWS Docs — Lambda quotas]`, `[Case: openclaw]` (1.35 s baseline)
+- Availability risk of single-task Fargate Spot: `[AWS Docs — Fargate capacity providers]`
+- SnapStart constraints (limited runtimes, non-deterministic initialization): `[AWS Docs — Lambda SnapStart]`
+- Setting a max price manually is an anti-pattern (more frequent Spot interruptions): `[AWS Docs — Spot interruptions]`
+
+---
+
+## References (Progressive Disclosure)
+
+- [tradeoffs-compute.md](references/tradeoffs-compute.md): official tradeoffs for Lambda / SageMaker / Fargate / EC2 Spot / Batch
+- [tradeoffs-spot.md](references/tradeoffs-spot.md): interruptions, placement scores, HUGI, definition of billable time
+- [tradeoffs-data-layer.md](references/tradeoffs-data-layer.md): RDS / Aurora Serverless v2 / DynamoDB / S3 Express
+- [tradeoffs-event-driven.md](references/tradeoffs-event-driven.md): EventBridge / SQS / Kinesis / Step Functions
+- [serverless-lens.md](references/serverless-lens.md): mapping to the AWS Well-Architected Serverless Lens principles
+- [patterns-tier1-batch.md](references/patterns-tier1-batch.md): batch, training, and ETL migration patterns
+- [patterns-tier2-api.md](references/patterns-tier2-api.md): always-on API and web migration patterns
+- [patterns-tier3-monolith.md](references/patterns-tier3-monolith.md): Strangler Fig (not validated)
+- [patterns-tier3-data.md](references/patterns-tier3-data.md): RDS→DynamoDB (not validated)
+- [interview-bank.md](references/interview-bank.md): AskUserQuestion templates per Phase
+- [case-study-autoresearch.md](references/case-study-autoresearch.md): Tier 1 validation ($3.94 / 48 experiments)
+- [case-study-openclaw.md](references/case-study-openclaw.md): Tier 2 validation ($1/month)
+- [source-insights.md](references/source-insights.md): numbered Insight #N citation anchors
+
+---
+
+## Exit Criteria
+
+Finish when the report satisfies **all** six success criteria in SPEC §12:
+
+1. Workload classified (Phase 1)
+2. Target serverless pattern identified (Phase 4)
+3. Cost-reduction range estimated (citing a case study)
+4. Top-3 risks flagged (Tradeoff Dossier)
+5. Staged Migration Checklist generated
+6. Insight #N / AWS Docs citations are traceable
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/case-study-autoresearch.md b/plugins/workflow/skills/serverless-migration-advisor/references/case-study-autoresearch.md
new file mode 100644
index 0000000..aa4a18a
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/case-study-autoresearch.md
@@ -0,0 +1,91 @@
+> **Snapshot date**: 2026-04-18
+> **Source**: github.com/roboco-io/serverless-autoresearch @ commit 5435b374 (recorded in RESEARCH.md §10.1)
+> **Local path**: /Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/
+> **Tier**: 1 (batch/training)
+> **Description**: Tier 1 validated case study: serverless-autoresearch
+
+# Case Study — serverless-autoresearch
+
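+This case is the skill's **citation anchor for Tier 1 (batch/training)**. The report's Tradeoff Dossier, cost ranges, and checklist rationale trace back to this file and to [source-insights.md](source-insights.md) #1-#15.
+
+## Headline Numbers (quotable)
+
+- **48 Spot experiments / $3.94 total**: the total cost of the autonomous HPO pipeline
+- **H100 Spot 229 s / $0.16** (vs. $7-24 for the 8 h commercial reference): a 44-150× cheaper reproduction
+- **val_bpb 0.9951**: reproduces and slightly improves on Karpathy's upstream ~0.998 (lower is better)
+- **H100 Spot interruption rate ~5%** (measured): checkpoint-based retries brought the final success rate to 100%
+- **4 parallel experiments at $0.066 / ~10 min wall clock**: validates the cost of the parallel pipeline
+- **Placement score 1↔9 across regions**: even for the same instance type, the extremes range from a 30+ min wait to a ~2 min start
+
+## Target Workload
+
+- Automated training of small language models (the autoresearch pattern), based on Karpathy's nanoGPT
+- 48 configs compared in parallel (Muon/AdamW, LR/BS evolution)
+- Two-stage pipeline: Phase 1 L40S Spot exploration, then Phase 2 H100 Spot validation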
+
+## Architecture Summary
+
+```text
+users → batch_launcher.py → N개 SageMaker Managed Spot training jobs
+                               ├─ S3 (input data, checkpoints)
+                               ├─ CloudWatch (metrics, logs)
+                               └─ result_collector.py (post-processing)
+```
+
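+Key components:
+
+- **Launcher**: `batch_launcher.py` submits N jobs in parallel using the SageMaker Python SDK `sagemaker.pytorch.PyTorch` Estimator, and enforces `MaxWaitTime > MaxRuntime`.
+- **Spot usage**: SageMaker Managed Spot (only the Training phase is billed as BillableTime). The savings rate is computed as `(1 - BillableTime/TrainingTime) × 100`.
+- **Checkpointing**: automatic sync between S3 and a local path in the container (built into SageMaker). Jobs restart automatically after an interruption.
+- **Collector**: `result_collector.py` receives job-completion events, then aggregates results and selects the best config.
+- **Region selection**: the region is pinned after evaluating candidates up front with `aws ec2 get-spot-placement-scores`.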
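The savings formula and the launcher's stopping-condition rule reduce to a few lines. This sketch uses hypothetical helper names; the strict `MaxWaitTime > MaxRuntime` check mirrors the launcher rule above, and the 3600 s cap applied to non-checkpointing jobs is the limit this case study cites for algorithms that cannot checkpoint.

```python
from __future__ import annotations


def spot_savings_pct(billable_seconds: float, training_seconds: float) -> float:
    """Managed Spot savings: (1 - BillableTime / TrainingTime) x 100."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


def check_stopping_condition(max_runtime_s: int, max_wait_s: int, has_checkpoints: bool) -> list[str]:
    """Violations of the launcher rules described in this case study (empty list = OK)."""
    problems = []
    if max_wait_s <= max_runtime_s:
        problems.append("MaxWaitTime must be greater than MaxRuntime")
    if not has_checkpoints and max_wait_s > 3600:
        problems.append("without checkpoints, keep MaxWaitTime within 3600 s")
    return problems


print(spot_savings_pct(billable_seconds=100, training_seconds=500))
print(check_stopping_condition(max_runtime_s=3600, max_wait_s=7200, has_checkpoints=False))
```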
+
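+## Quotable Statements
+
+Each item can be **quoted directly** in the report's Tradeoff Dossier or cost-range cell, in the matching context:
+
+- "This workload pattern can be **reproduced 20-100× cheaper** than commercial alternatives [Case: autoresearch]"
+- "The H100 Spot interruption rate was observed at ~5% (48 experiments; checkpoint-based retries brought the final success rate to 100%) [Case: autoresearch + Insight #11]"
+- "SageMaker startup overhead is ~3 min, i.e. 60% overhead for a 5-min job; inefficient for short jobs unless work is batched together [Insight #5]"
+- "At region placement scores of 1-2, expect 30+ min waits; at 9, jobs start in ~2 min. Region choice matters more than instance size [Insight #1]"
+- "Larger instances are **sometimes cheaper** on the Spot market (g7e.8xlarge < g7e.2xlarge in us-west-2) [Insight #2]"
+- "The LR found on L40S in Phase 1, applied unchanged on H100, beat the upstream baseline; tuning on cheap GPUs can transfer [Insight #14]"
+- "Checking a single assumption (unfixing BS) was **100× more cost-efficient** than 20 rounds of search [Insight #13]"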
+
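+## Applicable Workloads
+
+- Batch ML training (checkpoint support required)
+- Parallel hyperparameter tuning (compatible with SageMaker HPO Managed Spot)
+- Offline batch inference, where results are persisted to S3
+- Reproducible experiment pipelines (research and paper reproduction)
+- Autonomous ML pipelines at 4-48 parallel jobs
+
+## Not Applicable
+
+- **A single long continuous training run** (checkpoint overhead is inefficient: 3 min startup × repeated restarts)
+- **Ultra-low-latency inference** (needs Lambda or Fargate; a SageMaker Training Job is not an endpoint)
+- **Real-time streaming** (must be split out to Kinesis/Managed Streaming)
+- **FA3-dependent workloads on L40S/Ada**: check hardware support coverage [Insight #4]
+- **Algorithms that cannot checkpoint**: unsuitable for SageMaker Managed Spot unless they fit within MaxWait 3600 s
+
+## Hardware and Region Caveats (Tier 1 specific)
+
+- **Quotas are a first-class concern**: GPU Spot quotas default to 0, and p5/p6 require manual review (takes days) [Insight #6]
+- **g7e does not support the Profiler**: `disable_profiler=True` is required [Insight #7]
+- **PyArrow version mismatch**: the DLC is on 23.x; if the local version is lower, parquet reads fail. Use `pyarrow>=21.0.0` [Insight #9]
+- **Never commit config.yaml to git**: it contains the role ARN, profile, and region [Insight #10]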
+
+## Related Files
+
+- Full experiment data: `experiments/003-h100-comparison/results-summary.md`
+- Original insights: `docs/insights.md`, or this skill's [source-insights.md](source-insights.md) §Insight #1-#15
+- Region selection guide: `docs/spot-capacity-guide.md`
+- Comparative analysis: `docs/comparison-report.md` (Sequential vs Parallel Spot)
+- Source commit: `5435b374fb5daae5eee95e3e8eb9292caacf94f8`
+
+## Cross-references (internal)
+
+- [patterns-tier1-batch.md](patterns-tier1-batch.md) Pattern 1.1 (SageMaker Spot migration): based on this case
+- [tradeoffs-compute.md](tradeoffs-compute.md) §2 (SageMaker Managed Spot)
+- [tradeoffs-spot.md](tradeoffs-spot.md) §1 (placement scores), §4 (HUGI principle)
+- [source-insights.md](source-insights.md) #1-#15: numbered insights
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/case-study-openclaw.md b/plugins/workflow/skills/serverless-migration-advisor/references/case-study-openclaw.md
new file mode 100644
index 0000000..b9f10f8
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/case-study-openclaw.md
@@ -0,0 +1,94 @@
+> **Snapshot date**: 2026-04-18
+> **Source**: github.com/serithemage/serverless-openclaw (alpha)
+> **Tier**: 2 (always-on API)
+> **Description**: Tier 2 validated case study: serverless-openclaw
+
+# Case Study — serverless-openclaw
+
+This case is the skill's **citation anchor for Tier 2 (always-on API)**. The report's Tradeoff Dossier, monthly cost estimates, and the rationale for the dual Lambda+Fargate setup trace back to this file and to [source-insights.md](source-insights.md) #O1-#O6.
+
+## Headline Numbers
+
+- **Monthly cost target $1-2** ($0.23 within the Free Tier): the total across all zero-idle components
+- **Lambda Container cold start 1.35 s** (warm 0.12 s)
+- **ECS Fargate Spot fallback → 70% lower compute cost**
+- **Adopting API Gateway removes the ALB fixed cost of $18-25/month**
+- **EventBridge scheduled pre-warming** → **0 s first-response** during active hours (~$0.07/month)
+- **Primary: Lambda Container / Fallback: ECS Fargate Spot**: only sessions exceeding Lambda's 15-minute cap go to Fargate
+
+## Target Workload
+
+- **On-demand execution of the OpenClaw AI agent**: an LLM-based conversational agent
+- **Dual interface: Web UI (React SPA) + Telegram bot**, sharing the same session across both paths
+- **Multi-LLM support**: Claude / GPT / DeepSeek; only the model key changes
+- **Personal/side-project traffic levels**: tens to hundreds of requests per day
+
+## Architecture Summary
+
+```text
+User (Web or Telegram)
+        │
+        ▼
+   API Gateway
+        │
+        ▼
+   Lambda Container (primary, zero-idle)
+     ├─ ECS Fargate Spot (fallback for long sessions >15 min)
+     ├─ S3 (session persistence, web assets via CloudFront)
+     ├─ DynamoDB (session index)
+     └─ EventBridge (active hours pre-warming)
+```
+
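+Key components:
+
+- **API Gateway**: pay-per-request model; the key to eliminating the ALB fixed cost ($18-25/month).
+- **Lambda Container**: primary compute. Python plus large dependencies (LLM SDKs) packaged as a container image. A 1.35 s cold start is accepted.
+- **ECS Fargate Spot**: fallback compute, invoked only when Lambda's 15-minute or 6 MB payload limit would be exceeded. 70% cost reduction.
+- **DynamoDB**: session index and metadata, in On-Demand mode for zero idle cost.
+- **S3**: session payloads, conversation history, and web SPA assets, served via CloudFront.
+- **EventBridge Scheduler**: invokes Lambda periodically only during active hours (~$0.07/month), avoiding the cost of 24/7 Provisioned Concurrency.
+- **CloudFront**: static hosting for the web UI (S3 origin), serving the UI without EC2 or Lambda.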
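The Lambda-first, Fargate-Spot-fallback split can be illustrated with a small router. It assumes the caller can estimate a session's duration and payload size up front, which the case study does not describe; treat `choose_compute` and the binary-megabyte reading of the 6 MB limit as assumptions.

```python
from __future__ import annotations

LAMBDA_MAX_DURATION_S = 15 * 60          # Lambda's 15-minute execution cap
LAMBDA_SYNC_PAYLOAD_BYTES = 6 * 1024**2  # 6 MB synchronous payload limit (binary MB assumed)


def choose_compute(expected_duration_s: float, payload_bytes: int) -> str:
    """Lambda by default; fall back to Fargate Spot when a request would break a Lambda limit."""
    if expected_duration_s > LAMBDA_MAX_DURATION_S or payload_bytes > LAMBDA_SYNC_PAYLOAD_BYTES:
        return "fargate-spot"
    return "lambda"


print(choose_compute(expected_duration_s=40, payload_bytes=20_000))       # short chat turn
print(choose_compute(expected_duration_s=45 * 60, payload_bytes=20_000))  # long agent session
```

Keeping the limits as named constants makes the routing rule auditable against the AWS quota pages if those values change.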
+
+## Quotable Statements
+
+Each item can be **quoted directly** in the report's Target Architecture, Tradeoff Dossier, or Executive Summary cells:
+
+- "Replacing the ALB with API Gateway removes **$18-25/month** of fixed cost for an always-on API [Case: openclaw + Insight #O2]"
+- "Lambda Container cold start is around **1.35 s**; conversational UX is acceptable even without Provisioned Concurrency [Case: openclaw]"
+- "A dual setup of Lambda primary + Fargate Spot fallback cuts per-request cost by **70%** (only long sessions go to Fargate) [Insight #O1]"
+- "Pre-warming with EventBridge only during active hours achieves **0 s first-response** for ~$0.07/month extra [Insight #O3]"
+- "Persisting sessions to **S3/DynamoDB** from stateless Lambda keeps conversations continuous [Insight #O4]"
+- "A static web UI served by **CloudFront + S3** needs no server runtime, so idle cost is zero [Insight #O6]"
+- "**$0.23**/month within the Free Tier and about **$1-2/month** after it is exhausted; the basis for personal/side-project cost targets [Insight #O5]"
+
+## Applicable Workloads
+
+- **Always-on REST/GraphQL APIs** (<15 min per request, synchronous payload <6 MB)
+- **WebSocket chat** (API Gateway WebSocket + Lambda), processed per message
+- **Personal/side projects targeting a ~$1 Free Tier budget**
+- **Static web UI** (S3 + CloudFront): Next.js export / React SPA
+- **Dual interfaces** (Web + messenger bot running side by side)
+- **Low-traffic, bursty patterns** (idle → spike → idle, repeated)
+
+## Not Applicable
+
+- **Sustained high-traffic peaks**: Lambda's default concurrency limit of 1,000 [tradeoffs-compute.md §1.1]
+- **Long-lived persistent connections**: once a connection must outlive Lambda's 15-minute cap, Fargate is unavoidable [AWS Docs]
+- **Ultra-low-latency APIs with p99 <100 ms**: a 1.35 s cold start is a risk. Consider Provisioned Concurrency or SnapStart (Java/Python 3.12+)
+- **Large synchronous responses >6 MB**: Lambda's synchronous payload limit. Work around it with response streaming (200 MB) or S3 presigned URLs
+- **Production with strict SLAs**: a Spot-based Fargate fallback is unsuitable for strict-SLA workloads [AWS Docs — Batch Spot]
+
+## Cost Design Principles (Tier 2 specific)
+
+- **Zero idle + per-request billing** applied to every component [Insight #O5]
+- **Separate static and dynamic paths**: static UI on CloudFront+S3, only dynamic requests on Lambda [Insight #O6]
+- **Pre-warming cost < Provisioned Concurrency cost**: EventBridge $0.07/month vs. PC $15+/month [Insight #O3]
+- **The price of choosing Lambda Container**: no SnapStart support. For a Java stack, reconsider SnapStart with zip deployment [RESEARCH §13]
+
+## Cross-references (internal)
+
+- [patterns-tier2-api.md](patterns-tier2-api.md) Pattern 2.1 (API Gateway + Lambda), Pattern 2.2 (Fargate Spot fallback), Pattern 2.3 (session persistence)
+- [tradeoffs-compute.md](tradeoffs-compute.md) §1 (Lambda), §3 (Fargate + Fargate Spot)
+- [tradeoffs-data-layer.md](tradeoffs-data-layer.md) §2 (DynamoDB On-Demand)
+- [tradeoffs-event-driven.md](tradeoffs-event-driven.md) (EventBridge Scheduler)
+- [source-insights.md](source-insights.md) #O1-#O6: numbered insights
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/interview-bank.md b/plugins/workflow/skills/serverless-migration-advisor/references/interview-bank.md
new file mode 100644
index 0000000..98e24e6
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/interview-bank.md
@@ -0,0 +1,246 @@
+> **Snapshot date**: 2026-04-18
+> **Description**: AskUserQuestion question bank per Phase
+
+# Interview Bank — AskUserQuestion Templates
+
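+The question set invoked by the Phase 1 and Phase 3 interviews in SKILL.md. Every question is written in the AskUserQuestion tool format (JSON). Phase 1 always runs; Phase 3 branches into Branch A-E based on the Phase 1 answers.
+
+## Phase 1 — Workload Classification (always runs)
+
+Phase 1 Q1-Q4 cannot be skipped. The answers determine the Phase 3 branch.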
+
+### Q1 — Workload type
+
+```json
+{
+  "question": "Select the primary type of the current workload.",
+  "header": "Workload",
+  "multiSelect": false,
+  "options": [
+    { "label": "Batch/Training", "description": "ML training, large-scale data processing, periodic heavy computation" },
+    { "label": "Always-on API", "description": "REST/GraphQL/WebSocket APIs with continuous access" },
+    { "label": "ETL", "description": "Data pipelines, transform and load" },
+    { "label": "Event-driven", "description": "Triggered by queues, streams, or webhooks" }
+  ]
+}
+```
+
+(AskUserQuestion provides an "Other" option automatically. Choosing "Monolith" there routes to the Tier 3 track, Branch E.)
+
+### Q2 — Current environment
+
+```json
+{
+  "question": "What is the current execution environment?",
+  "header": "Environment",
+  "multiSelect": false,
+  "options": [
+    { "label": "EC2 24/7", "description": "Always-on instances" },
+    { "label": "EC2 + Auto Scaling", "description": "Horizontally scaled setup" },
+    { "label": "ECS/EKS", "description": "Container orchestration" },
+    { "label": "EMR/On-prem/Other", "description": "Hadoop, on-prem, another cloud" }
+  ]
+}
+```
+
+### Q3 — Execution frequency
+
+```json
+{
+  "question": "How often does it run?",
+  "header": "Frequency",
+  "multiSelect": false,
+  "options": [
+    { "label": "Continuous", "description": "24/7 processing" },
+    { "label": "Several times a day", "description": "Manual or automated runs a few times a day" },
+    { "label": "Once a day", "description": "Nightly batch, etc." },
+    { "label": "Weekly/monthly / event", "description": "Infrequent runs" }
+  ]
+}
+```
+
+### Q4 — Single run duration
+
+```json
+{
+  "question": "How long does a single run take?",
+  "header": "Duration",
+  "multiSelect": false,
+  "options": [
+    { "label": "<1 min", "description": "Fits Lambda or Step Functions Express" },
+    { "label": "1-15 min", "description": "Within the Lambda limit" },
+    { "label": "15 min-1 h", "description": "Batch, Fargate, Step Functions Standard" },
+    { "label": ">1 h", "description": "Long-running batch/training jobs" }
+  ]
+}
+```
+
+## Phase 3 — Branches by Phase 1
+
+Branches on the Phase 1 Q1 answer. Each branch below gives one representative question as full JSON and the rest as a minimal schema.
+
+### Branch A — Batch / Training
+
+Trigger: Phase 1 Q1 = "Batch/Training". Applies Tier 1 patterns.
+
+**A1. Spot interruption tolerance**
+
+```json
+{
+  "question": "Can the workload tolerate Spot interruptions?",
+  "header": "Spot tolerance",
+  "multiSelect": false,
+  "options": [
+    { "label": "Allowed (retryable)", "description": "Checkpointing or idempotent re-runs are possible" },
+    { "label": "Limited (checkpointed)", "description": "Can clean up within 2 minutes" },
+    { "label": "Not allowed", "description": "Continuity is mandatory; On-Demand recommended instead of Spot" }
+  ]
+}
+```
+
+**A2. Maximum acceptable wall-clock** (minimal: `<30 min` / `<2 h` / `<8 h` / `>8 h` options)
+
+**A3. Checkpoint implementation status** (minimal: `None` / `Partial` / `Complete` options; used to evaluate SageMaker Managed Spot's `MaxWaitTime ≤ 3600s` constraint)
+
+### Branch B — Always-on API
+
+Trigger: Phase 1 Q1 = "Always-on API". Applies Tier 2 patterns.
+
+**B1. Acceptable cold start p99**
+
+```json
+{
+  "question": "What is the acceptable cold start p99?",
+  "header": "Cold start",
+  "multiSelect": false,
+  "options": [
+    { "label": "<500ms", "description": "Requires Provisioned Concurrency + warm-up" },
+    { "label": "<2s", "description": "Achievable with default Lambda (Node/Python)" },
+    { "label": "<5s", "description": "Allows Lambda containers (Java/large dependencies)" },
+    { "label": "Tolerated", "description": "Cost optimization first; occasional latency accepted" }
+  ]
+}
+```
+
+**B2. Required persistent connection type** (minimal: `None` / `WebSocket` / `SSE` / `gRPC streaming` options; determines whether Fargate is required)
+
+**B3. Traffic pattern** (minimal: `Steady` / `Spiky bursts` / `Time-of-day pattern` / `Unpredictable` options; chooses between Provisioned Concurrency and EventBridge pre-warming [Insight #O3])
+
+### Branch C — ETL
+
+Trigger: Phase 1 Q1 = "ETL". Tier 1 patterns plus data-layer questions.
+
+**C1. Data volume per run**
+
+```json
+{
+  "question": "How much data does a single ETL run process?",
+  "header": "Data volume",
+  "multiSelect": false,
+  "options": [
+    { "label": "<1 GB", "description": "Fits Lambda + S3" },
+    { "label": "1-100 GB", "description": "AWS Batch (Fargate) recommended" },
+    { "label": "100 GB-1 TB", "description": "AWS Batch (EC2 Spot) + Glue" },
+    { "label": ">1 TB", "description": "EMR on EC2 Spot or Redshift Serverless" }
+  ]
+}
+```
+
+**C2. Job duration** (minimal: `<15 min` / `15 min-1 h` / `1-8 h` / `>8 h` options)
+
+**C3. State store** (minimal: `S3 only` / `S3 + RDS/Aurora` / `DynamoDB` / `Redshift/Glue Catalog` options)
+
+### Branch D — Event-driven
+
+Trigger: Phase 1 Q1 = "Event-driven".
+
+**D1. Event sources**
+
+```json
+{
+  "question": "What are the main event sources?",
+  "header": "Event sources",
+  "multiSelect": true,
+  "options": [
+    { "label": "HTTP webhook", "description": "API Gateway → Lambda" },
+    { "label": "S3 object PUT", "description": "S3 event → Lambda/SQS" },
+    { "label": "DB change", "description": "DynamoDB Streams / RDS CDC via DMS" },
+    { "label": "Schedule-based", "description": "EventBridge Scheduler" }
+  ]
+}
+```
+
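+**D2. Ordering requirement** (minimal: `Required (FIFO)` / `best-effort` / `Not needed` options; chooses SQS FIFO vs Standard, and Kinesis vs EventBridge)
+
+**D3. Duplicate tolerance** (minimal: `Idempotent` / `at-least-once OK` / `exactly-once required` options; chooses Step Functions Standard vs Express [RESEARCH §17])
+
+### Branch E — Monolith (Tier 3)
+
+Trigger: Phase 1 Q1 = "Monolith" (via the Other option).
+
+> ⚠️ On entering Branch E, first display the following message:
+> "Tier 3 migrations have no validated case study. Only principle-level guidance and official AWS links are provided. A pilot is mandatory."
+
+**E1. Service boundary identification stage**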
+
+```json
+{
+  "question": "How far along is bounded context (service boundary) identification?",
+  "header": "Boundaries",
+  "multiSelect": false,
+  "options": [
+    { "label": "Early (not yet analyzed)", "description": "Domain analysis needed first; outside this skill's scope" },
+    { "label": "Mid (2-3 candidates)", "description": "Pilot candidates can be selected" },
+    { "label": "Late (decomposition plan exists)", "description": "Per-service migration patterns can be applied" }
+  ]
+}
+```
+
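+**E2. Downtime window** (minimal: `Zero downtime required` / `A few minutes weekly` / `A few hours monthly` / `Weekend maintenance window` options)
+
+**E3. Data consistency requirement** (minimal: `strong consistency` / `eventual OK` / `mixed per domain` options; determines the data-movement strategy during Strangler Fig [patterns-tier3-data.md])
+
+## Common follow-ups (asked for every Tier)
+
+After the Phase 3 branch completes, ask the common questions before writing the report's Executive Summary and Tradeoff Dossier.
+
+### C_Cost. Monthly cost target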
+
+```json
+{
+  "question": "What is your target monthly AWS cost range?",
+  "header": "Budget",
+  "multiSelect": false,
+  "options": [
+    { "label": "<$10", "description": "Free Tier + Lambda-centric" },
+    { "label": "$10-100", "description": "Small commercial service" },
+    { "label": "$100-1000", "description": "Mid-size workload" },
+    { "label": ">$1000", "description": "Large scale; focus on ROI optimization" }
+  ]
+}
+```
+
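+### C_Compliance. Regulatory compliance
+
+(minimal: `PCI-DSS` / `HIPAA` / `GDPR` / `SOC2` / `None` options; multiSelect=true recommended. Used to evaluate the single-AZ nature of S3 Express One Zone, DynamoDB encryption choices, and VPC placement constraints.)
+
+### C_RTORPO. RTO/RPO
+
+(Free-form input recommended: collecting text fits better than AskUserQuestion's predefined options. Examples: "RTO 4 h / RPO 1 h", "RTO 5 min / RPO 0". Quote the answer directly in the availability section of the report's Tradeoff Dossier.)
+
+## Interview Skip Rules
+
+- **Phase 3 may be skipped if key information can be extracted from the IaC files and the user chooses "Use defaults"** (SPEC §6.3). In that case the defaults are:
+  - Spot tolerance: "Allowed" if checkpoints exist, otherwise "Not allowed"
+  - Cold start p99: <2s (Lambda default)
+  - Data volume: estimated from the container memory and working directory in the IaC
+- **Phase 1 Q1-Q4 cannot be skipped**: they are the basis for classification.
+- **A warning message is mandatory on entering Branch E**: the lack of a validated case study must always be shown.
+- **If PCI-DSS/HIPAA is selected in C_Compliance**: flag S3 Express One Zone and single-task Fargate Spot as candidates for automatic exclusion.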
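The skip-rule defaults and the compliance exclusions above are simple enough to encode directly. A minimal sketch using the English option labels; the function names are hypothetical, and the data-volume estimate from IaC is left out because it depends on the scan output.

```python
from __future__ import annotations


def default_phase3_answers(has_checkpoints: bool) -> dict[str, str]:
    """Phase 3 defaults applied when the user picks "Use defaults"."""
    return {
        "spot_tolerance": "Allowed" if has_checkpoints else "Not allowed",
        "cold_start_p99": "<2s",  # Lambda default
    }


def excluded_candidates(compliance: set[str]) -> list[str]:
    """Services to flag for automatic exclusion when PCI-DSS or HIPAA is selected."""
    if compliance & {"PCI-DSS", "HIPAA"}:
        return ["S3 Express One Zone", "single-task Fargate Spot"]
    return []


print(default_phase3_answers(has_checkpoints=False))
print(excluded_candidates({"HIPAA", "SOC2"}))
```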
+
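+## Question Design Principles (SPEC §6.1)
+
+- **2-4 options per question**, plus the "Other" option that AskUserQuestion provides automatically.
+- **Why/How in the description**: not just a label, but why the question matters for the decision.
+- **Staged branching**: Phase 1 answers determine the Phase 3 question set, avoiding unnecessary questions.
+- **Link to tradeoffs**: label each option so that it maps to a specific row of a tradeoff table in references/.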
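These principles can be checked automatically against the JSON templates in this file. The validator below is a hypothetical sketch: it only checks the required keys, the 2-4 option count, and that every option carries a description.

```python
import json

REQUIRED_KEYS = ("question", "header", "multiSelect", "options")


def validate_question(spec: dict) -> list[str]:
    """Check one AskUserQuestion template against the design principles above."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    options = spec.get("options", [])
    if not 2 <= len(options) <= 4:
        problems.append(f"expected 2-4 options, got {len(options)}")
    for option in options:
        if not option.get("description"):
            problems.append(f"option {option.get('label')!r} has no description")
    return problems


q3 = json.loads("""
{
  "question": "How often does it run?",
  "header": "Frequency",
  "multiSelect": false,
  "options": [
    { "label": "Continuous", "description": "24/7 processing" },
    { "label": "Once a day", "description": "Nightly batch, etc." }
  ]
}
""")
print(validate_question(q3))
```

Applied to the C_Compliance minimal schema, which lists five options, this check would report a violation of the 2-4 rule.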
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier1-batch.md b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier1-batch.md
new file mode 100644
index 0000000..29d13c2
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier1-batch.md
@@ -0,0 +1,306 @@
+> **Snapshot date**: 2026-04-18
+> **Tier**: 1 (batch/training/ETL), validated by serverless-autoresearch
+> **Description**: batch, training, and ETL migration patterns
+
+# Tier 1 Migration Patterns — Batch / Training / ETL
+
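+The patterns in this document are grounded in the [Case: autoresearch] validated case study: Pattern 1.1 directly, Patterns 1.2 and 1.3 only through its Spot/HUGI principles. See RESEARCH.md §10.1.1 for detailed numbers.
+
+## 1. When Tier 1?
+
+- Jobs are ephemeral: batch workloads that start and stop on every run
+- Interruptions are tolerated, or the job can resume from a checkpoint: the prerequisite for Spot savings
+- A cost vs. wall-clock tradeoff is acceptable: even if Spot waits lengthen runs somewhat, the billing savings outweigh it
+- Each run has independent state: no dependence on in-memory state from previous runs (Serverless Lens principle 3, "Share nothing")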
+
+## 2. Patterns
+
+### Pattern 1.1: EC2 long-running → SageMaker Managed Spot Training
+
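+**Applicability** (when to use):
+- ML training, retraining, and HPO workloads
+- Checkpointable (standard PyTorch/TF checkpoint APIs)
+- Interruptions tolerated (job-level retry acceptable)
+- GPUs are used for only part of each 24-hour day (no need for always-on GPUs)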
+
+**AS-IS:**
+```text
+[User]
+  |
+  v
+[EC2 H100 24/7]───(cron or manual)───▶ [Training script]
+   └─ idle 16h/day                          │
+                                            v
+                                       [S3 artifacts]
+
+~$2,160/month (H100 24h × $3/hr × 30d)
+```
+
+**TO-BE:**
+```text
+[User / Scheduler]
+  |
+  v
+[SageMaker Training Job (Managed Spot, MaxWait>MaxRuntime)]
+    Starting → Downloading → Training → (Interrupted → Starting) → Uploading
+                                  │
+                                  v
+                             [S3 checkpoints + artifacts]
+
+Billing = BillableTime (Training phase only)
+```
+
+**Key changes:**
+- Remove the always-on GPU instance and switch to per-training-job execution
+- Checkpoints sync to S3 automatically (built into SageMaker) [AWS Docs — SageMaker]
+- Pin the region based on the Spot placement score [Insight #1]
+- Enforce `MaxWaitTimeInSeconds > MaxRuntimeInSeconds` [AWS Docs]
+
+**Tradeoffs:**
+- **Pro**: billing shrinks to Training time only (HUGI principle), with up to 90% savings vs. on-demand [AWS Docs]
+- **Con 1**: ~3 min startup overhead per job, so a 5-min training job is 60% overhead [Insight #5]. Mitigate by merging experiments or using warm pools.
+- **Con 2**: GPU Spot quotas default to 0; request increases per region and instance type in advance [Insight #6]
+- **Con 3**: batch size and LR must be retuned per hardware; the optimum on L40S may differ from H100 [Insight #13]
+
+**Cost range** (examples; a validated case source is required):
+- 48 experiments for $3.94 (total cost of autoresearch's autonomous HPO) [Case: autoresearch]
+- Single H100 Spot run: 229 s, $0.16 (44-150× cheaper than Karpathy's original 8 h H100 run at $7-24) [Case: autoresearch] [Insight #15]
+- 4 parallel experiments: $0.066, about 10 min wall clock [Insight #8]
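The multiples quoted above follow directly from the listed figures. The sketch below recomputes them, along with the per-experiment cost and the AS-IS formula (24 h × $3/hr × 30 d); the variable names are illustrative.

```python
from __future__ import annotations


def cost_multiple(reference_usd: float, spot_usd: float) -> float:
    """How many times cheaper the Spot run is than the reference run."""
    return reference_usd / spot_usd


H100_SPOT_RUN_USD = 0.16        # single 229 s H100 Spot run
REFERENCE_8H_USD = (7.0, 24.0)  # original 8 h H100 run, low and high estimates
low, high = (cost_multiple(ref, H100_SPOT_RUN_USD) for ref in REFERENCE_8H_USD)

per_experiment_usd = 3.94 / 48     # 48-experiment autonomous HPO sweep
always_on_month_usd = 24 * 3 * 30  # AS-IS: H100 at $3/hr, 24 h/day, 30 days

print(f"{low:.1f}x-{high:.1f}x cheaper, ${per_experiment_usd:.3f}/experiment, "
      f"${always_on_month_usd:,}/month always-on")
```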
+
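+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Survey Spot capacity per region with `aws ec2 get-spot-placement-scores` [Insight #1]
+  - [ ] Request GPU Spot quota increases (multiple regions) [Insight #6]
+  - [ ] Confirm `disable_profiler=True` is applied (required on g7e) [Insight #7]
+  - [ ] Prepare the checkpoint S3 path and IAM role
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Dry run on a small dataset (1 epoch): confirm Spot allocation and checkpoint restore
+  - [ ] Validate the `MaxWaitTime` / `MaxRuntime` settings: enforce `MaxWaitTime > MaxRuntime`
+  - [ ] Measure the savings rate as `(1 - BillableTime/TrainingTime) × 100`
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Run once with the same hyperparameters as the existing EC2 job and compare
+  - [ ] Verify checkpoint recovery with an injected-interruption scenario (AWS FIS can do this)
+  - [ ] Re-validate batch size per hardware [Insight #13]
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Shut down the always-on EC2 instance (stop first, terminate after a period of observation)
+  - [ ] Wire up periodic runs with EventBridge Scheduler
+  - [ ] Build a CloudWatch dashboard for cost and success rate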
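For the Stage 0 placement-score survey, ranking regions can be kept separate from the API call. `best_regions` works on data shaped like the `SpotPlacementScores` list returned by boto3's `ec2.get_spot_placement_scores`; the `min_score` threshold of 7 is a hypothetical cut-off, and `fetch_scores` needs boto3 plus AWS credentials and ignores pagination.

```python
from __future__ import annotations


def best_regions(scores: list[dict], min_score: int = 7) -> list[str]:
    """Rank regions by their best placement score, dropping regions below min_score.

    `scores` follows the SpotPlacementScores shape: [{"Region": ..., "Score": ...}, ...].
    """
    best: dict[str, int] = {}
    for item in scores:
        region, score = item["Region"], item["Score"]
        best[region] = max(score, best.get(region, 0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [region for region, score in ranked if score >= min_score]


def fetch_scores(instance_types: list[str], regions: list[str], target_capacity: int = 1) -> list[dict]:
    """Live lookup (needs boto3 and AWS credentials; pagination via NextToken is ignored)."""
    import boto3  # imported lazily so best_regions works without boto3 installed

    ec2 = boto3.client("ec2")
    response = ec2.get_spot_placement_scores(
        InstanceTypes=instance_types,
        TargetCapacity=target_capacity,
        RegionNames=regions,
        SingleAvailabilityZone=False,
    )
    return response["SpotPlacementScores"]


SAMPLE_SCORES = [
    {"Region": "us-west-2", "Score": 9},
    {"Region": "us-east-1", "Score": 2},
    {"Region": "eu-west-1", "Score": 7},
]
print(best_regions(SAMPLE_SCORES))
```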
+
+**Delegation:** `sagemaker-spot-training` (exists). After generating the Phase 5 report, this skill delegates the how-to to the `sagemaker-spot-training` skill, which covers the implementation details (config.yaml structure, launch scripts, automatic quota checks).
+
+**Citations:**
+- [AWS Docs — SageMaker Managed Spot Training](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html)
+- [AWS Docs — Spot placement scores](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-placement-score.html)
+- [Insight #1]: placement score variation across regions
+- [Insight #5]: 3-minute startup overhead
+- [Insight #6]: GPU Spot quota management
+- [Insight #13]: interaction of batch size × LR × hardware
+- [Insight #15]: the 229 s / $0.16 validated case
+- [tradeoffs-compute.md §2]: quantitative limits of SageMaker Managed Spot
+- [tradeoffs-spot.md §4]: HUGI principle
+
+---
+
+### Pattern 1.2: EMR → AWS Batch + Spot (SPOT_CAPACITY_OPTIMIZED)
+
+**Applicability** (when to use):
+- Distributed ETL / Spark / batch processing workloads
+- Retryable, fault-tolerant units of work
+- Per-job runtime over 15 minutes (Lambda not possible)
+- No strong dependence on the existing EMR step interface
+
+**AS-IS:**
+```text
+[Data event / cron]
+       |
+       v
+[EMR cluster (always-on or on-demand)]
+  ├─ Master (m5.xlarge)
+  ├─ Core × N
+  └─ Task × M
+       │
+       ├─▶ [S3 input] ─▶ [Spark job] ─▶ [S3 output]
+       └─ billed for cluster idle time
+```
+
+**TO-BE:**
+```text
+[EventBridge / Step Functions]
+       |
+       v
+[AWS Batch job queue]
+  ├─ Compute env: EC2 Spot (SPOT_CAPACITY_OPTIMIZED) or FARGATE_SPOT
+  ├─ Compute env: On-Demand (fallback queue)
+  └─ retryStrategy.attempts ≥ 2, evaluateOnExit
+       │
+       ├─▶ [S3 input] ─▶ [Container job] ─▶ [S3 output]
+       └─ billing = container run time only
+```
+
+**Key changes:**
+- Always-on cluster → compute environments provisioned automatically per job
+- `SPOT_CAPACITY_OPTIMIZED` allocation strategy to minimize interruptions [AWS Docs — Batch Spot]
+- Two-tier setup: Spot compute environment first, On-Demand fallback queue [AWS Docs]
+- Containerize the Spark logic (EMR Serverless, or AWS Batch on ECS/Fargate)
+
+**Tradeoffs:**
+- **Pro**: no cluster idle billing, Spot savings, independent resources per job
+- **Con 1**: **loss of the EMR step interface**; existing Oozie/Airflow-on-EMR workflows must be rewritten in Step Functions
+- **Con 2**: jobs that depend on the Hive metastore / HDFS must be re-architected onto S3/Glue Catalog
+- **Con 3**: the `BEST_FIT` strategy is unsuitable because of its higher Spot interruption rate; only `SPOT_CAPACITY_OPTIMIZED` is recommended for Spot [AWS Docs]
+- **Con 4**: SIGTERM handlers and checkpointing must be built into the container image [AWS Docs]
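A minimal sketch of the SIGTERM-plus-checkpoint pattern from Con 4, assuming the container receives SIGTERM before it is killed. It writes the checkpoint to a local JSON file with an atomic rename, where a real job would upload to S3; `GracefulCheckpointer` and `run` are hypothetical names.

```python
from __future__ import annotations

import json
import os
import signal
import tempfile


class GracefulCheckpointer:
    """Persist progress when the container receives SIGTERM (local file stands in for S3)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state = {"step": 0}
        self.stop_requested = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: just flag the request; the work loop saves at a safe point.
        self.stop_requested = True

    def save(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)  # atomic rename: readers never see a half-written file

    def load(self) -> None:
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.state = json.load(f)


def run(ckpt: GracefulCheckpointer, total_steps: int) -> bool:
    """Resume from the last checkpoint; True if all steps finished, False if stopped early."""
    ckpt.load()
    while ckpt.state["step"] < total_steps:
        ckpt.state["step"] += 1  # placeholder for one unit of real work
        if ckpt.stop_requested:
            ckpt.save()
            return False
    ckpt.save()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = GracefulCheckpointer(os.path.join(d, "ckpt.json"))
        print("finished" if run(ckpt, 1_000) else "stopped early; checkpoint saved")
```

The handler only sets a flag because doing I/O inside a signal handler can interrupt a half-finished step; saving at the loop boundary keeps the checkpoint consistent.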
+
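+**Cost range** (examples; a validated case source is required):
+- This pattern was not validated directly by autoresearch or openclaw; it shares only the Spot savings principle. There is no specific benchmark, so the Spot HUGI principle from `[Case: autoresearch]` is applied under the same assumptions.
+- Reference: 4 parallel autoresearch experiments for $0.066 (shows the parallel character of a Batch-like pattern) [Case: autoresearch] [Insight #8]
+
+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Review the statefulness of the Spark jobs: remove HDFS dependencies and design around S3
+  - [ ] Build the container image (Spark + dependencies + SIGTERM handler)
+  - [ ] Confirm Spot quotas and instance-type diversification
+  - [ ] Migrate to the Glue Data Catalog (replacing the Hive metastore)
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Submit a Batch job with a small dataset
+  - [ ] Confirm the `SPOT_CAPACITY_OPTIMIZED` allocation strategy is applied
+  - [ ] Test `retryStrategy.attempts = 2-3` + `evaluateOnExit`
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Run in parallel with EMR and compare results for equivalence
+  - [ ] Validate interruption scenarios (FIS, or wait for a Spot capacity reclaim)
+  - [ ] Verify the priority order of the Spot queue and the On-Demand queue
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Scale down the EMR cluster in stages
+  - [ ] Rewire the job chain with Step Functions
+  - [ ] Cost dashboard (Spot savings rate, retry count)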
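The `retryStrategy` from the TO-BE diagram can be expressed as job-definition JSON. The sketch builds it in Python and approximates how `evaluateOnExit` rules are matched (the first rule whose conditions all match wins; Batch retries when no rule matches). The `Host EC2*` status-reason pattern for reclaimed Spot hosts and the rule set itself are assumptions to verify against the AWS Batch documentation.

```python
from __future__ import annotations

import fnmatch
import json

# Hypothetical retryStrategy: retry when the Spot host was reclaimed, exit otherwise.
RETRY_STRATEGY = {
    "attempts": 3,
    "evaluateOnExit": [
        {"onStatusReason": "Host EC2*", "action": "RETRY"},
        {"onReason": "*", "action": "EXIT"},
    ],
}


def decide(strategy: dict, status_reason: str = "", reason: str = "", exit_code: int | None = None) -> str:
    """Approximate evaluateOnExit matching: first rule whose conditions all match wins."""
    for rule in strategy.get("evaluateOnExit", []):
        checks = []
        if "onStatusReason" in rule:
            checks.append(fnmatch.fnmatchcase(status_reason, rule["onStatusReason"]))
        if "onReason" in rule:
            checks.append(fnmatch.fnmatchcase(reason, rule["onReason"]))
        if "onExitCode" in rule:
            checks.append(exit_code is not None and fnmatch.fnmatchcase(str(exit_code), rule["onExitCode"]))
        if checks and all(checks):
            return rule["action"]
    return "RETRY"  # no rule matched: retried, up to `attempts` in total


print(json.dumps({"retryStrategy": RETRY_STRATEGY}, indent=2))
```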
+
+**Delegation:** none; no dedicated skill exists yet. For implementation, see [AWS Docs — AWS Batch with Spot](https://docs.aws.amazon.com/batch/latest/userguide/spot.html).
+
+**Citations:**
+- [AWS Docs — AWS Batch with Spot](https://docs.aws.amazon.com/batch/latest/userguide/spot.html)
+- [AWS Docs — Batch allocation strategies](https://docs.aws.amazon.com/batch/latest/userguide/allocation-strategies.html)
+- [AWS Docs — EMR Serverless (alternative)](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/what-is-emr-serverless.html)
+- [tradeoffs-compute.md §5]: comparison of Batch allocation strategies
+- [tradeoffs-spot.md §5]: Spot Do/Don't
+- [Insight #8]: efficiency of parallel execution
+
+---
+
+### Pattern 1.3: Cron on EC2 → EventBridge Scheduler + Lambda / Batch
+
+**Applicability** (when to use):
+- Periodic jobs (daily/hourly/rate expressions)
+- Legacy scheduled jobs that depend on an EC2 crontab
+- Per-job runtime:
+  - **<15 min → Lambda** (the default concurrency of 1,000 is usually enough)
+  - **≥15 min → AWS Batch** (see Pattern 1.2)
+- State can be delegated to external stores (S3/DynamoDB/RDS)
+
+**AS-IS:**
+```text
+[EC2 (t3.medium always-on)]
+  └─ /etc/crontab:
+       0 2 * * *  run-daily-etl.sh
+       */15 * * * *  health-check.sh
+       0 0 * * 0  weekly-report.sh
+
+  ~$30/month EC2, mostly idle
+```
+
+**TO-BE:**
+```text
+[EventBridge Scheduler]
+  ├─ schedule: cron(0 2 * * ? *) ─▶ [Lambda: daily-etl]
+  ├─ schedule: rate(15 minutes) ──▶ [Lambda: health-check]
+  └─ schedule: cron(0 0 ? * SUN *) ▶ [Batch job: weekly-report]  (>15 min)
+
+  Billing = execution time only (Lambda) · container time only (Batch)
+```
+
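+**Key changes:**
+- Convert crontab syntax to EventBridge rate/cron expressions [AWS Docs — EventBridge Scheduler]
+- Short jobs go to Lambda and long jobs to Batch, with the 15-minute cap as the dividing line [tradeoffs-compute.md §1.2]
+- Remove the always-on EC2 instance to reach zero idle
+- Consolidate logs and metrics in CloudWatch (removing the dependence on local log files)
+
+**Tradeoffs:**
+- **Pro**: no EC2 fixed cost, job isolation (one failing job does not affect the others), built-in automatic retries and DLQs
+- **Con 1**: some cron syntax differs. AWS cron has 6 fields (`minutes hours day-of-month month day-of-week year`) instead of the standard 5, and one of day-of-month or day-of-week must be the `?` placeholder. [AWS Docs]
+- **Con 2**: jobs over 15 minutes cannot run on Lambda; decompose them with Step Functions or delegate them to Batch [AWS Docs — Lambda quotas]
+- **Con 3**: implicit shared state between jobs (local files, environment variables) must be externalized to S3/DynamoDB (Serverless Lens principle 3)
+- **Con 4**: `EventBridge Scheduler` is the successor to scheduled rules; choose Scheduler for new designs and don't confuse the two [AWS Docs]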
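The syntax differences in Con 1 can be handled by a small converter for simple crontab lines. This sketch covers only plain values, `*`, `*/N` steps, and a single numeric day-of-week. It maps day-of-week numbers to the `SUN`-`SAT` names to sidestep numbering differences, and it rejects lines that restrict both day fields, which AWS cron cannot express. `crontab_to_eventbridge` is a hypothetical helper.

```python
from __future__ import annotations

# Standard crontab day-of-week numbers (0 and 7 are Sunday) -> AWS day names.
DOW_NAMES = {"0": "SUN", "1": "MON", "2": "TUE", "3": "WED", "4": "THU", "5": "FRI", "6": "SAT", "7": "SUN"}


def _step(field: str, start: str) -> str:
    """Rewrite '*/N' as '<start>/N' so increments have an explicit starting value."""
    return f"{start}/{field[2:]}" if field.startswith("*/") else field


def crontab_to_eventbridge(line: str) -> str:
    """Convert a simple 5-field crontab schedule into an EventBridge 6-field cron() expression."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 crontab fields: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = fields
    minute, hour = _step(minute, "0"), _step(hour, "0")
    dom, month = _step(dom, "1"), _step(month, "1")
    if dow == "*":
        dow = "?"  # day-of-week unused: AWS requires '?' in one of the two day fields
    else:
        if dow not in DOW_NAMES:
            raise ValueError(f"unsupported day-of-week {dow!r}: only a single number 0-7 is handled")
        if dom != "*":
            raise ValueError("AWS cron cannot restrict both day-of-month and day-of-week")
        dow, dom = DOW_NAMES[dow], "?"
    return f"cron({minute} {hour} {dom} {month} {dow} *)"  # trailing '*' = any year


for entry in ["0 2 * * *", "*/15 * * * *", "0 0 * * 0"]:
    print(f"{entry:14} -> {crontab_to_eventbridge(entry)}")
```

For `*/15 * * * *`, the TO-BE diagram uses the simpler `rate(15 minutes)` instead; a rate expression may not line up with :00, :15, :30, :45 the way the cron form does.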
+
+**비용 범위** (예시, 검증 사례 출처 필수):
+- openclaw의 EventBridge scheduled pre-warming: 월 **~$0.07** 추가 비용으로 first-response 콜드스타트 페널티 제거 [Case: openclaw] [Insight #O3]
+- 본 스케줄링 패턴 자체는 autoresearch/openclaw가 직접 벤치마크하지 않았으나, openclaw의 $0.07/월 수치는 EventBridge + Lambda 조합의 참조 단가로 활용 가능.
+
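+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Inventory the current crontab (schedule, runtime, dependencies)
+  - [ ] Assign each job to Lambda or Batch using the 15-minute cutoff
+  - [ ] Design the move of local-file and environment dependencies to S3/DynamoDB
+  - [ ] Define least-privilege IAM roles (one isolated role per job)
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Move a single low-risk job to EventBridge Scheduler + Lambda
+  - [ ] Verify execution logs and failure alarms
+  - [ ] Check cron expression differences (AWS 6-field rules)
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Migrate the top 5 jobs and comment them out of the EC2 crontab (prevents double execution)
+  - [ ] Move long-running (>15 min) jobs to Batch (apply Pattern 1.2)
+  - [ ] Confirm DLQ, retry, and alarm wiring
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Terminate the EC2 instance once the remaining jobs are migrated
+  - [ ] Operations dashboard for ongoing maintenance (scheduler success rate, delay)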
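+
+For the retry/DLQ wiring in Stages 1-2, a hedged sketch that assembles the keyword arguments for boto3's EventBridge Scheduler `create_schedule` call. `build_schedule_request` is a hypothetical helper, and the retry limits are placeholder defaults; the helper only builds the request so it can be reviewed or unit-tested before anything is sent to AWS.
+
+```python
+def build_schedule_request(
+    name: str,
+    expression: str,
+    target_arn: str,
+    role_arn: str,
+    dlq_arn: str,
+    max_retries: int = 2,
+    max_event_age_s: int = 3600,
+    timezone: str = "UTC",
+) -> dict:
+    """Assemble kwargs for boto3.client("scheduler").create_schedule(**kwargs)."""
+    if not expression.startswith(("cron(", "rate(", "at(")):
+        raise ValueError("expected a cron(...), rate(...) or at(...) expression")
+    return {
+        "Name": name,
+        "ScheduleExpression": expression,
+        "ScheduleExpressionTimezone": timezone,
+        # OFF: invoke at the scheduled time, with no flexible window.
+        "FlexibleTimeWindow": {"Mode": "OFF"},
+        "Target": {
+            "Arn": target_arn,    # Lambda function ARN (or another supported target)
+            "RoleArn": role_arn,  # IAM role Scheduler assumes to invoke the target
+            "RetryPolicy": {
+                "MaximumRetryAttempts": max_retries,
+                "MaximumEventAgeInSeconds": max_event_age_s,
+            },
+            # Invocations that still fail after retries go to this SQS queue.
+            "DeadLetterConfig": {"Arn": dlq_arn},
+        },
+    }
+```
+
+The returned dict would be sent with `boto3.client("scheduler").create_schedule(**request)`; that step needs AWS credentials and is not shown.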
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html) and [AWS Docs — Lambda Scheduled Events](https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents.html).
+
+**Citations:**
+- [AWS Docs — EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html)
+- [AWS Docs — EventBridge cron/rate expressions](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-scheduled-rule-pattern.html)
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [tradeoffs-compute.md §1.2]: Lambda 15-minute limit
+- [tradeoffs-event-driven.md §1]: EventBridge Scheduler vs. scheduled rules
+- [Insight #O3]: EventBridge scheduled pre-warming
+
+---
+
+## 3. Anti-patterns
+
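+**A1. Forcing Spot onto workloads that cannot tolerate interruption**
+- Batch jobs with compliance or zero-downtime requirements (financial settlement, regulatory reporting) are a poor fit for Spot
+- The AWS Batch docs explicitly list production APIs, databases, and strict-SLA workloads as ones to keep off Spot [AWS Docs — Batch with Spot]
+
+**A2. Pushing MaxWait past 60 minutes without checkpoints**
+- For SageMaker built-in and Marketplace algorithms that don't checkpoint, `MaxWaitTimeInSeconds` is capped at 3600 s
+- Launching a long Spot job without checkpoints invites the worst case: after two interruptions it ends as `Stopped: MaxWaitTimeExceeded`, and you are still billed for its BillableTime [AWS Docs — SageMaker]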
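+
+A minimal pre-flight check for A2, assuming the job is described by the `EnableManagedSpotTraining`, `StoppingCondition`, and `CheckpointConfig` fields of a SageMaker `CreateTrainingJob` request. `check_spot_training` is a hypothetical helper. It encodes two rules: `MaxWaitTimeInSeconds` must be at least `MaxRuntimeInSeconds`, and without a checkpoint location the wait is kept at or below 3600 s. The second rule is deliberately stricter than the documented cap, which applies only to built-in and Marketplace algorithms.
+
+```python
+MAX_WAIT_WITHOUT_CHECKPOINT_S = 3600
+
+
+def check_spot_training(request: dict) -> list:
+    """Return the problems found in a CreateTrainingJob-style request dict."""
+    problems = []
+    if not request.get("EnableManagedSpotTraining"):
+        return problems  # On-Demand job: nothing to check here
+
+    stopping = request.get("StoppingCondition", {})
+    max_runtime = stopping.get("MaxRuntimeInSeconds")
+    max_wait = stopping.get("MaxWaitTimeInSeconds")
+    if max_wait is None:
+        problems.append("Spot training requires StoppingCondition.MaxWaitTimeInSeconds")
+    elif max_runtime is not None and max_wait < max_runtime:
+        problems.append("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
+
+    has_checkpoint = bool(request.get("CheckpointConfig", {}).get("S3Uri"))
+    if not has_checkpoint and max_wait is not None and max_wait > MAX_WAIT_WITHOUT_CHECKPOINT_S:
+        # Any job without checkpoints restarts from zero after an interruption.
+        problems.append("no CheckpointConfig.S3Uri: keep MaxWaitTimeInSeconds <= 3600 or add checkpoints")
+    return problems
+```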
+
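+**A3. Pinning Spot to a single AZ**
+- Spot capacity varies drastically by Region and AZ, so a single AZ is an anti-pattern for Spot availability [Insight #1]
+- The minimum requirement is the trio of multiple AZs + multiple instance types + ASG / `SPOT_CAPACITY_OPTIMIZED` [tradeoffs-spot.md §5]
+
+**A4. `BEST_FIT` allocation strategy with Spot**
+- `BEST_FIT` picks only the lowest-cost instance type that fits the job, which narrows the capacity pool and raises the interruption rate [AWS Docs — Batch allocation strategies]
+- The default recommendation for Spot workloads is `SPOT_CAPACITY_OPTIMIZED` [tradeoffs-compute.md §5.1]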
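+
+A3 and A4 can be checked mechanically before a compute environment is created. A hedged sketch over the `computeResources` block of an AWS Batch `CreateComputeEnvironment` request; the thresholds (subnets in at least two AZs, at least two instance types) encode the minimum trio above and are this document's rule of thumb, not AWS-enforced limits. The subnet-to-AZ mapping is a hypothetical input, since resolving it needs an EC2 lookup.
+
+```python
+def lint_spot_compute_resources(resources: dict, subnet_azs: dict) -> list:
+    """Flag anti-patterns A3/A4 in an AWS Batch computeResources dict.
+
+    subnet_azs maps subnet IDs to AZs (hypothetical input; resolve it via EC2).
+    """
+    issues = []
+    ce_type = resources.get("type")
+    if ce_type not in ("SPOT", "FARGATE_SPOT"):
+        return issues  # On-Demand environments are out of scope here
+
+    if ce_type == "SPOT":
+        # Batch falls back to BEST_FIT when allocationStrategy is omitted.
+        strategy = resources.get("allocationStrategy", "BEST_FIT")
+        if strategy == "BEST_FIT":
+            issues.append("A4: BEST_FIT with Spot; prefer SPOT_CAPACITY_OPTIMIZED")
+        if len(resources.get("instanceTypes", [])) < 2:
+            issues.append("A3: list at least two instance types")
+
+    azs = {subnet_azs[s] for s in resources.get("subnets", []) if s in subnet_azs}
+    if len(azs) < 2:
+        issues.append("A3: subnets span fewer than two AZs")
+    return issues
+```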
+
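+**A5. Flooding the queue with very short jobs while ignoring startup overhead**
+- With ~3 minutes of overhead per SageMaker job, a 5-minute job spends 60% of its time on overhead [Insight #5]
+- Merge experiments into a single job, or use warm pools, to cut the cost of fragmentation
+
+**A6. Setting a maximum price by hand**
+- A max price below the On-Demand price increases interruption frequency; keeping the default (capped at the On-Demand price) is best [AWS Docs — EC2 Spot interruptions] [tradeoffs-spot.md §5]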
+
+---
+
+## 4. Citations
+
+- [AWS Docs — SageMaker Managed Spot](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html)
+- [AWS Docs — AWS Batch Spot](https://docs.aws.amazon.com/batch/latest/userguide/spot.html)
+- [AWS Docs — Batch allocation strategies](https://docs.aws.amazon.com/batch/latest/userguide/allocation-strategies.html)
+- [AWS Docs — EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html)
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [AWS Docs — EC2 Spot interruptions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html)
+- [AWS Docs — Spot placement scores](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-placement-score.html)
+- [AWS Docs — EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/what-is-emr-serverless.html)
+- [Case: autoresearch]: 48 experiments for $3.94; an H100 run of 229 s for $0.16
+- [Insight #1, #5, #6, #7, #8, #13, #15]: numbered autoresearch insights
+- [Insight #O3]: openclaw EventBridge scheduled pre-warming
+- Cross-refs: [tradeoffs-compute.md §1.2, §2, §5], [tradeoffs-spot.md §4, §5]
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier2-api.md b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier2-api.md
new file mode 100644
index 0000000..d6e1155
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier2-api.md
@@ -0,0 +1,407 @@
+> **Snapshot date**: 2026-04-18
+> **Tier**: 2 (always-on API/web), validated with serverless-openclaw
+> **Description**: migration patterns for always-on APIs and web apps
+
+# Tier 2 Migration Patterns — Always-on API / Web
+
+Tier 2 migrations validated by [Case: openclaw]. For detailed principles, see RESEARCH.md §10.2.1 (O1-O6).
+
+## 1. When Tier 2?
+
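+- Traffic is bursty or periodic: spiky or concentrated in business hours rather than a sustained peak
+- Request handling time <5 min, which Lambda or Fargate Spot can absorb
+- Cold starts are tolerable: p99 <2 s is acceptable, or can be mitigated with SnapStart, Provisioned Concurrency, or pre-warming
+- Persistent DB/cache connections are not required, or can be replaced with RDS Proxy or DynamoDB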
+
+## 2. Patterns
+
+### Pattern 2.1: ALB + EC2 → API Gateway + Lambda
+
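+**Applicability** (when to use):
+- REST APIs with short request/response cycles (request and response each <6 MB) [tradeoffs-compute.md §1.1]
+- Execution time <15 min [AWS Docs — Lambda quotas]; note that behind API Gateway the client-facing limit is the integration timeout (29 s by default for REST APIs), not 15 min
+- Spiky / zero-idle traffic (active for only part of the 24 hours)
+- No need for long-lived connections such as WebSockets or gRPC streaming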
+
+**AS-IS:**
+```text
+[Client]
+   |
+   v
+[ALB ($18-25/month fixed)] ─▶ [EC2 ASG (min 1, always on)]
+                                │
+                                ├─▶ [RDS]
+                                └─▶ logs/local state
+
+  ALB + EC2 fixed costs accrue even while EC2 is idle
+```
+
+**TO-BE:**
+```text
+[Client]
+   |
+   v
+[API Gateway (per-request billing)] ─▶ [Lambda (zero-idle)]
+                                       │
+                                       ├─▶ [DynamoDB]
+                                       └─▶ [S3 / RDS Proxy]
+
+  No invocations, no compute charges ($0)
+```
+
+**Key changes:**
+- Remove the ALB fixed cost ($18-25/month) in favor of API Gateway per-request billing [Case: openclaw] [Insight #O2]
+- Always-on EC2 → zero-idle Lambda
+- Externalize local state (sessions, files) to DynamoDB + S3 [Insight #O4]
+- Route database connections through RDS Proxy (direct connections risk a connection storm) [tradeoffs-data-layer.md §1]
+
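+**Trade-offs:**
+- **Pros**: zero idle, automatic scaling, no server operations overhead
+- **Con 1 (new risk)**: **cold starts**: 1.35 s for a Lambda container image [Case: openclaw], so unsuitable when the SLA requires p99 <100 ms. Mitigate with Provisioned Concurrency, or with SnapStart after switching to zip packaging [AWS Docs — Lambda SnapStart]
+- **Con 2**: **6 MB** synchronous payload limit; route large responses through response streaming (200 MB) or S3 presigned URLs [tradeoffs-compute.md §1.1]
+- **Con 3**: the default Lambda concurrency quota (1,000) and API Gateway's default throttle (10,000 RPS) are in different units, since required concurrency ≈ RPS × average duration; request an increase before load testing [tradeoffs-compute.md §1.2]
+- **Con 4**: code that depends on local state must become stateless; move sessions and temp files to DynamoDB/S3 [Insight #O4]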
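+
+Because Con 3 compares different units, it helps to convert RPS into concurrency first. A back-of-the-envelope sketch using the usual steady-state approximation (Little's law: concurrency ≈ arrival rate × average duration); the 1,000 default quota comes from the text above, while the 20% headroom is an arbitrary illustrative margin.
+
+```python
+import math
+
+
+def required_concurrency(rps: float, avg_duration_s: float, headroom: float = 0.2) -> int:
+    """Steady-state Lambda concurrency estimate: RPS x duration, plus headroom."""
+    return math.ceil(rps * avg_duration_s * (1 + headroom))
+
+
+def needs_quota_increase(rps: float, avg_duration_s: float, quota: int = 1000) -> bool:
+    """True when the estimate exceeds the account concurrency quota."""
+    return required_concurrency(rps, avg_duration_s) > quota
+```
+
+For example, 500 RPS at 250 ms needs only ~150 concurrent executions, whereas 2,000 RPS at 800 ms needs ~1,900 and would be throttled at the default quota long before API Gateway's 10,000 RPS limit matters.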
+
+**Cold-start mitigations:**
+- **EventBridge scheduled pre-warming**: during active hours openclaw invokes its container on a cron schedule, eliminating first-response cold starts for ~$0.07/month extra [Case: openclaw] [Insight #O3]
+- **SnapStart** (Java 11+, Python 3.12+, .NET 8+; zip packages only): free for Java; Python/.NET pay for snapshot caching (billed for at least 3 hours) and restores [AWS Docs — Lambda SnapStart]
+- **Provisioned Concurrency**: for strict p99 SLAs; billed continuously, but effectively eliminates cold starts
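+
+A minimal sketch of the function side of pre-warming, assuming the EventBridge schedule sends a fixed payload such as `{"warmup": true}` (a hypothetical convention, not an AWS-defined field). The handler returns before doing any real work for those pings, so each one bills only a few milliseconds while the execution environment and its module-level initialization stay warm.
+
+```python
+import json
+import time
+
+# Module-level code runs once per execution environment (the cold-start cost).
+_INIT_AT = time.time()
+
+
+def handler(event, context):
+    # Scheduled warm-up ping (hypothetical payload): return immediately.
+    if isinstance(event, dict) and event.get("warmup") is True:
+        return {"warmed": True, "env_age_s": round(time.time() - _INIT_AT, 3)}
+
+    # Normal API Gateway proxy request path (simplified).
+    body = {"message": "ok", "path": event.get("path")}
+    return {
+        "statusCode": 200,
+        "headers": {"Content-Type": "application/json"},
+        "body": json.dumps(body),
+    }
+```
+
+One ping keeps roughly one execution environment warm; concurrent bursts beyond that still cold-start.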
+
+**Cost range** (illustrative; must cite a validated case):
+- openclaw's total monthly cost: **$1-2/month** ($0.23 on the Free Tier) [Case: openclaw] [Insight #O5]
+- Savings from removing the ALB fixed cost: **$18-25/month** [Case: openclaw] [Insight #O2]
+- Added cost of pre-warming: **~$0.07/month** [Case: openclaw] [Insight #O3]
+
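+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Measure the request/response payload size distribution (find paths over 6 MB)
+  - [ ] Request a Lambda concurrency quota increase (if needed)
+  - [ ] Scan for local-state dependencies and design their move to DynamoDB/S3
+  - [ ] Map authentication, CORS, and API key policies onto API Gateway
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Move a single low-traffic endpoint to Lambda
+  - [ ] Measure cold-start p50/p99 and compare with the SLA
+  - [ ] Replace direct RDS connections with RDS Proxy
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Shift X% of traffic with Route 53 weighted routing between the ALB and API Gateway endpoints
+  - [ ] CloudWatch dashboard (cold-start frequency, p95/p99, error rate)
+  - [ ] Apply EventBridge pre-warming and measure the first-response improvement
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Move 100% of traffic; decommission ALB + EC2 later (keep a rollback window)
+  - [ ] Cost comparison report (quantify the savings from removing the ALB)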
+
+**Delegation:** none; no dedicated `lambda-deployment` skill exists. For implementation, see [AWS Docs — API Gateway + Lambda](https://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-with-lambda-integration.html).
+
+**Citations:**
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [AWS Docs — API Gateway + Lambda](https://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-with-lambda-integration.html)
+- [AWS Docs — Lambda SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html)
+- [Insight #O2]: replacing the ALB fixed cost with API Gateway
+- [Insight #O3]: EventBridge pre-warming
+- [Insight #O4]: DynamoDB + S3 session persistence
+- [Insight #O5]: Free-tier-first cost target
+- [tradeoffs-compute.md §1]: Lambda trade-offs
+- [tradeoffs-data-layer.md §1]: RDS Proxy
+
+---
+
+### Pattern 2.2: Always-on ECS service → Fargate + Fargate Spot mix
+
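+**Applicability** (when to use):
+- Container services that must run continuously (work beyond Lambda's 15-minute limit, or long-lived connections)
+- Interruptions are **tolerable** (retries, stateless design)
+- Sustained traffic where part of the capacity can tolerate interruption
+- Run as an ECS service (`desiredCount ≥ 2`)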
+
+**AS-IS:**
+```text
+[ALB / API Gateway]
+   |
+   v
+[ECS Service (all FARGATE)]
+  ├─ Task A (FARGATE) ─┐
+  ├─ Task B (FARGATE) ─┼─▶ all billed On-Demand
+  └─ Task C (FARGATE) ─┘
+
+  Average CPU 30% → paying for 70% idle capacity
+```
+
+**TO-BE:**
+```text
+[ALB / API Gateway]
+   |
+   v
+[ECS Service (mixed capacityProviderStrategy)]
+  ├─ FARGATE base=1, weight=1 ─▶ baseline task + ~1/5 of the rest
+  └─ FARGATE_SPOT weight=4    ─▶ ~4/5 of the rest (Spot, ~70% savings)
+       │
+       └─ SIGTERM handler + 2-minute drain + external state
+
+  desiredCount ≥ 2 at the service level
+```
+
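+**Key changes:**
+- Mix On-Demand and Spot with `capacityProviderStrategy`, using `base` on `FARGATE` to guarantee a minimum number of On-Demand tasks [AWS Docs — Fargate capacity providers]
+- Move most capacity to Spot (~70% compute savings) [Case: openclaw] [Insight #O1]
+- Never run Spot as a single task: a service with `desiredCount ≥ 2` is the minimum [tradeoffs-compute.md §3.1]
+- Implement graceful drain within 2 minutes in a SIGTERM handler [tradeoffs-compute.md §3.2]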
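+
+A hedged sketch of the strategy above as the `capacityProviderStrategy` list that ECS `CreateService`/`UpdateService` accept. `build_strategy` is a hypothetical helper; it enforces this pattern's rules (service-level `desiredCount ≥ 2`, an On-Demand baseline via `base`) and sets `base` on only one provider, as ECS requires.
+
+```python
+def build_strategy(desired_count: int, od_base: int = 1, od_weight: int = 1, spot_weight: int = 4) -> list:
+    """Return a capacityProviderStrategy mixing FARGATE (baseline) and FARGATE_SPOT."""
+    if desired_count < 2:
+        raise ValueError("Fargate Spot needs a service with desiredCount >= 2")
+    if od_base < 1:
+        raise ValueError("set base >= 1 on FARGATE so some capacity never depends on Spot")
+    if od_base >= desired_count:
+        raise ValueError("base covers every task; no Spot capacity would be used")
+    return [
+        # 'base' tasks always run on FARGATE (On-Demand); only one provider may set base.
+        {"capacityProvider": "FARGATE", "base": od_base, "weight": od_weight},
+        # Tasks beyond the base are split by weight: 1:4 puts ~80% of them on Spot.
+        {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
+    ]
+```
+
+The list would be passed as `capacityProviderStrategy=` to `boto3.client("ecs").update_service(...)`. Raising `spot_weight` later (Stage 3) shifts more of the non-base tasks onto Spot without touching the baseline.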
+
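+**Trade-offs:**
+- **Pros**: ~70% compute savings [Insight #O1]; existing container images are reused as-is
+- **Con 1**: **no automatic On-Demand fallback**: when Spot capacity is short, tasks just wait to start [AWS Docs]. A `base` on the `FARGATE` provider is required for baseline capacity, because `weight` alone only sets a ratio
+- **Con 2 (anti-pattern)**: a single Fargate Spot task means a complete outage until capacity returns, an availability risk [tradeoffs-compute.md §3]
+- **Con 3**: a SIGTERM handler is mandatory; failing to clean up within 2 minutes risks data corruption [AWS Docs]
+- **Con 4**: no local state; code relying on in-memory sessions or temp files must be made stateless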
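+
+A minimal Python sketch of the SIGTERM handling Con 3 requires, assuming a worker loop that pulls jobs from some source. On Fargate Spot reclamation the task receives SIGTERM and is killed after `stopTimeout` (at most 120 s), so the handler only sets a flag: the loop stops taking new work, lets the in-flight item finish, and flushes state. `GracefulWorker`, the job source, and `flush_state` are hypothetical stand-ins for your own queue and external store (DynamoDB/S3).
+
+```python
+import signal
+import threading
+
+
+class GracefulWorker:
+    def __init__(self, jobs, flush_state):
+        self._jobs = iter(jobs)           # hypothetical job source
+        self._flush_state = flush_state   # e.g. write progress to DynamoDB/S3
+        self._stopping = threading.Event()
+        self.processed = []
+
+    def handle_sigterm(self, signum, frame):
+        # Keep the handler tiny: only record that shutdown was requested.
+        self._stopping.set()
+
+    def install(self):
+        # signal.signal must be called from the main thread.
+        signal.signal(signal.SIGTERM, self.handle_sigterm)
+
+    def run(self):
+        for job in self._jobs:
+            if self._stopping.is_set():
+                break                     # stop pulling new work
+            self.processed.append(job)    # stand-in for real processing
+        # Runs on normal completion and after SIGTERM alike.
+        self._flush_state(self.processed)
+```
+
+In the container entrypoint you would call `worker.install()` before `worker.run()`; the drain window only reaches two minutes if `stopTimeout` is raised toward 120 s.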
+
+**openclaw's design pattern (dual-compute):**
+- **Primary**: Lambda container image (zero idle, 1.35 s cold start)
+- **Fallback**: ECS Fargate Spot (requests over 15 minutes or under heavy load, ~70% savings) [Case: openclaw] [Insight #O1]
+- Use this pattern when converting an existing ECS service directly to a Fargate Spot mix. openclaw's dual-compute assumes a Lambda primary, so it can be combined with Pattern 2.1.
+
+**Cost range** (illustrative; must cite a validated case):
+- Fargate Spot compute savings: **~70%** vs. On-Demand [Case: openclaw] [Insight #O1]
+- openclaw's total monthly cost is $1-2 (Lambda primary + Fargate Spot fallback) [Insight #O5]
+
+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Verify tasks are stateless (remove in-memory session and local-file dependencies)
+  - [ ] Implement a SIGTERM handler (drain within 2 minutes + flush external state)
+  - [ ] Set `stopTimeout` (default 30 s, max 120 s)
+  - [ ] Confirm `desiredCount ≥ 2`
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Add a `FARGATE_SPOT` weight on a non-production cluster
+  - [ ] Exercise the SIGTERM path with AWS FIS or manual task stops
+  - [ ] Wire an alarm on EventBridge `ECS Task State Change` events with `stopCode: "SpotInterruption"` [tradeoffs-spot.md §6.2]
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Apply `FARGATE` base=1, weight=1 + `FARGATE_SPOT` weight=4 in production
+  - [ ] Confirm the service scheduler replaces interrupted tasks automatically
+  - [ ] Cost and interruption-rate dashboard (Spot reclaim frequency)
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Tune the weight ratio (e.g., raise the Spot weight to 7)
+  - [ ] Re-tune the `FARGATE` baseline after long-term observation
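+
+For the Stage 1 alarm, the EventBridge rule needs an event pattern on ECS task state changes. Below is that pattern as a Python dict, serialized for `put_rule(EventPattern=...)`, plus a deliberately simplified local matcher for unit-testing sample events. `simple_match` is a hypothetical helper that only handles exact-value lists; it is not EventBridge's full matching semantics.
+
+```python
+import json
+
+SPOT_INTERRUPTION_PATTERN = {
+    "source": ["aws.ecs"],
+    "detail-type": ["ECS Task State Change"],
+    "detail": {"stopCode": ["SpotInterruption"]},
+}
+
+
+def simple_match(pattern: dict, event: dict) -> bool:
+    """Exact-value subset of EventBridge matching (no prefix/numeric/anything-but)."""
+    for key, expected in pattern.items():
+        if key not in event:
+            return False
+        if isinstance(expected, dict):
+            if not isinstance(event[key], dict) or not simple_match(expected, event[key]):
+                return False
+        elif event[key] not in expected:
+            return False
+    return True
+
+
+# String value for boto3.client("events").put_rule(Name=..., EventPattern=...).
+EVENT_PATTERN_JSON = json.dumps(SPOT_INTERRUPTION_PATTERN)
+```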
+
+**Delegation:** none; no dedicated `fargate-service` skill exists. For implementation, see [AWS Docs — Fargate capacity providers](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html).
+
+**Citations:**
+- [AWS Docs — Fargate capacity providers](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
+- [AWS Docs — ECS Task State Change events](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html)
+- [Insight #O1]: Lambda + Fargate Spot dual-compute
+- [Insight #O5]: Free-tier cost target
+- [tradeoffs-compute.md §3]: Fargate trade-offs
+- [tradeoffs-spot.md §5, §6]: Spot Do/Don't and interruption tracking
+
+---
+
+### Pattern 2.3: Always-on WebSocket → API Gateway WebSocket + Lambda
+
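+**Applicability** (when to use):
+- Bidirectional, message-based real-time communication (chat, notifications, collaboration)
+- Each message is short to process (<15 min): **connection lifetime != function lifetime**
+- Connection state can live outside the server (DynamoDB / ElastiCache)
+- Event-driven messages rather than true streaming (video, gRPC streaming)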
+
+**AS-IS:**
+```text
+[Client] ◀── WebSocket long-lived ──▶ [EC2 always on (Node.js ws server)]
+                                         ├─ in-memory connection map
+                                         ├─ room state (local)
+                                         └─ broadcast via memory
+
+  connections × server resources = always-on cost
+```
+
+**TO-BE:**
+```text
+[Client]
+   |
+   └── wss://...execute-api... ──▶ [API Gateway WebSocket API]
+                                     │
+                                     ├─ $connect ─▶ [Lambda] ─▶ [DynamoDB: connectionId ↔ userId]
+                                     ├─ $default ─▶ [Lambda] ─▶ message routing
+                                     └─ $disconnect ─▶ [Lambda] ─▶ DynamoDB cleanup
+
+  Billed per message + per connection-minute
+```
+
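+**Key changes:**
+- Replace the always-on EC2 WebSocket server with an API Gateway WebSocket API
+- Move connection state (connectionId ↔ userId / room) to DynamoDB
+- Handle each message in an independent Lambda invocation (stateless)
+- Broadcast by looking up connections in DynamoDB, then sending through the `PostToConnection` API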
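+
+A condensed sketch of the Lambda side of this design, assuming one handler wired to `$connect`, `$disconnect`, and `$default`. The connection store and the sender are injected through hypothetical interfaces so the routing logic can be tested without AWS. In a deployment they would wrap a DynamoDB table and `boto3.client("apigatewaymanagementapi", endpoint_url=...).post_to_connection(ConnectionId=..., Data=...)`, whose `GoneException` (HTTP 410) the local `ConnectionGone` stands in for. Scanning every connection is a simplification; the Stage 0 GSI design replaces it at scale.
+
+```python
+import json
+
+
+class ConnectionGone(Exception):
+    """Stand-in for the management API's GoneException (connection no longer exists)."""
+
+
+def handle(event, store, poster):
+    """Route one API Gateway WebSocket event.
+
+    store:  hypothetical connection store with put/delete/all_ids (DynamoDB-backed)
+    poster: hypothetical wrapper around post_to_connection with post(conn_id, data)
+    """
+    ctx = event["requestContext"]
+    route, conn_id = ctx["routeKey"], ctx["connectionId"]
+
+    if route == "$connect":
+        store.put(conn_id)
+    elif route == "$disconnect":
+        store.delete(conn_id)
+    else:  # $default: broadcast the message to every known connection
+        data = json.dumps({"from": conn_id, "body": event.get("body")}).encode()
+        for target in list(store.all_ids()):
+            try:
+                poster.post(target, data)
+            except ConnectionGone:
+                store.delete(target)  # clean up stale entries lazily
+    return {"statusCode": 200}
+```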
+
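+**Trade-offs:**
+- **Pros**: no servers to run, scales automatically with connection count, precise per-message billing
+- **Con 1**: **connection state must be externalized**, adding DynamoDB/ElastiCache round trips
+- **Con 2**: **15-minute Lambda execution limit**: connection lifetime ≠ function lifetime, so long-running logic must be redesigned [AWS Docs — Lambda quotas]. API Gateway also closes WebSocket connections after 2 hours (or 10 minutes idle), so clients must reconnect
+- **Con 3**: broadcast fan-out needs DynamoDB scan/GSI design; latency and cost grow with connection count
+- **Con 4**: API Gateway WebSocket billing has two axes, **connection minutes + message count**, so a large number of connections that sit idle all the time can end up costing more [AWS Docs — API Gateway pricing]
+- **Con 5**: keep true streaming (video) and gRPC streaming workloads on Fargate [tradeoffs-compute.md §1.3]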
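+
+Con 4's two billing axes can be estimated before Stage 1. A sketch using the $0.25 per million connection-minutes rate quoted in the checklist below; the per-million message price is a required parameter because it varies by Region and tier, and the 30-day month is a simplifying assumption.
+
+```python
+def websocket_monthly_cost(
+    avg_concurrent_connections: float,
+    messages_per_month: float,
+    price_per_million_messages: float,
+    price_per_million_conn_minutes: float = 0.25,
+    minutes_per_month: int = 30 * 24 * 60,
+) -> dict:
+    """Rough monthly API Gateway WebSocket cost, split into its two billing axes."""
+    conn_minutes = avg_concurrent_connections * minutes_per_month
+    connection_cost = conn_minutes / 1e6 * price_per_million_conn_minutes
+    message_cost = messages_per_month / 1e6 * price_per_million_messages
+    return {
+        "connection_minutes": conn_minutes,
+        "connection_cost": round(connection_cost, 2),
+        "message_cost": round(message_cost, 2),
+    }
+```
+
+For instance, 10,000 connections held open all month amount to 432 million connection-minutes, about $108 before a single message is sent; this is the idle-connection trap Con 4 describes.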
+
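+**Cost range** (illustrative; must cite a validated case):
+- For its message-driven status UI, openclaw removed first-response latency with **EventBridge scheduled pre-warming** rather than WebSockets [Case: openclaw] [Insight #O3]. The validated cases contain no WebSocket-specific benchmark for this pattern; rely only on the official AWS pricing model.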
+
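+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Turn the connection/message/room model into a DynamoDB table design
+  - [ ] Design GSIs (e.g., a reverse lookup from `userId` to `connectionId`)
+  - [ ] Inventory logic that depends on in-memory state
+  - [ ] Set a TTL so stale connections are cleaned up automatically
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Prototype a single chat room ($connect/$default/$disconnect)
+  - [ ] Measure broadcast fan-out performance (connections × message rate)
+  - [ ] Estimate connection-minute cost (at `$0.25/million minutes`)
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Route part of the traffic to the WebSocket API
+  - [ ] Rollback plan: Route 53 weights / client feature flag
+  - [ ] Logic to detect and close abnormal connections
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Full cutover; shut down the EC2 WebSocket server
+  - [ ] Dashboard for concurrent connections, message TPS, and DynamoDB cost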
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — API Gateway WebSocket](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html).
+
+**Citations:**
+- [AWS Docs — API Gateway WebSocket](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html)
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [Insight #O3]: openclaw's alternative approach via pre-warming
+- [tradeoffs-compute.md §1]: Lambda 15-minute limit · poor fit for streaming
+
+---
+
+### Pattern 2.4: Java monolith on EC2 → Lambda + SnapStart
+
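+**Applicability** (when to use):
+- Java 11+ / Python 3.12+ / .NET 8+ runtime [AWS Docs — Lambda SnapStart]
+- Apps with heavy class loading, Spring Boot initialization, or JVM warmup costs
+- Request handling itself takes <15 min
+- **zip deployment**, not a container image (SnapStart doesn't support container images)
+- You want to avoid the always-on billing of Provisioned Concurrency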
+
+**AS-IS:**
+```text
+[ALB]
+  |
+  v
+[EC2 (Tomcat + Spring Boot, always on)]
+  ├─ class loading at boot: 30-60 s
+  ├─ JIT warmup: several minutes
+  └─ memory held permanently
+
+  ALB + EC2 idle cost (same as Pattern 2.1)
+```
+
+**TO-BE:**
+```text
+[API Gateway]
+  |
+  v
+[Lambda (Java + SnapStart enabled)]
+  ├─ published version or alias
+  ├─ beforeCheckpoint / afterRestore hooks implemented
+  └─ sub-second restore latency
+
+  zero idle + mitigated cold starts
+```
+
+**Key changes:**
+- Always-on EC2 + ALB → API Gateway + Lambda (building on Pattern 2.1)
+- **Enable SnapStart** on a published version or alias [AWS Docs]
+- Unqualified `$LATEST` invocations don't use SnapStart; always invoke through an alias or version
+- Provisioned Concurrency and SnapStart are **mutually exclusive**; choose one [AWS Docs]
+
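+**Trade-offs:**
+- **Pro 1**: sub-second cold starts (a dramatic improvement over Spring Boot's traditional 30-60 s)
+- **Pro 2 (Java)**: billed only for requests, duration, and memory: **no extra cost** [AWS Docs — Lambda SnapStart]
+- **Con 1 (Python/.NET)**: extra charges for snapshot caching and restores (memory-based, regional rates), with caching billed for at least **3 hours** [AWS Docs]. At very low invocation rates, Provisioned Concurrency may even be the better deal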
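+- **Con 2**: **stale connections after restore**: network connections opened during initialization (e.g., to RDS/Redis inside a VPC) aren't guaranteed to survive a restore and must be re-established, which limits SnapStart's gain for functions dominated by that setup [AWS Docs]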
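+- **Con 3 (uniqueness trap)**: random numbers, UUIDs, and TLS session keys created at snapshot time are copied into every restored instance; finance and authentication workloads must regenerate fresh entropy in a hook [AWS Docs]
+- **Con 4**: no container image support, so deployment must switch to zip, within the 250 MB limit (unzipped, including layers) [tradeoffs-compute.md §1.1]
+- **Con 5**: a Java function version not invoked for 14 days becomes Inactive, and the next invocation re-initializes it (`SnapStartNotReadyException`) [AWS Docs]
+- **Con 6**: **mutually exclusive** with Provisioned Concurrency, EFS, and ephemeral storage above 512 MB [AWS Docs]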
+
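+**Uniqueness mitigations:**
+- Generate random numbers, UUIDs, and TLS session keys **inside the handler** (not during initialization)
+- Re-establish DB/Redis connections and rotate JWT keys in the `afterRestore` hook
+- AWS SDK credentials are refreshed automatically via `AWS_CONTAINER_CREDENTIALS_FULL_URI` [AWS Docs]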
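+
+The mitigations above, sketched for a Python function (SnapStart also covers Python 3.12+). It assumes the `snapshot_restore_py` module that AWS documents for Python runtime hooks; the import falls back to a no-op so the sketch runs locally. `_reconnect` is a hypothetical placeholder for re-opening DB/Redis clients after a restore.
+
+```python
+import secrets
+import uuid
+
+try:
+    # Provided inside SnapStart-enabled Python runtimes (assumption per AWS docs).
+    from snapshot_restore_py import register_after_restore
+except ImportError:  # local run: make the registration a no-op
+    def register_after_restore(func):
+        return func
+
+# BAD: computed once during init, frozen into the snapshot, and shared by
+# every execution environment restored from it.
+FROZEN_INSTANCE_ID = uuid.uuid4().hex
+
+_connection = None
+
+
+def _reconnect():
+    """Hypothetical: re-open DB/Redis clients; sockets from init may be stale."""
+    global _connection
+    _connection = object()
+
+
+register_after_restore(_reconnect)
+
+
+def handler(event, context):
+    # GOOD: fresh entropy per invocation, never captured in the snapshot.
+    request_id = uuid.uuid4().hex
+    session_token = secrets.token_urlsafe(32)
+    return {"request_id": request_id, "token_len": len(session_token)}
+```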
+
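+**Cost range** (illustrative; must cite a validated case):
+- openclaw uses a Lambda container image, so **SnapStart doesn't apply** (container images are excluded); the validated cases contain no direct Java + SnapStart benchmark for this pattern.
+- Reference constants: openclaw's Lambda container cold start is **1.35 s**, warm 0.12 s [Case: openclaw]. AWS advertises sub-second startup for SnapStart [AWS Docs], so lower latency than this can be expected.
+- Java SnapStart adds no cost [AWS Docs], so the Pattern 2.1 cost range can be assumed.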
+
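+**Migration checklist skeleton:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Switch to zip deployment (if currently using container images)
+  - [ ] Confirm Java 11+ / Python 3.12+ / .NET 8+
+  - [ ] Review uniqueness: scan for random numbers and TLS keys generated during initialization
+  - [ ] Implement `beforeCheckpoint` / `afterRestore` hooks (Java CRaC API)
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Enable SnapStart on an alias + published version
+  - [ ] Measure cold-start p50/p99 (before vs. after enabling)
+  - [ ] Verify the DB connection re-establishment path
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Route X% of traffic to the alias; verify the 14-day inactivity behavior
+  - [ ] Audit uniqueness-sensitive spots (duplicate session keys)
+  - [ ] For Python/.NET, watch caching costs (3-hour minimum)
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Remove EC2 Tomcat; send 100% of traffic to the alias
+  - [ ] Establish an alias-shift workflow for every release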
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — Lambda SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html).
+
+**Citations:**
+- [AWS Docs — Lambda SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html)
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [tradeoffs-compute.md §1]: Lambda cold-start mitigation options
+- [Case: openclaw]: Lambda container 1.35 s cold start (reference constant)
+
+---
+
+## 3. Anti-patterns
+
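+**A1. Choosing Lambda while requiring cold-start p99 <100 ms**
+- Lambda container 1.35 s, SnapStart sub-second at best: p99 <100 ms is out of reach without Provisioned Concurrency + warm-up [AWS Docs — Lambda]
+- For that SLA, keep an always-on Fargate service or go back to EC2
+
+**A2. Running Fargate Spot as a single task**
+- `desiredCount = 1` + Fargate Spot = complete outage on interruption until capacity returns [tradeoffs-compute.md §3]
+- Always run a service with `desiredCount ≥ 2`, or guarantee baseline capacity with a `base` on the `FARGATE` capacity provider
+
+**A3. Returning responses over 6 MB from synchronous Lambda invocations**
+- Synchronous requests and responses are each limited to 6 MB [tradeoffs-compute.md §1.1]
+- Route large payloads through response streaming (200 MB) or S3 presigned URLs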
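+
+A hedged sketch of the S3 detour for A3: return small bodies inline and spill large ones to S3, handing the client a presigned URL instead. The 5 MiB threshold is an assumed safety margin under the 6 MB limit; `s3` would be a boto3 S3 client (`put_object` and `generate_presigned_url` are real boto3 methods), while the key scheme, the 15-minute expiry, and the 303 redirect are illustrative choices.
+
+```python
+import json
+import uuid
+
+INLINE_LIMIT_BYTES = 5 * 1024 * 1024  # assumed margin under the 6 MB sync limit
+
+
+def respond(payload: bytes, s3, bucket: str) -> dict:
+    """Return payload inline if small, otherwise via a presigned S3 GET URL."""
+    if len(payload) <= INLINE_LIMIT_BYTES:
+        return {"statusCode": 200, "body": payload.decode("utf-8")}
+
+    key = f"responses/{uuid.uuid4().hex}.json"  # hypothetical key scheme
+    s3.put_object(Bucket=bucket, Key=key, Body=payload, ContentType="application/json")
+    url = s3.generate_presigned_url(
+        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=900
+    )
+    # 303 See Other: the client fetches the large body directly from S3.
+    return {"statusCode": 303, "headers": {"Location": url}, "body": json.dumps({"url": url})}
+```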
+
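+**A4. Trying to enable SnapStart and Provisioned Concurrency together**
+- They are mutually exclusive; only one can be chosen [AWS Docs — Lambda SnapStart]
+- Provisioned Concurrency for strict p99 SLAs, SnapStart when cost comes first
+
+**A5. Porting local-state-dependent code straight to Lambda**
+- Logic that depends on in-memory sessions, local files, or in-process caches violates the "share nothing" principle (Serverless Lens principle 3)
+- Externalize it to DynamoDB + S3 [Insight #O4]
+
+**A6. Lambda long-polling instead of WebSocket**
+- The 15-minute Lambda limit and the streaming bandwidth cap (first 6 MB uncapped, 2 MB/s after that) make long-polling a poor fit [tradeoffs-compute.md §1.1]
+- For bidirectional real-time communication, use API Gateway WebSocket (Pattern 2.3) or keep Fargate always on
+
+**A7. Generating random numbers/session keys during SnapStart initialization**
+- The uniqueness trap: the values are copied into every restored instance [AWS Docs]
+- Generate them inside the handler or in the `afterRestore` hook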
+
+---
+
+## 4. Citations
+
+- [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+- [AWS Docs — API Gateway + Lambda](https://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-with-lambda-integration.html)
+- [AWS Docs — API Gateway WebSocket](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html)
+- [AWS Docs — ECS Fargate capacity providers](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
+- [AWS Docs — Lambda SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html)
+- [AWS Docs — ECS Task State Change events](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html)
+- [Case: openclaw]: Lambda container 1.35 s, ALB $18-25/month removed, ~$1/month to run
+- [Insight #O1]: Lambda + Fargate Spot dual-compute
+- [Insight #O2]: API Gateway over ALB
+- [Insight #O3]: EventBridge scheduled pre-warming
+- [Insight #O4]: DynamoDB + S3 session persistence
+- [Insight #O5]: Free-tier-first cost target
+- [Insight #O6]: CloudFront + S3 for the web UI (reference only; static UI is a separate pattern)
+- Cross-refs: [tradeoffs-compute.md §1, §3], [tradeoffs-data-layer.md §1, §2], [tradeoffs-spot.md §5, §6]
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-data.md b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-data.md
new file mode 100644
index 0000000..45541e1
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-data.md
@@ -0,0 +1,321 @@
+> **Snapshot date**: 2026-04-18
+> **Tier**: 3 (data-layer migration), **no validated case**
+> **Description**: RDS → DynamoDB, CDC-based transition
+
+# Tier 3 Migration Patterns — Data Layer
+
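+> ⚠️ **Tier 3 validation warning**
+>
+> The two validated cases this skill relies on (serverless-autoresearch, serverless-openclaw) did not directly validate Tier 3 data migrations. This document provides only principles, checklists, and AWS Docs links grounded in official AWS documentation.
+>
+> Data migration is the **highest-risk** area, riskier than anything in Tier 1/2, and **cross-engine moves (RDS → DynamoDB)** in particular carry high model-redesign and rollback costs, so the following are **mandatory**:
+> 1. Pilot validation with a small set of access patterns
+> 2. CDC-based dual-write + read-diff monitoring
+> 3. Rehearse the rollback plan in advance
+>
+> See also: SPEC §9 "Tier 3 handling policy", [patterns-tier3-monolith.md](patterns-tier3-monolith.md) §Warning.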
+
+## 1. Scope of this document
+
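+**Included**:
+- **Classification of migration risk by stage**, based on verifiable official limits
+- Principle-level checklists for moving to Aurora Serverless v2 / DynamoDB / S3 Express
+- Common patterns: CDC-based dual-write, read-diff monitoring, phased cutover
+
+**Not included**:
+- Organization-specific ETL scripts
+- Automatic data-model conversion (relational → NoSQL redesign is domain-expert territory)
+- Cost and timeline estimates (no validation data)
+
+Detailed AWS Docs figures and limits are delegated to [tradeoffs-data-layer.md](tradeoffs-data-layer.md).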
+
+## 2. Patterns
+
+### Pattern 4.1: RDS → Aurora Serverless v2 (same engine)
+
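+**Applicability** (when to use):
+- Same engine: MySQL↔MySQL / PostgreSQL↔PostgreSQL [tradeoffs-data-layer.md §1]
+- Existing application SQL and DDL carry over unchanged
+- Clear idle periods or spiky traffic (benefits from ACU scaling)
+- RDS Proxy can be introduced to absorb connection storms
+
+**Risk level**: **low**: the engine stays compatible, so no data-model redesign is needed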
+
+**AS-IS:**
+```text
+[App on EC2/Lambda]
+   |
+   v
+[RDS db.m5.large, always on]
+  ├─ idle 8 h/day (e.g., overnight)
+  └─ fixed instance billing
+
+  over-provisioned to handle peak
+```
+
+**TO-BE:**
+```text
+[App on Lambda]
+   |
+   v
+[RDS Proxy] ─▶ [Aurora Serverless v2 cluster (ACU 0.5-16)]
+                  ├─ scales per second in 0.5 ACU increments
+                  ├─ min ACU: 0 (auto-pause) or >0 (always warm)
+                  └─ Multi-AZ · Global Database supported
+
+  scales out within seconds on bursts, shrinks to min when idle
+```
+
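+**Key changes:**
+- Fixed instances → ACU-based serverless compute [AWS Docs — Aurora Serverless v2]
+- Continuous per-second scaling in 0.5 ACU increments, an improvement over v1 [tradeoffs-data-layer.md §1.2]
+- RDS Proxy pools connections between Lambda and Aurora
+- Provisioned and Serverless v2 instances can be mixed in one cluster (an incremental migration path)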
+
+**Trade-offs:**
+- **Pros**: keeping the same engine minimizes application changes; automatic scaling; online scaling with no downtime
+- **Con 1 (cost-floor trap)**: with `min ACU > 0` you pay even when idle [tradeoffs-data-layer.md §1.2]. For true zero idle, use min=0 with auto-pause and accept the resume delay
+- **Con 2**: with `min=0`, resuming from auto-pause takes seconds, which hits the first request of user-facing APIs
+- **Con 3**: Database Activity Streams, Cluster Cache Management (Aurora PostgreSQL), and Aurora Auto Scaling are not supported [AWS Docs]
+- **Con 4**: min ACU is billed per instance (writer and each reader), so 2 instances × min=1 always bill at least 2 ACUs
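+
+Con 1 and Con 4 combine into a simple floor-cost formula: every instance in the cluster bills at least its min ACU for every hour it isn't paused. A sketch assuming a 730-hour month; the price per ACU-hour is a required parameter because it differs by Region and engine, and the value in the note below is a placeholder, not a quoted AWS price.
+
+```python
+HOURS_PER_MONTH = 730  # approximation: 8,760 h / 12
+
+
+def aurora_sv2_floor(instances: int, min_acu: float, price_per_acu_hour: float) -> dict:
+    """Lowest monthly compute bill if the cluster never scales above min ACU."""
+    if min_acu < 0:
+        raise ValueError("min_acu cannot be negative")
+    floor_acu = instances * min_acu  # writer and every reader each pay their own minimum
+    acu_hours = floor_acu * HOURS_PER_MONTH
+    # min_acu == 0 yields 0 here, assuming auto-pause actually engages while idle.
+    # Storage and I/O are billed separately and are not included.
+    return {
+        "floor_acu": floor_acu,
+        "acu_hours": acu_hours,
+        "cost": round(acu_hours * price_per_acu_hour, 2),
+    }
+```
+
+With a placeholder price of $0.10 per ACU-hour, a writer plus one reader at min=1 floors at 1,460 ACU-hours, about $146/month before any traffic arrives.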
+
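+**Migration checklist:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Snapshot the existing RDS and restore it into a test environment (if the source is RDS rather than Aurora, first migrate it into an Aurora cluster, e.g., by snapshot restore or an Aurora read replica)
+  - [ ] Revisit the application's connection-pooling strategy (need for RDS Proxy given Lambda concurrency)
+  - [ ] Size the ACU range (from peak QPS and memory requirements)
+  - [ ] Decide min ACU (0 vs. >0) by balancing SLA against cost
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Add a Serverless v2 instance to the same cluster (keep the Provisioned ones)
+  - [ ] Verify ACU scaling behavior with load tests
+  - [ ] Test concurrent Lambda connections through the RDS Proxy endpoint
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Switch some readers to Serverless v2 and validate read traffic
+  - [ ] Rehearse failover (Multi-AZ)
+  - [ ] Watch cost and scaling logs
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Switch the writer to Serverless v2 (engine-compatible, so no DDL changes)
+  - [ ] Keep the old Provisioned instances for a while (for rollback)
+  - [ ] Terminate the old instances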
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — Aurora Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html).
+
+**Citations:**
+- [AWS Docs — Aurora Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html)
+- [AWS Docs — RDS Proxy](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html)
+- [tradeoffs-data-layer.md §1]: Aurora Serverless v2 quantitative limits and implications
+
+---
+
+### Pattern 4.2: RDS → DynamoDB (access-pattern redesign)
+
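+**Applicability** (when to use):
+- The main access patterns are identified and number **3-5 or fewer** (single-table design is feasible)
+- Low reliance on relational JOINs, or JOINs can be removed (resolved at the application level)
+- Key-value or document-oriented access dominates
+- The data layer must keep pace with serverless Lambda concurrency scaling
+
+**Risk level**: **very high**: loss of query flexibility, mandatory data-model redesign, high rollback cost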
+
+**AS-IS:**
+```text
+[App]
+   |
+   v
+[RDS (normalized, multiple tables)]
+  ├─ users · orders · items · inventories
+  ├─ composite queries via SQL JOIN
+  └─ ad-hoc queries possible
+```
+
+**TO-BE:**
+```text
+[App]
+   |
+   v
+[DynamoDB (single-table design)]
+  ├─ PK: entity_type#id
+  ├─ SK: nested entity sort key
+  ├─ GSI1 / GSI2: inverted index per access pattern
+  └─ only pre-identified queries are efficient
+
+  ad-hoc queries handled separately via S3 export + Athena
+```
+
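+**Key changes:**
+- **Data-model redesign is mandatory**: restructure relational JOIN patterns into a DynamoDB single-table design [AWS Docs — DynamoDB single-table design]
+- List the access patterns **up front** and design PK/SK/GSIs around them
+- Migration approach: one-way CDC replication with DMS (RDS source → DynamoDB target) during a sync window → dual-write → read diff → cutover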
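+
+To make "list access patterns up front" concrete, a minimal sketch of key builders for a hypothetical users/orders model. The `USER#`/`ORDER#` prefixes and the `PK`/`SK`/`GSI1PK` attribute names are illustrative conventions, not anything DynamoDB prescribes; each builder maps to one documented access pattern, and the last function returns low-level `Query` parameters for one of them.
+
+```python
+def user_key(user_id: str) -> dict:
+    # Access pattern 1: get a user profile by id.
+    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}
+
+
+def order_key(user_id: str, order_date: str, order_id: str) -> dict:
+    # Access pattern 2: orders live in the user's partition, sorted by ISO date.
+    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_date}#{order_id}"}
+
+
+def order_gsi1_key(order_id: str) -> dict:
+    # Access pattern 3: look an order up by id alone (inverted index on GSI1).
+    return {"GSI1PK": f"ORDER#{order_id}", "GSI1SK": "ORDER"}
+
+
+def orders_since_query(user_id: str, since_date: str) -> dict:
+    """Low-level Query kwargs for: a user's orders on or after a date."""
+    return {
+        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :lo AND :hi",
+        "ExpressionAttributeValues": {
+            ":pk": {"S": f"USER#{user_id}"},
+            ":lo": {"S": f"ORDER#{since_date}"},
+            ":hi": {"S": "ORDER#~"},  # '~' sorts after digits, so this caps the ORDER# range
+        },
+    }
+```
+
+Adding `TableName=...` and passing the dict to `boto3.client("dynamodb").query(...)` would run pattern 2; a query that fits none of the builders is exactly the kind that ends up as a scan or an Athena job.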
+
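+**Trade-offs:**
+- **Pros**: zero idle in on-demand mode, a natural match for Lambda concurrency, virtually unlimited scaling
+- **Con 1**: **loss of query flexibility**: queries not designed up front need a full scan or separate analytics storage
+- **Con 2**: **very high rollback cost**: resynchronization cost and time (weeks to months)
+- **Con 3**: the choice between strong and eventual consistency directly affects cost [tradeoffs-data-layer.md §2.2]
+- **Con 4**: many GSIs raise write cost (a write also updates every GSI the item projects into)
+- **Con 5**: transactions are limited to TransactWriteItems (up to 100 items, 4 MB total), so complex multi-entity transactions need redesign
+- **Con 6**: running on-demand mode without a maximum-throughput setting risks **unbounded cost**; setting a per-table maximum is mandatory [tradeoffs-data-layer.md §2.1]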
+
+**Migration checklist:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] **Document access patterns**: keys + query shapes, ideally 3-5 (if the total exceeds 10, reconsider this pattern)
+  - [ ] Sketch the single-table design (PK/SK/GSI)
+  - [ ] Revisit transaction boundaries: can you live within the TransactWriteItems limits?
+  - [ ] Prepare an S3 export + Athena path for ad-hoc queries
+  - [ ] Decide the per-table max capacity settings (to prevent runaway cost)
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Set up CDC for a pilot table holding a low-criticality entity (DMS source=RDS, target=DynamoDB)
+  - [ ] Duplicate reads/writes on a sample application path (dual-write)
+  - [ ] Measure model and query performance
+- [ ] **Stage 2 (pilot)**:
+  - [ ] dual-write + **read-diff monitoring**: compare DynamoDB reads against RDS for a set period
+  - [ ] Throughput/latency/cost dashboard
+  - [ ] Simulate failure scenarios: **a rollback rehearsal is mandatory**
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Read cutover (writes stay on RDS, reads come from DynamoDB)
+  - [ ] Write cutover (DynamoDB becomes primary; RDS stays as a shadow for a while)
+  - [ ] Retire RDS after an observation period (at least several weeks)
+  - [ ] Plan around the mode-switch limit in advance (Provisioned → on-demand at most 4 times per 24 hours)
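+
+A hedged sketch of the Stage 2 read-diff check: read the same entity from both stores, normalize representation differences that are expected, and record every mismatch. The normalization rules (numbers compared as `Decimal`, single-table key attributes ignored) are assumptions to adapt to your schema. The fetchers are injected so the comparison stays testable; in production they would wrap an RDS query and a DynamoDB `GetItem`.
+
+```python
+from decimal import Decimal
+
+IGNORED_KEYS = {"PK", "SK", "GSI1PK", "GSI1SK"}  # hypothetical DynamoDB-only attributes
+
+
+def _normalize(record: dict) -> dict:
+    out = {}
+    for key, value in record.items():
+        if key in IGNORED_KEYS:
+            continue
+        # boto3 returns DynamoDB numbers as Decimal; SQL drivers may return int/float.
+        if isinstance(value, (int, float, Decimal)) and not isinstance(value, bool):
+            value = Decimal(str(value))
+        out[key] = value
+    return out
+
+
+def read_diff(ids, fetch_rds, fetch_ddb) -> list:
+    """Return (id, field, rds_value, ddb_value) tuples for every mismatch."""
+    mismatches = []
+    for entity_id in ids:
+        a = _normalize(fetch_rds(entity_id) or {})
+        b = _normalize(fetch_ddb(entity_id) or {})
+        for field in sorted(set(a) | set(b)):
+            if a.get(field) != b.get(field):
+                mismatches.append((entity_id, field, a.get(field), b.get(field)))
+    return mismatches
+```
+
+Running it over a sampled set of ids each day and tracking the mismatch count gives the signal that gates the read cutover in Stage 3.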
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — DMS CDC](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html) and [AWS Docs — DynamoDB single-table design](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html).
+
+**Citations:**
+- [AWS Docs — DMS CDC](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html)
+- [AWS Docs — DynamoDB Read/Write Capacity Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html)
+- [AWS Docs — DynamoDB single-table design](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html)
+- [AWS Docs — DynamoDB TransactWriteItems](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html)
+- [tradeoffs-data-layer.md §2]: DynamoDB trade-offs
+
+---
+
+### Pattern 4.3: Accelerating batch reads with an S3 Express One Zone cache
+
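+**Applicability** (when to use):
+- High-frequency object GETs: workloads where request charges dominate cost
+- Training-data shuffles, intermediate analytics results, interactive creative workloads
+- Can be placed in the **same AZ** as EC2/Fargate/Lambda
+- No long-term archival or multi-Region disaster recovery requirement
+
+**Risk level**: **medium**: single-AZ constraint and the cost-inversion trap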
+
+**AS-IS:**
+```text
+[Training job on EC2/Fargate (AZ: us-east-1a)]
+   |
+   v
+[S3 Standard (multiple AZs in the Region)]
+  ├─ high-frequency shuffle/read
+  ├─ request cost dominates
+  └─ tens of ms latency
+```
+
+**TO-BE:**
+```text
+[Training job (same AZ, us-east-1a)]
+   |
+   v
+[S3 Express One Zone directory bucket (us-east-1a)]
+  ├─ ~50% lower request cost
+  ├─ single-digit ms latency (~10× lower)
+  ├─ stores intermediate results only
+  └─ originals kept separately in S3 Standard
+
+  write path can stay on S3 Standard (durability of originals)
+```
+
+**Key changes:**
+- Route only the read workload to S3 Express; the write path can optionally stay on S3 Standard
+- Bucket naming rule: `{base}--{zone-id}--x-s3` is required [AWS Docs — S3 Express One Zone]
+- Directory bucket schema: hierarchical directories, unlike flat prefixes [tradeoffs-data-layer.md §3.1]
+- `s3express:CreateSession` authorization model: requires SDK and IAM policy upgrades
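+
+The naming rule can be checked before Stage 1. A minimal validator sketch: the `--{zone-id}--x-s3` suffix comes from the text above, while the AZ-ID shape (e.g. `use1-az4`) and the lowercase/63-character checks are assumptions modeled on general S3 naming rules, so a pass is necessary rather than sufficient. It compares AZ IDs rather than AZ names such as `us-east-1a`, because the name-to-ID mapping differs between accounts.
+
+```python
+import re
+
+# {base}--{zone-id}--x-s3, e.g. "training-cache--use1-az4--x-s3"
+_DIRECTORY_BUCKET = re.compile(
+    r"^(?P<base>[a-z0-9][a-z0-9-]*[a-z0-9])--(?P<zone>[a-z0-9]+(?:-[a-z0-9]+)*-az\d+)--x-s3$"
+)
+
+
+def parse_directory_bucket(name: str) -> dict:
+    """Split a directory-bucket name into base and zone ID, or raise ValueError."""
+    if len(name) > 63:
+        raise ValueError("bucket names are limited to 63 characters")
+    match = _DIRECTORY_BUCKET.match(name)
+    if not match:
+        raise ValueError(f"not a directory bucket name: {name!r}")
+    return match.groupdict()
+
+
+def same_az(bucket_name: str, compute_zone_id: str) -> bool:
+    """True when the bucket lives in the AZ ID where the compute runs."""
+    return parse_directory_bucket(bucket_name)["zone"] == compute_zone_id
+```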
+
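+**Trade-offs:**
+- **Pros**: ~50% lower request cost, single-digit ms latency, no data-transfer charges within the same AZ [tradeoffs-data-layer.md §3.1]
+- **Con 1 (single-AZ availability)**: an AZ failure can lose the data, so originals **must** be kept separately in Standard/Glacier [tradeoffs-data-layer.md §3.2]
+- **Con 2 (cost-inversion trap)**: with infrequent access, **total cost exceeds** Standard, because the request-cost savings scale with request volume while storage costs more per GB [AWS Docs]
+- **Con 3 (feature limits)**: no versioning, no CRR/SRR, limited Lifecycle, no Intelligent-Tiering or Glacier [AWS Docs]
+- **Con 4 (encryption limits)**: no SSE-C; SSE-S3/SSE-KMS only [AWS Docs]
+- **Con 5 (unfit as a CDN origin)**: Block Public Access is always on and ACLs are disabled, so it can't serve as a public CDN origin [AWS Docs]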
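+
+Con 2's cost inversion reduces to a break-even request count: Express saves on each request but costs more per GB stored. A sketch of that arithmetic that counts only GET requests and storage (data transfer, PUTs, and any per-GB request surcharges are ignored, so it is an approximation); all prices are parameters, and none of the numbers used with it here are current AWS prices.
+
+```python
+def express_break_even_gets(
+    storage_gb: float,
+    std_storage_per_gb: float,
+    express_storage_per_gb: float,
+    std_get_per_1k: float,
+    express_get_per_1k: float,
+) -> float:
+    """Monthly GET count above which S3 Express One Zone beats S3 Standard."""
+    extra_storage = storage_gb * (express_storage_per_gb - std_storage_per_gb)
+    saving_per_get = (std_get_per_1k - express_get_per_1k) / 1000
+    if saving_per_get <= 0:
+        return float("inf")  # no per-request saving, so Express never breaks even
+    return max(extra_storage, 0.0) / saving_per_get
+```
+
+With deliberately round placeholder prices, `express_break_even_gets(100, 0.25, 0.75, 0.5, 0.25)` gives 200,000 GETs per month; below that volume, Standard stays cheaper.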
+
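+**Migration checklist:**
+- [ ] **Stage 0 (preparation)**:
+  - [ ] Confirm the workload can be pinned to one AZ (EC2/Fargate launch strategy)
+  - [ ] Measure request frequency: low frequency risks cost inversion, so reconsider Express
+  - [ ] Design a path that keeps the originals on S3 Standard (durability, compliance)
+  - [ ] Latest SDK + IAM policy upgrade (CreateSession permission)
+- [ ] **Stage 1 (low-risk validation)**:
+  - [ ] Create a pilot directory bucket (named `{base}--{zone-id}--x-s3`)
+  - [ ] Switch a sample training job's shuffle path to Express
+  - [ ] Measure latency, cost, and request success rate
+- [ ] **Stage 2 (pilot)**:
+  - [ ] Migrate part of the production workload
+  - [ ] Simulate an AZ failure (cut off that AZ → verify fallback to the Standard originals)
+  - [ ] Monitor whether the quota (100 directory buckets per Region) needs raising
+- [ ] **Stage 3 (cutover)**:
+  - [ ] Move 100% of the read path to Express
+  - [ ] Write path is optional: keep it on Standard if durability of originals matters
+  - [ ] Reassess whether existing cache infrastructure (ElastiCache, etc.) can be scaled down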
+
+**Delegation:** none; no dedicated skill exists. For implementation, see [AWS Docs — S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html).
+
+**Citations:**
+- [AWS Docs — S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html)
+- [AWS Docs — Directory buckets overview](https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html)
+- [tradeoffs-data-layer.md §3]: S3 Standard vs. Express comparison
+
+---
+
+## 3. Do / Don't
+
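+| Do | Don't |
+|----|-------|
+| Start the pilot with a single access pattern | Try to migrate every table at once |
+| Incremental migration via CDC (DMS) | A one-shot blue/green cutover |
+| Verify migration correctness with read-diff monitoring | Cut reads over right after validating only writes |
+| Set a per-table max (DynamoDB on-demand) | Run without a max (risk of runaway cost) |
+| Pair Aurora Serverless v2 with RDS Proxy | Let Lambda open many direct connections to Aurora |
+| Place S3 Express in the **same AZ** as compute | Use Express for multi-AZ workloads (loses the data-transfer and latency benefits) |
+| Keep originals on Standard/Glacier | Store the **only copy** of important data on Express |
+| Rehearse the rollback plan in advance | Improvise when problems appear after migration |
+| Document access patterns first | Try to carry existing SQL over to NoSQL as-is |
+| State the cost of strong vs. eventual consistency | Make every read strongly consistent by habit (`ConsistentRead=true` costs 2× the RRUs of the default eventually consistent read) |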
+
+## 4. Decision questions (Phase 3 interview aid)
+
+- **Q**: Is changing the database engine acceptable?
+  - No (keep MySQL/PostgreSQL) → **Pattern 4.1** (Aurora Serverless v2)
+  - Yes → candidate for Pattern 4.2
+
+- **Q**: How many access patterns?
+  - ≤5 → Pattern 4.2 (DynamoDB redesign) is feasible
+  - 6-10 → pilot recommended; single-table design gets harder
+  - >10 → **Pattern 4.2 is unsuitable**: Aurora Serverless v2 or a hybrid
+
+- **Q**: How often do ad-hoc analytical queries run?
+  - Rarely → DynamoDB is acceptable (S3 export + Athena)
+  - Often → keep Aurora Serverless v2 alongside DynamoDB (split off only the hot path)
+
+- **Q**: Read frequency vs. storage size?
+  - High-frequency reads (> 1M GET/day) → consider **Pattern 4.3** (S3 Express)
+  - Infrequent access → stay on Standard (avoids cost inversion)
+
+- **Q**: Can the workload be pinned to one AZ?
+  - Yes → S3 Express is an option
+  - No (multi-AZ workload) → S3 Express is unsuitable
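+
+The questions above can be encoded as a small helper for the Phase 3 interview. A sketch that mirrors this section's thresholds (≤5 / 6-10 / >10 access patterns, >1M GET/day, AZ pinning); `recommend_data_patterns` and its arguments are hypothetical, and the output is a list of candidate patterns to discuss, not a verdict.
+
+```python
+def recommend_data_patterns(
+    engine_change_ok: bool,
+    access_patterns: int,
+    adhoc_queries_frequent: bool,
+    gets_per_day: int,
+    az_pinnable: bool,
+) -> list:
+    """Map the Section 4 interview answers to candidate Tier 3 data patterns."""
+    recs = []
+    if not engine_change_ok:
+        recs.append("4.1 Aurora Serverless v2")
+    elif access_patterns > 10:
+        recs.append("4.1 Aurora Serverless v2 or hybrid (4.2 unsuitable)")
+    elif adhoc_queries_frequent:
+        recs.append("4.1 Aurora Serverless v2 + DynamoDB for the hot path")
+    elif access_patterns <= 5:
+        recs.append("4.2 DynamoDB redesign")
+    else:
+        recs.append("4.2 DynamoDB redesign (pilot first)")
+
+    # S3 Express only pays off for hot reads that can sit in one AZ.
+    if gets_per_day > 1_000_000 and az_pinnable:
+        recs.append("4.3 S3 Express One Zone read cache")
+    return recs
+```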
+
+## 5. Citations
+
+- [AWS Docs — Aurora Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html)
+- [AWS Docs — RDS Proxy](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html)
+- [AWS Docs — DMS CDC](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html)
+- [AWS Docs — DynamoDB Read/Write Capacity Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html)
+- [AWS Docs — DynamoDB single-table design](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html)
+- [AWS Docs — DynamoDB TransactWriteItems](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html)
+- [AWS Docs — S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html)
+- [AWS Docs — Directory buckets overview](https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html)
+- [AWS prescriptive guidance — Strangler Fig in data persistence](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-data-persistence/strangler-fig.html)
+- Cross-refs: [tradeoffs-data-layer.md §1~3](tradeoffs-data-layer.md), [patterns-tier3-monolith.md](patterns-tier3-monolith.md)
+
+---
+
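+*This document is limited to **principles, risks, and a checklist skeleton** for Tier 3 data migration. An actual migration presumes an organization-specific pilot and a design by domain experts.*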
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-monolith.md b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-monolith.md
new file mode 100644
index 0000000..42e5acc
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/patterns-tier3-monolith.md
@@ -0,0 +1,166 @@
+> **Snapshot date**: 2026-04-18
+> **Tier**: 3 (monolith decomposition); **no validated case study**
+> **Description**: Monolith decomposition based on Strangler Fig
+
+# Tier 3 Migration Patterns — Monolith Decomposition
+
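+> ⚠️ **Tier 3 validation warning**
+>
+> Neither of the two validated case studies this skill cites (serverless-autoresearch, serverless-openclaw) directly validated a Tier 3 migration. This document offers only official AWS prescriptive guidance plus principle-level guidance.
+>
+> **Required** before applying this to a large project:
+> 1. Identify service boundaries (Bounded Contexts)
+> 2. Validate at small scale with 1-2 pilot services
+> 3. Rehearse real traffic, failure, and rollback scenarios
+>
+> Strangler Fig / Branch by Abstraction migrations are commonly large, multi-year efforts (often two years or more). This document automates nothing beyond the checklist level the skill presents.
+>
+> Consistent with SPEC §9 "Tier 3 handling policy": principle-level patterns, AWS Docs links, and pilot recommendations, nothing more.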
+
+## 1. Scope of this document
+
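+**In scope**:
+- **Principle**-level guidance on Strangler Fig / Branch by Abstraction
+- Links to AWS prescriptive guidance
+- A Do/Don't list of "what not to do"
+
+**Out of scope**:
+- Automated identification of service boundaries
+- Concrete before/after diagrams (too dependent on each organization's monolith structure)
+- Per-stage cost and duration estimates (no validation data)
+- Proposed IaC diffs
+
+The skill's output is limited to acting as a **bridge that identifies a workload as Tier 3 and leads to AWS Docs and pilot recommendations**.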
+
+## 2. Applicable approaches (principle level)
+
+### Approach 3.1: Strangler Fig Application
+
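+**Principles**:
+- Put a neutral routing layer (API Gateway or ALB path-based routing) in front of the existing monolith
+- Split out only new features, or features targeted for extraction, as serverless components
+- Keep the existing monolith **running as is** and move routes to the new components incrementally
+- Retire the monolith once no live code remains in it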
+
+**Related AWS services**:
+- **Routing layer**: API Gateway (HTTP/REST API, per-path targets), ALB path-based routing, CloudFront origins
+- **New capabilities**: Lambda (Tier 2 patterns), Fargate (always-on services), Step Functions (workflows)
+- **Data separation**: DynamoDB, Aurora Serverless v2 (a database per service; see [patterns-tier3-data.md](patterns-tier3-data.md) §Pattern 4.1-4.2)
+
+**AWS prescriptive guidance**: [Monolith deconstruction strategy](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-aspnet-web-services/monolith-deconstruction-strategy.html)
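+
+To make the routing principle concrete, here is a minimal in-process sketch of the routing-table idea. Migrated path prefixes go to a new target, everything else falls through to the monolith, and rollback just deletes the route. The class and target names are hypothetical. In practice this table lives in API Gateway routes or ALB listener rules, not in application code.
+
+```python
+from typing import Dict
+
+
+class StranglerRouter:
+    """Longest-prefix routing table: migrated prefixes go to new targets, the rest to the monolith."""
+
+    def __init__(self, monolith_target: str) -> None:
+        self.monolith_target = monolith_target
+        self.routes: Dict[str, str] = {}
+
+    def migrate(self, prefix: str, target: str) -> None:
+        self.routes[prefix] = target  # one route at a time, never a big-bang switch
+
+    def rollback(self, prefix: str) -> None:
+        self.routes.pop(prefix, None)  # traffic falls back to the monolith
+
+    def resolve(self, path: str) -> str:
+        matches = [p for p in self.routes
+                   if path == p or path.startswith(p.rstrip("/") + "/")]
+        if not matches:
+            return self.monolith_target
+        return self.routes[max(matches, key=len)]  # most specific prefix wins
+```
+
+Longest-prefix matching lets a team carve a sub-path (for example `/orders/reports`) out of an already migrated prefix without touching the parent route.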
+
+### Approach 3.2: Branch by Abstraction
+
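+**Principles**:
+- Introduce an **abstraction layer (interface)** in the code
+- Place the new implementation (a serverless function call) alongside the existing one (the monolith method) behind that interface
+- Shift the traffic ratio with a feature flag, then retire the old implementation once it is proven safe
+- Strangler Fig incrementally replaces a **network boundary**; Branch by Abstraction incrementally replaces a **code boundary**
+
+**Serverless perspective**:
+- The alternative implementation behind the abstraction can be a Lambda function invocation or a Fargate task API call
+- Well suited to extracting complex internal services such as payments or authentication
+
+**Complementary use**: Strangler Fig and Branch by Abstraction can be combined. External boundaries switch via Strangler Fig (API Gateway routing); internal calls switch via Branch by Abstraction (code interfaces).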
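+
+A minimal sketch of Branch by Abstraction with a percentage feature flag follows. All class names are hypothetical, and the "Lambda" implementation only returns a string where a real one would invoke a function. Hashing the order id keeps each order on the same side of the flag across retries.
+
+```python
+import hashlib
+from typing import Protocol
+
+
+class PaymentGateway(Protocol):
+    """The abstraction both implementations sit behind (hypothetical name)."""
+    def charge(self, order_id: str, cents: int) -> str: ...
+
+
+class MonolithPayment:
+    def charge(self, order_id: str, cents: int) -> str:
+        return f"monolith:{order_id}:{cents}"  # stand-in for the existing in-process method
+
+
+class LambdaPayment:
+    def charge(self, order_id: str, cents: int) -> str:
+        # Hypothetical: a real version would invoke a Lambda function here.
+        return f"lambda:{order_id}:{cents}"
+
+
+class FlaggedPayment:
+    """Routes a stable percentage of orders to the new implementation."""
+
+    def __init__(self, old: PaymentGateway, new: PaymentGateway, rollout_percent: int) -> None:
+        self.old, self.new, self.rollout_percent = old, new, rollout_percent
+
+    def _bucket(self, key: str) -> int:
+        # Hash-based bucket in [0, 100): the same order always lands in the same bucket.
+        return int(hashlib.sha256(key.encode()).hexdigest(), 16) % 100
+
+    def charge(self, order_id: str, cents: int) -> str:
+        impl = self.new if self._bucket(order_id) < self.rollout_percent else self.old
+        return impl.charge(order_id, cents)
+```
+
+Rolling back is setting `rollout_percent` to 0; retiring the old path is setting it to 100 and then deleting `MonolithPayment`.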
+
+### Approach 3.3: Migrating to database-per-service
+
+**Principles**:
+- When splitting services, **split the data too**: splitting services while keeping a shared database risks creating a distributed monolith
+- Declare an owning database per service, and expose data other services need through **events or APIs**
+- Migrate incrementally from the monolith database to per-service databases with CDC (Change Data Capture)
+
+**Related AWS services and docs**:
+- DMS (CDC-based migration): [patterns-tier3-data.md §Pattern 4.2](patterns-tier3-data.md)
+- EventBridge (inter-service events): [tradeoffs-event-driven.md §1](tradeoffs-event-driven.md)
+- Saga pattern (Step Functions Standard): [tradeoffs-event-driven.md §4](tradeoffs-event-driven.md)
+
+**AWS prescriptive guidance**: [The database-per-service pattern is covered alongside Strangler Fig](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-data-persistence/strangler-fig.html)
+
+### Approach 3.4: Observability-first
+
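+**Principles**:
+- Establish observability (logs, metrics, tracing) **before** decomposing the monolith
+- If decomposition starts before the boundaries are known, root causes of failures become impossible to find
+- Introduce X-Ray tracing while still in the monolith, so call paths stay traceable after decomposition
+
+**Related AWS services**:
+- AWS X-Ray (distributed tracing)
+- CloudWatch Logs Insights (structured log queries)
+- CloudWatch Contributor Insights (top-N path analysis)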
+
+## 3. Do / Don't
+
+| Do | Don't |
+|----|-------|
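+| Start with a pilot of 1-2 bounded contexts | Decompose the whole monolith at once |
+| Build new features **serverless-first** | Prioritize rewriting existing features |
+| Control routing with API Gateway / ALB path routing | Split by DNS (TTL and caching complexity) |
+| Establish observability before extraction | Add observability after decomposition (root causes untraceable) |
+| Keep service boundary = data boundary | Split services while keeping a shared DB (distributed monolith) |
+| Use asynchronous events between services (EventBridge) | Chain services with synchronous RPC (cascading failures) |
+| Make cutovers reversible with feature flags | Big-bang cutover (no fallback if it fails) |
+| Give each service an independent deployment pipeline | Bolt new services onto the monolith's pipeline |
+| Rehearse the pilot's failure scenarios and rollback | Go to production without validation |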
+
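+## 4. Decision questions (Phase 3 interview aid)
+
+Questions asked in Phase 3 of users this skill has classified as Tier 3:
+
+- **Q**: Can decomposition be limited to 1-2 target services?
+  - Yes → recommend a Strangler Fig pilot path
+  - No (full rewrite) → **recommend stopping**: propose adding only new features as serverless instead of rewriting
+
+- **Q**: How much of the current monolith is covered by X-Ray tracing and structured logging?
+  - Sufficient → ready to decompose
+  - Insufficient → **pin Stage 0 to "establish observability"** and postpone decomposition
+
+- **Q**: Have the bounded contexts of candidate service boundaries been identified?
+  - Yes (a DDD workshop or similar is done) → proceed
+  - No → recommend a **DDD discovery workshop** first; decomposition without boundaries produces a distributed monolith
+
+- **Q**: Is there a downtime window?
+  - None (24/7 cutover) → feature flags + Strangler Fig are required
+  - Yes (periodic maintenance window) → a blue/green cutover is possible, but a **pilot is still recommended**
+
+- **Q**: What data consistency is required?
+  - Strong/transactional (finance, inventory) → Step Functions Standard + Saga pattern with compensating transactions ([tradeoffs-event-driven.md §4](tradeoffs-event-driven.md))
+  - Eventual is acceptable → EventBridge-based event propagation is sufficient
+
+- **Q**: How much vendor lock-in is acceptable?
+  - High (fully committed to AWS) → make full use of Lambda and DynamoDB
+  - Low (portability first) → container-based decomposition centered on Fargate (Pattern 2.2) is recommended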
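+
+The first four questions gate what happens next, so they can be read as an ordered checklist. This is an illustrative sketch only. The function name, its parameters, and the ordering of steps are assumptions, and the consistency and lock-in questions are left out because they shape the target design rather than the sequence.
+
+```python
+from typing import List
+
+
+def tier3_next_steps(scope_limited: bool, observability_ok: bool,
+                     contexts_identified: bool, downtime_window: bool) -> List[str]:
+    """Ordered next steps for a workload already classified as Tier 3 (illustrative)."""
+    if not scope_limited:
+        return ["Stop: add new features as serverless instead of a full rewrite"]
+    steps: List[str] = []
+    if not observability_ok:
+        steps.append("Stage 0: establish observability (X-Ray, structured logs)")
+    if not contexts_identified:
+        steps.append("Run a DDD discovery workshop")
+    if downtime_window:
+        steps.append("Blue/green cutover possible, but pilot first")
+    else:
+        steps.append("Feature flags + Strangler Fig")
+    steps.append("Strangler Fig pilot on 1-2 services")
+    return steps
+```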
+
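+## 5. Limits of this skill's role
+
+- Does not provide a **detailed checklist** for Tier 3 migration: it depends too heavily on each organization's situation
+- Does not attempt to **automate** service-boundary identification: bounded-context decisions require a workshop with domain experts
+- Does **not estimate** per-stage cost or duration: there is no validation data (SPEC §9)
+- The skill's output is limited to a bridge that identifies a workload as Tier 3 and leads to AWS Docs and pilot recommendations
+- The concrete "how to decompose" is a decision for the user and domain experts; this skill **only presents the AWS service options that decision needs**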
+
+## 6. Citations
+
+### Official AWS documentation
+
+- [AWS prescriptive guidance — Monolith deconstruction strategy](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-aspnet-web-services/monolith-deconstruction-strategy.html)
+- [AWS prescriptive guidance — Strangler Fig in data persistence](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-data-persistence/strangler-fig.html)
+- [AWS prescriptive guidance — Decompose monoliths into microservices](https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-decomposing-monoliths/welcome.html)
+- [AWS Well-Architected Framework](https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html)
+
+### Internal cross-refs
+
+- [patterns-tier3-data.md](patterns-tier3-data.md): data-layer migration patterns
+- [patterns-tier2-api.md](patterns-tier2-api.md): target patterns for extracted services
+- [tradeoffs-compute.md](tradeoffs-compute.md): per-service compute tradeoffs
+- [tradeoffs-event-driven.md §1, §4](tradeoffs-event-driven.md): choosing between EventBridge and Step Functions
+- [serverless-lens.md](serverless-lens.md): principle 3 "Share nothing", principle 5 "State machines", principle 6 "Events", principle 7 "Design for failures"
+
+### External references (at the skill user's discretion)
+
+- [Martin Fowler — StranglerFigApplication](https://martinfowler.com/bliki/StranglerFigApplication.html)
+- [Martin Fowler — BranchByAbstraction](https://martinfowler.com/bliki/BranchByAbstraction.html)
+
+---
+
+*This document plays only three roles in Tier 3 decisions: **safety warnings, a summary of principles, and AWS Docs links**. An actual migration presumes an organization-specific pilot.*
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/serverless-lens.md b/plugins/workflow/skills/serverless-migration-advisor/references/serverless-lens.md
new file mode 100644
index 0000000..b827d03
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/serverless-lens.md
@@ -0,0 +1,86 @@
+> **Snapshot date**: 2026-04-18
+> **Lens publication**: 2022-07-14 (check the AWS docs for recent revisions; RESEARCH §12)
+> **Description**: AWS Well-Architected Serverless Lens design principles
+
+# AWS Well-Architected Serverless Applications Lens
+
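+A serverless-specific supplement to the AWS Well-Architected Framework. This skill uses the Lens as **principle-level checkpoints** and cites it as the rationale for Phase 4 target recommendations, the Tradeoff Dossier, and the Delegation section.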
+
+## 1. 7 Design Principles
+
+> **Collection note**: "9 design principles" were originally expected, but as of 2026-04-18 the official AWS Serverless Lens documentation (`general-design-principles.html`) lists only **7 principles**. The `design-principles.html` endpoint redirects to an empty page. The table below quotes the verbatim titles and one-sentence summaries from the current official document. [RESEARCH §12]
+
+| # | Principle (verbatim) | Summary (verbatim) | Use in this skill |
+|---|-----------------|------------|-------------------|
+| 1 | Speedy, simple, singular | Functions are concise, short, single-purpose, and their environment may live up to their request lifecycle. | Phase 2 workload-characteristics criterion (whether the work decomposes into functions) |
+| 2 | Think concurrent requests, not total requests | Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency. | Phase 3 mapping from RPS to the Lambda concurrency quota |
+| 3 | Share nothing | Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage is not guaranteed. | Phase 2 statefulness assessment (trigger for delegating state to S3/DynamoDB) |
+| 4 | Assume no hardware affinity | Underlying infrastructure may change. Use code or dependencies that are hardware-agnostic. | Phase 4 target runtime selection (GPU or special-CPU dependence is a poor fit) |
+| 5 | Orchestrate your application with state machines, not functions | Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows. | Rationale for recommending Step Functions in Phase 4 |
+| 6 | Use events to trigger transactions | Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. | Rationale for moving to EventBridge/SQS-based event-driven design in Phase 4 |
+| 7 | Design for failures and duplicates | Operations triggered from requests or events must be idempotent, as failures can occur and a given request or event can be delivered more than once. | Phase 4 idempotency requirement (combined with Spot interruption retries) |
+
+**Note**: Changes to the principle list are tracked by manually reviewing this file against its Snapshot date. [AWS Docs]
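+
+Principle #7 is the one the rest of this skill leans on most (Spot retries, DLQ redrives, EventBridge at-least-once delivery). Below is a minimal sketch of an idempotent handler wrapper, assuming every event carries a stable unique `id`. The class name is hypothetical. The dict stands in for a durable store, which in practice is often a DynamoDB conditional write (`attribute_not_exists`) so that concurrent duplicates are rejected atomically.
+
+```python
+from typing import Any, Callable, Dict
+
+
+class IdempotentProcessor:
+    """Runs `handler` at most once per event id; redeliveries get the stored result.
+
+    The dict stands in for a durable store. It is not safe under concurrent
+    deliveries; a real store needs an atomic conditional write.
+    """
+
+    def __init__(self, handler: Callable[[dict], Any]) -> None:
+        self.handler = handler
+        self._results: Dict[str, Any] = {}
+
+    def process(self, event: dict) -> Any:
+        key = event["id"]  # assumption: every event carries a unique, stable id
+        if key in self._results:
+            return self._results[key]  # duplicate delivery: skip side effects
+        result = self.handler(event)
+        self._results[key] = result
+        return result
+```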
+
+## 2. 5 Pillars: Serverless-Specific Best Practices
+
+The gist of the serverless-specific questions and best practices under the five Well-Architected pillars.
+
+### 2.1 Operational Excellence [AWS Docs §WA-OPS]
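+- Deployment automation: IaC + canary/linear deployments + automatic rollback
+- Monitoring: per-function metrics + X-Ray distributed tracing + structured logging
+- Alerting: alarms on failure rate, p99 latency, and concurrency saturation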
+
+### 2.2 Security [AWS Docs §WA-SEC]
+- IAM **least privilege**: a separate role per function
+- **Validate** event sources: API Gateway WAF, EventBridge event patterns, SQS VisibilityTimeout
+- Secrets management: Secrets Manager / Parameter Store; no plaintext secrets in environment variables
+
+### 2.3 Reliability [AWS Docs §WA-REL]
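+- **Idempotent processing** (ties directly to principle #7)
+- DLQ / redrive: supported by SQS, EventBridge, and Lambda
+- Retry policy: Lambda asynchronous invocations retry 2 times; EventBridge retries up to 185 times [tradeoffs-event-driven.md §1.2]
+- Assume multi-AZ: Lambda and DynamoDB are multi-AZ by default; Aurora SV2 storage spans AZs, but compute failover needs a reader in another AZ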
+
+### 2.4 Performance [AWS Docs §WA-PERF]
+- **Cold-start mitigation**: Provisioned Concurrency or SnapStart (mutually exclusive) [tradeoffs-compute.md §1.2]
+- Memory/CPU tuning (1,769 MB = 1 vCPU) [AWS Docs]
+- Connection pooling: RDS Proxy, HTTP keepalive
+- Caching: API Gateway cache, DAX, ElastiCache
+
+### 2.5 Cost [AWS Docs §WA-COST]
+- **HUGI (Hurry Up and Get Idle)**: billable time ≠ wall-clock time [tradeoffs-spot.md §4]
+- **Services, not servers**: remove fixed costs with managed services
+- Data transfer cost: co-locate in the same AZ, use CloudFront
+- Validated cases: `autoresearch` 48 experiments for $3.94, `openclaw` ~$1/month [Case: autoresearch] [Case: openclaw]
+
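+## 3. Mapping between this skill and the Lens
+
+The outputs of the skill's five phases quote Lens principles directly as the rationale for decisions.
+
+| Skill phase | Principle used | How it is used |
+|--------------|-----------|-----------|
+| Phase 1 (workload classification) | #1 Speedy, simple, singular / #3 Share nothing | Theoretical basis for interview questions on whether the workload decomposes into functions |
+| Phase 3 (constraint deep-dive) | #2 Think concurrent requests | Questions mapping RPS to Lambda concurrency quotas |
+| Phase 4 (target recommendation) | Services, not servers (Cost pillar) · Use purpose-built data stores (neither is among the current seven principles) | **Rationale** column of the mapping table |
+| Phase 4 (Step Functions recommendation) | #5 Orchestrate with state machines | Justifies moving away from Lambda chaining |
+| Phase 4 (EventBridge/SQS) | #6 Use events to trigger transactions | Recommends the event-driven conversion |
+| Tradeoff Dossier | #7 Design for failures and duplicates | Required check in the idempotency/DLQ risk section |
+| Delegation | Speed up your development cycle (Operational Excellence pillar; not among the current seven principles) | Theoretical basis for proposing IaC automation |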
+
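+## 4. Snapshot update policy
+
+- **Triggers**: a revision of the official Lens page, a principle being added or removed, or the 6-month periodic review.
+- **What to update**: `Lens publication` and `Snapshot date` at the top of this file, the verbatim text in the §1 table, and RESEARCH §12.
+- **Version tracking**: this file's change history is managed through Git log. See SPEC §14, open question 4.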
+
+## 5. Reference links
+
+- [AWS Well-Architected Framework](https://aws.amazon.com/well-architected/) [AWS Docs]
+- [Serverless Applications Lens — Welcome](https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/welcome.html) [AWS Docs]
+- [Design Principles (general)](https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/general-design-principles.html) [AWS Docs]
+- [The Pillars of the Well-Architected Framework (serverless lens)](https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/the-pillars-of-the-well-architected-framework.html) [AWS Docs]
+
+---
+
+*This file's role is separate from the `aws-well-architected` skill (review perspective): this skill uses the Lens as a checkpoint from a **migration perspective**. SPEC §13.*
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/source-insights.md b/plugins/workflow/skills/serverless-migration-advisor/references/source-insights.md
new file mode 100644
index 0000000..c545af3
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/source-insights.md
@@ -0,0 +1,162 @@
+> **Snapshot date**: 2026-04-18
+> **Numbering rule**: autoresearch insights are `#1-#15` (stable). openclaw principles are prefixed `#O1-#O6`. Future additions only extend (never renumber).
+> **Description**: Numbered validated insights (targets for Insight #N citations)
+
+# Source Project Insights
+
+This file is a **citation anchor**. The other references, SKILL.md, and the reports the skill generates trace their evidence through these numbers.
+
+## From serverless-autoresearch
+
+Source: `/Users/dohyunjung/Workspace/roboco-io/research/serverless-autoresearch/docs/insights.md` @ commit `5435b374`
+
+### Insight #1 — Spot Capacity Varies Dramatically by Region
+
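+The same instance type (g7e) showed extreme regional variation in Spot placement score, from 1 (nearly impossible, 30+ minute wait) to 9 (immediate allocation, ~2 minute start): 1-2 in us-west-2 versus 9 in us-east-1. Running `aws ec2 get-spot-placement-scores` before choosing a region is effectively a required step.
+
+- **Tier**: 1 (batch/training)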
+- **Cited by**: tradeoffs-spot.md §1, patterns-tier1-batch.md Pattern 1.1, case-study-autoresearch.md
+
+### Insight #2 — Larger Instances Can Be Cheaper on Spot
+
+g7e.8xlarge ($0.93/hr) was observed to be cheaper than g7e.2xlarge ($0.94-$1.82/hr) in us-west-2. Larger instances see lower Spot demand, so prices can invert. Do not assume "bigger means more expensive"; check the Spot price history directly.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: tradeoffs-spot.md §1, patterns-tier1-batch.md
+
+### Insight #3 — DEVICE_BATCH_SIZE ≠ Token Throughput (hardware-dependent — see also #13)
+
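+With TOTAL_BATCH_SIZE fixed, doubling DEVICE_BATCH_SIZE from 64 to 128 leaves token throughput unchanged (it only cuts gradient accumulation steps from 4 to 2). On L40S+SDPA, val_bpb actually worsened from 1.065 to 1.081. Together with the same swap improving results on H100+FA3 (#13), this illustrates the "opposite directions on different hardware" pattern.
+
+- **Tier**: 1 (batch/training, training-specific)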
+- **Cited by**: case-study-autoresearch.md, source-insights.md #13 (cross-ref)
+
+### Insight #4 — Flash Attention 3 is GPU-Architecture Specific
+
+FA3's pre-compiled kernels support only Hopper (sm_90) and Ampere (sm_80/86). Ada Lovelace (sm_89, L40S) is unsupported and fails with a runtime CUDA error. The fix is a compute-capability check plus a PyTorch SDPA fallback (SDPA ~20% MFU vs FA3 ~40% MFU, i.e. half the efficiency).
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: case-study-autoresearch.md (hardware caveat), patterns-tier1-batch.md
+
+### Insight #5 — SageMaker Startup Overhead is Significant
+
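+Each SageMaker Training Job incurs ~3 minutes of startup overhead (instance allocation + container pull + data download + pip install). For a 5-minute job, **60% is overhead** (3 of the 5 minutes). Three mitigations: (1) pack N experiments into one job on a multi-GPU instance, (2) pre-build dependencies into the Docker image instead of using pip, (3) SageMaker warm pools (at extra cost).
+
+- **Tier**: 1 (batch/training)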
+- **Cited by**: tradeoffs-compute.md §2, patterns-tier1-batch.md Pattern 1.1, case-study-autoresearch.md
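+
+The 60% figure treats the 5-minute job as including the ~3-minute startup. Here is a two-function sketch of that arithmetic plus mitigation (1). It assumes, for simplicity, that N experiments packed into one job pay the startup cost once; the function names are hypothetical.
+
+```python
+def overhead_share(startup_min: float, job_total_min: float) -> float:
+    """Fraction of a job's wall clock spent on startup (instance, container pull, data, pip)."""
+    return startup_min / job_total_min
+
+
+def overhead_per_experiment(startup_min: float, experiments_per_job: int) -> float:
+    """Startup minutes charged to each experiment when N experiments share one job."""
+    return startup_min / experiments_per_job
+
+
+single = overhead_share(3, 5)             # the 5-minute job above
+packed = overhead_per_experiment(3, 4)    # four experiments in one multi-GPU job
+```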
+
+### Insight #6 — Quota Management is a First-Class Concern
+
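+GPU Spot quotas default to 0 for new instance types. g7e is auto-approved within minutes, but p5/p6 go to manual review (CASE_OPENED, taking days). Requesting quotas in multiple regions ahead of time is a prerequisite for avoiding migration blockers.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: tradeoffs-compute.md §2, patterns-tier1-batch.md Pattern 1.1 Stage 0 checklist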
+
+### Insight #7 — SageMaker Profiler Doesn't Support All Instance Types
+
+`ml.g7e` instances fail at job creation with `ValidationException: Profiler is currently not supported`. Setting `disable_profiler=True` on the PyTorch Estimator resolves it.
+
+- **Tier**: 1 (batch/training, operating principle)
+- **Cited by**: patterns-tier1-batch.md Pattern 1.1 Stage 0 checklist
+
+### Insight #8 — The Parallel Evolution Approach Works
+
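+Verified that the pipeline runs autonomously end to end: generating candidates, submitting parallel Spot jobs, collecting results, and selecting the best. Four parallel experiments cost $0.066 in total over ~10 minutes of wall clock (excluding us-west-2 Spot wait time), showing that parallel autonomous HPO is genuinely cost-efficient.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: patterns-tier1-batch.md Pattern 1.2 (parallel Batch), case-study-autoresearch.md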
+
+### Insight #9 — PyArrow Version Matters
+
+SageMaker DLCs ship pyarrow 23.x, but if the local environment has an older version, reading parquet files fails with `Repetition level histogram size mismatch`. The fix is to pin `pyarrow>=21.0.0` in `requirements-train.txt`.
+
+- **Tier**: 1 (batch/training, operating principle)
+- **Cited by**: case-study-autoresearch.md (hardware/version caveat)
+
+### Insight #10 — config.yaml Should Never Be in Git
+
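+config.yaml contains environment-specific and sensitive data such as the AWS role ARN, profile, and region. The rule is to gitignore it and ship a `config.yaml.example` template.
+
+- **Tier**: 1 (batch/training, operating principle)
+- **Cited by**: case-study-autoresearch.md (operating principle), patterns-tier1-batch.md (Stage 0 IAM)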
+
+### Insight #11 — Spot GPUs Are Valid Proxies for Large-Scale Training
+
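+Research confirmation: HPO on a cheaper GPU (L40S) transfers well to an expensive GPU (H100). What transfers: the relative ranking of optimizer choices (Muon vs AdamW), architecture decisions (depth, width, attention), LR schedule shape (cosine, warmup ratio), and relative hyperparameter rankings. What does not transfer: absolute val_bpb, optimal batch size (VRAM-dependent), memory-dependent optimizations (FA3, FP8), and absolute LR values. Phase 1 runs on Spot L40S at about $0.04 per experiment; Phase 2 does the final run on H100.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: case-study-autoresearch.md (combined with the interruption-rate explanation)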
+
+### Insight #12 — DEVICE_BATCH_SIZE ≠ More Training (L40S-specific; reversed on H100)
+
+L40S+SDPA: swapping BS 64→128 worsens val_bpb (1.065→1.081); it only reduces gradient accumulation steps from 4 to 2, with total tokens unchanged. H100+FA3: the same swap improves val_bpb (1.0016→0.9951, -0.0065). The combination of hardware and attention kernel flips the direction of the batch-size effect. On L40S, increase TOTAL_BATCH_SIZE to gain throughput.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: case-study-autoresearch.md (per-hardware tuning caveat), source-insights.md #13
+
+### Insight #13 — Batch Size × LR × Hardware Interact — Evolved LRs Can Be BS-Specific
+
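+Applying the "optimal" LRs found by 5 generations of evolution at H100 BS=64 (EMBEDDING 0.7091→0.6433, UNEMBEDDING 0.003369→0.004206) to BS=128 made results **worse**. Combining the original L40S-evolved LRs (0.7091/0.003369) with BS=128 reached val_bpb 0.9951 (below upstream's 0.998). Effective LR scales with BS: the Phase-2 LR adjustment that compensated for BS=64's noisy gradients became an overcorrection once BS doubled. Rule: evolve LR and BS **together**, and always revisit LR when the hardware or BS changes. **Cost lesson**: re-checking a fixed assumption (BS) bought -0.0065 val_bpb in a single $0.16 experiment, versus -0.0014 from 20 LR-search runs costing $3, on the order of **100× more cost-efficient**.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: case-study-autoresearch.md, patterns-tier1-batch.md Pattern 1.1 tradeoffs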
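+
+The "~100×" in the cost lesson can be checked by comparing val_bpb reduction per dollar. This is a sketch with a hypothetical helper name, using only the figures quoted above. The exact ratio comes out near 87×.
+
+```python
+def improvement_per_dollar(delta_val_bpb: float, cost_usd: float) -> float:
+    """val_bpb reduction bought per dollar (lower val_bpb is better)."""
+    return -delta_val_bpb / cost_usd
+
+
+bs_check = improvement_per_dollar(-0.0065, 0.16)   # one experiment re-checking batch size
+lr_search = improvement_per_dollar(-0.0014, 3.00)  # twenty LR-search experiments
+ratio = bs_check / lr_search                       # roughly 87x with these inputs
+```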
+
+### Insight #14 — Cheap-GPU-Evolved LRs Transfer to Expensive GPUs — Sometimes Better Than Re-Evolving
+
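+LRs found on L40S Spot ($0.40, 24 experiments), applied to H100 at the upstream BS=128, came in **below the upstream baseline** (0.9951 < 0.998). Re-evolving LRs natively on H100 in Phase 2 actually made things worse, because it searched around the wrong BS. Hyperparameter *rankings* are hardware-independent; *absolute values* are more tightly coupled to BS. Rule: use cheap Spot GPUs (L40S/A10G) for Phase-1 LR/architecture search, and for the final run on an expensive GPU try the **cheap-GPU-evolved config unchanged** first. It is a strong starting point and often the final answer.
+
+- **Tier**: 1 (batch/training)
+- **Cited by**: case-study-autoresearch.md (Phase 1→2 transfer narrative)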
+
+### Insight #15 — Serverless Spot Can Match or Beat Dedicated H100 Results at 44–150× Lower Cost
+
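+Reproduced and slightly beat Karpathy's upstream H100 autoresearch result (val_bpb ~0.998) with a **single ml.p5.4xlarge Spot run: 229 seconds, ~$0.16** (val_bpb 0.9951). Comparison: Karpathy's H100, 8h continuous, val_bpb ~0.998, $7-24, versus this experiment's H100 Spot, 229s billable, val_bpb 0.9951, $0.16. That is **44-150× lower cost and ~24× shorter wall clock**. Caveat: claiming a statistical advantage from a single run requires ±σ over repeated runs. Still, matching and beating the reported value at **1-2% of the cost** is itself a meaningful serverless-ML result. Rule: for reproducing jobs under 30 minutes, Spot+HUGI is a strict improvement; beyond that, interruption risk starts to matter.
+
+- **Tier**: 1 (batch/training, core narrative)
+- **Cited by**: case-study-autoresearch.md, tradeoffs-compute.md §2, patterns-tier1-batch.md Pattern 1.1 (cost range), tradeoffs-spot.md §4 (HUGI)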
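+
+The 44-150× range follows directly from the quoted costs. The sketch below, with a hypothetical helper name, reproduces it. It also shows that the Spot run cost roughly 0.7% to 2.3% of the baseline, depending on which end of the $7-24 estimate is used.
+
+```python
+def cost_reduction(baseline_usd: float, new_usd: float) -> float:
+    """How many times cheaper the new run is than the baseline."""
+    return baseline_usd / new_usd
+
+
+low = cost_reduction(7.0, 0.16)    # cheaper end of the upstream estimate
+high = cost_reduction(24.0, 0.16)  # pricier end
+share_of_cost = (0.16 / 24.0, 0.16 / 7.0)  # Spot run as a fraction of each baseline
+```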
+
+## From serverless-openclaw
+
+Source: github.com/serithemage/serverless-openclaw (alpha), README snapshot of 2026-04-18
+
+### Insight #O1 — Lambda Container + Fargate Spot dual compute fallback
+
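+Lambda Container is the default, giving zero idle cost and a 1.35-second cold start. Sessions longer than 15 minutes or under heavy load fall back to ECS Fargate Spot. Compute cost dropped by **70%**. The dual setup **with Lambda as primary** works around the AWS-documented limitation that Fargate Spot has no automatic On-Demand fallback.
+
+- **Tier**: 2 (always-on API)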
+- **Cited by**: patterns-tier2-api.md Pattern 2.1/2.2, tradeoffs-compute.md §3, case-study-openclaw.md
+
+### Insight #O2 — API Gateway over ALB
+
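+Replaces the ALB's **$18-25/month** fixed cost with pay-per-use API Gateway. This is a decisive win for low-traffic, bursty patterns: $0 when idle, $3.5 per million requests (REST) or $1.0 per million (HTTP API). It is the default choice unless there are regulatory or L7-routing requirements.
+
+- **Tier**: 2 (always-on API)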
+- **Cited by**: patterns-tier2-api.md Pattern 2.1, case-study-openclaw.md
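+
+A quick way to see where "low traffic" ends is the monthly request volume at which API Gateway request fees equal the ALB's fixed fee. This is a rough sketch under stated assumptions: it ignores ALB LCU charges, API Gateway tiered pricing, and compute costs on either side, and the helper name is hypothetical.
+
+```python
+def breakeven_requests_per_month(alb_fixed_usd: float, apigw_usd_per_million: float) -> float:
+    """Monthly request volume at which API Gateway request fees equal the ALB fixed fee."""
+    return alb_fixed_usd / apigw_usd_per_million * 1_000_000
+
+
+http_api = [breakeven_requests_per_month(fee, 1.0) for fee in (18, 25)]  # HTTP API
+rest_api = [breakeven_requests_per_month(fee, 3.5) for fee in (18, 25)]  # REST API
+```
+
+With these inputs, HTTP API stays cheaper below about 18-25 million requests per month, and REST API below about 5.1-7.1 million.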
+
+### Insight #O3 — EventBridge scheduled pre-warming
+
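+An EventBridge Scheduler cron invokes the Lambda function periodically during active hours only, so the first response has **no cold-start delay**. This removes the cold-start penalty for only **~$0.07**/month extra, instead of 24/7 Provisioned Concurrency (~$15+/month). The function stays idle outside active hours, preserving the zero-idle principle.
+
+- **Tier**: 2 (always-on API)
+- **Cited by**: patterns-tier2-api.md Pattern 2.1, patterns-tier1-batch.md Pattern 1.3 (figures referenced), case-study-openclaw.md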
+
+### Insight #O4 — S3/DynamoDB session persistence for stateless Lambda
+
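+Provides **session persistence** for stateless Lambda. DynamoDB holds the session index and metadata (On-Demand mode for zero idle cost); S3 holds payloads and conversation history (with concurrency control). This is the standard pattern for realizing Serverless Lens principle 3, "Share nothing".
+
+- **Tier**: 2 (always-on API)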
+- **Cited by**: patterns-tier2-api.md Pattern 2.3, tradeoffs-data-layer.md §2, case-study-openclaw.md
+
+### Insight #O5 — Free-tier first cost target
+
+Targets **$1-2/month** for personal use ($0.23 within the Free Tier). This results from applying the **zero-idle + per-request billing** principle to every component (API Gateway, Lambda, DynamoDB On-Demand, S3, CloudFront, EventBridge). It is the basis for cost targets for side projects and personal tools.
+
+- **Tier**: 2 (always-on API)
+- **Cited by**: case-study-openclaw.md, tradeoffs-compute.md §1 (basis for Lambda zero-idle)
+
+### Insight #O6 — CloudFront + S3 for web UI
+
+Serves the static web UI (React SPA, Next.js export) from an S3 bucket + CloudFront distribution, reaching **zero idle cost** with no EC2/Lambda runtime. Keeping it apart from the dynamic Lambda+API Gateway paths separates the cost structure of static and dynamic traffic.
+
+- **Tier**: 2 (always-on API)
+- **Cited by**: patterns-tier2-api.md (pattern including static UI), case-study-openclaw.md
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-compute.md b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-compute.md
new file mode 100644
index 0000000..669c5d2
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-compute.md
@@ -0,0 +1,162 @@
+> **Snapshot date**: 2026-04-18
+> **Description**: Official tradeoffs for Lambda / Fargate / Batch / SageMaker / EC2 Spot
+
+# Compute Service Tradeoffs
+
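+Official AWS limits for each service and what they imply for migration. Every figure is traceable via `[AWS Docs]` or `[Insight #N]`.
+Cross-cutting Spot topics (interruption signals, capacity selection, HUGI) are split out into [tradeoffs-spot.md](tradeoffs-spot.md).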
+
+## 1. AWS Lambda
+
+Primary citation: [AWS Docs — Lambda quotas](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html)
+
+### 1.1 Quantitative limits (only items that matter for operational decisions)
+
+| Item | Value | Increasable | Source |
+|------|-----|-------------|------|
+| Function timeout | 900 s (15 min) | No | [AWS Docs] |
+| Function memory | 128-10,240 MB (1 MB increments) | No | [AWS Docs] |
+| Memory equivalent to 1 vCPU | 1,769 MB | - | [AWS Docs] |
+| Concurrent executions quota | 1,000 | Yes (up to tens of thousands) | [AWS Docs] |
+| Durable executions | 1,000,000 | Yes | [AWS Docs] |
+| Concurrency scaling rate | +1,000 execution environments every 10 s | No | [AWS Docs] |
+| Request/response payload (synchronous) | 6 MB / 6 MB | No | [AWS Docs] |
+| Streamed response | up to 200 MB | No | [AWS Docs] |
+| Streaming bandwidth | uncapped for the first 6 MB, then 2 MB/s | No | [AWS Docs] |
+| Asynchronous invocation payload | 1 MB | No | [AWS Docs] |
+| .zip deployment package | 50 MB (upload) / 250 MB (unzipped, including layers) | No | [AWS Docs] |
+| Container image | 10 GB (unzipped) | No | [AWS Docs] |
+| `/tmp` | 512 MB-10,240 MB (1 MB increments) | No | [AWS Docs] |
+| Environment variables (total) | 4 KB | No | [AWS Docs] |
+| Layers | 5 per function | No | [AWS Docs] |
+
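+### 1.2 Implications for migration
+
+- **The 15-minute ceiling is the branching line for Tier 1 batch work**: beyond 15 minutes, decompose with Step Functions or delegate to AWS Batch/Fargate. [AWS Docs]
+- **Memory is coupled to CPU**: 1,769 MB equals 1 vCPU. Raising memory to cut latency raises CPU and **cost at the same time**. [AWS Docs]
+- **The default 1,000 concurrency does not match API Gateway's default 10,000 RPS**: request an increase before load testing. [AWS Docs]
+- **Cold-start mitigations are mutually exclusive**: choose either Provisioned Concurrency or SnapStart (SnapStart supports Java, Python 3.12+, and .NET 8+, but not container images). [AWS Docs — Lambda SnapStart] ([RESEARCH §13])
+- **The 6 MB synchronous payload** is the practical ceiling for REST APIs: for large responses, use streaming (200 MB) or S3 presigned URLs instead. [AWS Docs]
+- **10 GB container images** make large dependencies (Java, ML) deployable, but SnapStart is unsupported for them, which is a tradeoff. [AWS Docs]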
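+
+The first three bullets can be turned into a pre-migration fit check. This is a sketch with hypothetical names. It uses the usual steady-state estimate, concurrency = requests per second × average duration, and it reads "6 MB" as 6 MiB, which is an assumption.
+
+```python
+from typing import List
+
+LAMBDA_MAX_TIMEOUT_S = 900            # 15-minute hard ceiling
+MB_PER_VCPU = 1769                    # memory at which a function gets 1 vCPU equivalent
+SYNC_PAYLOAD_LIMIT = 6 * 1024 * 1024  # assumption: "6 MB" read as 6 MiB
+
+
+def vcpu_equivalent(memory_mb: int) -> float:
+    if not 128 <= memory_mb <= 10_240:
+        raise ValueError("Lambda memory must be between 128 and 10,240 MB")
+    return memory_mb / MB_PER_VCPU
+
+
+def required_concurrency(rps: float, avg_duration_s: float) -> float:
+    """Steady-state concurrency = requests per second x average duration."""
+    return rps * avg_duration_s
+
+
+def lambda_fit_issues(max_runtime_s: float, response_bytes: int, rps: float,
+                      avg_duration_s: float, concurrency_quota: int = 1000) -> List[str]:
+    issues = []
+    if max_runtime_s > LAMBDA_MAX_TIMEOUT_S:
+        issues.append("exceeds 15 min: Step Functions, AWS Batch or Fargate")
+    if response_bytes > SYNC_PAYLOAD_LIMIT:
+        issues.append("sync response over 6 MB: stream it or use an S3 presigned URL")
+    if required_concurrency(rps, avg_duration_s) > concurrency_quota:
+        issues.append("needs a concurrency quota increase before load testing")
+    return issues
+```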
+
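+### 1.3 When to choose / When to avoid
+
+| Good fit | Poor fit |
+|------|--------|
+| Short event-driven handlers (≤15 min) | Long-running batch jobs and training |
+| Spiky traffic (zero-idle goal) [Case: openclaw] | Persistent connections (long-lived WebSocket), gRPC streaming |
+| Under 1,000 concurrent requests, or a quota that can be raised | Synchronous responses over 6 MB |
+| Function-style Node/Python/Java/.NET workloads | GPU or special-CPU dependence (→ SageMaker/EC2) |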
+
+## 2. SageMaker Managed Spot Training
+
+Primary citation: [AWS Docs — SageMaker Managed Spot Training](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html)
+
+### 2.1 Key figures
+
+| Item | Value | Source |
+|------|-----|------|
+| Savings ceiling | **up to 90%** vs on-demand | [AWS Docs] |
+| Savings formula | `(1 - BillableTimeInSeconds / TrainingTimeInSeconds) × 100` | [AWS Docs] |
+| Example | Billable=100s, Training=500s → 80% savings | [AWS Docs] |
+| Required condition | `MaxWaitTimeInSeconds > MaxRuntimeInSeconds` | [AWS Docs] |
+| No checkpointing (built-in and Marketplace algorithms) | `MaxWaitTime ≤ 3600s` | [AWS Docs] |
+| Hyperparameter Tuning compatibility | Supported | [AWS Docs] |
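+
+The formula and the two timing constraints in the table can be checked locally before a job is submitted. This is a sketch. The function names are hypothetical and not part of the SageMaker SDK; the parameters mirror the `StoppingCondition` fields `MaxRuntimeInSeconds` and `MaxWaitTimeInSeconds`.
+
+```python
+from typing import List
+
+
+def managed_spot_savings_pct(billable_s: float, training_s: float) -> float:
+    """Savings formula from the table: (1 - Billable / Training) x 100."""
+    if training_s <= 0:
+        raise ValueError("training time must be positive")
+    return (1 - billable_s / training_s) * 100
+
+
+def spot_config_errors(max_runtime_s: int, max_wait_s: int,
+                       builtin_algorithm: bool, checkpointing: bool) -> List[str]:
+    errors = []
+    if max_wait_s <= max_runtime_s:
+        errors.append("MaxWaitTimeInSeconds must exceed MaxRuntimeInSeconds")
+    if builtin_algorithm and not checkpointing and max_wait_s > 3600:
+        errors.append("built-in/Marketplace algorithm without checkpoints: MaxWaitTime <= 3600s")
+    return errors
+```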
+
+### 2.2 Example state transitions
+
+- No interruption: `Starting → Downloading → Training → Uploading` [AWS Docs]
+- Resumed after one interruption: `Starting → Downloading → Training → Interrupted → Starting → Downloading → Training → Uploading` [AWS Docs]
+- Stopped after exceeding MaxWait: `Stopped: MaxWaitTimeExceeded` [AWS Docs]
+- Spot capacity never acquired: `Starting → Stopping → Stopped: MaxWaitTimeExceeded` [AWS Docs]
+
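+### 2.3 Implications for migration
+
+- **Billable time covers training time only**: the Starting/Downloading phases are not billed. This is AWS's official implementation of the HUGI principle. [AWS Docs]
+- **Checkpoints sync automatically between S3 and local storage**, so enable them for anything but short jobs. [AWS Docs]
+- **~3 minutes of startup overhead** erodes the real savings: 60% of a 5-minute training job is overhead. Combine experiments or use warm pools. [Insight #5]
+- **Validated cases**: 48 experiments for $3.94; H100 229s for $0.16 vs upstream H100 8h for $7-24. [Case: autoresearch] [Insight #15]
+- **Quotas are a first-class concern**: GPU Spot defaults to 0, and p5/p6 need manual review. Request them in advance per region and per instance type. [Insight #6]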
+
+## 3. AWS Fargate + Fargate Spot
+
+Primary citation: [AWS Docs — Fargate capacity providers](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html)
+
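+### 3.1 Core behavior
+
+| Item | Value | Source |
+|------|-----|------|
+| Interruption warning | 2 minutes (EventBridge + SIGTERM) | [AWS Docs] |
+| `stopTimeout` | default 30 s, max 120 s | [AWS Docs] |
+| Launch during a capacity shortage | only delayed; **no automatic switch to On-Demand** | [AWS Docs] · [Insight #O1] |
+| Running as a service | on interruption, the scheduler tries to start replacement tasks | [AWS Docs] |
+| Running a single standalone task | **stopped** until capacity returns: availability risk | [AWS Docs] |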
+
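+### 3.2 Implications for migration
+
+- **The lack of automatic fallback** must be solved at design time by mixing capacity providers (`FARGATE` + `FARGATE_SPOT` weights). [Insight #O1] openclaw works around it by making Lambda Container the default and placing Fargate Spot as the fallback. [Case: openclaw]
+- **A service with `desiredCount ≥ 2` is the minimum for a single deployment**: a single task on Fargate Spot is an availability anti-pattern. [AWS Docs]
+- **A SIGTERM handler is mandatory**: failing to drain and save state within 2 minutes risks data corruption. [AWS Docs]
+- **The interruption event JSON** carries `stopCode: "SpotInterruption"`; use it for CloudWatch alarms and automatic replacement logic. [AWS Docs]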
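+
+A minimal sketch of the SIGTERM bullet in Python follows. The signal handler only flips a flag, the work loop stops taking new items, and progress is recorded before exit so it fits inside `stopTimeout`. The class name and the in-memory `saved_progress` list are hypothetical stand-ins; a real task would persist progress to S3 or DynamoDB.
+
+```python
+import signal
+import threading
+from typing import Callable, Iterable, List, Optional
+
+
+class GracefulWorker:
+    """Stops taking new work after SIGTERM and records progress before exiting."""
+
+    def __init__(self) -> None:
+        self.draining = threading.Event()
+        self.saved_progress: List[int] = []  # stand-in for a write to S3/DynamoDB
+
+    def install(self) -> None:
+        # Must be called from the main thread of the container's main process.
+        signal.signal(signal.SIGTERM, self._on_sigterm)
+
+    def _on_sigterm(self, signum: int, frame: object) -> None:
+        self.draining.set()  # only flip a flag; do the real work outside the handler
+
+    def run(self, items: Iterable[int], on_item: Optional[Callable[[int], None]] = None) -> int:
+        done = 0
+        for item in items:
+            if self.draining.is_set():
+                break
+            if on_item:
+                on_item(item)  # placeholder for the real unit of work
+            done += 1
+        self.saved_progress.append(done)  # checkpoint before the stopTimeout window closes
+        return done
+```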
+
+## 4. Amazon EC2 Spot
+
+Primary citation: [AWS Docs — EC2 Spot Instance interruptions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html)
+
+### 4.1 Three interruption reasons
+
+| Reason | Description | Source |
+|------|------|------|
+| Capacity | EC2 needs the capacity back: the **main cause** | [AWS Docs] |
+| Price | the Spot price rises above the maximum price you specified | [AWS Docs] |
+| Constraint | a launch group or AZ group constraint can no longer be met | [AWS Docs] |
+
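+### 4.2 Interruption behaviors compared
+
+| Behavior | Characteristics | Condition |
+|------|------|------|
+| `terminate` (default) | the instance is terminated | always available |
+| `stop` | EBS is preserved; only EC2 can restart the instance | **persistent requests only** |
+| `hibernate` | memory state is preserved and the instance resumes quickly on restore (hibernation starts immediately, without the 2-minute grace period) | requires instance family and AMI support |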
+
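+### 4.3 Implications for migration
+
+- **Setting a maximum price is an anti-pattern**: it raises interruption frequency, so the default (capped at the On-Demand price) is best. [AWS Docs]
+- **The rebalance recommendation signal** can arrive before the 2-minute warning; combined with an ASG, it effectively gives a longer drain window. [AWS Docs]
+- **CloudTrail `BidEvictedEvent`** + EventBridge for after-the-fact tracking. [AWS Docs]
+- **Validate in advance with AWS FIS**: include interruption-injection tests in the pre-production pipeline. [AWS Docs — FIS]
+- **Multi-AZ + multiple instance types + ASG** is the minimum operational trio. [AWS Docs] · [Insight #1]
+- **Long-running GPU jobs** can use `hibernate` to cut checkpointing cost, but check the instance-family constraints first. [AWS Docs]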
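+
+On EC2 Spot, the 2-minute warning is also exposed through instance metadata at `http://169.254.169.254/latest/meta-data/spot/instance-action`. That path returns 404 until an interruption is scheduled, then a small JSON document. Below is a sketch of parsing that document into the action and the seconds left. Fetching it (with an IMDSv2 session token) is deliberately left out, and the function name is hypothetical.
+
+```python
+import json
+from datetime import datetime, timezone
+from typing import Tuple
+
+
+def parse_instance_action(body: str, now: datetime) -> Tuple[str, float]:
+    """Return (action, seconds left) from the spot/instance-action metadata document."""
+    doc = json.loads(body)
+    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
+    return doc["action"], (when - now).total_seconds()
+
+
+action, seconds_left = parse_instance_action(
+    '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}',
+    datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc),
+)
+```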
+
+## 5. AWS Batch (with Spot)
+
+Primary citation: [AWS Docs — AWS Batch with Spot](https://docs.aws.amazon.com/batch/latest/userguide/spot.html)
+
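+### 5.1 Three allocation strategies compared
+
+| Strategy | Characteristics | Suitability for Spot | Source |
+|------|------|-------------|------|
+| `BEST_FIT` | selects only the best-fitting, lowest-cost instance type | Low (higher interruption rate) | [AWS Docs] |
+| `BEST_FIT_PROGRESSIVE` | moves up to larger instance types when capacity is short | Medium | [AWS Docs] |
+| `SPOT_CAPACITY_OPTIMIZED` | minimizes the likelihood of interruption | **Standard for Spot** | [AWS Docs] |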
+
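+### 5.2 Implications for migration
+
+- **`SPOT_CAPACITY_OPTIMIZED` is the Spot default**: the other strategies can lose more in availability than they save in cost. [AWS Docs] AWS Batch also offers `SPOT_PRICE_CAPACITY_OPTIMIZED`, which the table above does not cover; check the current docs for which one AWS recommends.
+- **`retryStrategy.attempts ≥ 2` + `evaluateOnExit`** together separate interruption retries from genuine errors. [AWS Docs]
+- **Queue priority pattern**: in the job queue, list the Spot compute environment first and an On-Demand compute environment as the fallback. This is the standard safety net for batch workloads. [AWS Docs]
+- **A SIGTERM handler + checkpointing** must be built into the container image. [AWS Docs]
+- **Good fit**: fault-tolerant, retryable workloads such as batch, ML training, and CI/CD. **Poor fit**: production APIs, databases, and jobs with strict SLAs. [AWS Docs]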
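+
+Here is a sketch of the `retryStrategy` bullet. The dict uses the job-definition field names (`attempts`, `evaluateOnExit`, `onStatusReason`, `onReason`, `onExitCode`, `action`); Spot reclaims typically surface as a status reason starting with `Host EC2`. The `local_decision` helper is hypothetical and only approximates the service's behavior. It assumes the rules are evaluated in order and the first match wins, it uses `fnmatch` as a stand-in for Batch's own glob matching, and it assumes a job is retried when no rule matches.
+
+```python
+import fnmatch
+from typing import Dict, Optional
+
+RETRY_STRATEGY: Dict = {
+    "attempts": 3,
+    "evaluateOnExit": [
+        # Spot reclaim: the status reason starts with "Host EC2", so retry.
+        {"onStatusReason": "Host EC2*", "action": "RETRY"},
+        # Any other failure reason: fail fast instead of burning attempts.
+        {"onReason": "*", "action": "EXIT"},
+    ],
+}
+
+
+def local_decision(strategy: Dict, status_reason: str = "", reason: str = "",
+                   exit_code: Optional[int] = None) -> str:
+    """Approximate first-match evaluation of evaluateOnExit for local testing."""
+    for rule in strategy["evaluateOnExit"]:
+        if "onStatusReason" in rule and not fnmatch.fnmatchcase(status_reason, rule["onStatusReason"]):
+            continue
+        if "onReason" in rule and not fnmatch.fnmatchcase(reason, rule["onReason"]):
+            continue
+        if "onExitCode" in rule and (exit_code is None
+                                     or not fnmatch.fnmatchcase(str(exit_code), rule["onExitCode"])):
+            continue
+        return rule["action"]
+    return "RETRY"  # assumption: no matching rule means the job is retried
+```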
+
+## 6. When to choose what (decision matrix)
+
+| Workload pattern | Recommended service | Rationale |
+|---------------|-------------|------|
+| Single unit of work ≤15 min, event-triggered | Lambda | 15-minute ceiling [AWS Docs §1.1] |
+| Batch from 15 min to several hours, retryable | AWS Batch (Spot, `SPOT_CAPACITY_OPTIMIZED`) | [AWS Docs §5] |
+| ML training (checkpointable) | SageMaker Managed Spot Training | up to 90% savings [AWS Docs] + validated 229s for $0.16 [Case: autoresearch] |
+| Always-on container workload (service) | Fargate + Fargate Spot mix (`capacityProviderStrategy`) | compensates for the lack of fallback [Insight #O1] |
+| Always-on API (spiky, zero-idle first) | Lambda + API Gateway (primary) + Fargate Spot fallback | [Case: openclaw] [Insight #O2] |
+| Long-running GPU jobs, interruption-tolerant | EC2 Spot (hibernate or checkpointing) | [AWS Docs §4] |
+| Workflow orchestration (≤5 min) | Step Functions Express | [AWS Docs — Step Functions §17] |
+| Workflow orchestration (≤1 year) | Step Functions Standard | [AWS Docs — Step Functions §17] |
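+
+The matrix can be read as a first-match walk over its rows. This illustrative sketch uses hypothetical flag names, and the order in which the rows are checked is an assumption (orchestration and ML training first, then always-on, GPU, and finally duration).
+
+```python
+def recommend_compute(runtime_min: float, always_on: bool = False, spiky: bool = False,
+                      ml_training: bool = False, gpu_long_running: bool = False,
+                      orchestration: bool = False) -> str:
+    """First-match walk over the decision matrix rows (illustrative ordering)."""
+    if orchestration:
+        return "Step Functions Express" if runtime_min <= 5 else "Step Functions Standard"
+    if ml_training:
+        return "SageMaker Managed Spot Training"
+    if always_on:
+        if spiky:
+            return "Lambda + API Gateway (primary) + Fargate Spot fallback"
+        return "Fargate + Fargate Spot (capacityProviderStrategy)"
+    if gpu_long_running:
+        return "EC2 Spot (hibernate or checkpointing)"
+    if runtime_min <= 15:
+        return "Lambda"
+    return "AWS Batch (Spot, SPOT_CAPACITY_OPTIMIZED)"
+```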
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-data-layer.md b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-data-layer.md
new file mode 100644
index 0000000..c9059e5
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-data-layer.md
@@ -0,0 +1,129 @@
+> **Snapshot date**: 2026-04-18
+> **Description**: RDS / Aurora Serverless v2 / DynamoDB / S3 Express
+
+# Data Layer Tradeoffs
+
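+The hardest decision in a Tier 3 migration. Validated real-world cases are scarce in this area, so this document is built from **AWS Docs-based principles, official limits, and risk warnings**. Concrete migration step patterns are delegated to [patterns-tier3-data.md](patterns-tier3-data.md), to be written in Stage D.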
+
+## 1. RDS → Aurora Serverless v2
+
+Primary citation: [AWS Docs — Aurora Serverless v2](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html)
+
+### 1.1 Quantitative limits
+
+| Item | Value | Source |
+|------|-----|------|
+| ACU definition | 1 ACU ≈ 2 GiB RAM plus matching CPU and networking | [AWS Docs] |
+| ACU increment | **0.5 ACU** steps, metered continuously per second | [AWS Docs] |
+| ACU range | Depends on engine and version: `0.5-128` → `0.5-256` → **`0-256`** (Aurora MySQL 3.08.0+ / PostgreSQL 13.15+, 14.12+, 15.7+, 16.3+) | [AWS Docs] |
+| Scaling responsiveness | Online scaling within seconds (no downtime) | [AWS Docs] |
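+| Auto-pause | With min=0, pauses automatically after an idle period and resumes automatically on a new connection (resume takes a few seconds) | [AWS Docs] |
+| v2 cold starts | **Eliminated** compared with v1 (instances stay running), apart from the few-second delay on pause→resume | [AWS Docs] |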
+| Multi-AZ | Supported (same failover as provisioned) | [AWS Docs] |
+| Global Database | Supported | [AWS Docs] |
+| RDS Proxy | Supported (the standard pairing for Lambda ↔ Aurora connection pooling) | [AWS Docs] |
+| Not supported | Database Activity Streams, Cluster Cache Management (Aurora PostgreSQL), Aurora Auto Scaling (use readers instead) | [AWS Docs] |
+| Promotion tier | Tiers 0-1 track the writer's capacity; tiers 2-15 scale independently | [AWS Docs] |
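+
+The ACU rules in the table above can be turned into a quick sizing helper. This is a hypothetical sketch that assumes the "1 ACU ≈ 2 GiB" approximation and the 0.5-ACU step; real sizing should come from observed capacity and memory metrics (for example the `ServerlessDatabaseCapacity` CloudWatch metric), not from this arithmetic alone.
+
+```python
+import math
+
+GIB_PER_ACU = 2.0  # 1 ACU ≈ 2 GiB RAM (approximation)
+ACU_STEP = 0.5     # capacity moves in 0.5-ACU increments
+
+
+def acu_for_memory(target_gib: float, min_acu: float = 0.0, max_acu: float = 256.0) -> float:
+    """Smallest 0.5-step ACU value whose approximate RAM covers target_gib,
+    clamped to the cluster's configured [min_acu, max_acu] range."""
+    steps = math.ceil(target_gib / GIB_PER_ACU / ACU_STEP)
+    return min(max(steps * ACU_STEP, min_acu), max_acu)
+```
+
+For instance, a 5 GiB working set maps to 2.5 ACU and 5.1 GiB rounds up to 3.0 ACU; `min_acu=0` is only valid on the engine versions listed in the table.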
+
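+### 1.2 Implications for migration
+
+- **Keeping the same engine (MySQL/PostgreSQL)** minimizes migration risk: most DDL and application SQL carries over unchanged. [AWS Docs]
+- **Floor-cost trap**: with `min ACU > 0` you pay even while idle. For true zero-idle, choose min=0 plus auto-pause and accept the resume delay. [AWS Docs]
+- **Bursts**: per-second scaling in 0.5-ACU steps can keep pace with sudden spikes in Lambda concurrency. [AWS Docs]
+- **Differences from v1**: v1 scales in doubling ACU steps, takes minutes to scale, and has cold starts; this is the main reason to move to v2. [AWS Docs]
+- **Mixed operation**: provisioned and Serverless v2 instances can coexist in the same cluster, which makes a gradual cutover path viable. [AWS Docs]
+- **Pairing with Lambda**: connection pooling through RDS Proxy is the standard setup. Connecting directly without a proxy risks a connection storm. [AWS Docs]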
+
+## 2. RDS → DynamoDB
+
+Primary citation: [AWS Docs — DynamoDB Read/Write Capacity Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html)
+
+### 2.1 On-Demand vs Provisioned
+
+| Item | On-Demand | Provisioned | Source |
+|------|-----------|-------------|------|
+| Billing unit | Per request (RRU/WRU) | Capacity reserved per hour (RCU/WCU × hours) | [AWS Docs] |
+| 1 RRU = | One strongly consistent read or two eventually consistent reads of up to 4 KB | n/a | [AWS Docs] |
+| 1 WRU = | One write of up to 1 KB | n/a | [AWS Docs] |
+| Scaling | Automatic; even a new table immediately handles 4,000 writes/s + 12,000 reads/s | Auto Scaling (reacts within minutes) | [AWS Docs] |
+| Peak handling | Instantly absorbs up to **2×** the previous peak | Burst capacity (5 minutes) as a buffer | [AWS Docs] |
+| Floor cost | 0 ($0 with no requests) | Minimum capacity × hours provisioned | [AWS Docs] |
+| Reserved Capacity | Not available | Up to **54%** off for 1 year / up to **77%** off for 3 years (bought in 100 RCU/WCU units) | [AWS Docs] |
+| Mode-switch limits | Provisioned → On-Demand: at most 4 times per 24 h | On-Demand → Provisioned: any time | [AWS Docs] |
+| Default quotas | Per table: 40,000 RRU + 40,000 WRU | Per table: 40,000 RCU + 40,000 WCU; per account per Region: 80,000 RCU + 80,000 WCU in total | [AWS Docs] |
+| Per-table max | Configurable on On-Demand (guards against runaway cost) | n/a | [AWS Docs] |
+| Best-fit workloads | Unpredictable, spiky, serverless, new apps | Stable, predictable, sustained high load + Reserved Capacity | [AWS Docs] |
+
+### 2.2 Strong vs Eventual consistency
+
+| Option | RRU cost | Latency | Use cases |
+|------|----------|------|------|
+| Strongly consistent read | Baseline (1 RRU / 4 KB) | Slightly higher | Finance, inventory |
+| Eventually consistent read | **Half** (0.5 RRU / 4 KB) | Lower | Most reads |
+
+Choosing between these two modes is the first lever in DynamoDB's cost/performance trade-off. [AWS Docs]
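+
+The unit definitions in 2.1 and 2.2 reduce to simple rounding. A minimal sketch, assuming single-item, non-transactional reads and writes, 1 KB = 1,024 bytes, and a minimum of one unit per request:
+
+```python
+import math
+
+
+def read_request_units(item_bytes: int, strongly_consistent: bool = True) -> float:
+    """RRUs to read one item: 1 per 4 KB chunk (strong), half that (eventual)."""
+    chunks = max(1, math.ceil(item_bytes / 4096))  # at least one unit per request
+    return float(chunks) if strongly_consistent else chunks / 2
+
+
+def write_request_units(item_bytes: int) -> int:
+    """WRUs to write one item: 1 per 1 KB chunk."""
+    return max(1, math.ceil(item_bytes / 1024))
+```
+
+A 9 KB item therefore costs 3 RRU with strong consistency but 1.5 RRU with eventual consistency, and a 2.5 KB write costs 3 WRU. Transactional reads and writes, which are not modeled here, cost twice as much.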
+
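+### 2.3 Implications for migration
+
+- **Data model redesign is mandatory**: relational JOIN patterns must be rebuilt as a DynamoDB single-table design. Most of the migration risk comes from here. [AWS Docs]
+- **Start on On-Demand**: the default recommendation for unpredictable and new apps, and a natural match for Lambda concurrency scaling. [AWS Docs]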
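+- **Switch to Provisioned once steady-state load is known**: Reserved Capacity then takes up to 54% (1-year) / 77% (3-year) off Provisioned rates. Keep the switching limit in mind: going from Provisioned back to On-Demand is limited to 4 times per 24 h. [AWS Docs]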
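+- **GSIs inherit the base table's capacity mode**: a GSI cannot use a different mode, but on a Provisioned table each GSI has its own RCU/WCU settings, which is where differences in query frequency can be reflected. [AWS Docs]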
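+- **Validated case**: openclaw stores conversation data in DynamoDB and keeps session backups in S3, solving persistence for stateless Lambda. [Case: openclaw] [Insight #O4]
+- **A per-table max** is an operational default for preventing runaway cost, so that On-Demand's "unlimited scale" does not turn into "unlimited cost". [AWS Docs]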
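+
+The per-table cap can live next to the table definition in code. A sketch assuming boto3's DynamoDB `update_table` call with its `OnDemandThroughput` parameter (`MaxReadRequestUnits` / `MaxWriteRequestUnits`); the table name and limits are hypothetical, and the helper only builds the arguments so it can be reviewed or tested without AWS access.
+
+```python
+def on_demand_cap_kwargs(table_name: str, max_rru: int, max_wru: int) -> dict:
+    """Arguments for DynamoDB UpdateTable that cap an On-Demand table's throughput."""
+    if max_rru <= 0 or max_wru <= 0:
+        raise ValueError("caps must be positive request-unit counts")
+    return {
+        "TableName": table_name,
+        "OnDemandThroughput": {
+            "MaxReadRequestUnits": max_rru,
+            "MaxWriteRequestUnits": max_wru,
+        },
+    }
+
+
+# Usage (needs boto3 and credentials; "orders" is a hypothetical table):
+#   import boto3
+#   boto3.client("dynamodb").update_table(**on_demand_cap_kwargs("orders", 2000, 500))
+```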
+
+## 3. S3 Standard vs S3 Express One Zone
+
+Primary citation: [AWS Docs — S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html)
+Supplementary: [AWS Docs — Directory buckets overview](https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html)
+
+### 3.1 Characteristics compared
+
+| Item | S3 Standard | S3 Express One Zone | Source |
+|------|-------------|---------------------|------|
+| AZ scope | Multiple AZs within a Region | **Single AZ** (designed for 99.95% availability) | [AWS Docs] |
+| Latency | Tens of ms | Single-digit ms (~10× lower) | [AWS Docs] |
+| Throughput (per bucket) | Scales per prefix | 200,000 read TPS / 100,000 write TPS | [AWS Docs] |
+| Bucket type | General purpose | **Directory bucket** (separate schema) | [AWS Docs] |
+| Bucket naming | 3-63 characters | `{base}--{zone-id}--x-s3` required | [AWS Docs] |
+| Storage structure | Flat prefixes | Hierarchical directories (slashes create folders automatically) | [AWS Docs] |
+| Request pricing | Baseline | About **50% cheaper** | [AWS Docs] |
+| Storage pricing | Baseline | **Higher** per GB than Standard; **the cost advantage comes mainly from request pricing** | [AWS Docs] |
+| 일관성 | Strong read-after-write | Strong read-after-write | [AWS Docs] |
+| Encryption | SSE-S3 / SSE-KMS / SSE-C | SSE-S3 / SSE-KMS (**SSE-C not supported**) | [AWS Docs] |
+| ACLs | Optional | Always bucket-owner-enforced | [AWS Docs] |
+| Block Public Access | Optional | **Always on** | [AWS Docs] |
+| Versioning, CRR, Lifecycle | Full | Limited (no versioning or CRR) | [AWS Docs] |
+| Quota | n/a | **100** directory buckets per account per Region (can be increased) | [AWS Docs] |
+
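+### 3.2 Implications for migration
+
+- **Good fit**: video editing, random access to ML training data, real-time analytics, shuffle/spill, intermediate analytics results. [AWS Docs]
+- **Poor fit**: long-term archives, multi-Region disaster recovery, regulated storage that requires multi-AZ durability, CDN origins, workloads that need ACLs. [AWS Docs]
+- **Single-AZ constraint**: data can be lost in an AZ failure, so keep the source data separately in Standard/Glacier and put only intermediate results in Express. [AWS Docs]
+- **Cost-inversion trap**: with infrequent access, total cost ends up **higher** than Standard, because the request-price savings scale with request volume. [AWS Docs]
+- **Same-AZ placement is required**: access from EC2/Fargate/Lambda in the same AZ to realize the DTO (data transfer) cost and latency benefits. Pin the AZ explicitly in IaC. [AWS Docs]
+- **CreateSession auth model**: object operations require `s3express:CreateSession` first, so move to a recent SDK and update IAM policies. [AWS Docs]
+- **Automate the naming rule**: the `--{zone-id}--x-s3` suffix is mandatory; enforce it in Terraform/CDK templates. [AWS Docs]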
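+
+The naming rule is easy to enforce in code that generates IaC parameters. A hypothetical helper, assuming zone IDs of the form `use1-az4` (optionally with a Local Zone segment such as `usw2-lax1-az1`) and that the 3-63 character limit covers the full name including the suffix:
+
+```python
+import re
+
+SUFFIX = "--x-s3"
+ZONE_ID = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-az[0-9]+$")
+BASE = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
+FULL_NAME = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?--[a-z0-9]+(?:-[a-z0-9]+)*-az[0-9]+--x-s3$")
+
+
+def directory_bucket_name(base: str, zone_id: str) -> str:
+    """Build {base}--{zone-id}--x-s3, rejecting inputs that would break the pattern."""
+    if not ZONE_ID.match(zone_id):
+        raise ValueError(f"unexpected zone id: {zone_id!r}")
+    if not BASE.match(base):
+        raise ValueError(f"invalid base name: {base!r}")
+    name = f"{base}--{zone_id}{SUFFIX}"
+    if not 3 <= len(name) <= 63:
+        raise ValueError(f"name is {len(name)} characters; must be 3-63")
+    return name
+
+
+def is_directory_bucket_name(name: str) -> bool:
+    """Template-time check for names written by hand."""
+    return len(name) <= 63 and FULL_NAME.match(name) is not None
+```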
+
+## 4. Decision matrix
+
+| Current state | Recommended target | Risk | Rationale |
+|-----------|-----------|--------|------|
+| RDS MySQL/PostgreSQL (general OLTP) | Aurora Serverless v2 (min>0) | Low | Same engine [AWS Docs §1] |
+| RDS with long idle periods | Aurora Serverless v2 (min=0, auto-pause) | Medium (resume delay) | [AWS Docs §1.2] |
+| Key-value access only, no relational JOINs | Start with DynamoDB On-Demand | Medium (model redesign) | [AWS Docs §2] |
+| DynamoDB with predictable high load | Provisioned + Reserved Capacity | Low | Up to 77% discount [AWS Docs §2.1] |
+| S3 Standard with heavy batch reads (training, shuffle) | Add an S3 Express cache tier (same AZ) | Low to medium | [AWS Docs §3] |
+| S3 Standard as CDN origin or regulated storage | Keep as-is | n/a | S3 Express is a poor fit [AWS Docs §3.2] |
+| Session/conversation persistence (stateless Lambda) | DynamoDB + S3 backup | Low | [Case: openclaw] [Insight #O4] |
+
+## 5. Migration pattern preview
+
+Tier 3 data migration has **no validated cases** (SPEC §9). Only principle-level patterns exist:
+
+- Strangler Fig + Branch by Abstraction to split the code path
+- CDC (Change Data Capture)-based incremental replication
+- Duplicate reads first → duplicate writes → cut over reads → cut over writes → retire the old store
+
+→ Detailed steps: [patterns-tier3-data.md](patterns-tier3-data.md) (to be written in Stage D)
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-event-driven.md b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-event-driven.md
new file mode 100644
index 0000000..fb07700
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-event-driven.md
@@ -0,0 +1,138 @@
+> **Snapshot date**: 2026-04-18
+> **Description**: EventBridge / SQS / Kinesis / Step Functions
+
+# Event-Driven Tradeoffs
+
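+The four building blocks of an event-driven migration. For each service, this summarizes the official AWS limits and the selection rationale from a migration perspective.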
+
+## 1. Amazon EventBridge
+
+Primary citation: [AWS Docs — EventBridge overview](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html)
+Supplementary: [AWS Docs — EventBridge rules](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rules.html)
+
+### 1.1 Core concepts
+
+| Concept | Description | Source |
+|------|------|------|
+| Event bus | Event router; three kinds: default / custom / partner | [AWS Docs] |
+| Pipes | 1:1 point-to-point integration: source → (filter → enrich → transform) → target | [AWS Docs] |
+| Scheduler | Serverless cron/rate/one-time scheduler; successor to legacy scheduled rules | [AWS Docs] |
+| Rule | Matches an event pattern or fires on a schedule. **Up to 5 targets per rule**; best practice is **1 rule = 1 target** | [AWS Docs] |
+| Event pattern | JSON-based content filtering, with schema registry support | [AWS Docs] |
+| Input transformation | Four options: matched event / part of the matched event / constant JSON / input transformer | [AWS Docs] |
+
+### 1.2 Reliability and operations
+
+| Item | Value | Source |
+|------|-----|------|
+| Default retry event retention | 24 hours | [AWS Docs] |
+| Default retry attempts | Up to 185 | [AWS Docs] |
+| Dead-letter queue | SQS standard queues (same account or a different account) | [AWS Docs] |
+| Latency metric | `IngestionToInvocationStartLatency`: suspect throttling above 30 seconds | [AWS Docs] |
+| Throttle metric | `ThrottledRules`: emitted when the account's aggregate TPS is exceeded | [AWS Docs] |
+
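+### 1.3 Implications for migration
+
+- **1 rule = 1 target**: as targets diversify, duplicating rules is easier to maintain (stated in the best-practices doc). [AWS Docs]
+- **Content-based filtering** lets you move if/else branching out of Lambda functions into the infrastructure layer, cutting cold starts and concurrency cost. [AWS Docs]
+- **DLQs must be SQS standard queues**: FIFO DLQs are not supported. For order-critical events, consider sending directly to an SQS FIFO queue instead of routing EventBridge → SQS FIFO.
+- **Scheduler ≠ scheduled rules**: Scheduler is the official recommendation for new designs; migrate legacy cron rules gradually. [AWS Docs]
+- **Pipes vs Rules**: use Pipes when you need point-to-point enrichment, and a Rule with multiple targets for fan-out. [AWS Docs]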
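+
+The retry and DLQ settings from 1.2 are configured per target. A sketch that builds one `Targets` entry for the EventBridge `PutTargets` API, assuming its `RetryPolicy` fields (`MaximumRetryAttempts` 0-185, `MaximumEventAgeInSeconds` 60-86,400) and `DeadLetterConfig`; the rule name and ARNs are hypothetical placeholders.
+
+```python
+MAX_RETRY_ATTEMPTS = 185        # upper bound and default
+MAX_EVENT_AGE_SECONDS = 86_400  # 24 hours, upper bound and default
+
+
+def target_with_dlq(target_id: str, target_arn: str, dlq_arn: str,
+                    retries: int = MAX_RETRY_ATTEMPTS,
+                    max_age_s: int = MAX_EVENT_AGE_SECONDS) -> dict:
+    """One PutTargets entry: bounded retries, then a standard SQS queue as the DLQ."""
+    if not 0 <= retries <= MAX_RETRY_ATTEMPTS:
+        raise ValueError("retries must be between 0 and 185")
+    if not 60 <= max_age_s <= MAX_EVENT_AGE_SECONDS:
+        raise ValueError("max_age_s must be between 60 and 86400")
+    if dlq_arn.endswith(".fifo"):
+        # FIFO queue names end in .fifo; EventBridge DLQs must be standard queues
+        raise ValueError("DLQ must be a standard (non-FIFO) SQS queue")
+    return {
+        "Id": target_id,
+        "Arn": target_arn,
+        "RetryPolicy": {"MaximumRetryAttempts": retries,
+                        "MaximumEventAgeInSeconds": max_age_s},
+        "DeadLetterConfig": {"Arn": dlq_arn},
+    }
+
+
+# Usage (needs boto3): boto3.client("events").put_targets(
+#     Rule="order-created", Targets=[target_with_dlq("1", function_arn, dlq_arn)])
+```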
+
+## 2. SQS Standard vs FIFO
+
+Primary citation: [AWS Docs — SQS queue types](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-queue-types.html)
+
+### 2.1 Comparison
+
+| Item | Standard | FIFO | Source |
+|------|----------|------|------|
+| Throughput | **Nearly unlimited** TPS per API | With batching, **3,000 msg/s** per API (300 API calls × 10 msg/batch); without batching, 300 API calls/s per API | [AWS Docs] |
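+| High-throughput mode | n/a | **30,000 TPS** (quota varies by Region; throughput scales across message groups while order is still preserved within each group) | [AWS Docs] |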
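+| Ordering | Best-effort (order may not be preserved) | **FIFO within a message group** | [AWS Docs] |
+| Delivery semantics | **At-least-once** (duplicates possible) | **Exactly-once processing** (deduplication supported) | [AWS Docs] |
+| Dedup window | n/a | 5 minutes (content-based dedup optional) | [AWS Docs] |
+| Durability | Replicated across multiple AZs | Replicated across multiple AZs | [AWS Docs] |
+| Visibility timeout | Supported | Supported | [AWS Docs] |
+
+### 2.2 Implications for migration
+
+- **Default to Standard** when ordering and duplicate handling are not required; idempotent Lambda consumers absorb the duplicates. [AWS Docs]
+- **When to choose FIFO**: ordering matters (e.g. transaction logs) OR exactly-once processing is needed (e.g. payments). Always check the throughput limits. [AWS Docs]
+- **High-throughput FIFO mode** keeps order only within each message group: raise the number of groups to process in parallel while preserving per-group order. [AWS Docs]
+- **SNS → SQS fan-out**: fanning out from SNS to several SQS queues still works, but an EventBridge rule with multiple targets is the recommended replacement (and adds content filtering). [AWS Docs]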
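+
+Absorbing Standard-queue duplicates means making the consumer idempotent. A minimal sketch of an SQS-triggered Lambda handler, assuming the standard SQS event shape (`Records[].body`) and a hypothetical `order_id` field as the idempotency key. The in-memory set only survives within one warm execution environment; a real consumer would record keys durably, for example with a DynamoDB conditional write.
+
+```python
+import json
+
+# Hypothetical stand-in for a durable idempotency store (e.g. a DynamoDB table).
+_seen_keys: set = set()
+
+
+def handler(event: dict, context=None) -> dict:
+    """SQS-triggered Lambda handler that applies each order's side effect at most once."""
+    applied = []
+    for record in event["Records"]:
+        message = json.loads(record["body"])
+        key = message["order_id"]  # business key, stable across producer retries
+        if key in _seen_keys:
+            continue               # duplicate delivery: skip the side effect
+        applied.append(f"charged {key}")  # placeholder for the real side effect
+        _seen_keys.add(key)
+    return {"applied": applied}
+```
+
+Keying on a business identifier rather than `messageId` also catches duplicates created when a producer re-sends a message, since every send gets a new message ID.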
+
+## 3. Amazon Kinesis Data Streams
+
+Primary citation: [AWS Docs — Kinesis Data Streams introduction](https://docs.aws.amazon.com/streams/latest/dev/introduction.html)
+Supplementary: [AWS Docs — Kinesis quotas](https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html)
+
+### 3.1 Quantitative limits
+
+| Item | Value | Source |
+|------|-----|------|
+| Write (provisioned shard) | **1 MB/s or 1,000 records/s** per shard | [AWS Docs] |
+| Read (provisioned shard) | **2 MB/s** per shard, `GetRecords` up to 5 TPS per shard | [AWS Docs] |
+| Max payload | **10 MiB** per record (before base64 encoding) | [AWS Docs] |
+| PutRecords batch | Up to 500 records, 10 MiB total per request | [AWS Docs] |
+| Retention | Default **24 hours**, max **8,760 hours (365 days)** | [AWS Docs] |
+| Put-to-get latency | Typically under 1 second | [AWS Docs] |
+| On-demand baseline throughput | New streams: 4 MB/s write / 8 MB/s read | [AWS Docs] |
+| On-demand max (us-east-1/us-west-2/eu-west-1) | **10 GB/s write / 20 GB/s read** (increases can be requested) | [AWS Docs] |
+| On-demand max (other Regions) | 200 MB/s write / 400 MB/s read | [AWS Docs] |
+| On-demand stream default quota | **50** per account (can be increased) | [AWS Docs] |
+| Enhanced fan-out consumers | Up to **20** per stream (On-demand Advantage: 50); each consumer gets a dedicated **2 MB/s** | [AWS Docs] |
+| Mode switching | Twice per 24 hours per stream (on-demand ↔ provisioned) | [AWS Docs] |
+| Minimum retention | 24 hours (`DecreaseStreamRetentionPeriod` cannot go below this) | [AWS Docs] |
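+
+The per-shard limits above translate directly into a provisioned shard count. A rough sizing sketch, assuming standard (shared-throughput) consumers that each read the full stream; enhanced fan-out consumers are left out because each gets its own dedicated 2 MB/s.
+
+```python
+import math
+
+WRITE_MB_PER_SHARD = 1.0
+WRITE_RECORDS_PER_SHARD = 1_000
+READ_MB_PER_SHARD = 2.0  # shared by all standard consumers of the shard
+
+
+def provisioned_shards(write_mb_s: float, write_records_s: float,
+                       standard_consumers: int = 1) -> int:
+    """Smallest shard count that satisfies every per-shard write and read limit."""
+    by_write_mb = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
+    by_records = math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD)
+    # each standard consumer reads everything that was written
+    by_read = math.ceil(write_mb_s * standard_consumers / READ_MB_PER_SHARD)
+    return max(1, by_write_mb, by_records, by_read)
+```
+
+With 3.5 MB/s and 2,500 records/s written and three standard consumers, reads dominate: 10.5 MB/s of reads needs 6 shards, whereas the writes alone need 4.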
+
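+### 3.2 Implications for migration
+
+- **A shard is the ordering boundary**: records are ordered within a shard but not across shards, so ordering is determined by partition-key design. [AWS Docs]
+- **Replay/rewind**: reprocess from any point within the retention period, a capability that DynamoDB/SQS lack. [AWS Docs]
+- **Enhanced fan-out** gives each consumer a dedicated 2 MB/s, which avoids `GetRecords` contention among standard consumers when several high-throughput consumers share a stream. [AWS Docs]
+- **On-demand mode is the default for new services** when traffic is hard to predict, though the maximums differ by Region. [AWS Docs]
+- **Kinesis vs EventBridge**: EventBridge is an event router (asynchronous fan-out, filtering); Kinesis is a durable, ordered stream. Using both together is common (Kinesis → Pipes → EventBridge). [AWS Docs]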
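+
+How a partition key lands on a shard explains why the shard is the ordering boundary. Kinesis MD5-hashes the partition key into a 128-bit integer and routes the record to the shard whose hash-key range contains it. The sketch below assumes UTF-8 encoding of the key and shards that split the hash space evenly (as on a freshly created stream); after resharding, the real ranges come from `ListShards`.
+
+```python
+import hashlib
+
+HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key
+
+
+def shard_for_key(partition_key: str, shard_count: int) -> int:
+    """Index of the shard owning partition_key, assuming evenly split hash ranges."""
+    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
+    hash_key = int.from_bytes(digest, "big")
+    return hash_key * shard_count // HASH_SPACE
+```
+
+Every record for, say, `customer-42` maps to the same shard and is therefore read in order, while records for different customers may land on different shards with no ordering between them.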
+
+## 4. Step Functions — Standard vs Express
+
+Primary citation: [AWS Docs — Standard vs Express](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html)
+(Details in [RESEARCH.md §17](../../../../../issues/3-serverless-migration/RESEARCH.md))
+
+### 4.1 Comparison
+
+| Item | Standard | Async Express | Sync Express | Source |
+|------|----------|---------------|--------------|------|
+| Max duration | **1 year** | 5 minutes | 5 minutes (console times out at 60 s; SDK/CLI at 5 minutes) | [AWS Docs] |
+| Execution semantics | **Exactly-once** (state persisted) | **At-least-once** (duplicates possible) | **At-most-once** (no retries) | [AWS Docs] |
+| Execution history | Queryable via API plus visual debugging in the console, kept **90 days** (reduction to 30 days can be requested) | Requires CloudWatch Logs | Requires CloudWatch Logs | [AWS Docs] |
+| Throughput | State transition rate (account quota) | Tens to hundreds of thousands of executions per second | Separate from account capacity (scales automatically) | [AWS Docs] |
+| Billing | **Per state transition** | Per execution × duration × memory | Per execution × duration × memory | [AWS Docs] |
+| Supported integrations | All services, including `.sync` and `.waitForTaskToken` | `.sync`, `.waitForTaskToken` not supported | `.sync`, `.waitForTaskToken` not supported | [AWS Docs] |
+| Distributed Map / Activities | Supported | Not supported | Not supported | [AWS Docs] |
+| Idempotency | Restarting with the same name and input is idempotent | Not managed automatically | Not managed automatically | [AWS Docs] |
+
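+### 4.2 Implications for migration
+
+- **Workflow type is immutable**: a state machine cannot be switched between Standard and Express after creation, so decide early in the design. [AWS Docs]
+- **Tier 1 batch decomposition**: break batches longer than 15 minutes into Standard workflows. Make Spot retry logic explicit in the state machine and rely on its exactly-once guarantee. [AWS Docs]
+- **Tier 2 event post-processing**: Async Express suits short fan-out and processing after an API response, but it assumes **duplicates are acceptable** (idempotency required). [AWS Docs]
+- **At-least-once trap**: Async Express can run the same execution more than once, so follow Serverless Lens principle 7 (Design for failures and duplicates). Use Standard for non-idempotent work (e.g. payments). [AWS Docs]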
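+
+A sketch of making Spot retries explicit in a Standard workflow: the Amazon States Language definition below runs one AWS Batch job through the `arn:aws:states:::batch:submitJob.sync` integration (a `.sync` pattern, so Standard only) and retries failed tasks with backoff. The queue, job-definition, and state-machine names are hypothetical, and retrying on `States.TaskFailed` is an assumption about which errors are worth retrying.
+
+```python
+import json
+
+
+def batch_step_definition(job_queue: str, job_definition: str) -> dict:
+    """Amazon States Language for a Standard workflow that runs one AWS Batch job."""
+    return {
+        "Comment": "Tier 1 batch step with explicit retries",
+        "StartAt": "RunBatchJob",
+        "States": {
+            "RunBatchJob": {
+                "Type": "Task",
+                # .sync waits for the Batch job to finish (Standard workflows only)
+                "Resource": "arn:aws:states:::batch:submitJob.sync",
+                "Parameters": {
+                    "JobName": "tier1-batch",
+                    "JobQueue": job_queue,
+                    "JobDefinition": job_definition,
+                },
+                "Retry": [{
+                    "ErrorEquals": ["States.TaskFailed"],
+                    "IntervalSeconds": 60,
+                    "MaxAttempts": 3,
+                    "BackoffRate": 2.0,
+                }],
+                "End": True,
+            }
+        },
+    }
+
+
+DEFINITION_JSON = json.dumps(batch_step_definition("spot-first-queue", "train-job:1"))
+# Usage (needs boto3): boto3.client("stepfunctions").create_state_machine(
+#     name="tier1-batch", definition=DEFINITION_JSON, roleArn=role_arn, type="STANDARD")
+```
+
+If the Batch job definition also has a `retryStrategy`, the two retry layers multiply, so keep `MaxAttempts` small.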
+
+## 5. Selection guide
+
+| Requirement | Recommendation | Rationale |
+|----------|------|------|
+| Simple event routing (AWS services → many targets) | EventBridge rule + multiple targets | [AWS Docs §1] |
+| Event enrichment needed (source → transform → target) | EventBridge Pipes | [AWS Docs §1.1] |
+| Scheduled jobs (cron/rate/one-time) | EventBridge Scheduler | [AWS Docs §1.1] |
+| Strict ordering + exactly-once message processing | **SQS FIFO** or Step Functions Standard | [AWS Docs §2] / [AWS Docs §4] |
+| High-throughput queue (order irrelevant) | SQS Standard | [AWS Docs §2] |
+| High-throughput streaming + replay | Kinesis Data Streams | [AWS Docs §3] |
+| Low-latency streaming to multiple consumers | Kinesis Data Streams + Enhanced Fan-out | [AWS Docs §3.1] |
+| Long-running workflows (≤1 year, auditability required) | Step Functions **Standard** | [AWS Docs §4] |
+| High-frequency, low-cost workflows (≤5 minutes, idempotent) | Step Functions **Express** | [AWS Docs §4] |
+| Synchronous microservices behind API Gateway | Step Functions Sync Express (if at-most-once is acceptable) | [AWS Docs §4] |
+
+---
+
+*Source details*: Step Functions data was collected in RESEARCH §17. EventBridge / SQS / Kinesis were re-collected from AWS Docs in this Stage C (resolving the item carried over from RESEARCH §8.6).
diff --git a/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-spot.md b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-spot.md
new file mode 100644
index 0000000..16207c9
--- /dev/null
+++ b/plugins/workflow/skills/serverless-migration-advisor/references/tradeoffs-spot.md
@@ -0,0 +1,175 @@
+> **Snapshot date**: 2026-04-18
+> **Description**: Spot capacity, interruptions, HUGI, billable-time definition
+
+# Spot Tradeoffs
+
+Common considerations for migrations that rely on AWS Spot. For service-specific details (Fargate Spot / EC2 Spot / SageMaker Managed Spot / Batch Spot), see [tradeoffs-compute.md](tradeoffs-compute.md).
+
+## 1. Placement scores: choosing the Region is the first cost decision
+
+**Principle**: even for the same instance type, Spot capacity varies drastically between Regions. Measured scores have ranged from 1-2 (us-west-2) to 9 (us-east-1). [Insight #1]
+
+### 1.1 Example command
+
+```bash
+# Compare scores across several Regions (you need them side by side to decide)
+for region in us-east-1 us-east-2 us-west-2 eu-west-1 ap-northeast-1; do
+  echo -n "$region: "
+  aws ec2 get-spot-placement-scores \
+    --instance-types g7e.4xlarge \
+    --target-capacity 1 \
+    --single-availability-zone \
+    --region-names "$region" \
+    --region "$region" \
+    --query "max_by(SpotPlacementScores, &Score).Score" \
+    --output text 2>/dev/null || echo "error (check the ec2:GetSpotPlacementScores permission)"
+done
+```
+
+### 1.2 Interpreting the score [AWS Docs — Spot placement scores]
+
+| Score | Meaning | Decision |
+|-------|------|----------|
+| 8-10 | High availability | Proceed |
+| 5-7 | Medium | Proceed, with a fallback ready |
+| 3-4 | Low | Consider alternatives or accept delays |
+| 1-2 | Very low | **Switch Regions** |
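+
+The bands in this table are heuristics; encoding them keeps the go/no-go rule consistent between runs. A small sketch, assuming the scores have already been collected (for example with the loop in 1.1) into a Region → score mapping:
+
+```python
+def placement_decision(score: int) -> str:
+    """Map a Spot placement score (1-10) onto the decision bands in the table above."""
+    if not 1 <= score <= 10:
+        raise ValueError("Spot placement scores range from 1 to 10")
+    if score >= 8:
+        return "proceed"
+    if score >= 5:
+        return "proceed with fallback"
+    if score >= 3:
+        return "consider alternatives"
+    return "switch region"
+
+
+def best_region(scores: dict) -> tuple:
+    """Pick the highest-scoring Region and the decision that goes with it."""
+    region = max(scores, key=scores.get)
+    return region, placement_decision(scores[region])
+```
+
+With the autoresearch measurements, `best_region({"us-west-2": 2, "us-east-1": 9})` returns `("us-east-1", "proceed")`.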
+
+### 1.3 Measured case: autoresearch [Insight #1]
+
+| Region | Instance | Score | Result |
+|------|----------|------|------|
+| us-west-2 | ml.g7e.2xlarge/4xlarge | 1-2 | Stuck in "Starting" for 30+ minutes |
+| us-east-1 | ml.g7e.2xlarge | 9 | Allocated in about 2 minutes |
+
+**Lesson**: a 30-second CLI lookup prevents a 30+ minute wait. [Case: autoresearch]
+
+## 2. Interruption behaviors
+
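+The three EC2 Spot interruption behaviors. Fargate Spot and SageMaker Managed Spot expose only the `terminate` semantics. [AWS Docs]
+
+| Behavior | Semantics | 2-minute warning | Condition |
+|------|--------|----------|------|
+| `terminate` | Instance is terminated (default) | Provided | Always available |
+| `stop` | EBS preserved; only EC2 can restart it (when capacity returns) | Provided | **Persistent request** |
+| `hibernate` | RAM preserved; resumes where it left off when restored | **None** (hibernation starts immediately) | Requires instance family and AMI support |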
+
+## 3. Interruption signals
+
+### 3.1 Two-minute warning event (EventBridge)
+
+```json
+{
+  "detail-type": "EC2 Spot Instance Interruption Warning",
+  "source": "aws.ec2",
+  "detail": {
+    "instance-id": "i-0abcd...",
+    "instance-action": "terminate"
+  }
+}
+```
+
+Fargate Spot interruptions are observed through a different EventBridge event: `ECS Task State Change` with `stopCode: "SpotInterruption"`. [AWS Docs]
+
+### 3.2 IMDSv2 metadata (EC2 Spot)
+
+```bash
+TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
+  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
+curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
+  http://169.254.169.254/latest/meta-data/spot/instance-action
+# → {"action": "terminate", "time": "2026-04-18T10:00:00Z"}; HTTP 404 while no interruption is scheduled
+```
+
+Recommended polling interval: **5 seconds**. [AWS Docs]
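+
+The same check can run as a lightweight watcher on the instance. A sketch using only the Python standard library, assuming IMDSv2 is reachable at the usual link-local address and that `spot/instance-action` returns HTTP 404 until a notice is issued. The fetch and sleep functions are injectable so the loop can be tested off-instance; a production watcher would also cache the token instead of requesting one per poll.
+
+```python
+from __future__ import annotations
+
+import json
+import time
+import urllib.error
+import urllib.request
+
+IMDS = "http://169.254.169.254/latest"
+
+
+def _imds_token(ttl_seconds: int = 21600) -> str:
+    """Request an IMDSv2 session token."""
+    req = urllib.request.Request(
+        f"{IMDS}/api/token", method="PUT",
+        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)})
+    with urllib.request.urlopen(req, timeout=2) as resp:
+        return resp.read().decode()
+
+
+def fetch_instance_action() -> str | None:
+    """Raw spot/instance-action JSON, or None while no interruption is scheduled (404)."""
+    req = urllib.request.Request(
+        f"{IMDS}/meta-data/spot/instance-action",
+        headers={"X-aws-ec2-metadata-token": _imds_token()})
+    try:
+        with urllib.request.urlopen(req, timeout=2) as resp:
+            return resp.read().decode()
+    except urllib.error.HTTPError as err:
+        if err.code == 404:
+            return None
+        raise
+
+
+def wait_for_interruption(fetch=fetch_instance_action, interval_s: float = 5.0,
+                          sleep=time.sleep, max_polls: int | None = None) -> dict | None:
+    """Poll every interval_s seconds and return the parsed notice once one appears."""
+    polls = 0
+    while max_polls is None or polls < max_polls:
+        body = fetch()
+        if body:
+            return json.loads(body)  # e.g. {"action": "terminate", "time": "..."}
+        polls += 1
+        sleep(interval_s)
+    return None
+```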
+
+### 3.3 Rebalance recommendation
+
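+A proactive signal that arrives **before** the 2-minute warning. Receive it through the EventBridge `EC2 Instance Rebalance Recommendation` event or the IMDSv2 `rebalance` endpoint (`meta-data/events/recommendations/rebalance`). Combined with an ASG, it effectively stretches the drain window beyond 2 minutes. [AWS Docs]
+
+## 4. HUGI principle: Hurry Up and Get Idle
+
+**Core idea**: on Spot, savings come not from shortening wall-clock time but from shrinking the billed time itself.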
+
+### 4.1 Billable ≠ wall clock
+
+The official definition for SageMaker Managed Spot: [AWS Docs]
+
+```
+Savings (%) = (1 - BillableTimeInSeconds / TrainingTimeInSeconds) × 100
+```
+
+- TrainingTime: wall-clock time of the whole job, including Starting/Downloading
+- BillableTime: **the Training phase only** (after an interruption and resume, only training time keeps accruing)
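+
+The formula is straightforward to apply to a finished job. A sketch assuming the two values come from the `TrainingTimeInSeconds` and `BillableTimeInSeconds` fields returned by SageMaker's `DescribeTrainingJob`; the job name in the usage comment is hypothetical.
+
+```python
+def spot_savings_pct(billable_s: float, training_s: float) -> float:
+    """Managed Spot savings: (1 - BillableTimeInSeconds / TrainingTimeInSeconds) * 100."""
+    if training_s <= 0:
+        raise ValueError("training time must be positive")
+    return (1 - billable_s / training_s) * 100
+
+
+# Usage (needs boto3):
+#   job = boto3.client("sagemaker").describe_training_job(TrainingJobName="my-job")
+#   spot_savings_pct(job["BillableTimeInSeconds"], job["TrainingTimeInSeconds"])
+```
+
+A job with 500 s of training time and 100 s of billable time, for example, works out to 80% savings.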
+
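+### 4.2 The math versus an always-on server
+
+| Model | Utilization over 24 h | Billable |
+|------|------------------|----------|
+| Always-on EC2 (same work packed into 8 hours) | 33% | 24 h × 1.0 (idle included) |
+| Spot burst (run 8 hours, then terminate) | 33% | 8 h × 0.3 (Spot discount) = 2.4 h, **about 10%** of the always-on figure |
+
+In other words, for the same wall-clock result (8 h of work), applying HUGI brings the billed amount down to about **1/10** of an always-on server. [Case: autoresearch] The same reasoning explains why a 229 s H100 run stops at $0.16.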
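+
+The table's arithmetic, spelled out. The 0.3 factor is the table's own assumption (Spot at roughly 30% of the On-Demand price), not a quoted rate; actual Spot prices vary by instance type, Region, and time.
+
+```python
+def normalized_cost(hours_billed: float, price_factor: float) -> float:
+    """Cost in On-Demand instance-hours (price_factor 1.0 = On-Demand price)."""
+    return hours_billed * price_factor
+
+
+always_on = normalized_cost(24, 1.0)   # server up all day, idle hours included
+spot_burst = normalized_cost(8, 0.3)   # 8 h of work on Spot, then terminate
+ratio = spot_burst / always_on         # about 0.1, i.e. roughly 1/10 of the always-on cost
+```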
+
+### 4.3 Choosing a pattern
+
+- Predictable batch → SageMaker Managed Spot (AWS formalizes billable time)
+- Unpredictable events → Lambda (zero-idle is the default model)
+- Always-on + spikes → Fargate + Fargate Spot mix + API Gateway (removes the ALB fixed cost) [Case: openclaw] [Insight #O2]
+
+## 5. Do / Don't
+
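+| Do | Don't | Rationale |
+|----|-------|------|
+| Keep the default max price (capped at the On-Demand price) | Manually set a low max price | Interruptions become more frequent [AWS Docs] |
+| Multiple AZs + multiple instance types + ASG | Pin a single instance type to a single AZ | Capacity varies [Insight #1] |
+| Let an ASG replace instances automatically | Run standalone instances long-term | No replacement mechanism [AWS Docs] |
+| Validate beforehand by injecting interruptions with AWS FIS | Deploy without pre-production testing | [AWS Docs — FIS tutorial] |
+| Drain proactively on a rebalance recommendation | Wait only for the 2-minute warning | Wastes the early signal [AWS Docs] |
+| Run Fargate Spot at the service level (`desiredCount ≥ 2`) | A single task on Fargate Spot | No automatic fallback [Insight #O1] |
+| Check larger instances too | Assume "bigger means pricier" | g7e.8xlarge $0.93 < g7e.2xlarge $1.82 [Insight #2] |
+| Always enable checkpointing (jobs > 5 minutes) | Long jobs without checkpoints | Without checkpoints, MaxWaitTime is capped at 1 h [AWS Docs] |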
+
+## 6. Tracking interruptions
+
+### 6.1 CloudTrail `BidEvictedEvent`
+
+The after-the-fact audit record of EC2 Spot interruptions, and the single source for failure analysis and retry statistics. [AWS Docs]
+
+### 6.2 EventBridge rule examples
+
+```json
+{
+  "source": ["aws.ec2"],
+  "detail-type": ["EC2 Spot Instance Interruption Warning"]
+}
+```
+
+```json
+{
+  "source": ["aws.ecs"],
+  "detail-type": ["ECS Task State Change"],
+  "detail": { "stopCode": ["SpotInterruption"] }
+}
+```
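+
+Before deploying, the two patterns can be checked against sample events offline. The matcher below is a deliberately simplified, hypothetical subset of EventBridge matching (exact values listed in arrays, nested objects); it does not implement prefix, numeric, `anything-but`, or other content-filter operators, nor array-valued event fields.
+
+```python
+def matches(pattern: dict, event: dict) -> bool:
+    """Simplified matching: every pattern key must be present in the event;
+    a list means "equals one of these values"; nested dicts recurse."""
+    for key, expected in pattern.items():
+        if key not in event:
+            return False
+        actual = event[key]
+        if isinstance(expected, dict):
+            if not isinstance(actual, dict) or not matches(expected, actual):
+                return False
+        elif actual not in expected:
+            return False
+    return True
+
+
+EC2_RULE = {"source": ["aws.ec2"], "detail-type": ["EC2 Spot Instance Interruption Warning"]}
+ECS_RULE = {"source": ["aws.ecs"], "detail-type": ["ECS Task State Change"],
+            "detail": {"stopCode": ["SpotInterruption"]}}
+```
+
+Fed the sample event from 3.1, `matches(EC2_RULE, event)` is `True` and `matches(ECS_RULE, event)` is `False`.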
+
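+Route both to SNS alerts and to a Lambda target that retries automatically. [AWS Docs]
+
+## 7. Spot suitability by workload
+
+The AWS Batch documentation's classification, regrouped from a migration perspective. [AWS Docs]
+
+| Workload | Spot fit | Reason |
+|----------|-----------|------|
+| Batch ETL, data processing | O | Retries and idempotent design come naturally [AWS Docs] |
+| ML training, HPO | O | With checkpoints, only billable time is charged [AWS Docs] · [Case: autoresearch] |
+| CI/CD builds | O | Fault-tolerant, cheap to re-run [AWS Docs] |
+| Rendering, simulation | O | Work splits into frames/samples [AWS Docs] |
+| Always-on stateless API (primary) | △ | Fallback design required; openclaw solved this with Lambda primary + Fargate Spot fallback [Case: openclaw] [Insight #O1] |
+| Production databases | X | Data-integrity risk on interruption [AWS Docs] |
+| User-facing APIs with strict SLAs | X | A 2-minute drain is not enough [AWS Docs] |
+| Low-latency pinned consumers | X | Relocation cost is too high [AWS Docs] |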
+
+---
+
+*Note*: concrete how-to for Spot-based workloads (region-selection scripts, quota-increase commands) is delegated to the `sagemaker-spot-training` skill's `references/spot-capacity-guide.md`.