diff --git a/TASK.md b/TASK.md
index 3bb0eca..4104ee2 100644
--- a/TASK.md
+++ b/TASK.md
@@ -1,188 +1,71 @@
-## 작업 계획
-
-작업 순서
-- 1) LLM 모듈화(퍼사드/프로바이더/모델 레지스트리) + GPT-5 mini 기본 적용
-- 2) Gemini 도입: 병행 사용(전용) + 대체 사용(퍼사드)
-- 3) 토큰 카운트 및 비용 로깅 추가(양 프로바이더 공통)
-
-참고: 아래 문서의 섹션 순서와 무관하게 실제 구현 순서는 위의 "작업 순서"를 따릅니다.
-
-
-### 1) LLM 모듈화(퍼사드/프로바이더/모델 레지스트리) + GPT-5 mini 기본 적용
-
-목적: LLM 호출을 모듈화하여 옵션 기반으로 모델/프로바이더를 교체 가능하게 만들고, 기본 모델을 `gpt-5-mini`로 전환합니다. 토크나이저/가격표는 3단계에서 처리합니다.
-
-1. [구조] 파일/모듈 구성
-   - 디렉토리: `src/llm/`
-     - `src/llm/types.ts` — 공통 인터페이스 정의
-       - `GenerateRequest`: `{ provider?: 'openai'|'gemini', model?: string, messages?: OpenAIStyleMessages, contents?: GeminiStyleContents, stream?: boolean, tools?, options?: { temperature?, top_p?, max_output_tokens?, reasoning?, text? }, meta?: { userId?, categoryId?, postId? } }`
-       - `GenerateStream`: `onToken(text)`, `onToolCall(json)`, `onEnd()`, `onError(err)`(또는 AsyncIterable)
-     - `src/llm/modelRegistry.ts` — 모델 레지스트리/기본값
-       - 논리 모델 키 → `{ provider, modelId, defaults, tokenizerKey?, pricingKey? }`
-       - 기본값: `defaultChat = { provider: 'openai', modelId: 'gpt-5-mini' }`
-     - `src/llm/providers/openaiResponses.ts` — OpenAI Responses API 구현
-     - `src/llm/providers/gemini.ts` — @google/gemini 구현
-     - `src/llm/index.ts` — 퍼사드: `generate(req: GenerateRequest): GenerateStream` 선택 라우팅
-   - 기존 서비스(`qa.service.ts`)는 퍼사드만 사용하도록 변경
-
-2. [기본 모델] GPT-5 mini 적용(Responses API)
-   - `src/config.ts`의 `CHAT_MODEL` 기본값을 `gpt-5-mini`로 변경
-   - OpenAI 경로: `openai.responses.create/stream`로 마이그레이션(SSE 어댑터 포함)
-   - 기존 Chat Completions 경로는 임시 백업/옵션으로 유지 가능(필요 시)
-
-3. [옵션 기반 모델/프로바이더 선택]
-   - 요청 바디에 `llm?: { provider?: 'openai'|'gemini', model?: string, options?: {...} }` 허용
-   - 미지정 시 레지스트리의 기본값 사용(`gpt-5-mini` on OpenAI)
-   - 향후 기능(Reasoning/Text 옵션, tool calls, timeout 등) 확장 용이
-
-4. [검증/수용 기준]
-   - `/ai/ask` SSE 정상 동작(중단/지연 없음)
-   - 기존 프롬프트/툴 호출이 동일하게 동작(필요 시 어댑터)
-   - 로그/오류 처리 기존 수준 유지
-   
-### 2) Gemini 도입: 병행 사용(전용) + 대체 사용(퍼사드)
-
-목적: Gemini를 독립 엔드포인트로 직접 쓰는 경로와, 기존 GPT 경로의 대체 제공자로 모두 사용할 수 있게 합니다(퍼사드 경유). 이후 3단계에서 토큰/비용 로깅을 공통 적용합니다.
-
-1. [Config] Gemini 키/모델 설정
-   - `.env`
-     - `GEMINI_API_KEY=...`
-     - `GEMINI_CHAT_MODEL=gemini-2.5-flash` (예: 변경 가능)
-   - `src/config.ts`에 항목 반영 및 기본값/검증 추가(Provider 고정 ENV는 사용하지 않음)
-
-2. [Provider] 퍼사드에 Gemini 구현 추가
-   - 1단계에서 만든 LLM 퍼사드(`src/llm/index.ts`)에 Gemini 프로바이더를 추가
-   - 구현 위치: `src/llm/providers/gemini.ts` (OpenAI 구현은 `src/llm/providers/openaiResponses.ts`)
-   - 퍼사드 인터페이스로 라우팅되어 기존 `qa.service.ts`는 퍼사드만 사용(교체 투명)
-
-3. [Gemini 호출] @google/genai SDK 적용 및 스트리밍
-   - 의존성: `@google/genai` 추가 (설치 커맨드: `npm i @google/genai`)
-   - 클라이언트: `import { GoogleGenAI } from "@google/genai"; const ai = new GoogleGenAI({});` (`GEMINI_API_KEY`는 환경변수에서 자동 주입)
-   - 비스트리밍(우선 적용):
-     - `ai.models.generateContent({ model: GEMINI_CHAT_MODEL, contents, config: { thinkingConfig: { thinkingBudget }}})`
-     - 기본값으로 `thinkingBudget=0`(생각 비활성화) 적용, `.env`에서 오버라이드 가능
-     - 응답 텍스트를 한번에 수신한 뒤 SSE로 순차 chunk 분할하여 `answer` 이벤트로 전송(간단 구현)
-   - 스트리밍(선택 적용):
-     - SDK 제공 시 스트리밍 API 사용(예: `generateContentStream` 유사 기능)으로 델타를 받아 즉시 SSE로 전달
-     - SDK에서 미지원일 경우, 비스트리밍으로 우선 릴리즈 후 스트리밍 전환
-   - (옵션) Safety 설정, generationConfig(temperature/topP/maxOutputTokens) 파라미터는 설정값으로 노출
-
-5. [토큰 카운팅] Gemini 대응
-   - 사전 카운트(가능 시): SDK의 토큰 카운트 API(`tokens:count`/`countTokens`)가 제공되면 이를 사용해 프롬프트 토큰 계산 → 비용 선로깅
-     - 네트워크 요청이므로 로깅 토글이 켜져 있을 때만 수행하도록 옵션화
-   - 사후 카운트: 응답 텍스트 기준 동일 API로 출력 토큰 계산(또는 비가용 시 근사치)
-   - 폴백 전략: 카운트 API가 불가한 환경에서는 근사치 사용(문자수/4), 추후 정확도 개선 시 교체
-
-6. [가격 정책] Gemini 추가
-   - `src/config/pricing.ts`의 `PRICING_TABLE`에 Gemini 모델(`gemini-2.5-flash`, 임베딩 모델 등) 단가 추가
-   - 동일한 `calcCost`, `formatCost` 로직 재사용
-
-7. [생각(Thinking) 설정] 기본 비활성화
-   - Gemini 2.5 Flash의 생각 기능은 응답 품질 대신 비용/지연이 증가하므로 기본 `thinkingBudget=0`으로 비활성화
-   - `.env`에 `GEMINI_THINKING_BUDGET`를 두어 필요 시 활성화(정수값)
-
-8. [도구/함수 호출] 호환성 계획(선택)
-   - 현재 OpenAI `tool_calls`를 사용 중. Gemini는 `functionDeclarations`/`toolConfig` 형태로 유사 기능 제공
-   - 1단계: Gemini 경로에서는 도구 호출 비활성화(빠른 도입)
-   - 2단계: 필요 시 `report_content_insufficient`를 Gemini `functionDeclarations`로 매핑하여 동일 동작 구현
-
-9. [Wiring] 사용 패턴
-   - 독립 사용(A): `POST /ai/gemini/ask`로 직접 호출(옵션: thinkingBudget 등)
-   - 대체 사용(B): 기존 `POST /ai/ask`에 `llm.provider?: 'openai'|'gemini'`, `llm.model?` 허용 → 퍼사드가 라우팅
-   - 로깅 시 `provider` 필드를 포함(3단계에서 적용)
-
-10. [검증/수용 기준]
-   - OpenAI/Gemini 각각에서 동일한 SSE 응답 형식으로 동작
-   - 요청 전/후 토큰·비용 로그가 두 프로바이더 모두에서 출력
-   - 로깅 토글이 정상 작동, 스트리밍 성능 저하 없음
-
-설정 확정
-- `GEMINI_CHAT_MODEL=gemini-2.5-flash`
-- `GEMINI_THINKING_BUDGET=0` (기본값으로 비활성화)
-
-### 토큰 카운트 및 비용 로깅 추가
-
-목적: LLM에 요청을 보내기 직전에 프롬프트(메시지) 토큰 수를 계산해 예상 입력 비용을 콘솔로 로깅하고, 스트리밍 응답 완료 후 실제 출력 토큰 수 기반 최종 비용을 추가 로깅합니다. 초기에는 `console.log`만 사용합니다.
-
-1. [Utils] 토크나이저 유틸 추가
-   - 파일: `src/utils/tokenizer.ts`
-   - 내용:
-     - `getTokenizerForModel(model: string)` → 모델명에 따라 적절한 인코딩을 선택
-       - `gpt-5*` → 자료 제공 전까지 임시로 `o200k_base` 사용(TBD, 전환 시 교체)
-       - `gpt-4o`, `gpt-4o-mini`, 기타 `o`계열 → `o200k_base`
-     - `countTextTokens(text: string, model: string): number`
-     - `countChatMessagesTokens(messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[], model: string): number`
-       - 메시지 `content`들을 토크나이즈하여 합산하고, 채팅 포맷 오버헤드(메시지당 소량, 모델별 상수)를 보정치로 가산
-       - 주의: 보정치는 근사치이며, 정확한 정산은 응답 토큰 합산으로 후처리
-   - 비고: 이미 프로젝트에 `@dqbd/tiktoken`이 포함되어 있으므로 이를 사용합니다.
-
-2. [Config] 가격 정책 맵 구조 설계 (임시 하드코딩 + ENV 오버라이드)
-   - 파일: `src/config/pricing.ts`
-   - 내용:
-     - `export type Pricing = { input_per_1k: number; output_per_1k: number; cached_input_per_1k?: number; currency: 'USD' | 'KRW' }`
-     - `PRICING_TABLE: Record<string, Pricing>`: 모델명 키에 따른 단가 설정
-     - 선택: `LLM_PRICING_OVERRIDES`(JSON) 환경변수로 런타임 오버라이드 허용
-   - 초기값은 사용자 제공 정책으로 채울 예정. 제공 전까지는 로깅에 `N/A` 표기 또는 0 처리.
-
-    - 초기 PRICING_TABLE(제공 정책 반영, 단위: per 1K tokens, 통화: USD)
-      - `gpt-5`: { input_per_1k: 0.00125, cached_input_per_1k: 0.000125, output_per_1k: 0.01, currency: 'USD' }
-      - `gpt-5-mini`: { input_per_1k: 0.00025, cached_input_per_1k: 0.000025, output_per_1k: 0.002, currency: 'USD' }
-      - `gpt-5-nano`: { input_per_1k: 0.00005, cached_input_per_1k: 0.000005, output_per_1k: 0.0004, currency: 'USD' }
-
-3. [Utils] 비용 계산 유틸 추가
-   - 파일: `src/utils/cost.ts`
-   - 내용:
-     - `getModelPricing(model: string): Pricing | null`
-     - `calcCost(tokens: number, per_1k: number): number` → 반올림 1~4자리(옵션)
-     - 화폐 표기 함수(선택): `formatCost(amount: number, currency: string)`
-
-4. [Facade] LLM 퍼사드에 비용 로깅 통합
-   - 위치: `src/llm/index.ts` 퍼사드 내부에서 공통 로깅 수행
-   - 기능 흐름(공통):
-     1) 요청 전: 메시지/콘텐츠 토큰 카운트 → `promptTokens`
-        - OpenAI: `countChatMessagesTokens`(토크나이저)
-        - Gemini: `countTokens` API 가능 시 사용(불가 시 근사치)
-     2) 단가 조회: `getModelPricing(model)` → `estInputCost`
-     3) 선로깅: `{type:'llm.request', provider, model, promptTokens, estInputCost, corrId, userId, categoryId, postId}`
-     4) 실제 호출: 등록된 프로바이더(OpenAI Responses 또는 Gemini)로 위임, 스트림은 그대로 중계
-     5) 스트림 종료 후: 출력 텍스트/함수인자 토큰 합산 → `completionTokens`
-     6) 비용 계산: 입력/출력(+cached 입력이 있으면 분리) → `totalCost`
-     7) 후로깅: `{type:'llm.response', provider, model, promptTokens, completionTokens, inputCost, outputCost, totalCost, durationMs, corrId, cachedInputTokens}`
-   - 주의: 기존 SSE 흐름(이벤트명/포맷) 불변 유지. 퍼사드는 원본 델타를 그대로 전달.
-   - 상관관계 ID(`corrId`)는 `uuid` 생성(또는 요청별 식별자 전달 시 사용).
-
-5. [Wiring] `qa.service.ts`에서 퍼사드 사용
-   - 기존 직접 호출부를 LLM 퍼사드로 교체(`generate(req)`)
-   - 요청 바디의 `llm` 옵션을 퍼사드에 그대로 전달(provider/model/options)
-   - 출력 토큰 카운트/비용 로깅은 퍼사드 내부에서 처리
-
-6. [옵션] 임베딩 호출 비용 로깅(확장)
-   - 파일: `src/services/embedding.service.ts`
-   - `createEmbeddings` 호출 직전 `input` 텍스트 전체 토큰 수 계산(`countTextTokens` 누적) → 입력 비용 로깅
-   - 임베딩 모델 단가(`text-embedding-3-*`)도 `PRICING_TABLE`에 포함
-
-7. [환경변수] 로깅 토글 및 라운딩
-   - `.env` 키 추가(기본값은 off)
-     - `LLM_COST_LOG=true|false` (기본: true로 해도 무방)
-     - `LLM_COST_ROUND=2` (소수점 자리수, 선택)
-   - 로깅은 토글 꺼져 있으면 수행하지 않음
-
-8. [로그 포맷] 예시(JSON 라인)
-   - 요청 전: `{ "type": "llm.request", "corrId": "...", "provider": "openai", "model": "gpt-5-mini", "promptTokens": 1234, "estInputCost": 0.00031, "userId": "...", "categoryId": 1, "postId": 42 }`
-   - 응답 후: `{ "type": "llm.response", "corrId": "...", "provider": "openai", "model": "gpt-5-mini", "promptTokens": 1234, "completionTokens": 456, "inputCost": 0.00031, "outputCost": 0.00091, "totalCost": 0.00122, "durationMs": 987, "cachedInputTokens": 0 }`
-
-9. [검증/수용 기준]
-   - `POST /ai/ask` 호출 시 콘솔에 요청 전/후 로그 각각 1회 출력
-   - 모델/프롬프트/토큰 수/예상 비용/총 비용/시간(ms)이 포함되어야 함
-   - 로깅 on/off 토글 동작, 라운딩 반영 확인
-   - 기존 SSE 동작(끊김/지연) 변화 없음
-
-10. [주의/한계]
-   - 채팅 포맷 오버헤드는 모델별로 상이하며 근사치 사용. 최종 비용은 출력 토큰 카운트까지 반영해 오차 최소화
-   - 스트리밍 API는 서버에서 사용량 메타를 즉시 제공하지 않으므로(비스트리밍과 달리), 응답 텍스트 기반 자체 카운트 수행
-   - 함수 호출(tool_calls) 토큰은 인자 길이에 비례하여 증가. 누적 텍스트/인자 기반으로 동일하게 카운트
-   - Cached Input 과금: 제공 API에서 캐시 히트 토큰 정보를 명시적으로 제공하는 경우에만 `cachedInputTokens`로 분리 산정. 그렇지 않으면 일반 입력으로 계산(보수적)
-
-11. [다음 단계(선택)]
-   - `console.log` → 구조화 로거(Pino/Winston)로 교체, 샘플링·보존 기간 설정
-   - DB 또는 시계열(예: ClickHouse/Prometheus) 적재로 사용자별 비용 대시보드 구성
+## 하이브리드 서치 확장(질문 재작성 + 키워드)
+
+목표
+- 리콜 향상: 질문을 LLM이 재작성(rewrites)하고, 핵심 키워드를 생성하여 벡터+텍스트 양쪽에서 검색.
+- 정밀/비용 제어: 재작성/키워드 개수를 제한하고, 융합 가중치로 결과를 안정적으로 선별.
+
+Plan JSON 확장(초안)
+```json
+{
+  "rewrites": ["..."],
+  "keywords": ["..."],
+  "hybrid": { "enabled": true, "alpha": 0.7, "max_rewrites": 4, "max_keywords": 8 }
+}
+```
+- 기존 필드(`top_k`, `threshold`, `weights`, `filters.time`, `sort`, `limit`)와 공존.
+- 서버에서 상한 강제(rewrites<=4, keywords<=8), 품질이 낮거나 중복인 항목 제거.
+
+프롬프트 가이드(확장)
+- 역할: ‘검색 계획 + 재작성/키워드 생성’
+- 출력: 기존 계획 JSON + `rewrites[]`, `keywords[]`, `hybrid{}`
+- 규칙:
+  - 불용/범용 단어(예: "글", "포스트", "블로그") 지양
+  - 과도한 시기/주제 확장 금지(사용자 블로그 컨텍스트 벗어나지 않기)
+  - 중복/동의어 반복 최소화(문장 유사도 과다 시 제거)
+
+하이브리드 검색 파이프라인
+1) 멀티 벡터 검색: 원 질문 + rewrites 각각 임베딩 → pgvector Top-K 검색(합집합)
+2) 텍스트 검색: keywords로 `post_chunks.content`/`blog_post.title` 텍스트 매칭 → Top-K 추출
+3) 랭킹 융합: 점수 정규화 후 `score = α·vec + (1-α)·text` 또는 RRF로 병합 → 상위 N 청크 선택
+4) 컨텍스트 구성: v1과 동일 프롬프트로 최종 LLM 호출
+
+저장소/인덱스(제안)
+- PostgreSQL 확장: `pg_trgm`(간단/범용) 또는 `tsvector`(정교)
+- 1차안(pg_trgm)
+  - DDL: 
+    - `CREATE EXTENSION IF NOT EXISTS pg_trgm;`
+    - `CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops);`
+    - `CREATE INDEX IF NOT EXISTS idx_bp_title_trgm ON blog_post USING gin (title gin_trgm_ops);`
+  - 질의: `similarity(content, $q) > $min OR content ILIKE ANY($patterns)` 등으로 text_score 계산
+
+SSE 확장(선택)
+- `event: rewrite` / `data: ["..."]`
+- `event: keywords` / `data: ["..."]`
+- `event: hybrid_result` / `data: [{ postId, postTitle }]`
+
+보안/안전
+- 재작성/키워드 출력 스키마 강제(JSON), 길이/개수 상한
+- 금칙어/민감어 필터(선택), 카테고리/기간 필터는 서버가 최종 결정
+
+세부 구현 계획(하이브리드 서치)
+1) 스키마/프롬프트
+  - [ ] `src/types/ai.v2.types.ts`: Plan 스키마에 `rewrites[]`, `keywords[]`, `hybrid{enabled,alpha,max_rewrites,max_keywords}` 추가
+  - [ ] `src/prompts/qa.v2.prompts.ts`: 프롬프트/JSON Schema에 확장 필드 반영 + few-shot 보강
+2) 플래너 서비스
+  - [ ] `search-plan.service.ts`: 확장 필드 파싱/검증, 중복/불용어 필터링, 상한 강제
+3) 텍스트 검색 저장소
+  - [ ] `post.repository.ts`: `textSearchChunksV2({ userId, query, keywords[], from?, to?, topK })`
+  - [ ] (옵션) DDL 문서화: pg_trgm 인덱스 생성 스크립트 추가
+4) 하이브리드 서비스
+  - [ ] `hybrid-search.service.ts` 구현: 멀티 임베딩, 텍스트 검색, 정규화, α 융합/RRF, 상위 N 반환
+5) 오케스트레이션/이벤트
+  - [ ] `qa.v2.service.ts`: plan.hybrid.enabled 시 하이브리드 경로 분기, (선택) `rewrite`/`keywords`/`hybrid_result` 송신
+6) 테스트/튜닝
+  - [ ] 통합 테스트: 재작성/키워드 포함 질의에서 리콜↑ 확인
+  - [ ] 가중치 α, Top-K 상수, 불용어/중복 필터 기준 튜닝
+  - [ ] 0건/오류 폴백(v1 RAG) 검증
+
+권장 단계적 도입
+- Phase 1: 스키마/프롬프트/하이브리드 파이프라인 구현(기본 α=0.7, rewrites<=3, keywords<=6)
+- Phase 2: SSE 관측성 이벤트 추가, 품질 튜닝
+- Phase 3: 인덱스/성능 최적화, 불용어 사전/NER 보정
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..2e55bef
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,171 @@
+# Bubblog AI API 문서 (v1 ~ v2)
+
+본 문서는 `/ai` (v1)와 `/ai/v2` (v2) 엔드포인트를 정리합니다. 서버는 Express 기반이며, `POST /ask` 류는 Server‑Sent Events(SSE)로 답변을 스트리밍합니다.
+
+## 기본 정보
+- Base Path
+  - v1: `/ai`
+  - v2: `/ai/v2`
+- 인증
+  - `POST /ask` 엔드포인트는 `Authorization: Bearer <JWT>` 필요
+  - 임베딩 생성 엔드포인트는 인증 없이 사용 가능
+- 본문 형식: `application/json`
+- SSE 수신: `Content-Type: text/event-stream`
+  - 이벤트명은 `event:` 라인으로, 데이터는 `data:` 라인으로 전송됩니다.
+  - 일반 텍스트 콘텐츠는 `event: answer`로 분할 전송되며, 종료 시 `event: end` + `data: [DONE]`가 송신됩니다.
+
+## v1 엔드포인트 (`/ai`)
+
+### GET `/ai/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok" }`
+
+### POST `/ai/embeddings/title`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `title`(string, required)
+- 동작: 제목 임베딩 생성 후 저장
+- 응답(200): `{ "ok": true }`
+
+### POST `/ai/embeddings/content`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `content`(string, required)
+- 동작: 본문을 약 512 토큰 단위로 중첩(50) 청킹 → 임베딩 생성/저장
+- 응답(200): `{ "post_id": number, "chunk_count": number, "success": true }`
+
+### POST `/ai/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <JWT>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional) — 지정 시 해당 글 컨텍스트에 국한하여 답변
+  - `speech_tone`(number, optional)
+    - `-1`: 간결하고 명확한 말투(기본)
+    - `-2`: 해당 글의 말투를 최대한 모사
+    - 양의 정수: 페르소나 ID(해당 유저의 등록된 페르소나 참조)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?, top_p?, max_output_tokens? }`
+- SSE 이벤트(주요)
+  - `exist_in_post_status`: `true|false` — 관련 컨텍스트 존재 여부
+  - `context`: `[ { postId, postTitle }, ... ]` — 검색/선택된 컨텍스트 요약
+  - `answer`: 모델의 부분 응답 텍스트(여러 번 전송)
+  - `end`: 종료 시 `data: [DONE]`
+  - `error`: `{ code?, message }` — 예: `post_id`가 없거나 권한 없음(403), 없음(404)
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <JWT>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/ask \
+    -d '{
+      "question": "카테고리 A 관련 요약 해줘",
+      "user_id": "u_123",
+      "category_id": 1,
+      "speech_tone": -1
+    }'
+  ```
+
+## v2 엔드포인트 (`/ai/v2`)
+
+### GET `/ai/v2/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok", "v": "v2" }`
+
+### POST `/ai/v2/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <JWT>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional)
+  - `speech_tone`(number, optional)
+    - `-1`: 기본 말투(간결/명확)
+    - `-2`: 해당 글(post 모드) 말투 모사
+    - 양수: 페르소나 ID(해당 유저의 등록 페르소나)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?: number, top_p?: number, max_output_tokens?: number }`
+- 동작 개요
+  - 서버가 질문을 토대로 “검색 계획(JSON)”을 생성·검증·정규화한 뒤, 계획에 따라 시맨틱 또는 하이브리드 검색을 수행하고 결과를 SSE로 스트리밍합니다.
+  - `post_id`가 있으면 post 모드(단일 글 컨텍스트)로 처리하며, 간략한 `search_plan`/`search_result` 이벤트 후 본문 기반 답변을 스트리밍합니다.
+- 하이브리드 검색(벡터+텍스트)
+  - 계획에 `hybrid.enabled: true`인 경우 활성화됩니다.
+  - `rewrites`(재작성 질의)와 `keywords`(핵심 키워드)를 생성하여 벡터/텍스트 두 경로로 후보를 수집하고, `hybrid.retrieval_bias` 라벨을 서버가 `alpha` 값으로 매핑해 점수를 융합하여 상위 `top_k`를 선택합니다.
+    - 매핑(기본): `lexical → 0.3`, `balanced → 0.5`, `semantic → 0.75`
+    - 결합식: `score = alpha*vec + (1-alpha)*text` (각 경로 점수 min-max 정규화 후)
+  - SSE로 `rewrite`, `keywords`, `hybrid_result` 이벤트가 필요한 경우에만 송신됩니다. 하이브리드 결과가 없으면 시맨틱 검색으로 폴백합니다.
+- SSE 이벤트 순서(일반적인 흐름)
+  1) `search_plan`: 정규화된 검색 계획(JSON)
+     - 예시 데이터(정규화):
+       ```json
+       {
+         "mode": "rag",
+         "top_k": 5,
+         "threshold": 0.2,
+         "weights": { "chunk": 0.7, "title": 0.3 },
+         "filters": {
+           "time": { "type": "absolute", "from": "2025-09-01T00:00:00.000Z", "to": "2025-09-30T23:59:59.999Z" }
+         },
+         "sort": "created_at_desc",
+         "limit": 5,
+         "hybrid": { "enabled": true, "retrieval_bias": "balanced", "alpha": 0.5, "max_rewrites": 3, "max_keywords": 6 },
+         "rewrites": ["프로젝트 X 요약", "프로젝트 X 핵심"],
+         "keywords": ["프로젝트 X", "핵심", "요약"]
+       }
+       ```
+       - 비고:
+         - `filters.time`만 포함됩니다. `user_id`/`category_id`/`post_id` 등은 서버가 검색 시 내부적으로 적용합니다.
+         - `hybrid.retrieval_bias`는 LLM 라벨이며 서버가 `alpha`로 변환해 사용합니다.
+       - post 모드에서는 간략한 형태 예: `{ "mode": "post", "filters": { "post_id": 123, "user_id": "u_123" } }`.
+  2) (하이브리드 사용 시) `rewrite`: `string[]`
+  3) (하이브리드 사용 시) `keywords`: `string[]`
+  4) (하이브리드 사용 시) `hybrid_result`: `[ { postId, postTitle }, ... ]`
+  5) `search_result`: `[ { postId, postTitle }, ... ]` — 최종 컨텍스트 요약(하이브리드 또는 시맨틱)
+  6) `exist_in_post_status`: `true|false`
+  7) `context`: `[ { postId, postTitle }, ... ]`
+  8) `answer` — 모델 부분 응답(여러 번)
+  9) `end` — `data: [DONE]`
+  - 오류 시 `error`: `{ code?: number, message: string }`
+- 폴백 동작
+  - 플래너 실패 시 `search_plan`으로 `{ "mode": "rag", "fallback": true }`가 송신되며, v1 스타일 RAG로 컨텍스트를 구성합니다.
+
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <JWT>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/v2/ask \
+    -d '{
+      "question": "최근 한 달 블로그에서 프로젝트 X 관련 내용 요약",
+      "user_id": "u_123",
+      "category_id": 3,
+      "llm": { "provider": "openai", "model": "gpt-5-mini", "options": { "temperature": 0.2, "top_p": 0.9, "max_output_tokens": 800 } }
+    }'
+  ```
+
+## 참고 사항
+- `post_id`가 지정된 요청에서 해당 글이 존재하지 않으면 SSE로 `error` 이벤트(404)가 송신되고 스트림이 종료됩니다.
+- `post.is_public`이 `false`인 글은 요청 `user_id`가 글 소유자와 다르면 `error` 이벤트(403)로 차단됩니다. `post.is_public`이 `true`면 누구나 접근 가능합니다.
+- v1/v2 모두 모델 응답 텍스트는 `answer` 이벤트로 분할 전송됩니다. 클라이언트는 누적하여 최종 답변을 구성해야 합니다.
+- EventSource(브라우저) 사용 예시
+  ```js
+  const es = new EventSource('/ai/v2/ask', { withCredentials: true }); // 헤더 인증이 필요한 경우 fetch/XHR 권장
+  es.addEventListener('search_plan', (e) => console.log('plan', e.data));
+  es.addEventListener('search_result', (e) => console.log('result', e.data));
+  es.addEventListener('context', (e) => console.log('ctx', e.data));
+  es.addEventListener('answer', (e) => renderAppend(JSON.parse(e.data)));
+  es.addEventListener('end', () => es.close());
+  es.addEventListener('error', (e) => es.close());
+  ```
+
+## 요약
+- v1 `/ai/ask`: 컨텍스트 존재 여부와 요약(`exist_in_post_status`, `context`) 후 답변 스트리밍
+- v2 `/ai/v2/ask`: 위 흐름에 더해 검색 계획(`search_plan`)과 검색 결과 요약(`search_result`)을 추가로 제공
+- 임베딩 API(v1): 게시물 제목/본문 임베딩 생성 및 저장
diff --git a/docs/migrations/2025-01-pgtrgm.sql b/docs/migrations/2025-01-pgtrgm.sql
new file mode 100644
index 0000000..168a891
--- /dev/null
+++ b/docs/migrations/2025-01-pgtrgm.sql
@@ -0,0 +1,9 @@
+-- Enable pg_trgm and add GIN indexes for text search on content and title
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+
+CREATE INDEX IF NOT EXISTS idx_pc_content_trgm
+  ON post_chunks USING gin (content gin_trgm_ops);
+
+CREATE INDEX IF NOT EXISTS idx_bp_title_trgm
+  ON blog_post USING gin (title gin_trgm_ops);
+
diff --git a/docs/migrations/README.md b/docs/migrations/README.md
new file mode 100644
index 0000000..3f4de87
--- /dev/null
+++ b/docs/migrations/README.md
@@ -0,0 +1,27 @@
+# Migrations Guide
+
+This folder contains SQL scripts for optional indexes/extensions used by the AI services.
+
+## Apply pg_trgm for text search
+
+File: `2025-01-pgtrgm.sql`
+
+Purpose:
+- Enable `pg_trgm` extension and add GIN indexes on `post_chunks.content` and `blog_post.title` to accelerate partial/fuzzy text search used in hybrid search.
+
+Run (with `DATABASE_URL`):
+
+```bash
+psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql
+```
+
+Or inside `psql`:
+
+```sql
+\i docs/migrations/2025-01-pgtrgm.sql
+```
+
+Notes:
+- Indexes increase disk usage and write overhead; create only on columns used for text search.
+- The extension must be installed once per database.
+
diff --git a/docs/reports/REPORT-askv2.md b/docs/reports/REPORT-askv2.md
new file mode 100644
index 0000000..c9d4229
--- /dev/null
+++ b/docs/reports/REPORT-askv2.md
@@ -0,0 +1,134 @@
+# 보고서: /ai/v2/ask — 구조, 동작, 하이브리드 검색 계획
+
+## 1) 개요
+- 목표: v1의 고정형 RAG 한계를 보완. “검색 계획(JSON)”을 LLM이 생성 → 서버가 안전하게 표준화/검증 → 시맨틱/하이브리드 검색 수행 → 최종 답변을 SSE로 스트리밍.
+- 상태: `POST /ai/v2/ask`(SSE) 운영 중. v1은 유지하며, v2에서 계획 기반 검색과 관측성을 강화.
+
+## 2) v1의 한계와 v2 도입 효과
+- 고정 파라미터 → 동적 계획
+  - v1: 임계치(0.2), LIMIT(5), 가중치(0.7/0.3) 등 고정.
+  - v2: `top_k`, `threshold`, `weights`, `sort`, `limit`을 질문/맥락에 맞게 동적 제어(서버가 범위 강제).
+- 시간/정렬 의도 미반영 → 자연어 시간 해석
+  - v1: “최근/지난주/9월/작년” 같은 시간 의도 반영 불가.
+  - v2: `filters.time`(상대/월/분기/연도)을 받아 `KST 절대범위(from/to)`로 변환해 쿼리에 반영.
+- 리콜/정밀도 균형 한계 → 하이브리드
+  - v1: 임베딩 기반 RAG만.
+  - v2: 벡터+텍스트 하이브리드 융합으로 재현율 향상 및 키워드 민감 질의 대응.
+- 관측성 부족 → SSE 메타 이벤트
+  - v1: 검색/선택 근거가 불투명.
+  - v2: `search_plan`, `rewrite`, `keywords`, `hybrid_result`, `search_result` 등으로 의사결정 가시화.
+- 안전성/정합성
+  - v1: 클라이언트 입력 이탈 감지 어려움.
+  - v2: JSON Schema(strict) + Zod 검증 + 필터 강제 주입으로 안전한 실행계획만 수행.
+- 비용/토큰 관리
+  - v1: 고정 상수로 세밀한 제어 어려움.
+  - v2: `top_k`/향후 `limit`/dedupe로 토큰 예산을 상황별 최적화 가능.
+
+![v1 구조](../structureDiagram/askV1-structure.png)
+
+## 3) 검색 계획(Planner)과 안전 표준화
+- 타입/스키마(`src/types/ai.v2.types.ts`)
+  - `mode`: `rag | post`
+  - `top_k`(1..10), `threshold`(0..1), `weights`(`chunk`,`title`; 합 1로 정규화)
+  - `filters`: `{ user_id, category_ids?, post_id?, time? }`
+  - `sort`: `created_at_desc | created_at_asc`, `limit`(1..20)
+  - `hybrid`: `{ enabled, retrieval_bias, alpha?, max_rewrites, max_keywords }`
+  - `rewrites`: string[], `keywords`: string[]
+- 프롬프트 규칙(`src/prompts/qa.v2.prompts.ts`)
+  - 서버 제공 필터(`user_id`, `category_id`, `post_id`)는 고정(FIXED)이며 변경 금지.
+  - 플래너는 `top_k`, `threshold`, `filters.time`, `sort`, `limit`, `hybrid`, `rewrites`, `keywords`만 결정.
+  - 하이브리드 시 `hybrid.retrieval_bias ∈ {lexical, balanced, semantic}` 라벨을 결정.
+- 생성/정규화(`src/services/search-plan.service.ts`)
+  - OpenAI Responses(JSON Schema strict) → Zod 파싱/검증.
+  - 값 범위 강제, `weights` 합 1로 정규화.
+  - 불용어/중복 제거로 `rewrites`/`keywords` 정제.
+  - 시간 필터를 `KST 절대범위`로 변환(`src/utils/time.ts`).
+  - `retrieval_bias → alpha` 매핑(서버): `lexical→0.3`, `balanced→0.5`, `semantic→0.75`(필요 시 서버가 `alpha`를 직접 클램프/주입).
+  - `user_id/category_id/post_id`는 서버가 최종 주입(정합성 보장). 실패 시 v1 RAG 폴백.
+
+## 4) 검색 엔진: 시맨틱 + 하이브리드
+- 시맨틱(`src/services/semantic-search.service.ts` → `findSimilarChunksV2`)
+  - 입력: 질문 임베딩, `threshold/top_k/weights/sort`, 카테고리/시간 필터.
+  - 점수: `w_chunk*(1 - chunk_dist) + w_title*(1 - title_dist)`.
+  - 저장소: `postRepository.findSimilarChunksV2(...)` 호출, 파라미터 바인딩 기반 안전 쿼리.
+- 하이브리드(`src/services/hybrid-search.service.ts`)
+  - 입력: 원 질문 + `rewrites`(재작성), `keywords`(키워드), `alpha`(서버 매핑).
+  - 벡터 경로: 각 query 임베딩 → 시맨틱 후보 수집 → 동일 청크는 최대 vecScore로 병합.
+  - 텍스트 경로: `textSearchChunksV2`(키워드/질의 기반 텍스트 검색) → 동일 청크는 최대 textScore로 병합.
+  - 정규화/융합: min-max 정규화 후 `score = alpha*vec + (1-alpha)*text`로 결합 → 상위 `top_k`를 반환.
+  - 폴백: 하이브리드 결과 비었을 때 시맨틱 경로로 재시도.
+  - SSE 메타: `rewrite`, `keywords`, `hybrid_result`로 중간 산출물/요약을 별도 송신.
+
+![v2 구조](../structureDiagram/askV2-structure.png)
+
+## 5) 파이프라인(SSE 이벤트)과 모드
+- 컨트롤/오케스트레이션
+  - `src/controllers/ai.v2.controller.ts`: SSE 헤더, 스트림 라우팅.
+  - `src/services/qa.v2.service.ts`: 계획 생성 → 검색 실행(하이브리드/시맨틱) → LLM 호출까지 오케스트레이션.
+- post 모드(`post_id` 존재)
+  - 접근 정책: `post.is_public=false`면 소유자만(403), 미존재 시 404.
+  - 이벤트: `search_plan`(간략) → `search_result` → `exist_in_post_status:true` → `context` → `answer`* → `end`.
+  - 컨텍스트: 해당 글 본문만 사용(전처리 후). 하이브리드 미사용.
+- rag/하이브리드 모드
+  - 이벤트 순서(일반):
+    1) `search_plan`: 정규화된 계획(JSON). `hybrid.enabled`가 true면 `retrieval_bias`와 서버 계산 `alpha`가 함께 포함.
+    2) `rewrite`(선택): 계획의 재작성 질의 목록.
+    3) `keywords`(선택): 계획의 핵심 키워드 목록.
+    4) `hybrid_result`(선택): 하이브리드 후보 요약.
+    5) `search_result`: 최종 컨텍스트 요약.
+    6) `exist_in_post_status`: `true | false`.
+    7) `context`: `[ { postId, postTitle }, ... ]`.
+    8) `answer`*: 모델 부분 응답.
+    9) `end`: 종료 시그널(`[DONE]`).
+  - 오류 시: `error` 송신.
+
+## 6) LLM/비용/프로바이더
+- 기본 모델: `openai/gpt-5-mini`(환경변수로 변경 가능), 임베딩: `text-embedding-3-small`.
+- OpenAI: Responses API 스트리밍 우선, 실패 시 논스트림/Chat Completions 폴백.
+- Gemini: 현재 논스트림 결과를 SSE 청크로 분할 전송.
+- 비용 로깅: 프롬프트/완료 토큰 및 추정 비용 로깅(`src/llm/*`).
+- 툴콜: 컨텍스트 부족 시 `report_content_insufficient` 툴을 통해 안내 문구 유도.
+
+## 7) 보안·정합성
+- 생성 계획 방어선: JSON Schema(strict) + Zod 검증 + 서버 측 범위 강제.
+- 필터 주입: `user_id/category_id/post_id`는 서버가 최종 주입하여 일탈 방지.
+- SQL 안전성: 파라미터 바인딩/화이트리스트 템플릿.
+- 데이터 최소화: SSE에는 ID/제목 위주 요약만 전송.
+- 접근 제어: post 모드에서 `is_public`/소유자 검사.
+
+## 8) 현재 동작과 한계(향후 과제)
+- 현재
+  - `top_k`로 리콜 폭 제어, 시맨틱/하이브리드 모두 적용.
+  - `limit`은 계획에는 존재하나 컨텍스트 축약에는 미적용(향후 포스트 단위 dedupe와 함께 적용 예정).
+  - 청크 단위 랭킹 → 동일 포스트 다수 노출 가능.
+  - 카테고리 배열은 첫 항목만 반영.
+- 향후
+  - 포스트 단위 dedupe + `limit` 적용으로 다양성/비용 균형.
+  - `retrieval_bias → alpha` 매핑의 AB 테스트/텔레메트리 튜닝(예: {0.25,0.5,0.8}).
+  - 점수 정규화 고도화(z-score/quantile) 및 RRF 옵션 도입 검토.
+  - 텍스트 경로 향상(전처리/순위 함수/키워드 확장) 및 프로바이더 다변화.
+  - SSE `search_sql`/`search_debug`로 투명성 강화.
+
+## 9) 파일 맵(핵심)
+- 라우트/컨트롤러: `src/routes/ai.v2.routes.ts`, `src/controllers/ai.v2.controller.ts`, `src/app.ts`
+- 플래너/타입/프롬프트: `src/services/search-plan.service.ts`, `src/types/ai.v2.types.ts`, `src/prompts/qa.v2.prompts.ts`
+- 시간 유틸: `src/utils/time.ts`
+- 검색: `src/services/semantic-search.service.ts`, `src/services/hybrid-search.service.ts`, `src/repositories/post.repository.ts`
+- 오케스트레이션: `src/services/qa.v2.service.ts`
+- LLM: `src/llm/*`
+
+## 10) 예시
+- 요청
+```json
+{
+  "question": "지난달 프로젝트 X 관련 핵심만 3개 보여줘",
+  "user_id": "u_123",
+  "category_id": 3,
+  "llm": { "provider": "openai", "options": { "temperature": 0.2 } }
+}
+```
+- 이벤트 예시
+  - `search_plan`(rag, time=지난달, top_k/threshold/weights/sort/limit/hybrid 포함; hybrid.retrieval_bias와 서버 주입 alpha 함께 표기)
+  - `rewrite`/`keywords`(선택)
+  - `hybrid_result`(선택)
+  - `search_result` → `exist_in_post_status` → `context` → `answer`* → `end`
diff --git a/docs/structureDiagram/askV1-structure.png b/docs/structureDiagram/askV1-structure.png
new file mode 100644
index 0000000..daad8e9
Binary files /dev/null and b/docs/structureDiagram/askV1-structure.png differ
diff --git a/docs/structureDiagram/askV2-structure.png b/docs/structureDiagram/askV2-structure.png
new file mode 100644
index 0000000..c621efc
Binary files /dev/null and b/docs/structureDiagram/askV2-structure.png differ
diff --git a/package.json b/package.json
index 2911813..f4594d4 100644
--- a/package.json
+++ b/package.json
@@ -7,7 +7,8 @@
     "dev": "nodemon --watch 'src/**/*.ts' --exec 'ts-node' src/server.ts",
     "build": "tsc",
     "start": "node dist/server.js",
-    "test": "echo \"Error: no test specified\" && exit 1"
+    "test": "echo \"Error: no test specified\" && exit 1",
+    "db:migrate:pgtrgm": "psql \"$DATABASE_URL\" -f docs/migrations/2025-01-pgtrgm.sql"
   },
   "keywords": [],
   "author": "",
diff --git a/readme.md b/readme.md
index a656468..fa301f2 100644
--- a/readme.md
+++ b/readme.md
@@ -87,3 +87,45 @@ CREATE INDEX idx_post_title_embeddings_embedding
 ## 4. AI 서버 작동
 
 ### `.env` 파일 작성 (루트 디렉토리 `BUBBLOG_AI`에 위치)
+
+---
+
+## 5. 텍스트 검색 인덱스 (pg_trgm)
+
+하이브리드 검색의 키워드/부분일치 성능 향상을 위해 `pg_trgm` 확장 및 GIN 인덱스를 추가합니다.
+
+### 5.1 확장 및 인덱스 생성 스크립트
+
+프로젝트에 제공된 스크립트를 사용하세요:
+
+`docs/migrations/2025-01-pgtrgm.sql`
+
+내용:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops);
+CREATE INDEX IF NOT EXISTS idx_bp_title_trgm   ON blog_post   USING gin (title   gin_trgm_ops);
+```
+
+### 5.2 적용 방법
+
+- 환경변수 `DATABASE_URL`이 설정된 경우:
+
+```bash
+psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql
+```
+
+- 또는 npm 스크립트 사용:
+
+```bash
+npm run db:migrate:pgtrgm
+```
+
+- 또는 수동 실행(PostgreSQL 쉘 접속 후):
+
+```sql
+\i docs/migrations/2025-01-pgtrgm.sql
+```
+
+주의: 인덱스는 쓰기 비용과 디스크 사용량을 증가시킵니다. 텍스트 검색에 사용하는 컬럼(`post_chunks.content`, `blog_post.title`)에만 생성하세요.
diff --git a/src/app.ts b/src/app.ts
index 6968488..dfd79c4 100644
--- a/src/app.ts
+++ b/src/app.ts
@@ -1,6 +1,7 @@
 import express, { Express, Request, Response, NextFunction } from 'express';
 import cors from 'cors';
 import aiRouter from './routes/ai.routes';
+import aiV2Router from './routes/ai.v2.routes';
 
 const app: Express = express();
 
@@ -21,6 +22,7 @@ app.get('/', (request: Request, response: Response) => {
 });
 
 app.use('/ai', aiRouter);
+app.use('/ai/v2', aiV2Router);
 
 // Central Error Handler
 app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
diff --git a/src/config.ts b/src/config.ts
index ba81564..5bd402a 100644
--- a/src/config.ts
+++ b/src/config.ts
@@ -12,6 +12,7 @@ const configSchema = z.object({
   TOKEN_AUDIENCE: z.string().default('bubblog'),
   ALGORITHM: z.string().default('HS256'),
   EMBED_MODEL: z.string().default('text-embedding-3-small'),
+  // 기본 LLM 모델: GPT-5 계열
   CHAT_MODEL: z.string().default('gpt-5-mini'),
   GEMINI_API_KEY: z.string().optional(),
   GEMINI_CHAT_MODEL: z.string().default('gemini-2.5-flash'),
diff --git a/src/controllers/ai.controller.ts b/src/controllers/ai.controller.ts
index 1b9a593..d276f23 100644
--- a/src/controllers/ai.controller.ts
+++ b/src/controllers/ai.controller.ts
@@ -51,12 +51,46 @@ export const askHandler = async (
   try {
     const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any;
 
-    res.setHeader('Content-Type', 'text/event-stream');
-    res.setHeader('Cache-Control', 'no-cache');
+    // SSE headers and anti-buffering hints
+    res.setHeader('Content-Type', 'text/event-stream; charset=utf-8');
+    res.setHeader('Cache-Control', 'no-cache, no-transform');
     res.setHeader('Connection', 'keep-alive');
+    // Nginx buffering off
+    res.setHeader('X-Accel-Buffering', 'no');
+    // Flush headers early so clients start processing immediately
+    (res as any).flushHeaders?.();
+    // Reduce Nagle’s algorithm buffering on the socket for faster flush
+    (res.socket as any)?.setNoDelay?.(true);
+    // Prime the SSE stream to break proxy buffering thresholds
+    res.write(':ok\n\n');
 
     const stream = await answerStream(question, user_id, category_id, speech_tone, post_id, llm);
-    stream.pipe(res);
+    // Manually bridge to ensure flushing of SSE deltas
+    stream.on('data', (chunk) => {
+      const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk));
+      res.write(buf);
+      const canFlush = typeof (res as any).flush === 'function';
+      // try to flush if supported by runtime/middleware
+      (res as any).flush?.();
+      try {
+        console.log(
+          JSON.stringify({ type: 'debug.sse.write', at: Date.now(), bytes: buf.length, flushed: canFlush })
+        );
+      } catch {}
+    });
+    stream.on('end', () => {
+      res.end();
+    });
+    stream.on('error', () => {
+      res.end();
+    });
+
+    // Cleanup if client disconnects
+    req.on('close', () => {
+      try {
+        stream.destroy();
+      } catch {}
+    });
 
   } catch (error) {
     next(error);
diff --git a/src/controllers/ai.v2.controller.ts b/src/controllers/ai.v2.controller.ts
new file mode 100644
index 0000000..ed1f579
--- /dev/null
+++ b/src/controllers/ai.v2.controller.ts
@@ -0,0 +1,29 @@
+import { Request, Response, NextFunction } from 'express';
+import { AskV2Request } from '../types/ai.v2.types';
+import { answerStreamV2 } from '../services/qa.v2.service';
+
+export const askV2Handler = async (
+  req: Request<{}, {}, AskV2Request>,
+  res: Response,
+  next: NextFunction
+) => {
+  try {
+    const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any;
+    res.setHeader('Content-Type', 'text/event-stream');
+    res.setHeader('Cache-Control', 'no-cache');
+    res.setHeader('Connection', 'keep-alive');
+
+    const stream = await answerStreamV2(
+      question,
+      user_id,
+      category_id,
+      speech_tone,
+      post_id,
+      llm
+    );
+    stream.pipe(res);
+  } catch (error) {
+    next(error);
+  }
+};
+
diff --git a/src/llm/index.ts b/src/llm/index.ts
index ba30ce4..3c67eb4 100644
--- a/src/llm/index.ts
+++ b/src/llm/index.ts
@@ -67,6 +67,13 @@ export const generate = async (req: GenerateRequest): Promise<PassThrough> => {
   let buffer = '';
   let outputText = '';
 
+  // Debug: start info
+  try {
+    console.log(
+      JSON.stringify({ type: 'debug.llm.start', provider, model, messages: (messages || []).length })
+    );
+  } catch {}
+
   const flushBuffer = () => {
     // Split by double newline to get SSE events
     const chunks = buffer.split('\n\n');
@@ -130,6 +137,11 @@ export const generate = async (req: GenerateRequest): Promise<PassThrough> => {
       };
       console.log(JSON.stringify(post));
     }
+    try {
+      console.log(
+        JSON.stringify({ type: 'debug.llm.end', provider, model, durationMs, outputChars: outputText.length })
+      );
+    } catch {}
     outer.end();
   });
   providerStream.on('error', (e) => {
@@ -138,6 +150,11 @@ export const generate = async (req: GenerateRequest): Promise<PassThrough> => {
         JSON.stringify({ type: 'llm.error', corrId, provider, model, message: (e as any)?.message || 'error' })
       );
     }
+    try {
+      console.error(
+        JSON.stringify({ type: 'debug.llm.error', provider, model, message: (e as any)?.message || 'error' })
+      );
+    } catch {}
     outer.emit('error', e);
   });
 
diff --git a/src/llm/providers/openai-responses.ts b/src/llm/providers/openai-responses.ts
index 1100db6..65cc1ec 100644
--- a/src/llm/providers/openai-responses.ts
+++ b/src/llm/providers/openai-responses.ts
@@ -7,76 +7,203 @@ const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY });
 
 const toResponsesInput = (messages: OpenAIStyleMessage[] = []) => {
   // Convert simple chat-style messages to Responses API input format
+  // Responses API expects 'input_text' as the content type (not 'text').
   return messages.map((m) => ({
     role: m.role,
-    content: [{ type: 'text', text: m.content }],
+    content: [{ type: 'input_text', text: m.content }],
   }));
 };
 
+const toResponsesTools = (tools: OpenAIStyleTool[] = []) => {
+  // Map Chat Completions style tool definitions to Responses API format
+  // Chat: { type: 'function', function: { name, description, parameters } }
+  // Responses: { type: 'function', name, description, parameters }
+  return tools
+    .filter((t) => t && (t as any).type === 'function' && (t as any).function?.name)
+    .map((t) => ({
+      type: 'function',
+      name: t.function.name,
+      description: t.function.description,
+      parameters: t.function.parameters,
+    }));
+};
+
 export const generateOpenAIStream = async (req: GenerateRequest): Promise<PassThrough> => {
   const stream = new PassThrough();
+  // Guard to avoid writing after stream end
+  let closed = false;
+  const safeWrite = (chunk: string) => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      stream.write(chunk);
+    }
+  };
+  const safeEnd = () => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      closed = true;
+      stream.end();
+    } else {
+      closed = true;
+    }
+  };
   const model = req.model || 'gpt-5-mini';
   const messages = req.messages || [];
-  const tools = (req.tools || []) as unknown as OpenAI.Responses.ResponseCreateParams['tools'];
+  const toolsChat = (req.tools || []) as OpenAIStyleTool[];
 
   // For gpt-5-* prefer Responses API. For other models, fall back to Chat Completions streaming.
   const isGpt5Family = /(^|\b)gpt-5/i.test(model);
 
+  // Debug: basic call info
+  try {
+    console.log(
+      JSON.stringify({
+        type: 'debug.openai.start',
+        model,
+        isGpt5Family,
+        hasTools: Array.isArray(req.tools) && req.tools.length > 0,
+        options: {
+          temperature: req.options?.temperature,
+          top_p: req.options?.top_p,
+          max_output_tokens: req.options?.max_output_tokens,
+          reasoning_effort: (req as any)?.options?.reasoning_effort,
+          text_verbosity: (req as any)?.options?.text_verbosity,
+        },
+      })
+    );
+  } catch {}
+
   try {
     if (isGpt5Family) {
       // Prefer Responses API streaming for gpt-5
       try {
-        const responsesStream: any = await (openai as any).responses.stream({
+        const respParams: any = {
           model,
           input: toResponsesInput(messages) as any,
-          tools: tools as any,
-          temperature: req.options?.temperature,
-          top_p: req.options?.top_p,
+          tools: toolsChat && toolsChat.length > 0 ? toResponsesTools(toolsChat) : undefined,
           max_output_tokens: req.options?.max_output_tokens,
-        });
+        };
+        // GPT-5 family: omit temperature/top_p; allow reasoning/text controls
+        if (req.options?.reasoning_effort) {
+          respParams.reasoning = { effort: req.options.reasoning_effort };
+        } else {
+          // 기본값: 생각(추론) 강도를 최소화하여 지연을 줄임
+          respParams.reasoning = { effort: 'minimal' };
+        }
+        if (req.options?.text_verbosity) {
+          respParams.text = { verbosity: req.options.text_verbosity };
+        } else {
+          // Encourage text output on GPT-5 if not specified
+          respParams.text = { verbosity: 'low' };
+        }
+        try {
+          console.log(
+            JSON.stringify({ type: 'debug.openai.path', path: 'responses.stream', paramsKeys: Object.keys(respParams) })
+          );
+        } catch {}
+        const responsesStream: any = await (openai as any).responses.stream(respParams);
 
-        responsesStream.on('response.output_text.delta', (delta: string) => {
-          if (delta) {
-            stream.write(`event: answer\n`);
-            stream.write(`data: ${JSON.stringify(delta)}\n\n`);
+        // let loggedFirstDelta = false;
+        responsesStream.on('response.output_text.delta', (ev: any) => {
+          const text = typeof ev === 'string' ? ev : ev?.delta ?? '';
+          if (text) {
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(text)}\n\n`);
+            // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length, at: Date.now() })); } catch {}
+            // if (!loggedFirstDelta) {
+              // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length })); } catch {}
+              // loggedFirstDelta = true;
+            // }
           }
         });
 
         // Stream tool-call arguments as answer chunks to maintain SSE shape
-        responsesStream.on('response.tool_call.delta', (toolDelta: any) => {
-          const argsDelta = toolDelta?.arguments_delta || toolDelta?.arguments || '';
+        responsesStream.on('response.tool_call.delta', (ev: any) => {
+          const argsDelta = ev?.arguments_delta || ev?.arguments || ev?.delta || '';
           if (argsDelta) {
-            stream.write(`event: answer\n`);
-            stream.write(`data: ${JSON.stringify(argsDelta)}\n\n`);
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(argsDelta)}\n\n`);
+          }
+        });
+        // Also handle non-delta tool_call events
+        responsesStream.on('response.tool_call', (ev: any) => {
+          const args = ev?.arguments || ev?.arguments_delta || '';
+          if (args) {
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(args)}\n\n`);
+          }
+        });
+
+        // Catch-all messages to ensure we don't miss alternative text events
+        responsesStream.on('message', (msg: any) => {
+          try {
+            const m = typeof msg === 'string' ? JSON.parse(msg) : msg;
+            if (!m) return;
+            // Prefer explicit output_text delta
+            if (m.type === 'response.output_text.delta' && m.delta) {
+              safeWrite(`event: answer\n`);
+              safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`);
+            }
+            // Some SDKs may emit full output_text chunk at once
+            else if (m.type === 'response.output_text' && typeof m.text === 'string') {
+              safeWrite(`event: answer\n`);
+              safeWrite(`data: ${JSON.stringify(m.text)}\n\n`);
+            }
+            // Generic delta fallback
+            else if (m.type === 'response.delta' && typeof m.delta === 'string') {
+              safeWrite(`event: answer\n`);
+              safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`);
+            }
+            // Log for visibility
+            console.log(
+              JSON.stringify({ type: 'debug.openai.msg', mtype: m.type, keys: Object.keys(m || {}) })
+            );
+          } catch (e) {
+            try { console.log(JSON.stringify({ type: 'debug.openai.msg_parse_error' })); } catch {}
           }
         });
 
         responsesStream.on('response.completed', () => {
-          stream.write(`event: end\n`);
-          stream.write(`data: [DONE]\n\n`);
-          stream.end();
+          safeWrite(`event: end\n`);
+          safeWrite(`data: [DONE]\n\n`);
+          safeEnd();
+          try { console.log(JSON.stringify({ type: 'debug.openai.completed' })); } catch {}
         });
 
         responsesStream.on('error', (e: any) => {
-          stream.write(`event: error\n`);
-          stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
-          stream.end();
+          safeWrite(`event: error\n`);
+          safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+          safeEnd();
+          try {
+            console.error(
+              JSON.stringify({ type: 'debug.openai.error', path: 'responses.stream', message: (e as any)?.message })
+            );
+          } catch {}
         });
 
-        // Ensure the stream starts and we await its completion
-        await responsesStream.done();
+        // Do not await completion here; return immediately so callers can consume deltas in real-time
+        (async () => {
+          try {
+            await responsesStream.done();
+          } catch {}
+        })();
         return stream;
       } catch (e) {
         // Fallback to non-streaming Responses if streaming path fails
         try {
-          const response = await openai.responses.create({
+          const createParams: any = {
             model,
             input: toResponsesInput(messages) as any,
             // Avoid tools in non-streaming mode to ensure text output
-            temperature: req.options?.temperature,
-            top_p: req.options?.top_p,
             max_output_tokens: req.options?.max_output_tokens,
-          });
+          };
+          if (req.options?.reasoning_effort) createParams.reasoning = { effort: req.options.reasoning_effort };
+          else createParams.reasoning = { effort: 'low' };
+          if (req.options?.text_verbosity) createParams.text = { verbosity: req.options.text_verbosity };
+          try {
+            console.log(
+              JSON.stringify({ type: 'debug.openai.path', path: 'responses.create', paramsKeys: Object.keys(createParams) })
+            );
+          } catch {}
+          const response = await openai.responses.create(createParams);
           const text = (response as any).output_text ?? '';
           const answerText = typeof text === 'string' ? text : '';
           const fallbackText = (() => {
@@ -99,60 +226,95 @@ export const generateOpenAIStream = async (req: GenerateRequest): Promise<PassTh
           const chunkSize = 400;
           for (let i = 0; i < finalText.length; i += chunkSize) {
             const chunk = finalText.slice(i, i + chunkSize);
-            stream.write(`event: answer\n`);
-            stream.write(`data: ${JSON.stringify(chunk)}\n\n`);
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(chunk)}\n\n`);
           }
-          stream.write(`event: end\n`);
-          stream.write(`data: [DONE]\n\n`);
-          stream.end();
+          safeWrite(`event: end\n`);
+          safeWrite(`data: [DONE]\n\n`);
+          safeEnd();
           return stream;
         } catch (e2) {
           // fall through to chat completions streaming below
+          try {
+            console.error(
+              JSON.stringify({ type: 'debug.openai.error', path: 'responses.create', message: (e2 as any)?.message })
+            );
+          } catch {}
         }
       }
     }
 
     // Chat Completions streaming as universal fallback
-    const chatStream = await openai.chat.completions.create({
-      model,
-      messages: messages as any,
-      tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined,
-      tool_choice: req.tools && req.tools.length > 0 ? 'auto' : undefined,
-      stream: true,
-      temperature: req.options?.temperature,
-      top_p: req.options?.top_p,
-      max_tokens: req.options?.max_output_tokens as any,
-    });
-
-    for await (const chunk of chatStream) {
-      const content = chunk.choices[0]?.delta?.content || '';
-      const toolCalls = chunk.choices[0]?.delta?.tool_calls;
-
-      if (toolCalls) {
-        for (const toolCall of toolCalls) {
-          if (toolCall.function?.arguments) {
-            stream.write(`event: answer\n`);
-            stream.write(`data: ${JSON.stringify(toolCall.function.arguments)}\n\n`);
+    try { console.log(JSON.stringify({ type: 'debug.openai.path', path: 'chat.completions.stream' })); } catch {}
+    let chatStream: any;
+
+    // temperature/top_p are not supported on reasoning models (e.g., GPT-5 family)
+    if (!isGpt5Family) {
+      chatStream = await openai.chat.completions.create({
+        model,
+        messages: messages as any,
+        tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined,
+        tool_choice: req.tools && req.tools.length > 0 ? 'auto' : undefined,
+        stream: true,
+        temperature: req.options?.temperature,
+        top_p: req.options?.top_p,
+        max_tokens: req.options?.max_output_tokens as any,
+      });
+    } else {
+      chatStream = await openai.chat.completions.create({
+        model,
+        messages: messages as any,
+        tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined,
+        tool_choice: req.tools && req.tools.length > 0 ? 'auto' : undefined,
+        stream: true,
+        max_tokens: req.options?.max_output_tokens as any,
+      });
+    }
+    
+    // Iterate asynchronously; return stream immediately to allow real-time consumption
+    (async () => {
+      try {
+        for await (const chunk of chatStream) {
+          const content = chunk.choices[0]?.delta?.content || '';
+          const toolCalls = chunk.choices[0]?.delta?.tool_calls;
+
+          if (toolCalls) {
+            for (const toolCall of toolCalls) {
+              if (toolCall.function?.arguments) {
+                safeWrite(`event: answer\n`);
+                safeWrite(`data: ${JSON.stringify(toolCall.function.arguments)}\n\n`);
+              }
+            }
+          } else if (content) {
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(content)}\n\n`);
           }
-        }
-      } else if (content) {
-        stream.write(`event: answer\n`);
-        stream.write(`data: ${JSON.stringify(content)}\n\n`);
-      }
 
-      if (chunk.choices[0]?.finish_reason) {
-        stream.write(`event: end\n`);
-        stream.write(`data: [DONE]\n\n`);
-        stream.end();
-        break;
+          if (chunk.choices[0]?.finish_reason) {
+            safeWrite(`event: end\n`);
+            safeWrite(`data: [DONE]\n\n`);
+            safeEnd();
+            try { console.log(JSON.stringify({ type: 'debug.openai.completed', path: 'chat.completions.stream' })); } catch {}
+            break;
+          }
+        }
+      } catch (e) {
+        safeWrite(`event: error\n`);
+        safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+        safeEnd();
       }
-    }
+    })();
 
     return stream;
   } catch (err) {
-    stream.write(`event: error\n`);
-    stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
-    stream.end();
+    safeWrite(`event: error\n`);
+    safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    safeEnd();
+    try {
+      console.error(
+        JSON.stringify({ type: 'debug.openai.error', path: 'top', message: (err as any)?.message, model, isGpt5Family })
+      );
+    } catch {}
     return stream;
   }
 };
diff --git a/src/llm/types.ts b/src/llm/types.ts
index b17bc54..3c12a6f 100644
--- a/src/llm/types.ts
+++ b/src/llm/types.ts
@@ -23,6 +23,9 @@ export type GenerateRequest = {
     temperature?: number;
     top_p?: number;
     max_output_tokens?: number;
+    // GPT-5 family specific controls
+    reasoning_effort?: 'minimal' | 'low' | 'medium' | 'high';
+    text_verbosity?: 'low' | 'medium' | 'high';
   };
   meta?: {
     userId?: string;
@@ -30,4 +33,3 @@ export type GenerateRequest = {
     postId?: number;
   };
 };
-
diff --git a/src/prompts/qa.prompts.ts b/src/prompts/qa.prompts.ts
index 010de57..599e46d 100644
--- a/src/prompts/qa.prompts.ts
+++ b/src/prompts/qa.prompts.ts
@@ -7,7 +7,20 @@ export const createPostContextPrompt = (
   question: string,
   speechTonePrompt: string
 ): OpenAI.Chat.Completions.ChatCompletionMessageParam[] => {
-  const systemPrompt = `너는 사용자의 블로그 글 컨텍스트만으로 답변한다. 컨텍스트에 없는 사실은 추정하지 말고 “문서에 없음”이라고 말한다. 말투는 speech_tone 지시에 따른다.`;
+  const systemPrompt = ` 당신은 블로그 운영자 AI입니다. 사용자의 블로그에 대한 질문에 답변합니다. 
+  블로그 운영자 AI는 사용자의 질문에 대해 블로그 본문 컨텍스트를 참고하여 답변합니다.
+  모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다. 
+  또한 주어진 내용외의 내용을 지어내지 마십시오.
+  
+  [말투 지침]
+  ${speechTonePrompt}
+  - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오.
+  
+  [응답 규칙]
+  1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다.
+  2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다.  
+  3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. 
+  `;
   const userMessage = `
 [context]
 제목: ${post.title}
@@ -18,9 +31,6 @@ ${processedContent}
 
 [user]
 ${question}
-
-[instruction]
-답변 말투: "${speechTonePrompt}"
 `;
 
   return [
@@ -40,14 +50,16 @@ export const createRagPrompt = (
   모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다. 
   또한 주어진 내용외의 내용을 지어내지 마십시오.
   
+  [말투 지침]
+  ${speechTonePrompt}
+  - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오.
+  
   [응답 규칙]
   1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다.
   2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다.  
-  3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면 report_content_insufficient 함수를 호출하고 답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다. 
+  3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. 
   `;
   const userMessage = `
-    답변 말투 및 규칙: "${speechTonePrompt}"
-    반드시 말투 및 규칙에 따라 대답하세요!
     아래의 질문과 블로그 본문 컨텍스트를 참고하여 답변하세요.
     사용자의 질문: ${question}
     가장 근접한 블로그 본문 컨텍스트:
diff --git a/src/prompts/qa.v2.prompts.ts b/src/prompts/qa.v2.prompts.ts
new file mode 100644
index 0000000..a752324
--- /dev/null
+++ b/src/prompts/qa.v2.prompts.ts
@@ -0,0 +1,117 @@
+// Search Plan prompt templates for v2
+
+import { SearchPlan } from '../types/ai.v2.types';
+
+export const getSearchPlanSchemaJson = (): Record<string, unknown> => ({
+  type: 'object',
+  additionalProperties: false,
+  properties: {
+    mode: { enum: ['rag', 'post'] },
+    top_k: { type: 'integer', minimum: 1, maximum: 10 },
+    threshold: { type: 'number', minimum: 0, maximum: 1 },
+    weights: {
+      type: 'object',
+      additionalProperties: false,
+      properties: {
+        chunk: { type: 'number', minimum: 0, maximum: 1 },
+        title: { type: 'number', minimum: 0, maximum: 1 },
+      },
+      required: ['chunk', 'title'],
+    },
+    rewrites: { type: 'array', items: { type: 'string' } },
+    keywords: { type: 'array', items: { type: 'string' } },
+    hybrid: {
+      type: 'object',
+      additionalProperties: false,
+      properties: {
+        enabled: { type: 'boolean' },
+        retrieval_bias: { enum: ['lexical', 'balanced', 'semantic'] },
+        max_rewrites: { type: 'integer', minimum: 0, maximum: 4 },
+        max_keywords: { type: 'integer', minimum: 0, maximum: 8 },
+      },
+      required: ['enabled', 'retrieval_bias', 'max_rewrites', 'max_keywords'],
+    },
+    // Only time is allowed under filters. Responses JSON Schema requires closed objects
+    // with explicit required fields at each level. We constrain time to label-form only.
+    filters: {
+      type: 'object',
+      additionalProperties: false,
+      properties: {
+        time: {
+          type: 'object',
+          additionalProperties: false,
+          properties: {
+            type: { type: 'string', enum: ['label'] },
+            label: { type: 'string', minLength: 1 },
+          },
+          required: ['type', 'label'],
+        },
+      },
+      required: ['time'],
+    },
+    sort: { enum: ['created_at_desc', 'created_at_asc'] },
+    limit: { type: 'integer', minimum: 1, maximum: 20 },
+  },
+  required: ['mode', 'top_k', 'threshold', 'weights', 'rewrites', 'keywords', 'hybrid', 'filters', 'sort', 'limit'],
+});
+
+export const buildSearchPlanPrompt = (params: {
+  now_utc: string;
+  now_kst: string;
+  timezone: string;
+  user_id: string;
+  category_id?: number;
+  post_id?: number;
+  defaults?: Partial<SearchPlan>;
+  question: string;
+}): string => {
+  const defaults = JSON.stringify(
+    params.defaults || {
+      top_k: 5,
+      threshold: 0.2,
+      weights: { chunk: 0.7, title: 0.3 },
+      sort: 'created_at_desc',
+      limit: 5,
+    },
+  );
+
+  const schemaHint = JSON.stringify(getSearchPlanSchemaJson());
+
+  return [
+    'You are a Search Plan Generator for a Korean blogging platform.',
+    'Your task is to read the user question and output ONLY a JSON object that defines a safe search plan.',
+    '',
+    `now_utc: ${params.now_utc}`,
+    `now_kst: ${params.now_kst}`,
+    `timezone: ${params.timezone}`,
+    `user_id: ${params.user_id}`,
+    `category_id: ${params.category_id ?? ''}`,
+    `post_id: ${params.post_id ?? ''}`,
+    '',
+    'Rules:',
+    '1) Output ONLY a single JSON object matching the schema. No extra text.',
+    '2) Respect bounds: top_k 1..10, limit 1..20, threshold 0..1, weights in [0,1]. The server normalizes their sum.',
+    '3) Do NOT output any filters other than filters.time. The server injects user_id/category_id/post_id.',
+    '   - Your job: decide top_k, threshold, sort, limit; and optionally rewrites/keywords/hybrid only.',
+    '4) Mode must follow constraints: if post_id exists, use mode="post"; otherwise use mode="rag".',
+    '5) Time MUST be provided via a label only: filters.time = { "type": "label", "label": "..." }',
+    '   - Allowed labels (examples): "all_time"(no filter), "today", "yesterday", "last_7_days", "last_14_days", "last_30_days", "this_month", "last_month",',
+    '     or structured: "2006_to_now", "2019-2022", "2024-Q3", "Q3-2024", "2024-09", "2024".',
+    '   - For queries like "최근 글", prefer a short window label such as "last_7_days" or "last_30_days" (choose appropriately).',
+    '6) Do NOT include any temporal words or ranges inside rewrites/keywords. Temporal intent must live ONLY in filters.time.',
+    '7) If the question asks for N items (e.g., “N개”), set limit=N within bounds.',
+    '8) Keep weights to defaults unless a clear need implies otherwise.',
+    '8) When helpful for recall, set hybrid.enabled=true and choose hybrid.retrieval_bias ∈ {lexical, balanced, semantic}. Then generate concise rewrites (<= max_rewrites) and focused keywords (<= max_keywords).',
+    '   - lexical: keyword/정확 매칭이 중요할 때 (숫자, 버전, 고유명사 등).',
+    '   - balanced: 일반 질의.',
+    '   - semantic: 개념/요약/의도 중심일 때.',
+    '9) Avoid stop/common words (예: "글", "포스트", "블로그", "소개", "정리"). Keep within the user context; avoid over-broad topics or time spans.',
+    '10) Remove near-duplicates: if rewrites/keywords are synonymous or highly similar, include only one.',
+    '',
+    `Schema: ${schemaHint}`,
+    '',
+    `Question (Korean):\n${params.question}`,
+    '',
+    'Respond with ONLY the JSON object. No markdown, no explanation.',
+  ].join('\n');
+};
diff --git a/src/repositories/post.repository.ts b/src/repositories/post.repository.ts
index 545dc57..ffc17c0 100644
--- a/src/repositories/post.repository.ts
+++ b/src/repositories/post.repository.ts
@@ -9,6 +9,7 @@ export interface Post {
   tags: string[];
   created_at: string;
   user_id: string;
+  is_public:boolean;
 }
 
 export interface SimilarChunk {
@@ -18,14 +19,36 @@ export interface SimilarChunk {
   similarityScore: number;
 }
 
+export interface TextSearchHit {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  textScore: number;
+}
+
 // ========= READ QUERIES =========
 export const findPostById = async (postId: number): Promise<Post | null> => {
   const pool = getDb();
-  const { rows } = await pool.query<Post>(
-    'SELECT id, title, content, tags, created_at, user_id FROM blog_post WHERE id = $1',
+  // Some databases may not have a `tags` column on blog_post.
+  // Select existing columns and populate `tags` as an empty array fallback.
+  const { rows } = await pool.query(
+    'SELECT id, title, content, created_at, user_id, is_public FROM blog_post WHERE id = $1',
     [postId]
   );
-  return rows.length > 0 ? rows[0] : null;
+  if (rows.length === 0) return null;
+
+  const row = rows[0] as any;
+  const post: Post = {
+    id: row.id,
+    title: row.title,
+    content: row.content,
+    // Fallback: DB has no tags column; keep empty list so prompts render gracefully
+    tags: Array.isArray(row.tags) ? row.tags : [],
+    created_at: row.created_at,
+    user_id: row.user_id,
+    is_public: row.is_public,
+  };
+  return post;
 };
 
 export const findSimilarChunks = async (
@@ -131,4 +154,208 @@ export const storeContentEmbeddings = async (
     await pool.query('ROLLBACK');
     throw error;
   }
-};
\ No newline at end of file
+};
+
+// ========= READ QUERIES (V2 dynamic) =========
+export const findSimilarChunksV2 = async (params: {
+  userId: string;
+  embedding: number[];
+  categoryId?: number;
+  from?: string; // ISO UTC
+  to?: string;   // ISO UTC
+  threshold?: number; // 0..1
+  topK?: number; // default 5, max 10
+  weights?: { chunk: number; title: number };
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<SimilarChunk[]> => {
+  const pool = getDb();
+  const wChunk = Math.max(0, Math.min(1, params.weights?.chunk ?? 0.7));
+  const wTitle = Math.max(0, Math.min(1, params.weights?.title ?? 0.3));
+  const thr = params.threshold != null ? Math.max(0, Math.min(1, params.threshold)) : 0.2;
+  const limit = Math.min(10, Math.max(1, params.topK ?? 5));
+
+  const parts: string[] = [];
+  const values: any[] = [];
+
+  // $1: userId, $2: embedding
+  values.push(params.userId);
+  values.push(pgvector.toSql(params.embedding));
+
+  const hasCategory = typeof params.categoryId === 'number';
+  const hasTime = !!(params.from && params.to);
+
+  if (hasCategory) {
+    const catParam = values.length + 1; // next index
+    parts.push(`
+      WITH category_ids AS (
+        SELECT DISTINCT cc.descendant_id
+        FROM category_closure cc
+        WHERE cc.ancestor_id = $${catParam}
+      ),
+      filtered_posts AS (
+        SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at
+        FROM blog_post bp
+        WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids)
+      )`);
+    values.push(params.categoryId);
+  } else {
+    parts.push(`
+      WITH filtered_posts AS (
+        SELECT id AS post_id, title AS post_title, created_at
+        FROM blog_post
+        WHERE user_id = $1
+      )`);
+  }
+
+  // base select and threshold
+  const thrParam = values.length + 1;
+  parts.push(`
+    SELECT
+      fp.post_id,
+      fp.post_title,
+      pc.content AS post_chunk,
+      (${wChunk} * (1.0 - (pc.embedding <=> $2))) + (${wTitle} * (1.0 - (pte.embedding <=> $2))) AS similarity_score,
+      fp.created_at
+    FROM filtered_posts fp
+    JOIN post_chunks pc ON pc.post_id = fp.post_id
+    JOIN post_title_embeddings pte ON pte.post_id = fp.post_id
+    WHERE (1.0 - (pc.embedding <=> $2)) > $${thrParam}
+  `);
+  values.push(thr);
+
+  if (hasTime) {
+    const fromParam = values.length + 1;
+    const toParam = values.length + 2;
+    parts.push(` AND fp.created_at BETWEEN $${fromParam} AND $${toParam}`);
+    values.push(params.from, params.to);
+  }
+
+  let orderBy = 'similarity_score DESC';
+  if (params.sort === 'created_at_desc') orderBy = 'similarity_score DESC, fp.created_at DESC';
+  if (params.sort === 'created_at_asc') orderBy = 'similarity_score DESC, fp.created_at ASC';
+
+  const limitParam = values.length + 1;
+  parts.push(` ORDER BY ${orderBy} LIMIT $${limitParam}`);
+  values.push(limit);
+
+  const sql = parts.join('\n');
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    similarityScore: row.similarity_score,
+  }));
+};
+
+export const textSearchChunksV2 = async (params: {
+  userId: string;
+  query?: string;
+  keywords?: string[];
+  categoryId?: number;
+  from?: string;
+  to?: string;
+  topK?: number;
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<TextSearchHit[]> => {
+  const pool = getDb();
+  const limit = Math.min(10, Math.max(1, params.topK ?? 5));
+
+  const values: any[] = [];
+  values.push(params.userId);
+
+  const hasCategory = typeof params.categoryId === 'number';
+  const hasTime = !!(params.from && params.to);
+  const hasQuery = !!params.query && params.query.trim().length > 0;
+  const keywords = (params.keywords || []).filter((k) => typeof k === 'string' && k.trim().length > 0);
+
+  const withParts: string[] = [];
+  if (hasCategory) {
+    const catParam = values.length + 1;
+    withParts.push(`
+      category_ids AS (
+        SELECT DISTINCT cc.descendant_id FROM category_closure cc WHERE cc.ancestor_id = $${catParam}
+      ),
+      filtered_posts AS (
+        SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at
+        FROM blog_post bp
+        WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids)
+      )`);
+    values.push(params.categoryId);
+  } else {
+    withParts.push(`
+      filtered_posts AS (
+        SELECT id AS post_id, title AS post_title, created_at
+        FROM blog_post
+        WHERE user_id = $1
+      )`);
+  }
+
+  let base = `
+    SELECT
+      fp.post_id,
+      fp.post_title,
+      pc.content AS post_chunk,
+      0::float8 AS content_sim,
+      0::float8 AS title_sim,
+      fp.created_at
+    FROM filtered_posts fp
+    JOIN post_chunks pc ON pc.post_id = fp.post_id
+  `;
+  if (hasQuery) {
+    const qParam = values.length + 1;
+    base = `
+      SELECT
+        fp.post_id,
+        fp.post_title,
+        pc.content AS post_chunk,
+        COALESCE(similarity(pc.content, $${qParam}), 0) AS content_sim,
+        COALESCE(similarity(fp.post_title, $${qParam}), 0) AS title_sim,
+        fp.created_at
+      FROM filtered_posts fp
+      JOIN post_chunks pc ON pc.post_id = fp.post_id
+    `;
+    values.push(params.query);
+  }
+
+  const whereParts: string[] = [];
+  if (hasTime) {
+    const fromParam = values.length + 1;
+    const toParam = values.length + 2;
+    whereParts.push(`fp.created_at BETWEEN $${fromParam} AND $${toParam}`);
+    values.push(params.from, params.to);
+  }
+
+  const likePatterns: string[] = [];
+  for (const k of keywords) {
+    likePatterns.push(`%${k}%`);
+  }
+  if (likePatterns.length > 0) {
+    const arrParam = values.length + 1;
+    whereParts.push(`(pc.content ILIKE ANY($${arrParam}) OR fp.post_title ILIKE ANY($${arrParam}))`);
+    values.push(likePatterns);
+  }
+
+  const whereSql = whereParts.length > 0 ? `WHERE ${whereParts.join(' AND ')}` : '';
+
+  let orderBy = 'content_sim DESC';
+  if (params.sort === 'created_at_desc') orderBy = 'content_sim DESC, fp.created_at DESC';
+  if (params.sort === 'created_at_asc') orderBy = 'content_sim DESC, fp.created_at ASC';
+
+  const limitParam = values.length + 1;
+  const sql = `${withParts.length > 0 ? 'WITH ' + withParts.join(',\n') : ''}
+${base}
+${whereSql}
+ORDER BY ${orderBy}
+LIMIT $${limitParam}`;
+  values.push(limit);
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    textScore: Math.max(Number(row.content_sim) || 0, Number(row.title_sim) || 0),
+  }));
+};
diff --git a/src/routes/ai.v2.routes.ts b/src/routes/ai.v2.routes.ts
new file mode 100644
index 0000000..2558023
--- /dev/null
+++ b/src/routes/ai.v2.routes.ts
@@ -0,0 +1,14 @@
+import { Router } from 'express';
+import { askV2Handler } from '../controllers/ai.v2.controller';
+import { authMiddleware } from '../middlewares/auth.middleware';
+
+const aiV2Router = Router();
+
+aiV2Router.get('/health', (req, res) => {
+  res.status(200).json({ status: 'ok', v: 'v2' });
+});
+
+aiV2Router.post('/ask', authMiddleware, askV2Handler);
+
+export default aiV2Router;
+
diff --git a/src/services/hybrid-search.service.ts b/src/services/hybrid-search.service.ts
new file mode 100644
index 0000000..592e723
--- /dev/null
+++ b/src/services/hybrid-search.service.ts
@@ -0,0 +1,104 @@
+import { createEmbeddings } from './embedding.service';
+import * as postRepository from '../repositories/post.repository';
+import { SearchPlan } from '../types/ai.v2.types';
+
+export type HybridSearchResult = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  similarityScore: number;
+}[];
+
+type Candidate = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  vecScore?: number;
+  textScore?: number;
+};
+
+export const runHybridSearch = async (
+  question: string,
+  userId: string,
+  plan: SearchPlan
+): Promise<HybridSearchResult> => {
+  const queries = [question, ...((plan.rewrites as string[]) || [])];
+  const alpha = Math.max(0, Math.min(1, (plan.hybrid as any)?.alpha ?? 0.7));
+
+  const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined;
+  const to = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.to : undefined;
+  const categoryId = (plan.filters as any)?.category_ids?.[0];
+
+  const embeddings = await createEmbeddings(queries);
+
+  const byKey = new Map<string, Candidate>();
+
+  for (let i = 0; i < embeddings.length; i++) {
+    const emb = embeddings[i];
+    const rows = await postRepository.findSimilarChunksV2({
+      userId,
+      embedding: emb,
+      categoryId,
+      from,
+      to,
+      threshold: plan.threshold,
+      topK: plan.top_k,
+      weights: plan.weights,
+      sort: plan.sort,
+    });
+    for (const r of rows) {
+      const key = `${r.postId}:${r.postChunk}`;
+      const prev = byKey.get(key);
+      const curVec = Number(r.similarityScore) || 0;
+      if (!prev) {
+        byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, vecScore: curVec });
+      } else {
+        prev.vecScore = Math.max(prev.vecScore || 0, curVec);
+      }
+    }
+  }
+
+  const textRows = await postRepository.textSearchChunksV2({
+    userId,
+    query: question,
+    keywords: (plan.keywords as string[]) || [],
+    categoryId,
+    from,
+    to,
+    topK: plan.top_k,
+    sort: plan.sort,
+  });
+  for (const r of textRows) {
+    const key = `${r.postId}:${r.postChunk}`;
+    const prev = byKey.get(key);
+    const curText = Number(r.textScore) || 0;
+    if (!prev) {
+      byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, textScore: curText });
+    } else {
+      prev.textScore = Math.max(prev.textScore || 0, curText);
+    }
+  }
+
+  const list = Array.from(byKey.values());
+  const vecVals = list.map((c) => c.vecScore || 0);
+  const textVals = list.map((c) => c.textScore || 0);
+  const vMin = Math.min(...vecVals, 0);
+  const vMax = Math.max(...vecVals, 0);
+  const tMin = Math.min(...textVals, 0);
+  const tMax = Math.max(...textVals, 0);
+
+  const norm = (v: number, lo: number, hi: number) => (hi > lo ? (v - lo) / (hi - lo) : 0);
+
+  const fused = list
+    .map((c) => {
+      const v = norm(c.vecScore || 0, vMin, vMax);
+      const t = norm(c.textScore || 0, tMin, tMax);
+      const score = alpha * v + (1 - alpha) * t;
+      return { postId: c.postId, postTitle: c.postTitle, postChunk: c.postChunk, similarityScore: score };
+    })
+    .sort((a, b) => b.similarityScore - a.similarityScore)
+    .slice(0, Math.min(10, Math.max(1, plan.top_k || 5)));
+
+  return fused;
+};
+
diff --git a/src/services/qa.service.ts b/src/services/qa.service.ts
index 549a919..e0b308f 100644
--- a/src/services/qa.service.ts
+++ b/src/services/qa.service.ts
@@ -38,6 +38,11 @@ export const answerStream = async (
   llm?: LlmOverride
 ): Promise<PassThrough> => {
   const stream = new PassThrough();
+  try {
+    console.log(
+      JSON.stringify({ type: 'debug.qa.start', questionLen: question?.length || 0, userId, categoryId, postId, speechTone, llm })
+    );
+  } catch {}
 
   let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = [];
   let tools:
@@ -64,18 +69,27 @@ export const answerStream = async (
       if (!post) {
         stream.write(`event: error\ndata: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`);
         stream.end();
+        try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'not_found', postId })); } catch {}
         return;
       }
-
-      if (post.user_id !== userId) {
-         stream.write(`event: error\ndata: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`);
-         stream.end();
-         return;
+      
+      // Enforce conditional ownership: if post is not public, require owner
+      if (!post.is_public && post.user_id !== userId) {
+        stream.write(`event: error\n`);
+        stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`);
+        stream.end();
+        try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'forbidden', postId })); } catch {}
+        return;
       }
 
       const processedContent = preprocessContent(post.content);
       stream.write(`event: exist_in_post_status\ndata: true\n\n`);
       stream.write(`event: context\ndata: ${JSON.stringify([{ postId: post.id, postTitle: post.title }])}\n\n`);
+      try {
+        console.log(
+          JSON.stringify({ type: 'debug.qa.path', mode: 'post', postId: post.id, processedLen: processedContent.length })
+        );
+      } catch {}
 
       messages = toSimpleMessages(
         qaPrompts.createPostContextPrompt(post, processedContent, question, speechTonePrompt)
@@ -90,6 +104,11 @@ export const answerStream = async (
 
       const context = similarChunks.map(chunk => ({ postId: chunk.postId, postTitle: chunk.postTitle }));
       stream.write(`event: context\ndata: ${JSON.stringify(context)}\n\n`);
+      try {
+        console.log(
+          JSON.stringify({ type: 'debug.qa.path', mode: 'rag', similarChunks: similarChunks.length, contextPreview: context.slice(0, 3) })
+        );
+      } catch {}
 
       messages = toSimpleMessages(
         qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt)
@@ -120,13 +139,39 @@ export const answerStream = async (
       options: llm?.options,
       meta: { userId, categoryId, postId },
     });
+    try {
+      console.log(
+        JSON.stringify({
+          type: 'debug.qa.call',
+          provider: llm?.provider || 'openai',
+          model: llm?.model || config.CHAT_MODEL,
+          messages: messages.length,
+          tools: (tools || []).length,
+          hasOptions: !!llm?.options,
+        })
+      );
+    } catch {}
 
     llmStream.on('data', (chunk) => {
+      try {
+        const str = Buffer.isBuffer(chunk) ? chunk.toString('utf8') : String(chunk);
+        console.log(
+          JSON.stringify({
+            type: 'debug.qa.chunk',
+            at: Date.now(),
+            bytes: Buffer.byteLength(str, 'utf8'),
+            preview: str.slice(0, 40).replace(/\n/g, '\\n'),
+          })
+        );
+      } catch {}
       stream.write(chunk);
     });
     llmStream.on('end', () => {
       stream.end();
     });
+    llmStream.on('error', (e) => {
+      try { console.error(JSON.stringify({ type: 'debug.qa.llmError', message: (e as any)?.message || 'error' })); } catch {}
+    });
 
   })().catch(err => {
       console.error('Stream process error:', err);
diff --git a/src/services/qa.v2.service.ts b/src/services/qa.v2.service.ts
new file mode 100644
index 0000000..8a8e651
--- /dev/null
+++ b/src/services/qa.v2.service.ts
@@ -0,0 +1,271 @@
+import { PassThrough } from 'stream';
+import { generate } from '../llm';
+import config from '../config';
+import * as qaPrompts from '../prompts/qa.prompts';
+import * as postRepository from '../repositories/post.repository';
+import * as personaRepository from '../repositories/persona.repository';
+import { generateSearchPlan } from './search-plan.service';
+import { runSemanticSearch } from './semantic-search.service';
+import { runHybridSearch } from './hybrid-search.service';
+import { createEmbeddings } from './embedding.service';
+
+const preprocessContent = (content: string): string => {
+  const plainText = content.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
+  return plainText.length > 40000 ? plainText.substring(0, 40000) : plainText;
+};
+
+const getSpeechTonePrompt = async (speechTone: number, userId: string): Promise<string> => {
+  if (speechTone === -1) return '간결하고 명확한 말투로 답변해';
+  if (speechTone === -2)
+    return '아래의 블로그 본문 컨텍스트를 참고하여 본문의 말투를 파악해 최대한 비슷한 말투로 답변해';
+
+  const persona = await personaRepository.findPersonaById(speechTone, userId);
+  if (persona) return `${persona.name}: ${persona.description}`;
+  return '간결하고 명확한 말투로 답변해';
+};
+
+type LlmOverride = {
+  provider?: 'openai' | 'gemini';
+  model?: string;
+  options?: { temperature?: number; top_p?: number; max_output_tokens?: number };
+};
+
+export const answerStreamV2 = async (
+  question: string,
+  userId: string,
+  categoryId?: number,
+  speechTone: number = -1,
+  postId?: number,
+  llm?: LlmOverride
+): Promise<PassThrough> => {
+  const stream = new PassThrough();
+
+  (async () => {
+    const speechTonePrompt = await getSpeechTonePrompt(speechTone, userId);
+
+    let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = [];
+    let tools:
+      | {
+          type: 'function';
+          function: { name: string; description?: string; parameters?: Record<string, unknown> };
+        }[]
+      | undefined = undefined;
+
+    const toSimpleMessages = (
+      raw: any[]
+    ): { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] => {
+      return (raw || []).map((m: any) => ({
+        role: m.role,
+        content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content),
+      }));
+    };
+
+    if (postId) {
+      // Post-centric path (same as v1 with added v2 pre-events)
+      const post = await postRepository.findPostById(postId);
+      if (!post) {
+        stream.write(`event: error\n`);
+        stream.write(`data: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`);
+        stream.end();
+        return;
+      }
+      if (!post.is_public && post.user_id !== userId) {
+        stream.write(`event: error\n`);
+        stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`);
+        stream.end();
+        return;
+      }
+      // Emit plan event for transparency
+      stream.write(`event: search_plan\n`);
+      stream.write(
+        `data: ${JSON.stringify({ mode: 'post', filters: { post_id: postId, user_id: userId } })}\n\n`
+      );
+
+      const processed = preprocessContent(post.content);
+      const ctx = [{ postId: post.id, postTitle: post.title }];
+      stream.write(`event: search_result\n`);
+      stream.write(`data: ${JSON.stringify(ctx)}\n\n`);
+      stream.write(`event: exist_in_post_status\n`);
+      stream.write(`data: true\n\n`);
+      stream.write(`event: context\n`);
+      stream.write(`data: ${JSON.stringify(ctx)}\n\n`);
+
+      messages = toSimpleMessages(
+        qaPrompts.createPostContextPrompt(post, processed, question, speechTonePrompt)
+      );
+    } else {
+      // Plan generation
+      const planPair = await generateSearchPlan(question, { user_id: userId, category_id: categoryId });
+      if (!planPair) {
+        // Fallback to v1 RAG silently
+        const [questionEmbedding] = await createEmbeddings([question]);
+        const similarChunks = await postRepository.findSimilarChunks(userId, questionEmbedding, categoryId);
+        const context = similarChunks.map((c) => ({ postId: c.postId, postTitle: c.postTitle }));
+        stream.write(`event: search_plan\n`);
+        stream.write(`data: ${JSON.stringify({ mode: 'rag', fallback: true })}\n\n`);
+        stream.write(`event: search_result\n`);
+        stream.write(`data: ${JSON.stringify(context)}\n\n`);
+        stream.write(`event: exist_in_post_status\n`);
+        stream.write(`data: ${JSON.stringify(similarChunks.length > 0)}\n\n`);
+        stream.write(`event: context\n`);
+        stream.write(`data: ${JSON.stringify(context)}\n\n`);
+
+        messages = toSimpleMessages(
+          qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt)
+        );
+        if (similarChunks.length === 0) {
+          tools = undefined;
+        } else {
+          tools = [
+            {
+              type: 'function',
+              function: {
+                name: 'report_content_insufficient',
+                description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출',
+                parameters: {
+                  type: 'object',
+                  properties: {
+                    text: {
+                      type: 'string',
+                      description:
+                        '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.',
+                    },
+                  },
+                  required: ['text'],
+                },
+              },
+            },
+          ];
+        }
+      } else {
+        const plan: any = planPair.normalized;
+        stream.write(`event: search_plan\n`);
+        stream.write(`data: ${JSON.stringify(plan)}\n\n`);
+        // Console debug for emitted search plan
+        try {
+          console.log(
+            JSON.stringify(
+              {
+                type: 'debug.sse.search_plan',
+                userId,
+                categoryId,
+                plan_summary: {
+                  mode: plan.mode,
+                  top_k: plan.top_k,
+                  threshold: plan.threshold,
+                  weights: plan.weights,
+                  sort: plan.sort,
+                  limit: plan.limit,
+                  hybrid: plan.hybrid,
+                  time: plan?.filters?.time || null,
+                  rewrites_len: Array.isArray(plan.rewrites) ? plan.rewrites.length : 0,
+                  keywords_len: Array.isArray(plan.keywords) ? plan.keywords.length : 0,
+                },
+              },
+              null,
+              0,
+            ),
+          );
+        } catch {}
+
+        let rows: { postId: string; postTitle: string; postChunk: string; similarityScore: number }[] = [];
+        if (plan.hybrid?.enabled) {
+          if (Array.isArray(plan.rewrites) && plan.rewrites.length > 0) {
+            stream.write(`event: rewrite\n`);
+            stream.write(`data: ${JSON.stringify(plan.rewrites)}\n\n`);
+          }
+          if (Array.isArray(plan.keywords) && plan.keywords.length > 0) {
+            stream.write(`event: keywords\n`);
+            stream.write(`data: ${JSON.stringify(plan.keywords)}\n\n`);
+          }
+          rows = await runHybridSearch(
+            question,
+            userId,
+            plan
+          );
+          const hybridContext = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle }));
+          stream.write(`event: hybrid_result\n`);
+          stream.write(`data: ${JSON.stringify(hybridContext)}\n\n`);
+
+          if (!rows.length) {
+            rows = await runSemanticSearch(question, userId, plan);
+          }
+        } else {
+          rows = await runSemanticSearch(question, userId, plan);
+        }
+
+        const context = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle }));
+        stream.write(`event: search_result\n`);
+        stream.write(`data: ${JSON.stringify(context)}\n\n`);
+        stream.write(`event: exist_in_post_status\n`);
+        stream.write(`data: ${JSON.stringify(rows.length > 0)}\n\n`);
+        stream.write(`event: context\n`);
+        stream.write(`data: ${JSON.stringify(context)}\n\n`);
+
+        messages = toSimpleMessages(
+          qaPrompts.createRagPrompt(
+            question,
+            rows.map((r) => ({
+              postId: r.postId,
+              postTitle: r.postTitle,
+              postChunk: r.postChunk,
+              similarityScore: r.similarityScore,
+            })) as any,
+            speechTonePrompt
+          )
+        );
+        // If no context was found, avoid tool-calls to force direct natural-language guidance
+        if (rows.length === 0) {
+          tools = undefined;
+        } else {
+          tools = [
+            {
+              type: 'function',
+              function: {
+                name: 'report_content_insufficient',
+                description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출',
+                parameters: {
+                  type: 'object',
+                  properties: {
+                    text: {
+                      type: 'string',
+                      description:
+                        '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.',
+                    },
+                  },
+                  required: ['text'],
+                },
+              },
+            },
+          ];
+        }
+      }
+    }
+
+    const llmStream = await generate({
+      provider: llm?.provider || 'openai',
+      model: llm?.model || config.CHAT_MODEL,
+      messages,
+      tools,
+      options: llm?.options,
+      meta: { userId, categoryId, postId },
+    });
+
+    llmStream.on('data', (chunk) => stream.write(chunk));
+    llmStream.on('end', () => stream.end());
+    llmStream.on('error', () => {
+      stream.write(`event: error\n`);
+      stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+      stream.end();
+    });
+  })().catch((err) => {
+    try {
+      console.error('v2 Stream process error:', err);
+    } catch {}
+    stream.write(`event: error\n`);
+    stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    stream.end();
+  });
+
+  return stream;
+};
diff --git a/src/services/search-plan.service.ts b/src/services/search-plan.service.ts
new file mode 100644
index 0000000..507a1c4
--- /dev/null
+++ b/src/services/search-plan.service.ts
@@ -0,0 +1,376 @@
+import OpenAI from 'openai';
+import config from '../config';
+import { buildSearchPlanPrompt, getSearchPlanSchemaJson } from '../prompts/qa.v2.prompts';
+import { planSchema, type SearchPlan } from '../types/ai.v2.types';
+import { nowUtc, toAbsoluteRangeKst } from '../utils/time';
+
+const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY });
+
+export type PlanContext = {
+  user_id: string;
+  category_id?: number;
+  post_id?: number;
+  timezone?: string; // default Asia/Seoul
+};
+
+export const generateSearchPlan = async (
+  question: string,
+  ctx: PlanContext
+): Promise<{ plan: SearchPlan; normalized: SearchPlan } | null> => {
+  const timezone = ctx.timezone || 'Asia/Seoul';
+  const now = nowUtc();
+  const nowUtcIso = now.toISOString();
+  const nowKstIso = new Date(now.getTime() + 9 * 3600 * 1000).toISOString();
+
+  const prompt = buildSearchPlanPrompt({
+    now_utc: nowUtcIso,
+    now_kst: nowKstIso,
+    timezone,
+    user_id: ctx.user_id,
+    category_id: ctx.category_id,
+    post_id: ctx.post_id,
+    question,
+  });
+
+
+  try {
+    // Debug prompt before request
+    try {
+      console.log(
+        JSON.stringify(
+          {
+            type: 'debug.plan.prompt',
+            model: 'gpt-5-mini',
+            prompt_len: prompt.length,
+            head: prompt.slice(0, 600),
+            tail: prompt.slice(Math.max(0, prompt.length - 600)),
+          },
+          null,
+          0,
+        ),
+      );
+    } catch {}
+
+    const response: any = await (openai as any).responses.create({
+      model: 'gpt-5-mini',
+      input: prompt,
+      text: { format: { type: 'json_schema', name: 'SearchPlan', schema: getSearchPlanSchemaJson()} },
+      reasoning: { effort: 'minimal' },
+      max_output_tokens: 1500,
+    });
+
+    // Debug peek: log response shapes before JSON extraction
+    try {
+      const outputs = (response as any)?.output || [];
+      const outputSummary = outputs.map((o: any) => ({
+        role: o?.role,
+        content: (o?.content || []).map((c: any) => ({
+          type: c?.type,
+          hasText: typeof c?.text === 'string',
+          textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined,
+          hasJson: !!c?.json,
+        })),
+      }));
+      const outputText = (response as any)?.output_text;
+      console.log(
+        JSON.stringify(
+          { type: 'debug.plan.response.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? outputText.length : undefined, output_summary: outputSummary },
+          null,
+          0,
+        ),
+      );
+    } catch {}
+
+    // Extract structured JSON if available, otherwise parse text
+    let parsed: any = null;
+    try {
+      const outputs = (response as any)?.output || [];
+      for (const o of outputs) {
+        for (const c of (o?.content || [])) {
+          if (c && (c.type === 'json' || c.type === 'output_json') && c.json) {
+            parsed = c.json;
+            break;
+          }
+        }
+        if (parsed) break;
+      }
+    } catch {}
+    // Also check output_text for JSON string if using Responses API response_format
+    if (!parsed && typeof (response as any)?.output_text === 'string') {
+      const s = ((response as any).output_text as string).trim();
+      if (s.startsWith('{')) {
+        try { parsed = JSON.parse(s); } catch {}
+      }
+    }
+    if (!parsed) {
+      const texts: string[] = [];
+      const ot = (response as any)?.output_text;
+      if (typeof ot === 'string') texts.push(ot);
+      try {
+        const outputs = (response as any)?.output || [];
+        for (const o of outputs) {
+          for (const c of (o?.content || [])) {
+            const t = typeof c?.text === 'string' ? c.text : undefined;
+            if (t) texts.push(t);
+          }
+        }
+      } catch {}
+      const raw = texts.join('').trim();
+      // Debug: log raw text before JSON.parse
+      try {
+        console.log(
+          JSON.stringify(
+            { type: 'debug.plan.raw_text', len: raw.length, head: raw.slice(0, 200) },
+            null,
+            0,
+          ),
+        );
+      } catch {}
+      if (!raw) {
+        // Graceful fallback: unable to parse structured output
+        return null;
+      }
+
+      // Robust extraction of first balanced JSON object
+      const tryParse = (s: string): any | null => {
+        try { return JSON.parse(s); } catch { return null; }
+      };
+      let candidate = tryParse(raw);
+      if (!candidate) {
+        const start = raw.indexOf('{');
+        if (start >= 0) {
+          let depth = 0;
+          let inStr = false;
+          let esc = false;
+          for (let i = start; i < raw.length; i++) {
+            const ch = raw[i];
+            if (inStr) {
+              if (esc) esc = false;
+              else if (ch === '\\') esc = true;
+              else if (ch === '"') inStr = false;
+            } else {
+              if (ch === '"') inStr = true;
+              else if (ch === '{') depth++;
+              else if (ch === '}') {
+                depth--;
+                if (depth === 0) {
+                  const sub = raw.slice(start, i + 1);
+                  candidate = tryParse(sub);
+                  if (candidate) break;
+                }
+              }
+            }
+          }
+          if (!candidate) {
+            const last = raw.lastIndexOf('}');
+            if (last > start) candidate = tryParse(raw.slice(start, last + 1));
+          }
+        }
+      }
+      if (!candidate) {
+        try {
+          console.warn(JSON.stringify({ type: 'debug.plan.parse_fail', note: 'could not extract JSON from raw' }));
+        } catch {}
+        return null;
+      }
+      parsed = candidate;
+    }
+
+    // If still no parsed plan at this point, try a fallback call without text.format
+    if (!parsed) {
+      try {
+        const response2: any = await (openai as any).responses.create({
+          model: config.CHAT_MODEL || 'gpt-5-mini',
+          input: prompt,
+          max_output_tokens: 700,
+        });
+        // Debug peek for fallback
+        try {
+          const outputs = (response2 as any)?.output || [];
+          const outputSummary = outputs.map((o: any) => ({
+            role: o?.role,
+            content: (o?.content || []).map((c: any) => ({ type: c?.type, hasText: typeof c?.text === 'string', textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined }))
+          }));
+          const outputText = (response2 as any)?.output_text;
+          console.log(JSON.stringify({ type: 'debug.plan.fallback.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? outputText.length : undefined, output_summary: outputSummary }));
+        } catch {}
+
+        // Parse fallback response
+        let parsed2: any = null;
+        try {
+          const outputs = (response2 as any)?.output || [];
+          for (const o of outputs) {
+            for (const c of (o?.content || [])) {
+              const t = typeof c?.text === 'string' ? c.text : undefined;
+              if (t) {
+                const s = t.trim();
+                if (s.startsWith('{')) { try { parsed2 = JSON.parse(s); } catch {} }
+                if (parsed2) break;
+              }
+            }
+            if (parsed2) break;
+          }
+        } catch {}
+        if (!parsed2) {
+          const ot = (response2 as any)?.output_text;
+          if (typeof ot === 'string') {
+            const s = ot.trim();
+            try { parsed2 = JSON.parse(s); } catch {}
+          }
+        }
+        if (!parsed2) {
+          console.warn(JSON.stringify({ type: 'debug.plan.fallback.parse_fail' }));
+          // proceed to chat completions fallback
+        }
+        if (parsed2) parsed = parsed2;
+      } catch {
+        // continue to chat completions fallback
+      }
+    }
+
+    // Final fallback: Chat Completions with JSON object mode
+    if (!parsed) {
+      try {
+        const sys = 'You output ONLY a single JSON object matching the SearchPlan shape. No extra text.';
+        const userMsg = prompt;
+        const cc: any = await (openai as any).chat.completions.create({
+          model: config.CHAT_MODEL || 'gpt-5-mini',
+          messages: [
+            { role: 'system', content: sys },
+            { role: 'user', content: userMsg },
+          ],
+          response_format: { type: 'json_object' },
+          max_tokens: 700,
+        });
+        // Debug
+        try {
+          console.log(JSON.stringify({ type: 'debug.plan.cc.peek', choices: (cc as any)?.choices?.length || 0 }));
+        } catch {}
+        const content = (cc as any)?.choices?.[0]?.message?.content || '';
+        if (typeof content === 'string' && content.trim().startsWith('{')) {
+          parsed = JSON.parse(content);
+        }
+      } catch (e) {
+        try { console.warn(JSON.stringify({ type: 'debug.plan.cc.error', message: (e as any)?.message || 'error' })); } catch {}
+        return null;
+      }
+    }
+    const plan = planSchema.parse(parsed);
+
+    // Normalize weights sum to 1
+    const sum = (plan.weights?.chunk ?? 0) + (plan.weights?.title ?? 0);
+    const weights = sum > 0 ? { chunk: plan.weights.chunk / sum, title: plan.weights.title / sum } : { chunk: 0.7, title: 0.3 };
+
+    // Normalize time range to absolute if provided
+    let normPlan: SearchPlan = { ...plan, weights };
+
+    const clamp = (n: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, n));
+
+    const stopwords = new Set([
+      '글',
+      '포스트',
+      '블로그',
+      '소개',
+      '정리',
+      '내용',
+      '최신',
+      '최근',
+      '정보',
+    ]);
+
+    const cleanList = (arr: string[] | undefined, max: number) => {
+      const uniq = new Set<string>();
+      for (const s of arr || []) {
+        const t = String(s || '').trim();
+        if (!t) continue;
+        if (t.length < 2) continue;
+        if (stopwords.has(t)) continue;
+        const key = t.toLowerCase();
+        if (uniq.has(key)) continue;
+        uniq.add(key);
+      }
+      return Array.from(uniq).slice(0, max);
+    };
+    if ((plan as any)?.filters?.time) {
+      const abs = toAbsoluteRangeKst(plan.filters?.time as any, now);
+      if (abs) {
+        normPlan = {
+          ...normPlan,
+          filters: { ...normPlan.filters, time: { type: 'absolute', from: abs.from, to: abs.to } as any },
+        };
+      } else {
+        // drop invalid time
+        const { time, ...rest } = normPlan.filters || ({} as any);
+        normPlan = { ...normPlan, filters: rest as any };
+      }
+    }
+
+    // Enforce bounds just in case
+    normPlan.top_k = Math.min(10, Math.max(1, normPlan.top_k || 5));
+    normPlan.limit = Math.min(20, Math.max(1, normPlan.limit || 5));
+    normPlan.threshold = Math.min(1, Math.max(0, normPlan.threshold ?? 0.2));
+    const maxRewrites = clamp(plan.hybrid?.max_rewrites ?? 3, 0, 4);
+    const maxKeywords = clamp(plan.hybrid?.max_keywords ?? 6, 0, 8);
+
+    // Map retrieval_bias -> alpha (fallback to provided alpha or default)
+    const bias = (plan.hybrid as any)?.retrieval_bias || 'balanced';
+    const biasAlpha = bias === 'lexical' ? 0.3 : bias === 'semantic' ? 0.75 : 0.5;
+    const alpha = clamp(((plan.hybrid as any)?.alpha ?? biasAlpha) as number, 0, 1);
+
+    normPlan.hybrid = {
+      enabled: !!plan.hybrid?.enabled,
+      retrieval_bias: bias,
+      alpha,
+      max_rewrites: maxRewrites,
+      max_keywords: maxKeywords,
+    } as any;
+    normPlan.rewrites = cleanList(plan.rewrites, maxRewrites) as any;
+    normPlan.keywords = cleanList(plan.keywords, maxKeywords) as any;
+    if (!normPlan.mode) normPlan.mode = (ctx.post_id ? 'post' : 'rag') as any;
+
+    // Note: Only filters.time is kept here to satisfy the SearchPlan schema.
+    //       user_id/category_ids/post_id will be injected later by the query layer.
+
+    // Console debug: final parsed + normalized plan
+    try {
+      const timeInfo = (normPlan as any)?.filters?.time;
+      console.log(
+        JSON.stringify(
+          {
+            type: 'debug.plan.final',
+            ctx: { user_id: ctx.user_id, category_id: ctx.category_id, post_id: ctx.post_id },
+            summary: {
+              mode: normPlan.mode,
+              top_k: normPlan.top_k,
+              threshold: normPlan.threshold,
+              weights: normPlan.weights,
+              sort: normPlan.sort,
+              limit: normPlan.limit,
+              hybrid: {
+                enabled: !!normPlan.hybrid?.enabled,
+                retrieval_bias: normPlan.hybrid?.retrieval_bias,
+                alpha: normPlan.hybrid?.alpha,
+                max_rewrites: normPlan.hybrid?.max_rewrites,
+                max_keywords: normPlan.hybrid?.max_keywords,
+              },
+              time: timeInfo ? { type: timeInfo.type, from: timeInfo.from, to: timeInfo.to } : null,
+              rewrites_len: (normPlan.rewrites || []).length,
+              keywords_len: (normPlan.keywords || []).length,
+            },
+            plan,
+            normalized: normPlan,
+          },
+          null,
+          0,
+        ),
+      );
+    } catch {}
+
+    return { plan, normalized: normPlan };
+  } catch (e) {
+    try {
+      console.error(JSON.stringify({ type: 'debug.plan.error', message: (e as any)?.message || 'error' }));
+    } catch {}
+    return null;
+  }
+};
diff --git a/src/services/semantic-search.service.ts b/src/services/semantic-search.service.ts
new file mode 100644
index 0000000..05561ac
--- /dev/null
+++ b/src/services/semantic-search.service.ts
@@ -0,0 +1,36 @@
+import { createEmbeddings } from './embedding.service';
+import * as postRepository from '../repositories/post.repository';
+import { SearchPlan } from '../types/ai.v2.types';
+
+export type SemanticSearchResult = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  similarityScore: number;
+}[];
+
+export const runSemanticSearch = async (
+  question: string,
+  userId: string,
+  plan: SearchPlan
+): Promise<SemanticSearchResult> => {
+  const [embedding] = await createEmbeddings([question]);
+
+  const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined;
+  const to = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.to : undefined;
+
+  const rows = await postRepository.findSimilarChunksV2({
+    userId,
+    embedding,
+    categoryId: (plan.filters as any)?.category_ids?.[0],
+    from,
+    to,
+    threshold: plan.threshold,
+    topK: plan.top_k,
+    weights: plan.weights,
+    sort: plan.sort,
+  });
+
+  return rows;
+};
+
diff --git a/src/types/ai.v2.types.ts b/src/types/ai.v2.types.ts
new file mode 100644
index 0000000..6e3353b
--- /dev/null
+++ b/src/types/ai.v2.types.ts
@@ -0,0 +1,106 @@
+import { z } from 'zod';
+
+// ===== Plan JSON Schema (Zod) =====
+
+// Time filter schema: support multiple shapes to reduce LLM fragility
+export const timeFilterSchema = z.discriminatedUnion('type', [
+  // Absolute ISO range
+  z
+    .object({ type: z.literal('absolute'), from: z.string(), to: z.string() })
+    .strict(),
+  // Relative window: N units up to today (KST)
+  z
+    .object({
+      type: z.literal('relative'),
+      unit: z.enum(['day', 'week', 'month', 'year']),
+      value: z.number().int().min(1).max(365),
+    })
+    .strict(),
+  // Month of a year (default year=now)
+  z
+    .object({ type: z.literal('month'), year: z.number().int().optional(), month: z.number().int().min(1).max(12) })
+    .strict(),
+  // Quarter of a year (default year=now)
+  z
+    .object({ type: z.literal('quarter'), year: z.number().int().optional(), quarter: z.number().int().min(1).max(4) })
+    .strict(),
+  // Single year
+  z.object({ type: z.literal('year'), year: z.number().int() }).strict(),
+  // Named presets (limited set)
+  z
+    .object({
+      type: z.literal('named'),
+      preset: z.enum([
+        'all_time',
+        'all',
+        'today',
+        'yesterday',
+        'last_7_days',
+        'last_14_days',
+        'last_30_days',
+        'this_month',
+        'last_month',
+      ]),
+    })
+    .strict(),
+  // Free-form label, e.g., "2006_to_now", "2024-Q3", "2019-2022", "2024-09"
+  z.object({ type: z.literal('label'), label: z.string().min(1) }).strict(),
+]);
+
+export const planSchema = z.object({
+  mode: z.enum(['rag', 'post']).default('rag'),
+  top_k: z.number().int().min(1).max(10).default(5),
+  threshold: z.number().min(0).max(1).default(0.2),
+  weights: z
+    .object({ chunk: z.number().min(0).max(1), title: z.number().min(0).max(1) })
+    .default({ chunk: 0.7, title: 0.3 }),
+  rewrites: z.array(z.string()).default([]),
+  keywords: z.array(z.string()).default([]),
+  hybrid: z
+    .object({
+      enabled: z.boolean().default(false),
+      // LLM outputs retrieval_bias label; server maps to alpha
+      retrieval_bias: z.enum(['lexical', 'balanced', 'semantic']).default('balanced'),
+      alpha: z.number().min(0).max(1).optional(),
+      max_rewrites: z.number().int().min(0).max(4).default(3),
+      max_keywords: z.number().int().min(0).max(8).default(6),
+    })
+    .default({ enabled: false, retrieval_bias: 'balanced', max_rewrites: 3, max_keywords: 6 }),
+  filters: z
+    .object({
+      time: timeFilterSchema.optional(),
+    })
+    .strict()
+    .optional(),
+  sort: z.enum(['created_at_desc', 'created_at_asc']).default('created_at_desc'),
+  limit: z.number().int().min(1).max(20).default(5),
+});
+
+export type SearchPlan = z.infer<typeof planSchema>;
+
+// ===== API: /ai/v2/ask =====
+
+export const askV2Schema = z.object({
+  body: z.object({
+    question: z.string(),
+    user_id: z.string(),
+    category_id: z.number().optional(),
+    post_id: z.number().optional(),
+    speech_tone: z.number().optional(),
+    llm: z
+      .object({
+        provider: z.enum(['openai', 'gemini']).optional(),
+        model: z.string().optional(),
+        options: z
+          .object({
+            temperature: z.number().optional(),
+            top_p: z.number().optional(),
+            max_output_tokens: z.number().optional(),
+          })
+          .optional(),
+      })
+      .optional(),
+  }),
+});
+
+export type AskV2Request = z.infer<typeof askV2Schema>['body'];
diff --git a/src/utils/time.ts b/src/utils/time.ts
new file mode 100644
index 0000000..b950068
--- /dev/null
+++ b/src/utils/time.ts
@@ -0,0 +1,211 @@
+// Minimal KST time utilities and range normalization
+
+const KST_OFFSET_MINUTES = 9 * 60; // UTC+9
+
+const toDate = (isoOrDate: string | Date): Date => (isoOrDate instanceof Date ? isoOrDate : new Date(isoOrDate));
+
+export const nowUtc = (): Date => new Date();
+
+export const toKst = (d: Date): Date => {
+  // Convert UTC date to KST by adding offset
+  return new Date(d.getTime() + KST_OFFSET_MINUTES * 60 * 1000);
+};
+
+export const fromKstToUtc = (d: Date): Date => {
+  return new Date(d.getTime() - KST_OFFSET_MINUTES * 60 * 1000);
+};
+
+const startOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 0, 0, 0, 0);
+const endOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 23, 59, 59, 999);
+
+export const startOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0, 1, 0, 0, 0, 0);
+export const endOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0 + 1, 0, 23, 59, 59, 999);
+
+export const startOfQuarter = (year: number, quarter: number): Date => {
+  const m0 = (quarter - 1) * 3; // 0-based month index
+  return new Date(year, m0, 1, 0, 0, 0, 0);
+};
+export const endOfQuarter = (year: number, quarter: number): Date => {
+  const m0 = quarter * 3 - 1; // end month index
+  return new Date(year, m0 + 1, 0, 23, 59, 59, 999);
+};
+
+export type AbsoluteRange = { from: string; to: string };
+
+export const toAbsoluteRangeKst = (input: { type: string; [k: string]: any }, base: Date = nowUtc()): AbsoluteRange | null => {
+  try {
+    const baseKst = toKst(base);
+    const year = baseKst.getFullYear();
+    // Named presets
+    if (input.type === 'named') {
+      const p = String(input.preset || '').toLowerCase();
+      const endK = endOfDay(baseKst);
+      const beginOfTodayK = startOfDay(baseKst);
+      const endUtc = fromKstToUtc(endK).toISOString();
+      const todayStartUtc = fromKstToUtc(beginOfTodayK).toISOString();
+      if (p === 'all' || p === 'all_time') return null; // no time filter
+      if (p === 'today') return { from: todayStartUtc, to: endUtc };
+      if (p === 'yesterday') {
+        const yK = new Date(beginOfTodayK.getTime());
+        yK.setDate(yK.getDate() - 1);
+        return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() };
+      }
+      const daysBack = (n: number) => {
+        const toK = endK;
+        const fromK = new Date(toK.getTime());
+        fromK.setDate(fromK.getDate() - (n - 1));
+        return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      };
+      if (p === 'last_7_days') return daysBack(7);
+      if (p === 'last_14_days') return daysBack(14);
+      if (p === 'last_30_days') return daysBack(30);
+      if (p === 'this_month') {
+        const fromK = startOfMonth(year, baseKst.getMonth());
+        const toK = endOfMonth(year, baseKst.getMonth());
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      if (p === 'last_month') {
+        const m = baseKst.getMonth();
+        const yAdj = m === 0 ? year - 1 : year;
+        const mAdj = m === 0 ? 11 : m - 1;
+        const fromK = startOfMonth(yAdj, mAdj);
+        const toK = endOfMonth(yAdj, mAdj);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      return null; // unknown named: drop filter
+    }
+    if (input.type === 'relative') {
+      const unit = String(input.unit);
+      const value = Math.max(1, parseInt(String(input.value || '1'), 10));
+      const toK = endOfDay(baseKst);
+      const fromK = new Date(toK.getTime());
+      if (unit === 'day') fromK.setDate(fromK.getDate() - value + 1);
+      else if (unit === 'week') fromK.setDate(fromK.getDate() - value * 7 + 1);
+      else if (unit === 'month') fromK.setMonth(fromK.getMonth() - value);
+      else if (unit === 'year') fromK.setFullYear(fromK.getFullYear() - value);
+      const fromUtc = fromKstToUtc(startOfDay(fromK));
+      const toUtc = fromKstToUtc(toK);
+      return { from: fromUtc.toISOString(), to: toUtc.toISOString() };
+    }
+    if (input.type === 'absolute') {
+      const from = new Date(input.from);
+      const to = new Date(input.to);
+      return { from: from.toISOString(), to: to.toISOString() };
+    }
+    if (input.type === 'month') {
+      const m = Math.max(1, Math.min(12, parseInt(String(input.month), 10)));
+      const y = input.year ? parseInt(String(input.year), 10) : year;
+      const fromK = startOfMonth(y, m - 1);
+      const toK = endOfMonth(y, m - 1);
+      return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+    }
+    if (input.type === 'year') {
+      const y = parseInt(String(input.year), 10);
+      const fromK = startOfMonth(y, 0);
+      const toK = endOfMonth(y, 11);
+      return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+    }
+    if (input.type === 'quarter') {
+      const q = Math.max(1, Math.min(4, parseInt(String(input.quarter), 10)));
+      const y = input.year ? parseInt(String(input.year), 10) : year;
+      const fromK = startOfQuarter(y, q);
+      const toK = endOfQuarter(y, q);
+      return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+    }
+    if (input.type === 'label') {
+      const raw = String(input.label || '').trim();
+      if (!raw) return null;
+      const s = raw.replace(/\s+/g, '').toLowerCase();
+      const endK = endOfDay(baseKst);
+      // Support common named tokens expressed as labels
+      const startTodayK = startOfDay(baseKst);
+      const toUtcStr = fromKstToUtc(endK).toISOString();
+      const fromTodayUtcStr = fromKstToUtc(startTodayK).toISOString();
+      const daysBack = (n: number) => {
+        const toK = endK;
+        const fromK = new Date(toK.getTime());
+        fromK.setDate(fromK.getDate() - (n - 1));
+        return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      };
+      if (s === 'all' || s === 'all_time') return null; // drop filter
+      if (s === 'today') return { from: fromTodayUtcStr, to: toUtcStr };
+      if (s === 'yesterday') {
+        const yK = new Date(startTodayK.getTime());
+        yK.setDate(yK.getDate() - 1);
+        return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() };
+      }
+      if (s === 'last_7_days') return daysBack(7);
+      if (s === 'last_14_days') return daysBack(14);
+      if (s === 'last_30_days') return daysBack(30);
+      if (s === 'this_month') {
+        const fromK = startOfMonth(year, baseKst.getMonth());
+        const toK = endOfMonth(year, baseKst.getMonth());
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      if (s === 'last_month') {
+        const m = baseKst.getMonth();
+        const yAdj = m === 0 ? year - 1 : year;
+        const mAdj = m === 0 ? 11 : m - 1;
+        const fromK = startOfMonth(yAdj, mAdj);
+        const toK = endOfMonth(yAdj, mAdj);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      // 1) YYYY_to_now / YYYY-to-now
+      let m = s.match(/^(\d{4})(?:_|-|to)+now$/);
+      if (m) {
+        const y = parseInt(m[1], 10);
+        const fromK = startOfMonth(y, 0);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(endK).toISOString() };
+      }
+      // 2) YYYY-YYYY / YYYY..YYYY / YYYY_to_YYYY
+      m = s.match(/^(\d{4})(?:\.|_|-|to){1,2}(\d{4})$/);
+      if (m) {
+        const y1 = parseInt(m[1], 10);
+        const y2 = parseInt(m[2], 10);
+        const a = Math.min(y1, y2);
+        const b = Math.max(y1, y2);
+        const fromK = startOfMonth(a, 0);
+        const toK = endOfMonth(b, 11);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      // 3) YYYY-Qn or Qn-YYYY
+      m = s.match(/^(\d{4})(?:-|_)q([1-4])$/);
+      if (m) {
+        const y = parseInt(m[1], 10);
+        const q = parseInt(m[2], 10);
+        const fromK = startOfQuarter(y, q);
+        const toK = endOfQuarter(y, q);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      m = s.match(/^q([1-4])(?:-|_)?(\d{4})$/);
+      if (m) {
+        const q = parseInt(m[1], 10);
+        const y = parseInt(m[2], 10);
+        const fromK = startOfQuarter(y, q);
+        const toK = endOfQuarter(y, q);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      // 4) YYYY-MM
+      m = s.match(/^(\d{4})(?:-|_)?(\d{1,2})$/);
+      if (m) {
+        const y = parseInt(m[1], 10);
+        const month = Math.max(1, Math.min(12, parseInt(m[2], 10)));
+        const fromK = startOfMonth(y, month - 1);
+        const toK = endOfMonth(y, month - 1);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      // 5) YYYY
+      m = s.match(/^(\d{4})$/);
+      if (m) {
+        const y = parseInt(m[1], 10);
+        const fromK = startOfMonth(y, 0);
+        const toK = endOfMonth(y, 11);
+        return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() };
+      }
+      return null; // unrecognized label
+    }
+  } catch {
+    // ignore
+  }
+  return null;
+};