Skill Review: Strong framework, needs structural hardening for consistent output

## Context

I maintain a collection of Claude Code skills for SDLC automation (story workflows, bug investigation, QA support, multi-agent consultation). I came across your skill and ran it through our multi-agent review process — a 3-agent panel (Codex CLI as Technical Reviewer, Gemini CLI as Product Owner, Claude as Architect) scoring against defined acceptance criteria. Below is our synthesis, written to be useful to you as the creator.

---

## What's genuinely good

**The core insight is sharp.** "Can end users *know* what value they'll achieve?" is a better framing than the standard "what's the value proposition?" Most product analysis tools stop at features-vs-benefits. Yours goes deeper: it asks whether the user can *articulate* the value to themselves. That's a meaningful distinction that changes how you think about adoption.

**The 4-dimension framework is well-designed.** Value Clarity, Timeline, Perception, and Discovery are orthogonal: each one asks a different question, and together they cover the full adoption journey. This is better than most product frameworks I've seen in skill form.

**The case studies are above average.** Real products, real numbers, data sources cited. The Dropbox vs Google Wave contrast is particularly effective — same era, similar ambition, opposite outcomes explained through the framework. The `references/real-cases.md` file is thorough and well-structured.

**The honest scoping is rare.** The "When This Framework Applies" section that explicitly says "less applicable for enterprise software, monopoly products, inherently delayed value" shows maturity. Most skill authors claim universality. You don't.

**The User vs End User distinction matters.** This is subtle but important — the person using the skill (PM, founder) is not the person using the product. This prevents the common trap of analyzing from the creator's perspective.

---

## Structural issues (high impact)

### 1. No structured output format — biggest gap

This is the single most impactful improvement you could make. Right now, the skill says:

> *"provide status assessment using status indicators (🔴🟡🟢) with specific description of current state"*

But there's no template for what the final output should look like. Two agents analyzing the same product would produce completely different report structures. One might give a 3-paragraph essay, another a bullet list, another a table.

**Suggestion:** Add an output template at the end of the skill:

```markdown
## Output Template

After completing all four dimensions, present the analysis in this format:

### Value Realization Analysis: [Product Name]

| Dimension | Status | Key Finding |
|-----------|--------|-------------|
| Value Clarity | 🔴/🟡/🟢 | [one-sentence assessment] |
| Value Timeline | 🔴/🟡/🟢 | [one-sentence assessment] |
| Value Perception | 🔴/🟡/🟢 | [one-sentence assessment] |
| Value Discovery | 🔴/🟡/🟢 | [one-sentence assessment] |

**Overall Assessment:** [Go / Conditional / No-Go]
**Confidence:** [High / Medium / Low]

### Critical Questions
1. [The sharpest question from the analysis]
2. [Second most important question]
3. [Third]

### Detailed Analysis
[Per-dimension analysis follows...]
```

This alone would make the skill far more useful: it gives stakeholders output that is scannable, comparable across products, and actionable.
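To show how rigid the summary table could be, here is a minimal Python sketch that renders it from structured findings. The names `Finding` and `render_summary` are hypothetical and not part of the skill; the point is only that a fixed dimension list and a closed set of statuses leave no room for structural drift between agents.

```python
from dataclasses import dataclass

# The four dimensions and three indicators named in the template above.
DIMENSIONS = ("Value Clarity", "Value Timeline", "Value Perception", "Value Discovery")
STATUSES = {"🔴", "🟡", "🟢"}


@dataclass
class Finding:
    status: str       # one of 🔴 / 🟡 / 🟢
    key_finding: str  # one-sentence assessment


def render_summary(product: str, findings: dict[str, Finding],
                   overall: str, confidence: str) -> str:
    """Render the summary table; raise if a dimension is missing or a status is invalid."""
    missing = [d for d in DIMENSIONS if d not in findings]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    lines = [
        f"### Value Realization Analysis: {product}",
        "",
        "| Dimension | Status | Key Finding |",
        "|-----------|--------|-------------|",
    ]
    for dim in DIMENSIONS:
        finding = findings[dim]
        if finding.status not in STATUSES:
            raise ValueError(f"invalid status for {dim}: {finding.status!r}")
        lines.append(f"| {dim} | {finding.status} | {finding.key_finding} |")
    lines += ["", f"**Overall Assessment:** {overall}", f"**Confidence:** {confidence}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data only; not a real analysis.
    demo = {d: Finding("🟡", "placeholder finding") for d in DIMENSIONS}
    print(render_summary("Example Product", demo, "Conditional", "Medium"))
```

The skill itself would still express this as a Markdown template; the sketch is just a way to see which parts of the output are fixed and which are free text.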

### 2. Status indicator criteria are undefined

The README says "status indicators (🔴🟡🟢)" but the SKILL.md never defines what each color means. When does Value Clarity earn a 🟢 vs 🟡? Without defined thresholds, the indicators are subjective decoration.

**Suggestion:** Add criteria per dimension:

```
Value Clarity:
  🟢 End users can explain in one sentence what they'll achieve
  🟡 End users understand the category but not the specific outcome
  🔴 End users cannot articulate what they'll achieve
```

### 3. 20KB monolith is a context budget problem

The entire SKILL.md is loaded into the agent's context window every time the skill triggers. At ~20KB, it consumes significant budget before the agent even starts working. The case studies (Dropbox, Instagram, etc.) appear both in SKILL.md *and* in `references/real-cases.md`, which means the agent gets them twice if it reads both.

**Suggestion:** Follow a reference-file pattern:
- `SKILL.md` — Framework + output template + enforcement rules (~8KB)
- `references/real-cases.md` — Case studies (loaded on demand)
- `references/scoring-rubric.md` — Status indicator criteria (loaded on demand)

The SKILL.md should reference these files with "See `references/real-cases.md` for detailed case studies" and let the agent pull them when needed, rather than embedding summaries inline.
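For a rough sense of scale, the sketch below converts the two file sizes mentioned above into approximate token counts. The 4-bytes-per-token ratio is an assumed rule of thumb for English text, not a measurement of this skill; the Chinese version will likely tokenize differently, so treat the numbers as order-of-magnitude only.

```python
def estimate_tokens(num_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate. The default ratio is an assumed heuristic, not a measured value."""
    return round(num_bytes / bytes_per_token)


KB = 1024

current = estimate_tokens(20 * KB)   # SKILL.md today (~20KB)
proposed = estimate_tokens(8 * KB)   # SKILL.md after moving content to references/ (~8KB)

print(f"current:  ~{current} tokens")             # ~5120 tokens
print(f"proposed: ~{proposed} tokens")            # ~2048 tokens
print(f"saved:    ~{current - proposed} tokens")  # ~3072 tokens
```

Running an actual tokenizer over the file would give a better number; the heuristic is only meant to show that the split roughly halves the fixed cost of invoking the skill.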

### 4. Trigger conditions are too broad

The description includes triggers like `"is this idea good?"`, `"what do you think of this product?"`, `"how about my idea?"`. These are so generic that the skill could fire on casual conversation that has nothing to do with product strategy. A user asking "is this idea good?" about a code refactoring approach could incorrectly trigger a value realization analysis.

**Suggestion:** Narrow to product-specific triggers:
```
"evaluate this product idea", "will users adopt this", "why aren't users retaining",
"analyze the value proposition", "product-market fit", "user adoption analysis"
```

---

## Enforcement gaps (medium impact)

### 5. No completion verification

The skill says "must complete analysis of all four dimensions" but there's nothing preventing an agent from doing 2 dimensions and calling it done. In our experience, agents routinely skip steps when there's no tracking mechanism. Even a simple instruction like:

```
Before presenting results, verify:
- [ ] All 4 dimensions analyzed with status indicator
- [ ] At least 1 sharp question per dimension
- [ ] Real product comparison cited (verified or marked as needing verification)
- [ ] Output follows the template format
```

...would significantly improve compliance.
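Part of that checklist can also be checked mechanically after the fact. The following is a minimal Python sketch, assuming the report uses the summary table and the `**Overall Assessment:**` / `**Confidence:**` lines from the template suggested in item 1; `completion_problems` is a hypothetical helper name, and the sharp-question and citation items are not covered because they need human or agent judgment.

```python
import re

DIMENSIONS = ("Value Clarity", "Value Timeline", "Value Perception", "Value Discovery")

# Matches a filled-in table row: | <dimension> | <exactly one indicator> | <finding> |
ROW = re.compile(
    r"^\|[ \t]*([^|\n]+?)[ \t]*\|[ \t]*(🔴|🟡|🟢)[ \t]*\|[ \t]*([^|\n]+?)[ \t]*\|[ \t]*$",
    re.MULTILINE,
)


def completion_problems(report: str) -> list[str]:
    """Return a list of structural gaps; an empty list means the basic checks pass."""
    found = {match.group(1) for match in ROW.finditer(report)}
    problems = [f"no status row for {dim}" for dim in DIMENSIONS if dim not in found]
    for label in ("**Overall Assessment:**", "**Confidence:**"):
        if label not in report:
            problems.append(f"missing {label}")
    return problems


if __name__ == "__main__":
    # An unfilled template row (🔴/🟡/🟢) deliberately does not count as analyzed.
    partial = (
        "| Value Clarity | 🟢 | placeholder finding |\n"
        "| Value Timeline | 🔴/🟡/🟢 | [one-sentence assessment] |\n"
    )
    for problem in completion_problems(partial):
        print(problem)
```

A check like this would sit outside the skill, for example in a review script, and complements rather than replaces the in-prompt checklist.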

### 6. Research methodology is guidance, not a gate

The Research Methodology section is thorough (primary vs secondary sources, verification approach, etc.) but it's advisory. An agent can skip verification entirely and present unverified claims as analysis. The instruction to "base on verifiable information" has no enforcement.

**Suggestion:** Make verification a visible step:

```
For each real product cited in the analysis:
- If data was verified via WebSearch/WebFetch: [VERIFIED: source]
- If data is from references/real-cases.md: [REFERENCE: case file]
- If data could not be verified: [UNVERIFIED: needs confirmation]
```

This makes the evidence chain visible to the user reading the output.
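Once the tags exist, they are easy to count. Here is a minimal Python sketch, assuming the three tag formats suggested above appear verbatim in the report; `evidence_summary` is a hypothetical helper, and the sample text uses placeholder claims rather than real data.

```python
import re
from collections import Counter

# Matches [VERIFIED: ...], [REFERENCE: ...], and [UNVERIFIED: ...] tags.
TAG = re.compile(r"\[(VERIFIED|REFERENCE|UNVERIFIED):\s*([^\]]+)\]")


def evidence_summary(report: str) -> Counter:
    """Count evidence tags by kind so unverified claims stand out."""
    return Counter(kind for kind, _detail in TAG.findall(report))


if __name__ == "__main__":
    sample = (
        "Claim one [VERIFIED: source A]. "
        "Claim two [REFERENCE: case file]. "
        "Claim three [UNVERIFIED: needs confirmation]."
    )
    counts = evidence_summary(sample)
    print(dict(counts))
    if counts["UNVERIFIED"]:
        print(f"{counts['UNVERIFIED']} claim(s) still need confirmation")
```

A reader, or a follow-up agent step, could use the count to decide whether the analysis is ready to share or needs another verification pass.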

---

## Minor observations

### 7. Case study applicability section is smart but buried

The "Evaluating Case Study Applicability" section (product type match, market context match, user behavior match, value delivery match) is one of the best parts of the skill. It prevents the common failure of "Duolingo uses streaks, so we should too." But it's buried deep in the Research Methodology section where agents might not reach it during analysis. Consider promoting it closer to the framework dimensions.

### 8. The Common Pitfalls section doubles as a good self-test

Pitfalls 1-4 (Assuming end users know what they want, Features instead of value, Copying patterns without context, Invisible value) are exactly the mistakes the *skill itself* should guard against. Consider turning them into a "Pre-Analysis Checklist" that the agent runs *before* starting the 4-dimension analysis.

### 9. Bilingual support is a nice touch

Having both English and Chinese versions shows thoughtfulness about the audience. Most skills are English-only.

---

## Summary

| Area | Assessment |
|------|-----------|
| Core framework | Strong — genuinely insightful, well-differentiated |
| Case studies | Above average — real data, cited sources |
| Output consistency | Weak — no template, no rubric, high agent-to-agent variance |
| Enforcement | Weak — no completion verification, no evidence tracking |
| Structure | Needs work — monolith file, redundant content, broad triggers |
| Overall | **Solid v0.5 that needs hardening to become a reliable tool** |

The intellectual work is done. The framework is sound. What's missing is the engineering that turns a good framework into a consistent, reproducible analytical tool. The gap between "this makes the agent think better" and "this produces reliable, comparable output" is where the next iteration should focus.

---



