[ENHANCEMENT] Improve prompt consistency to reduce hallucinated and inconsistent LLM outputs

## Context

While working on the extraction pipeline and experimenting with different inputs, I noticed that the current prompts sometimes lead to inconsistent responses from the LLM.

In a few cases:
- the model returns fields that were not requested
- values are inferred rather than extracted verbatim from the input
- formatting is mostly correct but varies between responses

The recent prompt improvements helped structure the output better, but there are still some edge cases where the model “over-interprets” the input.

---

## Problem

Right now, the prompts focus mainly on *what to extract* and say too little about *what NOT to do*.

Because of this, the LLM sometimes:
- hallucinates values that are not explicitly present
- adds extra information beyond the defined schema
- makes assumptions when a field is unclear instead of leaving it empty

This can create issues for downstream processing, especially when strict structure is expected.
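To make these failure modes measurable while iterating on the prompt, a small check along the following lines could be run against sample responses. This is only a sketch: `ALLOWED_FIELDS` and the function name `check_output` are hypothetical, the field names stand in for the real schema, and the substring test is a deliberately crude proxy for "explicitly present in the input".

```python
import json

# Hypothetical field names; the real ones come from the pipeline's schema.
ALLOWED_FIELDS = {"name", "email", "phone"}


def check_output(raw: str, source_text: str) -> list[str]:
    """Return a list of problems found in one raw LLM response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # Fields the prompt never asked for.
    for key in sorted(set(data) - ALLOWED_FIELDS):
        problems.append(f"unexpected field: {key}")
    # Values that do not appear verbatim in the input (possible inference).
    for key in sorted(ALLOWED_FIELDS & set(data)):
        value = data[key]
        if value is None:
            continue  # an empty field is acceptable
        if str(value) not in source_text:
            problems.append(f"value not found in input: {key}")
    return problems
```

A helper like this lives outside the pipeline, so it can be used to compare prompt variants without touching the extraction code.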

---

## Proposed Improvement

Refine the prompt instructions to make behavior more strict and predictable. Specifically:

- Add clearer constraints like:
  - “Do not infer values that are not explicitly present”
  - “Do not include fields outside the defined schema”
- Include a small negative example (showing incorrect output vs correct output)
- Make the expected output rules more explicit (strict JSON / strict field boundaries)
- Extend the few-shot example to cover edge cases (missing fields, ambiguous input)
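The points above could be combined into a prompt template roughly like this. It is a minimal sketch, not the current prompt: the field names, the example text, and the `build_prompt` helper are hypothetical, and representing an empty field as `null` is an assumption about what "leaving it empty" should mean.

```python
import json

# Hypothetical schema; replace with the fields the pipeline actually extracts.
SCHEMA_FIELDS = ["name", "email", "phone"]

# Explicit "what NOT to do" rules, including the two proposed above.
CONSTRAINTS = [
    "Do not infer values that are not explicitly present in the input.",
    "Do not include fields outside the defined schema.",
    "If a field is missing or ambiguous, set it to null.",  # assumed convention
    "Return a single JSON object and nothing else.",
]

# Edge-case few-shot example: the phone number is absent from the input.
EXAMPLE_INPUT = "Contact: Jane Doe, jane@example.com"
EXAMPLE_CORRECT = {"name": "Jane Doe", "email": "jane@example.com", "phone": None}
# Negative example: invented phone value and a field outside the schema.
EXAMPLE_INCORRECT = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "unknown",
    "company": "Example Inc.",
}


def build_prompt(text: str) -> str:
    """Assemble the extraction prompt for one input document."""
    rules = "\n".join(f"- {rule}" for rule in CONSTRAINTS)
    return (
        f"Extract the following fields: {', '.join(SCHEMA_FIELDS)}.\n\n"
        f"Rules:\n{rules}\n\n"
        f"Example input:\n{EXAMPLE_INPUT}\n\n"
        f"Correct output:\n{json.dumps(EXAMPLE_CORRECT)}\n\n"
        "Incorrect output (invented value, extra field):\n"
        f"{json.dumps(EXAMPLE_INCORRECT)}\n\n"
        f"Input:\n{text}\n\nOutput:"
    )
```

Keeping the rules and examples as data rather than one long string makes it easier to add or reword a single constraint and compare results.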

---

## Expected Outcome

- More deterministic and consistent outputs
- Reduced hallucination in extracted values
- Better alignment with the expected schema
- Cleaner integration with downstream validation logic

---

## Scope

This is a lightweight improvement focused only on prompt design.  
No changes to the extraction pipeline, schema, or architecture are required.

---

[ENHANCEMENT] Improve prompt consistency to reduce hallucinated and inconsistent LLM outputs #418
