Official organization for the research paper "One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue". The paper introduces a novel turn-level monitor that identifies the earliest turn at which a multi-turn interaction becomes sufficient for harm, providing a robust defense against state-of-the-art adaptive attackers such as the CKA-Agent.
- Code Repository: TurnGate
- Project Homepage: turn-gate.github.io
- Paper:
  - Title: One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
  - Read more on arXiv
- Defending Against State-of-the-Art Multi-Turn Attack: CKA-Agent
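
The idea of a turn-level monitor that flags the earliest turn at which a dialogue becomes sufficient for harm can be illustrated with a minimal sketch. This is not the TurnGate implementation: the function `earliest_sufficient_turn`, the `score_fn` callback, the threshold, and the keyword-based `toy_score` are all hypothetical stand-ins chosen for illustration. The only assumption taken from the description above is that the monitor looks at the accumulated dialogue, including the assistant's responses, one turn at a time.

```python
from typing import Callable, List, Optional, Tuple

# A turn is a (user message, assistant response) pair.
Turn = Tuple[str, str]


def earliest_sufficient_turn(
    turns: List[Turn],
    score_fn: Callable[[List[Turn]], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the 0-based index of the first turn whose accumulated
    history scores at or above `threshold`, or None if no turn does.

    `score_fn` is a hypothetical sufficiency scorer; a real monitor
    would presumably use a learned model here.
    """
    history: List[Turn] = []
    for index, turn in enumerate(turns):
        history.append(turn)
        if score_fn(history) >= threshold:
            return index
    return None


def toy_score(history: List[Turn]) -> float:
    """Toy scorer (assumption, for illustration only): the fraction of
    placeholder 'pieces' that have appeared in the responses so far."""
    pieces = {"piece-a", "piece-b", "piece-c"}
    text = " ".join(response for _, response in history)
    found = {piece for piece in pieces if piece in text}
    return len(found) / len(pieces)


if __name__ == "__main__":
    dialogue = [
        ("q1", "piece-a"),
        ("q2", "nothing relevant"),
        ("q3", "piece-b and piece-c"),
    ]
    print(earliest_sufficient_turn(dialogue, toy_score, threshold=1.0))
```

Scoring the whole history rather than the latest message alone is what lets the sketch catch a case where each turn looks harmless in isolation but the responses combine into something complete.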