Closed
4 changes: 4 additions & 0 deletions pom.xml
@@ -170,6 +170,10 @@
<version.jakarta.validation>3.0.2</version.jakarta.validation>
<version.jupiter>6.1.0</version.jupiter>
<version.maven-compiler-plugin>3.15.0</version.maven-compiler-plugin>
<!-- Workaround: quarkus-langchain4j-ollama:1.10.0 in the Quarkus 3.36.1 platform BOM was compiled
against Mutiny 3.1.x (Context.put(String,Object)) but quarkus-bom:3.36.1 ships Mutiny 3.2.0
which changed the signature to put(Object,Object), causing a NoSuchMethodError at runtime. -->
<version.mutiny>3.1.1</version.mutiny>
<version.nimbus.jose.jwt>10.9.1</version.nimbus.jose.jwt>
<version.org.apache.logging.log4j>2.26.0</version.org.apache.logging.log4j>
<version.org.jboss.arquillian>1.10.2.Final</version.org.jboss.arquillian>
247 changes: 247 additions & 0 deletions rts/lra/lra-ai-dashboard/README.md
@@ -0,0 +1,247 @@
# LRA AI Dashboard

A Quarkus application that puts an LLM in front of the Narayana LRA coordinator REST API.
Operators ask natural-language questions; the LLM calls coordinator endpoints as tools, correlates the results, and explains what is happening — including root causes and remediation steps — in plain English.

---

## Architecture

```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

Lines 13 and 212 use unlabeled fenced blocks (MD040). Add explicit languages (for example text) to satisfy linting and improve rendering/tooling.

Also applies to: 212-212

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rts/lra/lra-ai-dashboard/README.md` at line 13, Several fenced code blocks in
the README are unlabeled (MD040); locate the unlabeled triple-backtick blocks
(the "unlabeled fenced code blocks" occurrences) and add explicit language
identifiers (e.g., ```text or ```bash) to each opening fence—apply this to both
occurrences mentioned so linting passes and rendering/tooling improves.

Browser / curl
      │  POST /chat  {"message": "Why is LRA X stuck?"}
      ▼
LraAiChatResource          (JAX-RS endpoint on port 8082)
      │  assistant.chat(message)
      ▼
LraAssistant               (LangChain4j AI Service interface)
      │  System prompt encodes the full LRA state machine and recovery protocol
      │  LangChain4j dispatches tool calls to LraTools as needed
      ▼
LLM (Ollama / OpenAI)
      │  selects and calls one or more tools
      ▼
LraTools                   (@ApplicationScoped CDI bean)
      │  java.net.http.HttpClient — plain HTTP GET requests
Comment on lines +25 to +26


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the HTTP method description.

Line 28 says "plain HTTP GET requests" but LraTools actually uses multiple HTTP methods: GET (read tools), PUT (close/cancel), POST (start), and DELETE (deleteFailedLRA). This description should reflect the full range of HTTP operations.

📝 Suggested fix
-LraTools                   (`@ApplicationScoped` CDI bean)
-      │  java.net.http.HttpClient — plain HTTP GET requests
+LraTools                   (`@ApplicationScoped` CDI bean)
+      │  java.net.http.HttpClient — HTTP requests (GET, PUT, POST, DELETE)
       ▼
 LRA Coordinator            (http://localhost:8080/lra-coordinator)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rts/lra/lra-ai-dashboard/README.md` around lines 28 - 29, Update the HTTP
method description that currently reads "plain HTTP GET requests" to accurately
reflect all HTTP operations used by LraTools: GET for reading tools, POST for
start (e.g., start), PUT for close/cancel actions (e.g., close/cancel), and
DELETE for deleteFailedLRA; edit the LraTools README entry (the line referencing
java.net.http.HttpClient) to list these methods and their corresponding actions
so readers can see the full range of operations.

      ▼
LRA Coordinator            (http://localhost:8080/lra-coordinator)
```

Tool calls (read or write) execute synchronously before the LLM begins streaming its text
response. The LLM may call multiple tools in sequence, correlating results across calls to
diagnose multi-participant failure cascades that are not apparent from any single API call.
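
The sequencing described above can be pictured as a loop: the model either asks for a tool or gives its final answer, and each tool call finishes before the model is consulted again. The following is only a rough sketch of that idea in plain Java. LangChain4j runs its own loop internally, and every name here (`ToolLoopSketch`, `Turn`, `run`, the `maxSteps` guard, the canned tool results) is hypothetical and not taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only; LangChain4j implements its own loop internally. */
final class ToolLoopSketch {

    /** One model turn: a tool request, or a final answer when finalAnswer is non-null. */
    record Turn(String toolName, String toolArg, String finalAnswer) {
        static Turn call(String toolName, String toolArg) { return new Turn(toolName, toolArg, null); }
        static Turn answer(String text) { return new Turn(null, null, text); }
    }

    /** Runs tool calls one at a time until the model produces a final answer. */
    static String run(Function<List<String>, Turn> model,
                      Map<String, Function<String, String>> tools,
                      int maxSteps) {
        List<String> transcript = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) {
            Turn turn = model.apply(List.copyOf(transcript));
            if (turn.finalAnswer() != null) {
                return turn.finalAnswer();
            }
            Function<String, String> tool = tools.get(turn.toolName());
            // Each call completes before the model is consulted again.
            String result = (tool == null) ? "unknown tool" : tool.apply(turn.toolArg());
            transcript.add(turn.toolName() + "(" + turn.toolArg() + ") -> " + result);
        }
        throw new IllegalStateException("no final answer after " + maxSteps + " steps");
    }

    public static void main(String[] args) {
        // Canned results standing in for coordinator responses; not real output.
        Map<String, Function<String, String>> tools = Map.of(
                "listLRAsByStatus", status -> "[\"lra-1\"]",
                "getLRADetails", id -> "{\"participants\": []}");
        // A scripted stand-in for the LLM: two tool calls, then an answer.
        Function<List<String>, Turn> model = seen -> switch (seen.size()) {
            case 0 -> Turn.call("listLRAsByStatus", "FailedToCancel");
            case 1 -> Turn.call("getLRADetails", "lra-1");
            default -> Turn.answer("answer based on " + seen.size() + " tool results");
        };
        System.out.println(run(model, tools, 5));
    }
}
```

The step limit is a design choice of this sketch: it keeps a model that never stops requesting tools from looping forever.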

---

## Prerequisites

| Component | Version | Notes |
|-----------|---------|-------|
| Java | 17+ | |
| Maven | 3.9+ | |
| LRA Coordinator | any | `quay.io/jbosstm/lra-coordinator:latest`, running on `localhost:8080` by default |
| Ollama | any | Running on `localhost:11434` by default |
| llama3.1 (or compatible) | — | Must support tool/function calling |

### Install and start Ollama

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server (if not already running as a service)
ollama serve &

# Pull the configured model
ollama pull llama3.1

```

> **Tool-calling requirement:** The LLM must support Ollama's tool-calling API.
> `llama3.1` (the default), `llama3.2`, `mistral-nemo`, and `qwen2.5` all work.
> `llama3` (without `.1`) does **not** support tool calling and will return HTTP 400.
> To switch model: change `quarkus.langchain4j.ollama.chat-model.model-id` in
> `application.properties` and run `ollama pull <model-name>`.

---

## Quick start

### Step 1 — Start the LRA coordinator

```bash
podman run --network host quay.io/jbosstm/lra-coordinator:latest

# Confirm it is running (should return a JSON array)
curl http://localhost:8080/lra-coordinator
```

### Step 2 — Start Ollama

```bash
ollama serve & # no-op if already running as a service
```

### Step 3 — Start the AI dashboard

```bash
cd rts/lra/lra-ai-dashboard
mvn quarkus:dev
```

Open **http://localhost:8082** for the browser chat UI, or use curl:

```bash
curl -s -X POST http://localhost:8082/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Are there any stuck transactions?"}' | jq .
```
Comment on lines +97 to +101


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify jq usage with plain-text streaming response.

The curl example pipes the response to jq, but based on the implementation in LraAiChatResource.java, the endpoint returns @Produces(MediaType.TEXT_PLAIN) with Multi<String> streaming, not JSON. The jq command will likely fail or produce unexpected output. Consider removing | jq . or clarifying that the response is plain text.

📝 Suggested fix
 curl -s -X POST http://localhost:8082/chat \
   -H "Content-Type: application/json" \
-  -d '{"message": "Are there any stuck transactions?"}' | jq .
+  -d '{"message": "Are there any stuck transactions?"}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rts/lra/lra-ai-dashboard/README.md` around lines 96 - 100, The README's curl
example pipes output to jq but the chat endpoint implemented in
LraAiChatResource.java produces plain-text streaming (annotated with
`@Produces`(MediaType.TEXT_PLAIN) and returns a Multi<String>), so jq is
inappropriate; update the example to remove "| jq ." or add a note that the
endpoint streams plain text (not JSON) and show using curl alone or redirecting
to a file for streaming output to match the LraAiChatResource.java behavior.


---

## Configuration

All settings are in `src/main/resources/application.properties`.

| Property | Default | Purpose |
|----------|---------|---------|
| `quarkus.http.port` | `8082` | Avoids conflict with coordinator on 8080 |
| `lra.coordinator.url` | `http://localhost:8080/lra-coordinator` | Injected into `LraTools` as the base URL |
| `quarkus.langchain4j.ollama.chat-model.model-id` | `llama3.1` | Ollama model name (must support tool calling) |
| `quarkus.langchain4j.ollama.base-url` | `http://localhost:11434` | Ollama server URL |
| `quarkus.langchain4j.ollama.timeout` | `120s` | Generous timeout for local inference |

To override at startup without editing the file:

```bash
mvn quarkus:dev \
  -Dlra.coordinator.url=http://coordinator-host:8080/lra-coordinator \
  -Dquarkus.langchain4j.ollama.chat-model.model-id=mistral-nemo
```
```
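
The `-D` overrides work because Quarkus configuration gives system properties a higher priority than environment variables, which in turn rank above `application.properties`. The sketch below imitates that lookup order in plain Java to make the precedence concrete. It is not how Quarkus resolves configuration internally, and `ConfigPrecedenceSketch`, `envName`, and `resolve` are hypothetical names used only for illustration.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of config precedence; not the Quarkus implementation. */
final class ConfigPrecedenceSketch {

    /** MicroProfile-style env mapping: non-alphanumerics become '_', then upper-case. */
    static String envName(String property) {
        return property.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    /** The highest-priority source that defines the property wins. */
    static String resolve(String property,
                          Map<String, String> systemProps,
                          Map<String, String> env,
                          String fileValue) {
        if (systemProps.containsKey(property)) {
            return systemProps.get(property);           // -Dname=value
        }
        String fromEnv = env.get(envName(property));    // NAME=value in the environment
        return fromEnv != null ? fromEnv : fileValue;   // application.properties value
    }

    public static void main(String[] args) {
        String key = "lra.coordinator.url";
        String fileValue = "http://localhost:8080/lra-coordinator";
        // coordinator-host is the placeholder host from the command above.
        Map<String, String> sys = Map.of(key, "http://coordinator-host:8080/lra-coordinator");
        System.out.println(envName(key));
        System.out.println(resolve(key, sys, Map.of(), fileValue));
        System.out.println(resolve(key, Map.of(), Map.of(), fileValue));
    }
}
```

Under the same mapping, exporting `LRA_COORDINATOR_URL` should be an alternative to `-Dlra.coordinator.url` when a command-line flag is inconvenient.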

---

## Switching to OpenAI

For cloud deployments where Ollama is unavailable:

1. In `pom.xml`, swap the commented/active LangChain4j dependency:

```xml
<!-- comment out: -->
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>

<!-- uncomment: -->
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
</dependency>
```

2. In `application.properties`, comment out the Ollama block and uncomment the OpenAI block.

3. Export your key and start:

```bash
export LRA_AI_API_KEY=sk-...
mvn quarkus:dev
```

---

## Source files

### `LraTools.java`

Six `@Tool`-annotated methods that form the saga-domain tool schema described in the patent.
Each method makes a single blocking HTTP GET to the coordinator and returns the raw JSON response
for the LLM to reason over.
Comment on lines +162 to +164


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix tool inventory and HTTP-method description to match implementation.

Line 164 says “Six @Tool-annotated methods” and Line 165 says “single blocking HTTP GET,” but the table already lists more methods and LraTools includes mutating PUT/POST/DELETE operations too (including deleteFailedLRA). Please update this section to reflect the actual full tool surface and method types.

Also applies to: 168-179

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rts/lra/lra-ai-dashboard/README.md` around lines 164 - 166, The README text
incorrectly states "Six `@Tool`-annotated methods" and "single blocking HTTP
GET" but the actual tool surface (LraTools) contains more than six `@Tool`
methods and includes mutating HTTP methods (PUT/POST/DELETE), e.g.
deleteFailedLRA; update the paragraph and the following lines (168–179) to
enumerate or generically describe the full tool inventory and accurately state
that methods use a mix of HTTP verbs (GET, POST, PUT, DELETE) and include
mutating operations, and ensure `@Tool` usage and examples match the
implementation in LraTools and specifically mention deleteFailedLRA as a
DELETE/mutating operation.


| Method | Coordinator endpoint | When the LLM calls it |
|--------|---------------------|----------------------|
| `listAllLRAs()` | `GET /lra-coordinator/` | Overview of all transactions |
| `listLRAsByStatus(status)` | `GET /lra-coordinator/?Status=X` | Narrow focus to a specific state |
| `getLRADetails(lraId)` | `GET {lraId}` | Full participant breakdown for one LRA |
| `getLRAStatus(lraId)` | `GET {lraId}/status` | Cheap status-only check |
| `listRecoveringLRAs()` | `GET /lra-coordinator/recovery` | Confirm auto-recovery is running |
| `listFailedLRAs()` | `GET /lra-coordinator/recovery/failed` | Find transactions needing manual action |
| `closeLRA(lraId)` | `PUT {lraId}/close` | Operator-requested completion |
| `cancelLRA(lraId)` | `PUT {lraId}/cancel` | Operator-requested compensation |
| `startLRA(clientId, timeLimitMs)` | `POST /lra-coordinator/start` | Operator-requested new transaction |

The write tools (`closeLRA`, `cancelLRA`) are guarded in the system prompt: the LLM will
only call them when the operator explicitly requests it and will echo the target LRA ID
before acting.
Comment on lines +178 to +180


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Document all write tools in the guardrail note.

The guardrail paragraph only mentions closeLRA and cancelLRA, but startLRA and deleteFailedLRA are also state-changing tools. Line 180 should include all mutating tools so operator expectations match runtime behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rts/lra/lra-ai-dashboard/README.md` around lines 180 - 182, Update the
guardrail paragraph to list all mutating/write tools so it matches runtime
behavior: include startLRA and deleteFailedLRA alongside closeLRA and cancelLRA
in the sentence that describes write tools being guarded; ensure the wording
still states that the LLM will only call these tools when the operator
explicitly requests it and that the LLM will echo the target LRA ID before
acting (refer to symbols startLRA, deleteFailedLRA, closeLRA, cancelLRA).


The LRA ID is the full resource URI (e.g. `http://localhost:8080/lra-coordinator/0_ffff...`),
so `getLRADetails` and `getLRAStatus` are plain GETs on that URI with no path manipulation.
`java.net.http.HttpClient` is used directly to avoid classpath conflicts with the `lra-client`
module's RESTEasy dependency.
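
Because the LRA ID already is the resource URI, building the coordinator requests needs little more than a fixed suffix. The following is a minimal sketch of that idea with `java.net.http`, not the actual `LraTools` code: the class name `LraRequestSketch`, the 10-second timeout, and the sample ID `0_example` are assumptions made for illustration. The `PUT {lraId}/close` shape is taken from the table above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper; the real LraTools may be structured differently. */
final class LraRequestSketch {

    // Assumed timeout; not taken from the project.
    private static final Duration TIMEOUT = Duration.ofSeconds(10);

    /** GET on the LRA ID itself: the ID is used as the request URI unchanged. */
    static HttpRequest details(String lraId) {
        return HttpRequest.newBuilder(URI.create(lraId))
                .timeout(TIMEOUT)
                .GET()
                .build();
    }

    /** GET {lraId}/status: only a fixed suffix is appended. */
    static HttpRequest status(String lraId) {
        return HttpRequest.newBuilder(URI.create(lraId + "/status"))
                .timeout(TIMEOUT)
                .GET()
                .build();
    }

    /** PUT {lraId}/close with an empty body, as listed in the tool table. */
    static HttpRequest close(String lraId) {
        return HttpRequest.newBuilder(URI.create(lraId + "/close"))
                .timeout(TIMEOUT)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Placeholder ID; real IDs are issued by the coordinator.
        String lraId = "http://localhost:8080/lra-coordinator/0_example";
        for (HttpRequest request : new HttpRequest[] {details(lraId), status(lraId), close(lraId)}) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

The sketch only builds requests. Sending them would go through `HttpClient.send` with a body handler such as `HttpResponse.BodyHandlers.ofString()`.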

### `LraAssistant.java`

A LangChain4j AI Service interface. The `@SystemMessage` annotation encodes:
- All LRA lifecycle states and their transitions
- All participant status values and what each means
- The recovery protocol (automatic retry, when manual intervention is needed)
- Nested LRA failure propagation
- How to approach diagnosis (gather data first, then correlate and explain)

LangChain4j binds this interface to the configured LLM at startup and injects `LraTools`
as the tool provider. The result is a CDI bean injectable anywhere in the application.

### `LraAiChatResource.java`

A single `POST /chat` endpoint.
No `@Blocking` is needed: `Multi<String>` is handled natively by RESTEasy Reactive without
occupying the I/O thread.
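
A Java client can consume the streamed reply line by line instead of waiting for the whole body. The sketch below assumes the response is plain text rather than JSON, as the review comment on the curl example states. `ChatClientSketch`, `chatRequest`, and `escape` are hypothetical names, and the JSON escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.stream.Stream;

/** Hypothetical client sketch; not part of the project. */
final class ChatClientSketch {

    /** Builds POST {baseUrl}/chat with a {"message": ...} JSON body. */
    static HttpRequest chatRequest(String baseUrl, String message) {
        String json = "{\"message\": \"" + escape(message) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Minimal escaping for backslashes, quotes, and newlines only. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = chatRequest(
                args.length > 0 ? args[0] : "http://localhost:8082",
                "Are there any stuck transactions?");
        if (args.length == 0) {
            // Dry run: print the request without touching the network.
            System.out.println(request.method() + " " + request.uri());
            return;
        }
        HttpResponse<Stream<String>> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofLines());
        // Lines are delivered as they arrive rather than after the full response.
        try (Stream<String> lines = response.body()) {
            lines.forEach(System.out::println);
        }
    }
}
```

Running it with a base URL argument such as `http://localhost:8082` sends the request. Note that `BodyHandlers.ofLines()` emits text only when a line terminator arrives, so output may appear in larger chunks than the server streams.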

---

## Example queries

```
Show me all active LRAs.

How many LRAs are currently in each state?

Are there any failed or stuck transactions?

Why is LRA http://localhost:8080/lra-coordinator/0_ffff7f000001_... stuck?

Is the recovery coordinator doing anything right now?

Which LRAs need manual intervention?
```

### Multi-step reasoning example

**Query:** *"Is there anything wrong right now?"*

A typical LLM reasoning chain:
1. Calls `listAllLRAs()` → spots several in `FailedToCancel`
2. Calls `listLRAsByStatus("FailedToCancel")` → gets IDs
3. Calls `getLRADetails(id)` for each → finds one participant with `FailedToCompensate`
4. Calls `listRecoveringLRAs()` → confirms that LRA is in the recovery queue
5. **Response:** *"There are 3 LRAs in FailedToCancel state. LRA `0_ffff...` has been
stuck since participant `https://payment-service/compensate` returned FailedToCompensate.
The recovery coordinator is actively retrying it. If the payment service is still down
you will need to restore it and wait for the next retry cycle (approx. 2 minutes),
or use the recovery coordinator API to force a terminal state manually."*

---

## Extending the PoC

| Extension | What to add |
|-----------|-------------|
| **forceRecovery tool** | `PUT /lra-coordinator/recovery/{id}` to force a stuck participant to a terminal state |
| **Proactive alerts** | Quarkus `@Scheduled` job calls `listFailedLRAs()` periodically; LLM generates alert if count > threshold |
| **Chat memory** | Add `@MemoryId` parameter for multi-turn operator sessions |
| **Multi-coordinator** | Aggregate state from all cluster nodes (see HA patent) for a cluster-wide view |
92 changes: 92 additions & 0 deletions rts/lra/lra-ai-dashboard/pom.xml
@@ -0,0 +1,92 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.jboss.narayana.quickstart.rts.lra</groupId>
        <artifactId>lra-quickstarts</artifactId>
        <version>7.3.5.Final-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <artifactId>lra-ai-dashboard</artifactId>
    <packaging>jar</packaging>
    <name>LRA AI Dashboard</name>
    <description>LLM-Tool-Assisted Distributed Saga Coordination — chat-driven LRA operator dashboard</description>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>io.quarkus</groupId>
                <artifactId>quarkus-bom</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>io.quarkus.platform</groupId>
                <artifactId>quarkus-langchain4j-bom</artifactId>
                <version>${quarkus.platform.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <!-- Pin Mutiny to 3.1.1: quarkus-langchain4j-ollama:1.10.0 was compiled against the
                 String-key Context.put API that was reverted to Object-key in Mutiny 3.2.0. -->
            <dependency>
                <groupId>io.smallrye.reactive</groupId>
                <artifactId>mutiny</artifactId>
                <version>${version.mutiny}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <!-- JAX-RS (RESTEasy Reactive) -->
        <dependency>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-rest</artifactId>
        </dependency>
        <dependency>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-rest-jackson</artifactId>
        </dependency>
        <!-- LangChain4j with Ollama backend (local / air-gapped mode) -->
        <dependency>
            <groupId>io.quarkiverse.langchain4j</groupId>
            <artifactId>quarkus-langchain4j-ollama</artifactId>
        </dependency>
        <!-- Uncomment to use OpenAI instead of Ollama (cloud mode):
        <dependency>
            <groupId>io.quarkiverse.langchain4j</groupId>
            <artifactId>quarkus-langchain4j-openai</artifactId>
        </dependency>
        -->
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>io.quarkus</groupId>
                <artifactId>quarkus-maven-plugin</artifactId>
                <version>${quarkus.platform.version}</version>
                <extensions>true</extensions>
                <executions>
                    <execution>
                        <goals>
                            <goal>build</goal>
                            <goal>generate-code</goal>
                            <goal>generate-code-tests</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${version.maven-compiler-plugin}</version>
            </plugin>
        </plugins>
    </build>
</project>