Commit dfae7c3

docs(debug-inference): generalize firewall section to cover any host firewall tooling
1 parent 77de986 commit dfae7c3

1 file changed

Lines changed: 16 additions & 32 deletions

.agents/skills/debug-inference/SKILL.md

@@ -224,15 +224,15 @@ This failure commonly appears on Linux hosts that:

 - Run the OpenShell gateway in Docker
 - Route `inference.local` to a host-local OpenAI-compatible endpoint such as Ollama
-- Use UFW or another host firewall with default incoming or routed traffic denied
+- Have a host firewall or networking configuration that denies container-to-host traffic by default

 In this case, OpenShell routing is usually working correctly. The failing hop is container-to-host traffic on the backend port.

 ### Why CoreDNS Is Not the Cause

 This is not the same issue as the Colima CoreDNS fix.

-OpenShell injects `host.docker.internal` and `host.openshell.internal` into sandbox pods with `hostAliases`. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is host firewall policy, not CoreDNS.
+OpenShell injects `host.docker.internal` and `host.openshell.internal` into sandbox pods with `hostAliases`. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is host firewall or network policy, not CoreDNS.

 ### Verify the Problem

@@ -254,29 +254,27 @@ OpenShell injects `host.docker.internal` and `host.openshell.internal` into sand
 docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
 ```

-If steps 1 and 2 succeed but step 3 times out, host firewall policy is blocking the container-to-host path.
+If steps 1 and 2 succeed but step 3 times out, the host firewall or network configuration is blocking the container-to-host path.

 ### Fix

-Allow the OpenShell cluster bridge network to reach the host-local inference port.
+Allow the Docker bridge network used by the OpenShell cluster to reach the host-local inference port. The exact command depends on your firewall tooling (iptables, nftables, firewalld, UFW, etc.), but the rule should allow:

-Example narrow UFW rule:
+- **Source**: the Docker bridge subnet used by the OpenShell cluster container (commonly `172.18.0.0/16`)
+- **Destination**: the host gateway IP injected into sandbox pods for `host.docker.internal` (commonly `172.17.0.1`)
+- **Port**: the inference server port (e.g. `11434/tcp` for Ollama)

-```bash
-sudo ufw allow proto tcp \
-from 172.18.0.0/16 \
-to 172.17.0.1 \
-port 11434 \
-comment 'OpenShell local inference'
-```
+To find the actual values on your system:

-This example matches a common local layout:
+```bash
+# Docker bridge subnet for the OpenShell cluster network
+docker network inspect $(docker network ls --filter name=openshell -q) --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'

-- `172.18.0.0/16`: the Docker bridge subnet used by the OpenShell cluster container
-- `172.17.0.1`: the host gateway IP injected into sandbox pods for `host.docker.internal`
-- `11434/tcp`: the default Ollama port
+# Host gateway IP visible from inside the container
+docker exec openshell-cluster-<gateway> cat /etc/hosts | grep host.docker.internal
+```

-Adjust the source subnet, destination IP, or port if your local Docker network differs.
+Adjust the source subnet, destination IP, or port to match your local Docker network layout.

 ### Verify the Fix

@@ -294,27 +292,13 @@ Adjust the source subnet, destination IP, or port if your local Docker network d

 Both commands should return the upstream model list.

-### Check the Active Firewall Rule
-
-```bash
-sudo ufw status numbered
-```
-
-You should see a rule similar to:
-
-```
-172.17.0.1 11434/tcp ALLOW IN 172.18.0.0/16
-```
-
 ### If It Still Fails

 - Confirm the backend listens on a host-reachable address: `ss -ltnp | rg ':11434\b'`
 - Confirm the provider points at the host alias path you expect: `openshell provider get <provider-name>`
 - Confirm the active inference route: `openshell inference get`
 - Inspect sandbox logs for upstream timeout details: `openshell logs <sandbox-name> --since 10m`

-If the host uses a firewall other than UFW, apply the equivalent allow rule for traffic from the Docker bridge network to the host-local inference port.
-
 ## Common Failure Patterns

 | Symptom | Likely cause | Fix |
@@ -327,7 +311,7 @@ If the host uses a firewall other than UFW, apply the equivalent allow rule for
 | `no compatible route` | Provider type does not match request shape | Switch provider type or change the client API |
 | Direct call to external host is denied | Missing policy or provider attachment | Update `network_policies` and launch sandbox with the right provider |
 | SDK fails on empty auth token | Client requires a non-empty API key even though OpenShell injects the real one | Use any placeholder token such as `test` |
-| Upstream timeout from container to host-local backend | Host firewall (UFW or similar) blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) |
+| Upstream timeout from container to host-local backend | Host firewall or network config blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) |

 ## Full Diagnostic Dump
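
Note on the generalized Fix section: the new text names a source subnet, a destination IP, and a port, but no longer shows a concrete rule. As a rough sketch of how those three values might map onto one firewall tool, the snippet below assembles an iptables rule and prints it without applying it. The choice of iptables, the `INPUT` chain, and the helper names `host_ip_from_hosts_line` and `build_allow_rule` are assumptions made for illustration; none of them come from the commit. The example values are the "commonly" quoted ones from the diff.

```bash
#!/usr/bin/env bash
# Sketch only: prints an iptables rule instead of applying it.
# Assumptions (not from the commit): iptables is the host firewall tooling,
# and container-to-host traffic is filtered in the INPUT chain.

# Hypothetical helper: take one /etc/hosts line, such as
# "172.17.0.1 host.docker.internal", and print its first field (the IP).
host_ip_from_hosts_line() {
  awk '{print $1}' <<<"$1"
}

# Hypothetical helper: print (do not run) an allow rule for
# source subnet -> host gateway IP on a single TCP port.
build_allow_rule() {
  local subnet="$1" host_ip="$2" port="$3"
  printf 'iptables -I INPUT -s %s -d %s -p tcp --dport %s -j ACCEPT\n' \
    "$subnet" "$host_ip" "$port"
}

# Example values are the common defaults quoted in the diff; substitute
# the values discovered on your own system.
host_ip="$(host_ip_from_hosts_line '172.17.0.1 host.docker.internal')"
build_allow_rule '172.18.0.0/16' "$host_ip" 11434
```

Review the printed rule before running it with `sudo`. Hosts that use nftables, firewalld, or UFW would need the equivalent rule in that tool's own syntax.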
