Backend services with network.connect: true can't resolve each other via service discovery in same env #6094

@dhruv1707

Description:

I’m having trouble getting service discovery / Service Connect working between two backend services in the same Copilot application + environment.

Both services:

  • Are type: Backend Service
  • Have network.connect: true
  • Are deployed to the same env (test) of the same app (visionai)
  • Run in the same VPC & private subnets
  • Show as healthy in ECS / copilot svc status

However, from one service (final-summary) I cannot resolve the other service (chroma-db) via DNS at all, even though:

  • The environment’s service discovery endpoint is test.visionai.local
  • There is a Cloud Map service named chroma-db with Type: DNS_HTTP in the same namespace
  • The chroma-db ECS service reports healthy tasks

The failure mode is:

  • Inside final-summary’s ECS task, curl http://chroma-db:8000/... → Could not resolve host: chroma-db
  • getent hosts chroma-db → no output
  • getent hosts chroma-db.test.visionai.local → no output
  • chromadb.HttpClient(host="chroma-db.test.visionai.local", port=8000) → httpx.ConnectError: [Errno -2] Name or service not known
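
For anyone reproducing this, the `getent` checks above can be sketched in Python so they run from the same interpreter as the application. This is a minimal sketch and not part of the original report: `resolve_all` is a hypothetical helper name, and the `resolver` parameter exists only so the lookup can be swapped out in tests.

```python
import socket


def resolve_all(hostnames, port=8000, resolver=socket.getaddrinfo):
    """Return {hostname: sorted list of IPs, or an 'unresolved: ...' string}."""
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            # Same failure class as "[Errno -2] Name or service not known".
            results[name] = f"unresolved: {exc}"
    return results


# Example call from inside the final-summary task:
#   resolve_all(["chroma-db", "chroma-db.test.visionai.local"])
```

If both names come back as `unresolved`, the failure is in name resolution itself and not in the Chroma client or the HTTP layer.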

So it looks like no DNS record is being created for the backend service, even though the service itself is running fine.

Details:

copilot version: v1.34.1

Environment / setup

Region: ca-central-1

App: visionai

Env: test

Env service discovery endpoint inside tasks:
test.visionai.local

Service manifests

copilot/chroma-db/manifest.yml

name: chroma-db
type: Backend Service

image:
  build:
    context: .
    dockerfile: chroma_db/Dockerfile
    port: 8000

cpu: 1024
memory: 4096
platform: linux/x86_64
count: 2
exec: true

network:
  connect: true # Enable Service Connect for intra-environment traffic between services.

storage:
  volumes:
    chromaData:
      efs: true
      path: /chroma-data
      read_only: false

environments:
  test:
    network:
      vpc:
        placement: "private"

chroma_db/Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN mkdir -p /chroma-data && chmod 777 /chroma-data

RUN apt-get update && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir chromadb

EXPOSE 8000

CMD ["sh", "-c", "chroma run --host 0.0.0.0 --port 8000 --path /chroma-data"]

The chroma-db ECS service shows tasks as RUNNING/healthy.

copilot/final-summary/manifest.yml

name: final-summary
type: Backend Service

image:
  build:
    context: .
    dockerfile: final_summary/Dockerfile

cpu: 1024
memory: 8192
platform: linux/x86_64
count: 1
exec: true

network:
  connect: true

variables:
  PORT: 8000

environments:
  test:
    network:
      vpc:
        placement: "private"

Inside the final-summary container, I attempt to connect to Chroma via:

import chromadb

client = chromadb.HttpClient(
    host="chroma-db",              # also tried chroma-db.test.visionai.local
    port=8000,
)
client.heartbeat()

and get httpx.ConnectError: [Errno -2] Name or service not known.

Observed result:

From inside the final-summary ECS task:

getent hosts chroma-db
# (no output)

getent hosts chroma-db.test.visionai.local
# (no output)

curl http://chroma-db:8000/api/v1/heartbeat
# curl: (6) Could not resolve host: chroma-db

aws servicediscovery list-services \
  --region ca-central-1 \
  --query 'Services[?Name==`chroma-db`].[Id,Name]' \
  --output table
# -> shows a service named chroma-db with Type: DNS_HTTP

aws servicediscovery get-service \
  --region ca-central-1 \
  --id <chroma-db-service-id> \
  --query 'Service.NamespaceId'

aws servicediscovery get-namespace \
  --region ca-central-1 \
  --id <namespace-id> \
  --query 'Namespace.Name'
# -> "test.visionai.local"

aws servicediscovery list-instances \
  --region ca-central-1 \
  --service-id <chroma-db-service-id>
# -> []
# (no instances returned, even though ECS shows chroma-db tasks as RUNNING/healthy)

So from final-summary:

  • chroma-db and chroma-db.test.visionai.local do not resolve via DNS.
  • Cloud Map has a chroma-db service in test.visionai.local, but list-instances is empty.
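
The Cloud Map checks above can also be scripted. The following is a rough sketch, assuming a boto3 `servicediscovery` client is passed in; `count_registered_instances` is a hypothetical helper, and it only counts the first page of `list_instances`, which is enough to tell "empty" from "non-empty".

```python
def count_registered_instances(client, service_name):
    """Map each Cloud Map service ID named `service_name` to its instance count.

    `client` is assumed to behave like boto3.client("servicediscovery").
    """
    counts = {}
    token = None
    while True:
        # Follow NextToken so services beyond the first page are not missed.
        kwargs = {"NextToken": token} if token else {}
        page = client.list_services(**kwargs)
        for svc in page.get("Services", []):
            if svc["Name"] == service_name:
                instances = client.list_instances(ServiceId=svc["Id"])
                counts[svc["Id"]] = len(instances.get("Instances", []))
        token = page.get("NextToken")
        if not token:
            return counts


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   sd = boto3.client("servicediscovery", region_name="ca-central-1")
#   count_registered_instances(sd, "chroma-db")
```

A count of 0 for the chroma-db service would match the empty `list-instances` output shown above.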

Expected result:

Given that both services:

  • Are type: Backend Service
  • Have network.connect: true
  • Are in the same app/env (visionai / test)
  • Are in the same VPC and private subnets,

I expected that from the final-summary task I would be able to:

  • Resolve chroma-db or chroma-db.test.visionai.local via DNS, and
  • Successfully call http://chroma-db:8000/... (or the appropriate FQDN) and have the request routed to the chroma-db backend service.
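
The expected behaviour can be expressed as a small probe that tries each candidate hostname and reports the first one that answers. This is only a sketch under stated assumptions: `first_reachable` and `fetch_status` are hypothetical helpers, and the `/api/v1/heartbeat` path is taken from the curl command earlier in this report.

```python
import urllib.request


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def first_reachable(hosts, port=8000, path="/api/v1/heartbeat", fetch=fetch_status):
    """Return (first URL answering 200, or None; {failed URL: reason})."""
    errors = {}
    for host in hosts:
        url = f"http://{host}:{port}{path}"
        try:
            status = fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError are OSError subclasses, so DNS
            # failures, refused connections and HTTP errors all land here.
            errors[url] = repr(exc)
            continue
        if status == 200:
            return url, errors
        errors[url] = f"HTTP {status}"
    return None, errors


# Example call from inside the final-summary task:
#   first_reachable(["chroma-db", "chroma-db.test.visionai.local"])
```

With working service discovery, the first element of the result would be a URL; in the situation described in this issue, it is `None` and both URLs show a name resolution error.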

Debugging:

All code cells above :)

Labels: type/bug