
Improve environment configuration strategy for better maintainability and safety #217

@eberrigan

Overview

This issue proposes optional improvements to the environment configuration strategy to make the codebase more maintainable, safer, and less brittle when adding new environments.

Priority: Low (nice-to-have improvements)
Scope: Code quality, developer experience, safety

Current Issues

1. Hardcoded Environment Lists

Problem: Adding new environments requires finding and updating multiple hardcoded lists.

Current code:

# main.py:759
if ENVIRONMENT not in ["prod", "test"]:  # Easy to forget to update
    ...  # anything not listed falls through to dev behavior

# Multiple workflow files with similar checks
if [ "${{ inputs.environment }}" == "prod" ]; then
  : # prod-specific steps
elif [ "${{ inputs.environment }}" == "test" ]; then
  : # test-specific steps
fi

Risk: When ci-test was added, it fell through to dev behavior because the list wasn't updated (see #216).

2. Unsafe Default Environment

Problem: ENVIRONMENT defaults to "prod" if not set.

Current code:

# main.py:64
ENVIRONMENT = os.getenv("ENVIRONMENT", "prod")

Risk: If someone forgets to set the environment variable, they could accidentally modify production resources.

3. No Environment Validation

Problem: Typos in environment names fail silently or cause unexpected behavior.

Example:

export ENVIRONMENT=pruduction  # Typo!
# Falls through to dev behavior (local state)

Risk: Silent failures and resources created in the wrong environment.

4. Backend File Manipulation at Runtime

Problem: Deleting backend.tf at runtime is fragile.

Current code:

# main.py:760
(TERRAFORM_DIR / "backend.tf").unlink(missing_ok=True)

Risk: If the process crashes during initialization, the Terraform directory can be left in an inconsistent state.

5. Generic Lock Table Name

Problem: DynamoDB table name "lock-table" could conflict with other projects.

Current:

dynamodb_table = "lock-table"

Risk: If multiple projects share an AWS account, this can lead to table name conflicts or unintended lock sharing.

6. Duplicated Logic Across Workflows

Problem: Environment decision logic is duplicated in multiple places in lablink-images.yml.

Locations:

  • Lines 69-98: Dockerfile selection
  • Lines 104-113: Environment suffix
  • Lines 219-225, 260, 271, 309-316, 352, 363: Verification conditionals

Risk: Easy to update one place but miss others (inconsistent behavior).

Proposed Improvements

Priority 1: Centralize Environment Configuration

Create packages/allocator/src/lablink_allocator_service/conf/environments.yaml:

environments:
  dev:
    backend_type: local
    description: "Local development"

  test:
    backend_type: s3
    description: "Automated staging environment"

  ci-test:
    backend_type: s3
    description: "Manual CI testing environment"

  prod:
    backend_type: s3
    description: "Production environment"

Benefits:

  • Single source of truth for valid environments
  • Easy to add new environments
  • Self-documenting

Usage in code:

# Assumes a conf/environments.py module that loads environments.yaml
from .conf.environments import load_environment_config, VALID_ENVIRONMENTS

if ENVIRONMENT not in VALID_ENVIRONMENTS:
    raise ValueError(f"Invalid environment: {ENVIRONMENT}")

env_config = load_environment_config(ENVIRONMENT)
if env_config.backend_type == "s3":
    ...  # Use S3 backend
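
The usage above imports two names that do not exist yet. The following is a minimal sketch of what such a loader module might look like, not the project's implementation. EnvironmentConfig, parse_environments, and ALLOWED_BACKENDS are hypothetical names, and the RAW dict stands in for the mapping that a YAML parser (for example yaml.safe_load from PyYAML) would return for the environments.yaml file shown earlier.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

# Assumption: only the two backend types that appear in environments.yaml.
ALLOWED_BACKENDS = {"local", "s3"}


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical container for one entry of environments.yaml."""
    name: str
    backend_type: str
    description: str = ""


def parse_environments(raw: Mapping[str, Any]) -> dict[str, EnvironmentConfig]:
    """Turn the parsed YAML mapping into validated config objects."""
    configs: dict[str, EnvironmentConfig] = {}
    for name, fields in raw["environments"].items():
        backend = fields["backend_type"]
        if backend not in ALLOWED_BACKENDS:
            raise ValueError(f"{name}: unknown backend_type {backend!r}")
        configs[name] = EnvironmentConfig(name, backend, fields.get("description", ""))
    return configs


# Stand-in for the parsed contents of environments.yaml.
RAW = {
    "environments": {
        "dev": {"backend_type": "local", "description": "Local development"},
        "test": {"backend_type": "s3", "description": "Automated staging environment"},
        "ci-test": {"backend_type": "s3", "description": "Manual CI testing environment"},
        "prod": {"backend_type": "s3", "description": "Production environment"},
    }
}

ENVIRONMENTS = parse_environments(RAW)
VALID_ENVIRONMENTS = tuple(ENVIRONMENTS)  # derived from the file, never hardcoded


def load_environment_config(environment: str) -> EnvironmentConfig:
    """Return the config for one environment, failing loudly on unknown names."""
    if environment not in ENVIRONMENTS:
        raise ValueError(
            f"Invalid environment: {environment}. "
            f"Must be one of: {', '.join(VALID_ENVIRONMENTS)}"
        )
    return ENVIRONMENTS[environment]
```

Because VALID_ENVIRONMENTS is derived from the parsed file, adding an environment would only require a new entry in environments.yaml.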

Priority 2: Add Environment Validation

import os

VALID_ENVIRONMENTS = ["dev", "test", "ci-test", "prod"]

ENVIRONMENT = os.getenv("ENVIRONMENT", "dev")  # Safer default
if ENVIRONMENT not in VALID_ENVIRONMENTS:
    raise ValueError(
        f"Invalid ENVIRONMENT: {ENVIRONMENT}. "
        f"Must be one of: {', '.join(VALID_ENVIRONMENTS)}"
    )

Benefits:

  • Fails fast on typos
  • Clear error messages
  • Prevents silent failures
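
To make the check easy to unit test, the same validation could be wrapped in a function that takes the environment mapping as an argument. This is only a sketch: resolve_environment and DEFAULT_ENVIRONMENT are hypothetical names, not existing code in main.py.

```python
import os
from typing import Mapping

VALID_ENVIRONMENTS = ("dev", "test", "ci-test", "prod")
DEFAULT_ENVIRONMENT = "dev"  # safe default proposed in this issue


def resolve_environment(environ: Mapping[str, str] = os.environ) -> str:
    """Read ENVIRONMENT from the given mapping and reject unknown names."""
    environment = environ.get("ENVIRONMENT", DEFAULT_ENVIRONMENT)
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(
            f"Invalid ENVIRONMENT: {environment}. "
            f"Must be one of: {', '.join(VALID_ENVIRONMENTS)}"
        )
    return environment
```

With this shape, a test can pass {"ENVIRONMENT": "pruduction"} and assert that a ValueError is raised, without touching the real process environment.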

Priority 3: Improve DynamoDB Lock Table Naming

# backend-client-*.hcl files
dynamodb_table = "lablink-terraform-lock"  # Or "tf-lock-lablink-<environment>"

Benefits:

  • Avoids conflicts with other projects
  • Clear ownership
  • Better for multi-project AWS accounts

Priority 4: Add Resource Tagging

# In Terraform client VM creation
tags = {
  Project     = "lablink"
  Environment = var.environment
  ManagedBy   = "terraform"
  Purpose     = "compute"
}

Benefits:

  • Easy cost tracking per environment
  • Helps identify orphaned resources
  • Better AWS resource management

Priority 5: Safer Backend Handling

Instead of deleting backend.tf, consider:

  • Option A: Pass -backend=false to terraform init for dev (this skips backend initialization)
  • Option B: Use Terraform workspaces
  • Option C: Keep separate backend files and symlink based on environment

Benefits:

  • No runtime file manipulation
  • More robust initialization
  • Clearer intent
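
As a rough illustration of avoiding runtime file deletion, the init command could be chosen from the environment's backend type instead. The sketch below combines Option A with per-environment backend config files. The function name terraform_init_command and the file name pattern backend-<environment>.hcl are assumptions for illustration. Both -backend=false and -backend-config=PATH are existing terraform init flags.

```python
from __future__ import annotations


def terraform_init_command(environment: str, backend_type: str) -> list[str]:
    """Build the `terraform init` argument list without touching backend.tf."""
    command = ["terraform", "init"]
    if backend_type == "s3":
        # Hypothetical naming scheme for the per-environment backend config file.
        command.append(f"-backend-config=backend-{environment}.hcl")
    elif backend_type == "local":
        # Option A: skip backend initialization for dev.
        command.append("-backend=false")
    else:
        raise ValueError(f"Unknown backend_type: {backend_type}")
    return command
```

The returned list could then be passed to subprocess.run(command, cwd=TERRAFORM_DIR, check=True), so that the relative backend config path resolves inside the Terraform directory.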

Priority 6: Consolidate Workflow Logic

Create a workflow-level environment configuration or use a matrix strategy to reduce duplicated environment checks throughout the workflow file.

Benefits:

  • Single source of truth
  • Easier to update
  • Less duplication
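
One possible shape for this, sketched under heavy assumptions: a single early workflow step runs a small script that looks up all environment-specific values in one table and writes them as step outputs, so that later steps read the outputs instead of repeating if/elif chains. The Dockerfile names and suffixes below are placeholders, not the real values from lablink-images.yml, and the function names are hypothetical. Appending name=value lines to the file named by the GITHUB_OUTPUT environment variable is the standard GitHub Actions mechanism for setting step outputs.

```python
from __future__ import annotations

# Placeholder values only: the real Dockerfile names and suffixes live in lablink-images.yml.
WORKFLOW_SETTINGS = {
    "test": {"dockerfile": "Dockerfile.dev", "image_suffix": "-test"},
    "ci-test": {"dockerfile": "Dockerfile.dev", "image_suffix": "-ci-test"},
    "prod": {"dockerfile": "Dockerfile", "image_suffix": ""},
}


def output_lines(environment: str) -> list[str]:
    """Return name=value lines for every setting of one environment."""
    if environment not in WORKFLOW_SETTINGS:
        raise ValueError(
            f"Invalid environment: {environment}. "
            f"Must be one of: {', '.join(WORKFLOW_SETTINGS)}"
        )
    return [f"{key}={value}" for key, value in WORKFLOW_SETTINGS[environment].items()]


def write_outputs(environment: str, output_path: str) -> None:
    """Append the settings to the step output file (os.environ["GITHUB_OUTPUT"] in CI)."""
    with open(output_path, "a", encoding="utf-8") as handle:
        for line in output_lines(environment):
            handle.write(line + "\n")
```

An unknown environment raises an error in the first step, so a newly added environment cannot silently fall through to another branch later in the workflow.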

Implementation Approach

Phase 1 (Quick wins):

  1. Add environment validation
  2. Change default to "dev" instead of "prod"
  3. Update lock table name

Phase 2 (Architecture):

  1. Create environments.yaml config
  2. Refactor code to use centralized config
  3. Add resource tagging

Phase 3 (Workflow refactor):

  1. Consolidate workflow decision logic
  2. Update documentation

Non-Goals

  • This does NOT propose changing the number of environments
  • This does NOT propose separate S3 buckets per environment (current single bucket strategy is fine for manual workflows)
  • This does NOT propose auto-cleanup or complex lifecycle management

Related

Acceptance Criteria

  • Environment validation prevents invalid environment names
  • Adding a new environment requires updates in only 1-2 places
  • Default environment is safe (dev, not prod)
  • Lock table has project-specific name
  • Documentation updated with new patterns
  • All existing tests pass
  • No breaking changes to current functionality
