10 changes: 9 additions & 1 deletion README.md
@@ -70,7 +70,15 @@ This guide provides a quick way to get started with our project. Please see our

### Build Instructions (if applicable)

N/A
When building in a new AWS account or for a new venue, create the S3 bucket `mdps-airflow-{venue}-dag-sources` and add the JSON file `dag_repos_airflow.json` to it. This file specifies where DAGs should be read from. Each entry follows the schema below: `ref` is the branch, `path` is relative to the repository root (do not include the repo name), and `name` must be unique across all entries.
```
{
  "url": "https://github.com/MAAP-Project/airflow-dags.git",
  "ref": "main",
  "path": ".",
  "name": "MAAP_DAGs"
}
```
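
As a rough illustration of preparing that file for a new venue, the sketch below builds the bucket name from the pattern above and serializes the entries. It assumes Python with `boto3` installed and a bucket that already exists; the helper names `bucket_for_venue`, `render_config`, and `upload_config` are hypothetical and not part of this repository.

```python
import json


def bucket_for_venue(venue: str) -> str:
    """Return the DAG-sources bucket name for a venue, following the pattern above."""
    return f"mdps-airflow-{venue}-dag-sources"


def render_config(entries: list[dict]) -> str:
    """Serialize repository entries as a JSON array, rejecting duplicate names."""
    names = [entry["name"] for entry in entries]
    if len(names) != len(set(names)):
        raise ValueError("each entry needs a unique 'name'")
    return json.dumps(entries, indent=2)


def upload_config(venue: str, entries: list[dict],
                  key: str = "dag_repos_airflow.json") -> None:
    """Upload the config (assumes the bucket exists and AWS credentials are set up)."""
    import boto3  # imported lazily so the helpers above work without AWS dependencies

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket_for_venue(venue),
        Key=key,
        Body=render_config(entries).encode("utf-8"),
    )
```

Calling `upload_config("dev", entries)` would then write the file to `mdps-airflow-dev-dag-sources`, where `dev` is only an example venue name.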

### Test Instructions (if applicable)

30 changes: 30 additions & 0 deletions airflow/docker/multi-git-sync/Dockerfile
@@ -0,0 +1,30 @@
FROM python:3.11-slim

# Install git and other required tools
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        git \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip install --no-cache-dir boto3

# Create gitsync user (non-root)
RUN useradd -m -u 1000 -s /bin/bash gitsync && \
    mkdir -p /dag-catalog/repos /dag-catalog/current && \
    chown -R gitsync:gitsync /dag-catalog

# Copy sync script
COPY sync-repos.py /usr/local/bin/sync-repos.py
RUN chmod 755 /usr/local/bin/sync-repos.py && \
    chown gitsync:gitsync /usr/local/bin/sync-repos.py

# Switch to non-root user
USER gitsync

# Set working directory
WORKDIR /dag-catalog

# Set entrypoint
ENTRYPOINT ["/usr/local/bin/sync-repos.py"]
92 changes: 92 additions & 0 deletions airflow/docker/multi-git-sync/README.md
@@ -0,0 +1,92 @@
# Multi-Git-Sync Container

A custom container that syncs multiple git repositories based on configuration stored in AWS S3.

## Features

- **Dynamic repository configuration**: Read repository list from S3
- **Automatic polling**: Checks for configuration changes every 60 seconds (configurable)
- **Multi-repo support**: Syncs multiple repositories to separate subdirectories
- **No restart required**: Picks up new repositories automatically, without a pod restart; just edit the S3 file
- **IRSA support**: Uses IAM Roles for Service Accounts (IRSA) for AWS authentication
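
One way the polling and no-restart behaviour could be implemented is to fingerprint the configuration on every poll and resync only when the fingerprint changes. The sketch below is an assumption about the approach, not the actual `sync-repos.py`; `fetch_config` and `on_change` are hypothetical callables standing in for the S3 read and the resync step.

```python
import hashlib
import json
import time
from typing import Callable, Optional


def fingerprint(config: list) -> str:
    """Stable hash of a parsed config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def poll_once(fetch_config: Callable[[], list],
              last_seen: Optional[str]) -> tuple[bool, str]:
    """Fetch the config once and report whether it differs from the last fingerprint."""
    current = fingerprint(fetch_config())
    return current != last_seen, current


def run(fetch_config: Callable[[], list], on_change: Callable[[], None],
        interval: float = 60.0) -> None:
    """Poll forever, calling on_change whenever the configuration changes."""
    last_seen = None
    while True:
        changed, last_seen = poll_once(fetch_config, last_seen)
        if changed:
            on_change()
        time.sleep(interval)
```

A real sync loop would presumably also fetch new commits for each repository on every pass; this sketch covers only the configuration-change check.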

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `S3_BUCKET` | S3 bucket containing repo configuration file | `unity-dev-sps-config-smce` |
| `S3_KEY` | S3 object key for repo configuration file | `dag_repos_airflow.json` |
| `AWS_REGION` | AWS region for S3 | `us-west-2` |
| `SYNC_ROOT` | Root directory for synced repositories | `/dag-catalog` |
| `POLL_INTERVAL` | Polling interval in seconds | `60` |
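
For illustration, the variables in the table could be read as follows. This is a minimal sketch that uses the documented names and defaults; the `Settings` class and `load_settings` function are hypothetical and may not match how `sync-repos.py` actually reads its environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    s3_bucket: str
    s3_key: str
    aws_region: str
    sync_root: str
    poll_interval: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults in the table."""
    return Settings(
        s3_bucket=env.get("S3_BUCKET", "unity-dev-sps-config-smce"),
        s3_key=env.get("S3_KEY", "dag_repos_airflow.json"),
        aws_region=env.get("AWS_REGION", "us-west-2"),
        sync_root=env.get("SYNC_ROOT", "/dag-catalog"),
        poll_interval=int(env.get("POLL_INTERVAL", "60")),
    )
```

Passing a plain dictionary, as in `load_settings({"POLL_INTERVAL": "15"})`, makes it easy to check overrides without touching the real environment.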

## S3 Configuration File Format

The S3 object `dag_repos_airflow.json` should contain a JSON array of repository configurations. In the example below, `ref` names a branch, and `path` is given relative to the repository root: do not include the repository name, and use `"."` for the root.

```json
[
  {
    "url": "https://github.com/unity-sds/unity-sps.git",
    "ref": "main",
    "path": "airflow/dags",
    "name": "unity-sps"
  },
  {
    "url": "https://github.com/org/another-repo.git",
    "ref": "develop",
    "path": "dags",
    "name": "another-repo"
  }
]
```

### Configuration Fields

- `url`: Git repository URL (HTTPS)
- `ref`: Git ref to check out (branch, tag, or commit)
- `path`: Subdirectory within the repo to expose (relative path)
- `name`: Unique name for the repository (used as directory name)
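
The field rules above can be checked before uploading a new configuration. The validator below is a hypothetical sketch, not part of the container; the check that `path` stays inside the repository is an added assumption rather than documented behaviour.

```python
REQUIRED_FIELDS = ("url", "ref", "path", "name")


def validate_config(entries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks valid."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
            continue
        if not entry["url"].startswith("https://"):
            problems.append(f"entry {index}: url is not HTTPS")
        # Assumption: paths must be relative and must not climb out of the repo.
        if entry["path"].startswith("/") or ".." in entry["path"].split("/"):
            problems.append(f"entry {index}: path must be relative to the repo root")
        if entry["name"] in seen:
            problems.append(f"entry {index}: duplicate name {entry['name']!r}")
        seen.add(entry["name"])
    return problems
```

Because `name` doubles as a directory name, catching duplicates early avoids two repositories competing for the same directory.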

## Directory Structure

```
/dag-catalog/
├── repos/
│   ├── unity-sps/       # Git clone of unity-sps repo
│   └── another-repo/    # Git clone of another-repo
└── current/
    ├── unity-sps -> ../repos/unity-sps/airflow/dags
    └── another-repo -> ../repos/another-repo/dags
```
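
The layout above can be produced with relative symlinks. The following sketch assumes the clone already exists under `repos/`. The function name `link_entry` is hypothetical, and the error text is borrowed from the troubleshooting list rather than taken from the actual script.

```python
import os
import tempfile


def link_entry(sync_root: str, name: str, path: str) -> str:
    """Create current/<name> as a relative symlink into repos/<name>/<path>."""
    source = os.path.normpath(os.path.join(sync_root, "repos", name, path))
    if not os.path.isdir(source):
        raise FileNotFoundError(f"Source path does not exist: {source}")
    current_dir = os.path.join(sync_root, "current")
    os.makedirs(current_dir, exist_ok=True)
    link = os.path.join(current_dir, name)
    target = os.path.relpath(source, start=current_dir)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link, e.g. after `path` changed
    os.symlink(target, link)
    return target


# Usage against a throwaway directory that mimics a finished clone
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "repos", "unity-sps", "airflow", "dags"))
    print(link_entry(root, "unity-sps", "airflow/dags"))
```

On a POSIX system the usage example prints the relative target `../repos/unity-sps/airflow/dags`, matching the tree above. Relative targets keep the links valid if the sync root is mounted at a different path in another container.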

## Building the Image

```bash
docker build -t jplmdps/multi-git-sync:v1.0.0 .
docker push jplmdps/multi-git-sync:v1.0.0
```

## Troubleshooting

### Check container logs
```bash
kubectl logs <pod-name> -c multi-git-sync -f
```

### Verify S3 configuration file
```bash
aws s3 cp s3://unity-dev-sps-config-smce/dag_repos_airflow.json - | jq .
```

### Check synced directories
```bash
kubectl exec <pod-name> -c multi-git-sync -- ls -la /dag-catalog/current/
```

### Common Issues

1. **"S3 object not found"**: Ensure the S3 object exists at the specified bucket and key, and the IAM role has access
2. **"S3 bucket not found"**: Verify the bucket name is correct and exists in the AWS account
3. **"Failed to clone repository"**: Check that the Git URL is correct and accessible (public repo or credentials configured)
4. **"Source path does not exist"**: Verify the `path` field in the config points to a valid directory in the repo
14 changes: 14 additions & 0 deletions airflow/docker/multi-git-sync/dag_repos_airflow.json
@@ -0,0 +1,14 @@
[
  {
    "url": "https://github.com/MAAP-Project/airflow-dags.git",
    "ref": "main",
    "path": ".",
    "name": "MAAP_DAGs"
  },
  {
    "url": "https://github.com/grallewellyn/unity-dags-2.git",
    "ref": "main",
    "path": "folder1/folder2",
    "name": "Project_2"
  }
]