Merged
22 commits
a296859
Merge pull request #140 from IATI/main
simon-20 Apr 27, 2026
d14296d
test: make dataset expiry test check all fields
simon-20 May 5, 2026
312ed1c
fix: ensure cache URL/ETag fields blanked expiry
simon-20 May 5, 2026
d6eb171
test: checks *_error_occurred flags created false
simon-20 May 6, 2026
65f75a7
fix: adds db migrations to set error flag columns
simon-20 May 6, 2026
7584f2d
fix: create new datasets with error flags = false
simon-20 May 6, 2026
70e3c74
feat: update IATI Design System to 4.9.0
simon-20 May 6, 2026
9c92c2c
docs: improve README
simon-20 May 6, 2026
a71b39f
docs: update CHANGELOG
simon-20 May 6, 2026
3d2e52a
build: bump version
simon-20 May 6, 2026
741d82b
ci: deploy to a dedicated vnet with subnet
simon-20 May 11, 2026
0cbfaed
feat: script to create vnets for public IP
simon-20 May 11, 2026
680329c
fix(ci): add missing env var for MQ
simon-20 May 11, 2026
180eb3b
build: bump version number
simon-20 May 11, 2026
cdad6ad
docs: update README, CHANGELOG - vnet setup
simon-20 May 11, 2026
64992d2
Merge pull request #141 from IATI/sk-fixes
simon-20 May 12, 2026
4778a49
Merge pull request #142 from IATI/sk-public-ip
simon-20 May 12, 2026
35738fe
fix: alters public ip deploy to use full names
simon-20 May 12, 2026
c43dc6c
feat: manual deploy scripts
simon-20 May 12, 2026
f62c3ab
build: bump version, and small change to gitignore
simon-20 May 12, 2026
3b98d69
docs: update CHANGELOG
simon-20 May 12, 2026
2634890
Merge pull request #143 from IATI/sk-fix-public-ip-deploy
simon-20 May 12, 2026
12 changes: 10 additions & 2 deletions .github/workflows/build-and-deploy-job.yml
@@ -1,7 +1,7 @@
---
name: Generic build and deploy (called by other workflows)

on: # yamllint disable-line rule:truthy
workflow_call:
inputs:
APP_NAME:
@@ -25,21 +25,26 @@ jobs:
ACR_USERNAME: ${{ secrets.ACR_USERNAME }}
ACR_PASSWORD: ${{ secrets.ACR_PASSWORD }}

AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

steps:
- name: 'Generate/build derived environment variables'
run: |
echo "TARGET_ENVIRONMENT_UPPER=${TARGET_ENVIRONMENT^^}" >> ${GITHUB_ENV}
echo "CONTAINER_INSTANCE_BASE_NAME=aci-${APP_NAME}" >> ${GITHUB_ENV}
echo "RESOURCE_GROUP_BASE_NAME=rg-${APP_NAME}" >> ${GITHUB_ENV}
echo "STORAGE_ACCOUNT_NAME=sa${APP_NAME//-/}$TARGET_ENVIRONMENT" >> ${GITHUB_ENV}
echo "APP_NAME=${APP_NAME}" >> ${GITHUB_ENV}
echo "AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}" >> ${GITHUB_ENV}

- name: 'Print calculated environment variables'
run: |
echo $TARGET_ENVIRONMENT_UPPER
echo $CONTAINER_INSTANCE_BASE_NAME
echo $RESOURCE_GROUP_BASE_NAME
echo $STORAGE_ACCOUNT_NAME

echo $APP_NAME
echo $AZURE_SUBSCRIPTION_ID
- name: 'Checkout GitHub Action'
uses: actions/checkout@v4

@@ -105,6 +110,7 @@ jobs:
LOG_WORKSPACE_KEY: ${{ secrets[format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, 'LOG_WORKSPACE_KEY')] }}

# Variables which configure the app
AZURE_SERVICE_BUS_DATASET_CHECK_RESULTS_TOPIC_NAME: ${{ vars[format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, 'AZURE_SERVICE_BUS_DATASET_CHECK_RESULTS_TOPIC_NAME')] }}
AZURE_SERVICE_BUS_REGISTRY_SUB_NAME: ${{ vars[format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, 'AZURE_SERVICE_BUS_REGISTRY_SUB_NAME')] }}
AZURE_SERVICE_BUS_REGISTRY_TOPIC_NAME: ${{ vars[format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, 'AZURE_SERVICE_BUS_REGISTRY_TOPIC_NAME')] }}
AZURE_SERVICE_BUS_WAIT_TIME: ${{ vars[format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, 'AZURE_SERVICE_BUS_WAIT_TIME')] }}
@@ -137,6 +143,8 @@ jobs:
az -v
az container create --debug \
--resource-group "${{ env.RESOURCE_GROUP_BASE_NAME }}-${{ env.TARGET_ENVIRONMENT }}" \
--vnet "/subscriptions/${{ env.AZURE_SUBSCRIPTION_ID }}/resourceGroups/rg-${{ env.APP_NAME }}-vnets/providers/Microsoft.Network/virtualNetworks/${{ env.APP_NAME }}-${{ env.TARGET_ENVIRONMENT }}-vnet" \
--subnet "/subscriptions/${{ env.AZURE_SUBSCRIPTION_ID }}/resourceGroups/rg-${{ env.APP_NAME }}-vnets/providers/Microsoft.Network/virtualNetworks/${{ env.APP_NAME }}-${{ env.TARGET_ENVIRONMENT }}-vnet/subnets/${{ env.APP_NAME }}-${{ env.TARGET_ENVIRONMENT }}-subnet" \
--file ./azure-deployment/azure-resource-manager-deployment-manifest.yml

- name: 'Re-generate the website links'
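The "Generate/build derived environment variables" step relies on two bash parameter expansions: `${VAR^^}` upper-cases a value and `${VAR//-/}` deletes every hyphen. The following is a minimal sketch of what that step computes. The input values `bulk-data-service` and `dev` are assumptions chosen for illustration; in the workflow they are supplied by the caller.

```bash
#!/usr/bin/env bash
# Illustrative inputs (assumed values, not taken from a real workflow run)
APP_NAME="bulk-data-service"
TARGET_ENVIRONMENT="dev"

# ${VAR^^} converts the whole value to upper case (requires bash 4 or later)
TARGET_ENVIRONMENT_UPPER="${TARGET_ENVIRONMENT^^}"
CONTAINER_INSTANCE_BASE_NAME="aci-${APP_NAME}"
RESOURCE_GROUP_BASE_NAME="rg-${APP_NAME}"
# ${VAR//-/} removes every hyphen from the value
STORAGE_ACCOUNT_NAME="sa${APP_NAME//-/}${TARGET_ENVIRONMENT}"

echo "$TARGET_ENVIRONMENT_UPPER"        # DEV
echo "$CONTAINER_INSTANCE_BASE_NAME"    # aci-bulk-data-service
echo "$RESOURCE_GROUP_BASE_NAME"        # rg-bulk-data-service
echo "$STORAGE_ACCOUNT_NAME"            # sabulkdataservicedev
```

The upper-cased environment name is what the later `format('{0}_{1}', env.TARGET_ENVIRONMENT_UPPER, ...)` expressions use to look up per-environment secrets and variables.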
2 changes: 2 additions & 0 deletions .gitignore
@@ -196,5 +196,7 @@ __marimo__/

/azure-deployment/azure-resource-manager-deployment-manifest.yml
/azure-deployment/nginx-reverse-proxy/htpasswd
/azure-deployment/manual-azure-deploy-secrets.env
/azure-deployment/manual-azure-deploy-variables.env

/web/index.html
31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,37 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

### Removed

## [1.4.7] - 2026-05-11

### Added

- Added back in the manual deploy script

### Fixed

- Fixed Azure deploy to use full resource identifiers for dedicated vnet & subnet

## [1.4.6] - 2026-05-11

### Changed

- Updated deploy to use dedicated vnet & subnet

### Fixed

- Added env var for the MQ topic name to the GitHub workflow.

## [1.4.5] - 2026-05-06

### Changed

- Updated IATI Design System to 4.9.0

### Fixed

- Bug where the dataset's cached URLs were not being blanked after dataset expiry. (Resolves #137)
- Bug where `most_recent_head_attempt.error_occurred` was being set to `null` instead of `false`. (Resolves #136)

## [1.4.4] - 2026-04-22

### Added
35 changes: 32 additions & 3 deletions README.md
@@ -61,26 +61,43 @@ The `.env` file is used when running things locally to store environment variabl

Running the app successfully requires a Postgres database and a connection to an Azure blob storage account. A docker compose setup is provided which starts a local instance of each service; run it with:

```bash
docker compose up -d
```

The example `.env` file (`.env-example`) is configured to use the above docker compose setup. If you don't use the docker compose setup, then you will need to change the values in the `.env` file accordingly.

Once the docker compose setup is running, you can run the dataset updater part of the app with the following command, which downloads the datasets and uploads them to Azurite:

```bash
dotenv run python src/iati_bulk_data_service.py -- --operation checker --single-run --run-for-n-datasets=50
```

You can run the zipper operation with:

```bash
dotenv run python src/iati_bulk_data_service.py -- --operation zipper --single-run
```

It will store the ZIP files in the directory defined in the `ZIP_WORKING_DIR` environment variable.

The full range of command line arguments is listed below:

```
usage: iati_bulk_data_service.py [-h] --operation {checker,zipper,registry-changes-processor} [--single-run] [--run-for-n-datasets RUN_FOR_N_DATASETS] [--run-for-single-reporting-org RUN_FOR_SINGLE_REPORTING_ORG] [--skip-safety]

options:
  -h, --help            show this help message and exit
  --operation {checker,zipper,registry-changes-processor}
                        Operation to run: checker, downloader, registry-changes-processor
  --single-run          Perform a single run, then exit
  --run-for-n-datasets RUN_FOR_N_DATASETS
                        Run on the first N datasets from registration service (useful for testing)
  --run-for-single-reporting-org RUN_FOR_SINGLE_REPORTING_ORG
                        Run only for the datasets belonging to the specified reporting org short name (useful for testing)
  --skip-safety         Skip safety checks during the run (useful for testing)
```

To shut down the docker compose setup, use the command below (the Azure Service Bus emulator
appears to be a bit sensitive to Ctrl-C shutdowns, so it is always best to shut down
with `docker compose down`):
@@ -222,6 +239,8 @@ pytest-watcher .

### Initial Provisioning

#### Bulk Data Service App

You can create an Azure-based instance of Bulk Data Service using the `azure-create-resources.sh` script. It must be run from the root of the repository, and it requires (i) the environment variable `BDS_DB_ADMIN_PASSWORD` to be set with the password for the database, and (ii) a single parameter which is the name of the environment/instance. For instance, the following command will create a dev instance:

```bash
@@ -232,6 +251,16 @@ This will create a resource group on Azure called `rg-bulk-data-service-dev`, an

At the end of its run, the `azure-create-resources.sh` script will print out various secrets which need to be added to GitHub Actions.

**NOTE**: This is only really useful for temporary deployment or initial setup; once you're set up with CI/CD, the GitHub action does all this.

#### Bulk Data Service Network and Public IP

The Bulk Data Service is deployed to a dedicated vnet with a subnet and an attached NAT Gateway which has a public IP. So that the IP stays the same, these are not destroyed and re-created on every release (as the Azure Container Instances are). To create the networks and public IPs for dev and production, run:

```bash
./azure-provision/create-vnets-public-ips.sh
```
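The deploy workflow refers to these networks by full Azure resource ID rather than by short name. As a sketch, assuming the naming pattern used in the workflow (`rg-<app>-vnets`, `<app>-<env>-vnet`, `<app>-<env>-subnet`), the IDs can be composed as below. The subscription ID is a placeholder, and the helper variable names `VNET_RESOURCE_GROUP`, `VNET_NAME`, `SUBNET_NAME`, `VNET_ID`, and `SUBNET_ID` are introduced here for illustration only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values for illustration only; substitute your own
AZURE_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
APP_NAME="bulk-data-service"
TARGET_ENVIRONMENT="dev"

# Names following the pattern used by the deploy workflow
VNET_RESOURCE_GROUP="rg-${APP_NAME}-vnets"
VNET_NAME="${APP_NAME}-${TARGET_ENVIRONMENT}-vnet"
SUBNET_NAME="${APP_NAME}-${TARGET_ENVIRONMENT}-subnet"

# A subnet ID is the ID of its parent vnet with "/subnets/<name>" appended
VNET_ID="/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${VNET_RESOURCE_GROUP}/providers/Microsoft.Network/virtualNetworks/${VNET_NAME}"
SUBNET_ID="${VNET_ID}/subnets/${SUBNET_NAME}"

echo "$VNET_ID"
echo "$SUBNET_ID"
```

Building the subnet ID from the vnet ID keeps the two values consistent if the naming pattern ever changes.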

### Deployment - Versioning

The app version is set in `pyproject.toml`, and this is read by the app to use in the `User-Agent` header. When making a new release, set the version here to the appropriate value. Then, when releasing the app using the normal IATI Python app deployment process, choose the tag name to match the version chosen.
@@ -105,8 +105,9 @@ properties: # Properties of container group
requests:
cpu: 1.0
memoryInGB: 0.5
subnetIds:
- id: "/subscriptions/#AZURE_SUBSCRIPTION_ID#/resourceGroups/rg-#APP_NAME#-vnets/providers/Microsoft.Network/virtualNetworks/#APP_NAME#-#TARGET_ENVIRONMENT#-vnet/subnets/#APP_NAME#-#TARGET_ENVIRONMENT#-subnet"
ipAddress:
type: "public"
dnsNameLabel: "#APP_NAME#-#TARGET_ENVIRONMENT#"
type: "private"
ports:
- port: 9158
9 changes: 5 additions & 4 deletions azure-deployment/generate-manifest-from-template.sh
@@ -6,9 +6,9 @@
# by the generic 'build-and-deploy' Github action

if [ "$LOCAL_DEPLOY" == "true" ]; then
    echo "Deploying from local environment..."
    source ./azure-deployment/manual-azure-deploy-secrets.env
    source ./azure-deployment/manual-azure-deploy-variables.env
fi

# Copy the template to the manifest
@@ -21,6 +21,8 @@ sed -i "s^#APP_NAME#^$APP_NAME^g" ./azure-deployment/azure-resource-manager-depl
sed -i "s^#TARGET_ENVIRONMENT#^$TARGET_ENVIRONMENT^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
sed -i "s^#DOCKER_IMAGE_TAG#^$DOCKER_IMAGE_TAG^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml

sed -i "s^#AZURE_SUBSCRIPTION_ID#^$AZURE_SUBSCRIPTION_ID^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml

sed -i "s^#ACR_LOGIN_SERVER#^$ACR_LOGIN_SERVER^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
sed -i "s^#ACR_USERNAME#^$ACR_USERNAME^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
sed -i "s^#ACR_PASSWORD#^$ACR_PASSWORD^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
@@ -38,7 +40,6 @@ sed -i ''s^#DB_NAME#^$DB_NAME^g'' ./azure-deployment/azure-resource-manager-depl
sed -i "s^#DB_SSL_MODE#^$DB_SSL_MODE^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
sed -i "s^#DB_CONNECTION_TIMEOUT#^$DB_CONNECTION_TIMEOUT^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml


# Variables which configure the behaviour of the Bulk Data Service

sed -i "s^#DATA_REGISTRATION#^$DATA_REGISTRATION^g" ./azure-deployment/azure-resource-manager-deployment-manifest.yml
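The script fills the manifest by replacing `#NAME#` placeholders with the value of the matching shell variable, using `^` as the `sed` delimiter so that values may contain `/`. The sketch below shows the same technique on a throwaway two-line template. The template contents, the temporary paths, and the loop over variable names are assumptions for illustration, and `sed -i` without a suffix assumes GNU sed, as the script itself does.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical two-line template, written to a temp dir for the demo
WORK_DIR="$(mktemp -d)"
TEMPLATE="${WORK_DIR}/manifest-template.yml"
MANIFEST="${WORK_DIR}/manifest.yml"
printf '%s\n' \
    'name: "aci-#APP_NAME#-#TARGET_ENVIRONMENT#"' \
    'image: "#APP_NAME#:#DOCKER_IMAGE_TAG#"' > "$TEMPLATE"

# Made-up values standing in for the real deployment variables
APP_NAME="bulk-data-service"
TARGET_ENVIRONMENT="dev"
DOCKER_IMAGE_TAG="abc123"

# Copy the template, then substitute each placeholder in place
cp "$TEMPLATE" "$MANIFEST"

# '^' is the sed delimiter, so values may contain '/' but must not
# contain '^', '&' or '\', which sed would treat specially
for name in APP_NAME TARGET_ENVIRONMENT DOCKER_IMAGE_TAG; do
    value="${!name}"   # indirect expansion: value of the variable named in $name
    sed -i "s^#${name}#^${value}^g" "$MANIFEST"
done

result="$(cat "$MANIFEST")"
echo "$result"
rm -rf "$WORK_DIR"
```

Quoting the whole `sed` expression in double quotes keeps a value with spaces or glob characters as a single argument.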
91 changes: 91 additions & 0 deletions azure-deployment/manual-azure-deploy-from-local.sh
@@ -0,0 +1,91 @@
#!/usr/bin/env bash

# This script is for deploying the Bulk Data Service to Azure from a local machine, without using GitHub Actions.
# It is useful for testing significant changes to the Azure manifest and deployment procedure, which for dev and
# prod is normally run through GitHub Actions.

# NOTE: You will need to fill in the AZURE_SUBSCRIPTION_ID variable below before using.

set -uo pipefail

if [ "$#" -lt 1 ]; then
    echo "usage: $0 TARGET_ENVIRONMENT"
    echo "       TARGET_ENVIRONMENT should likely be 'test', 'dev', or 'prod'"
    exit 1
fi

if [ ! -d ".git" ]; then
    echo "$0: script must be run from the root of the bulk-data-service repository"
    exit 1
fi

if [ ! -f "./azure-deployment/manual-azure-deploy-secrets.env" ]; then
    echo "$0: there must be a file 'manual-azure-deploy-secrets.env' in"
    echo "'azure-deployment' containing the secrets. See the example in 'manual-azure-deploy-secrets-example.env'"
    exit 1
fi

if [ ! -f "./azure-deployment/manual-azure-deploy-variables.env" ]; then
    echo "$0: there must be a file 'manual-azure-deploy-variables.env' in"
    echo "'azure-deployment' containing the config variables. See the example in 'manual-azure-deploy-variables-example.env'"
    exit 1
fi

# exit from an if block (not a subshell) so that the script itself stops
if ! git remote -v 2>/dev/null | grep "IATI/bulk-data-service.git" >/dev/null; then
    echo "$0: script must be run from the root of the bulk-data-service repository"
    exit 1
fi

. ./azure-deployment/manual-azure-deploy-secrets.env

AZURE_SUBSCRIPTION_ID=" **** FILL IN **** "

TARGET_ENVIRONMENT=$1

APP_NAME=bulk-data-service

RESOURCE_GROUP_NAME="rg-${APP_NAME}-${TARGET_ENVIRONMENT}"

CONTAINER_GROUP_INSTANCE_NAME="aci-${APP_NAME}-${TARGET_ENVIRONMENT}"

DOCKER_IMAGE_TAG=$(git log -n1 --format=format:"%H")

LOCAL_DEPLOY=true

echo "Generating Azure ARM deployment manifest from template"
. ./azure-deployment/generate-manifest-from-template.sh

# build the docker image for the Bulk Data Service
docker build . -t "criati.azurecr.io/bulk-data-service-$TARGET_ENVIRONMENT:$DOCKER_IMAGE_TAG"

# push Bulk Data Service image to Azure
docker push "criati.azurecr.io/bulk-data-service-$TARGET_ENVIRONMENT:$DOCKER_IMAGE_TAG"

# now configure, build and push the docker image for the nginx reverse proxy

# create password file
htpasswd -BC 10 -c -b ./azure-deployment/nginx-reverse-proxy/htpasswd prom "$PROM_NGINX_REVERSE_PROXY_PASSWORD"

# make the image for the nginx reverse proxy (for putting HTTP basic auth on the
# prom client)
docker build ./azure-deployment/nginx-reverse-proxy -t "criati.azurecr.io/bds-prom-nginx-reverse-proxy-$TARGET_ENVIRONMENT:$DOCKER_IMAGE_TAG"

docker push "criati.azurecr.io/bds-prom-nginx-reverse-proxy-$TARGET_ENVIRONMENT:$DOCKER_IMAGE_TAG"

echo az container delete \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$CONTAINER_GROUP_INSTANCE_NAME"
az container delete \
--resource-group "$RESOURCE_GROUP_NAME" \
--name "$CONTAINER_GROUP_INSTANCE_NAME"

echo az container create \
--resource-group "$RESOURCE_GROUP_NAME" \
--vnet "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/rg-${APP_NAME}-vnets/providers/Microsoft.Network/virtualNetworks/${APP_NAME}-${TARGET_ENVIRONMENT}-vnet" \
--subnet "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/rg-${APP_NAME}-vnets/providers/Microsoft.Network/virtualNetworks/${APP_NAME}-${TARGET_ENVIRONMENT}-vnet/subnets/${APP_NAME}-${TARGET_ENVIRONMENT}-subnet" \
--file ./azure-deployment/azure-resource-manager-deployment-manifest.yml
az container create \
--resource-group "$RESOURCE_GROUP_NAME" \
--vnet "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/rg-${APP_NAME}-vnets/providers/Microsoft.Network/virtualNetworks/${APP_NAME}-${TARGET_ENVIRONMENT}-vnet" \
--subnet "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/rg-${APP_NAME}-vnets/providers/Microsoft.Network/virtualNetworks/${APP_NAME}-${TARGET_ENVIRONMENT}-vnet/subnets/${APP_NAME}-${TARGET_ENVIRONMENT}-subnet" \
--file ./azure-deployment/azure-resource-manager-deployment-manifest.yml
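When adapting the guard clauses in a script like this, note that `exit` inside a `( ... )` subshell ends only that subshell, while `exit` inside a `{ ...; }` group or an `if` block ends the script itself. A small self-contained sketch of the difference follows; the function names are illustrative and not part of the deploy script.

```bash
#!/usr/bin/env bash

guard_in_subshell() {
    false || (
        echo "guard failed"
        exit 1          # leaves only the subshell
    )
    echo "still running"
}

guard_in_group() {
    false || {
        echo "guard failed"
        exit 1          # leaves the current shell
    }
    echo "still running"
}

# Each function runs inside a command substitution so the demo survives the exit
subshell_output="$(guard_in_subshell)"
group_output="$(guard_in_group)" || true

printf 'subshell guard printed:\n%s\n' "$subshell_output"
printf 'group guard printed:\n%s\n' "$group_output"
```

Only the subshell variant prints "still running", which shows that a guard written with `( ... )` does not stop the script.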
31 changes: 31 additions & 0 deletions azure-deployment/manual-azure-deploy-secrets-example.env
@@ -0,0 +1,31 @@
# This file is used when doing a manual Azure deploy from a local machine. It should
# contain the equivalent of the secrets that are stored in Github actions

ACR_LOGIN_SERVER=
ACR_USERNAME=
ACR_PASSWORD=

DOCKER_HUB_USERNAME=
DOCKER_HUB_TOKEN=

AZURE_STORAGE_CONNECTION_STRING=

AZURE_SERVICE_BUS_CONNECTION_STRING=

LOG_WORKSPACE_ID=
LOG_WORKSPACE_KEY=

DB_USER=
DB_PASS=
DB_HOST=
DB_PORT=
DB_NAME=
DB_SSL_MODE=require
DB_CONNECTION_TIMEOUT=30

PROM_NGINX_REVERSE_PROXY_PASSWORD=

DATA_REGISTRY_SUITECRM_API_URL=
DATA_REGISTRY_SUITECRM_CLIENT_ID=
DATA_REGISTRY_SUITECRM_CLIENT_SECRET=
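
Because the example file ships with empty values, it can help to confirm that every required entry has been filled in before starting a manual deploy. Below is a minimal sketch, assuming a hypothetical helper `require_vars` and a throwaway file with made-up values in place of `manual-azure-deploy-secrets.env`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the name of each listed variable that is unset
# or empty, and returns non-zero if any were found
require_vars() {
    local missing=0 name
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then
            echo "missing value for ${name}"
            missing=1
        fi
    done
    return "$missing"
}

# Throwaway stand-in for manual-azure-deploy-secrets.env (values are made up)
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
ACR_LOGIN_SERVER=example.azurecr.io
ACR_USERNAME=
DB_SSL_MODE=require
EOF

. "$ENV_FILE"

if report="$(require_vars ACR_LOGIN_SERVER ACR_USERNAME DB_SSL_MODE)"; then
    echo "all required values are set"
else
    echo "$report"
fi
rm -f "$ENV_FILE"
```

In this run only `ACR_USERNAME` is reported, because it is the one entry left empty in the stand-in file.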
