11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

### Removed

## [1.4.5] - 2026-05-06

### Changed

- Updated IATI Design System to 4.9.0

### Fixed

- Bug where the dataset's cached URLs were not being blanked after dataset expiry. (Resolves #137)
- Bug where `most_recent_head_attempt.error_occurred` was being set to `null` instead of `false`. (Resolves #136)

## [1.4.4] - 2026-04-22

### Added
23 changes: 20 additions & 3 deletions README.md
@@ -61,26 +61,43 @@ The `.env` file is used when running things locally to store environment variabl

Running the app successfully requires a Postgres database and a connection to an Azure blob storage account. A docker compose setup is provided that starts a local instance of each service; run it with:

```
```bash
docker compose up -d
```

The example `.env` file (`.env-example`) is configured to use the above docker compose setup. If you don't use the docker compose setup, then you will need to change the values in the `.env` file accordingly.

Once the docker compose setup is running, you can run the dataset updater part of the app with (this will download the datasets and upload them to Azurite):

```
```bash
dotenv run python src/iati_bulk_data_service.py -- --operation checker --single-run --run-for-n-datasets=50
```

You can run the zipper operation with:

```
```bash
dotenv run python src/iati_bulk_data_service.py -- --operation zipper --single-run
```

It will store the ZIP files in the directory defined in the `ZIP_WORKING_DIR` environment variable.

The full range of command line arguments is listed below:

```
usage: iati_bulk_data_service.py [-h] --operation {checker,zipper,registry-changes-processor} [--single-run] [--run-for-n-datasets RUN_FOR_N_DATASETS] [--run-for-single-reporting-org RUN_FOR_SINGLE_REPORTING_ORG] [--skip-safety]

options:
  -h, --help            show this help message and exit
  --operation {checker,zipper,registry-changes-processor}
                        Operation to run: checker, downloader, registry-changes-processor
  --single-run          Perform a single run, then exit
  --run-for-n-datasets RUN_FOR_N_DATASETS
                        Run on the first N datasets from registration service (useful for testing)
  --run-for-single-reporting-org RUN_FOR_SINGLE_REPORTING_ORG
                        Run only for the datasets belonging to the specified reporting org short name (useful for testing)
  --skip-safety         Skip safety checks during the run (useful for testing)
```
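
As a rough illustration of how the options above fit together, the following is a minimal `argparse` sketch that accepts the same flags. It is not the project's actual parser: the function name `build_parser`, the `int` type for `--run-for-n-datasets`, and the shortened help strings are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage (illustration only)."""
    parser = argparse.ArgumentParser(prog="iati_bulk_data_service.py")
    parser.add_argument(
        "--operation",
        choices=["checker", "zipper", "registry-changes-processor"],
        required=True,
        help="Operation to run",
    )
    parser.add_argument("--single-run", action="store_true", help="Perform a single run, then exit")
    # Assumption: N is parsed as an integer; the usage text does not state the type.
    parser.add_argument("--run-for-n-datasets", type=int, default=None)
    parser.add_argument("--run-for-single-reporting-org", type=str, default=None)
    parser.add_argument("--skip-safety", action="store_true")
    return parser


# Same arguments as the checker example earlier in this README section.
args = build_parser().parse_args(
    ["--operation", "checker", "--single-run", "--run-for-n-datasets=50"]
)
print(args.operation, args.single_run, args.run_for_n_datasets)  # checker True 50
```

Because `--operation` is marked `required=True` with a fixed `choices` list, an unknown operation name is rejected by `argparse` before any work starts.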

To shut down the docker compose setup, use (the Azure Service Bus emulator
appears to be a bit sensitive to Ctrl-C shutdowns, so it is always best to shut down
with `docker compose down`):
5 changes: 5 additions & 0 deletions db-migrations/20260505_01_7kh1j.rollback.sql
@@ -0,0 +1,5 @@
alter table iati_datasets
alter column most_recent_head_attempt_error_occurred drop default;

alter table iati_datasets
alter column most_recent_get_attempt_error_occurred drop default;
16 changes: 16 additions & 0 deletions db-migrations/20260505_01_7kh1j.sql
@@ -0,0 +1,16 @@
--
-- depends: 20250827_01_Dt6Ow

alter table iati_datasets
alter column most_recent_head_attempt_error_occurred set default false;

alter table iati_datasets
alter column most_recent_get_attempt_error_occurred set default false;

update iati_datasets
set most_recent_head_attempt_error_occurred = false
where most_recent_head_attempt_error_occurred is null;

update iati_datasets
set most_recent_get_attempt_error_occurred = false
where most_recent_get_attempt_error_occurred is null;
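
The migration needs both halves: in Postgres, `set default` only affects rows inserted afterwards, so existing `null` values have to be backfilled with an explicit `update`. The sketch below demonstrates that backfill on an in-memory SQLite table standing in for `iati_datasets`. This is an assumption made so the example runs without a Postgres server; SQLite has no `alter column ... set default`, so the default is declared at table creation, and `0` is used for `false` because SQLite stores booleans as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the real table: only one of the two columns is modelled.
conn.execute(
    "create table iati_datasets ("
    " id integer primary key,"
    " most_recent_head_attempt_error_occurred boolean default 0"
    ")"
)

# Row 1 simulates a pre-migration row that was written with an explicit NULL.
conn.execute(
    "insert into iati_datasets (id, most_recent_head_attempt_error_occurred) values (1, null)"
)
# Row 2 omits the column, so it picks up the default.
conn.execute("insert into iati_datasets (id) values (2)")

query = "select id, most_recent_head_attempt_error_occurred from iati_datasets order by id"
before = conn.execute(query).fetchall()

# The backfill step: the default alone leaves row 1 as NULL.
conn.execute(
    "update iati_datasets"
    " set most_recent_head_attempt_error_occurred = 0"
    " where most_recent_head_attempt_error_occurred is null"
)
after = conn.execute(query).fetchall()

print(before)  # [(1, None), (2, 0)]
print(after)   # [(1, 0), (2, 0)]
```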
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "bulk-data-service"
-version = "1.4.4"
+version = "1.4.5"
requires-python = ">= 3.12.6"
readme = "README.md"
dependencies = [
4 changes: 3 additions & 1 deletion src/bulk_data_service/dataset.py
@@ -43,9 +43,11 @@
def create_empty_dataset() -> dict[str, Any]:
    empty_ds = {
        k: None for k in DATASET_REGISTRATION_FIELDS + DATASET_NON_REGISTRATION_FIELDS
-    }  # type: dict[str, str | None]
+    }  # type: dict[str, str | bool | None]
    empty_ds["most_recent_get_attempt_error_details"] = make_http_attempt_error_details()
    empty_ds["most_recent_get_attempt_error_occurred"] = False
    empty_ds["most_recent_head_attempt_error_details"] = make_http_attempt_error_details()
    empty_ds["most_recent_head_attempt_error_occurred"] = False
    return empty_ds
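
The reason for setting the flags explicitly is that `None` and `False` are both falsy in Python but serialise differently (`null` versus `false`). The following self-contained sketch shows the difference. The three-item field list and the function name `create_empty_dataset_sketch` are hypothetical stand-ins; the real field lists and `make_http_attempt_error_details()` live in `dataset.py` and are not reproduced here.

```python
import json
from typing import Any

# Hypothetical, shortened field list for illustration only.
EXAMPLE_FIELDS = [
    "id",
    "most_recent_head_attempt_error_occurred",
    "most_recent_get_attempt_error_occurred",
]


def create_empty_dataset_sketch() -> dict[str, Any]:
    """Start every field at None, then give the boolean flags a real default."""
    empty_ds: dict[str, Any] = {k: None for k in EXAMPLE_FIELDS}
    empty_ds["most_recent_get_attempt_error_occurred"] = False
    empty_ds["most_recent_head_attempt_error_occurred"] = False
    return empty_ds


ds = create_empty_dataset_sketch()

# A truthiness check cannot tell the two apart...
print(not ds["id"], not ds["most_recent_head_attempt_error_occurred"])  # True True
# ...but an identity check and the JSON output can.
print(ds["most_recent_head_attempt_error_occurred"] is False)  # True
print(json.dumps(ds))
```

This is why the new tests assert `is False` rather than relying on truthiness.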


4 changes: 4 additions & 0 deletions src/bulk_data_service/dataset_remover.py
@@ -77,6 +77,10 @@ def remove_download_for_expired_dataset(
"last good download from Bulk Data Service".format(bds_dataset["id"], max_hours)
)

bds_dataset["last_known_good_dataset_cached_dataset_xml_url"] = None
bds_dataset["last_known_good_dataset_cached_dataset_xml_etag"] = None
bds_dataset["last_known_good_dataset_cached_dataset_zip_url"] = None
bds_dataset["last_known_good_dataset_cached_dataset_zip_etag"] = None
bds_dataset["last_known_good_dataset_downloaded"] = None
bds_dataset["last_known_good_dataset_hash"] = None
bds_dataset["last_known_good_dataset_hash_excluding_generated_timestamp"] = None
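
One way to keep a blanking step like this in sync with its test is to drive both from a single list of field names. The sketch below is a hypothetical refactoring, not code from this change: `FIELDS_BLANKED_ON_EXPIRY` and `blank_last_known_good_fields` are invented names, and the list contains only the fields visible in this hunk (the diff is truncated, so the real function may blank more).

```python
from typing import Any

# Field names copied from the hunk above; the list may be incomplete.
FIELDS_BLANKED_ON_EXPIRY = [
    "last_known_good_dataset_cached_dataset_xml_url",
    "last_known_good_dataset_cached_dataset_xml_etag",
    "last_known_good_dataset_cached_dataset_zip_url",
    "last_known_good_dataset_cached_dataset_zip_etag",
    "last_known_good_dataset_downloaded",
    "last_known_good_dataset_hash",
    "last_known_good_dataset_hash_excluding_generated_timestamp",
]


def blank_last_known_good_fields(bds_dataset: dict[str, Any]) -> None:
    """Set every last-known-good field to None, modifying the dict in place."""
    for field in FIELDS_BLANKED_ON_EXPIRY:
        bds_dataset[field] = None


# Hypothetical dataset record with stale cached values.
example = {field: "stale value" for field in FIELDS_BLANKED_ON_EXPIRY}
example["id"] = "hypothetical-id"

blank_last_known_good_fields(example)

print(all(example[f] is None for f in FIELDS_BLANKED_ON_EXPIRY))  # True
print(example["id"])  # hypothetical-id
```

With this shape, a test can loop over the same list, so adding a new cached field cannot be forgotten in one place but not the other.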
2 changes: 2 additions & 0 deletions tests/integration/test_dataset_add.py
@@ -101,6 +101,8 @@ def test_add_downloadable_dataset_for_various_encodings(

check_most_recent_http_attempt_for_success("get", datasets_in_bds[dataset_id])

assert datasets_in_bds[dataset_id]["most_recent_head_attempt_error_occurred"] is False

check_last_known_good_dataset_values_are_set(datasets_in_bds[dataset_id])

check_dataset_fields(
5 changes: 5 additions & 0 deletions tests/integration/test_dataset_expiry.py
@@ -36,6 +36,11 @@ def test_dataset_expiry_after_72_hours_failed_downloads(get_and_clear_up_context
dataset = datasets_in_bds[uuid.UUID("c8a40aa5-9f31-4bcf-a36f-51c1fc2cc159")]

assert len(datasets_in_bds) == 1

assert dataset["last_known_good_dataset_cached_dataset_xml_url"] is None
assert dataset["last_known_good_dataset_cached_dataset_xml_etag"] is None
assert dataset["last_known_good_dataset_cached_dataset_zip_url"] is None
assert dataset["last_known_good_dataset_cached_dataset_zip_etag"] is None
assert dataset["last_known_good_dataset_downloaded"] is None
assert dataset["last_known_good_dataset_hash"] is None
assert dataset["last_known_good_dataset_hash_excluding_generated_timestamp"] is None
7 changes: 7 additions & 0 deletions tests/unit/test_dataset_registration.py
@@ -5,6 +5,7 @@

import pytest

from bulk_data_service.dataset import create_empty_dataset
from dataset_registration.iati_registry_ckan import clean_datasets_metadata, convert_datasets_metadata


@@ -42,6 +43,12 @@ def test_incomplete_necessary_data_from_ckan(field_blanker, attribute_value):
assert(len(ckan_datasets) == 0)


def test_create_empty_dataset_error_occurred_defaults_to_false():
    ds = create_empty_dataset()
    assert ds["most_recent_head_attempt_error_occurred"] is False
    assert ds["most_recent_get_attempt_error_occurred"] is False


@pytest.mark.parametrize("resources_value", [None, [], {"url": None}])
def test_missing_url_from_ckan(resources_value):

2 changes: 1 addition & 1 deletion web/404.html
@@ -4,7 +4,7 @@
<head>
<meta charset="UTF-8">
<title>IATI Bulk Data Service</title>
-<link href="https://cdn.jsdelivr.net/npm/iati-design-system@4.0.0/dist/css/iati.min.css" rel="stylesheet" />
+<link href="https://cdn.jsdelivr.net/npm/iati-design-system@4.9.0/dist/css/iati.min.css" rel="stylesheet" />
</head>

<body class="iati-design-system--enabled">
2 changes: 1 addition & 1 deletion web/index-template.html
@@ -4,7 +4,7 @@
<head>
<meta charset="UTF-8">
<title>IATI Bulk Data Service</title>
-<link href="https://cdn.jsdelivr.net/npm/iati-design-system@4.6.0/dist/css/iati.min.css" rel="stylesheet" />
+<link href="https://cdn.jsdelivr.net/npm/iati-design-system@4.9.0/dist/css/iati.min.css" rel="stylesheet" />
</head>

<body class="iati-design-system--enabled">