Skip to content

Feat/add bulk presigned url#1344

Open
jacob50231 wants to merge 62 commits into
masterfrom
feat/add-bulk-presigned-url
Open

Feat/add bulk presigned url#1344
jacob50231 wants to merge 62 commits into
masterfrom
feat/add-bulk-presigned-url

Conversation

@jacob50231
Copy link
Copy Markdown
Contributor

Link to JIRA ticket if there is one:

New Features

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

Comment thread fence/blueprints/ga4gh.py Outdated
for access_id, object_ids in access_to_object_ids.items():
try:
result = bulk_get_signed_url_for_file(
file_ids=object_ids, requested_protocol=access_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to pass passports in here and make sure they are supported.
ga4gh_passports=bulk_request.passports

And we should reject if a request tries to give both a bearer token and a passport. See how the single-object drs endpoint works as reference e.g.

if request.headers.get("Authorization") and bulk_request.passports: raise Forbidden("Sending both a bearer token and passports is not allowed")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also unmock bulk_signed_url_for_file in tests and test both token and passports

Comment thread fence/blueprints/ga4gh.py Outdated


class BulkObjectAccessRequest(BaseModel):
passports: Optional[list[str]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to be passports: Optional[list[str]] = None or else it will still require the field I think. Write a test to that doesn't pass a passports key just to make sure.

Comment thread fence/blueprints/data/indexd.py Outdated
indexd_server = config.get("INDEXD") or config["BASE_URL"] + "/index"
url = indexd_server + "/index/"
try:
res = requests.get(url + file_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main idea was to use a single POST to indexd's /bulk/documents endpoint here

Comment thread fence/blueprints/ga4gh.py
unresolved_by_code[error_code].extend(object_ids)
except NotFound as e:
logger.debug(f"NotFound fetching signed URLs for {access_id}: {e}")
unresolved_by_code[404].extend(object_ids)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A request for [valid_guid, missing_guid] under the same access ID returns both as unresolved even though one could have been resolved.

use indexd's bulk documents endpoint and split returned/missing IDs into resolved and unresolved lists

return {"url": signed_url}


@enable_audit_logging
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function that runs within this decorator ( /fence/resources/audit/utils.py -> create_audit_log_for_request()) only will send logs for certain requests. We need to add support for the new endpoints to actually sned the logs to audit service

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking into this and would like a bit of clarification on what it usually looks like in our codebase. Should have pushed changes for the rest of the comments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/uc-cdis/fence/blob/master/fence/resources/audit/utils.py#L66-L95 This segment of code checks the path and makes decisions on auditing based on it, so you'll need to update the handler here to account for it.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Integration Tests

Test summary after running integration tests

filepath passed failed skipped SUBTOTAL
tests/test_drs_endpoint.py 4 0 20 24
tests/test_audit_service.py 3 0 3 6
tests/test_ras_authn.py 0 0 3 3
tests/test_data_upload.py 8 0 1 9
tests/test_dbgap.py 4 0 1 5
tests/test_user_login_activation.py 2 0 1 3
tests/test_oauth2.py 15 0 0 15
tests/test_centralized_auth.py 16 0 0 16
tests/test_presigned_url.py 7 1 0 8
tests/test_user_token.py 5 0 0 5
tests/test_fence_admin.py 2 0 0 2
tests/test_oidc_client.py 2 0 0 2
tests/test_google_data_access.py 1 0 0 1
tests/test_client_credentials.py 1 0 0 1
tests/test_register_user.py 2 0 0 2
TOTAL 72 1 29 102

Test summary after rerunning failed integration tests

filepath failed SUBTOTAL
tests/test_presigned_url.py 1 1
TOTAL 1 1

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here



@blueprint.route("/download/bulk", methods=["POST"])
def download_bulk_files():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the swagger / openapi yamls to include this endpoint.

return {"url": signed_url}


@enable_audit_logging
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/uc-cdis/fence/blob/master/fence/resources/audit/utils.py#L66-L95 This segment of code checks the path and makes decisions on auditing based on it, so you'll need to update the handler here to account for it.

@github-actions
Copy link
Copy Markdown

Integration Tests

filepath SUBTOTAL
TOTAL 0

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

Failed to Prepare CI environment.

The following pods are not in healthy state:
ohdsi-webapi-deployment-78c9c7b857-jslkw

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

filepath error SUBTOTAL
tests/test_aggregate_mds.py 1 1
tests/test_discoverypage.py 1 1
tests/test_etl.py 1 1
tests/test_pfb_export.py 1 1
tests/test_ras_authn.py 1 1
tests/test_study_viewer.py 1 1
TOTAL 6 6

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

Test summary after running integration tests

filepath passed failed SUBTOTAL
tests/test_presigned_url.py 7 1 8
TOTAL 7 1 8

Test summary after rerunning failed integration tests

filepath failed SUBTOTAL
tests/test_presigned_url.py 1 1
TOTAL 1 1

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

Test summary after running integration tests

filepath passed failed SUBTOTAL
tests/test_presigned_url.py 7 1 8
TOTAL 7 1 8

Test summary after rerunning failed integration tests

filepath failed SUBTOTAL
tests/test_presigned_url.py 1 1
TOTAL 1 1

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

Test summary after running integration tests

filepath passed failed SUBTOTAL
tests/test_presigned_url.py 7 1 8
TOTAL 7 1 8

Test summary after rerunning failed integration tests

filepath failed SUBTOTAL
tests/test_presigned_url.py 1 1
TOTAL 1 1

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

filepath passed SUBTOTAL
tests/test_presigned_url.py 8 8
TOTAL 8 8

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions
Copy link
Copy Markdown

Integration Tests

Test summary after running integration tests

filepath passed failed SUBTOTAL
tests/test_presigned_url.py 7 1 8
TOTAL 7 1 8

Test summary after rerunning failed integration tests

filepath failed SUBTOTAL
tests/test_presigned_url.py 1 1
TOTAL 1 1

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here


if file_authz:
authz_key = (
tuple(file_authz) if isinstance(file_authz, list) else file_authz
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need tuple(sorted(file_authz)) so it's the same as the line 644

default=default_expires_in,
)

if ga4gh_passports and not config["GA4GH_PASSPORTS_TO_DRS_ENABLED"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should move this check to the blueprint level before calling bulk_get_signed_url_for_file, ill put example there

Comment thread fence/blueprints/ga4gh.py
"You cannot supply both GA4GH passports and a token "
"in the Authorization header of a request."
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the passports enabled check here:

if bulk_request.passports and not config["GA4GH_PASSPORTS_TO_DRS_ENABLED"]:
    raise NotSupported(
        "Using GA4GH Passports as a means of authentication and authorization "
        "is not supported by this instance of Gen3."
    )

auth_info = _get_auth_info_for_id_or_from_request(
sub_type=int, db_session=db_session
)
flask.g.audit_data = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so I think there is a specific case here where if they authenticate with a Passport but no bearer token, _get_auth_info_for_id_or_from_request will set "username": auth_info["username"] for the audit log to be anonymous. So we should check for passports before here, and if they are using passports set the flask.g.audit_data username using users_from_passports.items()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants