
fix: initialize large-binary S3 bucket at startup to prevent NoSuchBucketException #4241

Merged
kunwp1 merged 4 commits into apache:main from kunwp1:chris-fix-4239
Mar 19, 2026

Conversation

@kunwp1
Contributor

kunwp1 commented Feb 26, 2026

What changes were proposed in this PR?

Previously, the texera-large-binaries S3 bucket was created lazily, only when a workflow first used a large-binary operator. If no large binary had ever been used, the bucket did not exist. As a result, WorkflowService.clearExecutionResources produced a NoSuchBucketException warning every time it ran at workflow initialization, because it unconditionally attempts to delete objects from that bucket.

This fix mirrors the existing pattern used for the dataset bucket in FileService: S3StorageClient.createBucketIfNotExist is now called once during FileService startup, so the bucket always exists before any workflow execution or cleanup runs. The per-call createBucketIfNotExist inside LargeBinaryManager.create() is therefore redundant and has been removed.
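To illustrate the failure mode and the fix described above, here is a minimal sketch in Python. It is not the Texera code: `InMemoryObjectStore`, `NoSuchBucket`, `start_service`, and `clear_execution_resources` are hypothetical stand-ins for the S3 client, the S3 exception, FileService startup, and WorkflowService.clearExecutionResources. It assumes only what the description states: cleanup deletes from the bucket unconditionally, and the bucket is created idempotently at startup.

```python
class NoSuchBucket(Exception):
    """Stand-in for the S3 NoSuchBucketException (hypothetical)."""


class InMemoryObjectStore:
    """Hypothetical in-memory stand-in for an S3 client; not a real AWS or Texera API."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {object key: bytes}

    def create_bucket_if_not_exist(self, bucket):
        # Idempotent: a second call leaves existing objects untouched.
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, data):
        if bucket not in self._buckets:
            raise NoSuchBucket(bucket)
        self._buckets[bucket][key] = data

    def delete_all_objects(self, bucket):
        if bucket not in self._buckets:
            raise NoSuchBucket(bucket)
        deleted = len(self._buckets[bucket])
        self._buckets[bucket].clear()
        return deleted


LARGE_BINARY_BUCKET = "texera-large-binaries"


def clear_execution_resources(store):
    """Cleanup that runs unconditionally at workflow initialization."""
    return store.delete_all_objects(LARGE_BINARY_BUCKET)


def start_service(store):
    """Startup hook: ensure the bucket exists before any workflow runs."""
    store.create_bucket_if_not_exist(LARGE_BINARY_BUCKET)


# Before the fix: nothing has created the bucket yet, so cleanup fails.
lazy_store = InMemoryObjectStore()
try:
    clear_execution_resources(lazy_store)
    failed_without_bucket = False
except NoSuchBucket:
    failed_without_bucket = True

# After the fix: startup creates the bucket, so cleanup on a fresh
# deployment finds an empty bucket instead of a missing one.
eager_store = InMemoryObjectStore()
start_service(eager_store)
deleted_on_fresh_bucket = clear_execution_resources(eager_store)
```

Because the startup call is idempotent, restarting the service against a bucket that already holds objects leaves them in place, which is why the per-call creation in the write path can be dropped safely in this sketch.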

Any related issues, documentation, discussions?

Fixes #4239

How was this PR tested?

Run any workflow and check that the NoSuchBucketException warning no longer appears.

Was this PR authored or co-authored using generative AI tooling?

Claude-4.6

@kunwp1
Contributor Author

kunwp1 commented Feb 26, 2026

I suggest that @mengw15 review this PR, since he filed the issue.

@chenlica
Contributor

@mengw15 Please review it.

Contributor

@mengw15 left a comment

LGTM

@chenlica
Contributor

@aicam Please do another review.

@kunwp1 requested a review from @aicam on February 26, 2026, 06:18
Contributor

@aicam left a comment

I think all S3-related operations should be in File Service for consistency.

github-actions bot added the service label and removed the engine label on Mar 19, 2026
@kunwp1
Contributor Author

kunwp1 commented Mar 19, 2026

@aicam Addressed. Can you review it again?

Contributor

@aicam left a comment

LGTM!

@kunwp1 merged commit a7c3341 into apache:main on Mar 19, 2026
12 checks passed

Development

Successfully merging this pull request may close these issues.

Running a workflow triggers S3 NoSuchBucketException for 'texera-large-binaries'

4 participants