User Story
As a developer, I would like to retry a different daemon if one fails so that the job can proceed uninterrupted.
Detailed Description
From the client's perspective, failures can be separated into the following categories:
- "Successfully delivered" failure: the remote task completed, but returned a non-zero exit code. This should not result in a retry, as the entire graph has transitively failed.
- Failure to execute the task: the daemon reports to the client that the job could not be run, for example because the Docker container could not be started. The client should retry with a different daemon.
- Timeout: the client should treat this as a network failure and retry with a different daemon, but it must tolerate a late result arriving from the daemon it gave up on.
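
The three categories above could map onto client logic roughly as follows. This is a minimal sketch, not the project's actual API: `Outcome`, `Result`, `run_with_failover`, and the `submit` callback are all hypothetical names introduced for illustration, and daemons are represented as plain strings.

```python
import enum
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


class Outcome(enum.Enum):
    """Hypothetical classification of what the client observed."""
    COMPLETED = "completed"                # task ran; exit code is meaningful
    EXECUTION_FAILED = "execution_failed"  # daemon could not run the task
    TIMEOUT = "timeout"                    # no reply in time; treated as network failure


@dataclass
class Result:
    outcome: Outcome
    exit_code: Optional[int] = None


def run_with_failover(
    daemons: Sequence[str],
    submit: Callable[[str], Result],
) -> Tuple[str, Result, List[str]]:
    """Try each daemon in order until one delivers a result.

    Returns the daemon that answered, its result, and the daemons the
    client gave up on after a timeout. A late result from one of those
    daemons may still arrive and should be tolerated by the caller.
    """
    timed_out: List[str] = []
    for daemon in daemons:
        result = submit(daemon)
        if result.outcome is Outcome.COMPLETED:
            # "Successfully delivered": return even on a non-zero exit
            # code, because retrying elsewhere would not help.
            return daemon, result, timed_out
        if result.outcome is Outcome.TIMEOUT:
            timed_out.append(daemon)
        # EXECUTION_FAILED and TIMEOUT both fall through to the next daemon.
    raise RuntimeError("no daemon was able to run the task")
```

In this sketch a non-zero exit code is returned to the caller rather than retried, and the list of timed-out daemons is handed back so the caller can recognize and discard late results from them.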
Tasks
Acceptance Criteria
Estimated points
3