
Add retry logic for failed BEAST2 simulations with cleanup #51

Draft
Copilot wants to merge 5 commits into main from copilot/improve-simulation-error-handling

Conversation


Copilot AI commented Jan 5, 2026

Failed simulations were not handled gracefully: they left intermediate files behind and no recovery was attempted. This PR adds automatic retries with an incremented seed and cleans up the files left by failed attempts.

Changes

  • Retry loop: Failed simulations (timeout or error) automatically retry up to max_retries times (default: 3) with an incremented seed
  • Cleanup on failure: Removes .tree and .traj files before each retry; cleans up .time file only on final failure
  • Seed validation: Validates beast_seed_str is a valid integer upfront
  • Time tracking: .time file now records total elapsed time across all attempts (only written on success)
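The four changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `run_with_retries`, `parse_seed`, `run_once`, and `output_stem` are hypothetical names, `max_retries` is assumed to count total attempts, and the sketch removes the `.tree` and `.traj` files immediately after each failed attempt rather than just before the next one.

```python
import subprocess
import time
from pathlib import Path


def parse_seed(beast_seed_str):
    """Validate the seed up front so a bad value fails before any run starts."""
    try:
        return int(beast_seed_str)
    except (TypeError, ValueError):
        raise ValueError(f"beast_seed_str must be an integer, got {beast_seed_str!r}")


def run_with_retries(run_once, output_stem, beast_seed_str, max_retries=3):
    """Call run_once(seed) until it succeeds or the attempts are used up.

    run_once is a hypothetical callable that runs one simulation and raises
    on timeout or error. Returns the seed that succeeded, or None.
    """
    base_seed = parse_seed(beast_seed_str)
    start = time.monotonic()
    for attempt in range(1, max_retries + 1):
        seed = base_seed + attempt - 1  # attempt 1 uses the base seed itself
        try:
            run_once(seed)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, RuntimeError):
            # Remove intermediate files left behind by the failed attempt.
            for suffix in (".tree", ".traj"):
                Path(str(output_stem) + suffix).unlink(missing_ok=True)
            continue
        # Success: record the total elapsed time across all attempts.
        elapsed = time.monotonic() - start
        Path(str(output_stem) + ".time").write_text(f"{elapsed:.3f}\n")
        return seed
    # Final failure: make sure no stale .time file is left behind.
    Path(str(output_stem) + ".time").unlink(missing_ok=True)
    return None
```

Passing the simulation as a callable keeps the retry policy separate from how BEAST2 is actually invoked, which also makes the loop easy to exercise with a stub that fails on demand.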

Usage

The function signature adds an optional max_retries parameter:

run_beast2_simulations_parallel(
    simulation_xml_list, 
    num_jobs=NUM_WORKERS, 
    beast_seed_str=str(CONFIG["seed"]),
    max_retries=3  # optional, defaults to 3
)

Each attempt uses seed base_seed + attempt - 1, where attempt counts from 1, so for a base seed of 42 the first three attempts use seeds 42, 43, and 44.
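As a quick check of the seed arithmetic, the schedule can be enumerated directly. `attempt_seeds` is a hypothetical helper written for this illustration and is not part of the PR; it assumes `max_retries` is the total number of attempts.

```python
def attempt_seeds(base_seed, max_retries=3):
    """List the seed used by each attempt, with attempts numbered from 1."""
    return [base_seed + attempt - 1 for attempt in range(1, max_retries + 1)]


print(attempt_seeds(42))  # [42, 43, 44]
```

Because the first attempt reuses the base seed unchanged, a simulation that succeeds on its first try behaves the same as it did before retries were added.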

Original prompt

This section details the original issue you should resolve

<issue_title>What happens to the failed simulations?</issue_title>
<issue_description>At the moment, we simulate more epidemics than needed so that even if a couple fail we still have enough to form the desired dataset. It would be better to have some safer code that checked if a simulation had worked and either cleaned up and retried, or moved on, depending upon the results.

The following subfunction, run_beast2(simulation_xml), probably needs to be expanded to check if a simulation has failed and if so, then to re-run it with a different seed (resampling the parameters could get very tricky so seems too risky). But this is made much harder because there is a seed parameter that is shared across all the subruns and would need to be handled in a different way. Potentially creating a local copy and then calling itself if it fails but with the local copy incremented or something. Intermediate files from the failed run would also need to be cleaned up.

def run_beast2(simulation_xml):

</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits January 5, 2026 13:53
Co-authored-by: aezarebski <36244258+aezarebski@users.noreply.github.com>
Co-authored-by: aezarebski <36244258+aezarebski@users.noreply.github.com>
…ssues

Co-authored-by: aezarebski <36244258+aezarebski@users.noreply.github.com>
Co-authored-by: aezarebski <36244258+aezarebski@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve error handling for failed simulations" to "Add retry logic for failed BEAST2 simulations with cleanup" Jan 5, 2026
Copilot AI requested a review from aezarebski January 5, 2026 13:58


Development

Successfully merging this pull request may close these issues.

What happens to the failed simulations?

2 participants