
RFC: Environment Wrap Actions #130

Open
leongdl wants to merge 1 commit into
OpenJobDescription:mainline from
leongdl:taskRunWrap


@leongdl commented Apr 17, 2026

PR for RFC

Tracking Issue: #132

This is a request for comments about RFC 0008 — Environment Wrap Actions, which
extends <Environment> with three new session actions — onWrapEnter,
onWrapTaskRun, and onWrapExit — that let an environment template intercept and
wrap the lifecycle actions of inner environments and tasks. A companion opt-out,
runOnHost: true on <Action>, lets individual actions bypass wrapping when they
must run on the host (credential fetching, mount setup, cleanup that must always
run).
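The dispatch rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RFC's normative algorithm: the `Action` and `Environment` classes, the snake_case field names, and `resolve_action` are hypothetical, and the sketch assumes the innermost enclosing environment that defines the matching hook is the one that wraps (the description above does not say how nested wrapping environments compose).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    command: str
    args: list[str] = field(default_factory=list)
    run_on_host: bool = False  # mirrors the proposed `runOnHost: true` opt-out


@dataclass
class Environment:
    name: str
    # Hypothetical Python spellings of onWrapEnter / onWrapTaskRun / onWrapExit.
    on_wrap_enter: Optional[Action] = None
    on_wrap_task_run: Optional[Action] = None
    on_wrap_exit: Optional[Action] = None


# Which wrap hook intercepts which kind of inner action.
HOOK_FOR_KIND = {
    "enter": "on_wrap_enter",       # an inner environment's onEnter
    "taskRun": "on_wrap_task_run",  # a task's onRun
    "exit": "on_wrap_exit",         # an inner environment's onExit
}


def resolve_action(
    kind: str, inner: Action, enclosing: list[Environment]
) -> tuple[Action, Optional[Action]]:
    """Return (action_to_run, wrapped_action_or_None).

    `enclosing` lists the environments already entered, outermost first.
    Assumption: the innermost enclosing environment that defines the
    matching hook does the wrapping.
    """
    if inner.run_on_host:
        return inner, None  # opt-out: run directly, bypassing any wrap hook
    hook_name = HOOK_FOR_KIND[kind]
    for env in reversed(enclosing):  # innermost first
        hook = getattr(env, hook_name)
        if hook is not None:
            # The runtime would expose `inner` to `hook` as template variables.
            return hook, inner
    return inner, None  # no wrapping environment: run the action as-is
```

For example, with a container environment entered first and a Conda-style environment (no hooks) entered inside it, a task's `onRun` resolves to the container's `onWrapTaskRun`, while an action marked `run_on_host=True` is returned unchanged.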

The primary motivation is portable container support: a Docker or Apptainer
environment template can start a container in onEnter, route every inner
environment's onEnter/onExit and every task's onRun into the container via the
three wrap hooks, and stop the container in onExit. Job templates and inner
environments remain portable across Conda, Rez, Docker, and Apptainer. The design
also generalizes to remote execution, session-wide instrumentation, and privilege
isolation.

This RFC is gated by the new WRAP_ACTIONS extension and depends on RFC 0002
(Model Extensions), RFC 0005 (Expression Language), and RFC 0006 (Expression
Function Library — for repr_sh() and friends).


By submitting this pull request, I confirm that you can use, modify, copy, and
redistribute this contribution, under the terms of your choice.

@leongdl changed the title from "[WIP] RFC: Environment On Task run wrap" to "RFC: Environment On Task run wrap" May 9, 2026
@leongdl changed the title from "RFC: Environment On Task run wrap" to "RFC: Environment Wrap Actions" May 9, 2026
@leongdl marked this pull request as ready for review May 9, 2026 21:46
@leongdl requested a review from a team as a code owner May 9, 2026 21:46
Signed-off-by: David Leong <116610336+leongdl@users.noreply.github.com>
intercept and wrap the lifecycle actions of *inner* environments and tasks. The runtime
supplies each wrap action with the wrapped action's command, args, timeout, cancelation
method, and environment variables as template variables. A companion opt-out,
`runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they

Contributor

Could we make `runOnHost` also controllable by the wrapping environment? Then the author of an external environment would have more control over this. If we leave it as is, we might want to find a different name, because the distinguishing factor is that it skips the wrapping: both wrapped and non-wrapped actions run on the host.

Author

Interesting, but wouldn't that require the wrapping environment to inspect, and have logic for, every line in the wrapped action?

Something like exporting environment variables on the host would be one case that is hard to detect.

(And I agree on the naming; it is hard.)
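The RFC states that the runtime supplies each wrap action with the wrapped action's command, args, timeout, cancelation method, and environment variables as template variables. A minimal sketch of that hand-off as a flat mapping follows. The variable names (`Wrapped.Command` and so on) are hypothetical placeholders, not the RFC's names (this thread mentions `Env.Wrapped.Timeout` and `Task.Timeout`), and `shlex.join`/`shlex.quote` stand in for the shell quoting that `repr_sh()` from RFC 0006 is meant to provide.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WrappedAction:
    """What the runtime hands a wrap hook about the action it intercepts."""
    command: str
    args: List[str] = field(default_factory=list)
    timeout_seconds: Optional[int] = None
    cancelation_mode: str = "TERMINATE"  # OpenJD modes: TERMINATE, NOTIFY_THEN_TERMINATE
    env: Dict[str, str] = field(default_factory=dict)


def to_template_vars(action: WrappedAction, prefix: str = "Wrapped") -> Dict[str, str]:
    """Flatten a wrapped action into string-valued template variables.

    Variable names are hypothetical. Quoting keeps arguments with spaces or
    shell metacharacters intact when a hook script splices them into a
    command line.
    """
    return {
        f"{prefix}.Command": action.command,
        f"{prefix}.Args": shlex.join(action.args),
        f"{prefix}.CommandLine": shlex.join([action.command, *action.args]),
        f"{prefix}.Timeout": "" if action.timeout_seconds is None else str(action.timeout_seconds),
        f"{prefix}.CancelationMode": action.cancelation_mode,
        f"{prefix}.Env": " ".join(
            shlex.quote(f"{key}={value}") for key, value in sorted(action.env.items())
        ),
    }
```

The quoting is the important design point: an argument such as `my file.exr` must survive being pasted into a wrap script, so `shlex.split` on the generated command line recovers the original argument list exactly.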

`runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they
must run on the host (credential fetching, mount setup, cleanup that must always run).

The primary motivation is container support: a Docker or Apptainer environment template

Contributor

This is the primary motivation, but it's worth mentioning that we want this to be a general composable feature to the extent possible. If we can avoid all references to containers in the structure of the extension (vs the docs and examples which should have many), I think that will be useful.

Author

Sounds good, and makes sense.

```yaml
onWrapTaskRun:
  command: "bash"
  args: ["{{Env.File.WrapTaskRun}}"]
  timeout: "{{Task.Timeout}}"
```

Contributor

The difference between `Env.Wrapped.Timeout` and `Task.Timeout` feels unnecessary. Switching to either `Task.Wrapped.Timeout` or `Env.Timeout` would make it consistent. Since the timeout is on the action itself, maybe it could be `Task.Action.Timeout` and `Env.Action.Timeout`?

We should also think about the general forwarding mechanism. How will we forward #118 when we add it? Can there be a general mechanism, or will we make the template author responsible for plumbing through each feature?

Author

Interesting, I had not seen #118; let me read it and incorporate it.
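To make the timeout forwarding concrete, here is a Python stand-in for what the `{{Env.File.WrapTaskRun}}` bash script might do: re-run the wrapped command inside an already-running container with `docker exec` and enforce the forwarded timeout. The function names, the container name, and the decision to enforce the timeout inside the hook (in addition to the runtime's own timeout on the wrap action) are assumptions for illustration. One caveat: killing the `docker exec` client is not guaranteed to stop the process inside the container, which is one reason forwarding the cancelation method deserves its own design.

```python
import subprocess
from typing import List, Mapping, Optional, Sequence


def docker_exec_argv(
    container: str,
    command: str,
    args: Sequence[str],
    env: Mapping[str, str],
    workdir: Optional[str] = None,
) -> List[str]:
    """Build a `docker exec` command line that runs the wrapped action in `container`."""
    argv = ["docker", "exec"]
    for key, value in sorted(env.items()):
        argv += ["--env", f"{key}={value}"]  # forward the wrapped action's environment
    if workdir is not None:
        argv += ["--workdir", workdir]  # e.g. the session directory, mounted at the same path
    return [*argv, container, command, *args]


def run_wrapped(argv: Sequence[str], timeout_seconds: Optional[float]) -> int:
    """Run argv with the forwarded timeout and return its exit code.

    subprocess.run kills the local child process when the timeout expires.
    """
    try:
        return subprocess.run(list(argv), timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return 124  # hypothetical convention, matching GNU `timeout`'s exit status
```

A hook would call `run_wrapped(docker_exec_argv(...), timeout)` with values taken from the forwarded template variables; whichever names the RFC settles on (`Task.Action.Timeout` or otherwise), the hook only needs one consistent place to read them from.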

### Apptainer environment template

The same pattern applies to Apptainer (daemonless, each wrap hook invokes
`apptainer exec` directly rather than exec'ing into a running container). See

Contributor

Can't it run as a daemon? This seems necessary for many use cases.

Author

I think "daemonless" here refers to Apptainer having no persistent daemon process like the Docker daemon, not to whether the container itself runs as a daemon. Maybe it's a framing difference.

[Appendix A: Apptainer environment template](#appendix-a-apptainer-environment-template)
for the full example.
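In the daemonless pattern, each hook invocation builds a fresh `apptainer exec` command line. A minimal sketch, assuming the session directory (and any other declared paths) must appear at the same path inside the container, which Apptainer's `--bind src:dest` option expresses directly; the function name and image path are hypothetical.

```python
from typing import List, Sequence


def apptainer_exec_argv(
    image: str, command: str, args: Sequence[str], same_path_binds: Sequence[str]
) -> List[str]:
    """Build one `apptainer exec` command line for a single wrapped action.

    Every host path in `same_path_binds` is mounted at the identical path inside
    the container, so file paths passed as arguments are valid on both sides.
    """
    argv = ["apptainer", "exec"]
    for path in same_path_binds:
        # --bind takes "src[:dest[:opts]]" and accepts comma-separated lists,
        # so ':' or ',' inside a path would be misparsed.
        if ":" in path or "," in path:
            raise ValueError(f"cannot express {path!r} as a --bind spec")
        argv += ["--bind", f"{path}:{path}"]
    return [*argv, image, command, *args]
```

Because nothing stays running between hook invocations, this variant has no container to stop in `onExit`; the trade-off is a new container start per wrapped action. Apptainer can also keep a container running in the background with `apptainer instance start`, which would bring it closer to the Docker pattern.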

### Job template that works with any environment

Contributor

It should just be able to point to our existing samples. Maybe add a technical requirement that jobs written the way we already write them must work with or without wrapping, to the extent we can do that.

I guess the purpose of this is to show the `runOnHost` option?

Author

Yeah, exactly. `runOnHost` was an escape hatch.

Let's chat offline about how we can offer the escape hatch: either within the hook, or as an explicit declaration.

often where performance problems hide.

5. **Privilege isolation.** Run inner actions as a different user or with reduced
capabilities by wrapping the command with `sudo -u`, `unshare`, or a jailed shell.

Contributor

If the inner environments and task runs can select runOnHost then this use case is weaker.

Author

True, the `runOnHost` escape hatch would change this. Nonetheless, examples such as running a container could still use `sudo -u`.

I'll reframe this as an addendum use case, not necessarily an argument.
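The point that a container wrapper could still drop privileges with `sudo -u` is an instance of composing wrappers, which fits the goal of a general, composable feature. A sketch that treats each wrapper as a function from argv to argv; the function names, the `render` user, and the container name are hypothetical. (Docker's own `docker exec --user` achieves the same without needing `sudo` in the image; the sketch is only about composition.)

```python
from functools import reduce
from typing import Callable, List

Wrapper = Callable[[List[str]], List[str]]


def sudo_as(user: str) -> Wrapper:
    """Run the wrapped argv as `user`; `--` ends sudo's own option parsing."""
    return lambda argv: ["sudo", "-u", user, "--", *argv]


def in_container(container: str) -> Wrapper:
    """Run the wrapped argv inside an already-running container."""
    return lambda argv: ["docker", "exec", container, *argv]


def compose(*wrappers: Wrapper) -> Wrapper:
    """Apply wrappers in order: the first wraps the action directly,
    and each later one wraps the result of the previous one."""
    return lambda argv: reduce(lambda acc, wrap: wrap(acc), wrappers, list(argv))
```

Order matters: `compose(sudo_as("render"), in_container("job-ctr"))` runs `sudo` inside the container, while reversing the arguments would run `docker exec` itself as `render` on the host.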

4. **Cross-OS wrapping.** The same-path bind-mount requirement assumes the host and
the wrapped execution context share path-separator conventions (Linux host with
Linux container, or Windows with Windows). Cross-OS wrapping (e.g., a Windows host
launching a Linux container) is not supported by this RFC.

Contributor

The responsibility of the RFC would be to provide all information necessary to do this Cross-OS wrapping, not to assist in any way. I think it does that?

Author

@leongdl May 12, 2026

True, let me delete it.

Along the way I was wondering why anyone would wrap Windows containers on Linux, or vice versa. Linux containers on Windows absolutely work, but the experience through WSL is terrible.

I was originally thinking of cutting some scope, or deferring this family of cases until later.


### Security: container isolation boundaries

Each container instance should be scoped to a single security boundary. In the

Contributor

The spec isn't about containers; it's about the ability to perform action wrapping reliably. I think the scheduler has no responsibility regarding containers; its responsibility is purely to provide all the specified arguments to the wrap* actions. The RFC should focus exclusively on the wrapping, where it concerns OpenJD behavior, and talk about containers in examples that are about containers.

Author

Sounds good - I'll do a pass and move this into my separate design doc instead.

2. The container does not run with elevated privileges (`--privileged`) unless explicitly
configured by the environment template author.
3. Bind mounts are scoped to the session working directory and explicitly declared paths,
not the entire host filesystem.

Contributor

None of these are scheduler responsibilities - they are the responsibility of the environment implementer writing support for containers. The spec should ensure that the environment has sufficient capabilities to do what it needs to do.

Author

True. I think the AI that drafted this treated "deadline" as a whole as the scheduler. I'll delete these.


1. *onEnter* — The action to run when entering the environment.
2. *onWrapEnter* — If provided, this action is run instead of the `onEnter` action of

Contributor

Consider adding an all-or-nothing rule: either all `onWrap*` actions must be included, or none of them. This would guard against accidentally implementing only part of the wrapping and then getting hard-to-debug results because not everything runs where expected.

Author

Good suggestion! I'll add that into the spec.

While the template YAML/JSON is slightly larger, it is much harder to do the wrong thing.
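The all-or-nothing rule is straightforward to enforce at template validation time. A sketch that checks an environment's parsed `actions` mapping (action names as in the RFC; the function name and error wording are hypothetical):

```python
from typing import Mapping

WRAP_ACTIONS = ("onWrapEnter", "onWrapTaskRun", "onWrapExit")


def validate_wrap_actions(actions: Mapping[str, object]) -> None:
    """Raise ValueError unless all or none of the onWrap* actions are defined."""
    present = [name for name in WRAP_ACTIONS if name in actions]
    if present and len(present) != len(WRAP_ACTIONS):
        missing = [name for name in WRAP_ACTIONS if name not in actions]
        raise ValueError(
            f"wrap actions must be defined together: found {', '.join(present)}, "
            f"missing {', '.join(missing)}"
        )
```

An environment with only `onEnter`/`onExit` passes, as does one defining all three wrap hooks; defining just `onWrapTaskRun` fails with a message naming the two missing hooks.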


2 participants