
Add support for Markovian Text Environments#1094

Open
DhananjayAshok wants to merge 4 commits into NovaSky-AI:main from DhananjayAshok:main

Conversation


@DhananjayAshok DhananjayAshok commented Feb 12, 2026

The Markov assumption treats an environment as fully described by its present state: given the present state, the past is irrelevant for assessing transition dynamics, rewards, etc.

This is a useful framework that applies in a range of scenarios:

  • Chat domains where an external summarisation system, independent of the agent, provides complete information in the context of each individual prompt.
  • Text-based games where the agent sees an "observation" that fully describes its current situation.
  • Long context domains where the agent can summarise its own history and hence avoid a blow-up in input length.

Currently, SkyRL always provides the full chat history to the generator, even in the step_wise_trajectories mode. This change adds a flag, generator.previous_observation_only, that restricts the generator's input to only the previous observation.
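A minimal sketch of the difference between the two modes. All names here (build_generator_input, the message layout, the role used for observations) are illustrative assumptions, not SkyRL's actual API:

```python
# Hypothetical sketch: select what the generator sees at each step.
# Assumes observations arrive as "user" messages, the newest last.
def build_generator_input(chat_history, previous_observation_only=False):
    if not previous_observation_only:
        # Default behaviour: the full conversation so far.
        return list(chat_history)
    # Markovian mode: keep only the most recent observation message.
    last_obs = [m for m in chat_history if m["role"] == "user"][-1:]
    if not last_obs:
        raise ValueError("previous_observation_only is enabled, "
                         "but no observation is available.")
    return last_obs

history = [
    {"role": "user", "content": "obs t=0"},
    {"role": "assistant", "content": "action t=0"},
    {"role": "user", "content": "obs t=1"},
]
# Full-history mode returns everything; Markovian mode returns only obs t=1.
assert build_generator_input(history) == history
assert build_generator_input(history, previous_observation_only=True) == [
    {"role": "user", "content": "obs t=1"}
]
```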




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Markovian text environments by adding a previous_observation_only flag. The changes are well-implemented, including necessary configuration updates, validation logic, and modifications to the agent loop to use only the previous observation when the flag is enabled. The fix to use initial_input_ids for prompt_ids is a good catch. I have one suggestion to improve the robustness of an assertion.

Comment on lines +309 to +311
assert (
use_chat_history != []
), "previous_observation_only is enabled, but recieved an empty observation from the environment. Ensure that the environment always returns at least one observation message when `step` is called."


Severity: medium

For better robustness, it's recommended to use a ValueError check instead of an assert. assert statements are disabled when Python is run in optimized mode (e.g., with the -O flag), which could lead to this important check being skipped in production. Since this validates input from a potentially user-defined environment, a ValueError ensures the check is always performed. I've also corrected a typo in the message.

if not use_chat_history:
    raise ValueError("previous_observation_only is enabled, but received an empty observation from the environment. Ensure that the environment always returns at least one observation message when `step` is called.")
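To illustrate the reviewer's point about assert being stripped in optimized mode, here is a small self-contained check (this demo is my addition, not part of the PR):

```python
import subprocess
import sys

# Run the same failing assert with and without -O. Under -O, Python
# removes assert statements entirely, so the check silently vanishes.
plain = subprocess.run([sys.executable, "-c", "assert False"],
                       capture_output=True)
optimized = subprocess.run([sys.executable, "-O", "-c", "assert False"],
                           capture_output=True)

assert plain.returncode != 0      # AssertionError: process exits non-zero
assert optimized.returncode == 0  # assert stripped: program exits cleanly
```

A raise ValueError(...) is immune to -O, which is why it is preferred for validating user-supplied environments.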


@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.



@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 9 additional findings in Devin Review.


"output_response": output_responses[i],
"score": concat_generator_outputs["rewards"][i],
"stop_reason": concat_generator_outputs.get("stop_reasons", [None] * len(input_prompts))[i],
"is_last_step": concat_generator_outputs.get("is_last_step", [None] * len(input_prompts))[i],


🔴 None[i] TypeError crash in dump_per_dataset_eval_results for non-step-wise evaluation

When dump_per_dataset_eval_results is called from the non-step-wise evaluate() path (i.e., step_wise_trajectories=False), the new "is_last_step" line will crash with TypeError: 'NoneType' object is not subscriptable.

Root Cause

In non-step-wise mode, the generator output includes "is_last_step": None (skyrl-train/skyrl_train/generators/skyrl_gym_generator.py:761). When this None value propagates through concatenate_generator_outputs into concat_generator_outputs, the new line:

"is_last_step": concat_generator_outputs.get("is_last_step", [None] * len(input_prompts))[i],

attempts dict.get("is_last_step", default). Since the key "is_last_step" exists in the dict (with value None), Python's dict.get() returns None — NOT the default list. The subsequent [i] indexing on None raises TypeError.
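The dict.get() behaviour described above is easy to verify in isolation; the keys here mirror the generator output shape but the literal values are illustrative:

```python
# dict.get(key, default) falls back to the default only when the key is
# ABSENT, not when its value is None.
outputs = {"rewards": [1.0], "is_last_step": None}  # key present, value None

value = outputs.get("is_last_step", [None] * 1)
assert value is None  # the default list is NOT returned

# So indexing the result crashes, exactly as the finding describes:
try:
    value[0]
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable

# A truthy guard handles both a None value and a missing key:
safe = outputs["is_last_step"][0] if outputs.get("is_last_step") else None
assert safe is None
```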

This is in contrast with the "trajectory_ids" handling added in the same PR, which correctly uses a truthy guard:

"instance_id": (
    concat_generator_outputs["trajectory_ids"][i].instance_id
    if concat_generator_outputs.get("trajectory_ids")
    else None
),

Impact: Any non-step-wise evaluation run with dump_eval_results=True (the default) will crash when writing eval results, preventing eval output from being saved.

Suggested change:
- "is_last_step": concat_generator_outputs.get("is_last_step", [None] * len(input_prompts))[i],
+ "is_last_step": (
+     concat_generator_outputs["is_last_step"][i]
+     if concat_generator_outputs.get("is_last_step")
+     else None
+ ),

