
WIP: Just for reference#1735

Draft
paul-nechifor wants to merge 2 commits intodevfrom
paul/feat/dimsim

Conversation

@paul-nechifor
Contributor

No description provided.

@paul-nechifor paul-nechifor marked this pull request as draft April 3, 2026 04:02
@greptile-apps
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR introduces a new DimSimConnection backend for the Unitree Go2 robot, refactors GlobalConfig.simulation from a bool to a named str (e.g. "mujoco", "dimsim"), adds a JPEG fast-path in Image.lcm_decode, and wires DimSimConnection into the existing make_connection factory and PersonFollowSkillContainer.

  • P1: stderr consumed before error handler reads it (dimsim_connection.py lines 204-215). _start_log_reader() drains the stderr pipe in a background thread before the readiness-wait loop; the early-exit handler's stderr.read() will almost always return empty bytes.
  • P1: @functools.cache on instance methods (dimsim_connection.py lines 357-385). The cache lives at the class level with self as key, causing memory leaks and making cache_clear() in stop() clear entries for all instances globally.

Confidence Score: 3/5

Not safe to merge as-is; two P1 bugs in the new DimSimConnection affect error diagnostics and memory/cache correctness.

The two P1 findings in dimsim_connection.py (the stderr/log-reader race that silently drops error output on startup failure, and the class-level functools.cache that leaks memory and incorrectly clears other instances' caches) are both defects in the changed code and should be fixed before merging.

dimos/robot/unitree/dimsim_connection.py requires attention for the stderr drain race and the @functools.cache instance-method misuse.

Important Files Changed

Filename Overview
dimos/robot/unitree/dimsim_connection.py New DimSim simulator connection; has two P1 issues: stderr pipe is drained by the log-reader thread before the early-exit error handler can read it, and @functools.cache on instance methods causes a class-level cache leak and incorrect cross-instance invalidation in stop().
dimos/core/global_config.py Changes simulation from bool to str; adds dimsim_scene and dimsim_port fields. The unitree_connection_type property correctly returns the string value when set.
dimos/agents/skills/person_follow.py Updated to branch on string comparisons for "mujoco"/"dimsim" and wraps value in bool() for VisualServoing2D; straightforward adaptation.
dimos/msgs/sensor_msgs/Image.py Adds a JPEG fast-path in lcm_decode that delegates to lcm_jpeg_decode when encoding is "jpeg", avoiding incorrect raw-array reshape.
dimos/robot/unitree/go2/connection.py Adds a "dimsim" branch in make_connection that instantiates DimSimConnection; mirrors the existing "mujoco" pattern cleanly.
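The JPEG fast-path described for Image.lcm_decode can be sketched roughly as follows. This is an illustrative reconstruction: the message fields and the decoder helpers (lcm_jpeg_decode, raw_array_decode) are assumptions, not the real dimos.msgs.sensor_msgs.Image API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the LCM image message; field names are assumptions.
@dataclass
class LcmImageMsg:
    encoding: str
    data: bytes

def lcm_jpeg_decode(msg: LcmImageMsg) -> str:
    return "decoded-jpeg"  # stand-in for a real JPEG decoder

def raw_array_decode(msg: LcmImageMsg) -> str:
    return "decoded-raw"   # stand-in for the existing raw reshape path

def lcm_decode(msg: LcmImageMsg) -> str:
    # A compressed JPEG payload must not be reshaped as a raw pixel array;
    # branch to the dedicated JPEG decoder first, as the PR summary describes.
    if msg.encoding == "jpeg":
        return lcm_jpeg_decode(msg)
    return raw_array_decode(msg)
```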

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant DimSimConnection
    participant subprocess as DimSim Process
    participant LCMThread as LCM Listener Thread
    participant LogReader as Log Reader Thread
    participant StreamThread as Stream Polling Thread

    Caller->>DimSimConnection: start()
    DimSimConnection->>subprocess: Popen(dimsim dev ...)
    DimSimConnection->>LogReader: _start_log_reader() [reads stdout/stderr]
    DimSimConnection->>LCMThread: _start_lcm_listener()
    loop Wait for odom (60s timeout)
        DimSimConnection->>subprocess: process.poll()
        subprocess-->>DimSimConnection: None (still running)
        LCMThread-->>DimSimConnection: _on_lcm_message("/odom") _odom_seq++
    end
    DimSimConnection->>Caller: ready
    Caller->>DimSimConnection: video_stream()
    DimSimConnection->>StreamThread: _create_stream(getter, 20fps)
    loop 20 FPS
        StreamThread-->>Caller: observer.on_next(Image)
    end
    Caller->>DimSimConnection: stop()
    DimSimConnection->>StreamThread: stop_event.set()
    DimSimConnection->>LCMThread: _stop_event.set()
    DimSimConnection->>subprocess: terminate() / kill()
```

Reviews (1): Last reviewed commit: "002"

Comment on lines +204 to +217
```python
self._start_log_reader()
self._start_lcm_listener()

# Wait for first odom message as readiness signal.
timeout = 60.0
start_time = time.time()
while time.time() - start_time < timeout:
    if self.process.poll() is not None:
        exit_code = self.process.returncode
        stderr = ""
        if self.process.stderr:
            stderr = self.process.stderr.read().decode(errors="replace")
        self.stop()
        raise RuntimeError(f"DimSim process exited early (code {exit_code})\n{stderr}")
```

P1 stderr already consumed by log reader thread

_start_log_reader() launches a daemon thread at line 204 that iterates over self.process.stderr. Then, when the early-exit branch is reached (line 211), self.process.stderr.read() at line 215 races with that thread — the log reader has likely already drained the pipe, so stderr will almost always be empty, making the error message useless for diagnosing why DimSim failed to start.

Either start the log reader after the readiness-wait loop, or buffer the subprocess output with a deque in the log reader thread so the error handler can read from that buffer instead of the pipe directly.
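The second option (buffering output in the log reader) could look roughly like the sketch below. The class and attribute names are illustrative, not the real DimSimConnection API; the point is that the reader thread retains a bounded tail of stderr that the error handler can consult after the pipe has been drained.

```python
import collections
import subprocess
import threading

# Hypothetical sketch: the log-reader thread keeps a bounded tail of the
# child's stderr so the early-exit handler can still report it even though
# the pipe itself has been drained. Names here (_stderr_tail,
# _start_log_reader, recent_stderr) are assumptions, not the real API.
class LogBufferingReader:
    def __init__(self, process: subprocess.Popen, max_lines: int = 200) -> None:
        self.process = process
        # deque(maxlen=...) keeps only the most recent lines, bounding memory.
        self._stderr_tail: collections.deque = collections.deque(maxlen=max_lines)
        self._thread: threading.Thread | None = None

    def _start_log_reader(self) -> None:
        def reader() -> None:
            assert self.process.stderr is not None
            for raw in self.process.stderr:  # bytes lines from the pipe
                self._stderr_tail.append(raw.decode(errors="replace"))
        self._thread = threading.Thread(target=reader, daemon=True)
        self._thread.start()

    def recent_stderr(self) -> str:
        # Safe for the readiness-wait loop to call after the process exits.
        return "".join(self._stderr_tail)
```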

Comment on lines +357 to +385
```python
@functools.cache
def lidar_stream(self) -> Observable[PointCloud2]:
    def getter() -> PointCloud2 | None:
        if self._lidar_seq > self._last_lidar_seq:
            self._last_lidar_seq = self._lidar_seq
            return self._latest_lidar
        return None

    return self._create_stream(getter, _LIDAR_FPS, "Lidar")

@functools.cache
def odom_stream(self) -> Observable[PoseStamped]:
    def getter() -> PoseStamped | None:
        if self._odom_seq > self._last_odom_seq:
            self._last_odom_seq = self._odom_seq
            return self._latest_odom
        return None

    return self._create_stream(getter, _ODOM_FPS, "Odom")

@functools.cache
def video_stream(self) -> Observable[Image]:
    def getter() -> Image | None:
        if self._image_seq > self._last_image_seq:
            self._last_image_seq = self._image_seq
            return self._latest_image
        return None

    return self._create_stream(getter, _VIDEO_FPS, "Video")
```

P1 @functools.cache on instance methods — class-level cache causes leaks and cross-instance invalidation

functools.cache (an alias for lru_cache(maxsize=None)) stores its cache on the function object, not on self. The result is:

  1. Memory leak — the cache keeps a strong reference to every self that was ever passed as a key, preventing garbage collection.
  2. cache_clear() in stop() is global — calling self.lidar_stream.cache_clear() clears the entries for all DimSimConnection instances, not just the one being stopped.

The standard fix is to store the Observable in an instance dict on first call and reset it in stop() instead of relying on cache_clear().
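That per-instance pattern could be sketched as below. The class name, the placeholder stream factory, and stop() are illustrative, not the real DimSimConnection; what matters is that the memo lives in an instance dict, so nothing at class level holds a reference to self, and clearing it affects only this object.

```python
from typing import Any, Callable

# Hypothetical sketch of per-instance memoisation replacing @functools.cache.
# Names (_streams, _cached_stream, stop) are assumptions for illustration.
class Connection:
    def __init__(self) -> None:
        self._streams: dict[str, Any] = {}

    def _cached_stream(self, name: str, factory: Callable[[], Any]) -> Any:
        # Memo lives on the instance, so the object can be garbage-collected
        # normally and invalidation never touches other instances.
        if name not in self._streams:
            self._streams[name] = factory()
        return self._streams[name]

    def lidar_stream(self) -> Any:
        return self._cached_stream("lidar", lambda: object())  # placeholder factory

    def stop(self) -> None:
        # Resets only this instance's cached streams.
        self._streams.clear()
```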

Comment on lines +250 to +252
```python
if self.process:
    if self.process.stderr:
        self.process.stderr.close()
```

P2 stdout pipe is never closed in stop()

stop() closes self.process.stderr but leaves self.process.stdout open, leaking the file descriptor. Consider closing both:

Suggested change

```python
if self.process:
    if self.process.stderr:
        self.process.stderr.close()
    if self.process.stdout:
        self.process.stdout.close()
```

Comment on lines +359 to +365
```python
def getter() -> PointCloud2 | None:
    if self._lidar_seq > self._last_lidar_seq:
        self._last_lidar_seq = self._lidar_seq
        return self._latest_lidar
    return None

return self._create_stream(getter, _LIDAR_FPS, "Lidar")
```

P2 Unsynchronised access to shared state in stream getters

The getter closures read and write self._lidar_seq, self._last_lidar_seq, and self._latest_lidar from polling threads while the LCM callback thread writes them concurrently. There is a TOCTOU window where _latest_lidar can be None even after the sequence check passes. A threading.Lock protecting reads and writes to these pairs would close the window.
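The lock-protected version could look like this sketch for the lidar case. The class and field names mirror the snippet above but are illustrative; the key point is that the sequence counter and the payload are read and written as one atomic unit under a single threading.Lock.

```python
import threading

# Hypothetical sketch: one lock guards the seq/payload pair so the polling
# getter and the LCM callback never interleave mid-update. Field names
# mirror the review snippet; the class itself is illustrative.
class LidarState:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lidar_seq = 0
        self._last_lidar_seq = 0
        self._latest_lidar = None

    def on_lcm_message(self, cloud) -> None:
        with self._lock:
            # Payload and sequence advance atomically, so the getter can
            # never observe a bumped sequence with a stale/None payload.
            self._latest_lidar = cloud
            self._lidar_seq += 1

    def getter(self):
        with self._lock:
            if self._lidar_seq > self._last_lidar_seq:
                self._last_lidar_seq = self._lidar_seq
                return self._latest_lidar
            return None
```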

