Zenoh Investigation #1726

@leshy

Intro

Dimos is a distributed multiprocess architecture in which processes exchange predefined message types over predefined transports.

Most of our code is in Python, which is our prototyping and orchestration language, but we also support modules written in several other languages.

Problem

Our transports until now have been on easy mode: local RPC/pubsub only. Delivery has in practice been guaranteed, so we were mostly concerned with backpressure, throughput and latency. The LCM protocol (local UDP multicast) worked great.

As we start working on teleop, multi-embodiment coordination (on shared networks with different underlying capabilities), heavy real-time computation offloaded to remote clusters, and similar, we need to start thinking about more mature communication protocols.

Zenoh seems promising!

Some intro questions (I'm sure there are many more critical ones; this is just to kickstart how we think about this):

  • can we get reliable query/response in a multi-robot system on unreliable local networks (assume anything from WiFi to GSM to a LoRa mesh)?
  • can we get pubsub? How does it work: autodiscovery, distributed vs. broker? How would we run a broker in a multi-robot setup? How does distributed pubsub work?
  • can we run a single global mapper node via that pubsub (collecting lidar frames from all robots, assembling a global map, planning, streaming paths back to the robots and their local planners)?
  • assume we need to upload video from multiple robots sharing a local robot network, with only one (or just a few) of the robots having an internet connection
  • assume we need to run teleop, i.e. transport efficiently encoded low-latency video in one direction and light control messages in the other (NAT traversal?)
  • assume we are on a high-throughput network with an internet connection: can we upload all sensor data to a cluster in real time? Can we stream control commands to the robot?
  • assume we want to send requests to a remote server and potentially stream the responses; requests could be image frames, LLM queries, etc.
  • can we have global, local and internet-level pubsubs? How do we control what goes where and when?
  • where are backpressure strategies executed: broker, receiver or emitter (and which strategies)? What is the appropriate backpressure approach in different scenarios?
  • what QoS strategies are available?
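
To make the backpressure question concrete, here is a minimal sketch of the two basic policies we would be choosing between, in plain Python with no Zenoh dependency. The class names `DropOldestBuffer` and `RejectNewestBuffer` are hypothetical and exist only for illustration. As far as I know Zenoh exposes a similar drop-or-block choice as a congestion control setting on publishers, but where it is enforced and how it maps onto our scenarios is part of what needs investigating.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class DropOldestBuffer(Generic[T]):
    """Bounded buffer that evicts the oldest item when full.

    Hypothetical helper: suits streams where only fresh data matters
    (lidar frames, video), since stale items are discarded first.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: Deque[T] = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, item: T) -> None:
        # deque(maxlen=...) discards from the left when full,
        # so count the eviction before appending.
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(item)

    def pop(self) -> Optional[T]:
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


class RejectNewestBuffer(Generic[T]):
    """Bounded buffer that refuses new items when full.

    Hypothetical helper: suits messages that must not be lost silently
    (control commands, RPC), since the sender learns about the overflow
    and can block, retry or slow down.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: Deque[T] = deque()

    def push(self, item: T) -> bool:
        if len(self._items) >= self._capacity:
            return False  # signal backpressure to the caller
        self._items.append(item)
        return True

    def pop(self) -> Optional[T]:
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

The same two policies can sit at the emitter, at an intermediate router or broker, or at the receiver, which is why the question of where they execute matters as much as which one is used.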

Other useful features

Zenoh is fancy and I don't know everything it offers, but features like these would be interesting to know about:

  • etcd-style distributed KV store?
  • Raft-style automatic master node election?
  • meshing?

Example scenario

Write a Zenoh transport and configure it well enough to split the system: local control, with global mapping and path planning running remotely on a laptop, or a slow GPU-heavy module offloaded to a server.
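
A rough sketch of what "splitting the system" could look like from the module side, assuming modules talk to a small pubsub interface and a router picks the transport by topic prefix. Everything here (`Transport`, `InProcessTransport`, `ScopedRouter`, the topic names) is hypothetical and not existing Dimos or Zenoh API. `InProcessTransport` is only a stand-in; a Zenoh-backed class would implement the same two methods.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Protocol

Handler = Callable[[str, bytes], None]


class Transport(Protocol):
    """Minimal pubsub surface a module would code against (hypothetical)."""

    def publish(self, topic: str, payload: bytes) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InProcessTransport:
    """Stand-in transport that delivers synchronously within one process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._handlers.get(topic, [])):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)


class ScopedRouter:
    """Routes each topic to a transport by longest matching prefix."""

    def __init__(self, default: Transport) -> None:
        self._default = default
        self._routes: Dict[str, Transport] = {}

    def route(self, prefix: str, transport: Transport) -> None:
        self._routes[prefix.rstrip("/")] = transport

    def transport_for(self, topic: str) -> Transport:
        best: Optional[str] = None
        for prefix in self._routes:
            # Match whole path segments only: "global" must not
            # capture a topic such as "globalish/x".
            matches = topic == prefix or topic.startswith(prefix + "/")
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        return self._default if best is None else self._routes[best]

    def publish(self, topic: str, payload: bytes) -> None:
        self.transport_for(topic).publish(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.transport_for(topic).subscribe(topic, handler)


# Usage: control topics stay local, mapping topics go to the remote side.
local = InProcessTransport("local")
remote = InProcessTransport("remote")  # would be the Zenoh transport
router = ScopedRouter(default=local)
router.route("global", remote)

received: List[bytes] = []
router.subscribe("global/map/lidar", lambda topic, data: received.append(data))
router.publish("global/map/lidar", b"frame-0")
```

Keeping the routing decision in one place would let the same module code run fully local (as today) or split across machines by changing only the router configuration.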
