Zenoh Investigation #1726
Intro
Dimos is a distributed multiprocess architecture in which processes exchange predefined message types over predefined transports.
Most of our code is in Python. Python is our prototyping and orchestration language, but we also support modules written in several other languages.
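Because transports are already a pluggable piece of the architecture, a Zenoh backend would presumably sit behind the same interface as the existing LCM one. The sketch below assumes a hypothetical `Transport` interface (`publish`/`subscribe` on topic strings with encoded `bytes` payloads) and an in-process loopback implementation; these names are illustrative only, not the actual Dimos API.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical callback type: receives the already-encoded message bytes.
Callback = Callable[[bytes], None]


class Transport(ABC):
    """Hypothetical minimal pubsub interface that LCM, Zenoh, etc. backends would implement."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, callback: Callback) -> None: ...


class InProcessTransport(Transport):
    """Loopback transport for tests: delivers synchronously to subscribers in this process."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callback]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        # .get() avoids creating empty entries for topics nobody listens to.
        for cb in list(self._subs.get(topic, [])):
            cb(payload)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subs[topic].append(callback)


# Usage: modules only see the Transport interface, so the backend can be swapped.
bus = InProcessTransport()
odom: List[bytes] = []
bus.subscribe("robot1/odom", odom.append)
bus.publish("robot1/odom", b"\x01\x02")
bus.publish("robot1/lidar", b"ignored")  # no subscriber on this topic
```

A Zenoh-backed class would implement the same two methods, which keeps the choice of transport a deployment decision rather than a per-module one.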
Problem
Until now our transports have been on easy mode: local RPC/pubsub only, so we've effectively had guaranteed delivery and were mostly concerned with backpressure, throughput and latency. The LCM protocol (local UDP multicast) worked great.
As we start working on teleop, multi-embodiment coordination (on shared networks with differing underlying capabilities), heavy real-time computation offloaded to remote clusters, and similar, we need to start thinking about more mature communication protocols.
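As an illustration of what "more mature" means on lossy links: a request/response call can no longer assume delivery, so the client needs timeouts and retries, and retries mean the server may see the same request twice. One common pattern is at-least-once retries plus server-side deduplication by request id. The sketch below simulates this with stdlib only; `LossyLink` and `query_with_retry` are hypothetical names, and whether Zenoh's own reliability and query timeouts cover this for us is one of the open questions.

```python
import random
import uuid
from typing import Callable, Dict, Optional


class LossyLink:
    """Simulated unreliable link: each request and each reply is dropped with probability `loss`."""

    def __init__(self, handler: Callable[[bytes], bytes], loss: float, seed: int = 0) -> None:
        self._handler = handler
        self._loss = loss
        self._rng = random.Random(seed)
        self._replies: Dict[str, bytes] = {}  # server-side reply cache keyed by request id
        self.handler_calls = 0  # how many times the real handler actually ran

    def call(self, request_id: str, payload: bytes) -> Optional[bytes]:
        if self._rng.random() < self._loss:
            return None  # request lost on the way out
        if request_id not in self._replies:
            # Deduplication: a retried request reuses the cached reply instead of re-running.
            self.handler_calls += 1
            self._replies[request_id] = self._handler(payload)
        if self._rng.random() < self._loss:
            return None  # reply lost on the way back
        return self._replies[request_id]


def query_with_retry(link: LossyLink, payload: bytes, attempts: int = 10) -> bytes:
    request_id = uuid.uuid4().hex  # same id on every retry so the server can deduplicate
    for _ in range(attempts):
        reply = link.call(request_id, payload)
        if reply is not None:
            return reply
        # A real client would wait out a timeout (ideally with backoff) before retrying.
    raise TimeoutError(f"no reply after {attempts} attempts")


link = LossyLink(handler=lambda p: p.upper(), loss=0.3, seed=1)
reply = query_with_retry(link, b"plan path", attempts=50)
```

The dedup cache is what makes retries safe for non-idempotent requests (e.g. "execute this path"); without it a lost reply would cause the command to run twice.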
Zenoh seems promising!
Some intro questions (I'm sure there are many more critical ones; this is just to kickstart how we think about this):
- Can we get reliable query/response in a multi-robot system on local unreliable networks (anything from Wi-Fi to GSM to a LoRa mesh)?
- Can we get pubsub? How does it work: autodiscovery, distributed vs. brokered? How would we run a broker in a multi-robot setup? How does distributed pubsub work?
- Can we run a single global mapper node over that pubsub (collecting lidar frames from all robots, assembling a global map, planning, and streaming paths back to the robots and their local planners)?
- Assume we need to upload video from multiple robots sharing a local robot network, with only one (or a few) of the robots having an internet connection.
- Assume we need to run teleop: transport efficiently encoded, low-latency video in one direction and light control messages in the other (NAT traversal?).
- Assume we are on a high-throughput network with an internet connection: can we upload all sensor data to a cluster in real time? Can we stream control commands to the robot?
- Assume we want to send requests to a remote server and potentially stream responses back; requests could be image frames, LLM queries, etc.
- Can we have global, local, and internet-level pubsubs? How do we control what goes where, and when?
- Where are backpressure strategies executed (broker, receiver, emitter), and which ones? What is the appropriate backpressure approach in different scenarios?
- What QoS strategies are available?
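One way to frame the backpressure question: when a consumer falls behind, either drop stale data or make the producer wait. Zenoh publishers expose a congestion-control setting with drop and block variants, alongside reliable vs. best-effort delivery; the exact names differ between bindings and versions, so check the docs. The stdlib-only sketch below shows the same two policies on a receiver-side buffer, using a hypothetical `BoundedChannel` class and assuming "drop oldest" suits high-rate sensor streams while "block" suits messages that must not be lost.

```python
import threading
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class BoundedChannel(Generic[T]):
    """Hypothetical bounded buffer with a selectable overflow policy.

    policy="drop_oldest": never blocks the producer; evicts the oldest item when full
        (lidar frames, video: only the latest sample matters).
    policy="block": the producer waits for room, up to `timeout` (commands that must not be lost).
    """

    def __init__(self, capacity: int, policy: str = "drop_oldest") -> None:
        if policy not in ("drop_oldest", "block"):
            raise ValueError(f"unknown policy: {policy}")
        self._items: Deque[T] = deque()
        self._capacity = capacity
        self._policy = policy
        self._cond = threading.Condition()
        self.dropped = 0  # number of evicted items, useful for monitoring link health

    def put(self, item: T, timeout: Optional[float] = None) -> bool:
        with self._cond:
            if len(self._items) >= self._capacity:
                if self._policy == "drop_oldest":
                    self._items.popleft()
                    self.dropped += 1
                elif not self._cond.wait_for(lambda: len(self._items) < self._capacity, timeout):
                    return False  # still full after waiting: tell the producer it failed
            self._items.append(item)
            self._cond.notify_all()
            return True

    def get(self, timeout: Optional[float] = None) -> Optional[T]:
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                return None  # nothing arrived in time
            item = self._items.popleft()
            self._cond.notify_all()  # wake producers blocked on a full buffer
            return item


frames = BoundedChannel(capacity=3, policy="drop_oldest")
for i in range(5):
    frames.put(i)  # the buffer now holds 2, 3, 4 and frames.dropped == 2
```

Where this logic should live (emitter, router, or receiver) is exactly the open question; dropping at the emitter saves bandwidth on a constrained link, while dropping at the receiver protects a slow consumer.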
Other useful features
Zenoh is fancy and I don't know everything it offers, but things like these are interesting to know about:
- etcd-style distributed KV store?
- Raft-style automatic master node election?
- Meshing?
Example scenario
Write a Zenoh transport and configure it well enough to split the system: local control on the robot with remote global mapping and path planning on a laptop, or a slow, GPU-heavy module offloaded to a server.
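The split in this scenario largely comes down to deciding which keys stay on the robot and which cross the link to the laptop or server. In a real deployment that decision would presumably be expressed in Zenoh's router/peer configuration rather than in application code; the sketch below only illustrates Zenoh-style key-expression wildcards (`*` matches exactly one `/`-separated chunk, `**` matches zero or more chunks, with only these whole-chunk wildcards handled) applied to a hypothetical placement table. The topic names and the `local`/`bridged` scopes are made up for illustration.

```python
from typing import List, Tuple


def keyexpr_matches(pattern: str, key: str) -> bool:
    """Simplified Zenoh-style matching: '*' = exactly one chunk, '**' = zero or more chunks."""
    return _match(pattern.split("/"), key.split("/"))


def _match(pat: List[str], key: List[str]) -> bool:
    if not pat:
        return not key  # pattern exhausted: match only if the key is too
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Try letting '**' consume 0, 1, 2, ... chunks of the key.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False


# Hypothetical placement table for the example scenario; the first matching rule wins.
ROUTES: List[Tuple[str, str]] = [
    ("robot/*/cmd_vel", "local"),     # tight control loop never leaves the robot
    ("robot/*/lidar/**", "bridged"),  # lidar frames forwarded to the global mapper on the laptop
    ("mapper/path/*", "bridged"),     # planned paths streamed back to each robot's local planner
    ("**", "local"),                  # default: nothing crosses the link unless listed above
]


def route(key: str) -> str:
    for pattern, scope in ROUTES:
        if keyexpr_matches(pattern, key):
            return scope
    return "local"
```

First-match-wins with a catch-all `**` default keeps anything unlisted on the robot, which is the safer failure mode on a constrained link.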