perf(generator): use memcpy paths for primitive sequence conversion by azerupi · Pull Request #26 · ros2-rust/rosidl_rust

azerupi · 2026-05-17T23:05:26Z

This is the second of a set of 2 PRs that address issue number 2 described in ros2-rust/ros2_rust#628

ros2-rust/rosidl_runtime_rs#20 has to be merged first.

Problem

2. Element-by-element sequence conversion in `from_rmw_message`

The generated from_rmw_message code converts Sequence<T> to Vec<T> via .into_iter().collect(). The SequenceIterator::next() implementation (rosidl_runtime_rs/src/sequence.rs) does this per element:

let elem = ptr.read();
ptr.write(std::mem::zeroed::<T>());  // writes zero back for EVERY element

For a 64 KB Sequence<u8>, this is 65,536 individual read + zero-write + insert cycles instead of a single memcpy.

Solution

Generated from_rmw_message and into_rmw_message impls for UnboundedSequence<BasicType> fields now route through memcpy-based conversion paths instead of per-element iteration / writes.

Receive side (from_rmw_message) — emits msg.field.into(), which routes through From<Sequence<T>> for Vec<T> (added in ros2-rust/rosidl_runtime_rs#20) → as_slice().to_vec() → memcpy. Replaces into_iter().collect(), which paid a per-element read+zero-write via SequenceIterator::next().
Publish side (into_rmw_message, Owned arm) — emits msg.field.as_slice().into(), which routes through the existing From<&[T]> for Sequence<T> → Sequence::new + clone_from_slice → memcpy (via stdlib's Copy specialization on clone_from_slice). Replaces msg.field.into(), which routed through From<Vec<T>> for Sequence<T> → extend, paying a per-element write through bounds-checked IndexMut.

The Borrowed arm of into_rmw_message already used msg.field.as_slice().into() and is unchanged. Non-primitive sequence branches (String, WString, nested Message-typed sequences) are unchanged — they need real per-element conversion.

Depends on: ros2-rust/rosidl_runtime_rs#20) for the new From<Sequence<T>> for Vec<T> impl. That PR must be released first before this PR's generated code will compile against a published rosidl_runtime_rs.

Benchmarks

For benchmarks, see ros2-rust/rosidl_runtime_rs#20)

For UnboundedSequence<BasicType> fields, route both conversion directions through single-memcpy paths instead of element-by-element loops. - from_rmw_message: emit msg.field.into() instead of msg.field.into_iter().collect(). Routes through rosidl_runtime_rs' Copy-bounded From<Sequence<T>> for Vec<T> -> as_slice().to_vec() -> memcpy. The old path paid a per-element read + zero-write through SequenceIterator::next(). - into_rmw_message Owned arm: emit msg.field.as_slice().into() instead of msg.field.into(). Routes through From<&[T]> for Sequence<T> -> Sequence::new + clone_from_slice -> memcpy (stdlib specializes clone_from_slice for T: Copy). The old path went through From<Vec<T>> -> from_iter -> extend, paying a per-element IndexMut write. The Borrowed arm of into_rmw_message already used as_slice().into() and is unchanged. Non-primitive element types (String, WString, nested messages) still use .into_iter().map(...).collect() -- they need real per-element conversion regardless. Also restructures the from_rmw_message UnboundedSequence branch so each element-type arm emits a single complete expression rather than sharing a chained .into_iter()...collect() across arms. Microbenchmarks (Sequence<u8> 65536 <-> Vec<u8>): receive: 23.3 us -> 513 ns (45.4x) publish: 28.3 us -> 726 ns (39.0x) End-to-end (UInt8MultiArray, single-process, reliable QoS, rmw_fastrtps_cpp, 64 KiB, median of 3 x 5s): rclrs (main): 30,578 msg/s rclrs (this branch): 135,560 msg/s (4.4x over main, ~90% of rclcpp) rclcpp baseline: 151,279 msg/s rclrs + Worker sub: 224,576 msg/s (~1.48x of rclcpp single-process) Requires rosidl_runtime_rs with the new Copy-bounded From<Sequence<T>> for Vec<T> impl (companion PR). Refs ros2-rust/ros2_rust#628 improvement (1).

azerupi mentioned this pull request May 17, 2026

perf(sequence): add From<Sequence<T>> for Vec<T> with Copy bound ros2-rust/rosidl_runtime_rs#20

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(generator): use memcpy paths for primitive sequence conversion#26

perf(generator): use memcpy paths for primitive sequence conversion#26
azerupi wants to merge 1 commit into
ros2-rust:mainfrom
azerupi:perf/fast-copy-basic-sequence-from-rmw

azerupi commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

azerupi commented May 17, 2026

Problem

2. Element-by-element sequence conversion in from_rmw_message

Solution

Benchmarks

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

2. Element-by-element sequence conversion in `from_rmw_message`