perf(generator): use memcpy paths for primitive sequence conversion#26
Open
azerupi wants to merge 1 commit into
Open
perf(generator): use memcpy paths for primitive sequence conversion#26azerupi wants to merge 1 commit into
azerupi wants to merge 1 commit into
Conversation
For UnboundedSequence<BasicType> fields, route both conversion directions through single-memcpy paths instead of element-by-element loops. - from_rmw_message: emit msg.field.into() instead of msg.field.into_iter().collect(). Routes through rosidl_runtime_rs' Copy-bounded From<Sequence<T>> for Vec<T> -> as_slice().to_vec() -> memcpy. The old path paid a per-element read + zero-write through SequenceIterator::next(). - into_rmw_message Owned arm: emit msg.field.as_slice().into() instead of msg.field.into(). Routes through From<&[T]> for Sequence<T> -> Sequence::new + clone_from_slice -> memcpy (stdlib specializes clone_from_slice for T: Copy). The old path went through From<Vec<T>> -> from_iter -> extend, paying a per-element IndexMut write. The Borrowed arm of into_rmw_message already used as_slice().into() and is unchanged. Non-primitive element types (String, WString, nested messages) still use .into_iter().map(...).collect() -- they need real per-element conversion regardless. Also restructures the from_rmw_message UnboundedSequence branch so each element-type arm emits a single complete expression rather than sharing a chained .into_iter()...collect() across arms. Microbenchmarks (Sequence<u8> 65536 <-> Vec<u8>): receive: 23.3 us -> 513 ns (45.4x) publish: 28.3 us -> 726 ns (39.0x) End-to-end (UInt8MultiArray, single-process, reliable QoS, rmw_fastrtps_cpp, 64 KiB, median of 3 x 5s): rclrs (main): 30,578 msg/s rclrs (this branch): 135,560 msg/s (4.4x over main, ~90% of rclcpp) rclcpp baseline: 151,279 msg/s rclrs + Worker sub: 224,576 msg/s (~1.48x of rclcpp single-process) Requires rosidl_runtime_rs with the new Copy-bounded From<Sequence<T>> for Vec<T> impl (companion PR). Refs ros2-rust/ros2_rust#628 improvement (1).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is the second of a set of 2 PRs that address issue number 2 described in ros2-rust/ros2_rust#628
ros2-rust/rosidl_runtime_rs#20 has to be merged first.
Problem
2. Element-by-element sequence conversion in
from_rmw_messageThe generated
from_rmw_messagecode convertsSequence<T>toVec<T>via.into_iter().collect(). TheSequenceIterator::next()implementation (rosidl_runtime_rs/src/sequence.rs) does this per element:For a 64 KB
Sequence<u8>, this is 65,536 individual read + zero-write + insert cycles instead of a singlememcpy.Solution
Generated
from_rmw_messageandinto_rmw_messageimpls forUnboundedSequence<BasicType>fields now route throughmemcpy-based conversion paths instead of per-element iteration / writes.from_rmw_message) — emitsmsg.field.into(), which routes throughFrom<Sequence<T>> for Vec<T>(added inros2-rust/rosidl_runtime_rs#20) →as_slice().to_vec()→memcpy. Replacesinto_iter().collect(), which paid a per-element read+zero-write viaSequenceIterator::next().into_rmw_message, Owned arm) — emitsmsg.field.as_slice().into(), which routes through the existingFrom<&[T]> for Sequence<T>→Sequence::new + clone_from_slice→memcpy(via stdlib'sCopyspecialization onclone_from_slice). Replacesmsg.field.into(), which routed throughFrom<Vec<T>> for Sequence<T>→extend, paying a per-element write through bounds-checkedIndexMut.The Borrowed arm of
into_rmw_messagealready usedmsg.field.as_slice().into()and is unchanged. Non-primitive sequence branches (String,WString, nestedMessage-typed sequences) are unchanged — they need real per-element conversion.Depends on:
ros2-rust/rosidl_runtime_rs#20) for the newFrom<Sequence<T>> for Vec<T>impl. That PR must be released first before this PR's generated code will compile against a publishedrosidl_runtime_rs.Benchmarks
For benchmarks, see
ros2-rust/rosidl_runtime_rs#20)