Merged

fix #23

1 change: 1 addition & 0 deletions .github/workflows/macos-signed-build.yml
@@ -121,6 +121,7 @@ jobs:
codesign --verify --deep --strict --verbose=2 "$APP_BUNDLE"
spctl --assess --type execute --verbose "$APP_BUNDLE"
bash client/src-tauri/scripts/verify-macos-portable-linkage.sh "$APP_BUNDLE"
bash client/src-tauri/scripts/verify-macos-desktop-sidecar.sh "$APP_BUNDLE"

- name: Upload signed artifacts
uses: actions/upload-artifact@v4
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -553,6 +553,7 @@ jobs:
codesign --verify --deep --strict --verbose=2 "$APP_BUNDLE"
spctl --assess --type execute --verbose "$APP_BUNDLE"
bash client/src-tauri/scripts/verify-macos-portable-linkage.sh "$APP_BUNDLE"
bash client/src-tauri/scripts/verify-macos-desktop-sidecar.sh "$APP_BUNDLE"

- name: Check Azure Windows signing configuration
id: windows_signing_config
98 changes: 98 additions & 0 deletions COMPUTER-USE-MANUAL-DRAG-PLAN.md
@@ -0,0 +1,98 @@
# Computer Use Manual Drag Support Plan

## Reader and Goal

Reader: an internal engineer implementing Computer Use manual-control input.

Post-read action: add and verify click-and-drag support for manual desktop control, so a user can drag windows, select file ranges, and perform ordinary drag gestures from the streamed desktop viewport.

## Audit Conclusion

Manual Computer Use does not currently support true click-and-drag from the human-controlled viewport.

The lower layers already have partial drag capability:

- The desktop control action model includes `mouse_drag`.
- The remote manual-control bridge accepts manual-control input actions and maps `x`, `y`, `toX`, `toY`, `sourceWidth`, and `sourceHeight` into the desktop control request.
- The desktop runtime validates drag source and target points, normalizes active stream coordinates to display coordinates, and maps the action to sidecar drag control when available.
- The native and sidecar input paths can emit a left-button drag sequence.

The manual-control UI path is the gap:

- Pointer down sends `mouse_click` or `mouse_right_click` immediately.
- Pointer move while a button is held sends throttled `mouse_move`.
- Pointer up on non-mobile does not send any button-release or drag action.
- Mobile touch gestures are reserved for tap, pan, and pinch behavior; they do not map touch movement into desktop drag.
- The relay client input type does not currently advertise `toX` or `toY`, even though the payload path can forward them.

Because the remote desktop never receives a held mouse button from the manual viewport, the current behavior cannot drag windows or rubber-band select files. It can only click, move the pointer, scroll, and send keyboard/text input.

## Implementation Strategy

Use the existing `mouse_drag` control action first. Do not introduce stateful `mouse_down` / `mouse_up` protocol actions unless one-shot drag proves unreliable during manual QA.

Implement desktop pointer drag as a gesture recognizer in the manual viewport:

1. On primary-button pointer down in manual mode, capture the pointer and store a pending gesture with pointer id, button, click detail, screen start position, and mapped desktop start point.
2. Do not send `mouse_click` immediately. Wait until pointer up so the gesture can be classified as click or drag.
3. On pointer move for the captured pointer, update the latest mapped point. Mark the gesture as dragging once movement exceeds the existing tap/click slop threshold.
4. On pointer up:
- If movement stayed within slop, send the existing click or double-click payload.
- If movement exceeded slop and the button is left, send one `mouse_drag` payload with `x`, `y`, `toX`, `toY`, `sourceWidth`, and `sourceHeight`.
- If movement exceeded slop for right or middle button, keep the initial implementation conservative and do not synthesize unsupported button-drag behavior.
5. On pointer cancel, lost capture, manual-control release, stream change, or unmount, clear the pending gesture without sending a click.
6. Keep click ripples for click gestures only. Do not add temporary debug UI.

This preserves the backend approval, lease, stream-token, and coordinate-normalization paths already used by manual control.
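
The steps above can be sketched as a small, framework-free classifier. This is a minimal sketch under stated assumptions, not the viewport's actual code: the names `PendingGesture`, `beginGesture`, `updateGesture`, `finishGesture`, and `SLOP_PX` are hypothetical, the slop value is a placeholder for the existing tap/click threshold, and the `clicks` field on the click payload is an assumed way to carry click detail. Only the action strings and the `x`, `y`, `toX`, `toY`, `sourceWidth`, `sourceHeight` field names come from this plan.

```typescript
// Hypothetical sketch of the click-vs-drag classifier described in steps 1-5.
type Point = { x: number; y: number };

interface PendingGesture {
  pointerId: number;
  button: number; // PointerEvent.button: 0 = primary, 1 = middle, 2 = secondary
  clickDetail: number; // 1 = single click, 2 = double click
  screenStart: Point; // viewport pixels, used only for the slop test
  desktopStart: Point; // already mapped to desktop stream coordinates
  desktopLatest: Point;
  dragging: boolean;
}

interface ManualInputPayload {
  action: string;
  x: number;
  y: number;
  toX?: number;
  toY?: number;
  clicks?: number; // assumption: how the existing click payload carries click detail
  sourceWidth: number;
  sourceHeight: number;
}

// Placeholder value; the real code should reuse the existing tap/click slop.
const SLOP_PX = 6;

// Step 1: record the gesture on pointer down without sending anything yet.
function beginGesture(
  pointerId: number,
  button: number,
  clickDetail: number,
  screen: Point,
  desktop: Point,
): PendingGesture {
  return {
    pointerId,
    button,
    clickDetail,
    screenStart: screen,
    desktopStart: desktop,
    desktopLatest: desktop,
    dragging: false,
  };
}

// Step 3: track the latest point; once slop is exceeded the gesture stays a drag.
function updateGesture(g: PendingGesture, screen: Point, desktop: Point): PendingGesture {
  const moved = Math.hypot(screen.x - g.screenStart.x, screen.y - g.screenStart.y);
  return { ...g, desktopLatest: desktop, dragging: g.dragging || moved > SLOP_PX };
}

// Step 4: classify on pointer up. Returns null when nothing should be sent.
function finishGesture(
  g: PendingGesture,
  source: { width: number; height: number },
): ManualInputPayload | null {
  const base = {
    x: g.desktopStart.x,
    y: g.desktopStart.y,
    sourceWidth: source.width,
    sourceHeight: source.height,
  };
  if (!g.dragging) {
    if (g.button === 0) return { action: "mouse_click", ...base, clicks: g.clickDetail };
    if (g.button === 2) return { action: "mouse_right_click", ...base };
    return null;
  }
  if (g.button === 0) {
    return { action: "mouse_drag", ...base, toX: g.desktopLatest.x, toY: g.desktopLatest.y };
  }
  // Right or middle button drag: conservative, nothing is synthesized.
  return null;
}
```

Step 5 (cancel, lost capture, release, stream change, unmount) then reduces to discarding the stored `PendingGesture` without calling `finishGesture`. Keeping the classifier free of DOM types is one possible design choice that makes the frontend tests listed later easy to write without a rendered viewport.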

## Type and Contract Updates

Update the relay client manual-input type so drag is first-class:

- Add `toX` and `toY`.
- Prefer a small string union for known manual actions if it fits local style; otherwise keep `action: string` and extend the shape only.
- Add a relay client test proving `mouse_drag` forwards start and target coordinates plus stream security fields.
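
One way the string-union option could look is sketched below. The type and function names (`ManualInput`, `ManualDragInput`, `isDragInput`, `serializeManualInput`) are hypothetical, the union lists only the pointer actions named in this plan, and the stream security fields are left out because their names are not given here.

```typescript
// Hypothetical relay-client input shape; not the existing type.
interface ManualInputBase {
  x: number;
  y: number;
  sourceWidth: number;
  sourceHeight: number;
}

// Pointer actions named in the plan; scroll and keyboard actions are omitted.
interface ManualPointerInput extends ManualInputBase {
  action: "mouse_click" | "mouse_right_click" | "mouse_move";
}

// Drag is first-class: the target point is required, not optional.
interface ManualDragInput extends ManualInputBase {
  action: "mouse_drag";
  toX: number;
  toY: number;
}

type ManualInput = ManualPointerInput | ManualDragInput;

// Type guard so callers can narrow to the drag shape.
function isDragInput(input: ManualInput): input is ManualDragInput {
  return input.action === "mouse_drag";
}

// Assumed JSON wire format; the real relay client may wrap this differently.
function serializeManualInput(input: ManualInput): string {
  return JSON.stringify(input);
}
```

With a discriminated union like this, a `mouse_drag` payload that omits `toX` or `toY` fails type checking, which is one way to keep the contract aligned with what the bridge already maps.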

Add bridge/runtime coverage:

- Add a bridge unit test for manual `mouse_drag` payload mapping, including `toX` and `toY`.
- Add or extend runtime/sidecar mapping tests so drag target coordinates are preserved into the sidecar request.
- If manual QA shows instant two-point drags are flaky, extend the runtime later with interpolated drag duration or a stateful press/drag/release protocol guarded by the same manual-control lease.

## Frontend Tests

Add focused tests around the manual viewport:

- A simple pointer down/up still sends one `mouse_click`.
- A small move within slop still sends a click.
- A left-button move beyond slop sends one `mouse_drag` on pointer up and does not send the old immediate `mouse_click`.
- The drag payload uses mapped desktop stream coordinates for both source and target, including object-contain letterboxing.
- Pointer cancel sends no click or drag.
- Existing mobile pinch, pan, tap, keyboard capture, scroll, and right-click behavior remain covered.
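
The letterboxing case is the easiest one to get wrong, so a sketch of the mapping such a test would exercise may help. This assumes the stream is rendered with CSS `object-fit: contain` and the default centered `object-position`; `mapViewportToStream` and the `Size`/`Point` types are hypothetical names, and returning `null` for points in the letterbox bars is an assumption (an implementation could reasonably clamp instead).

```typescript
// Hypothetical helper: map a point inside the viewport element to stream pixels.
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

function mapViewportToStream(point: Point, element: Size, source: Size): Point | null {
  // object-fit: contain scales uniformly by the smaller of the two ratios.
  const scale = Math.min(element.width / source.width, element.height / source.height);
  const renderedWidth = source.width * scale;
  const renderedHeight = source.height * scale;

  // Centered content leaves equal bars on both sides of the constrained axis.
  const offsetX = (element.width - renderedWidth) / 2;
  const offsetY = (element.height - renderedHeight) / 2;

  const streamX = (point.x - offsetX) / scale;
  const streamY = (point.y - offsetY) / scale;

  // Points in the letterbox bars do not correspond to any stream pixel.
  if (streamX < 0 || streamY < 0 || streamX >= source.width || streamY >= source.height) {
    return null;
  }
  return { x: Math.floor(streamX), y: Math.floor(streamY) };
}
```

For example, a 1280x720 stream shown in a 1000x1000 element is scaled by 0.78125 and has 218.75 px bars above and below, so the element center maps to the stream center and a point 100 px from the top falls in the bar. A drag test can apply the same mapping to both the start and end points and compare against `x`, `y`, `toX`, and `toY` in the payload.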

## Verification

Run scoped checks only:

- `pnpm --dir ./cloud test -- src/routes/-desktop-click-ripple.test.tsx src/lib/relay/relay-client.test.ts`
- `cargo test -p xero-desktop manual_control_drag --lib`
- `cargo test -p xero-desktop-sidecar mouse_drag --tests`

Run Cargo commands one at a time.

Manual QA must be performed in the Tauri app, not by opening the app in a browser:

- Start a Computer Use desktop stream.
- Enter manual control.
- Drag a window by its title bar and confirm it moves.
- Drag across files/icons and confirm multi-select works.
- Confirm normal click, double-click, right-click, scroll, and keyboard passthrough still behave normally.

## Acceptance Criteria

- Manual left-button drag works from the streamed desktop viewport.
- Clicks are not accidentally converted into drags.
- Drag gestures do not emit a premature click at the start point.
- Manual-control lease and approval gates remain unchanged.
- No temporary or test-only UI is added.
- Scoped frontend and Rust tests pass.
28 changes: 23 additions & 5 deletions client/src-tauri/crates/xero-desktop-sidecar/tests/ipc_contract.rs
@@ -8,11 +8,11 @@ use serde_json::json;
use time::{format_description::well_known::Rfc3339, Duration, OffsetDateTime};
use xero_desktop_control_ipc::{
hash_session_token, DesktopSidecarActor, DesktopSidecarAuth, DesktopSidecarAuthScheme,
-DesktopSidecarCapabilities, DesktopSidecarHandshake, DesktopSidecarOperation,
-DesktopSidecarPermissionsPayload, DesktopSidecarRequest, DesktopSidecarResponse,
-DesktopSidecarStreamCapabilitiesPayload, DesktopSidecarStreamPayload,
-DesktopSidecarStreamStatus, DesktopSidecarStreamTransport, DESKTOP_SIDECAR_PROTOCOL,
-DESKTOP_SIDECAR_SCHEMA_VERSION,
+DesktopSidecarCapabilities, DesktopSidecarControlRequest, DesktopSidecarHandshake,
+DesktopSidecarMouseButton, DesktopSidecarOperation, DesktopSidecarPermissionsPayload,
+DesktopSidecarRequest, DesktopSidecarResponse, DesktopSidecarStreamCapabilitiesPayload,
+DesktopSidecarStreamPayload, DesktopSidecarStreamStatus, DesktopSidecarStreamTransport,
+DESKTOP_SIDECAR_PROTOCOL, DESKTOP_SIDECAR_SCHEMA_VERSION,
};

struct SidecarHarness {
@@ -313,3 +313,21 @@ fn sidecar_ipc_rejects_shell_like_payload_keys() {
"sidecar_forbidden_payload"
);
}

#[test]
fn mouse_drag_control_contract_decodes_target_coordinates() {
let request = serde_json::from_value::<DesktopSidecarControlRequest>(json!({
"x": 10,
"y": 20,
"toX": 300,
"toY": 240,
"button": "left"
}))
.expect("mouse drag control request");

assert_eq!(request.x, Some(10));
assert_eq!(request.y, Some(20));
assert_eq!(request.to_x, Some(300));
assert_eq!(request.to_y, Some(240));
assert_eq!(request.button, Some(DesktopSidecarMouseButton::Left));
}
21 changes: 18 additions & 3 deletions client/src-tauri/scripts/sign-macos-target-binaries.sh
@@ -117,9 +117,20 @@ helper_names=(
xero-harness-evals
tool-harness
xero
xero-desktop-sidecar
)
resource_helper_paths=(
"$tauri_dir/resources/xero-desktop-sidecar"
)
codesign_timeout_seconds="${XERO_CODESIGN_TIMEOUT_SECONDS:-300}"

sign_helper_binary() {
local helper_path="$1"
echo "Signing helper binary $helper_path"
run_with_timeout "$codesign_timeout_seconds" codesign --force --options runtime --timestamp --sign "$identity" "$helper_path"
signed_any=1
}

signed_any=0
while IFS= read -r release_dir; do
for helper_name in "${helper_names[@]}"; do
@@ -128,12 +139,16 @@ while IFS= read -r release_dir; do
continue
fi

-echo "Signing target helper binary $helper_path"
-run_with_timeout "$codesign_timeout_seconds" codesign --force --options runtime --timestamp --sign "$identity" "$helper_path"
-signed_any=1
+sign_helper_binary "$helper_path"
done
done < <(find "$tauri_dir/target" -type d -path "*/release" 2>/dev/null | sort)

for helper_path in "${resource_helper_paths[@]}"; do
if [ -f "$helper_path" ]; then
sign_helper_binary "$helper_path"
fi
done

if [ "$signed_any" -eq 0 ]; then
echo "No target helper binaries found to sign."
fi
29 changes: 29 additions & 0 deletions client/src-tauri/scripts/verify-macos-desktop-sidecar.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -ne 1 ]; then
echo "usage: $0 /path/to/App.app" >&2
exit 64
fi

app_bundle="$1"
sidecar="$app_bundle/Contents/Resources/resources/xero-desktop-sidecar"

if [ ! -f "$sidecar" ]; then
echo "::error::Missing bundled desktop sidecar at $sidecar."
exit 66
fi

if [ ! -x "$sidecar" ]; then
echo "::error file=$sidecar::Bundled desktop sidecar is not executable."
exit 66
fi

if ! file "$sidecar" | grep -q "Mach-O"; then
echo "::error file=$sidecar::Bundled desktop sidecar is not a Mach-O binary."
exit 66
fi

codesign --verify --strict --verbose=2 "$sidecar"

echo "macOS desktop sidecar is bundled and signed."
28 changes: 28 additions & 0 deletions client/src-tauri/src/commands/remote_bridge.rs
@@ -3942,6 +3942,34 @@ mod tests {
);
}

#[test]
fn manual_control_drag_payload_maps_to_desktop_control_request() {
let request = manual_control_input_request(&json!({
"action": "mouse_drag",
"x": 42,
"y": 64,
"toX": 320,
"toY": 240,
"sourceWidth": 1280,
"sourceHeight": 720,
"button": "left",
}))
.expect("manual drag input request");

assert_eq!(request.action, AutonomousDesktopControlAction::MouseDrag);
assert_eq!(request.x, Some(42));
assert_eq!(request.y, Some(64));
assert_eq!(request.to_x, Some(320));
assert_eq!(request.to_y, Some(240));
assert_eq!(request.source_width, Some(1280));
assert_eq!(request.source_height, Some(720));
assert_eq!(request.button, Some(AutonomousDesktopMouseButton::Left));
assert_eq!(
request.reason.as_deref(),
Some("cloud_manual_control_input")
);
}

#[test]
fn manual_control_keyboard_payloads_map_to_desktop_control_requests() {
let text_request = manual_control_input_request(&json!({
@@ -3959,23 +3959,11 @@ fn resolve_desktop_sidecar_binary() -> Result<PathBuf, String> {
#[cfg(not(test))]
{
let binary_name = desktop_sidecar_binary_name();
-let mut candidates = Vec::new();
-if let Ok(exe) = std::env::current_exe() {
-if let Some(dir) = exe.parent() {
-candidates.push(dir.join(&binary_name));
-candidates.push(dir.join("../Resources").join(&binary_name));
-}
-}
-if let Some(manifest_dir) = option_env!("CARGO_MANIFEST_DIR") {
-let manifest_dir = PathBuf::from(manifest_dir);
-candidates.push(manifest_dir.join("resources").join(&binary_name));
-if let Some(target_dir) = manifest_dir.parent() {
-candidates.push(target_dir.join("target/debug").join(&binary_name));
-candidates.push(target_dir.join("target/release").join(&binary_name));
-}
-}
-
-candidates
+desktop_sidecar_binary_candidates(
+&binary_name,
+std::env::current_exe().ok(),
+option_env!("CARGO_MANIFEST_DIR").map(PathBuf::from),
+)
.into_iter()
.find_map(|candidate| validate_sidecar_binary_path(candidate).ok())
.ok_or_else(|| {
@@ -3987,6 +3975,30 @@ fn resolve_desktop_sidecar_binary() -> Result<PathBuf, String> {
}
}

fn desktop_sidecar_binary_candidates(
binary_name: &str,
current_exe: Option<PathBuf>,
manifest_dir: Option<PathBuf>,
) -> Vec<PathBuf> {
let mut candidates = Vec::new();
if let Some(exe) = current_exe {
if let Some(dir) = exe.parent() {
candidates.push(dir.join(binary_name));
let bundled_resources_dir = dir.join("../Resources");
candidates.push(bundled_resources_dir.join(binary_name));
candidates.push(bundled_resources_dir.join("resources").join(binary_name));
}
}
if let Some(manifest_dir) = manifest_dir {
candidates.push(manifest_dir.join("resources").join(binary_name));
if let Some(target_dir) = manifest_dir.parent() {
candidates.push(target_dir.join("target/debug").join(binary_name));
candidates.push(target_dir.join("target/release").join(binary_name));
}
}
candidates
}

#[cfg(not(test))]
fn desktop_sidecar_binary_name() -> String {
if cfg!(windows) {
@@ -6436,6 +6448,22 @@ mod tests {
.is_some_and(|message| message.contains("closed before sending a response")));
}

#[test]
fn sidecar_candidates_include_tauri_preserved_resource_path() {
let exe = PathBuf::from("Xero.app")
.join("Contents")
.join("MacOS")
.join("xero-desktop");
let resources_dir = PathBuf::from("Xero.app")
.join("Contents")
.join("MacOS")
.join("../Resources");
let candidates = desktop_sidecar_binary_candidates("xero-desktop-sidecar", Some(exe), None);

assert!(candidates.contains(&resources_dir.join("xero-desktop-sidecar")));
assert!(candidates.contains(&resources_dir.join("resources").join("xero-desktop-sidecar")));
}

#[test]
fn cloud_manual_input_requires_active_controller_lease() {
let repo = tempdir().expect("tempdir");
@@ -7096,6 +7124,48 @@ mod tests {
);
}

#[test]
fn manual_control_drag_sidecar_request_preserves_target_coordinates() {
let request = AutonomousDesktopControlRequest {
action: AutonomousDesktopControlAction::MouseDrag,
display_id: None,
window_id: None,
app_name: None,
bundle_id: None,
element_id: None,
x: Some(10),
y: Some(20),
source_width: Some(1280),
source_height: Some(720),
to_x: Some(300),
to_y: Some(240),
delta_x: None,
delta_y: None,
button: Some(AutonomousDesktopMouseButton::Left),
clicks: None,
key: None,
keys: Vec::new(),
text: None,
value: None,
menu_path: Vec::new(),
reason: Some("cloud_manual_control_input".into()),
sensitivity: None,
};

validate_desktop_control_request(&request).expect("valid drag request");
let sidecar = sidecar_control_request(&request);

assert_eq!(
desktop_control_sidecar_operation(&request.action),
Some(DesktopSidecarOperation::MouseDrag)
);
assert_eq!(sidecar.x, Some(10));
assert_eq!(sidecar.y, Some(20));
assert_eq!(sidecar.to_x, Some(300));
assert_eq!(sidecar.to_y, Some(240));
assert_eq!(sidecar.button, Some(DesktopSidecarMouseButton::Left));
}

#[test]
fn maps_scaled_stream_points_to_display_coordinates() {
let display = AutonomousDesktopDisplay {