When the simulator is running, it exposes a set of globals on window for debugging, scripting, and benchmark orchestration. The agent never sees these — they're for the human operator, task author, or benchmark runner.
The available namespaces:

| Namespace | Purpose |
|---|---|
| `window.__SIM__` | State snapshot, reset, simulator-level controls |
| `window.__OS__` | OS-level controls (open app, back, services) |
| `window.__SIM_INPUT__` | Synthesized human-style input (tap, swipe, type, …) |
| `window.__SIM_QUERY__` | Read-only DOM queries (find an element's rect by id / selector / trigger) |
| `window.__SIM_TIME__` | Simulated wall-clock control |
| `window.__SIM_LOCATION__` | Simulated geolocation control |
| `window.__SIM_FS__` | Simulated file-system access |
| `window.__SIM_MEDIA__` | Simulated media-library access |
| `window.__SIM_AI__` | Optional simulated AI service hooks |
| `window.__STORAGE_ISOLATION__` | Current storage-isolation mode metadata |
| `window.__getScrollMeta__` | Read-only scroll-container state |

You can open the browser DevTools and call any of these directly. The benchmark calls them via Playwright's evaluate().
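
As a rough sketch of the Playwright path, a runner could wrap `evaluate()` in a small helper. The name `readSimState` is hypothetical, and `page` is assumed to be a Playwright `Page` that already has the simulator loaded.

```javascript
// Hypothetical helper: read the simulator state from a Playwright page.
// Assumes `page` is a Playwright Page with the simulator already loaded.
async function readSimState(page) {
  // The callback runs inside the browser, so `window` is the simulator's window.
  return page.evaluate(() => window.__SIM__.getState());
}
```

The return value crosses the browser boundary as serialized data, which works here because `getState()` is described as JSON-serializable.
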

`__SIM__.getState()` returns the entire structured state of the simulator as a JSON-serializable object.

```javascript
const state = window.__SIM__.getState();
// → { os: {...}, apps: { wechat: {...}, alipay: {...}, ... } }
```

This is the only state the benchmark trusts for judging. The full `getState()` schema, including per-app snapshots, is documented in app-state-schema.md. Key trees:

- `state.os.settings`: global / system / secure / app-specific
- `state.os.hardware`: battery, wifi, cellular, sensors
- `state.os.permissions` / `preferences`
- `state.os.providers`: contacts, sms, media (shared content)
- `state.os.clipboard` / `notifications` / `shade`: runtime singleton snapshots
- `state.os.services`: registered service snapshots other than clipboard / notifications
- `state.apps.<appId>`: each app's runtime overlay
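
When checking a snapshot by hand, a tolerant path lookup avoids a chain of null checks. The sketch below is only an illustration: `pickPath` is a hypothetical helper, and the field names and values inside `sampleState` are made up rather than taken from the real schema.

```javascript
// Hypothetical helper: read a dotted path out of a getState() snapshot.
// Returns undefined instead of throwing when any segment is missing.
function pickPath(state, path) {
  return path.split('.').reduce(
    (node, key) => (node !== null && typeof node === 'object' ? node[key] : undefined),
    state,
  );
}

// Example snapshot shaped like the trees listed above (fields and values are made up).
const sampleState = {
  os: { hardware: { battery: { level: 80 } } },
  apps: { wechat: { user: { id: 'u_42' } } },
};

console.log(pickPath(sampleState, 'apps.wechat.user.id')); // u_42
console.log(pickPath(sampleState, 'apps.alipay.user.id')); // undefined
```
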

`__SIM__.reset()` clears all simulator state (localStorage + in-memory stores) and reloads the page.

```javascript
await window.__SIM__.reset();
```

There is also `__SIM__.resetState()`, which clears state without reloading. This is useful when the harness has already snapshotted the tab.

`__SIM__.setState()` merges a partial state into the live simulator. Use this to inject task-initial conditions.

```javascript
window.__SIM__.setState({ apps: { wechat: { user: { id: 'u_42' } } } });
window.__SIM__.setState(snapshot, { deep: true, reload: true });
```

`setState()` is synchronous. If `reload: true` is passed, it triggers `window.location.reload()` after applying the patch.

`waitForData(appIds?)` imports and runs app data-loader modules, with a retry for transient failures. Use it when task setup depends on app data being available before snapshotting or route setup.

`preloadAllAppStores()` and `preloadAppStores(appIds)` are currently no-ops because app stores are eagerly loaded. They remain in the runtime API for compatibility with older benchmark scripts and possible future lazy-store builds.

`warmUpAllApps()` additionally mounts each app once so its first render cost is paid up front; this is slower and usually unnecessary.
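
Putting `waitForData()` and `setState()` together, a task setup step might look like the sketch below. `setupTask` is a hypothetical name, `sim` stands in for `window.__SIM__`, and the sketch assumes that `waitForData()` is exposed on `__SIM__` and returns a promise, neither of which is stated explicitly above.

```javascript
// Hypothetical setup helper. `sim` stands in for window.__SIM__; passing it in
// keeps the sketch testable outside the browser.
// Assumption: waitForData() is exposed on __SIM__ and returns a promise.
async function setupTask(sim, { appIds, patch }) {
  // 1. Make sure app data is loaded before touching state.
  await sim.waitForData(appIds);
  // 2. Inject the task-initial conditions (setState is synchronous).
  sim.setState(patch);
  // 3. Return the resulting snapshot so the caller can verify the setup.
  return sim.getState();
}
```

In DevTools this could be called as `await setupTask(window.__SIM__, { appIds: ['wechat'], patch: { apps: { wechat: { user: { id: 'u_42' } } } } })`.
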

```javascript
window.__OS__.openApp('wechat');                  // open with current route
window.__OS__.openApp('wechat', '/chat?tab=hot'); // open at a specific deep link
window.__OS__.closeApp('wechat');                 // kill the task
window.__OS__.launchApp('wechat');                // launcher-style start
window.__OS__.goHome();
window.__OS__.showRecents();
```

```javascript
window.__OS__.handleBack();
// Routes through BackDispatcher according to registered priorities:
// PermissionDialog (1000) > Shade (800) > Keyboard (700) > App (100) > Launcher
```

```javascript
window.__OS__.getAppRoute();
// → { app: 'wechat', path: '/chat?tab=hot' }
```

System services are accessible as sub-properties of `__OS__`. Representative calls:

```javascript
// Notifications
window.__OS__.notifications.push({ title: 'New message', body: 'Hi' });
window.__OS__.notifications.dismiss(id);
window.__OS__.notifications.getState();

// Keyboard
window.__OS__.keyboard.show();
window.__OS__.keyboard.hide();
window.__OS__.keyboard.isVisible();

// Permissions
window.__OS__.permissions.checkPermission('wechat', 'LOCATION');
await window.__OS__.permissions.requestPermissions('wechat', ['LOCATION', 'CAMERA']);
window.__OS__.permissions.grantPermission('wechat', 'LOCATION');

// Device preferences (battery, WiFi, brightness, …)
window.__OS__.device.getPreference('brightness');
window.__OS__.device.setPreference('brightness', 80);
window.__OS__.device.connectWifi('Home-5G');

// Quick settings
window.__OS__.quickSettings.set({ airplane: true });
```

Each service's full API surface is declared in `os/types/globals.d.ts`. The app-facing service rules are documented in `../platform/os/services/README.md`.
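
The priority order that `handleBack()` follows can be illustrated with a small dispatcher. This is not the simulator's `BackDispatcher`, only a sketch of priority-ordered routing. The `createBackDispatcher` name and the handler shape (a function that returns `true` when it consumes the event) are assumptions.

```javascript
// Illustrative only: a priority-ordered back dispatcher.
function createBackDispatcher() {
  const handlers = [];
  return {
    // Register a handler; higher priority gets the first chance to consume "back".
    register(name, priority, handle) {
      handlers.push({ name, priority, handle });
      handlers.sort((a, b) => b.priority - a.priority);
    },
    // Walk handlers from highest to lowest priority and stop at the first one
    // that returns true (meaning it consumed the event).
    dispatch() {
      for (const h of handlers) {
        if (h.handle()) return h.name;
      }
      return null;
    },
  };
}

const dispatcher = createBackDispatcher();
let keyboardVisible = true;
dispatcher.register('App', 100, () => true);
dispatcher.register('Keyboard', 700, () => {
  if (!keyboardVisible) return false;
  keyboardVisible = false; // hide the keyboard and consume the event
  return true;
});

console.log(dispatcher.dispatch()); // Keyboard
console.log(dispatcher.dispatch()); // App
```

The first back press closes the keyboard and the second one reaches the app, which mirrors the `Keyboard (700) > App (100)` ordering.
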

The `__SIM_INPUT__` methods are the same gestures the benchmark dispatches via Playwright when an agent emits actions. Coordinates here are CSS pixels (viewport coordinates), not the agent's normalized [0, 1000] coordinates.

| Method | Signature | Notes |
|---|---|---|
| `tap(x, y, opts?)` | `(number, number, {coords?: 'css' \| 'physical'})` | Single tap |
| `doubleTap(x, y, opts?)` | `(number, number, opts?)` | Two quick taps |
| `longPress(x, y, ms?, opts?)` | `(number, number, number = 800, opts?)` | Hold (default 800 ms) |
| `swipe(start, end, opts?)` | `({x,y} \| [x,y], {x,y} \| …` | |
| `drag(start, end, opts?)` | `({x,y} \| [x,y], {x,y} \| …` | |
| `type(text, opts?)` | `(string, {clear?: boolean, perCharMs?: number})` | Types into the focused field |
| `back()` | `()` | Equivalent to `__OS__.handleBack()` |
| `home()` | `()` | Equivalent to `__OS__.goHome()` |
| `recent()` | `()` | Open the recents / multitask UI |
| `enter()` | `()` | Synthesize an Enter / return key |

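
Because these methods take CSS pixels rather than the agent's normalized [0, 1000] coordinates, a conversion step is needed when replaying agent actions by hand. The sketch below assumes that 0 and 1000 map linearly onto the edges of the visible viewport; `normalizedToCss` is a hypothetical helper, and the benchmark's own conversion may differ.

```javascript
// Hypothetical converter from the agent's normalized [0, 1000] space to CSS pixels.
// Assumption: 0 and 1000 map linearly onto the edges of the visible viewport.
function normalizedToCss(nx, ny, viewport) {
  return {
    x: (nx / 1000) * viewport.width,
    y: (ny / 1000) * viewport.height,
  };
}

// Example with a made-up 400×800 CSS viewport.
const point = normalizedToCss(500, 250, { width: 400, height: 800 });
console.log(point); // { x: 200, y: 200 }
```

In the browser the result could be passed to `__SIM_INPUT__.tap(point.x, point.y)`. Reading the viewport from `window.innerWidth` and `window.innerHeight` is one option, though whether that matches the simulator's zoomed viewport is not confirmed here.
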
Swipe options:

```javascript
await __SIM_INPUT__.swipe(
  { x: 200, y: 500 }, { x: 200, y: 200 },
  {
    ms: 300,            // duration of the swipe gesture
    steps: 10,          // sample points
    inertia: true,      // continue with inertia after release
    inertiaMs: 450,
    inertiaDecay: 0.86,
  },
);
```

```javascript
__SIM_INPUT__.tap(200, 400);
await __SIM_INPUT__.type('Hello', { clear: true });
await __SIM_INPUT__.swipe([200, 500], [200, 200]);
__SIM_INPUT__.back();
```

By default, `__SIM_INPUT__` expects CSS pixels matching the visible viewport. If you have physical pixels (e.g. coordinates from a 1080×2400 image), pass `{ coords: 'physical' }`:

```javascript
__SIM_INPUT__.tap(540, 1200, { coords: 'physical' });
```

The mapping is computed from the active device profile and the current CSS-zoom viewport. See `os/simInput.ts` for the exact resolution logic.

`__SIM_QUERY__` returns positions and bounding boxes for elements without performing any action. It is useful for task authoring, smoke tests, or chaining query → input.

```javascript
__SIM_QUERY__.getRectById('submit-btn');
__SIM_QUERY__.getRectBySelector('[data-trigger="wechat.settings.open"]');
__SIM_QUERY__.getRectByTrigger('wechat.tab.switch', { tab: 'me' });
```

Each returns either `null` (not found / not visible) or:

```javascript
{
  rect: { x, y, width, height }, // CSS-pixel bounds
  center: { x, y },              // CSS-pixel center
  centerPhysical: { x, y },      // physical-pixel center (accounts for DPR + zoom)
}
```

```javascript
const r = __SIM_QUERY__.getRectByTrigger('wechat.tab.switch', { tab: 'me' });
if (r) __SIM_INPUT__.tap(r.center.x, r.center.y);
```

The simulator's `TimeService` underlies every `now()` call in app code. By default it follows the real wall clock; the benchmark overrides it for reproducibility.

```javascript
__SIM_TIME__.now();
__SIM_TIME__.setSimulatedTime('2026-05-18 09:00');
__SIM_TIME__.setSimulatedTime(1747560000000);
__SIM_TIME__.setRealTime();  // revert
__SIM_TIME__.getConfig();
```

🕐 App code must use `TimeService` for displayed times, data timestamps, and judge-relevant fields. `TimeService.realNow()` is the escape hatch for measuring real elapsed time (animations, debouncing, etc.).
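
The split between `now()` and `realNow()` can be pictured with a toy clock. This is an illustration of the idea, not the simulator's `TimeService`: `createClock` and the injected `hostNow` are hypothetical. The sketch also freezes simulated time at the value that was set, whereas the source does not say whether the real service freezes or keeps advancing.

```javascript
// Illustrative clock, not the simulator's TimeService.
// `hostNow` is injected so the sketch can be tested; it defaults to Date.now.
function createClock(hostNow = Date.now) {
  let pinned = null; // null means "follow the real clock"
  return {
    setSimulatedTime(ms) { pinned = ms; },
    setRealTime() { pinned = null; },
    // What app code would read for displayed times and stored timestamps.
    now() { return pinned === null ? hostNow() : pinned; },
    // Always the host clock, for measuring real elapsed time.
    realNow() { return hostNow(); },
  };
}

const clock = createClock(() => 5000); // fake host clock stuck at 5000 ms
clock.setSimulatedTime(1747560000000);
console.log(clock.now());     // 1747560000000
console.log(clock.realNow()); // 5000
clock.setRealTime();
console.log(clock.now());     // 5000
```
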

`LocationService` replaces `navigator.geolocation` and keeps GPS coordinates consistent across runs. `__SIM_LOCATION__` is its control surface.

```javascript
__SIM_LOCATION__.getCoords();
__SIM_LOCATION__.setSimulatedLocation('shanghai');
__SIM_LOCATION__.setSimulatedLocation({ latitude: 31.23, longitude: 121.47 });
__SIM_LOCATION__.simulateError(1);   // 1 = permission denied, 2 = unavailable, 3 = timeout
__SIM_LOCATION__.clearError();
__SIM_LOCATION__.setRealLocation();
__SIM_LOCATION__.presets;            // → { beijing: { latitude, longitude }, shanghai: {…}, tokyo: {…}, … }
__SIM_LOCATION__.getConfig();
```

🌐 App code must use `LocationService` rather than `navigator.geolocation` directly. The lint rules will reject the latter.
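
For manual debugging it can help to scope a simulated location to one block of work and revert afterwards. The wrapper below is a sketch under that assumption: `withSimulatedLocation` is a hypothetical helper, and `loc` stands in for `window.__SIM_LOCATION__`. A benchmark run that wants a fixed location throughout would not revert like this.

```javascript
// Hypothetical wrapper: run `fn` with a simulated location, then revert.
// `loc` stands in for window.__SIM_LOCATION__; passing it in keeps this testable.
async function withSimulatedLocation(loc, place, fn) {
  loc.setSimulatedLocation(place);
  try {
    return await fn();
  } finally {
    // Revert even if fn throws, so later steps do not inherit this location
    // or any error that fn simulated along the way.
    loc.clearError();
    loc.setRealLocation();
  }
}
```
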

```javascript
window.__getScrollMeta__();
// → { main: { position: 120, max: 980, viewport: 600, total: 1580 }, … }
```

`__getScrollMeta__()` auto-discovers every visible element with `data-scroll-container="<name>"` and returns its scroll state. This is useful when an agent's task requires reaching a specific scroll position.
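
A task author might turn one entry of that result into a "how far is left" figure. The sketch below uses the sample values shown above and assumes that `max` is the largest reachable `position` (in the sample, `total - viewport`) and that all fields share one unit. `scrollProgress` is a hypothetical helper.

```javascript
// Hypothetical helper over one entry of the __getScrollMeta__() result.
// Assumption: `max` is the largest reachable `position` (total - viewport).
function scrollProgress(meta) {
  const remaining = Math.max(0, meta.max - meta.position);
  return {
    remaining,                                       // distance left before the end
    atEnd: remaining === 0,
    pagesLeft: Math.ceil(remaining / meta.viewport), // viewport-sized scrolls still needed
  };
}

// Using the sample values from the output above.
console.log(scrollProgress({ position: 120, max: 980, viewport: 600, total: 1580 }));
// { remaining: 860, atEnd: false, pagesLeft: 2 }
```
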

```javascript
// State surgery
__SIM__.getState();
await __SIM__.reset();

// OS control
__OS__.openApp('wechat', '/chat');
__OS__.handleBack();

// Find + tap (the query returns null when the element is missing or not visible)
const r = __SIM_QUERY__.getRectBySelector('[data-trigger="settings.open"]');
if (r) __SIM_INPUT__.tap(r.center.x, r.center.y);

// Type
await __SIM_INPUT__.type('Hello MobileGym 👋', { clear: true });

// Reproducibility knobs
__SIM_TIME__.setSimulatedTime('2026-05-18 09:00');
__SIM_LOCATION__.setSimulatedLocation('shanghai');
```

- 📊 The full `__SIM__.getState()` schema → app-state-schema.md
- 🧪 Use these APIs from inside a task setup → ../guides/add-a-task.md
- 🤖 What the agent sees instead → architecture.md