feat(grok): add image command for grok.com image generation by flizzywine · Pull Request #906 · jackwener/OpenCLI

flizzywine · 2026-04-09T08:23:39Z

Summary

Adds opencli grok image <prompt>, a new command that submits a prompt via the existing grok.com browser session and returns the generated image URLs from the latest assistant message bubble. Optionally downloads the images to a local directory.

This is a natural complement to the existing grok ask command: the current ask only extracts innerText from the latest bubble, so when Grok's response is an image the command silently returns an empty / truncated string. grok image scrapes <img> elements from the same bubble instead.

Design notes

Submission path reuses the same composer interaction as ask.ts — it prefers the ProseMirror composer (.ProseMirror[contenteditable="true"]) when available and falls back to the legacy <textarea> flow otherwise.
Stability detection: polls readLastBubbleImages(page) every ~3s and returns once the set of image URLs has been stable across two consecutive reads. This mirrors the ask stabilization logic.
Downloading: assets.grok.com/users/.../image.jpg is gated by Cloudflare; plain HTTP clients (curl, node-fetch) receive HTTP 403. When --out <dir> is provided, the command performs the fetch inside the page via fetch(url, { credentials: 'include', referrer: 'https://grok.com/' }), converts the blob to base64, and writes the decoded bytes to disk from Node. This is the simplest way to reuse the browser session's credentials without juggling cookie jars.
Avatar/UI filter: images with naturalWidth < 128 are dropped so UI chrome inside the bubble doesn't pollute the output.
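
The stability detection and the size filter from the design notes could look roughly like the sketch below. It is not the PR's code: BubbleImage, filterImages, signatureOf, and waitForStableImages are hypothetical names, and the read callback stands in for readLastBubbleImages(page).

```typescript
// Hypothetical shape of one scraped <img>; the PR's real type is not shown.
interface BubbleImage {
  src: string;
  naturalWidth: number;
  naturalHeight: number;
}

const MIN_WIDTH = 128; // threshold stated in the design notes

// Drop small UI chrome (avatars, icons) and duplicate URLs.
function filterImages(images: BubbleImage[]): BubbleImage[] {
  const seen = new Set<string>();
  return images.filter((img) => {
    if (img.naturalWidth < MIN_WIDTH || seen.has(img.src)) return false;
    seen.add(img.src);
    return true;
  });
}

// Order-independent signature of an image set.
function signatureOf(images: BubbleImage[]): string {
  return images.map((img) => img.src).sort().join("\n");
}

// Poll `read` until at least `minCount` images are present and the
// signature matches the previous read. Returns [] on timeout.
async function waitForStableImages(
  read: () => Promise<BubbleImage[]>,
  opts: { minCount: number; timeoutMs: number; intervalMs: number },
): Promise<BubbleImage[]> {
  const deadline = Date.now() + opts.timeoutMs;
  let previous: string | null = null;
  while (Date.now() < deadline) {
    const images = filterImages(await read());
    const current = signatureOf(images);
    if (images.length >= opts.minCount && current === previous) return images;
    previous = images.length > 0 ? current : null;
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  return [];
}
```

With intervalMs around 3000, a result is returned no earlier than the second read that sees the same set, which matches the "two consecutive reads" rule described above.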

Flags

| Flag | Default | Description |
| --- | --- | --- |
| `prompt` (positional) | — | Image generation prompt |
| `--new` | `false` | Start a fresh chat before sending |
| `--timeout` | `240` | Max seconds to wait |
| `--count` | `1` | Minimum images to wait for before returning |
| `--out` | `""` | Directory to save downloaded images (triggers in-page fetch) |

Columns: url, width, height, path.

Test plan

npx tsc --noEmit — clean
npx vitest run --project unit — 579 passed, 1 skipped (no regressions)
npx vitest run clis/grok/image.test.ts — 9 passed (new tests for isOnGrok, normalizeBooleanFlag, dedupeBySrc, imagesSignature, extFromContentType, buildFilename)
Manual end-to-end smoke test against a live grok.com session: opencli grok image "a cyberpunk mechanical owl, neon purple and blue" --new true --out /tmp/grok-img --timeout 300 --format json — returned a path pointing at a valid 784×1168 JPEG on disk.

No new runtime dependencies.

Add `opencli grok image <prompt>`, which submits a prompt via the existing grok.com browser session and returns the generated image URLs from the latest assistant bubble.

Because assets.grok.com URLs are gated by Cloudflare and cannot be downloaded with a plain HTTP client, the --out flag triggers an in-page fetch(url, { credentials: 'include' }) so the browser session's cookies and referer are attached, then writes the decoded blob to disk.

Flags:
- --new: start a fresh chat before sending
- --timeout: max seconds to wait for the image (default 240)
- --count: minimum number of images to wait for before returning
- --out: directory to save downloaded images

Ships with unit tests for the helpers (isOnGrok, normalizeBooleanFlag, dedupeBySrc, imagesSignature, extFromContentType, buildFilename).
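
A minimal sketch of the download path, assuming the two halves are split as described: one function that runs inside the page and one that runs in Node. The names fetchAsBase64, extFor, and saveBase64Image are hypothetical, the mechanism for running code in the page is not shown because the project's page API is not visible here, and the sketch reads an ArrayBuffer rather than a Blob for brevity.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Intended to run inside the grok.com page so the session cookies and
// referrer are attached. Hypothetical helper, not the PR's code.
export async function fetchAsBase64(
  url: string,
): Promise<{ base64: string; contentType: string }> {
  const res = await fetch(url, {
    credentials: "include",
    referrer: "https://grok.com/",
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const bytes = new Uint8Array(await res.arrayBuffer());
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] ?? 0);
  }
  return {
    base64: btoa(binary),
    contentType: res.headers.get("content-type") ?? "",
  };
}

// Map a Content-Type header to a file extension. The "bin" fallback is
// an assumption; the PR's extFromContentType may behave differently.
export function extFor(contentType: string): string {
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const known: Record<string, string> = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/webp": "webp",
    "image/gif": "gif",
  };
  return known[mime] ?? "bin";
}

// Runs in Node: decode the base64 payload and write it under `dir`.
export async function saveBase64Image(
  dir: string,
  name: string,
  base64: string,
  contentType: string,
): Promise<string> {
  await mkdir(dir, { recursive: true });
  const path = join(dir, `${name}.${extFor(contentType)}`);
  await writeFile(path, Buffer.from(base64, "base64"));
  return path;
}
```

Passing base64 across the page boundary keeps the payload a plain string, which most browser automation bridges can serialize; the cost is roughly a one-third size overhead in transit.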

hiSandog · 2026-04-09T09:47:25Z

I think sendPrompt() here is a bit less robust than the existing Grok web flow in clis/grok/ask.ts. ask.ts's sendPromptViaExplicitWeb() waits/retries for the ProseMirror composer and for a visible enabled Submit button; this new code does a single DOM query and immediately falls back to textarea.

On a cold session right after goto(GROK_URL) / tryStartFreshChat(), the composer is often not mounted yet and only appears a second later, so grok image can return [BLOCKED] send failed: no composer even though the session is healthy. I would reuse the same readiness loop / visible-button check here so grok image behaves more like grok ask --web true.
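
A readiness loop along the lines suggested here could be sketched as follows. This is an illustration under assumptions, not the code in ask.ts: waitFor and findComposer are hypothetical names, and the two probe callbacks stand in for DOM queries against the ProseMirror composer and the legacy textarea.

```typescript
// Poll `probe` until it yields a non-null value or the timeout elapses.
async function waitFor<T>(
  probe: () => Promise<T | null>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<T | null> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) return null;
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

type Composer = "prosemirror" | "textarea";

// Give the ProseMirror composer time to mount; only fall back to the
// textarea once that window has run out.
async function findComposer(
  probeProseMirror: () => Promise<boolean>,
  probeTextarea: () => Promise<boolean>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<Composer | null> {
  const mounted = await waitFor(
    async () => ((await probeProseMirror()) ? ("prosemirror" as const) : null),
    opts,
  );
  if (mounted !== null) return mounted;
  return (await probeTextarea()) ? "textarea" : null;
}
```

The same waitFor helper could also gate on a visible, enabled Submit button before clicking, so a slow mount is retried instead of surfacing as a send failure.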

hiSandog · 2026-04-09T12:41:59Z

I think clis/grok/image.ts needs the same baseline/new-bubble guard that the explicit web flow in clis/grok/ask.ts already has (getBubbleTexts() + pickLatestAssistantCandidate()).

Right now the image loop polls readLastBubbleImages() against whatever the current last bubble is. In an existing chat that already has a previous image reply, if sendPrompt() does not stick immediately or Grok takes a moment to append the new user/assistant bubbles, the polling loop can stabilize on the previous assistant image set and return stale URLs after ~6s.

I would capture a baseline bubble count/signature before sendPrompt(), then only accept images from bubbles that appeared after that baseline. That keeps grok image --new false aligned with the stale-response protection that grok ask --web true already uses.
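
The baseline guard proposed above could be sketched like this. The Bubble shape, takeBaseline, and imagesAfterBaseline are hypothetical and do not mirror getBubbleTexts() or pickLatestAssistantCandidate(); in particular, whether the DOM exposes a per-bubble role is an assumption.

```typescript
// Hypothetical snapshot of one chat bubble as scraped from the page.
interface Bubble {
  role: "user" | "assistant";
  imageSrcs: string[];
}

// Record how many bubbles exist before the prompt is sent.
function takeBaseline(bubbles: Bubble[]): number {
  return bubbles.length;
}

// Only consider bubbles appended after the baseline, and return the
// images of the newest assistant bubble among them (or none).
function imagesAfterBaseline(baselineCount: number, bubbles: Bubble[]): string[] {
  const fresh = bubbles.slice(baselineCount);
  const latest = [...fresh].reverse().find((b) => b.role === "assistant");
  return latest ? latest.imageSrcs : [];
}
```

Until the new assistant bubble appears, imagesAfterBaseline returns an empty list, so a stabilization loop built on top of it cannot settle on the previous reply's images.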

flizzywine wants to merge 1 commit into jackwener:main from flizzywine:feat/grok-image
