
feat(grok): add image command for grok.com image generation#906

Open
flizzywine wants to merge 1 commit into jackwener:main from flizzywine:feat/grok-image

@flizzywine

Summary

Adds opencli grok image <prompt>, a new command that submits a prompt via the existing grok.com browser session and returns the generated image URLs from the latest assistant message bubble. Optionally downloads the images to a local directory.

This complements the existing grok ask command, which extracts only innerText from the latest bubble: when Grok's response is an image, ask silently returns an empty or truncated string. grok image scrapes the <img> elements from the same bubble instead.

Design notes

  • Submission path reuses the same composer interaction as ask.ts — it prefers the ProseMirror composer (.ProseMirror[contenteditable="true"]) when available and falls back to the legacy <textarea> flow otherwise.
  • Stability detection: polls readLastBubbleImages(page) every ~3s and returns once the set of image URLs has been stable across two consecutive reads. This mirrors the ask stabilization logic.
  • Downloading: assets.grok.com/users/.../image.jpg is gated by Cloudflare; plain HTTP clients (curl, node-fetch) receive HTTP 403. When --out <dir> is provided, the command performs the fetch inside the page via fetch(url, { credentials: 'include', referrer: 'https://grok.com/' }), converts the blob to base64, and writes the decoded bytes to disk from Node. Reusing the browser session this way gets past the Cloudflare gate without managing a separate cookie jar.
  • Avatar/UI filter: images with naturalWidth < 128 are dropped so UI chrome inside the bubble doesn't pollute the output.
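
The stability check and the size filter described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: BubbleImage, dropSmallImages, urlSignature and waitForStableImages are hypothetical names, the read callback stands in for readLastBubbleImages(page), and returning an empty array on timeout is an assumption about the failure path.

```typescript
// Hypothetical shape of one scraped <img>; not taken from the PR's source.
interface BubbleImage {
  src: string;
  naturalWidth: number;
  naturalHeight: number;
}

// Drop small images (avatars, icons). 128 is the threshold quoted in the design notes.
function dropSmallImages(images: BubbleImage[], minWidth = 128): BubbleImage[] {
  return images.filter((img) => img.naturalWidth >= minWidth);
}

// Order-independent signature of the distinct image URLs in one read.
function urlSignature(images: BubbleImage[]): string {
  return images
    .map((img) => img.src)
    .filter((src, index, all) => all.indexOf(src) === index)
    .sort()
    .join("\n");
}

interface PollOptions {
  minCount: number; // corresponds to --count
  intervalMs: number; // roughly 3000 in the description above
  timeoutMs: number; // corresponds to --timeout, in milliseconds
  sleep?: (ms: number) => Promise<void>; // injectable so the loop can be tested
}

// Returns once two consecutive reads yield the same URL set with at least
// minCount images. Assumption: an empty array signals a timeout.
async function waitForStableImages(
  read: () => Promise<BubbleImage[]>,
  opts: PollOptions,
): Promise<BubbleImage[]> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = Date.now() + opts.timeoutMs;
  let previous: string | null = null;

  while (Date.now() < deadline) {
    const images = dropSmallImages(await read());
    const enough = images.length >= opts.minCount;
    const current = urlSignature(images);
    if (enough && current === previous) {
      return images;
    }
    // Only a read with enough images can serve as the comparison baseline.
    previous = enough ? current : null;
    await sleep(opts.intervalMs);
  }
  return [];
}
```

With a ~3s interval, the earliest possible return is the second qualifying read, so a finished image set is reported after about 6s at best.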

Flags

| Flag | Default | Description |
| --- | --- | --- |
| prompt | (positional) | Image generation prompt |
| --new | false | Start a fresh chat before sending |
| --timeout | 240 | Max seconds to wait |
| --count | 1 | Minimum images to wait for before returning |
| --out | "" | Directory to save downloaded images (triggers in-page fetch) |

Columns: url, width, height, path.

Test plan

  • npx tsc --noEmit — clean
  • npx vitest run --project unit — 579 passed, 1 skipped (no regressions)
  • npx vitest run clis/grok/image.test.ts — 9 passed (new tests for isOnGrok, normalizeBooleanFlag, dedupeBySrc, imagesSignature, extFromContentType, buildFilename)
  • Manual end-to-end smoke test against a live grok.com session: opencli grok image "a cyberpunk mechanical owl, neon purple and blue" --new true --out /tmp/grok-img --timeout 300 --format json — returned a path pointing at a valid 784×1168 JPEG on disk.

No new runtime dependencies.

Add `opencli grok image <prompt>` which submits a prompt via the existing
grok.com browser session and returns the generated image URLs from the
latest assistant bubble.

Because assets.grok.com URLs are gated by Cloudflare and cannot be
downloaded with a plain HTTP client, the --out flag triggers an in-page
fetch with credentials: 'include' so that the browser session's cookies
and referer are attached, then writes the decoded blob to disk.

Flags:
- --new       start a fresh chat before sending
- --timeout   max seconds to wait for the image (default 240)
- --count     minimum number of images to wait for before returning
- --out       directory to save downloaded images

Ships with unit tests for the helpers (isOnGrok, normalizeBooleanFlag,
dedupeBySrc, imagesSignature, extFromContentType, buildFilename).
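
The in-page download described in the commit message above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: ImagePayload, fetchImageInPage, extensionFor and saveImage are hypothetical names (the PR ships its own extFromContentType and buildFilename helpers, whose signatures are not shown here), the output file naming is invented, and how fetchImageInPage gets evaluated inside the page depends on the browser driver in use.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Payload handed back from the page to Node; this shape is an assumption.
interface ImagePayload {
  contentType: string;
  base64: string;
}

// Intended to run inside the grok.com page so the request carries the
// session cookies and referrer. It is self-contained (no outer variables).
async function fetchImageInPage(url: string): Promise<ImagePayload> {
  const res = await fetch(url, { credentials: "include", referrer: "https://grok.com/" });
  if (!res.ok) throw new Error(`image fetch failed: HTTP ${res.status}`);
  const blob = await res.blob();
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return { contentType: blob.type, base64: btoa(binary) };
}

// Map a Content-Type header value to a file extension.
// Assumption: unknown types fall back to "jpg", since the asset URLs end in image.jpg.
function extensionFor(contentType: string): string {
  const mime = contentType.split(";")[0].trim().toLowerCase();
  if (mime === "image/png") return "png";
  if (mime === "image/webp") return "webp";
  if (mime === "image/gif") return "gif";
  return "jpg";
}

// Node side: decode the base64 payload and write it under outDir.
// The "grok-image-N" naming scheme is hypothetical.
function saveImage(outDir: string, index: number, payload: ImagePayload): string {
  fs.mkdirSync(outDir, { recursive: true });
  const fileName = `grok-image-${index + 1}.${extensionFor(payload.contentType)}`;
  const filePath = path.join(outDir, fileName);
  fs.writeFileSync(filePath, Buffer.from(payload.base64, "base64"));
  return filePath;
}
```

Base64 is used only as a transport format here, because binary data generally cannot cross the page-to-Node boundary directly; the bytes written to disk are the decoded image.
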
@hiSandog
Contributor

hiSandog commented Apr 9, 2026

I think sendPrompt() here is a bit less robust than the existing Grok web flow in clis/grok/ask.ts. ask.ts's sendPromptViaExplicitWeb() waits/retries for the ProseMirror composer and for a visible enabled Submit button; this new code does a single DOM query and immediately falls back to textarea.

On a cold session right after goto(GROK_URL) / tryStartFreshChat(), the composer is often not mounted yet and only appears about a second later, so grok image can return [BLOCKED] send failed: no composer even though the session is healthy. I would reuse the same readiness loop / visible-button check here so grok image behaves more like grok ask --web true.
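
A readiness loop of the kind suggested here might look like the sketch below. It does not reproduce sendPromptViaExplicitWeb() from ask.ts, whose code is not shown in this thread; waitForComposer, ComposerKind and the exists callback are hypothetical, and only the ProseMirror selector is taken from the PR description.

```typescript
type ComposerKind = "prosemirror" | "textarea";

// Selector quoted in the PR description.
const PROSEMIRROR_SELECTOR = '.ProseMirror[contenteditable="true"]';
// Assumption: the legacy flow can be detected with a plain textarea query.
const TEXTAREA_SELECTOR = "textarea";

interface RetryOptions {
  attempts: number;
  delayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Probe for a composer on every attempt instead of giving up after one
// DOM query. ProseMirror is checked first, so it wins when both exist.
async function waitForComposer(
  exists: (selector: string) => Promise<boolean>,
  opts: RetryOptions,
): Promise<ComposerKind | null> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    if (await exists(PROSEMIRROR_SELECTOR)) return "prosemirror";
    if (await exists(TEXTAREA_SELECTOR)) return "textarea";
    // Do not sleep after the final attempt.
    if (attempt < opts.attempts - 1) await sleep(opts.delayMs);
  }
  // Only now is "no composer" a fair diagnosis.
  return null;
}
```

The same loop shape could be applied to the visible, enabled Submit button. One trade-off of this variant: if a textarea mounts before the ProseMirror composer does, the textarea is chosen.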

@hiSandog
Contributor

hiSandog commented Apr 9, 2026

I think clis/grok/image.ts needs the same baseline/new-bubble guard that the explicit web flow in clis/grok/ask.ts already has (getBubbleTexts() + pickLatestAssistantCandidate()).

Right now the image loop polls readLastBubbleImages() against whatever the current last bubble is. In an existing chat that already has a previous image reply, if sendPrompt() does not stick immediately or Grok takes a moment to append the new user/assistant bubbles, the polling loop can stabilize on the previous assistant image set and return stale URLs after ~6s (two consecutive ~3s reads).

I would capture a baseline bubble count/signature before sendPrompt(), then only accept images from bubbles that appeared after that baseline. That keeps grok image --new false aligned with the stale-response protection that grok ask --web true already uses.
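
As a rough illustration of the baseline idea, assuming each bubble can be read as a role plus its image URLs (the Bubble shape, captureBaseline and newAssistantImages are hypothetical and are not the getBubbleTexts() / pickLatestAssistantCandidate() helpers from ask.ts):

```typescript
// Hypothetical view of one chat bubble as scraped from the page.
interface Bubble {
  role: "user" | "assistant";
  imageUrls: string[];
}

// Taken before sendPrompt(): the number of bubbles already in the chat.
function captureBaseline(bubbles: Bubble[]): number {
  return bubbles.length;
}

// Only bubbles appended after the baseline are eligible. Returns the images
// of the newest eligible assistant bubble, or [] if none has appeared yet.
function newAssistantImages(bubbles: Bubble[], baseline: number): string[] {
  const fresh = bubbles.slice(baseline);
  for (let i = fresh.length - 1; i >= 0; i--) {
    const bubble = fresh[i];
    if (bubble && bubble.role === "assistant") return bubble.imageUrls;
  }
  return [];
}
```

A count-only baseline assumes the page never drops older bubbles from the DOM; if it does, a signature of the last pre-existing bubble would be the safer baseline.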
