Chapter 3 by oeway · Pull Request #9 · aicjanelia/BioImagingAI

oeway · 2025-07-03T15:14:24Z

Hi @ScientistRachel and @opp1231 while I am still working on this, maybe you can give me some hint on how we should attach source code files? In my case I have some python code to generate data for figures, train models, which is not meant for show to the readers, but they can check in details if needed. In this case can I leave it as some python files? where should place them? also where should I place figures? I have them in a separate folders. Please comment if you see changes needed regarding these.

Content-wise, it's a dump of everything right now, need a bit more work on re-org and polishing, will do with a postdoc in my group.

ScientistRachel · 2025-07-11T14:31:04Z

Hi @oeway,

Thank you for the draft PR. Source code files can go in the upper-level notebooks folder of the repo, and your chapter can link to them as a reference. Your assets folder makes sense for your images, and if any of your source code should be less prominent, leaving it there as well is an option.

The draft so far has a lot of interesting information. We suggest expanding the intro paragraphs a bit to give an expanded road map of the chapter. This will be helpful for readers who may know very little about LLMs beyond ChatGPT's existence. We also suggest as you refine the text making sure that terms are well defined for inexperienced readers. We will wait on fully editing the draft until you tell us it is ready for review.

All the best,
Rachel & Owen

ScientistRachel · 2026-01-12T14:14:09Z

Hi @oeway,

We wanted to briefly follow up on the status of your draft PR. You had mentioned wanted to work on the draft more before a full review. Please let us know when this draft is ready, and we will be happy to take a look.

Thank you,
Rachel & Owen

# Conflicts: # docs/3-llms.qmd # docs/references.bib

- Fix critical factual error: LLM training description now correctly describes autoregressive next-token prediction (was incorrectly describing BERT-style masked prediction) - Remove fabricated VLM confidence scores (95.0%, 90.0%, 85.0%); replace with honest qualitative description - Add missing citations: hallucination rates (Chelli et al. 2024, JMIR), EIMS microscope system (Huang et al. 2025, bioRxiv) - Add 2026 citations: Wang et al. (Nat Biomed Eng) for LLM coding benchmarks, Li et al. (Nat Biotech) for agentic AI in biomedical research, Qu et al. (Nat Biomed Eng) for CRISPR-GPT - Reduce redundancy: SWE-bench stats consolidated from 5 mentions to 2, VLM generalization demo deduplicated across sections - Fix AI-generated style: replace ~15 instances of "remarkable", "dramatic", "transformative" with precise language; tighten formulaic transitions and reduce hype throughout - Verified chapter compiles with Quarto 1.6.42 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…reduce hype - Section 3.1: Merged redundant "Complexity Challenge" and "Overwhelming Landscape" subsections; removed overstated claims ("cannot be overstated", "paradigm shift", "profound") - Section 3.2: Improved LLM intro paragraph; renamed "Promise and Current Limitations" to "What LLM Code Generation Can and Cannot Do Today" with more specific content - Section 3.3: Enriched Function Calling section with motivating problem, detailed MCP explanation with microscopy example, expanded Trust Escalation discussion - Section 3.4: Rewrote VLM intro for clarity; expanded limitations subsection with cost/speed specifics; added vision-guided programming paragraph - Section 3.5: Tightened agent intro (removed "qualitative leap"); condensed harness engineering and execution environment paragraphs; improved computer use precision - Section 3.6: Added concrete calcium imaging example; toned down "revolution" language - Sections 3.7-3.9: Minor tightening of opening paragraphs, improved precision - Global: Removed remaining hype words, improved scientific precision throughout - Verified: Quarto build succeeds (only expected @sec-10 warning) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…essibility - Remove vision-guided programming duplication (was described twice in 3.4) - Remove MCP/Linux Foundation duplication between 3.3 and 3.5 (now cross-refs) - Simplify memory systems paragraph with microscopy-specific example - Tighten Dynamic Code Generation subsection (overlapped with 3.2 and 3.6) - Restructure agent ecosystem callout box by category instead of flat list - Tighten conclusion — remove grandiose phrasing - Fix awkward "can barely do extrapolation" phrasing in 3.1 - Replace "quiet revolution" cliché in 3.6 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…hter conclusion - Fix overclaiming: 'human-level' → 'expert-level on well-represented types'; logprob ≠ calibrated confidence; 'not science fiction' → measured framing - Tone: remove celebrity attributions (Karpathy, Tobi Lutke, OpenAI team); contextualize 'vibe coding'; replace spreadsheet analogy with microscopy one - Clarify SmartEM/pySTED are ML-based, not LLM agents - Add reproducibility paragraph to Section 3.6 (ephemeral software) - Add validation caveat to 'within minutes' and 'nearly free' claims - Slim Section 3.5 validation callout to cross-reference §3.8 - Tighten conclusion: reduce recap, add synthesis on what changed (narrow task automation → intent-driven interaction), weave cross-refs into prose - Soften 'emerging' tier in hype check; add conditional tense to §3.6 - Hedge VLM benchmark claim; date-stamp Gemini CLI free tier - Quarto build verified (only expected @sec-10 warning) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, expanded conclusion Major changes: - Section 3.1: Tightened extrapolation discussion, shortened figure captions, moved VLM generalization demo to 3.4 where VLMs are properly introduced - Section 3.2: Retitled ImageJ subsection, merged benchmarking + limits - Section 3.3: Added concrete JSON function-call example for microscopists - Section 3.4: Restructured into two focused subsections (interpretation + capabilities/limits), integrated VLM generalization from 3.1 - Section 3.5: Consolidated agent architecture, rewrote Omega/BioImage.IO descriptions to focus on paradigm not features, noted EIMS is preprint - Section 3.6: Strengthened reproducibility discussion, added institutional knowledge concern, integrated software-obsolescence discussion from 3.8 - Section 3.7: Added prompt engineering tips, fixed FUCCI biology (S-phase = co-expression), added FUCCI citation, added data privacy callout, added coding agent comparison table - Section 3.8: Split reproducibility/bias into separate subsections, developed skills atrophy with concrete FUCCI scenario, sharpened hype-check tiers - Section 3.9: Expanded from 3 paragraphs to proper 5-paragraph synthesis - AI Disclosure: More specific about models and roles - Global: Reduced 'substantially'/'fundamentally' overuse, varied sentence structure, removed formulaic patterns, standardized BioImage.IO naming Net: 267 insertions, 288 deletions (21 lines shorter, richer content) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ures - Added Claude Code, Gemini CLI, Codex to AI Agents row in model evolution table - Added intent-driven microscopy Excalidraw diagram - Updated glossary entries - Added CLAUDE.md project development guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

oeway · 2026-03-19T11:28:43Z

Hi all — apologies for the long delay on this. The draft for Chapter 3 has gone through several rounds of revision and should be in good shape now. Would appreciate your feedback whenever you get a chance. Thanks!

These are local development config files and should not be in the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This commit contains formatting changes suggested as part of my review of the Chapter 3 draft. Those changes include things like removing parenthesis around references, cross-linking to other chapters, etc. This commit also includes new glossary terms and cross-links to the glossary throughout the chapter.

opp1231 · 2026-03-27T18:10:26Z

docs/3-llms-assets/neural_network_extrapolation_failure.png

The "5 + 3 --> 7" textbox covers up both its own arrow and the plot slightly. This is extremely minor. If it is a pain to remake the figure, I would not worry about it.

opp1231 · 2026-03-27T18:11:26Z

docs/3-llms.qmd

+
+Over the past decade, the way we analyze microscopy data has changed markedly. What began with specialized deep learning models for individual tasks — segmentation, denoising, classification — has evolved into a broader landscape where AI systems can assist with experiment planning, code generation, instrument control, and workflow adaptation. This chapter traces that evolution and equips microscopists with the knowledge to navigate it.
+
+We begin by examining the limitations of traditional deep learning and the rise of foundation models that generalize across imaging modalities. We then introduce large language models and explain how they work at an intuitive level (@sec-llm-foundations), before exploring how function calling transformed them from text generators into action-taking agents (@sec-function-calling). The chapter covers vision-language models for microscopy (@sec-multimodal-ai) and the growing ecosystem of AI agents that can orchestrate complex workflows, control instruments, and generate software on demand (@sec-ai-agents, @sec-disposable-software). A hands-on practical guide (@sec-practical-guide) provides concrete workflows for getting started, followed by an honest discussion of challenges, implications for the profession, and an outlook on where the field is headed (@sec-challenges).


This is a great roadmap for the chapter!

opp1231 · 2026-03-27T18:11:59Z

docs/3-llms.qmd

+**Segment Anything Model (SAM)**, released by Meta in 2023, advanced the field of promptable segmentation further. Trained on over one billion masks from 11 million images, SAM demonstrated that a single model could segment essentially any object in any image through simple prompts (points, boxes, or text). Despite being trained entirely on natural images, SAM showed unexpected capability on out-of-domain scientific images, sparking a wave of microscopy-specific adaptations:
+
+- **micro-SAM** @Archit2025 fine-tuned SAM specifically for light and electron microscopy, extending it to support 3D segmentation and cell tracking. micro-SAM provides interactive segmentation tools through a napari plugin, making foundation model capabilities directly accessible to microscopists.
+- **CellSAM** @cellsam2025 combined SAM with a custom cell detection module (CellFinder) to achieve expert-level segmentation performance across mammalian cells, yeast, and bacteria — all with a single unified model.


We suggest adding CellPose-SAM to this list. It is discussed with microSAM and CellSAM later, so adding it here adds some clarity and provides a place to give the citation (already in references.bib as @pachitariu2025cellpose)

opp1231 · 2026-03-27T18:13:03Z

docs/3-llms.qmd

+
+The process begins with **data collection**: gathering domain-specific examples, such as ImageJ macro files (`.ijm`) from public GitHub repositories. Even a modest collection — say, 28 macro files totaling about 82,000 characters — provides enough material for a small model to learn the syntax and patterns of ImageJ macro language.
+
+Rather than training from scratch, this approach uses [transfer learning](glossary.qmd#transfer-learning): starting with a small, pre-trained model (such as SmolLM2 with 135 million parameters — orders of magnitude smaller than commercial frontier models) and [fine-tuning](glossary.qmd#fine-tuning) it on microscopy-specific code. During fine-tuning, the model learns to predict the next token in ImageJ macro sequences, absorbing the syntax, common function calls, and typical workflow patterns. On a modern laptop, this process can complete in 10–20 minutes without specialized hardware.


We suggest adding a link or citation for SmolLM2.

opp1231 · 2026-03-27T18:13:46Z

docs/3-llms.qmd

+
+Function calling created a spectrum of increasing autonomy. At one end sits **assisted mode**, where the AI suggests tool calls but a human approves each one — useful when learning a new workflow or working with unfamiliar data. In the middle is **supervised mode**, where routine actions (loading an image, running a standard segmentation) execute automatically while consequential ones (deleting data, modifying acquisition settings) require confirmation. At the far end is **autonomous mode**, where the AI plans and executes multi-step workflows with minimal human intervention, checking in only for high-stakes decisions.
+
+This progression mirrors how we delegate tasks to human colleagues — starting with close supervision and gradually extending trust as competence is demonstrated. In microscopy, the boundaries matter: an AI agent might autonomously run routine segmentation, but it should pause before modifying acquisition parameters on an expensive experiment, before discarding images that might contain rare biological events, or before committing to an irreversible analysis method. Deciding where to draw these boundaries is a design decision that each laboratory must make based on the stakes of the experiment.


This is good insight.

opp1231 · 2026-03-27T18:15:51Z

docs/3-llms.qmd

+
+## AI Agents for Microscopy and Workflow Orchestration {#sec-ai-agents}
+
+LLMs and VLMs provide powerful individual capabilities — generating text, writing code, interpreting images — but each interaction is essentially one-shot: the user asks, the model responds. **AI agents** extend these models into systems that can take actions (running code, calling APIs, controlling hardware), observe the results, and adapt their approach iteratively. Where a language model answers a question, an agent works through a problem step by step until it reaches a solution — or recognizes that it cannot.


It might be worth discussing somewhere in the chapter how/when AI agents recognize that it cannot answer a query with sufficient confidence. This would give readers intuition about what sort of prompts are suitable, as well as what "I don't know" means from the AI's POV.

opp1231 · 2026-03-27T18:16:34Z

docs/3-llms.qmd

+
+**Claude Code** (Anthropic) is a CLI-based agent that can read and edit files across entire codebases, run shell commands, execute tests, interact with version control, and orchestrate complex multi-step tasks — all while maintaining context across long sessions with million-token context windows. **Devin** (Cognition Labs) operates in a sandboxed development environment with its own code editor, terminal, and web browser. **GitHub Copilot** has evolved from inline code suggestions to agent mode, where it can plan changes across repositories, edit multiple files, and iterate on feedback. **Cursor** and **Windsurf** represent a new category of AI-native code editors with deeply integrated agent capabilities.
+
+The pace of improvement has been notable. On the SWE-bench Verified benchmark — which tests agents on real-world GitHub issues requiring understanding codebases, diagnosing bugs, and implementing fixes — success rates rose from under 5% in early 2024 to over 70% by late 2025. These benchmarks measure general software engineering, not scientific applications specifically, but the underlying capabilities — reading code, debugging errors, composing multi-step solutions — are directly relevant to microscopy analysis pipelines.


Is there a good citation for the SWE-bench success rates?

opp1231 · 2026-03-27T18:17:46Z

docs/3-llms.qmd

+
+The concern is real but addressable with existing practices. The generated code is ordinary Python, R, or ImageJ macro — it can be inspected, version-controlled, and shared like any other analysis script. Researchers using this approach should save the generated code alongside their data, record the prompt and model used to produce it, and include the script in their supplementary materials. Importantly, on-demand tools typically compose and configure established, validated libraries — scikit-image, Cellpose, napari — rather than implementing algorithms from scratch. The generated code calls the same functions that a human programmer would use; the agent handles the boilerplate of loading files, configuring parameters, and creating visualizations.
+
+A subtler concern is long-term reproducibility. If a finding is questioned years later, the analysis must still be runnable. Version-pinning dependencies (recording the exact library versions used), including the generated code in a repository, and documenting any manual steps taken during the analysis all help — the same practices that good computational science has always required, but that are easily neglected when the code feels disposable.


This (and the next paragraph) is an optional opportunity to add a callout box on best practices for what to record when using a model.

opp1231 · 2026-03-27T18:18:28Z

docs/3-llms.qmd

+
+At the same time, this approach sits *on top of* the established software ecosystem, not in place of it. When an agent generates a segmentation script, it calls Cellpose or StarDist underneath. When it builds a visualization, it uses matplotlib or napari. The durable, community-maintained tools become the validated building blocks that agents compose; they become more valuable in an agent-driven world, not less. Software that exposes programmable APIs and command-line interfaces will thrive because agents can call it. Closed, GUI-only tools that cannot be invoked programmatically face a harder path — not because they are technically inferior, but because agents cannot use what they cannot call.
+
+A useful analogy: autofocus did not eliminate the need to understand optics — it freed microscopists to focus on experimental design rather than manual focusing. Similarly, AI agents will not eliminate ImageJ, but they will reduce the need to memorize its menu hierarchy or hand-write batch-processing macros.


We think this analogy to autofocus is really helpful!

opp1231 · 2026-03-27T18:19:11Z

docs/3-llms.qmd

+
+The broader implication is a change in the relationship between microscopists and their computational tools. Traditional software requires the researcher to learn the tool's interface and adapt their workflow to its logic. On-demand software generation inverts this: the researcher describes the goal, and the software adapts to the researcher. This means that researchers who understand their biology deeply but lack extensive programming training can access computational analysis that was previously gated by programming skill — a meaningful expansion of who can do quantitative microscopy.
+
+At the same time, this approach sits *on top of* the established software ecosystem, not in place of it. When an agent generates a segmentation script, it calls Cellpose or StarDist underneath. When it builds a visualization, it uses matplotlib or napari. The durable, community-maintained tools become the validated building blocks that agents compose; they become more valuable in an agent-driven world, not less. Software that exposes programmable APIs and command-line interfaces will thrive because agents can call it. Closed, GUI-only tools that cannot be invoked programmatically face a harder path — not because they are technically inferior, but because agents cannot use what they cannot call.


This also puts the onus on analysts and software engineers to build new and better tools that agents can ultimately call within workflows.

opp1231 · 2026-03-27T18:20:07Z

docs/3-llms.qmd

+
+If AI agents can generate analysis pipelines, write code, and operate microscopes, what becomes of the bioimage analyst?
+
+The role remains essential — but it is evolving. As AI makes advanced analysis accessible to more researchers, the demand for expert guidance on validation, interpretation, and experimental design is likely to increase rather than decrease. The job description is shifting: less "the person who runs CellProfiler" and more "the person who evaluates whether an AI-generated pipeline produces biologically meaningful results, recognizes when a segmentation has failed in subtle ways, and designs experiments amenable to rigorous quantitative analysis."


As bioimage analysts, we really enjoyed this description!

opp1231 · 2026-03-27T18:20:40Z

docs/3-llms.qmd

+For readers looking to go deeper: @sec-4 introduces the neural network architectures underlying the models discussed here, @sec-9 provides guidance on training and fine-tuning models for specific tasks, and @sec-output_quality offers frameworks for evaluating whether AI-generated analysis outputs meet the quality standards that scientific research demands.

-## Practical Guide: Getting Started with LLMs for Microscopy {#sec-practical-guide}
+## AI Disclosure {.unnumbered}


We really appreciate this thoughtful and thorough disclosure.

opp1231

Hi Wei,

Thank you for this draft! We think it is very solid. Rachel and I have gone through and changed some formatting here and there, and added some links and glossary terms, where relevant. We also noticed a few small things that you may consider changing. Please take a look at our changes and comments. Once you have a chance to go over these small things, we think this is essentially ready to be merged into main and put up on the website.

Thank you again for your efforts in making this book possible!

- Add Cellpose-SAM to SAM adaptations list with citation - Add SmolLM2 and SWE-bench Verified citations - Add paragraph on how AI agents handle uncertainty - Add callout box on best practices for recording AI model usage - Expand reproducibility section with technical causes of LLM non-determinism (temperature, sampling, hardware, model versioning) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

oeway · 2026-03-29T01:53:08Z

Thank you @ScientistRachel and @opp1231 for the detailed and thoughtful review! I really appreciate the formatting improvements, glossary links, and the specific suggestions — they all made the chapter stronger.

I've pushed a commit addressing the actionable comments:

Added Cellpose-SAM to the SAM adaptations list with citation
Added a citation for SmolLM2
Added a citation for SWE-bench Verified
Added a paragraph discussing how AI agents handle uncertainty and recognize when they cannot answer confidently
Added a callout box on best practices for what to record when using an AI model
Expanded the reproducibility section with more detail on the technical causes of LLM non-determinism (temperature, sampling, hardware, model versioning)

This should be ready for merge now. Thanks again for the invitation to contribute to this book — it's been a great experience!

oeway added 3 commits July 3, 2025 01:27

Initial draft for llm section

9ab764e

Remove file

d60a352

fix cursorrules

4d6431d

oeway and others added 8 commits February 23, 2026 22:33

small fix

47bfa54

Merge remote-tracking branch 'origin/main' into draft-3-llm

8525b9b

# Conflicts: # docs/3-llms.qmd # docs/references.bib

oeway and others added 2 commits March 19, 2026 12:29

Remove CLAUDE.md and .cursorrules from repository

151ae37

These are local development config files and should not be in the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove invalid quotation characters

be7da27

oeway marked this pull request as ready for review March 23, 2026 18:21

oeway requested review from ScientistRachel and opp1231 as code owners March 23, 2026 18:21

oeway changed the title ~~Initial Draft for Chapter 3~~ Chapter 3 Mar 24, 2026

ScientistRachel and others added 4 commits March 26, 2026 15:50

Merge branch 'main' into draft-3-llm

96704f3

Clean up merged references with BibTeX Tidy

c8b0fd2

Additional formatting changes and glossary terms

5a4c188

opp1231 reviewed Mar 27, 2026

View reviewed changes

opp1231 requested changes Mar 27, 2026

View reviewed changes


		Over the past decade, the way we analyze microscopy data has changed markedly. What began with specialized deep learning models for individual tasks — segmentation, denoising, classification — has evolved into a broader landscape where AI systems can assist with experiment planning, code generation, instrument control, and workflow adaptation. This chapter traces that evolution and equips microscopists with the knowledge to navigate it.

		We begin by examining the limitations of traditional deep learning and the rise of foundation models that generalize across imaging modalities. We then introduce large language models and explain how they work at an intuitive level (@sec-llm-foundations), before exploring how function calling transformed them from text generators into action-taking agents (@sec-function-calling). The chapter covers vision-language models for microscopy (@sec-multimodal-ai) and the growing ecosystem of AI agents that can orchestrate complex workflows, control instruments, and generate software on demand (@sec-ai-agents, @sec-disposable-software). A hands-on practical guide (@sec-practical-guide) provides concrete workflows for getting started, followed by an honest discussion of challenges, implications for the profession, and an outlook on where the field is headed (@sec-challenges).


		The process begins with data collection: gathering domain-specific examples, such as ImageJ macro files (`.ijm`) from public GitHub repositories. Even a modest collection — say, 28 macro files totaling about 82,000 characters — provides enough material for a small model to learn the syntax and patterns of ImageJ macro language.

		Rather than training from scratch, this approach uses [transfer learning](glossary.qmd#transfer-learning): starting with a small, pre-trained model (such as SmolLM2 with 135 million parameters — orders of magnitude smaller than commercial frontier models) and [fine-tuning](glossary.qmd#fine-tuning) it on microscopy-specific code. During fine-tuning, the model learns to predict the next token in ImageJ macro sequences, absorbing the syntax, common function calls, and typical workflow patterns. On a modern laptop, this process can complete in 10–20 minutes without specialized hardware.


		Function calling created a spectrum of increasing autonomy. At one end sits assisted mode, where the AI suggests tool calls but a human approves each one — useful when learning a new workflow or working with unfamiliar data. In the middle is supervised mode, where routine actions (loading an image, running a standard segmentation) execute automatically while consequential ones (deleting data, modifying acquisition settings) require confirmation. At the far end is autonomous mode, where the AI plans and executes multi-step workflows with minimal human intervention, checking in only for high-stakes decisions.

		This progression mirrors how we delegate tasks to human colleagues — starting with close supervision and gradually extending trust as competence is demonstrated. In microscopy, the boundaries matter: an AI agent might autonomously run routine segmentation, but it should pause before modifying acquisition parameters on an expensive experiment, before discarding images that might contain rare biological events, or before committing to an irreversible analysis method. Deciding where to draw these boundaries is a design decision that each laboratory must make based on the stakes of the experiment.


		## AI Agents for Microscopy and Workflow Orchestration {#sec-ai-agents}

		LLMs and VLMs provide powerful individual capabilities — generating text, writing code, interpreting images — but each interaction is essentially one-shot: the user asks, the model responds. AI agents extend these models into systems that can take actions (running code, calling APIs, controlling hardware), observe the results, and adapt their approach iteratively. Where a language model answers a question, an agent works through a problem step by step until it reaches a solution — or recognizes that it cannot.


		Claude Code (Anthropic) is a CLI-based agent that can read and edit files across entire codebases, run shell commands, execute tests, interact with version control, and orchestrate complex multi-step tasks — all while maintaining context across long sessions with million-token context windows. Devin (Cognition Labs) operates in a sandboxed development environment with its own code editor, terminal, and web browser. GitHub Copilot has evolved from inline code suggestions to agent mode, where it can plan changes across repositories, edit multiple files, and iterate on feedback. Cursor and Windsurf represent a new category of AI-native code editors with deeply integrated agent capabilities.

		The pace of improvement has been notable. On the SWE-bench Verified benchmark — which tests agents on real-world GitHub issues requiring understanding codebases, diagnosing bugs, and implementing fixes — success rates rose from under 5% in early 2024 to over 70% by late 2025. These benchmarks measure general software engineering, not scientific applications specifically, but the underlying capabilities — reading code, debugging errors, composing multi-step solutions — are directly relevant to microscopy analysis pipelines.


		The concern is real but addressable with existing practices. The generated code is ordinary Python, R, or ImageJ macro — it can be inspected, version-controlled, and shared like any other analysis script. Researchers using this approach should save the generated code alongside their data, record the prompt and model used to produce it, and include the script in their supplementary materials. Importantly, on-demand tools typically compose and configure established, validated libraries — scikit-image, Cellpose, napari — rather than implementing algorithms from scratch. The generated code calls the same functions that a human programmer would use; the agent handles the boilerplate of loading files, configuring parameters, and creating visualizations.

		A subtler concern is long-term reproducibility. If a finding is questioned years later, the analysis must still be runnable. Version-pinning dependencies (recording the exact library versions used), including the generated code in a repository, and documenting any manual steps taken during the analysis all help — the same practices that good computational science has always required, but that are easily neglected when the code feels disposable.


		At the same time, this approach sits on top of the established software ecosystem, not in place of it. When an agent generates a segmentation script, it calls Cellpose or StarDist underneath. When it builds a visualization, it uses matplotlib or napari. The durable, community-maintained tools become the validated building blocks that agents compose; they become more valuable in an agent-driven world, not less. Software that exposes programmable APIs and command-line interfaces will thrive because agents can call it. Closed, GUI-only tools that cannot be invoked programmatically face a harder path — not because they are technically inferior, but because agents cannot use what they cannot call.

		A useful analogy: autofocus did not eliminate the need to understand optics — it freed microscopists to focus on experimental design rather than manual focusing. Similarly, AI agents will not eliminate ImageJ, but they will reduce the need to memorize its menu hierarchy or hand-write batch-processing macros.


		The broader implication is a change in the relationship between microscopists and their computational tools. Traditional software requires the researcher to learn the tool's interface and adapt their workflow to its logic. On-demand software generation inverts this: the researcher describes the goal, and the software adapts to the researcher. This means that researchers who understand their biology deeply but lack extensive programming training can access computational analysis that was previously gated by programming skill — a meaningful expansion of who can do quantitative microscopy.

		At the same time, this approach sits on top of the established software ecosystem, not in place of it. When an agent generates a segmentation script, it calls Cellpose or StarDist underneath. When it builds a visualization, it uses matplotlib or napari. The durable, community-maintained tools become the validated building blocks that agents compose; they become more valuable in an agent-driven world, not less. Software that exposes programmable APIs and command-line interfaces will thrive because agents can call it. Closed, GUI-only tools that cannot be invoked programmatically face a harder path — not because they are technically inferior, but because agents cannot use what they cannot call.


		If AI agents can generate analysis pipelines, write code, and operate microscopes, what becomes of the bioimage analyst?

		The role remains essential — but it is evolving. As AI makes advanced analysis accessible to more researchers, the demand for expert guidance on validation, interpretation, and experimental design is likely to increase rather than decrease. The job description is shifting: less "the person who runs CellProfiler" and more "the person who evaluates whether an AI-generated pipeline produces biologically meaningful results, recognizes when a segmentation has failed in subtle ways, and designs experiments amenable to rigorous quantitative analysis."

Conversation

oeway commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ScientistRachel commented Jul 11, 2025

Uh oh!

ScientistRachel commented Jan 12, 2026

Uh oh!

oeway commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

opp1231 left a comment

Choose a reason for hiding this comment

Uh oh!

oeway commented Mar 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oeway commented Jul 3, 2025 •

edited

Loading

oeway commented Mar 19, 2026 •

edited

Loading