Conversation
Hi @oeway, Thank you for the draft PR. Source code files can go in the upper-level notebooks folder of the repo, and your chapter can link to them as a reference. Your assets folder makes sense for your images, and if any of your source code should be less prominent, leaving it there as well is an option. The draft so far has a lot of interesting information. We suggest expanding the intro paragraphs a bit to give a fuller road map of the chapter. This will be helpful for readers who may know very little about LLMs beyond ChatGPT's existence. We also suggest, as you refine the text, making sure that terms are well defined for inexperienced readers. We will wait on fully editing the draft until you tell us it is ready for review. All the best,
Hi @oeway, We wanted to briefly follow up on the status of your draft PR. You had mentioned wanting to work on the draft more before a full review. Please let us know when this draft is ready, and we will be happy to take a look. Thank you,
# Conflicts:
#	docs/3-llms.qmd
#	docs/references.bib
- Fix critical factual error: LLM training description now correctly describes autoregressive next-token prediction (was incorrectly describing BERT-style masked prediction)
- Remove fabricated VLM confidence scores (95.0%, 90.0%, 85.0%); replace with honest qualitative description
- Add missing citations: hallucination rates (Chelli et al. 2024, JMIR), EIMS microscope system (Huang et al. 2025, bioRxiv)
- Add 2026 citations: Wang et al. (Nat Biomed Eng) for LLM coding benchmarks, Li et al. (Nat Biotech) for agentic AI in biomedical research, Qu et al. (Nat Biomed Eng) for CRISPR-GPT
- Reduce redundancy: SWE-bench stats consolidated from 5 mentions to 2, VLM generalization demo deduplicated across sections
- Fix AI-generated style: replace ~15 instances of "remarkable", "dramatic", "transformative" with precise language; tighten formulaic transitions and reduce hype throughout
- Verified chapter compiles with Quarto 1.6.42

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reduce hype
- Section 3.1: Merged redundant "Complexity Challenge" and "Overwhelming Landscape" subsections;
removed overstated claims ("cannot be overstated", "paradigm shift", "profound")
- Section 3.2: Improved LLM intro paragraph; renamed "Promise and Current Limitations" to
"What LLM Code Generation Can and Cannot Do Today" with more specific content
- Section 3.3: Enriched Function Calling section with motivating problem, detailed MCP
explanation with microscopy example, expanded Trust Escalation discussion
- Section 3.4: Rewrote VLM intro for clarity; expanded limitations subsection with
cost/speed specifics; added vision-guided programming paragraph
- Section 3.5: Tightened agent intro (removed "qualitative leap"); condensed harness
engineering and execution environment paragraphs; improved computer use precision
- Section 3.6: Added concrete calcium imaging example; toned down "revolution" language
- Sections 3.7-3.9: Minor tightening of opening paragraphs, improved precision
- Global: Removed remaining hype words, improved scientific precision throughout
- Verified: Quarto build succeeds (only expected @sec-10 warning)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…essibility

- Remove vision-guided programming duplication (was described twice in 3.4)
- Remove MCP/Linux Foundation duplication between 3.3 and 3.5 (now cross-refs)
- Simplify memory systems paragraph with microscopy-specific example
- Tighten Dynamic Code Generation subsection (overlapped with 3.2 and 3.6)
- Restructure agent ecosystem callout box by category instead of flat list
- Tighten conclusion — remove grandiose phrasing
- Fix awkward "can barely do extrapolation" phrasing in 3.1
- Replace "quiet revolution" cliché in 3.6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hter conclusion

- Fix overclaiming: 'human-level' → 'expert-level on well-represented types'; logprob ≠ calibrated confidence; 'not science fiction' → measured framing
- Tone: remove celebrity attributions (Karpathy, Tobi Lutke, OpenAI team); contextualize 'vibe coding'; replace spreadsheet analogy with microscopy one
- Clarify SmartEM/pySTED are ML-based, not LLM agents
- Add reproducibility paragraph to Section 3.6 (ephemeral software)
- Add validation caveat to 'within minutes' and 'nearly free' claims
- Slim Section 3.5 validation callout to cross-reference §3.8
- Tighten conclusion: reduce recap, add synthesis on what changed (narrow task automation → intent-driven interaction), weave cross-refs into prose
- Soften 'emerging' tier in hype check; add conditional tense to §3.6
- Hedge VLM benchmark claim; date-stamp Gemini CLI free tier
- Quarto build verified (only expected @sec-10 warning)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, expanded conclusion

Major changes:

- Section 3.1: Tightened extrapolation discussion, shortened figure captions, moved VLM generalization demo to 3.4 where VLMs are properly introduced
- Section 3.2: Retitled ImageJ subsection, merged benchmarking + limits
- Section 3.3: Added concrete JSON function-call example for microscopists
- Section 3.4: Restructured into two focused subsections (interpretation + capabilities/limits), integrated VLM generalization from 3.1
- Section 3.5: Consolidated agent architecture, rewrote Omega/BioImage.IO descriptions to focus on paradigm not features, noted EIMS is preprint
- Section 3.6: Strengthened reproducibility discussion, added institutional knowledge concern, integrated software-obsolescence discussion from 3.8
- Section 3.7: Added prompt engineering tips, fixed FUCCI biology (S-phase = co-expression), added FUCCI citation, added data privacy callout, added coding agent comparison table
- Section 3.8: Split reproducibility/bias into separate subsections, developed skills atrophy with concrete FUCCI scenario, sharpened hype-check tiers
- Section 3.9: Expanded from 3 paragraphs to proper 5-paragraph synthesis
- AI Disclosure: More specific about models and roles
- Global: Reduced 'substantially'/'fundamentally' overuse, varied sentence structure, removed formulaic patterns, standardized BioImage.IO naming

Net: 267 insertions, 288 deletions (21 lines shorter, richer content)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ures

- Added Claude Code, Gemini CLI, Codex to AI Agents row in model evolution table
- Added intent-driven microscopy Excalidraw diagram
- Updated glossary entries
- Added CLAUDE.md project development guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hi all — apologies for the long delay on this. The draft for Chapter 3 has gone through several rounds of revision and should be in good shape now. Would appreciate your feedback whenever you get a chance. Thanks!
These are local development config files and should not be in the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit contains formatting changes suggested as part of my review of the Chapter 3 draft. Those changes include things like removing parentheses around references, cross-linking to other chapters, etc. This commit also includes new glossary terms and cross-links to the glossary throughout the chapter.
The "5 + 3 --> 7" textbox covers up both its own arrow and the plot slightly. This is extremely minor. If it is a pain to remake the figure, I would not worry about it.
> Over the past decade, the way we analyze microscopy data has changed markedly. What began with specialized deep learning models for individual tasks — segmentation, denoising, classification — has evolved into a broader landscape where AI systems can assist with experiment planning, code generation, instrument control, and workflow adaptation. This chapter traces that evolution and equips microscopists with the knowledge to navigate it.
> We begin by examining the limitations of traditional deep learning and the rise of foundation models that generalize across imaging modalities. We then introduce large language models and explain how they work at an intuitive level (@sec-llm-foundations), before exploring how function calling transformed them from text generators into action-taking agents (@sec-function-calling). The chapter covers vision-language models for microscopy (@sec-multimodal-ai) and the growing ecosystem of AI agents that can orchestrate complex workflows, control instruments, and generate software on demand (@sec-ai-agents, @sec-disposable-software). A hands-on practical guide (@sec-practical-guide) provides concrete workflows for getting started, followed by an honest discussion of challenges, implications for the profession, and an outlook on where the field is headed (@sec-challenges).
This is a great roadmap for the chapter!
> **Segment Anything Model (SAM)**, released by Meta in 2023, advanced the field of promptable segmentation further. Trained on over one billion masks from 11 million images, SAM demonstrated that a single model could segment essentially any object in any image through simple prompts (points, boxes, or text). Despite being trained entirely on natural images, SAM showed unexpected capability on out-of-domain scientific images, sparking a wave of microscopy-specific adaptations:
> - **micro-SAM** @Archit2025 fine-tuned SAM specifically for light and electron microscopy, extending it to support 3D segmentation and cell tracking. micro-SAM provides interactive segmentation tools through a napari plugin, making foundation model capabilities directly accessible to microscopists.
> - **CellSAM** @cellsam2025 combined SAM with a custom cell detection module (CellFinder) to achieve expert-level segmentation performance across mammalian cells, yeast, and bacteria — all with a single unified model.
We suggest adding CellPose-SAM to this list. It is discussed with microSAM and CellSAM later, so adding it here adds some clarity and provides a place to give the citation (already in references.bib as @pachitariu2025cellpose)
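For readers who want to see what "promptable segmentation" means in code, a minimal sketch using Meta's `segment-anything` package is shown below. The checkpoint filename, image path, and point coordinate are illustrative, not values from the chapter:

```python
import numpy as np
from skimage import io
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone (checkpoint file downloaded separately
# from the segment-anything release page).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 image; single-channel microscopy data may
# need intensity scaling and channel replication first.
image = io.imread("cells.png")
predictor.set_image(image)

# A single foreground point prompt (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[120, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate
```

micro-SAM and CellSAM build on this same predict-from-prompt interface, swapping in microscopy-tuned weights and detection front-ends as the list above describes.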
> The process begins with **data collection**: gathering domain-specific examples, such as ImageJ macro files (`.ijm`) from public GitHub repositories. Even a modest collection — say, 28 macro files totaling about 82,000 characters — provides enough material for a small model to learn the syntax and patterns of the ImageJ macro language.
> Rather than training from scratch, this approach uses [transfer learning](glossary.qmd#transfer-learning): starting with a small, pre-trained model (such as SmolLM2 with 135 million parameters — orders of magnitude smaller than commercial frontier models) and [fine-tuning](glossary.qmd#fine-tuning) it on microscopy-specific code. During fine-tuning, the model learns to predict the next token in ImageJ macro sequences, absorbing the syntax, common function calls, and typical workflow patterns. On a modern laptop, this process can complete in 10–20 minutes without specialized hardware.
We suggest adding a link or citation for SmolLM2.
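To make the fine-tuning recipe concrete, here is a rough sketch using the Hugging Face `transformers` and `datasets` libraries. The `macros/*.ijm` glob and the hyperparameters are placeholders, and the chapter's exact setup may differ:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:  # GPT-style tokenizers often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token

# Treat each collected .ijm macro file as one training document.
dataset = load_dataset(
    "text", data_files={"train": "macros/*.ijm"}, sample_by="document"
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives standard autoregressive next-token prediction.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="smollm2-ijm",
        num_train_epochs=3,
        per_device_train_batch_size=4,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```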
> Function calling created a spectrum of increasing autonomy. At one end sits **assisted mode**, where the AI suggests tool calls but a human approves each one — useful when learning a new workflow or working with unfamiliar data. In the middle is **supervised mode**, where routine actions (loading an image, running a standard segmentation) execute automatically while consequential ones (deleting data, modifying acquisition settings) require confirmation. At the far end is **autonomous mode**, where the AI plans and executes multi-step workflows with minimal human intervention, checking in only for high-stakes decisions.
> This progression mirrors how we delegate tasks to human colleagues — starting with close supervision and gradually extending trust as competence is demonstrated. In microscopy, the boundaries matter: an AI agent might autonomously run routine segmentation, but it should pause before modifying acquisition parameters on an expensive experiment, before discarding images that might contain rare biological events, or before committing to an irreversible analysis method. Deciding where to draw these boundaries is a design decision that each laboratory must make based on the stakes of the experiment.
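The spectrum in this excerpt amounts to a policy layer between the model's proposed tool calls and their execution. A minimal sketch, with hypothetical tool names and risk labels chosen for the microscopy setting:

```python
# Per-tool risk levels; names and labels are hypothetical examples.
RISK = {
    "load_image": "routine",
    "run_segmentation": "routine",
    "set_laser_power": "consequential",
    "delete_raw_data": "consequential",
}

# Which risk levels require human confirmation in each autonomy mode.
CONFIRM = {
    "assisted": {"routine", "consequential"},  # approve every call
    "supervised": {"consequential"},           # approve risky calls only
    "autonomous": set(),                       # approve nothing
}

def execute_tool_call(name, args, tools, mode="supervised"):
    """Run a model-proposed tool call, pausing for approval when required."""
    risk = RISK.get(name, "consequential")  # unknown tools treated as risky
    if risk in CONFIRM[mode]:
        if input(f"Approve {name}({args})? [y/N] ").lower() != "y":
            return {"status": "rejected", "tool": name}
    return tools[name](**args)
```

In a real agent the approval step would surface through a chat UI rather than `input()`, but the gating logic is the same.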
> ## AI Agents for Microscopy and Workflow Orchestration {#sec-ai-agents}
> LLMs and VLMs provide powerful individual capabilities — generating text, writing code, interpreting images — but each interaction is essentially one-shot: the user asks, the model responds. **AI agents** extend these models into systems that can take actions (running code, calling APIs, controlling hardware), observe the results, and adapt their approach iteratively. Where a language model answers a question, an agent works through a problem step by step until it reaches a solution — or recognizes that it cannot.
It might be worth discussing somewhere in the chapter how/when AI agents recognize that they cannot answer a query with sufficient confidence. This would give readers intuition about what sorts of prompts are suitable, as well as what "I don't know" means from the AI's POV.
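On that question: in current systems, "I don't know" is usually a designed outcome rather than calibrated self-knowledge. Agent loops typically combine a step budget with a prompted option to declare failure. A minimal sketch of the act-observe-adapt loop, where `llm_step` and `run_tool` are placeholders for a real model call and tool dispatch:

```python
def run_agent(task, llm_step, run_tool, max_steps=10):
    """Act-observe-adapt loop with explicit stopping rules."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm_step(history)            # model proposes the next step
        if action["type"] == "final_answer":  # model is satisfied
            return action["content"]
        if action["type"] == "give_up":       # model declares it is stuck
            return "Could not solve task: " + action["reason"]
        observation = run_tool(action)        # execute and observe the result
        history.append({"role": "tool", "content": observation})
    return "Stopped: step budget exhausted without a confident answer."
```

The "give_up" branch only exists if the system prompt invites the model to use it, which is why different agent products behave so differently when they hit the limits of their competence.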
> **Claude Code** (Anthropic) is a CLI-based agent that can read and edit files across entire codebases, run shell commands, execute tests, interact with version control, and orchestrate complex multi-step tasks — all while maintaining context across long sessions with million-token context windows. **Devin** (Cognition Labs) operates in a sandboxed development environment with its own code editor, terminal, and web browser. **GitHub Copilot** has evolved from inline code suggestions to agent mode, where it can plan changes across repositories, edit multiple files, and iterate on feedback. **Cursor** and **Windsurf** represent a new category of AI-native code editors with deeply integrated agent capabilities.
> The pace of improvement has been notable. On the SWE-bench Verified benchmark — which tests agents on real-world GitHub issues requiring understanding codebases, diagnosing bugs, and implementing fixes — success rates rose from under 5% in early 2024 to over 70% by late 2025. These benchmarks measure general software engineering, not scientific applications specifically, but the underlying capabilities — reading code, debugging errors, composing multi-step solutions — are directly relevant to microscopy analysis pipelines.
Is there a good citation for the SWE-bench success rates?
> The concern is real but addressable with existing practices. The generated code is ordinary Python, R, or ImageJ macro code — it can be inspected, version-controlled, and shared like any other analysis script. Researchers using this approach should save the generated code alongside their data, record the prompt and model used to produce it, and include the script in their supplementary materials. Importantly, on-demand tools typically compose and configure established, validated libraries — scikit-image, Cellpose, napari — rather than implementing algorithms from scratch. The generated code calls the same functions that a human programmer would use; the agent handles the boilerplate of loading files, configuring parameters, and creating visualizations.
> A subtler concern is long-term reproducibility. If a finding is questioned years later, the analysis must still be runnable. Version-pinning dependencies (recording the exact library versions used), including the generated code in a repository, and documenting any manual steps taken during the analysis all help — the same practices that good computational science has always required, but that are easily neglected when the code feels disposable.
This (and the next paragraph) is an optional opportunity to add a callout box on best practices for what to record when using a model.
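Picking up that suggestion: the items the excerpt lists (generated code, prompt, model, library versions) can be captured mechanically. One possible shape for a provenance sidecar saved next to the generated script, with illustrative field names and model identifier:

```python
import datetime
import json
import platform
from importlib.metadata import version

record = {
    "generated": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "model": "claude-opus-4",               # identifier of the model used
    "prompt": "Segment nuclei in the ...",  # the exact prompt, verbatim
    "script": "segment_nuclei.py",          # the saved generated code
    "python": platform.python_version(),
    # Pin the versions of the validated libraries the script composes;
    # the package names here are examples.
    "packages": {
        pkg: version(pkg) for pkg in ["numpy", "scikit-image", "cellpose"]
    },
}

# Save the sidecar next to the generated script and its outputs.
with open("segment_nuclei.provenance.json", "w") as f:
    json.dump(record, f, indent=2)
```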
> A useful analogy: autofocus did not eliminate the need to understand optics — it freed microscopists to focus on experimental design rather than manual focusing. Similarly, AI agents will not eliminate ImageJ, but they will reduce the need to memorize its menu hierarchy or hand-write batch-processing macros.
We think this analogy to autofocus is really helpful!
> The broader implication is a change in the relationship between microscopists and their computational tools. Traditional software requires the researcher to learn the tool's interface and adapt their workflow to its logic. On-demand software generation inverts this: the researcher describes the goal, and the software adapts to the researcher. This means that researchers who understand their biology deeply but lack extensive programming training can access computational analysis that was previously gated by programming skill — a meaningful expansion of who can do quantitative microscopy.
> At the same time, this approach sits *on top of* the established software ecosystem, not in place of it. When an agent generates a segmentation script, it calls Cellpose or StarDist underneath. When it builds a visualization, it uses matplotlib or napari. The durable, community-maintained tools become the validated building blocks that agents compose; they become more valuable in an agent-driven world, not less. Software that exposes programmable APIs and command-line interfaces will thrive because agents can call it. Closed, GUI-only tools that cannot be invoked programmatically face a harder path — not because they are technically inferior, but because agents cannot use what they cannot call.
This also puts the onus on analysts and software engineers to build new and better tools that agents can ultimately call within workflows.
> If AI agents can generate analysis pipelines, write code, and operate microscopes, what becomes of the bioimage analyst?
> The role remains essential — but it is evolving. As AI makes advanced analysis accessible to more researchers, the demand for expert guidance on validation, interpretation, and experimental design is likely to increase rather than decrease. The job description is shifting: less "the person who runs CellProfiler" and more "the person who evaluates whether an AI-generated pipeline produces biologically meaningful results, recognizes when a segmentation has failed in subtle ways, and designs experiments amenable to rigorous quantitative analysis."
As bioimage analysts, we really enjoyed this description!
> For readers looking to go deeper: @sec-4 introduces the neural network architectures underlying the models discussed here, @sec-9 provides guidance on training and fine-tuning models for specific tasks, and @sec-output_quality offers frameworks for evaluating whether AI-generated analysis outputs meet the quality standards that scientific research demands.
> ## Practical Guide: Getting Started with LLMs for Microscopy {#sec-practical-guide}

> ## AI Disclosure {.unnumbered}
We really appreciate this thoughtful and thorough disclosure.
opp1231 left a comment:
Hi Wei,
Thank you for this draft! We think it is very solid. Rachel and I have gone through and changed some formatting here and there, and added some links and glossary terms, where relevant. We also noticed a few small things that you may consider changing. Please take a look at our changes and comments. Once you have a chance to go over these small things, we think this is essentially ready to be merged into main and put up on the website.
Thank you again for your efforts in making this book possible!
- Add Cellpose-SAM to SAM adaptations list with citation
- Add SmolLM2 and SWE-bench Verified citations
- Add paragraph on how AI agents handle uncertainty
- Add callout box on best practices for recording AI model usage
- Expand reproducibility section with technical causes of LLM non-determinism (temperature, sampling, hardware, model versioning)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thank you @ScientistRachel and @opp1231 for the detailed and thoughtful review! I really appreciate the formatting improvements, glossary links, and the specific suggestions — they all made the chapter stronger. I've pushed a commit addressing the actionable comments.

This should be ready for merge now. Thanks again for the invitation to contribute to this book — it's been a great experience!
Hi @ScientistRachel and @opp1231, while I am still working on this, could you give me a hint on how we should attach source code files? In my case I have some Python code to generate data for figures and train models, which is not meant to be shown to readers, but they can check the details if needed. Can I leave it as Python files, and where should I place them? Also, where should I place figures? I currently have them in a separate folder. Please comment if you see changes needed regarding these.

Content-wise, it's a dump of everything right now; it needs a bit more work on reorganization and polishing, which I will do with a postdoc in my group.