Response to Charlotte — complete intro restructuring, background removed, personal narrative removed#5
Conversation
…response letter # Response to Reviewer Charlotte Bunne Dear Charlotte, Thank you for your detailed and constructive feedback on the thesis manuscript. I address each of your points below. --- ## 1. Personal narrative / preamble **Your comment:** "Remove the preamble and any personal narrative. A thesis introduction should be academic in tone and content." **Response:** The "Personal Motivation" section has been removed from the Introduction entirely. Personal context has been moved to the Acknowledgements section, as you suggested. The Introduction now opens directly with the scientific problem setting. --- ## 2. Structure and narrative flow **Your comment:** "The current version is fuzzy and reads like a list of loosely connected topics... without a coherent progression." **Response:** The Introduction has been fully restructured following your proposed outline: 1. Motivation and problem setting 2. Foundation models and the rationale for large-scale learning approaches (technical, non-textbook) 3. Scientific aim 4. Thesis scope and contributions (with explicit statement of what is and is not covered) 5. Chapter-by-chapter overview The "Promises of cellular biology" framing and the Feynman figure reference have been removed. --- ## 3. Basic textbook material **Your comment:** "There is no need to explain the central dogma or basic principles of gene regulation... Do not confuse Introduction and Background." **Response:** All pedagogical background material (RNA biology, central dogma, gene regulatory networks, single-cell sequencing, basic ML concepts) has been moved to a dedicated **Background chapter** (Chapter 2), which is intended for readers less familiar with either biology or machine learning. The Introduction no longer contains textbook-level content. --- ## 4. Introduction vs. Background separation **Your comment:** "For now, in fact, you have no Background/Related Work section, it seems?" **Response:** A full Background chapter has been added (chapters/background.tex), covering: (1) cell regulation and RNA biology, (2) gene regulatory networks, (3) single-cell sequencing technologies, and (4) foundational ML/AI concepts. This chapter is clearly separated from the Introduction in the thesis structure. --- ## 5. Scope alignment **Your comment:** "The introduction currently lists many topics that are not addressed in your work." **Response:** The Scope section now explicitly states what the thesis does not address (perturbation response prediction as a primary target, temporal dynamics, spatial transcriptomics as a primary modality). Any topics that appear on page 20 but are not covered have been removed or qualified. --- ## 6. Language, style, and punctuation **Your comment:** "Many spelling errors, missing full stops, sloppy language. English does not use a space before '?' or ':'." **Response:** The manuscript was reviewed with Grammarly and manually corrected. French typography habits (space before '?' and ':') have been removed. The LaTeX document class has been updated from a French thesis template to an English one, which also resolved automatic spacing insertion. The academic register has been reviewed and informal phrasing corrected throughout the Introduction. --- ## 7. Technical depth of the scFM state of the art **Your comment (via PI):** "When you talk about the state-of-the-art use a more technical tone and give details on the different models instead of just listing and citing some." **Response:** The Bio-Foundation Models section has been substantially expanded. Each model now includes: - **Architecture specifics** (attention mechanism, encoder/decoder design, tokenization strategy) - **Training dataset** (size, source, organism coverage) - **Pretraining objective** (masked token prediction, autoregressive generation, contrastive learning, etc.) - **Key benchmark results** and demonstrated capabilities - **Limitations** identified by independent evaluations Models now covered in technical depth: scBERT, Geneformer, scGPT, UCE, scFoundation. --- ## Note on versioning The substantial structural revisions described above were implemented in commits `07ff110`, `35bb09e`, `cd52704`, `d72e822`, and `0a30b9c` (February 26 – March 11, 2026). We note that your email of March 14 indicated the revisions had not been addressed — it is possible you were viewing a cached copy of the PDF, as the updated version was committed and pushed on March 11. The current branch `charlotte-response` contains all revisions described in this letter. Best regards, Jérémie Kalfon
ce6dcb2 to
6bef1c9
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates the thesis materials to address reviewer Charlotte Bunne’s feedback, primarily by restructuring the Introduction and expanding the technical state-of-the-art discussion on single-cell foundation models, and by adding written correspondence artifacts (response letter, task list, and archived feedback).
Changes:
- Restructures
chapters/intro.texto remove personal narrative and substantially expand/technicalize the scFM state-of-the-art section. - Adds a point-by-point response letter (
RESPONSE_TO_CHARLOTTE.md) and an execution checklist (CHARLOTTE_TASKS.md). - Adds/archives rapporteur/reviewer correspondence (
reply_to_rapporteurs.md,charlotte_feedback.txt).
Reviewed changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| reply_to_rapporteurs.md | Adds a draft reply document; currently contains significant character-encoding corruption. |
| charlotte_feedback.txt | Adds archived email thread containing reviewer feedback (and personal email addresses/signatures). |
| chapters/intro.tex | Removes personal narrative and expands single-cell foundation model discussion with architectural/training details. |
| RESPONSE_TO_CHARLOTTE.md | Adds a structured response letter to the reviewer. |
| CHARLOTTE_TASKS.md | Adds an internal task list to track remaining changes for addressing feedback. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| Reply to rapporteurs | ||
| Valentina | ||
| A central conceptual question that would benefit from more explicit treatment concerns the relationship between the denoising training objective and the nature of the regulatory interactions captured by attention. Since scPRINT is trained to reconstruct downsampled expression profiles, the model is optimized to exploit global co-expression structure. This creates no obvious inductive bias toward direct regulatory interactions over indirect transitive paths of the form AâBâC. The manuscript does not fully address why attention weights in this setting should preferentially reflect direct regulation rather than co-expression. Given that biological interpretability of the inferred networks is a central claim, a more explicit theoretical treatment of this issue would substantially strengthen the work. | ||
| As long as steady-state expression data are used, nothing more than co-expression can be achieved, and we do not make a claim to the contrary. The goal is not to infer GRN per se, but to understand the ability of foundation models to leverage an understanding of gene relationships (albeit through co-expression patterns) to achieve their tasks and how this general understanding enables them to perform many other downstream tasks. However, we do believe that foundation models can go beyond co-expression. Indeed, using ESM3 embeddings confers knowledge of protein structure and evolutionary relationships, and using gene location provides additional information on the probability of co-regulation. working across species further provides patterns of expression not just within cells but across kingdoms. Obviously, nothing is causal yet without interventional or temporal data, and that is a point left to be worked out |
| On 23 Jan 2026, at 15:51, Jérémie Kalfon <jkobject@gmail.com> wrote: | ||
|
|
||
| Dear Charlotte, Valentina, | ||
|
|
||
| You will find, available on through this link: https://github.com/jkobject/Thesis/blob/main/main.pdf, my Ph.D. manuscript to be evaluated. |
|
|
||
| \subsection{Current Single-Cell Foundation Models and Their Limitations} | ||
| In 2023, a year after Geneformer, several additional foundation models were released. scGPT \cite{cuiScGPTBuildingFoundation2024} showcased a GPT-style architecture and presented various losses for fine-tuning. It was the first example of systematic fine-tuning in single-cell and a more in-depth benchmark across four abilities: cell type prediction, gene network inference, perturbation prediction, and batch correction. However, it did not outperform state-of-the-art methods \cite{boiarskyDeepDiveSingleCell2023, alsabbaghFoundationModelsMeet2023}. At the same time, Universal Cell Embedding (UCE) \cite{rosenUniversalCellEmbeddings2023} demonstrated cross-species training to achieve state-of-the-art cross-species cell embeddings, introducing a contrastive loss function for cell representation learning (see Figure ~\ref{fig:UCE}). | ||
| In 2023, a year after Geneformer, several additional foundation models were released. scGPT \cite{cuiScGPTBuildingFoundation2024} was the first to apply a GPT-style generative architecture to single-cell transcriptomics. Unlike its BERT-style predecessors, scGPT uses causal (unidirectional) self-attention, processing gene tokens sequentially and predicting each gene's expression conditioned on preceding genes. It was pretrained on approximately 33 million human cells from the CellxGene corpus using three objectives: autoregressive gene expression generation, masked value prediction, and a cell-level generation task. scGPT introduced the first systematic fine-tuning protocol and benchmarked across four tasks: cell-type annotation, GRN inference, perturbation prediction, and batch correction. However, independent evaluations \cite{boiarskyDeepDiveSingleCell2023, alsabbaghFoundationModelsMeet2023} demonstrated that scGPT does not consistently outperform dedicated state-of-the-art methods on any of these tasks, and that the causal attention design imposes an artificial gene ordering with no biological motivation. |
💡 Codex ReviewThesis/context_papers/Geneformer.pdf Line 1 in 6bef1c9 The file is committed as Thesis/context_papers/scGPT.pdf Line 1 in 6bef1c9 This file is also stored with a ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
… add scCello/LangCell, condense LLM section
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…ecdote, vulgarisation language from background.tex
…ponse to Charlotte with scCello/LangCell
… background content
- Section 1: thesis goals with concrete results (scPRINT-2 SOTA numbers)
- Section 2: formal GRN problem statement with math (X, G, E, W notation),
detailed method descriptions (GENIE3/GRNBoost2, pySCENIC, PIDC, SCODE),
comparative table, benchmarking section (BEELINE, SERGIO, GeneRNI, BenGRN),
metrics (AUROC, AUPRC, EPR with formulas), ground truths (OmniPath, ENCODE,
perturb-seq, MCalla intersection)
- Section 3: transformer self-attention math, masked gene modeling objective,
encoding challenges, efficient attention (Flash Attention, Performer, criss-cross),
bio FMs (ESM2, AlphaFold2, Nucleotide Transformer), detailed scFM reviews
(scBERT, Geneformer, scGPT, UCE, scFoundation), brief (scCello, LangCell)
- Section 4: contributions chapters kept verbatim from previous intro
- main.tex: remove \input{chapters/background} and its associated header/counter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e academic acknowledgements
1. Table 1: reduced font size (scriptsize) and text length 2. Existing benchmarks: mention only BEELINE, moved simulated data tools here 3. Ground truths: removed intersections, added simulated expression & gene networks 4. Foundation models section: restructured to start general (transformers/vision/NLP) → biology → scFMs 5. UCE: removed ESM2 embeddings requirement sentence 6. Removed 'Additional models' and 'Key bottlenecks' sections 7. Geneformer: detailed why reported comparisons failed (Boiarsky 2023 findings)
- Added Dosovitskiy 2021 (Vision Transformer) citation - Added Schaffter 2011 (GeneNetWeaver) citation - Added API glossary entry
Summary
This PR addresses all 7 points from Charlotte Bunne feedback (24 Feb + 14 Mar 2026 follow-up).
Key changes (latest — 16 Mar 2026)
chapters/intro.tex(231 → 1000 lines)chapters/background.texremoved frommain.texauxiliaries/background.tex— Personal Motivation → AcknowledgementsCharlotte 7 points addressed