Conversation

@tonywu1999 (Contributor) commented Jan 13, 2026

Motivation and Context

I was encountering an error at the line

standard = input[PEPTIDE == peptide_name, ]

If a reference peptide has multiple precursors, this code fails. The for loop surrounding this line is also inefficient and can be vectorized.
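To illustrate the issue, here is a minimal R/data.table sketch (toy column values; the exact failure mode depends on the real MSstats data layout). With several precursors per reference peptide, a per-peptide filter yields multiple rows per run, while a single grouped aggregation handles any number of precursors and removes the loop entirely:

```r
library(data.table)

# Toy input: one reference peptide ("PEP1") observed as two precursors
input = data.table(
    PEPTIDE   = c("PEP1_2", "PEP1_3", "PEP1_2", "PEP1_3"),
    RUN       = c("R1", "R1", "R2", "R2"),
    ABUNDANCE = c(10, 12, 20, 22)
)

# Vectorized replacement for the per-standard loop: one grouped mean
# per run, regardless of how many precursors the peptide has
means = input[, .(mean_abundance = mean(ABUNDANCE, na.rm = TRUE)), by = RUN]
```

A `by = RUN` aggregation like this is also substantially faster than looping over peptide names and subsetting repeatedly.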

Changes

  • Enable a reference peptide to have multiple precursors
  • Vectorize for loop for aggregating data for reference peptides

Testing

  • Added unit tests verifying basic global reference normalization functionality
  • Unit tests also ensure that a reference peptide can have multiple precursors.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

Summary by CodeRabbit

  • New Features

    • Normalization now accepts an explicit standards parameter for data-driven standard handling.
  • Bug Fixes

    • Improved validation with descriptive error messages for missing standards.
    • More robust, consolidated calculation of standard means to ensure consistent normalization.
  • Tests

    • Added tests covering global standards normalization scenarios to ensure correctness.


@coderabbitai bot commented Jan 13, 2026


📝 Walkthrough

Walkthrough

Refactors global-standards normalization to a data-driven aggregation: joins input with peptide dictionary, assigns a new standard column, validates presence of requested standards (errors if missing), and computes mean abundances via grouped aggregation. Adds tests for normalization behavior.

Changes

Cohort / File(s) Summary
Core normalization logic
R/utils_normalize.R
Replaced iterative per-standard loop with a data-driven pipeline: merge input with peptides_dict, assign standard (PeptideSequence or PROTEIN), validate requested standards (error on missing), compute mean_abundance by RUN and standard, reshape and merge results. Also updated MSstatsNormalize signature to accept standards = NULL.
Test suite
inst/tinytest/test_utils_normalize.R
Added new tests and helpers: builds a peptide dictionary and synthetic input, plus two tests validating normalization outcomes for differing group intensities and for alternating intensities within fractions (invoking MSstats:::.normalizeGlobalStandards).

Sequence Diagram(s)

sequenceDiagram
  participant Caller as "MSstatsNormalize"
  participant NG as ".normalizeGlobalStandards"
  participant Dict as "peptides_dict"
  participant Data as "input data"
  participant Agg as "Aggregation (mean_abundance)"

  Caller->>NG: call with input, peptides_dict, standards
  NG->>Data: merge with Dict to form input_with_peptides
  NG->>Dict: lookup standards -> identify matches/missing
  alt missing standards found
    NG-->>Caller: throw error (list missing standards)
  else all standards present
    NG->>Agg: group by RUN & standard, compute mean_abundance
    Agg-->>NG: mean_abundance table (wide -> long reshape)
    NG-->>Caller: return normalized data (merged results)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The title accurately summarizes the main change: enabling multiple precursors for reference peptides in global standards normalization.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check — ✅ Passed: The pull request description covers motivation, changes, testing, and completes the checklist, aligning with the template structure and requirements.




@github-actions

Failed to generate code suggestions for PR

@tonywu1999 tonywu1999 marked this pull request as ready for review January 29, 2026 20:08
@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/utils_normalize.R (1)

236-240: Duplicate log messages in both branches.

Both the single-fraction and multi-fraction cases emit identical messages. Compare with .normalizeQuantile (lines 124-128), which correctly differentiates between them.

📝 Proposed fix
     if (data.table::uniqueN(input$FRACTION) == 1L) {
         msg = "Normalization : normalization with global standards protein - okay"
     } else {
-        msg = "Normalization : normalization with global standards protein - okay"
+        msg = "Normalization : normalization with global standards protein per fraction - okay"
     }
🧹 Nitpick comments (1)
R/utils_normalize.R (1)

197-203: Dead code: means_by_standard initialization is overwritten.

Line 197 initializes means_by_standard = unique(input[, list(RUN)]), but this value is never used—it's completely replaced at lines 215-221 by the aggregation result. This leftover from the previous implementation should be removed.

🧹 Remove dead code
     proteins = as.character(unique(input$PROTEIN))
-    means_by_standard = unique(input[, list(RUN)])
     input_with_peptides <- merge(input, peptides_dict, by = "PEPTIDE", all.x = TRUE)

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/utils_normalize.R (1)

235-239: Duplicate log messages in conditional branches.

Both branches produce the identical message, making the condition ineffective. If different messages were intended for single vs. multiple fractions, update accordingly.

🔧 Suggested fix (example differentiation)
     if (data.table::uniqueN(input$FRACTION) == 1L) {
         msg = "Normalization : normalization with global standards protein - okay"
     } else {
-        msg = "Normalization : normalization with global standards protein - okay"
+        msg = "Normalization : normalization with global standards protein per fraction - okay"
     }
🧹 Nitpick comments (1)
R/utils_normalize.R (1)

196-196: Unused variable proteins.

The variable proteins is assigned but never referenced in this function. Since the commit mentions "remove dead code," consider removing this line.

🔧 Suggested fix
-    proteins = as.character(unique(input$PROTEIN))

@tonywu1999 tonywu1999 merged commit de817e1 into devel Jan 29, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the fix-standards branch January 29, 2026 20:58
