Skip to content

Merge duplicate screenshot references during key import#3521

Open
Anty0 wants to merge 1 commit intomainfrom
claude/merge-screenshot-metadata-e2uXd
Open

Merge duplicate screenshot references during key import#3521
Anty0 wants to merge 1 commit intomainfrom
claude/merge-screenshot-metadata-e2uXd

Conversation

@Anty0
Copy link
Copy Markdown
Collaborator

@Anty0 Anty0 commented Mar 11, 2026

Summary

This change improves the key import process to properly handle duplicate screenshot references by merging them instead of skipping them. When the same key is associated with the same screenshot multiple times during import, the positions and text are now combined into a single reference.

Key Changes

  • Replaced addedReferences Set with mergedReferences LinkedHashMap to track and merge duplicate references
  • Introduced MergedReferenceData data class to store merged reference information including the key, screenshot result, and combined metadata
  • Modified duplicate detection logic to merge positions and text from duplicate references instead of silently skipping them
  • Deferred screenshotService.addReference() calls until after all references are processed and merged, ensuring only the final merged references are added to the database

Implementation Details

  • When a duplicate reference is detected (same key-screenshot pair), the new positions are appended to existing positions and text is preserved (preferring the first non-null value)
  • Uses LinkedHashMap to maintain insertion order for consistent processing
  • All merged references are batch-processed after the import loop completes, improving efficiency and ensuring data consistency

https://claude.ai/code/session_01EpxKwLXnkjPV52rPvE6mxG

Summary by CodeRabbit

Release Notes

  • Chores
    • Optimized key import process to handle duplicate references more efficiently during bulk operations, improving overall import performance.

When the same key+screenshot pair appears multiple times during import,
merge their positions and text instead of silently dropping duplicates.
This preserves alternative key positions in the same screenshot image.

The approach collects and merges all ScreenshotInfoDto entries per unique
(key, screenshot) pair before persisting, ensuring no metadata is lost.

https://claude.ai/code/session_01EpxKwLXnkjPV52rPvE6mxG
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

This PR refactors the reference deduplication mechanism in ResolvingKeyImporter.kt by replacing immediate insertion calls with a deferred merge-and-batch pattern. References are accumulated in a map, duplicates are merged by combining positions, and insertions occur after all input processing completes.

Changes

Cohort / File(s) Summary
Reference deduplication refactor
backend/data/src/main/kotlin/io/tolgee/service/key/ResolvingKeyImporter.kt
Replaced per-screenshot reference dedup with merged-reference accumulation. Introduces local data class and mergedReferences map to collect duplicates by composite key, merges positions for matching key/screenshot pairs, and defers all insertions until after input processing completes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • tolgee/tolgee-platform#3520: Modifies the same file to handle duplicate (key, screenshot) pairs during import—addresses the same deduplication challenge using a different strategy (skip-tracking vs. merge-and-defer).

Suggested reviewers

  • bdshadow
  • JanCizmar

Poem

🐰 Whiskers twitching with delight,
Merging references left and right,
No more duplicates running free,
Batch them up, then plant the tree! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Merge duplicate screenshot references during key import' directly and clearly summarizes the main change in the changeset, which replaces per-screenshot reference dedup logic with a merged-reference mechanism.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/merge-screenshot-metadata-e2uXd

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
backend/data/src/main/kotlin/io/tolgee/service/key/ResolvingKeyImporter.kt (1)

228-248: Add explicit label for return@forEach to improve clarity.

With nested forEach loops (outer at line 228, inner at line 230), return@forEach at line 247 implicitly returns from the innermost forEach. While this is correct behavior, an explicit label would make the intent clearer.

📝 Suggested improvement
-      it.screenshots?.forEach { screenshot ->
+      it.screenshots?.forEach screenshots@{ screenshot ->
         ...
-          return@forEach
+          return@screenshots
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/data/src/main/kotlin/io/tolgee/service/key/ResolvingKeyImporter.kt`
around lines 228 - 248, The nested forEach uses an unlabeled return@forEach
which is ambiguous; add an explicit label to the outer loop (e.g.
keysToImport.forEach outer@ { ... }) and change the inner early-return to use
that label (return@outer) so it is clear you are returning from the outer
keysToImport.forEach rather than the inner it.screenshots?.forEach; update
references around getOrCreateKey, createdScreenshots and mergedReferences
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/data/src/main/kotlin/io/tolgee/service/key/ResolvingKeyImporter.kt`:
- Around line 228-248: The nested forEach uses an unlabeled return@forEach which
is ambiguous; add an explicit label to the outer loop (e.g. keysToImport.forEach
outer@ { ... }) and change the inner early-return to use that label
(return@outer) so it is clear you are returning from the outer
keysToImport.forEach rather than the inner it.screenshots?.forEach; update
references around getOrCreateKey, createdScreenshots and mergedReferences
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d508d82c-a0d7-4b24-b1de-af5af8d15049

📥 Commits

Reviewing files that changed from the base of the PR and between 89813a6 and ddc1d92.

📒 Files selected for processing (1)
  • backend/data/src/main/kotlin/io/tolgee/service/key/ResolvingKeyImporter.kt

@Anty0 Anty0 requested a review from JanCizmar March 11, 2026 11:59
@JanCizmar
Copy link
Copy Markdown
Member

Test would be nice.

@Anty0
Copy link
Copy Markdown
Collaborator Author

Anty0 commented Mar 13, 2026

@JanCizmar Good point, I'll create some tests in cooldown ^^

@github-actions
Copy link
Copy Markdown
Contributor

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants