perf: best of both worlds for translation memory suggestion queries#3540
perf: best of both worlds for translation memory suggestion queries#3540Jbbouille wants to merge 3 commits intotolgee:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughChanged transaction propagation for translation-memory suggestion methods to REQUIRES_NEW, added per-call PostgreSQL session settings (including statement_timeout), simplified the native SQL to use pg_trgm Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Controller
participant TranslationMemoryService
participant Database
Client->>Controller: request suggestions (key/text, pageable)
Controller->>TranslationMemoryService: getSuggestions(...)
Controller->>Controller: try/catch QueryTimeoutException
TranslationMemoryService->>Database: BEGIN (REQUIRES_NEW)
TranslationMemoryService->>Database: SET LOCAL statement_timeout = '550ms'
TranslationMemoryService->>Database: SET LOCAL pg_trgm.similarity_threshold = ...
TranslationMemoryService->>Database: execute native SELECT (WHERE text % :text ORDER BY text <-> :text)
Database-->>TranslationMemoryService: rows + count
TranslationMemoryService-->>Controller: mapped Page<TranslationMemoryItemView>
alt DB timeout / QueryTimeoutException
Controller-->>Controller: log warning
Controller-->>Client: empty Page for requested pageable
else success
Controller-->>Client: paged suggestions response
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt (1)
174-174:⚠️ Potential issue | 🟡 MinorUse
as Numberconversion for safe extraction of PostgreSQLbigintfrom native query result.The native query returns
count(*) over()(PostgreSQLbigintat position 6), which JDBC typically returns asBigIntegerorBigDecimal, notLong. The direct castas Long?will throw aClassCastExceptionat runtime.The codebase already uses the safe
(value as Number).toLong()pattern in similar contexts (seeLanguageStatsProvider.kt:69–78andOrganizationStatsService.kt:76).🛡️ Proposed fix
- val count = (queryResult.firstOrNull() as Array<*>?)?.get(6) as Long? ?: 0L + val count = ((queryResult.firstOrNull() as Array<*>?)?.get(6) as Number?)?.toLong() ?: 0L🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt` at line 174, The extraction of the total count from the native query result in TranslationMemoryService (the assignment to variable count using (queryResult.firstOrNull() as Array<*>?)?.get(6) as Long? ?: 0L) can throw ClassCastException because JDBC may return BigInteger/BigDecimal; change the cast to use Number and toLong(), e.g. get the element as Number? then call .toLong() with a null-safe fallback to 0L, mirroring the (value as Number).toLong() pattern used elsewhere.
🧹 Nitpick comments (1)
backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt (1)
130-142: MakegetSuggestionsDataprivate to prevent accidental misuse.The method has no external callers—all public consumers use the
getSuggestionsoverloads. SincegetSuggestionsDatarelies on being wrapped in aREQUIRES_NEWtransaction to isolateSET LOCALsettings, making it private eliminates the risk of future developers calling it directly and breaking isolation guarantees.Suggested change
- `@Transactional` - fun getSuggestionsData( + private fun getSuggestionsData(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt` around lines 130 - 142, getSuggestionsData is currently public but is only intended to be invoked via the REQUIRES_NEW-wrapped getSuggestions overloads because it relies on local SETs (entityManager.createNativeQuery("set local ...")). Make getSuggestionsData private (change its visibility to private) so callers cannot bypass the transaction isolation; keep the method signature and behavior unchanged and ensure any existing callers (getSuggestions overloads) still invoke the now-private getSuggestionsData.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt`:
- Line 174: The extraction of the total count from the native query result in
TranslationMemoryService (the assignment to variable count using
(queryResult.firstOrNull() as Array<*>?)?.get(6) as Long? ?: 0L) can throw
ClassCastException because JDBC may return BigInteger/BigDecimal; change the
cast to use Number and toLong(), e.g. get the element as Number? then call
.toLong() with a null-safe fallback to 0L, mirroring the (value as
Number).toLong() pattern used elsewhere.
---
Nitpick comments:
In
`@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt`:
- Around line 130-142: getSuggestionsData is currently public but is only
intended to be invoked via the REQUIRES_NEW-wrapped getSuggestions overloads
because it relies on local SETs (entityManager.createNativeQuery("set local
...")). Make getSuggestionsData private (change its visibility to private) so
callers cannot bypass the transaction isolation; keep the method signature and
behavior unchanged and ensure any existing callers (getSuggestions overloads)
still invoke the now-private getSuggestionsData.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d49b8958-87b0-49b0-87c5-3d995bc90bbf
📒 Files selected for processing (1)
backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt
bdshadow
left a comment
There was a problem hiding this comment.
Hi @Jbbouille! Thank you for the pr!
Could you please fix the TranslationSuggestionControllerTmTest. Probably, the same problem will be for suggestions/suggestions.reviewer.cy.ts and other tests in the suggestions directory
3d8d893 to
51fa1a3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt (1)
40-43: Consider extracting duplicated TM session setup into one helper.The repeated
SET LOCAL statement_timeout/pg_trgm.similarity_thresholdblock is a good candidate for a single private helper to keep values consistent across paths.Also applies to: 139-142
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt` around lines 40 - 43, Extract the duplicated session setup into a single private helper method on TranslationMemoryService (e.g., private fun applyTmSessionSettings()) that executes the two entityManager.createNativeQuery(...) calls (setting local statement_timeout and pg_trgm.similarity_threshold) and call this helper wherever the two queries are currently duplicated (the locations in methods that directly use entityManager.createNativeQuery for these settings). Ensure the helper uses the same SQL strings and transaction semantics (REQUIRES_NEW behavior is preserved by calling it in the same transactional context) and replace the duplicated blocks with a single call to applyTmSessionSettings().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt`:
- Around line 147-150: The native-query numeric results (similarity(...) and
count(*) over()) are being cast directly to Long/Float and can cause
ClassCastException across JDBC/Hibernate dialects; update the result mapping in
TranslationMemoryService.kt to first cast those values to java.lang.Number and
then call toLong()/toFloat() (e.g. val total = (row[ "totalCount"] as
Number).toLong(), val sim = (row["similarity"] as Number).toFloat()), leaving
direct Long cast for key.id unchanged. Ensure all places that read
similarity(...) and count(*) over() use this Number -> toLong()/toFloat()
pattern for robust cross-dialect behavior.
---
Nitpick comments:
In
`@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt`:
- Around line 40-43: Extract the duplicated session setup into a single private
helper method on TranslationMemoryService (e.g., private fun
applyTmSessionSettings()) that executes the two
entityManager.createNativeQuery(...) calls (setting local statement_timeout and
pg_trgm.similarity_threshold) and call this helper wherever the two queries are
currently duplicated (the locations in methods that directly use
entityManager.createNativeQuery for these settings). Ensure the helper uses the
same SQL strings and transaction semantics (REQUIRES_NEW behavior is preserved
by calling it in the same transactional context) and replace the duplicated
blocks with a single call to applyTmSessionSettings().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1dab816f-255a-4d51-827f-841be021e8fe
📒 Files selected for processing (1)
backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt
getSuggestionsData (UI path) now uses % (trigram threshold) + ORDER BY text <-> for results ordered by actual similarity, replacing the old CASE WHEN workaround. getSuggestionsList (MT pipeline) keeps % without ORDER BY so the GiST index on (language_id, text gist_trgm_ops) can stop early at LIMIT rows. Both queries are now wrapped in REQUIRES_NEW transactions with SET LOCAL statement_timeout = 550ms and pg_trgm.similarity_threshold = 0.5 to prevent slow queries from timing out and leaking transaction state into the caller. Note: I noticed tolgee#3537 too late — I had already started this work independently. Apologies for the overlap.
51fa1a3 to
a37448b
Compare
…oryService Avoid ClassCastException when PostgreSQL bigint/real columns are returned as BigInteger/BigDecimal by JDBC instead of Long/Float.
There was a problem hiding this comment.
i guess it needs Propagation.REQUIRES_NEW too, since it's public, and can be called from anywhere.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt (1)
137-139: Avoid stack traces on the expected timeout fallback.With
statement_timeoutacting as a normal guardrail now, logging the throwable at WARN for every timeout will get noisy fast. I'd log a concise warning here and track the rate with metrics/debug logging instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt` around lines 137 - 139, The catch block in TranslationSuggestionController that handles QueryTimeoutException currently logs the full throwable (log.warn("Translation memory suggestions timed out", e)) causing noisy stack traces; change it to log a concise warning without the exception (e.g., log.warn("Translation memory suggestions timed out")) and increment a metric or emit a debug log for the exception details instead of including the stack trace, then return Page.empty(pageable) as before. Ensure you modify the catch around QueryTimeoutException and keep the fallback return Page.empty(pageable).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@backend/data/src/main/kotlin/io/tolgee/service/machineTranslation/MetadataProvider.kt`:
- Around line 14-15: The logger in MetadataProvider is incorrectly using
MtTranslatorContext as the category; update the LoggerFactory call so the logger
variable (log) is created with MetadataProvider::class.java instead of
MtTranslatorContext::class.java, ensuring all log statements in MetadataProvider
are attributed to the correct class.
---
Nitpick comments:
In
`@backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt`:
- Around line 137-139: The catch block in TranslationSuggestionController that
handles QueryTimeoutException currently logs the full throwable
(log.warn("Translation memory suggestions timed out", e)) causing noisy stack
traces; change it to log a concise warning without the exception (e.g.,
log.warn("Translation memory suggestions timed out")) and increment a metric or
emit a debug log for the exception details instead of including the stack trace,
then return Page.empty(pageable) as before. Ensure you modify the catch around
QueryTimeoutException and keep the fallback return Page.empty(pageable).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f40e75ef-d161-4201-883e-6423c7240fa1
📒 Files selected for processing (3)
backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.ktbackend/data/src/main/kotlin/io/tolgee/service/machineTranslation/MetadataProvider.ktbackend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt
0f2abc8 to
ccdb61e
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt (1)
137-139: Add telemetry for the timeout fallback.This path now makes “TM query timed out” look the same as “no suggestions found” to downstream consumers. A counter or structured log field here would make degraded TM search visible without relying on warn-stacktrace volume alone.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt` around lines 137 - 139, The QueryTimeoutException catch in TranslationSuggestionController currently logs a warning and returns Page.empty(pageable), which hides timeouts from downstream metrics; update this handler to emit telemetry (e.g., increment a counter or record a metric) and/or add a structured field to the log so timeouts are distinguishable from "no suggestions" results—specifically, inside the catch (e: QueryTimeoutException) block, call your application's metrics/telemetry API (or MetricsRegistry) to increment a "translation_memory.query_timeout" counter and include a structured tag (e.g., timeout=true) in the log.warn call before returning Page.empty(pageable).backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt (1)
40-41: Extract the TM session tuning into one helper.The timeout and
pg_trgm.similarity_thresholdare duplicated in both query paths. Keeping them centralized will prevent the UI and batch paths from drifting during future tuning.♻️ Proposed refactor
+ private companion object { + const val TM_STATEMENT_TIMEOUT = "550ms" + const val TM_SIMILARITY_THRESHOLD = 0.5 + } + + private fun configureTmSession() { + entityManager.createNativeQuery("set local statement_timeout to '$TM_STATEMENT_TIMEOUT'").executeUpdate() + entityManager.createNativeQuery("set local pg_trgm.similarity_threshold to $TM_SIMILARITY_THRESHOLD").executeUpdate() + } + `@Transactional`(propagation = Propagation.REQUIRES_NEW) fun getSuggestionsList( baseTranslationText: String, isPlural: Boolean, keyId: Long? = null, @@ - entityManager.createNativeQuery("set local statement_timeout to '550ms'").executeUpdate() - entityManager.createNativeQuery("set local pg_trgm.similarity_threshold to 0.5").executeUpdate() + configureTmSession() @@ private fun getSuggestionsData( sourceTranslationText: String, isPlural: Boolean, keyId: Long?, @@ - entityManager.createNativeQuery("set local statement_timeout to '550ms'").executeUpdate() - entityManager.createNativeQuery("set local pg_trgm.similarity_threshold to 0.5").executeUpdate() + configureTmSession()Also applies to: 134-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt` around lines 40 - 41, Extract the duplicated session tuning SQL into a single helper method in TranslationMemoryService (e.g., a private setTmSessionTuning(entityManager: EntityManager) or similar) that runs the two native queries ("set local statement_timeout to '550ms'" and "set local pg_trgm.similarity_threshold to 0.5"); replace the two occurrences (the current entityManager.createNativeQuery calls around the start and the other block around lines 134-135) with calls to this helper so both the UI and batch query paths use the same centralized tuning before executing their TM queries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.kt`:
- Around line 137-139: The QueryTimeoutException catch in
TranslationSuggestionController currently logs a warning and returns
Page.empty(pageable), which hides timeouts from downstream metrics; update this
handler to emit telemetry (e.g., increment a counter or record a metric) and/or
add a structured field to the log so timeouts are distinguishable from "no
suggestions" results—specifically, inside the catch (e: QueryTimeoutException)
block, call your application's metrics/telemetry API (or MetricsRegistry) to
increment a "translation_memory.query_timeout" counter and include a structured
tag (e.g., timeout=true) in the log.warn call before returning
Page.empty(pageable).
In
`@backend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt`:
- Around line 40-41: Extract the duplicated session tuning SQL into a single
helper method in TranslationMemoryService (e.g., a private
setTmSessionTuning(entityManager: EntityManager) or similar) that runs the two
native queries ("set local statement_timeout to '550ms'" and "set local
pg_trgm.similarity_threshold to 0.5"); replace the two occurrences (the current
entityManager.createNativeQuery calls around the start and the other block
around lines 134-135) with calls to this helper so both the UI and batch query
paths use the same centralized tuning before executing their TM queries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5845aa69-3bca-4c39-a922-4f069905621d
📒 Files selected for processing (3)
backend/api/src/main/kotlin/io/tolgee/api/v2/controllers/suggestion/TranslationSuggestionController.ktbackend/data/src/main/kotlin/io/tolgee/service/machineTranslation/MetadataProvider.ktbackend/data/src/main/kotlin/io/tolgee/service/translation/TranslationMemoryService.kt
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/data/src/main/kotlin/io/tolgee/service/machineTranslation/MetadataProvider.kt
3be76f6 to
f22fa72
Compare
- Catch QueryTimeoutException outside @transactional boundaries so Spring can cleanly rollback the REQUIRES_NEW transaction before the catch runs (catching inside a @transactional risks a failed commit on the aborted PostgreSQL transaction) - Controller returns Page.empty on timeout instead of HTTP 500 - MetadataProvider returns emptyList on timeout (was already there, now with explicit import and logger) - Make getSuggestionsData private so callers cannot bypass the REQUIRES_NEW transaction isolation of getSuggestions
f22fa72 to
73a7dec
Compare
Why
Statement timeout
On large databases, the translation memory similarity search can be extremely slow — in the worst case, several minutes. Rather than leaving the query running indefinitely, we now set
statement_timeout = 550msbefore each TM query. If the timeout fires, the user simply gets no suggestions instead of having the request hang or time out at the HTTP layer. This is a deliberate trade-off: a fast no-result is better than a multi-second freeze.REQUIRES_NEW transaction
SET LOCALin PostgreSQL only scopes settings to the current transaction. Without an isolated transaction, thestatement_timeoutandpg_trgm.similarity_thresholdsettings could leak into the caller's transaction and affect unrelated queries.REQUIRES_NEWcreates a fresh transaction for each TM query, guaranteeing these settings are cleaned up when the method returns. It also ensures that aQueryTimeoutException(raised when the 550ms timeout fires) does not mark the outer transaction as rollback-only, which would cause aTransactionSystemExceptionon the caller.getSuggestionsData query improvement
The previous query used a
CASE WHENworkaround to force PostgreSQL to evaluate cheap conditions before the trigram similarity check — a hack needed because the planner was underestimating cardinality and evaluating the similarity condition first. The new query replaces this with:baseTranslation.text % :baseTranslationText— the%threshold operator, which the composite GiST index on(language_id, text gist_trgm_ops)can evaluate efficientlyORDER BY baseTranslation.text <-> :baseTranslationText— orders results by actual trigram distance so the most relevant suggestions appear firstThis produces the same result (top suggestions ordered by similarity) but leverages the GiST index properly, eliminating the full scan. On our test environment the same query went from ~6800 ms down to ~320 ms.
What
getSuggestionsData(UI suggestion panel): rewrote the query — removed theCASE WHENworkaround, replaced with%filter +ORDER BY <->. Addedstatement_timeout = 550ms. Wrapped the publicgetSuggestionsentry points inREQUIRES_NEWso theSET LOCALsettings are properly scoped (note:getSuggestionsDatais called via self-invocation so Spring AOP cannot proxy it directly —REQUIRES_NEWis applied on the publicgetSuggestionsoverloads instead).getSuggestionsList(MT/batch pipeline): addedstatement_timeout = 550ms. Kept the no-ORDER BYdesign so the GiST index can stop early atLIMITrows. Upgraded from@Transactionalto@Transactional(REQUIRES_NEW).Note: I noticed #3537 too late — I had already started this work independently. Apologies for the overlap.
Summary by CodeRabbit
Refactor
Bug Fixes / Reliability