-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](fe) move some variables from Replica to LocalReplica which are not used in CloudReplica to reduce memory #59460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the Replica and Tablet class hierarchy to reduce memory usage in cloud deployments by moving local-storage-specific fields from the base classes to new LocalReplica and LocalTablet subclasses. This prevents CloudReplica and CloudTablet instances from storing unused fields related to local storage features like cooldown configurations and remote data sizes.
Key changes:
- Created
LocalReplicaextendingReplicawith fields:remoteDataSize,remoteInvertedIndexSize,remoteSegmentSize,cooldownMetaId, andcooldownTerm - Created
LocalTabletextendingTabletwith cooldown configuration fields - Updated base classes to return default values (0, -1, null) and throw
UnsupportedOperationExceptionfor local-specific setters - Updated serialization in
GsonUtilsto support both local and cloud subtypes with backward compatibility - Updated all production and test code to use
LocalReplicaandLocalTabletwhere appropriate
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
LocalReplica.java |
New subclass containing local-storage-specific fields moved from Replica |
LocalTablet.java |
New subclass containing cooldown configuration fields moved from Tablet |
Replica.java |
Base class refactored to return defaults and throw exceptions for local-specific operations |
Tablet.java |
Base class refactored with cooldown fields removed |
GsonUtils.java |
Updated type adapters for serialization with backward compatibility |
EnvFactory.java |
Updated factory to create LocalReplica/LocalTablet in non-cloud mode |
| Production code files | Updated to cast to LocalTablet/LocalReplica where cooldown features are used |
| Test files | Updated to use LocalReplica and LocalTablet instead of base classes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| } | ||
| // return replica with max remoteDataSize | ||
| return replicas.stream().max(Comparator.comparing(Replica::getRemoteDataSize)).get().getRemoteDataSize(); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential NullPointerException when replicas list is empty. The method calls .get() on an Optional without checking if it's present first. If the replicas list is empty, stream().max() will return an empty Optional, and calling .get() will throw a NoSuchElementException. Consider using .orElse(0L) or checking if replicas is non-empty before this operation.
| return replicas.stream().max(Comparator.comparing(Replica::getRemoteDataSize)).get().getRemoteDataSize(); | |
| return replicas.stream().mapToLong(Replica::getRemoteDataSize).max().orElse(0L); |
| LocalTablet tablet = (LocalTablet) getTablet(conf); | ||
| if (tablet != null) { | ||
| tablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| LocalTablet tablet = (LocalTablet) getTablet(conf); | |
| if (tablet != null) { | |
| tablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm); | |
| Tablet tablet = getTablet(conf); | |
| if (tablet instanceof LocalTablet) { | |
| LocalTablet localTablet = (LocalTablet) tablet; | |
| localTablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm); | |
| } else if (tablet != null) { | |
| LOG.warn("skip setting cooldown conf for non-local tablet. tabletId={}", conf.tabletId); |
| // replica has cooldowned data | ||
| do { | ||
| Pair<Long, Long> cooldownConf = tablet.getCooldownConf(); | ||
| Pair<Long, Long> cooldownConf = ((LocalTablet) tablet).getCooldownConf(); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts tablet directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| public void setRemoteInvertedIndexSize(long remoteInvertedIndexSize) { | ||
| if (remoteInvertedIndexSize > 0) { | ||
| throw new UnsupportedOperationException("setRemoteInvertedIndexSize is not supported in Replica"); | ||
| } | ||
| } | ||
|
|
||
| public Long getRemoteSegmentSize() { | ||
| return 0L; | ||
| } | ||
|
|
||
| public void setRemoteSegmentSize(long remoteSegmentSize) { | ||
| if (remoteSegmentSize > 0) { | ||
| throw new UnsupportedOperationException("setRemoteSegmentSize is not supported in Replica"); | ||
| } | ||
| } |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing UnsupportedOperationException in setter may break existing code paths. Similar to setRemoteDataSize, these methods throw UnsupportedOperationException but may be called from existing code paths. Verify that all call sites properly handle this exception or ensure these methods are never called on base Replica instances.
| conf.setCooldownReplicaId(replicas.get(index).getId()); | ||
| // find TabletMeta to get cooldown term | ||
| Tablet tablet = getTablet(conf); | ||
| LocalTablet tablet = (LocalTablet) getTablet(conf); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| // update Tablet | ||
| for (CooldownConf conf : updatedConf) { | ||
| Tablet tablet = tabletMap.get(conf.tabletId); | ||
| LocalTablet tablet = (LocalTablet) tabletMap.get(conf.tabletId); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts the result from tabletMap.get() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| tablet = (LocalTablet) table.getPartition(tabletMeta.getPartitionId()).getIndex(tabletMeta.getIndexId()) | ||
| .getTablet(beTabletInfo.tablet_id); |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| tablet = (LocalTablet) table.getPartition(tabletMeta.getPartitionId()).getIndex(tabletMeta.getIndexId()) | |
| .getTablet(beTabletInfo.tablet_id); | |
| Tablet baseTablet = table.getPartition(tabletMeta.getPartitionId()) | |
| .getIndex(tabletMeta.getIndexId()) | |
| .getTablet(beTabletInfo.tablet_id); | |
| if (!(baseTablet instanceof LocalTablet)) { | |
| LOG.warn("tablet {} is not LocalTablet, skip handling cooldown conf", beTabletInfo.tablet_id); | |
| return; | |
| } | |
| tablet = (LocalTablet) baseTablet; |
| LOG.warn("tablet {} not found", info.tablet_id); | ||
| return; | ||
| } | ||
| // check cooldownReplicaId |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe cast to LocalTablet without type checking. The code casts tablet directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.
| // check cooldownReplicaId | |
| // check cooldownReplicaId | |
| if (!(tablet instanceof LocalTablet)) { | |
| LOG.warn("tablet {} is not LocalTablet, skip confirming unused remote files", info.tablet_id); | |
| return; | |
| } |
|
run buildall |
TPC-H: Total hot run time: 35123 ms |
TPC-DS: Total hot run time: 173953 ms |
ClickBench: Total hot run time: 26.79 s |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
this is is based on #59327
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)