Skip to content

Conversation

@mymeiyi
Copy link
Contributor

@mymeiyi mymeiyi commented Dec 29, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

this is is based on #59327

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings December 29, 2025 10:25
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Replica and Tablet class hierarchy to reduce memory usage in cloud deployments by moving local-storage-specific fields from the base classes to new LocalReplica and LocalTablet subclasses. This prevents CloudReplica and CloudTablet instances from storing unused fields related to local storage features like cooldown configurations and remote data sizes.

Key changes:

  • Created LocalReplica extending Replica with fields: remoteDataSize, remoteInvertedIndexSize, remoteSegmentSize, cooldownMetaId, and cooldownTerm
  • Created LocalTablet extending Tablet with cooldown configuration fields
  • Updated base classes to return default values (0, -1, null) and throw UnsupportedOperationException for local-specific setters
  • Updated serialization in GsonUtils to support both local and cloud subtypes with backward compatibility
  • Updated all production and test code to use LocalReplica and LocalTablet where appropriate

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
LocalReplica.java New subclass containing local-storage-specific fields moved from Replica
LocalTablet.java New subclass containing cooldown configuration fields moved from Tablet
Replica.java Base class refactored to return defaults and throw exceptions for local-specific operations
Tablet.java Base class refactored with cooldown fields removed
GsonUtils.java Updated type adapters for serialization with backward compatibility
EnvFactory.java Updated factory to create LocalReplica/LocalTablet in non-cloud mode
Production code files Updated to cast to LocalTablet/LocalReplica where cooldown features are used
Test files Updated to use LocalReplica and LocalTablet instead of base classes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
}
// return replica with max remoteDataSize
return replicas.stream().max(Comparator.comparing(Replica::getRemoteDataSize)).get().getRemoteDataSize();
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential NullPointerException when replicas list is empty. The method calls .get() on an Optional without checking if it's present first. If the replicas list is empty, stream().max() will return an empty Optional, and calling .get() will throw a NoSuchElementException. Consider using .orElse(0L) or checking if replicas is non-empty before this operation.

Suggested change
return replicas.stream().max(Comparator.comparing(Replica::getRemoteDataSize)).get().getRemoteDataSize();
return replicas.stream().mapToLong(Replica::getRemoteDataSize).max().orElse(0L);

Copilot uses AI. Check for mistakes.
Comment on lines +139 to 141
LocalTablet tablet = (LocalTablet) getTablet(conf);
if (tablet != null) {
tablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm);
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Suggested change
LocalTablet tablet = (LocalTablet) getTablet(conf);
if (tablet != null) {
tablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm);
Tablet tablet = getTablet(conf);
if (tablet instanceof LocalTablet) {
LocalTablet localTablet = (LocalTablet) tablet;
localTablet.setCooldownConf(conf.cooldownReplicaId, conf.cooldownTerm);
} else if (tablet != null) {
LOG.warn("skip setting cooldown conf for non-local tablet. tabletId={}", conf.tabletId);

Copilot uses AI. Check for mistakes.
// replica has cooldowned data
do {
Pair<Long, Long> cooldownConf = tablet.getCooldownConf();
Pair<Long, Long> cooldownConf = ((LocalTablet) tablet).getCooldownConf();
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts tablet directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Copilot uses AI. Check for mistakes.
Comment on lines +291 to 305
public void setRemoteInvertedIndexSize(long remoteInvertedIndexSize) {
if (remoteInvertedIndexSize > 0) {
throw new UnsupportedOperationException("setRemoteInvertedIndexSize is not supported in Replica");
}
}

public Long getRemoteSegmentSize() {
return 0L;
}

public void setRemoteSegmentSize(long remoteSegmentSize) {
if (remoteSegmentSize > 0) {
throw new UnsupportedOperationException("setRemoteSegmentSize is not supported in Replica");
}
}
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing UnsupportedOperationException in setter may break existing code paths. Similar to setRemoteDataSize, these methods throw UnsupportedOperationException but may be called from existing code paths. Verify that all call sites properly handle this exception or ensure these methods are never called on base Replica instances.

Copilot uses AI. Check for mistakes.
conf.setCooldownReplicaId(replicas.get(index).getId());
// find TabletMeta to get cooldown term
Tablet tablet = getTablet(conf);
LocalTablet tablet = (LocalTablet) getTablet(conf);
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Copilot uses AI. Check for mistakes.
// update Tablet
for (CooldownConf conf : updatedConf) {
Tablet tablet = tabletMap.get(conf.tabletId);
LocalTablet tablet = (LocalTablet) tabletMap.get(conf.tabletId);
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts the result from tabletMap.get() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Copilot uses AI. Check for mistakes.
Comment on lines +682 to 683
tablet = (LocalTablet) table.getPartition(tabletMeta.getPartitionId()).getIndex(tabletMeta.getIndexId())
.getTablet(beTabletInfo.tablet_id);
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts the result of getTablet() directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Suggested change
tablet = (LocalTablet) table.getPartition(tabletMeta.getPartitionId()).getIndex(tabletMeta.getIndexId())
.getTablet(beTabletInfo.tablet_id);
Tablet baseTablet = table.getPartition(tabletMeta.getPartitionId())
.getIndex(tabletMeta.getIndexId())
.getTablet(beTabletInfo.tablet_id);
if (!(baseTablet instanceof LocalTablet)) {
LOG.warn("tablet {} is not LocalTablet, skip handling cooldown conf", beTabletInfo.tablet_id);
return;
}
tablet = (LocalTablet) baseTablet;

Copilot uses AI. Check for mistakes.
LOG.warn("tablet {} not found", info.tablet_id);
return;
}
// check cooldownReplicaId
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast to LocalTablet without type checking. The code casts tablet directly to LocalTablet without verifying the type first. If running in cloud mode with CloudTablets, this will throw a ClassCastException. Consider adding an instanceof check before casting or verifying that this code path is never executed in cloud mode.

Suggested change
// check cooldownReplicaId
// check cooldownReplicaId
if (!(tablet instanceof LocalTablet)) {
LOG.warn("tablet {} is not LocalTablet, skip confirming unused remote files", info.tablet_id);
return;
}

Copilot uses AI. Check for mistakes.
@mymeiyi
Copy link
Contributor Author

mymeiyi commented Dec 30, 2025

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 35123 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit df9c02e2cf8842cc1c69ea6736b802ffaf773f64, data reload: false

------ Round 1 ----------------------------------
q1	17650	4256	4030	4030
q2	2035	373	231	231
q3	10165	1273	739	739
q4	10224	918	327	327
q5	7531	2169	1908	1908
q6	189	177	139	139
q7	956	776	655	655
q8	9267	1456	1126	1126
q9	6901	5163	5224	5163
q10	6824	1809	1408	1408
q11	500	294	292	292
q12	732	742	609	609
q13	17805	3836	3103	3103
q14	287	297	270	270
q15	588	523	503	503
q16	705	696	623	623
q17	692	713	619	619
q18	7353	7341	8228	7341
q19	1296	1001	628	628
q20	435	399	261	261
q21	4488	4277	4135	4135
q22	1163	1094	1013	1013
Total cold run time: 107786 ms
Total hot run time: 35123 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4298	4261	4237	4237
q2	360	399	332	332
q3	2300	2946	2420	2420
q4	1407	1835	1431	1431
q5	4419	4400	4249	4249
q6	226	165	132	132
q7	1935	1899	1968	1899
q8	2503	2350	2357	2350
q9	7170	7166	6884	6884
q10	2572	2719	2171	2171
q11	538	450	435	435
q12	661	683	584	584
q13	3328	3818	3095	3095
q14	262	278	256	256
q15	531	497	485	485
q16	618	647	613	613
q17	1086	1235	1270	1235
q18	7527	7384	7106	7106
q19	836	830	845	830
q20	1896	1943	1790	1790
q21	4483	4188	4080	4080
q22	1106	1064	998	998
Total cold run time: 50062 ms
Total hot run time: 47612 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173953 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit df9c02e2cf8842cc1c69ea6736b802ffaf773f64, data reload: false

query5	4834	598	445	445
query6	329	234	204	204
query7	4206	460	272	272
query8	332	241	230	230
query9	8793	2600	2608	2600
query10	518	387	310	310
query11	15243	15032	14771	14771
query12	177	112	112	112
query13	1261	508	381	381
query14	6220	2956	2758	2758
query14_1	2680	2613	2650	2613
query15	199	197	173	173
query16	1046	470	472	470
query17	1053	666	575	575
query18	2470	429	331	331
query19	228	217	206	206
query20	123	116	115	115
query21	218	142	119	119
query22	3792	3891	3867	3867
query23	16047	15523	15308	15308
query23_1	15433	15550	15433	15433
query24	7439	1564	1193	1193
query24_1	1245	1198	1224	1198
query25	554	469	431	431
query26	1234	269	154	154
query27	2773	447	295	295
query28	4578	2157	2171	2157
query29	805	525	425	425
query30	311	232	215	215
query31	743	623	551	551
query32	78	76	67	67
query33	524	339	282	282
query34	927	892	541	541
query35	755	797	726	726
query36	878	877	826	826
query37	133	88	75	75
query38	2708	2684	2703	2684
query39	779	756	730	730
query39_1	699	721	736	721
query40	220	131	118	118
query41	69	69	62	62
query42	114	113	106	106
query43	475	450	454	450
query44	1322	754	748	748
query45	184	181	178	178
query46	872	956	610	610
query47	1416	1440	1370	1370
query48	307	324	266	266
query49	602	418	327	327
query50	637	280	217	217
query51	3810	3733	3786	3733
query52	106	109	98	98
query53	301	328	283	283
query54	289	259	250	250
query55	77	75	76	75
query56	285	294	281	281
query57	1049	1028	906	906
query58	261	260	269	260
query59	2019	2169	2142	2142
query60	330	332	312	312
query61	196	190	188	188
query62	386	364	319	319
query63	305	271	274	271
query64	5247	1449	969	969
query65	3771	3664	3706	3664
query66	1423	449	312	312
query67	15033	14675	14791	14675
query68	3804	1043	749	749
query69	491	362	311	311
query70	1099	819	870	819
query71	367	299	280	280
query72	6125	4718	4872	4718
query73	680	589	314	314
query74	8737	8783	8665	8665
query75	2867	2881	2506	2506
query76	3905	1084	663	663
query77	513	368	284	284
query78	9676	9784	9165	9165
query79	977	909	601	601
query80	823	574	478	478
query81	516	264	229	229
query82	210	145	113	113
query83	363	259	248	248
query84	255	123	105	105
query85	1046	511	458	458
query86	363	290	320	290
query87	2904	2890	2814	2814
query88	3142	2275	2276	2275
query89	394	353	324	324
query90	1802	148	145	145
query91	174	179	142	142
query92	69	69	64	64
query93	957	932	564	564
query94	569	315	254	254
query95	566	381	310	310
query96	595	484	210	210
query97	2350	2358	2278	2278
query98	205	196	195	195
query99	579	580	511	511
Total cold run time: 249715 ms
Total hot run time: 173953 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 26.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit df9c02e2cf8842cc1c69ea6736b802ffaf773f64, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.09	0.08
query4	1.61	0.12	0.12
query5	0.26	0.27	0.26
query6	1.16	0.67	0.65
query7	0.04	0.03	0.03
query8	0.06	0.04	0.05
query9	0.57	0.49	0.50
query10	0.54	0.55	0.55
query11	0.16	0.11	0.11
query12	0.16	0.12	0.13
query13	0.61	0.61	0.59
query14	1.00	0.99	1.00
query15	0.81	0.79	0.82
query16	0.42	0.38	0.40
query17	1.04	1.05	1.03
query18	0.24	0.22	0.21
query19	1.88	1.87	1.81
query20	0.02	0.01	0.01
query21	15.47	0.29	0.14
query22	4.68	0.05	0.05
query23	15.99	0.26	0.10
query24	1.21	0.85	0.22
query25	0.11	0.07	0.06
query26	0.14	0.13	0.13
query27	0.06	0.06	0.06
query28	4.60	1.04	0.88
query29	12.61	3.88	3.14
query30	0.28	0.14	0.11
query31	2.81	0.60	0.39
query32	3.23	0.55	0.47
query33	2.93	3.02	2.96
query34	16.70	5.14	4.47
query35	4.47	4.47	4.43
query36	0.67	0.50	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.16	0.14	0.12
query41	0.08	0.04	0.03
query42	0.05	0.03	0.02
query43	0.04	0.04	0.03
Total cold run time: 97.51 s
Total hot run time: 26.79 s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants