-
Notifications
You must be signed in to change notification settings - Fork 31
metrics: add ddl count for different ddl type #3833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant improvement to the DDL execution monitoring within TiCDC by adding the specific DDL type as a label to the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a ddl_type label to the ticdc_sink_ddl_exec_running metric, enhancing DDL monitoring by providing more granular data. The changes are propagated through various sinks and the corresponding Grafana dashboards have been updated. While the overall approach is sound, I've identified a couple of issues. There's a bug in the Kafka and Pulsar sinks where the wrong DDL type is used for metrics in multi-table DDL scenarios. Additionally, there's a potential memory leak due to the new dynamic metric labels not being cleaned up upon changefeed removal. My review includes suggestions to fix these issues.
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances the monitoring capabilities for DDL (Data Definition Language) operations within TiCDC. By introducing a mechanism to track DDL executions based on their specific type, it moves beyond generic DDL counts to provide a more granular view of database schema changes. This improvement offers deeper insights into the nature and frequency of DDL activities across various sinks, which is crucial for better operational visibility, performance analysis, and troubleshooting. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
|
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances the monitoring capabilities for DDL (Data Definition Language) operations within TiCDC. By introducing a mechanism to track DDL executions based on their specific type, it moves beyond generic DDL counts to provide a more granular view of database schema changes. This improvement offers deeper insights into the nature and frequency of DDL activities across various sinks, which is crucial for better operational visibility, performance analysis, and troubleshooting. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances DDL metrics by adding a counter for different DDL types. The changes are well-structured, including refactoring DDL metrics into a dedicated file (pkg/metrics/ddl.go) and updating various sink implementations to report the DDL type. This improves the observability of DDL operations. I've identified a couple of issues in pkg/metrics/statistics.go: a minor comment typo and a more significant issue with metric cleanup in the Close function that could lead to metric leaks. I've provided suggestions to address these. Overall, this is a valuable improvement.
| ExecutionErrorCounter.DeleteLabelValues(keyspace, changefeedID) | ||
| ExecutionErrorCounter.DeleteLabelValues(keyspace, changefeedID, "ddl") | ||
| ExecutionErrorCounter.DeleteLabelValues(keyspace, changefeedID, "dml") | ||
| ExecDDLCounter.DeleteLabelValues(keyspace, changefeedID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call to DeleteLabelValues is incorrect because ExecDDLCounter has three labels (keyspace, changefeed, ddl_type), but only two are provided. This will result in a no-op, and metrics will leak after a changefeed is closed.
To remove all metrics for a given keyspace and changefeed from a MetricVec, you should use Remove with a label map.
Additionally, several other calls to DeleteLabelValues in this function are also incorrect for the same reason (wrong number of label values for ExecBatchHistogram, ExecBatchWriteBytesHistogram, and TotalWriteBytesCounter). This is a good opportunity to fix them all by providing the b.sinkType as the third argument for each.
| ExecDDLCounter.DeleteLabelValues(keyspace, changefeedID) | |
| ExecDDLCounter.Remove(prometheus.Labels{getKeyspaceLabel(): keyspace, "changefeed": changefeedID}) |
| metricExecErrCnt prometheus.Counter | ||
| // metricExecErrCntForDDL records the error count of the Sink for DDL. | ||
| metricExecErrCntForDDL prometheus.Counter | ||
| // metricExecErrCntForDDL records the error count of the Sink for DML. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances the monitoring capabilities for DDL (Data Definition Language) operations within TiCDC by introducing granular tracking of DDL execution counts based on their specific types. This provides a more detailed view of database schema changes, improving operational visibility, performance analysis, and troubleshooting. The changes involve refactoring DDL-related metrics into a dedicated file, separating DDL and DML error counters, and updating various sink implementations to correctly report DDL types. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
|
What problem does this PR solve?
Issue Number: ref #1584
What is changed and how it works?
Granular DDL Metrics: Introduced a new Prometheus counter, "ExecDDLCounter", to track the execution count of different DDL types, providing more detailed insights into DDL operations.
DDL Type Parameterization: The "RecordDDLExecution" function now accepts a "ddlType" string, enabling specific DDL operation types to be recorded and differentiated in metrics across various sink implementations.
Metrics Refactoring: DDL-related metrics ("ExecDDLHistogram", "ExecDDLRunningGauge") were moved from "pkg/metrics/sink.go" to a new "pkg/metrics/ddl.go" file, centralizing DDL metric definitions and introducing a new "ExecDDLBlockingGauge".
Error Counter Separation: The "ExecutionErrorCounter" in "pkg/metrics/statistics.go" was refined to differentiate between DDL and DML execution errors, providing more specific error tracking.
Barrier Map Renaming: Renamed the internal RangeWoLock method to RangeWithoutLock in BlockedEventMap for improved clarity and added a Len() method to it.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note