Add unit/integration test for ingestion pipeline#474
Add unit/integration test for ingestion pipeline#474vish-cs merged 1 commit intodatacommonsorg:masterfrom
Conversation
Summary of ChangesHello @vish-cs, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the testing framework for the ingestion pipeline by introducing dedicated unit and integration tests. The integration tests are designed to validate the pipeline's behavior in both local development environments using a Spanner emulator and actual Google Cloud Dataflow, ensuring robust and reliable data ingestion. This change improves the overall stability and correctness of the pipeline's interaction with Spanner. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces valuable unit and integration tests for the ingestion pipeline. The unit tests are well-implemented using a mock Spanner client. The integration tests correctly leverage Testcontainers for local execution. However, there are several critical issues in the CI configuration and the Dataflow execution path of the integration tests that need to be addressed. The Cloud Build configuration for running tests is incorrect, and the Dataflow test is not self-contained, relying on pre-existing data, which can lead to flaky builds. Additionally, there are opportunities to improve the robustness of the tests by avoiding hardcoded resource names and using more specific exception handling.
...ion/src/test/java/org/datacommons/ingestion/pipeline/ImportGroupPipelineIntegrationTest.java
Show resolved
Hide resolved
...ion/src/test/java/org/datacommons/ingestion/pipeline/ImportGroupPipelineIntegrationTest.java
Show resolved
Hide resolved
...ion/src/test/java/org/datacommons/ingestion/pipeline/ImportGroupPipelineIntegrationTest.java
Show resolved
Hide resolved
6301050 to
54b74b4
Compare
|
Natalie, any clue how to address Codacy errors on the spanner schema file? These seem false positives. |
I think those are safe to ignore - codacy checks seem not to always understand spanner schema, especially for graph |
n-h-diaz
left a comment
There was a problem hiding this comment.
thanks for adding the tests!
Added a unit test which runs the ingestion pipeline with a MockSpannerClient to test the mutations.
Added an integration test which can run the ingestion pipeline in LOCAL mode (using Spanner emulator in a docker container) or in DATAFLOW mode where it runs in the pipeline in GCP.
Renamed ImportGroupPipeline to GraphIngestionPipeline
Clean up schema file to remove import workflow table definitions as these will be maintained in the data repo instead.
Update cloud build file to run integration test as part of the build
Remove template scripts due to to switch to cloud build