
Updated cdc import script to write nodes to a specified DCP instance #476

Merged
dwnoble merged 8 commits into datacommonsorg:master from dwnoble:dcp-import-nodes on Feb 13, 2026

dwnoble commented Feb 11, 2026

Implements insert_triples in DataCommonsPlatformDb to enable node ingestion via the /nodes endpoint.

  • RDF to JSON-LD: Converts internal Triple objects to an RDF graph and serializes to compacted JSON-LD using pyld.
  • Namespace handling: Adds _expand_id and _triples_to_graph to manage DCID URI mapping.
  • DCP Integration: Sends POST requests to the Data Commons Platform nodes API.
  • Testing: Adds unit tests with mocked API responses and payload validation.
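The conversion described in the bullets above can be illustrated without the libraries involved. This is a minimal, standard-library-only sketch, not the code in this PR (which uses rdflib and pyld). The `Triple` shape, the contents of `NS_MAP`, the namespace URI, and the function names `expand_id`, `compact_id`, and `triples_to_jsonld` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prefix map; the real NS_MAP in the PR may differ.
NS_MAP = {"dcid": "https://datacommons.org/browser/"}
DEFAULT_PREFIX = "dcid"


@dataclass
class Triple:
    """Stand-in for the importer's internal triple (assumed shape)."""
    subject_id: str
    predicate: str
    object_id: str = ""
    object_value: str = ""


def expand_id(node_id: str) -> str:
    """Expand a CURIE ("dcid:Person") or bare ID ("Person") to a full URI."""
    if node_id.startswith(("http://", "https://")):
        return node_id  # Already a full URI.
    prefix, sep, local = node_id.partition(":")
    if sep and prefix in NS_MAP:
        return NS_MAP[prefix] + local
    # Bare IDs fall back to the default namespace.
    return NS_MAP[DEFAULT_PREFIX] + node_id


def compact_id(uri: str) -> str:
    """Shorten a full URI back to a CURIE when a known namespace matches."""
    for prefix, base in NS_MAP.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri


def triples_to_jsonld(triples):
    """Group triples by subject into a compacted, JSON-LD-shaped dict."""
    nodes = defaultdict(dict)
    for t in triples:
        subject = compact_id(expand_id(t.subject_id))
        predicate = compact_id(expand_id(t.predicate))
        if t.object_id:
            value = {"@id": compact_id(expand_id(t.object_id))}  # node reference
        else:
            value = {"@value": t.object_value}  # literal value
        nodes[subject].setdefault(predicate, []).append(value)
    graph = [{"@id": subject, **props} for subject, props in nodes.items()]
    return {"@context": dict(NS_MAP), "@graph": graph}
```

Expanding every ID first and compacting afterwards mirrors the graph-then-compact flow the description outlines, so CURIEs and bare IDs end up in the same form in the payload.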

Next steps:

  • Implement error handling (raise exceptions instead of logging warnings).
  • Implement insert_observations.
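One possible shape for the error-handling follow-up is sketched below. It is only an illustration of raising instead of logging a warning: the exception class `NodeInsertError` and the helper `raise_for_insert_failure` are hypothetical names, and the PR does not say how the real implementation will look.

```python
class NodeInsertError(RuntimeError):
    """Hypothetical error raised when the nodes API rejects a payload."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"DCP nodes API returned HTTP {status_code}: {detail}")
        self.status_code = status_code


def raise_for_insert_failure(status_code: int, body: str,
                             max_detail: int = 200) -> None:
    """Raise on any non-2xx status; return None on success."""
    if 200 <= status_code < 300:
        return
    # Truncate the body so a large or sensitive response is not echoed in full.
    raise NodeInsertError(status_code, body[:max_detail])
```

A caller would pass the status code and text of the HTTP response it received, so a failed import stops instead of continuing silently.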

Tested using these instructions: https://docs.google.com/document/d/1nVybHja5BJcSxj4k0bbZCe9a2TeR9iU4wmTftAfxdVc/edit?resourcekey=0-GtDvvc1uqBULO7fzVDd-rA&tab=t.f27b3lcg6zg3

dwnoble requested a review from clincoln8 February 11, 2026 03:38
@gemini-code-assist

Summary of Changes

Hello @dwnoble, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the DataCommonsPlatformDb class to support the ingestion of node data into a Data Commons Platform instance. It introduces the core logic for converting internal triple representations into a JSON-LD format suitable for the platform's nodes API, thereby enabling a new method for data population.

Highlights

  • DCP Node Ingestion: Implemented the insert_triples method within DataCommonsPlatformDb to enable the ingestion of node data into a specified Data Commons Platform instance.
  • RDF to JSON-LD Conversion: Added functionality to convert internal Triple objects into an RDF graph using rdflib and then serialize them into compacted JSON-LD format using pyld.
  • Namespace and ID Handling: Introduced helper methods (_expand_id, _triples_to_graph, _graph_to_jsonld) to manage Data Commons ID (DCID) URI mapping and ensure proper JSON-LD compaction for node values.
  • API Integration: Configured DataCommonsPlatformDb to send POST requests to the /nodes endpoint of the Data Commons Platform API for data submission.
  • Unit Testing: Added a new unit test (test_insert_triples_into_datacommons_platform) to validate the triple insertion process, including mocking API responses and verifying the structure of the JSON-LD payload.
  • Dependency Updates: Added PyLD and rdflib to the project's requirements.txt to support RDF and JSON-LD processing.

Changelog

  • simple/requirements.txt
    • Added PyLD==2.0.4 dependency.
    • Added rdflib==7.4.0 dependency.
  • simple/stats/db.py
    • Imported necessary libraries for RDF and JSON-LD processing (jsonld, Graph, Literal, Namespace, RDF, URIRef, requests).
    • Defined NS_MAP for default Data Commons namespace and NODES_PATH for the nodes API endpoint within DataCommonsPlatformDb.
    • Implemented insert_triples to convert Triple objects to JSON-LD and send them via POST request to the DCP nodes API.
    • Added _expand_id method to handle CURIEs and bare IDs for URI expansion.
    • Added _triples_to_graph method to convert a list of Triple objects into an rdflib.Graph.
    • Added _graph_to_jsonld method to serialize an rdflib.Graph into compacted JSON-LD using pyld.
  • simple/tests/stats/db_test.py
    • Relocated the test_get_datacommons_platform_config_from_env test.
    • Introduced test_insert_triples_into_datacommons_platform to verify the insert_triples functionality, including mocking the API call and asserting the structure of the sent JSON-LD payload.
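The mocked-API testing approach summarized above can be sketched roughly as follows. This is not the test from the PR: `insert_nodes`, the injected `post` callable, the `application/json` content type, the value of `NODES_PATH`, and the example host are assumptions. The `post` parameter is meant to accept anything with a `requests.post`-like signature, so the sketch runs without network access or third-party packages.

```python
import json
from unittest import mock

NODES_PATH = "/nodes"  # Assumed value of the endpoint path constant.


def insert_nodes(base_url, jsonld_doc, post):
    """POST a JSON-LD document to the nodes endpoint via an injected callable."""
    url = base_url.rstrip("/") + NODES_PATH
    headers = {"Content-Type": "application/json"}  # assumed content type
    return post(url, data=json.dumps(jsonld_doc), headers=headers, timeout=30)


def run_mocked_insert():
    """Exercise insert_nodes with a mock and inspect what would be sent."""
    fake_post = mock.Mock(return_value=mock.Mock(status_code=200))
    doc = {"@graph": [{"@id": "dcid:sv1", "dcid:name": "SV 1"}]}

    response = insert_nodes("https://dcp.example.com/", doc, fake_post)

    # Verify the URL and the payload structure rather than the raw string.
    fake_post.assert_called_once()
    (url,), kwargs = fake_post.call_args
    assert url == "https://dcp.example.com/nodes"
    assert json.loads(kwargs["data"]) == doc
    return response.status_code
```

Parsing the sent body back into a dict before comparing keeps the assertion independent of key order and whitespace in the serialized JSON.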

Activity

  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.

gemini-code-assist bot left a comment

Code Review

This pull request implements functionality to write nodes to a Data Commons Platform instance by converting RDF triples to JSON-LD, with a sound approach and commendable unit tests. However, a security audit identified a high-severity Server-Side Request Forgery (SSRF) vulnerability due to unvalidated request URLs constructed from environment variables, which could allow arbitrary requests. This is compounded by a medium-severity Information Disclosure vulnerability, as full HTTP responses are logged, potentially exposing sensitive data if SSRF is exploited. Additionally, a high-severity bug in an exception handler could lead to a crash, and there are suggestions for improving maintainability.
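A common way to mitigate the two issues raised in this review is to validate the configured service URL against an allowlist before making requests, and to log a summary of a response instead of its full body. The sketch below illustrates that idea only. It is not the fix applied in this PR, and `ALLOWED_HOSTS`, `validate_service_url`, `summarize_response`, and the example host are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from trusted config.
ALLOWED_HOSTS = frozenset({"dcp.example.com"})


def validate_service_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> str:
    """Reject URLs that are not https or whose host is not allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("service URL must use https")
    if parts.username or parts.password:
        raise ValueError("service URL must not embed credentials")
    # urlsplit lowercases the hostname and strips any port.
    if parts.hostname is None or parts.hostname not in allowed_hosts:
        raise ValueError("service URL host is not in the allowlist")
    return url.rstrip("/")


def summarize_response(status_code: int, body: str) -> str:
    """Build a log line with the status and body size, not the body itself."""
    return f"status={status_code} body_bytes={len(body.encode('utf-8'))}"
```

Checking the parsed hostname rather than the raw string avoids being fooled by URLs such as `https://dcp.example.com@internal.host/`, where the allowlisted name appears only in the userinfo part.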

dwnoble and others added 3 commits February 11, 2026 16:43
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

clincoln8 left a comment

Thanks Dan! Just a few nonblocking conceptual questions. We can chat about them in our next sync too.

dwnoble enabled auto-merge (squash) February 13, 2026 01:42
dwnoble merged commit f34abe3 into datacommonsorg:master Feb 13, 2026
5 of 6 checks passed