Merged
17 commits
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,12 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **CI / docs preflight guard**: Added `bin/check_docs_latex_unicode.sh` and a fast `docs-latex-unicode-guard` CI job to fail early on non-BMP Unicode in docs-fed text sources before the slower Dockerized `test-docs` LaTeX build.
- **Release process / deploy gate reminder**: Documented that tag-triggered PyPI publishes can pause in `waiting` on environment approval, and explicitly call out approving `Review deployments` for `pypi-release` before expecting the final PyPI job to complete.

### Added
- **GFQL/Cypher validate-only preflight API (#1320)**: Added `g.gfql_validate(...)` on `ComputeMixin` as a public no-execution validation entrypoint for GFQL chains/JSON-style queries, Let/DAG queries, and Cypher strings. The API returns structured diagnostics (`ok`, `diagnostics`, query/language metadata) instead of executing query operators. Cypher preflight runs parser+compiler checks and supports optional strict binder/schema mode (`strict=True`) using the bound graph schema catalog; chain/JSON preflight reuses existing `validate_chain_schema()` semantics (including `collect_all=True`), and Let/DAG preflight now includes best-effort schema checks for direct chain-like bindings.

### Changed
- **GFQL execution prevalidation semantics (#1320)**: `g.gfql(..., validate=True)` now runs local preflight validation before execution. `g.gfql_remote(..., validate=True)` now validates query payloads before implicit upload/network dispatch, so invalid queries fail locally prior to upload when possible. String query inputs are now treated consistently as Cypher during preflight (`g.gfql_validate("...")` and `g.gfql("...", validate=True)`), so users get Cypher parser/compiler diagnostics instead of shape-guessing type errors. `g.gfql_validate(...)` now raises structured GFQL exceptions on invalid queries (instead of returning `ok=False`), and collect-all mode surfaces full diagnostics via exception context for LM/retry workflows.

### Internal
- **GFQL / Cypher reentry follow-through cleanup (#989, post-#1260 extraction)**: In `graphistry/compute/gfql/cypher/reentry/runtime.py`, free-form intermediate MATCH plan construction now routes through the whole-row/free-form `ReentryPlan` contract instead of scalar-only fallback tagging. This makes the dedicated runtime `plan.free_form` lane reachable again and removes incidental scalar-only-path dependence for free-form reentry dispatch.
- **GFQL native types T4 — Arrow/type bridge contracts and coercion semantics (#1312, #1262, #1046)**: Added `graphistry/compute/gfql/ir/arrow_bridge.py` with stable schema-level interchange helpers `to_arrow()` and `from_arrow()` for `RowSchema` + schema-confidence metadata. The bridge records per-field logical-type metadata (`gfql.logical_type`) and confidence (`gfql.schema_confidence`) for deterministic round-trips, supports strict vs widening coercion (`coercion='strict'|'widen'`) at export/import boundaries, preserves scalar nullability exactly, and defines structural-type fallback behavior (`NodeRef`/`EdgeRef`/`PathType` as widened string bridge fields in widen mode). Added focused regression coverage in `graphistry/tests/compute/gfql/test_ir_arrow_bridge.py` for round-trip fidelity, nullability behavior, confidence handling, and strict/widen coercion boundaries.
46 changes: 35 additions & 11 deletions docs/source/gfql/cypher.rst
@@ -440,7 +440,37 @@ Static Validation / Preflight Check
-----------------------------------

If you want to know whether a query fits the current Cypher-in-GFQL subset before
execution, preflight it with the helper APIs:
execution, start with the bound-graph inline preflight APIs:

.. code-block:: python

g.gfql_validate(
    "MATCH (p) RETURN p.name AS name ORDER BY name DESC LIMIT $top_n",
    params={"top_n": 5},
    # strict=True is the default for local bound-graph preflight
)

# On failure:
# - GFQLSyntaxError for invalid syntax
# - GFQLValidationError for unsupported/schema-invalid shapes

- Use ``g.gfql_validate(...)`` when you want a stable validate-only entrypoint
that never executes query operators and raises structured exceptions on invalid queries.
- Use ``g.gfql(..., validate=True)`` when you want execution guarded by a
local preflight check. For Cypher strings, this uses schema-aware strict
preflight by default.
- Use ``g.gfql_remote(..., validate=True)`` when you want remote execution
guarded by local preflight before upload/network dispatch. For Cypher strings,
remote preflight uses ``strict=False`` by default because remote schema is authoritative.
- Use ``parse_cypher()`` when you only want grammar validation and access to
the parsed representation.
- Use ``compile_cypher()`` when you need low-level compiler/lowering output for
tooling or whitebox inspection.
- Use ``cypher_to_gfql()`` only when you specifically need a single GFQL
``Chain``. It is intentionally stricter than direct execution through
``g.gfql("...")``.

Low-level helper example:

.. code-block:: python

@@ -450,25 +480,19 @@ execution, preflight it with the helper APIs:
query = "MATCH (p:Person) RETURN p.name AS name"

try:
    parse_cypher(query) # grammar + AST checks
    compile_cypher(query) # GFQL Cypher compiler / lowering checks
    parsed = parse_cypher(query) # grammar + AST checks
    compiled = compile_cypher(query) # compiler/lowering checks
except GFQLSyntaxError as exc:
    print("Invalid Cypher syntax for g.gfql(\"MATCH ...\"):", exc)
except GFQLValidationError as exc:
    print("Valid Cypher, but outside the current GFQL Cypher surface:", exc)

- Use ``parse_cypher()`` when you only want syntax and AST validation.
- Use ``compile_cypher()`` for the strongest compiler preflight, because it also
catches unsupported-but-valid query shapes in lowering.
- Use ``cypher_to_gfql()`` only when you specifically need a single GFQL
``Chain``. It is intentionally stricter than direct execution through
``g.gfql("...")``.

Common Rewrites
---------------

- Need remote execution on Graphistry infrastructure instead of running against
the current bound graph? Prefer ``g.gfql_remote([...])`` for remote GFQL.
the current bound graph? Prefer ``g.gfql_remote(...)`` for remote GFQL, and
keep ``validate=True`` (default) for local preflight before upload.
- Need remote database Cypher against Neo4j/Bolt-style backends instead of
remote GFQL? Use ``graphistry.cypher("...")`` or ``g.cypher("...")``.
- Need a pure GFQL chain object? Use ``cypher_to_gfql()`` when the query fits a
90 changes: 88 additions & 2 deletions docs/source/gfql/validation/fundamentals.rst
@@ -152,7 +152,77 @@ GFQL validates automatically - just write your queries and run them:
Pre-Execution Validation Options
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use ``validate_chain_schema()`` to check compatibility without running the query, then execute separately:
Use the inline GFQL entrypoints first:

1. ``g.gfql_validate(...)`` for validate-only preflight (no execution)
2. ``g.gfql(..., validate=True)`` for preflight + execution
3. ``validate_chain_schema()`` for low-level chain-schema checks only

``g.gfql_validate(...)`` (validate-only, no execution) supports:

* **Input forms**: Cypher strings, GFQL JSON payloads, and GFQL Python objects
(for example ``Chain(...)``, ``[n(), e(), n()]``, and ``ASTLet(...)``).
String inputs are always validated as Cypher (no separate string-shape precheck).
* **Predicate + structural validation**: yes
* **Schema validation**:

* GFQL JSON and GFQL Python chain-like forms: yes (default ``schema=True``)
* GFQL Let/DAG forms: DAG structure + schema checks for direct graph-bound
steps; reference-based steps stay structural-only
* Cypher strings: syntax/compile + schema-aware name checks against the bound
graph schema by default (``strict=True``); pass ``strict=False`` for
syntax/compile-only preflight

.. code-block:: python

# Chain / JSON-style GFQL
g.gfql_validate([n({'type': 'customer'})], collect_all=True)

# Cypher
g.gfql_validate("MATCH (c) RETURN c.id AS id LIMIT $n", params={"n": 10})

Validation failures raise ``GFQLValidationError`` / ``GFQLSyntaxError`` with
structured, inspectable context:

.. code-block:: python

from graphistry.compute.exceptions import GFQLValidationError

try:
    g.gfql_validate([n({"missing_col": "x"})], collect_all=True)
except GFQLValidationError as exc:
    payload = exc.to_dict()
    # LM-friendly payload:
    # {
    #   "code": "...",
    #   "message": "...",
    #   "query_type": "chain",
    #   "language": "gfql",
    #   "diagnostics": [...]
    # }
    print(payload)

``g.gfql(..., validate=True)`` accepts the same query inputs as ``g.gfql(...)``
(Cypher string, GFQL JSON, GFQL Python objects), runs local preflight first, and
executes only when preflight passes. Its preflight uses ``g.gfql_validate(...)``
defaults, so local bound-graph execution runs schema-aware checks by default.

.. code-block:: python

# Run preflight first; execute only if preflight passes
result = g.gfql(
    "MATCH (c) RETURN c.id AS id LIMIT $n",
    params={"n": 10},
    validate=True,
)

Use ``validate_chain_schema()`` when you specifically want the low-level chain-schema helper.
It is intentionally narrower than ``g.gfql_validate(...)``:

* validates chain operations against currently bound node/edge dataframe columns
* does **not** parse/compile Cypher strings
* does **not** run Let/DAG orchestration validation
* does **not** execute query operators

.. code-block:: python

@@ -169,6 +239,22 @@ Use ``validate_chain_schema()`` to check compatibility without running the query
result = g.gfql(chain.chain)
print(f"Query executed: {len(result._nodes)} nodes")

Execution-time Preflight Toggles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For remote execution, ``g.gfql_remote(..., validate=True)`` runs local query
prevalidation before implicit upload/network execution, so invalid queries fail
before data upload when possible. For Cypher strings, remote prevalidation uses
``strict=False`` by default because the authoritative schema is on the remote dataset.
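
The ordering described above (validate locally, then upload, then dispatch) can be sketched without any network access. This is a minimal illustration under stated assumptions, not the library's implementation: ``FakeGraph``, ``PreflightError``, ``local_preflight``, and ``remote_run`` are hypothetical stand-ins, and the toy check only rejects empty strings.

```python
class PreflightError(ValueError):
    """Hypothetical stand-in for a structured validation error."""


class FakeGraph:
    """Hypothetical stand-in for a graph object; records whether upload ran."""

    def __init__(self):
        self.dataset_id = None
        self.uploads = 0

    def upload(self):
        # A real upload would send data over the network; here we only count calls.
        self.uploads += 1
        self.dataset_id = f"dataset-{self.uploads}"
        return self


def local_preflight(query):
    """Toy check: reject empty queries. A real preflight would parse/compile."""
    if not isinstance(query, str) or not query.strip():
        raise PreflightError("query must be a non-empty string")


def remote_run(g, query, validate=True):
    # 1. Validate locally first, so bad queries never trigger an upload.
    if validate:
        local_preflight(query)
    # 2. Only then resolve (or create) the remote dataset.
    if g.dataset_id is None:
        g.upload()
    # 3. Dispatch; this sketch just returns what would be sent.
    return {"dataset_id": g.dataset_id, "query": query}
```

With this ordering, an invalid query raises before ``upload()`` is ever called, which is the property the reordered ``validate`` step is meant to provide.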

Grounded vs Ungrounded Validation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Schema checks are most useful when local graph tables are bound on ``g``.
If local node/edge tables are missing, GFQL JSON/AST chain validation can only
do structural/predicate checks, and column-existence checks are effectively
ungrounded.
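
The grounded/ungrounded distinction can be illustrated with a small self-contained sketch. It is an assumption-laden toy, not GFQL's validator: ``check_filter_columns`` is a hypothetical helper, and ``columns=None`` models the case where no local table is bound.

```python
def check_filter_columns(filter_dict, columns=None):
    """Return a list of diagnostics for a node/edge filter dict.

    `columns` is the set of bound column names, or None when no table is bound.
    """
    diagnostics = []
    for key in filter_dict:
        # Structural check: always possible, with or without bound data.
        if not isinstance(key, str) or not key:
            diagnostics.append({"kind": "structural", "detail": f"bad key: {key!r}"})
            continue
        # Grounded check: only possible when column names are known.
        if columns is not None and key not in columns:
            diagnostics.append({"kind": "schema", "detail": f"unknown column: {key}"})
    return diagnostics
```

Calling the helper with ``columns={"type", "id"}`` flags ``missing_col`` as an unknown column, while calling it with no ``columns`` argument reports nothing for the same filter, because there is no schema to check against.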

Error Collection
^^^^^^^^^^^^^^^^

@@ -197,4 +283,4 @@ See Also
--------

* :doc:`../spec/language` - Complete language specification
* :doc:`../overview` - GFQL overview
* :doc:`../overview` - GFQL overview
23 changes: 22 additions & 1 deletion docs/source/gfql/validation/llm.rst
@@ -128,6 +128,27 @@ Combined Validation

return {"success": True, "chain": chain}

Direct Preflight For Retry Loops
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For generate-validate-repair loops, you can run ``g.gfql_validate(...)`` and
convert raised exceptions into structured payloads:

.. code-block:: python

from graphistry.compute.exceptions import GFQLValidationError, GFQLSyntaxError

def preflight_payload(g, query):
    try:
        g.gfql_validate(query, collect_all=True)
        return {"ok": True}
    except (GFQLValidationError, GFQLSyntaxError) as exc:
        payload = exc.to_dict()
        return {
            "ok": False,
            "error": payload, # includes code/message + diagnostics context
        }
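
A generate-validate-repair loop built around a payload of this shape might look like the sketch below. Everything here is hypothetical scaffolding: ``repair_loop``, ``toy_preflight``, and ``toy_generate`` are illustrative names, and the toy preflight stands in for a callable with the same ``{"ok": ..., "error": ...}`` return shape as ``preflight_payload``.

```python
def repair_loop(generate, preflight, max_attempts=3):
    """Call generate(feedback) until preflight(query)["ok"] is true or attempts run out."""
    feedback = None
    history = []
    for attempt in range(1, max_attempts + 1):
        query = generate(feedback)
        result = preflight(query)
        history.append({"attempt": attempt, "query": query, "ok": result["ok"]})
        if result["ok"]:
            return {"ok": True, "query": query, "history": history}
        # Feed the structured error back to the generator for the next try.
        feedback = result.get("error")
    return {"ok": False, "query": None, "history": history}


# Toy stand-ins so the sketch runs without graphistry or a model.
def toy_preflight(query):
    if "RETURN" in query:
        return {"ok": True}
    return {"ok": False, "error": {"code": "toy-missing-return", "message": "no RETURN clause"}}


def toy_generate(feedback):
    # First attempt ignores feedback; later attempts "repair" the query.
    return "MATCH (n)" if feedback is None else "MATCH (n) RETURN n"
```

In practice ``preflight`` could be something like ``lambda q: preflight_payload(g, q)``, with ``generate`` wrapping a model call that receives the previous error payload. Capping ``max_attempts`` keeps a persistently failing generator from looping forever.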

Automated Fix Suggestions
-------------------------

@@ -181,4 +202,4 @@ See Also

* :doc:`production` - Production patterns
* :doc:`../spec/language` - Language specification
* :doc:`../spec/cypher_mapping` - Cypher to GFQL mapping
* :doc:`../spec/cypher_mapping` - Cypher to GFQL mapping
7 changes: 6 additions & 1 deletion graphistry/compute/ComputeMixin.py
@@ -10,6 +10,7 @@
from .chain import Chain, chain as chain_base
from .chain_let import chain_let as chain_let_base
from .gfql_unified import gfql as gfql_base
from .gfql_validate import gfql_validate as gfql_validate_base
from .chain_remote import (
chain_remote as chain_remote_base,
chain_remote_shape as chain_remote_shape_base
@@ -508,6 +509,10 @@ def gfql(self, *args, **kwargs):
return gfql_base(self, *args, **kwargs)
gfql.__doc__ = gfql_base.__doc__

def gfql_validate(self, *args, **kwargs):
return gfql_validate_base(self, *args, **kwargs)
gfql_validate.__doc__ = gfql_validate_base.__doc__

def chain_remote(self, *args, **kwargs) -> Plottable:
"""
.. deprecated:: 2.XX.X
@@ -591,7 +596,7 @@ def gfql_remote(

def gfql_remote_shape(
self,
chain: Union[Chain, List[ASTObject], Dict[str, JSONVal]],
chain: Union[Chain, List[ASTObject], ASTLet, Dict[str, JSONVal], str],
api_token: Optional[str] = None,
dataset_id: Optional[str] = None,
format: Optional[FormatType] = None,
36 changes: 22 additions & 14 deletions graphistry/compute/chain_remote.py
@@ -16,6 +16,7 @@
from graphistry.compute.chain import Chain
from graphistry.compute.gfql.cypher.lowering import compile_cypher_query
from graphistry.compute.gfql.cypher.parser import parse_cypher
from graphistry.compute.gfql_validate import gfql_validate as gfql_preflight_validate
from graphistry.io.metadata import deserialize_plottable_metadata
from graphistry.models.compute.chain_remote import OutputTypeGraph, FormatType, output_types_graph
from graphistry.utils.json import JSONVal
@@ -136,18 +137,8 @@ def chain_remote_generic(
self._pygraphistry.refresh()
api_token = self.session.api_token

if not dataset_id:
dataset_id = self._dataset_id

if not dataset_id:
self = self.upload(validate=validate)
dataset_id = self._dataset_id

if output_type not in output_types_graph:
raise ValueError(f"Unknown output_type, expected one of {output_types_graph}, got: {output_type}")

if not dataset_id:
raise ValueError("Missing dataset_id; either pass in, or call on g2=g1.plot(render='g') in api=3 mode ahead of time")

# Resolve engine: auto -> pandas/cudf based on graph DataFrame type
engine_resolved = resolve_engine(engine, self)
@@ -201,8 +192,25 @@ def chain_remote_generic(
else:
raise TypeError(f"gfql_remote() query must be Chain, List, ASTLet, Dict, or str. Got {type(chain)}")

if validate and not is_let:
Chain.from_json(chain_json)
if validate:
gfql_preflight_validate(
self,
chain,
params=params,
strict=False,
collect_all=False,
schema=False,
)

if not dataset_id:
dataset_id = self._dataset_id

if not dataset_id:
self = self.upload(validate=validate)
dataset_id = self._dataset_id

if not dataset_id:
raise ValueError("Missing dataset_id; either pass in, or call on g2=g1.plot(render='g') in api=3 mode ahead of time")

# --- Build request body (dual-field for backward compat) ---
if is_let:
@@ -504,8 +512,8 @@ def chain_remote(

Uses the latest bound `_dataset_id`, and uploads current dataset if not already bound. Note that rebinding calls of `edges()` and `nodes()` reset the `_dataset_id` binding.

:param chain: GFQL chain query as a Python object or in serialized JSON format
:type chain: Union[Chain, List[ASTObject], Dict[str, JSONVal]]
:param chain: GFQL query as a Python object, serialized GFQL JSON, or Cypher string
:type chain: Union[Chain, List[ASTObject], Dict[str, JSONVal], ASTLet, str]

:param api_token: Optional JWT token. If not provided, refreshes JWT and uses that.
:type api_token: Optional[str]