1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -44,6 +44,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **GFQL / IR refactor**: Consolidated the `("input", "left", "right", "subquery")` child-slot tuple that was duplicated across 5 sites (IR verifier, physical planner, two rewrite passes, one test helper) into a single `CHILD_SLOTS` constant and `iter_children` helper in `graphistry/compute/gfql/ir/logical_plan.py`. Adds 8 regression tests including: identity-preservation for `UnnestApply` and `PredicatePushdownPass` when no descendant is rewritten (the `rewritten_child is not child` guard is load-bearing for tier-2 fixed-point convergence), asymmetric identity preservation across `Join` when only one branch is rewritten, and a reflective `typing.get_type_hints` check that any future `LogicalPlan` subclass adding a new child slot must update `CHILD_SLOTS` (#1196, #1199).

### Fixed
- **SSO / site-wide login**: `ArrowUploader.sso_get_token` no longer raises when the server's JWT response omits (or nulls) the `active_organization` field, and resolves silent caller-intent overrides. Restructured into a 4-layer flow: (1) server-bound slug — used when present, but a caller-supplied `org_name` that does NOT match the server slug now raises with an actionable message (symmetric with the username/password `_finalize_login` mismatch behavior); (2) caller-supplied + server silent — preserve `self.org_name` (set in `__init__` from `register(org_name=...)` or `client_session.org_name`), skip `_switch_org`, log at WARNING because the asymmetric outcome (caller asked, server didn't bind) is operationally interesting and lazy-validated by subsequent requests; (3) caller absent + server silent — try a JWT-derived personal-org fallback via a new `_personal_org_from_jwt(token)` helper that decodes the JWT payload (no signature verification — the JWT was just received from the authenticated `/api/v2/o/sso/oidc/jwt/{state}/` endpoint in this same exchange; server re-validates the token signature on every subsequent request) and returns `payload.username` for the first-login UX path; (4) caller absent + server silent + no JWT username — site-wide login completes with no org binding (info log). Backwards compatible with both pre-#3002 servers (which omit `active_organization`) and post-#3002 servers (which emit it). The previously-passing `test_sso_get_token_missing_org_raises` regression-pinning test was replaced with the layer-and-shape coverage matrix in `tests/test_arrow_uploader.py` (10 integration cases + 11 unit cases for the `_personal_org_from_jwt` helper). Companion server-side fix: graphistry/graphistry#3002.
- **GFQL / comparison predicates**: Mixed-type scalar comparisons in Cypher `WHERE` execution (`>`, `<`, `>=`, `<=`) now preserve null-safe filter semantics across pandas and cuDF backends instead of failing whole-series evaluation on incomparable rows. Comparable rows keep backend-native ordering; incomparable/null rows evaluate non-matching (`False`) (#1219, #1223).
- **GFQL / predicate pushdown**: Fixed a silent `\b` regex bug in `_refs_for_segment` (`graphistry/compute/gfql/passes/predicate_pushdown.py`). The rf-string `rf"\\b..."` produced a literal backslash-b sequence, not a word-boundary assertion, so per-conjunct alias detection never matched and always fell back to the full original reference set — widening reference sets for every split conjunct and preventing some safe-pushable conjuncts from being recognized. Tests passed because the fallback is a superset. Also consolidated two duplicate "split WHERE body on top-level AND" implementations (one in `parser.py`, one in `predicate_pushdown.py`) into a shared helper `graphistry/compute/gfql/expr_split.py::split_top_level_and` with strictly quote/bracket/paren/backtick-aware splitting. Adds 20 direct unit tests for the shared helper and one regression test locking the `\b` fix (#1195, #1198).
- **GFQL / Cypher binder**: Replaced fragile regex-based WHERE label narrowing fallback in `_apply_where_label_narrowing` with AST-derived narrowing. `generic_where_clause` now lifts AND-joined bare label predicates (`WHERE n:Admin AND n:Active`) to structured `WhereClause.predicates` using the existing quote/bracket/paren/backtick-aware `_split_top_level_and_terms` helper; string-literal false-matches (e.g. `WHERE n.name = 'n:Admin'` incorrectly narrowing alias `n`) are closed by `fullmatch` anchoring. Removes `_WHERE_LABEL_RE` and `_WHERE_NON_CONJUNCTIVE_RE` from `binder.py`. Adds 10 targeted tests covering single/double/triple AND, multi-alias, multi-label-per-alias, lowercase `and`, XOR/OR/NOT conservative non-narrowing, mixed label+property all-or-nothing, and string-literal false-positive guards (#1125, #1193).
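The `\b` entry above is easy to reproduce in isolation. The following is a minimal sketch, not the code from `predicate_pushdown.py`: the alias `n`, the sample segment, and the exact pattern shape are assumptions chosen for illustration. It shows that a doubled backslash inside an rf-string reaches the regex engine as an escaped literal backslash followed by `b`, while a single backslash gives the intended word-boundary assertion.

```python
import re

# Hypothetical alias and WHERE segment, chosen only for illustration.
alias = "n"
segment = "n.age > 30 AND m.name = 'x'"

# Doubled backslash in a raw f-string: the pattern text is \\bn\\b, which the
# regex engine reads as "literal backslash, then b", not as a word boundary.
buggy = re.compile(rf"\\b{re.escape(alias)}\\b")

# Single backslash in a raw f-string: the pattern text is \bn\b, a real
# word-boundary assertion on both sides of the alias.
fixed = re.compile(rf"\b{re.escape(alias)}\b")

buggy_hit = buggy.search(segment)           # no match: segment has no backslashes
fixed_hit = fixed.search(segment)           # matches the standalone alias "n"
literal_hit = buggy.search(r"x \bn\b y")    # only matches literal backslash text

print(buggy_hit is None, fixed_hit is not None, literal_hit is not None)
```

Because the buggy pattern never matches ordinary query text, any caller that falls back to a superset on "no match" would keep passing tests, which is consistent with the changelog's description of why the bug stayed silent.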
94 changes: 85 additions & 9 deletions graphistry/arrow_uploader.py
@@ -1,6 +1,6 @@
from typing import List, Optional, Dict, Any

-import io, pyarrow as pa, requests, sys
+import base64, io, json, pyarrow as pa, requests, sys

from graphistry.privacy import Mode, Privacy, ModeAction
from graphistry.otel import inject_trace_headers
@@ -21,6 +21,41 @@
from graphistry.models.types import ValidationParam
logger = setup_logger(__name__)


def _personal_org_from_jwt(token: str) -> Optional[str]:
"""Extract the username claim from a JWT payload, used as a personal-org slug.

Trust chain: callers pass a JWT just received from the authenticated
/api/v2/o/sso/oidc/jwt/{state}/ endpoint in the same exchange, so we decode
the payload without local signature verification. The server re-validates
the token signature on every subsequent request — an incorrect username
can at worst route the session to an org the user isn't a member of (server
rejects), not grant unauthorized access. Do NOT reuse this helper for
tokens received from outside that trust chain.

Server contract: ``personal_org.slug == jwt_payload.username`` for users
auto-provisioned via SSO.

Returns None on any decode/parse failure or missing/non-string username.
"""
try:
parts = token.split('.')
if len(parts) < 2:
return None
# base64 padding: append only the missing '=' chars (0, 1, or 2 for valid
# input); never extra. Inputs whose stripped length is 1 mod 4 are invalid
# b64; the decode call below will raise and the except clause below
# returns None.
segment = parts[1] + '=' * (-len(parts[1]) % 4)
payload = json.loads(base64.urlsafe_b64decode(segment))
if not isinstance(payload, dict):
return None
username = payload.get('username')
return username if isinstance(username, str) and username else None
except Exception as exc:
logger.debug("@_personal_org_from_jwt: failed to extract username: %s", exc)
return None


class ArrowUploader:

def __init__(
@@ -428,15 +463,56 @@ def sso_get_token(self, state):
self.token = token_value

active_org = data.get('active_organization')
-if not active_org or not active_org.get('slug'):
-raise Exception(
-"SSO response missing active organization; see graphistry/graphistry#2933"
slug = active_org.get('slug') if isinstance(active_org, dict) else None

if slug:
# Layer 1: server-bound active_organization. Caller's intent
# (self.org_name from register(org_name=...) or session) must
# MATCH or be ABSENT. Symmetric with _finalize_login's strict
# check for username/password (line ~309-316).
if self.org_name and self.org_name != slug:
raise Exception(
f"SSO returned active_organization={slug!r}, but caller "
f"requested org_name={self.org_name!r}. To use the "
f"server-bound org, omit org_name from register(). To "
f"require {self.org_name!r}, configure per-org SSO "
f"routing for it server-side."
)
logger.debug("@ArrowUploader.sso_get_token, org_name: %s", slug)
self.org_name = slug
self._switch_org(slug, token_value)
elif self.org_name:
# Layer 2: caller-supplied, server silent. Preserve caller
# intent — subsequent authenticated requests will validate org
# membership lazily. WARNING because the asymmetric outcome
# (caller asked, server didn't bind) is operationally
# interesting and worth investigating per-org SSO config.
logger.warning(
"SSO did not bind active_organization but caller requested "
"org_name=%s; preserving caller value. Subsequent requests "
"will validate. Verify server-side per-org SSO config if "
"unintended.",
self.org_name
)

-slug = active_org['slug']
-logger.debug("@ArrowUploader.sso_get_token, org_name: %s", slug)
-self.org_name = slug
-self._switch_org(slug, token_value or self.token)
else:
# Layer 3: caller didn't ask, server didn't bind. Try
# JWT-derived personal-org fallback for first-login UX. See
# _personal_org_from_jwt for the trust-chain rationale.
fallback = _personal_org_from_jwt(token_value)
if fallback:
logger.info(
"SSO did not bind active_organization; falling back to "
"JWT-derived personal org=%s", fallback
)
self.org_name = fallback
self._switch_org(fallback, token_value)
else:
# Layer 4: nothing claimed, nothing bound, nothing inferable.
logger.info(
"SSO did not bind active_organization and no JWT "
"username present; site-wide SSO login completes with "
"no org binding."
)

except Exception as e:
logger.error('Unexpected SSO authentication error: %s', out, exc_info=True)
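The four-layer resolution and the JWT payload decode in this diff can be exercised without a server. The sketch below is a standalone approximation, not the library's API: `make_unsigned_token`, `username_from_token`, and `resolve_org` are hypothetical names introduced here, the token carries a dummy signature, and the real method also calls `_switch_org` in layers 1 and 3, which is omitted.

```python
import base64
import json
from typing import Optional, Tuple


def make_unsigned_token(payload: dict) -> str:
    """Hypothetical test helper: build a JWT-shaped string with a dummy signature."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        # JWT segments are base64url without '=' padding.
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{b64({'alg': 'none'})}.{b64(payload)}.sig"


def username_from_token(token: str) -> Optional[str]:
    """Standalone restatement of the payload decode shown in the diff."""
    try:
        parts = token.split(".")
        if len(parts) < 2:
            return None
        # Re-add only the '=' characters that were stripped from the segment.
        segment = parts[1] + "=" * (-len(parts[1]) % 4)
        payload = json.loads(base64.urlsafe_b64decode(segment))
        if not isinstance(payload, dict):
            return None
        username = payload.get("username")
        return username if isinstance(username, str) and username else None
    except Exception:
        return None


def resolve_org(server_slug: Optional[str], caller_org: Optional[str],
                token: str) -> Tuple[Optional[str], int]:
    """Return (org_name, layer) following the four layers described in the diff."""
    if server_slug:
        # Layer 1: server-bound slug; a caller value must match or be absent.
        if caller_org and caller_org != server_slug:
            raise ValueError(f"server bound {server_slug!r}, caller asked {caller_org!r}")
        return server_slug, 1
    if caller_org:
        # Layer 2: caller-supplied, server silent; keep the caller's value.
        return caller_org, 2
    fallback = username_from_token(token)
    if fallback:
        # Layer 3: JWT-derived personal org.
        return fallback, 3
    # Layer 4: no org binding at all.
    return None, 4


token = make_unsigned_token({"username": "alice"})
print(resolve_org("acme", None, token))
print(resolve_org(None, "team", token))
print(resolve_org(None, None, token))
print(resolve_org(None, None, make_unsigned_token({})))
```

Keeping the resolution as a pure function of `(server_slug, caller_org, token)` is one way to make each layer testable on its own; whether the actual test matrix in `tests/test_arrow_uploader.py` is organized this way is not shown in the diff.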