Skip to content

fix(sso): tolerate missing active_organization in JWT response (sso_get_token)#1228

Open
lmeyerov wants to merge 3 commits intomasterfrom
fix/sso-site-level-active-organization-fallback
Open

fix(sso): tolerate missing active_organization in JWT response (sso_get_token)#1228
lmeyerov wants to merge 3 commits intomasterfrom
fix/sso-site-level-active-organization-fallback

Conversation

@lmeyerov
Copy link
Copy Markdown
Contributor

@lmeyerov lmeyerov commented Apr 28, 2026

Summary

Restructures ArrowUploader.sso_get_token's org-binding logic into a 4-layer flow that respects caller intent loudly and falls back gracefully when the server omits active_organization. Replaces the strict-raise ("SSO response missing active organization") introduced in 0.45.10 (PR #832 / #850 cluster) AND fixes a pre-existing security-surprising behavior on the present-slug path: previously, when the server returned a slug different from the caller's register(org_name=...) value, the client silently overrode caller intent without warning.

The 4 layers

active_org = data.get('active_organization')
slug = active_org.get('slug') if isinstance(active_org, dict) else None

if slug:
    # Layer 1: server-bound. Caller intent must MATCH or be ABSENT.
    if self.org_name and self.org_name != slug:
        raise Exception(...)            # symmetric with _finalize_login:309-316
    self.org_name = slug
    self._switch_org(slug, token_value)

elif self.org_name:
    # Layer 2: caller-supplied, server silent. Preserve, no switch.
    logger.warning(...)                  # operationally interesting

else:
    # Layer 3: caller absent, server silent. Try JWT-derived personal org.
    fallback = _personal_org_from_jwt(token_value)
    if fallback:
        self.org_name = fallback
        self._switch_org(fallback, token_value)
    else:
        # Layer 4: nothing claimed, nothing bound. Site-wide login completes.
        logger.info(...)

Behavior matrix

Caller org_name Server slug JWT username Outcome
"acme" "acme" (any) use acme, switch (Layer 1 match)
None "X" (any) use X, switch (Layer 1)
None None "u" use u, switch (Layer 3 fallback)
None None None no binding, INFO log (Layer 4)
"acme" None (any) preserve acme, WARNING, no switch (Layer 2)
"acme" "different" (any) raise with actionable message (Layer 1 mismatch)

Plus defensive routing for active_organization: {}, {"slug": ""}, non-dict shapes — all route through the falsy-slug path to Layers 2/3/4.

Why each layer's logging level

  • Layer 1 success / Layer 1 mismatch: debug for routine path / no log on raise (exception trace is enough)
  • Layer 2 (warning): caller asked for an org and the server didn't bind it. This asymmetric outcome is operationally interesting — operators investigating "register(org_name=acme) succeeded but ops in acme fail" should see this in logs. Subsequent authenticated requests validate org membership lazily.
  • Layer 3 (info): caller didn't ask, fallback fires for first-login UX. Normal, low-noise.
  • Layer 4 (info): caller didn't ask, no fallback possible. Normal site-wide-SSO completion.

Why Layer 2 is asymmetrically lenient vs. _finalize_login

The username/password _finalize_login flow (arrow_uploader.py:309-316) RAISES when caller passed org_name and server returned no active_organization. SSO can't be that strict because:

  • The /api/v2/o/sso/oidc/jwt/{state}/ exchange does NOT carry org_name in the request — server has no opportunity to say "considered + denied". A missing active_organization doesn't mean "we considered acme and refused"; it might mean "site-wide-SSO server, this user has no per-org binding yet".
  • A strict raise here would re-introduce the original regression for any caller who passes org_name.

Layer 2's lenient-but-loud (preserve + WARNING + lazy-validation by next request) is the justified asymmetry.

_personal_org_from_jwt helper

New module-level helper that decodes the JWT payload (no signature verification) to extract the username claim, used as a personal-org slug.

Trust chain (documented in the helper's docstring): the JWT was just received from the authenticated /api/v2/o/sso/oidc/jwt/{state}/ endpoint in the same exchange. The server re-validates the token signature on every subsequent authenticated request, so an incorrect username here can at worst route the session to an org the user isn't a member of (server rejects), not grant unauthorized access. The helper warns against reuse outside this trust chain.

Server contract: personal_org.slug == jwt_payload.username for users auto-provisioned via SSO.

Defensive shape handling: returns None on missing/short token, malformed base64, non-JSON payload, non-dict payload, missing username field, non-string username, or empty-string username. Bare except Exception catches all decode/parse failures and emits a logger.debug for operator visibility.

Other read-site audit

active_organization has only two production read sites in this repo:

  • arrow_uploader.py:307 (_finalize_login, used by login() / pkey_login()) — already tolerant via .get('active_organization', {}). Unchanged.
  • arrow_uploader.py:465 (sso_get_token) — was strict raise. This PR's change.

Backwards compatibility

Server returns Pre-PR Post-PR
active_organization: {"slug": "x"} use x, switch (unchanged) when caller absent or matches; raises actionably on mismatch
active_organization: None raise route through Layers 2/3/4
active_organization absent raise route through Layers 2/3/4
active_organization: {"slug": ""} raise route through Layers 2/3/4 (slug is falsy)
active_organization: "non-dict" AttributeError route through Layers 2/3/4 (isinstance guard)

Composes correctly with both pre-graphistry/graphistry#3002 servers (which omit active_organization on site-wide deployments) and post-#3002 servers (which always emit it).

Test coverage

10 integration tests in TestArrowUploader_Comms exercising every cell of the behavior matrix, plus 11 unit tests in a new TestPersonalOrgFromJwt class covering _personal_org_from_jwt's decode/parse failure modes:

Layer Test
L1 (match / caller absent) test_sso_login_get_sso_token_ok
L1 mismatch raises test_sso_get_token_layer1_caller_server_mismatch_raises
L1 caller-server match test_sso_get_token_layer1_caller_server_match_proceeds
L2 preserve (absent) test_sso_get_token_missing_active_organization_preserves_caller_org
L2 (null active_organization) test_sso_get_token_null_active_organization_falls_back
L2 (empty slug) test_sso_get_token_empty_slug_active_organization_falls_back
L2 (non-dict) test_sso_get_token_non_dict_active_organization_falls_back
L2 precedence over L3 test_sso_get_token_layer2_takes_precedence_over_layer3
L3 JWT fallback test_sso_get_token_layer3_jwt_username_fallback
L4 no-binding test_sso_get_token_missing_active_organization_no_caller_org
_personal_org_from_jwt × 11 unit cases TestPersonalOrgFromJwt::* (valid input + 9 distinct failure modes + load-bearing padding edge case)

The pre-existing test_sso_get_token_missing_org_raises (which authored the strict-raise contract) was deliberately removed — it pinned the regression in place.

Companion server-side fix

graphistry/graphistry#3002 (already merged) ensures servers emit active_organization for site-wide-SSO signups going forward. This client-side PR is required regardless: older servers that predate that fix continue to omit the field, and the new layered design handles the broader matrix (mismatch detection, caller-absent first-login UX, defensive shape guards) on both server generations.

Test plan

  • 10 integration test cases covering the full behavior matrix
  • 11 unit test cases covering _personal_org_from_jwt decode/parse failure modes
  • pytest graphistry/tests/test_arrow_uploader.py graphistry/tests/test_pygraphistry.py graphistry/tests/test_client_session.py -q → 71 passed
  • ./bin/ruff.sh clean on changed files
  • ./bin/mypy.sh graphistry/arrow_uploader.py clean
  • CI green
  • Live SSO repro against a site-wide-SSO test server (operator-side, requires SSO test environment)

🤖 Generated with Claude Code

lmeyerov and others added 3 commits April 28, 2026 00:12
…ization

Restore 0.45.8 behavior at arrow_uploader.py sso_get_token: when the
server's JWT response omits (or nulls) active_organization, preserve
self.org_name (already populated in __init__ from register(org_name=...)
or client_session.org_name) and skip _switch_org instead of raising.

This unblocks customers running against site-wide-SSO servers (no
per-org binding) — which legitimately omit the field — from upgrading
past 0.45.8. The strict raise was introduced in 0.45.10 (PR #832 /
#850 cluster).

Backwards compatible: when active_organization IS present, behavior is
identical to current master (use the supplied slug, call _switch_org).
Replaces test_sso_get_token_missing_org_raises (which pinned the bug)
with three fallback tests covering: missing field + no caller
org_name, missing field + caller org_name preserved, null
active_organization.

Companion server-side fix: graphistry/graphistry#3002 (already
merged); this client-side fix is required for older servers and for
defense-in-depth on newer ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wave-2 review follow-up on top of b530cac. Adds two new tests and
strengthens two existing ones with stdlib assertLogs-based observability
assertions:

- test_sso_get_token_non_dict_active_organization_falls_back: pins the
  defensive isinstance(active_org, dict) guard. If a future refactor
  drops the type-check, an unexpected non-dict server payload would
  crash on .get('slug') instead of falling back gracefully.

- test_sso_get_token_empty_slug_active_organization_falls_back: pins
  the `if slug:` falsy guard. Without it, an empty-string slug would
  overwrite self.org_name with "", silently corrupting the caller's
  org_name.

- test_sso_get_token_missing_active_organization_no_caller_org and
  *_preserves_caller_org now assert via self.assertLogs(...) that the
  fallback INFO log fires and (where applicable) carries the caller-
  supplied org_name as a structured arg. Robust to message-prose
  changes; uses stdlib assertLogs because the test class is a
  unittest.TestCase subclass and pytest's caplog fixture does not
  compose with TestCase methods.

5 SSO-fallback tests + 1 happy-path test all green; broader SSO-adjacent
suite (test_arrow_uploader / test_pygraphistry / test_client_session)
56 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…derived fallback for first-login UX

Restructures sso_get_token's org-binding logic into a 4-layer flow that
respects caller intent loudly and falls back gracefully when the server
omits active_organization:

Layer 1 — server-bound slug present
  - Caller's self.org_name must MATCH the slug or be ABSENT.
  - Mismatch raises with an actionable error message (symmetric with
    _finalize_login's strict mismatch raise on the username/password
    path; previously the SSO path silently overrode caller intent).

Layer 2 — server silent, caller supplied org_name
  - Preserve self.org_name; skip _switch_org. Subsequent authenticated
    requests will validate org membership lazily.
  - Logged at WARNING (was INFO) because the "caller asked, server
    didn't bind" outcome is operationally interesting — operators
    should investigate per-org SSO config.

Layer 3 — server silent, caller silent
  - New: try JWT-derived personal-org slug (the helper
    _personal_org_from_jwt decodes the JWT payload without local
    signature verification; trust assumption documented in the helper's
    docstring — token was just received from the authenticated
    /sso/oidc/jwt/{state}/ endpoint in the same exchange; server
    re-validates signature on every subsequent request).
  - Server contract: personal_org.slug == jwt_payload.username for
    users auto-provisioned via SSO.

Layer 4 — server silent, caller silent, no JWT username
  - Site-wide login completes with no org binding. INFO log.

Adds 5 new integration tests (Layer-1 mismatch, Layer-1 match, Layer-3
JWT fallback, Layer-2 precedence over Layer-3) and a TestPersonalOrgFromJwt
class with 11 unit cases exercising the helper's decode/parse failure
modes in isolation.

Backwards compatible with both pre-#3002 servers (which omit
active_organization for site-wide deployments) and post-#3002 servers
(which emit it). Companion server-side fix: graphistry/graphistry#3002.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant