
Signup: members sometimes never reach Stripe — non-atomic /signup/submit + no recovery path #283

@rubin110

Symptom

On production we occasionally see new members complete the signup form but never land on the Stripe pricing table. They end up "signed up" (member row exists, status pending) but with no Stripe subscription, and trying to sign up again with the same email bounces them to B2C SSO login — where they have no account.

Why it happens

POST /signup/submit in code/DHMemberPortal/app.py#L149-L275 executes 10 sequential HTTP calls to DHService inside a single try/except:

  1. get_access_token (POST /token)
  2. get_member_id (GET /v1/member/id)
  3. add_member (POST /v1/member/identity) ← creates the row
  4. update_member_connections
  5. update_member_status
  6. update_member_forms
  7. update_member_notes
  8. update_member_access
  9. update_member_authorizations
  10. update_member_extras

Each has a 10 s timeout. The whole block is wrapped in a catch-all except Exception that flashes 'Error creating new member' and returns redirect(url_for('signup_start')).

Once step 3 commits, any transient failure in steps 4–10 (DHService 5xx, slow JSONB write, momentary DB hiccup, connection reset, request > 10 s) raises into the catch-all and the user is bounced back to the email-entry page. They never see Stripe. The member row stays in the DB with partial scaffolding (some JSONB columns initialized, others null).

Why the retry doesn't recover them

/signup/check-email at app.py#L107-L110 sees the just-created row and does redirect(url_for('login')). In production /login is Azure B2C SSO; the user has no B2C account yet (B2C provisioning happens later in the lifecycle), so they're stuck. There's no /signup/resume or "continue payment" entry point.

From the outside this looks like "they signed up but Stripe didn't open." From the DB side it's "we have orphan pending members with no stripe_product_id."

Adjacent failure mode (less common, same outcome)

add_update_identity in code/DHService/db.py#L505-L586 always returns HTTP 200 — even on internal INSERT failure it returns {"member_id": null, "message": "<error>"}. response.raise_for_status() in dhservices.py#L192-L197 doesn't catch this. Then step 4 sends X-Member-ID: "None" → FastAPI 422 → same bounce. Rare, but possible under DB pressure.
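
A minimal sketch of a caller-side guard for this case, assuming the response body has the {"member_id": null, "message": "<error>"} shape described above. The helper name require_member_id is hypothetical and is not part of dhservices.py; it only illustrates inspecting the body after raise_for_status() has passed.

```python
def require_member_id(payload: dict) -> str:
    """Return the member_id from a DHService response body, or raise.

    Hypothetical helper: raise_for_status() passes on HTTP 200, so the
    body has to be checked separately for a null member_id.
    """
    member_id = payload.get("member_id")
    # Reject null, empty, and the stringified "None" that would otherwise
    # end up in the X-Member-ID header of the next call.
    if member_id is None or str(member_id).strip() in ("", "None"):
        message = payload.get("message", "no message")
        raise RuntimeError(f"DHService returned no member_id: {message}")
    return str(member_id)
```

Failing here keeps the error attributable to add_member instead of surfacing later as a 422 on step 4.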

Reproduction (dev)

I walked the happy path on dev — Stripe pricing table renders correctly, both tiers ($65 / $95) load, CSP is fine for js.stripe.com and merchant-ui-api.stripe.com. The bug is purely server-side: any DHService transient in steps 4–10 trips the bounce.

Fix options

A. Wrap each secondary call in its own try/except (bandaid, recommended)

~15 LOC at code/DHMemberPortal/app.py#L223-L275. Once add_member succeeds, redirect to /signup/payment no matter what happens with steps 4–10. Log per-step failures; admins backfill blanks. ST2DH will fill in stripe_product_id on the status row when payment lands, so even a failed step 5 is self-healing. No schema/API changes. Closes the user-facing failure for everything except add_member itself blowing up (which is rare and unrecoverable anyway).

B. Add a /signup/resume escape hatch

If /signup/check-email finds an existing member who is pending and has no stripe_product_id/stripe_subscription_id, redirect them to /signup/payment (with customer-email prefilled) instead of /login. Recovers the existing orphan members in prod and any new ones (A) doesn't catch. ~30 LOC.
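
The routing decision in (B) could be sketched roughly as below. This assumes the member's status record is available as a dict and that pending-ness lives under a "state" key; that key name and the "signup_form" endpoint name are assumptions for illustration, while signup_payment and login are the endpoint names used elsewhere in this issue.

```python
from typing import Optional

STRIPE_KEYS = ("stripe_product_id", "stripe_subscription_id")


def next_signup_route(status: Optional[dict]) -> str:
    """Pick the endpoint name /signup/check-email should redirect to.

    `status` is the member's status record as a dict, or None when no
    member exists for the email. The "state" key is an assumed field name.
    """
    if status is None:
        return "signup_form"      # hypothetical: no member yet, continue signup
    is_pending = status.get("state") == "pending"
    has_stripe = any(status.get(key) for key in STRIPE_KEYS)
    if is_pending and not has_stripe:
        return "signup_payment"   # orphan: resume at the Stripe pricing table
    return "login"                # established member: existing behaviour
```

Keeping the decision in a pure function like this would make the orphan rule easy to unit test separately from the Flask route.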

C. Make add_update_identity (and friends) return HTTP 5xx on internal failure

Today they swallow errors into {"member_id": null} with HTTP 200, which defeats raise_for_status(). This is a broader cleanup across DHService — touches every add_update_* function and every caller that assumes success on 200. Right thing to do, but bigger than a bandaid.
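
As a rough, framework-agnostic illustration of (C), the mapping from an add_update_* result to an HTTP status might look like the following. The function name identity_response and the choice of a plain 500 are assumptions; the actual DHService handlers may need a different structure.

```python
from typing import Tuple


def identity_response(result: dict) -> Tuple[int, dict]:
    """Map an add_update_* result dict to (HTTP status, response body).

    Hypothetical sketch: a null member_id is treated as an internal
    failure and reported as 500 instead of being returned with 200.
    """
    if result.get("member_id") is None:
        return 500, {"detail": result.get("message") or "internal error"}
    return 200, result
```

With a non-2xx status, the existing raise_for_status() call on the portal side would raise without any extra body inspection.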

D. Batch the seven secondary writes into one transactional DHService endpoint

POST /v1/member/signup_bulk — single round-trip, single commit. Cleanest long-term, but couples to the planned signup-flow redesign.
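
To illustrate the single-commit idea behind (D), here is a small sketch that uses the standard-library sqlite3 module as a stand-in for the real database. The table layout (one TEXT column per section on a member table) and the function names are assumptions for the demo, not the DHService schema.

```python
import json
import sqlite3

SECTIONS = ("connections", "status", "forms", "notes",
            "access", "authorizations", "extras")


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with one member whose sections are NULL."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{s} TEXT" for s in SECTIONS)
    conn.execute(f"CREATE TABLE member (member_id INTEGER PRIMARY KEY, {columns})")
    conn.execute("INSERT INTO member (member_id) VALUES (1)")
    conn.commit()
    return conn


def signup_bulk(conn: sqlite3.Connection, member_id: int, payload: dict) -> None:
    """Write all secondary sections in one transaction (all or nothing)."""
    missing = [s for s in SECTIONS if s not in payload]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    with conn:  # commits on success, rolls back if anything inside raises
        for section in SECTIONS:
            # Column names come from the fixed SECTIONS tuple, never from input.
            conn.execute(
                f"UPDATE member SET {section} = ? WHERE member_id = ?",
                (json.dumps(payload[section]), member_id),
            )
```

If any section fails partway through, the earlier updates are rolled back, so a member is never left with half of the scaffolding written.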

E. Increase the 10 s timeout

Only helps for the slow-DHService case. Doesn't help 5xx, connection drops, or step 3 returning null. Weak on its own; could be paired with (A).

Recommendation

Ship (A) now as the bandaid — smallest possible change, surgical, no schema/API impact, addresses 95%+ of the failure surface. Optionally pair with (B) to also recover the orphan members already in production.

(C) and (D) can roll into the larger signup-flow redesign.

Sketch for (A)

try:
    access_token = dhservices.get_access_token(...)
    member_id = dhservices.get_member_id(access_token, email).get("member_id")
    if member_id is not None:
        flash('A member with this email already exists', 'error')
        return redirect(url_for('signup_start'))
    member_id = dhservices.add_member(access_token, identity_data).get("member_id")
    if not member_id:
        raise RuntimeError("DHService add_member returned no member_id")
    logger.info(f"Created new member with ID: {member_id}")
except Exception as e:
    logger.error(f"Error creating new member: {str(e)}")
    flash('Error creating new member', 'error')
    return redirect(url_for('signup_start'))

# Best-effort scaffolding: never block the Stripe redirect on these.
for label, fn, args in [
    ("connections",    dhservices.update_member_connections,    (access_token, member_id, connections_data)),
    ("status",         dhservices.update_member_status,         (access_token, member_id, status_data)),
    ("forms",          dhservices.update_member_forms,          (access_token, member_id, forms_data)),
    ("notes",          dhservices.update_member_notes,          (access_token, member_id, notes_data)),
    ("access",         dhservices.update_member_access,         (access_token, member_id, access_data)),
    ("authorizations", dhservices.update_member_authorizations, (access_token, member_id, authorizations_data)),
    ("extras",         dhservices.update_member_extras,         (access_token, member_id, extras_data)),
]:
    try:
        fn(*args)
        logger.info(f"Signup member {member_id}: {label} initialized")
    except Exception as e:
        logger.error(f"Signup member {member_id}: {label} init failed (continuing): {e}")

flash('Sign up successful! Please complete payment.', 'success')
return redirect(url_for('signup_payment', email=email))
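
The best-effort loop in the sketch above could also be pulled into a small standalone helper so it can be tested without Flask or DHService. The version below is one possible shape, not the proposed patch: the name run_best_effort and the returned list of failed labels are additions for illustration (the list could feed the admin backfill mentioned in option A).

```python
import logging
from typing import Any, Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)

# (label, callable, positional args), matching the tuples in the sketch above.
Step = Tuple[str, Callable[..., Any], tuple]


def run_best_effort(member_id: str, steps: Iterable[Step]) -> List[str]:
    """Run every step, log failures, and return the labels that failed."""
    failed: List[str] = []
    for label, fn, args in steps:
        try:
            fn(*args)
            logger.info("Signup member %s: %s initialized", member_id, label)
        except Exception:
            # logger.exception records the traceback; the loop keeps going.
            logger.exception(
                "Signup member %s: %s init failed (continuing)", member_id, label
            )
            failed.append(label)
    return failed
```

A caller would redirect to payment regardless of the result and could log or persist the returned labels for later repair.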

Notes

  • A full signup-flow redesign is already planned; this is explicitly a bandaid to stop the bleeding in the meantime.
  • Recommend running a one-off SQL query in prod to find orphan pending members with no stripe_* keys in their status JSONB. Those are the existing victims, who could be reached via option (B) or manual outreach.

Labels

bug (Something isn't working), priority: high (High priority)
