Symptom
On production we occasionally see new members complete the signup form but never land on the Stripe pricing table. They end up "signed up" (member row exists, status pending) but with no Stripe subscription, and trying to sign up again with the same email bounces them to B2C SSO login — where they have no account.
Why it happens
POST /signup/submit in code/DHMemberPortal/app.py#L149-L275 executes 10 sequential HTTP calls to DHService inside a single try/except:
1. get_access_token (POST /token)
2. get_member_id (GET /v1/member/id)
3. add_member (POST /v1/member/identity) ← creates the row
4. update_member_connections
5. update_member_status
6. update_member_forms
7. update_member_notes
8. update_member_access
9. update_member_authorizations
10. update_member_extras
Each has a 10 s timeout. The whole block is wrapped in a catch-all except Exception that flashes 'Error creating new member' and returns redirect(url_for('signup_start')).
Once step 3 commits, any transient failure in steps 4–10 (DHService 5xx, slow JSONB write, momentary DB hiccup, connection reset, request > 10 s) raises into the catch-all and the user is bounced back to the email-entry page. They never see Stripe. The member row stays in the DB with partial scaffolding (some JSONB columns initialized, others null).
Why the retry doesn't recover them
/signup/check-email at app.py#L107-L110 sees the just-created row and does redirect(url_for('login')). In production /login is Azure B2C SSO; the user has no B2C account yet (B2C provisioning happens later in the lifecycle), so they're stuck. There's no /signup/resume or "continue payment" entry point.
From the outside this looks like "they signed up but Stripe didn't open." From the DB side it's "we have orphan pending members with no stripe_product_id."
Adjacent failure mode (less common, same outcome)
add_update_identity in code/DHService/db.py#L505-L586 always returns HTTP 200 — even on internal INSERT failure it returns {"member_id": null, "message": "<error>"}. response.raise_for_status() in dhservices.py#L192-L197 doesn't catch this. Then step 4 sends X-Member-ID: "None" → FastAPI 422 → same bounce. Rare, but possible under DB pressure.
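A client-side guard for this case could look roughly like the sketch below. It assumes the response body has the {"member_id": ..., "message": ...} shape described above; the helper name require_member_id and the MemberCreationError class are hypothetical and do not exist in dhservices.py today.

```python
from typing import Any, Mapping


class MemberCreationError(RuntimeError):
    """Raised when DHService answers 200 but did not actually create a member."""


def require_member_id(payload: Mapping[str, Any]) -> Any:
    """Return the member_id from an add_member response body, or raise.

    Hypothetical helper: assumes the body looks like
    {"member_id": ..., "message": ...}.
    """
    member_id = payload.get("member_id")
    # Reject None, empty strings, and a stringified "None"/"null" alike,
    # so a bad value never reaches the X-Member-ID header of later calls.
    if member_id is None or str(member_id).strip() in ("", "None", "null"):
        detail = payload.get("message") or "no message supplied"
        raise MemberCreationError(f"add_member returned no member_id: {detail}")
    return member_id
```

Calling something like this right after add_member would turn the silent null into an explicit failure at step 3, instead of a confusing 422 at step 4.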
Reproduction (dev)
I walked the happy path on dev — Stripe pricing table renders correctly, both tiers ($65 / $95) load, CSP is fine for js.stripe.com and merchant-ui-api.stripe.com. The bug is purely server-side: any DHService transient in steps 4–10 trips the bounce.
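The server-side behavior can be modeled without DHService at all. The sketch below is a simplified stand-in for the handler, not the real code: submit_all_or_nothing and make_steps are hypothetical names, and the injected TimeoutError stands for any transient in steps 4–10.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def submit_all_or_nothing(create_member: Callable[[], None],
                          secondary: List[Step]) -> str:
    """Model of the current handler: one try/except around every call."""
    try:
        create_member()
        for _label, step in secondary:
            step()
    except Exception:
        return "signup_start"    # user is bounced back to the email page
    return "signup_payment"      # user reaches the Stripe pricing table


def make_steps(failing: str) -> List[Step]:
    """Build the seven secondary steps; the one named `failing` raises."""
    def ok() -> None:
        return None

    def boom() -> None:
        raise TimeoutError("simulated DHService transient")

    labels = ["connections", "status", "forms", "notes",
              "access", "authorizations", "extras"]
    return [(label, boom if label == failing else ok) for label in labels]


# The member row is "created" first, then the forms step fails.
created: List[str] = []
outcome = submit_all_or_nothing(lambda: created.append("member-row"),
                                make_steps("forms"))
```

In this model the row is left behind while the user is sent back to the start, which matches the orphan state seen in production.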
Fix options
A. Wrap each secondary call in its own try/except — bandaid, recommended
~15 LOC at code/DHMemberPortal/app.py#L223-L275. Once add_member succeeds, redirect to /signup/payment no matter what happens with steps 4–10. Log per-step failures; admins backfill blanks. ST2DH will fill in stripe_product_id on the status row when payment lands, so even a failed step 5 is self-healing. No schema/API changes. Closes the user-facing failure for everything except add_member itself blowing up (which is rare and unrecoverable anyway).
B. Add a /signup/resume escape hatch
If /signup/check-email finds an existing member who is pending and has no stripe_product_id/stripe_subscription_id, redirect them to /signup/payment (with customer-email prefilled) instead of /login. Recovers the existing orphan members in prod and any new ones (A) doesn't catch. ~30 LOC.
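The routing decision for (B) might be isolated in a small pure function, roughly as sketched here. The shape of the member record (a status mapping holding "status" and the stripe_* keys) is an assumption for illustration, not the real schema, and "signup_form" is a hypothetical route name for the normal new-signup step.

```python
from typing import Any, Mapping, Optional

STRIPE_KEYS = ("stripe_product_id", "stripe_subscription_id")


def next_route_for_email(member: Optional[Mapping[str, Any]]) -> str:
    """Pick the route name /signup/check-email should redirect to.

    `member` is None when no row exists for the email. Otherwise it is
    assumed to carry a `status` mapping such as
    {"status": "pending", "stripe_product_id": None}.
    """
    if member is None:
        return "signup_form"        # hypothetical: continue the normal signup
    status = member.get("status") or {}
    is_pending = status.get("status") == "pending"
    has_stripe = any(status.get(key) for key in STRIPE_KEYS)
    if is_pending and not has_stripe:
        return "signup_payment"     # orphan: let them finish paying
    return "login"                  # established member: existing behavior
```

Keeping the decision separate from the Flask view would make it easy to unit-test the orphan case without a running DHService.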
C. Make add_update_identity (and friends) return HTTP 5xx on internal failure
Today they swallow errors into {"member_id": null} with HTTP 200, which defeats raise_for_status(). This is a broader cleanup across DHService — touches every add_update_* function and every caller that assumes success on 200. Right thing to do, but bigger than a bandaid.
D. Batch the seven secondary writes into one transactional DHService endpoint
POST /v1/member/signup_bulk — single round-trip, single commit. Cleanest long-term, but couples to the planned signup-flow redesign.
E. Increase the 10 s timeout
Only helps for the slow-DHService case. Doesn't help 5xx, connection drops, or step 3 returning null. Weak on its own; could be paired with (A).
Recommendation
Ship (A) now as the bandaid — smallest possible change, surgical, no schema/API impact, addresses 95%+ of the failure surface. Optionally pair with (B) to also recover the orphan members already in production.
(C) and (D) can roll into the larger signup-flow redesign.
Sketch for (A)
try:
    access_token = dhservices.get_access_token(...)
    member_id = dhservices.get_member_id(access_token, email).get("member_id")
    if member_id is not None:
        flash('A member with this email already exists', 'error')
        return redirect(url_for('signup_start'))
    member_id = dhservices.add_member(access_token, identity_data).get("member_id")
    if not member_id:
        raise RuntimeError("DHService add_member returned no member_id")
    logger.info(f"Created new member with ID: {member_id}")
except Exception as e:
    logger.error(f"Error creating new member: {str(e)}")
    flash('Error creating new member', 'error')
    return redirect(url_for('signup_start'))

# Best-effort scaffolding: never block the Stripe redirect on these.
for label, fn, args in [
    ("connections", dhservices.update_member_connections, (access_token, member_id, connections_data)),
    ("status", dhservices.update_member_status, (access_token, member_id, status_data)),
    ("forms", dhservices.update_member_forms, (access_token, member_id, forms_data)),
    ("notes", dhservices.update_member_notes, (access_token, member_id, notes_data)),
    ("access", dhservices.update_member_access, (access_token, member_id, access_data)),
    ("authorizations", dhservices.update_member_authorizations, (access_token, member_id, authorizations_data)),
    ("extras", dhservices.update_member_extras, (access_token, member_id, extras_data)),
]:
    try:
        fn(*args)
        logger.info(f"Signup member {member_id}: {label} initialized")
    except Exception as e:
        logger.error(f"Signup member {member_id}: {label} init failed (continuing): {e}")

flash('Sign up successful! Please complete payment.', 'success')
return redirect(url_for('signup_payment', email=email))
Notes
- A full signup-flow redesign is already planned; this is explicitly a bandaid to stop the bleeding in the meantime.
- Recommend a one-off SQL query in prod to find orphan pending members with no stripe_* keys in their status JSONB; those are the existing victims who could be reached via option (B) or manual outreach.