feat(revenue-analytics): Add setup-revenue-analytics command #372

Draft
arthurdedeus wants to merge 14 commits into main from feat/revenue-analytics-setup

@arthurdedeus arthurdedeus commented Apr 3, 2026

In Revenue analytics, we want to standardize the way we connect revenue data to PostHog persons. To do that, we will be prescriptive in having posthog_person_distinct_id prop in Stripe customer metadata so that we are able to automatically create a join.
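
As a rough illustration of the convention described above, here is a minimal sketch of customer-creation parameters that carry the distinct ID. The `AppUser` type, the `buildCustomerParams` helper, and the sample values are hypothetical; only the `metadata` field and the `posthog_person_distinct_id` key come from the description.

```typescript
// Hypothetical shape of the app's own user record (an assumption for illustration).
interface AppUser {
  id: string;
  email: string;
}

// Small subset of the parameters accepted when creating a Stripe customer.
interface CustomerCreateParams {
  email: string;
  metadata: Record<string, string>;
}

// Build the params so the PostHog distinct ID travels with the Stripe customer.
// `distinctId` should be the same value the app already passes to posthog.identify().
function buildCustomerParams(user: AppUser, distinctId: string): CustomerCreateParams {
  return {
    email: user.email,
    metadata: { posthog_person_distinct_id: distinctId },
  };
}

const exampleParams = buildCustomerParams(
  { id: "user_123", email: "jane@example.com" },
  "user_123",
);
console.log(JSON.stringify(exampleParams));
```

In a Node codebase the resulting object would be handed to `stripe.customers.create(...)`; other SDKs take the same `metadata` map in their own syntax.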

To guide the users on how to implement this, I'm updating the docs with a step-by-step guide on how to get the metadata populated with that field. But I thought we could also get the wizard to do that for them. So this is where the idea for this new setup command was born.

This wizard command should introspect the user codebase and:

  • detect the project's language;
  • detect the Stripe SDK version;
  • fetch Stripe documentation based on the SDK version;
  • detect the actual distinct ID used in the codebase;
    • this could be an email or an ID from the user's database;
  • detect where the Stripe customer is created;
  • detect where the Stripe charge or subscription is made.

Here's a Loom I've recorded demoing it in a demo repo.

Part of PostHog/posthog#52270

…l detection pipeline

Wire up `setup-revenue-analytics` yargs subcommand in bin.ts with language
detection, Stripe SDK/call pattern scanning, PostHog distinct_id detection,
runtime Stripe docs fetching with fallback, and agent prompt builder.
Supports Node, Python, Ruby, PHP, Go, Java, and .NET.

Cover Node, Python, Ruby, PHP, Go, Java, .NET for Stripe SDK detection,
customer creation scanning, and charge pattern matching. Test PostHog
distinct_id extraction from identify/capture calls across languages.
Fix Python distinct_id pattern ordering for keyword arguments.

Test prompt assembly for all 7 languages, distinct_id inclusion/fallback,
checkout session handling, and Stripe docs fallback when fetcher fails.

Add comprehensive patterns from PostHog docs for all SDK variants:
- Frontend: identify(), get_distinct_id(), React Native
- Backend: capture() with distinctId property, alias()
- Android/Kotlin: PostHog.identify(distinctId = ...) named params
- Java: posthog.capture/identify/alias with method call args
- .NET: Identify/CaptureAsync, DistinctId property assignment

Also skip test files, filter placeholder values from docs examples,
and fix the "agent will ask" message: the agent now searches the
codebase itself and falls back to a TODO placeholder.
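
To make the detection step concrete, here is a minimal sketch of pulling the distinct ID expression out of a `posthog.identify()` call while skipping test files and docs placeholders. The regex, the placeholder list, and the function names are assumptions for illustration, not the patterns shipped in this PR, and the sketch only covers the JavaScript-style call.

```typescript
// Placeholder values seen in docs examples (hypothetical list) that should be ignored.
const PLACEHOLDERS = new Set(["distinct_id", "user_id", "your_distinct_id"]);

// Captures the first argument of posthog.identify(<arg>, ...).
const IDENTIFY_CALL = /posthog\.identify\(\s*([^,)\s]+)/;

// Treat *.test.*, *.spec.* and __tests__/ paths as test files.
function isTestFile(path: string): boolean {
  return /(\.test\.|\.spec\.|__tests__\/)/.test(path);
}

// Returns the source expression used as distinct ID, or null if none is usable.
function extractDistinctIdExpression(path: string, source: string): string | null {
  if (isTestFile(path)) return null;
  const expr = IDENTIFY_CALL.exec(source)?.[1];
  if (!expr) return null;
  // Strip surrounding quotes so string-literal placeholders can be compared.
  const bare = expr.replace(/^['"`]|['"`]$/g, "");
  return PLACEHOLDERS.has(bare.toLowerCase()) ? null : expr;
}

console.log(extractDistinctIdExpression("src/auth.ts", "posthog.identify(user.id, { email })"));
```

A single regex like this is brittle across SDK variants, which is part of why the commit above needed separate patterns per language.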
…ceholders

The Stripe docs fallback examples used made-up properties like
user.posthogDistinctId that the agent would copy verbatim. Replace
all with <POSTHOG_DISTINCT_ID> placeholders.

Rewrite prompt to add a "CRITICAL FIRST STEP" section that teaches
the agent HOW to determine the actual distinct_id value:
- Search for posthog.identify() — first arg is the distinct_id
- Search for posthog.capture() — look for distinctId property
- Trace the value to the Stripe call site
- Explicitly warn: "Do NOT invent properties like user.posthogDistinctId"
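
The distinct_id inclusion/fallback behaviour of the prompt could be sketched roughly as follows. The function name and the exact wording are hypothetical; only the search steps, the warning, and the `<POSTHOG_DISTINCT_ID>` placeholder are taken from the commit message above.

```typescript
const DISTINCT_ID_PLACEHOLDER = "<POSTHOG_DISTINCT_ID>";

// Builds the distinct_id section of the agent prompt.
// `detected` is the expression found by static detection, or null when nothing was found.
function buildDistinctIdInstruction(detected: string | null): string {
  if (detected) {
    return `Use \`${detected}\` as the PostHog distinct_id when setting Stripe customer metadata.`;
  }
  // Nothing detected: teach the agent how to find the value instead of guessing.
  return [
    "CRITICAL FIRST STEP: determine the actual distinct_id before editing any Stripe call.",
    "- Search for posthog.identify(); the first argument is the distinct_id.",
    "- Search for posthog.capture(); look for the distinctId property.",
    "- Trace that value to the Stripe call site.",
    "- Do NOT invent properties like user.posthogDistinctId.",
    `- If the value cannot be found, leave a TODO with ${DISTINCT_ID_PLACEHOLDER}.`,
  ].join("\n");
}

console.log(buildDistinctIdInstruction(null));
```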
…ction

Language detection now searches up to depth 3 with ignore patterns,
finding package.json/requirements.txt/Gemfile etc. in subdirectories
like frontend/ or backend/.

Stripe package detection uses glob search (**/) instead of reading
fixed root paths. Version extraction resolves lockfiles relative to
the directory where the package file was found, not from installDir.
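
A minimal sketch of the depth-limited manifest search, written over a list of relative paths (for example, the output of a glob) so it stays self-contained. The manifest names, ignore list, and the reading of "depth 3" as at most three directories below the root are assumptions.

```typescript
// Hypothetical subsets; the real lists would cover every supported language.
const MANIFEST_NAMES = new Set(["package.json", "requirements.txt", "Gemfile"]);
const IGNORED_DIRS = new Set(["node_modules", ".git", "dist", "vendor"]);
const MAX_DEPTH = 3;

// Keep manifest files that sit at most MAX_DEPTH directories below the root
// and are not inside an ignored directory. Paths use "/" as separator.
function findManifests(relativePaths: string[]): string[] {
  return relativePaths.filter((p) => {
    const segments = p.split("/");
    const fileName = segments[segments.length - 1] ?? "";
    const dirs = segments.slice(0, -1);
    return (
      MANIFEST_NAMES.has(fileName) &&
      dirs.length <= MAX_DEPTH &&
      !dirs.some((d) => IGNORED_DIRS.has(d))
    );
  });
}

console.log(
  findManifests([
    "package.json",
    "frontend/package.json",
    "backend/requirements.txt",
    "node_modules/stripe/package.json",
    "a/b/c/d/package.json",
  ]),
);
```

Keeping the directory of each hit is what lets the lockfile lookup resolve relative to where the package file was found rather than the install root.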

Python projects using uv or Poetry may not have requirements.txt.
Add uv.lock and poetry.lock to the Stripe package detection sources,
matching the `name = "stripe"` TOML pattern used by both lockfiles.

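
For illustration, a minimal sketch of reading the Stripe version from a uv.lock or poetry.lock file, assuming each package is listed as a `[[package]]` table with `name` and `version` keys. The function name and the sample version are hypothetical, and a real implementation might prefer a TOML parser over regexes.

```typescript
// Returns the locked version of the `stripe` package, or null if it is not listed.
function extractStripeVersionFromLock(lockfile: string): string | null {
  // Split the file into one chunk per [[package]] table.
  const blocks = lockfile.split(/^\[\[package\]\]\s*$/m);
  for (const block of blocks) {
    // Exact match on the name so packages like "stripe-foo" are not picked up.
    if (/^name\s*=\s*"stripe"\s*$/m.test(block)) {
      const version = /^version\s*=\s*"([^"]+)"\s*$/m.exec(block);
      return version?.[1] ?? null;
    }
  }
  return null;
}

// Made-up lockfile content for the example.
const sampleLock = [
  "[[package]]",
  'name = "requests"',
  'version = "2.32.3"',
  "",
  "[[package]]",
  'name = "stripe"',
  'version = "11.1.0"',
].join("\n");

console.log(extractStripeVersionFromLock(sampleLock));
```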
Incorporate four safety tenets into the prompt:
1. Never fabricate — don't substitute wrong identifiers
2. Thread the value — propagate as optional param, skip if impossible
3. Minimize API calls — deduplicate Customer.modify, don't add before every charge
4. Follow abstractions — modify Stripe utility layers, not business logic
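
Tenets 1 to 3 can be sketched as a small decision helper that returns the metadata to write, or nothing when no write should happen. The helper name and types are hypothetical; this is an illustration of the tenets, not the code the agent generates.

```typescript
type Metadata = Record<string, string>;
const DISTINCT_ID_KEY = "posthog_person_distinct_id";

// Returns the metadata patch to send, or null when no Stripe update is needed.
function metadataPatch(existing: Metadata, distinctId?: string): Metadata | null {
  // Tenets 1 and 2: if no value was threaded through, skip the write; never guess one.
  if (!distinctId) return null;
  // Tenet 3: avoid a redundant API call when the customer already has the right value.
  if (existing[DISTINCT_ID_KEY] === distinctId) return null;
  return { [DISTINCT_ID_KEY]: distinctId };
}

console.log(metadataPatch({}, "user_123"));
console.log(metadataPatch({ posthog_person_distinct_id: "user_123" }, "user_123"));
```

In a Node codebase a non-null patch would be sent with `stripe.customers.update(customerId, { metadata: patch })`, ideally from the project's existing Stripe utility layer (tenet 4).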
…pdates

The agent now emits status messages at each milestone so the user
sees progress instead of a static "Setting up revenue analytics..."
spinner: identifying distinct_id, updating customer creation,
adding metadata for existing customers, verifying changes.

Remove explicit pushStatus call from handleSDKMessage — spinner.message
already calls pushStatus internally in both InkUI and LoggingUI, so
the [STATUS] line was appearing twice.
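
The duplicate status line can be reproduced with a toy model. `SketchUI` and both handler functions are hypothetical stand-ins for the real InkUI/LoggingUI classes and handleSDKMessage; they only mirror the behaviour described in the commit message.

```typescript
// Toy UI: spinner.message forwards to pushStatus, as the commit says the real UIs do.
class SketchUI {
  readonly statusLines: string[] = [];

  pushStatus(text: string): void {
    this.statusLines.push(`[STATUS] ${text}`);
  }

  readonly spinner = {
    message: (text: string): void => {
      this.pushStatus(text);
    },
  };
}

// Before the fix: the explicit pushStatus call duplicates what spinner.message already did.
function handleMessageBefore(ui: SketchUI, text: string): void {
  ui.spinner.message(text);
  ui.pushStatus(text);
}

// After the fix: only the spinner is updated.
function handleMessageAfter(ui: SketchUI, text: string): void {
  ui.spinner.message(text);
}

const beforeUi = new SketchUI();
handleMessageBefore(beforeUi, "identifying distinct_id");
const afterUi = new SketchUI();
handleMessageAfter(afterUi, "identifying distinct_id");
console.log(beforeUi.statusLines.length, afterUi.statusLines.length);
```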
@arthurdedeus arthurdedeus self-assigned this Apr 3, 2026

github-actions bot commented Apr 3, 2026

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci android
  • /wizard-ci angular
  • /wizard-ci astro
  • /wizard-ci django
  • /wizard-ci fastapi
  • /wizard-ci flask
  • /wizard-ci javascript-node
  • /wizard-ci javascript-web
  • /wizard-ci laravel
  • /wizard-ci next-js
  • /wizard-ci nuxt
  • /wizard-ci python
  • /wizard-ci rails
  • /wizard-ci react-native
  • /wizard-ci react-router
  • /wizard-ci sveltekit
  • /wizard-ci swift
  • /wizard-ci tanstack-router
  • /wizard-ci tanstack-start
  • /wizard-ci vue

Test an individual app:

  • /wizard-ci android/Jetchat
  • /wizard-ci angular/angular-saas
  • /wizard-ci astro/astro-hybrid-marketing
  • /wizard-ci astro/astro-ssr-docs
  • /wizard-ci astro/astro-static-marketing
  • /wizard-ci astro/astro-view-transitions-marketing
  • /wizard-ci django/django3-saas
  • /wizard-ci fastapi/fastapi3-ai-saas
  • /wizard-ci flask/flask3-social-media
  • /wizard-ci javascript-node/express-todo
  • /wizard-ci javascript-node/fastify-blog
  • /wizard-ci javascript-node/hono-links
  • /wizard-ci javascript-node/koa-notes
  • /wizard-ci javascript-node/native-http-contacts
  • /wizard-ci javascript-web/saas-dashboard
  • /wizard-ci laravel/laravel12-saas
  • /wizard-ci next-js/15-app-router-saas
  • /wizard-ci next-js/15-app-router-todo
  • /wizard-ci next-js/15-pages-router-saas
  • /wizard-ci next-js/15-pages-router-todo
  • /wizard-ci nuxt/movies-nuxt-3-6
  • /wizard-ci nuxt/movies-nuxt-4
  • /wizard-ci python/meeting-summarizer
  • /wizard-ci rails/fizzy
  • /wizard-ci react-native/expo-react-native-hacker-news
  • /wizard-ci react-native/react-native-saas
  • /wizard-ci react-router/react-router-v7-project
  • /wizard-ci react-router/rrv7-starter
  • /wizard-ci react-router/saas-template
  • /wizard-ci react-router/shopper
  • /wizard-ci sveltekit/CMSaasStarter
  • /wizard-ci swift/hackers-ios
  • /wizard-ci tanstack-router/tanstack-router-code-based-saas
  • /wizard-ci tanstack-router/tanstack-router-file-based-saas
  • /wizard-ci tanstack-start/tanstack-start-saas
  • /wizard-ci vue/movies

Results will be posted here when complete.

@arthurdedeus arthurdedeus force-pushed the feat/revenue-analytics-setup branch from cbcd32f to 6cc29aa Compare April 3, 2026 11:59
… alert

Replace single-pass tag strip with a do/while loop that repeats until
stable. This handles reconstructed tags from nested fragments
(e.g. <scr<script>ipt>) and satisfies CodeQL's
js/incomplete-multi-character-sanitization rule.
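
A minimal sketch of the repeat-until-stable technique described above. The function name and the tag regex are assumptions, not necessarily what the PR uses; the point is that a single pass can leave a reconstructed tag behind, while the loop removes it.

```typescript
// Strip tags repeatedly until a pass makes no further change.
function stripTags(input: string): string {
  let previous: string;
  let current = input;
  do {
    previous = current;
    // Remove innermost tags: "<", then characters that are neither "<" nor ">", then ">".
    current = current.replace(/<[^<>]*>/g, "");
  } while (current !== previous);
  return current;
}

// A single pass over this input would leave "<script>alert(1)" behind,
// because removing the inner <script> joins "<scr" and "ipt>".
console.log(stripTags("<scr<script>ipt>alert(1)"));
```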
arthurdedeus and others added 2 commits April 3, 2026 05:17
…scaping

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…scaping

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Member

@edwinyjlim edwinyjlim left a comment

oh yeah! lots of goodies here

I'm gonna park this PR in draft. We need to rework this architecture so it's more streamlined and skills-driven, in line with the changes we have coming down the pipeline.

@arthurdedeus We'll probably reverse uno card this and tag you in a new wizard PR that installs revenue analytics via skills

Member

we're moving completely to a skills-based architecture, so we can always have dynamic and up-to-date context that's decoupled from the source package

lots of good stuff in this prompt, but let's figure out how to move this to https://github.com/PostHog/context-mill

Member

there's a ton of duplication of agent-runner going on here: logging, health check, auth, MCP URL, settings conflicts, agent init/run, error handling

we're working on making the agent-runner and agent-interface services more composable and modular, so different runs can reuse the same boot sequence as we expand the wizard

Comment on lines +12 to +15
const STRIPE_DOCS_URLS = {
  customerCreate: 'https://docs.stripe.com/api/customers/create',
  customerUpdate: 'https://docs.stripe.com/api/customers/update',
};
Member

this is cool! we can source this upstream in the context mill so the wizard is decoupled from context building

@edwinyjlim edwinyjlim marked this pull request as draft April 3, 2026 16:53
Collaborator

daniloc commented Apr 3, 2026

Thanks for taking a swing at this, Arthur! Appreciate you giving it the time. We definitely want the wizard to support this, but we can’t merge this direction.

There are a few signals why:

  • the entire project is 24k SLOC; this would increase it by 12% for a single feature. The maintenance costs would be untenable
  • for reference, an entire framework integration surface area in the wizard costs roughly 200 lines of code, leaning heavily on shared infrastructure
  • this pull tightly couples prompting and context with source code; the 580 lines of regex-based detection and 174 lines of hardcoded Stripe docs would need ongoing maintenance as SDKs evolve — the agent can do this work itself with its file-reading tools and live docs
  • this creates a completely separate path to integrate something that should actually be a standard component of every single wizard run that encounters Stripe; we want what you're working on to be universally applied

Still, this starting point gives us loads of information to harmonize with a proper wizard architecture:

  • the prompt is a good start for us to build an Agent Skills-compliant file: the wizard can use these, but so can customers and our CSM/sales teams
  • we have a feature discovery system which is already working and in production, and Stripe is a flagship detection (DiscoveredFeature.Stripe). loads of signals here to help us flesh that out
  • lean on the agent to find call sites: no need for complex regex and all the associated tests they’d need

So that’s the basic shape of the path we’d take here, but I wanted to offer the option in case it was interesting for you to give a rewrite a try with our guidance. The only way we can make the wizard work long term is by betting really hard on the capabilities of models and agents, and that does require a different approach than conventional code investments. Happy to show you the way if it’s interesting, or we’ll take it from here, your pick!

one other thing: are the docs you mentioned live yet?

@arthurdedeus
Author

Thank you so much for the review @edwinyjlim and @daniloc! This is 100% discovery/experimental stuff on my end, so I really appreciate all the feedback.

The docs are not live yet; I'll merge them once we get the wizard going for the setup.

While I was building, I figured most of the pattern-matching stuff was going to be throw-away work, but I went on regardless just so I could get it to a working state. IMO the prompts are the most valuable assets from this PR (not that they're perfect, but they seemed effective in the tests).

I'm happy to take on the work here with some guidance, really keen to learn more about how it all works and build more stuff on our AI platform.

4 participants