Skip to content

mothiki/analytics-hub

Repository files navigation

Analytics Hub: a governed reference architecture for hosting LLM-generated dashboards in Snowflake

A thin Streamlit-in-Snowflake (SiS) shell that hosts pre-built HTML dashboards authored anywhere (Claude, v0, a notebook, by hand) and adds the four things those artifacts don't ship with on their own: role-based access, refresh on real data, audit logging, and a chat sidebar that only runs verified SQL.

The render core is ~260 lines of Python (the Cortex Analyst chat sidebar adds ~600 more). Each page is three files in a folder (four with an optional semantic view for the chat sidebar).

This repo is a personal POC, not a deployed product. It demonstrates a pattern; the demo data is synthetic; the brand ("Lumen AI") is fictional.


What's inside

apps/
  pipeline_health/     hero B2B SaaS pipeline & revenue board (with inline account drill-in)
  rep_performance/     rep scorecard: attainment & bookings trends, YTD/monthly/quarterly (RBAC demo)
shared/                render core (manifest loader, page runner) + chat sidebar
scripts/setup_demo.sql one-paste-and-run synthetic data seed
fixtures/*.json        pre-generated data so `make local` works offline

Each apps/<name>/ folder has the same four files:

File Role
manifest.yml title, sidebar order, iframe height, ordered column list, allowed_roles, chip_count
page.html self-contained HTML/CSS/JS with a literal __DATA__ placeholder
query.sql Snowflake SQL whose result populates __DATA__
semantic_view.sql (optional) CREATE OR REPLACE SEMANTIC VIEW for the Cortex Analyst chat sidebar

home.py discovers everything under apps/, registers each page via st.navigation(), runs the SQL, injects the JSON result into the HTML's __DATA__ token, and renders via streamlit.components.v1.html.


Try it offline (60 seconds, no Snowflake)

git clone <this repo>
cd analytics-hub

# any Python 3.11 environment with these will work; conda is below
conda env create -f environment.yml
conda activate analytics_hub

make local

A browser opens. Two pages render against pre-generated synthetic fixtures in fixtures/*.json. No Snowflake account required. The chat sidebar is disabled in this mode (it needs Cortex Analyst).

To simulate a different role and see the nav list change:

LOCAL_DEV_ROLE=LUMEN_SALES_LEADER make local   # won't see Rep Performance
LOCAL_DEV_ROLE=LUMEN_ANALYST      make local   # sees everything

Run the full Snowflake stack

This path lights up Cortex Analyst (the chat sidebar) and proves out the end-to-end deploy. Free 30-day Snowflake Enterprise trial works fine.

Prerequisites

  • Snowflake Enterprise account. Sign up at https://signup.snowflake.com. Pick AWS us-west-2 or AWS us-east-1; those regions have Cortex Analyst availability.
  • Snowflake CLI (≥ 3.4): pip install snowflake-cli.
  • Python 3.11 with conda or uv. The environment.yml is the SiS runtime contract, the same dep set the deployed app will resolve.

Steps

  1. Seed the demo data. Open a Snowflake worksheet, paste scripts/setup_demo.sql, run it. Takes about 30 seconds. It creates:

    • LUMEN_DEMO database + ANALYTICS_HUB schema
    • ANALYTICS_HUB_WH warehouse (XSMALL, auto-suspend 10 min)
    • 4 roles (LUMEN_SALES_LEADER, LUMEN_ANALYST, LUMEN_ENGINEER, LUMEN_CHAT_RUNNER)
    • Seed dimension + fact tables (~4,300 deterministic rows)
    • Two mart tables (one per dashboard)
  2. Set up a Snow CLI connection.

    # ~/.snowflake/connections.toml
    [connections.LUMEN]
    account       = "<your account locator>"
    user          = "<your user>"
    authenticator = "externalbrowser"
    role          = "LUMEN_ENGINEER"
    warehouse     = "ANALYTICS_HUB_WH"
    database      = "LUMEN_DEMO"
    schema        = "ANALYTICS_HUB"
  3. Bootstrap the audit log + semantic views, then deploy.

    snow sql -f scripts/setup_audit_log.sql -c LUMEN
    make deploy ENV=dev
    make url    ENV=dev    # prints the SiS app URL
  4. Open the URL. Switch the active role via the Snowsight dropdown to see the nav list change. Click a chip in the chat sidebar to confirm a verified Cortex Analyst query executes against the semantic view.


Authoring a new page

The artifact you'd add to a PR is just the three (or four) files. The simplest path is to ask an LLM to produce them; see docs/AUTHORING_PROMPT.md for a prompt that gets you a starting layout. An HTML page built in a chat tool rarely arrives in this shape: expect to strip CDN <script> tags, point the JS at the injected RAW array instead of baked-in data, expand any SELECT *, and add the __DATA__ token. That reshaping is mechanical: AUTHORING_PROMPT.md's Prompt B does most of it, and scripts/validate_apps.py catches anything you miss before the PR. From there:

make scaffold name=my_page         # creates an empty skeleton
# drop the LLM's HTML/SQL/manifest into apps/my_page/
make sample   name=my_page         # runs the SQL, writes fixtures/my_page.json
make preview  name=my_page         # opens the page in a browser; instant feedback
make semantic ENV=dev              # deploys the CREATE SEMANTIC VIEW DDL
make local                         # full Streamlit shell w/ chat sidebar

Open a PR. CI lints, validates the file contract, dry-runs every query.sql against Snowflake (SELECT * FROM (...) LIMIT 0), deploys to a per-PR isolated Streamlit object, and comments the preview URL on the PR. The isolated app is dropped automatically when the PR closes.


Architecture notes (read these before extending)

manifest.yml is a contract

The column list in manifest.yml is the contract between query.sql and page.html. The runner reads rows back from Snowflake and hands the page a 2D array whose column order matches the manifest. The HTML never sees a column it didn't declare; the SQL never returns one without an owner.

scripts/validate_apps.py enforces this in CI: it rejects SELECT *, rejects <alias>.*, and demands the literal __DATA__ token in the HTML.

RBAC at the data layer, not the app layer

manifest.allowed_roles is a UX gate: it controls which pages a user sees in the nav. The actual security boundary is GRANT SELECT on the underlying mart table. If you forget the GRANT, the page just renders empty when a permitted role tries to view it; if you forget allowed_roles, the page is visible but the query fails. Both are ugly; only one leaks data.

One semantic view per app

Each app has its own CREATE OR REPLACE SEMANTIC VIEW DDL. Per-app means each dashboard owner can evolve their semantics without coordinating with every other dashboard owner; the cost is duplicated dimensions across views. That's the right trade-off for independent change velocity.

The ai_verified_queries block inside the DDL doubles as the suggestion chips on the chat sidebar. Those queries are reviewed in PR, executed exactly as written, and their result is the trusted answer. Free-form chat questions still work, but the chips are guaranteed correct, and they are the path of least resistance for stakeholders.

<% … %> templating in semantic_view.sql

Three placeholders are bound at deploy time:

  • <% database %>: LUMEN_DEMO
  • <% schema %>: ANALYTICS_HUB
  • <% running_role %>: LUMEN_CHAT_RUNNER (the role Cortex Analyst runs queries as)

This lets one DDL file deploy to dev, prod, or any per-PR pipeline.

ensure_ascii=True is load-bearing

shared/page_runner.py JSON-encodes the result set with ensure_ascii=True specifically so that any </script> substring inside a string column is escaped as <\/script>, not embedded literally. Without it, a single unlucky account name like "foo</script><script>alert(1)</script>" would break out of the embedded HTML.

Audit log

Every chat turn writes one row to CHAT_AUDIT_LOG: (ts, user, role, app_name, prompt, generated_sql, row_count, latency_ms, error). Bootstrap with scripts/setup_audit_log.sql. Logging failures are caught and swallowed, so the chat path never blocks on audit.

-- weekly chat usage
SELECT app_name, COUNT(*) AS turns, AVG(latency_ms) AS avg_ms,
       COUNTIF(error IS NOT NULL) / COUNT(*) AS error_rate
FROM LUMEN_DEMO.ANALYTICS_HUB.CHAT_AUDIT_LOG
WHERE ts >= DATEADD(day, -7, CURRENT_DATE)
GROUP BY 1;

CI: per-PR isolation, manual prod gate

.github/workflows/ci.yml:

Job When What
lint every push / PR yamllint + sqlfluff
validate every push / PR manifest schema, __DATA__ token, no SELECT *, pytest
deploy_dev PR open / sync dry-runs every query.sql, deploys per-PR Streamlit object suffixed _PR_<number>, comments URL on PR
smoke after deploy_dev confirms the per-PR object is reachable
deploy_prod push to main deploys to canonical name; gated on environment: production reviewer approval

.github/workflows/cleanup.yml drops the per-PR Streamlit object when the PR closes.

CI credentials

CI authenticates as a Snowflake service user (key-pair, no password) that holds LUMEN_ENGINEER. setup_demo.sql grants that role the CREATE STREAMLIT / SEMANTIC VIEW / STAGE it needs to deploy. One-time setup:

# generate a key pair (private key -> GitHub secret, public key -> Snowflake)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out ci_key.p8 -nocrypt
openssl rsa -in ci_key.p8 -pubout -out ci_key.pub
USE ROLE ACCOUNTADMIN;
CREATE USER IF NOT EXISTS SVC_ANALYTICS_HUB_CI
  TYPE = SERVICE DEFAULT_ROLE = LUMEN_ENGINEER DEFAULT_WAREHOUSE = ANALYTICS_HUB_WH;
ALTER USER SVC_ANALYTICS_HUB_CI SET RSA_PUBLIC_KEY='<body of ci_key.pub, no header/footer>';
GRANT ROLE LUMEN_ENGINEER TO USER SVC_ANALYTICS_HUB_CI;

Then add the GitHub secrets (Settings → Secrets and variables → Actions):

Secret Scope Value
SNOWFLAKE_ACCOUNT repository your account locator
SNOWFLAKE_USER repository SVC_ANALYTICS_HUB_CI
SNOWFLAKE_PRIVATE_KEY_DEV environment dev full ci_key.p8 contents
SNOWFLAKE_PRIVATE_KEY_PROD environment production same key (or a separate prod key)

Create the dev and production environments and add required reviewers to production — that approval is the manual prod gate.


Make targets

Target What it does
make local LOCAL_DEV=1 streamlit run home.py: runs against fixtures, no Snowflake
make scaffold name=foo Create apps/foo/{manifest.yml, page.html, query.sql} from templates
make sample name=foo Run apps/foo/query.sql via Snow CLI; write fixtures/foo.json
make preview name=foo Open apps/foo/page.html in a browser with fixtures/foo.json injected
make semantic ENV=dev|prod Deploy each app's CREATE SEMANTIC VIEW
make deploy ENV=dev|prod Refresh semantic views, then snow streamlit deploy …
make url ENV=dev|prod Print the deployed SiS app URL
make test pytest
make clean Remove generated fixtures

How this fits next to BI tools

Modern BI suites (Looker with Gemini, Tableau Pulse, Power BI Copilot, ThoughtSpot Sage, Hex Magic) are converging on chat-with-your-data. If your team already lives inside one of those tools and authors there, you do not need this. The use case this reference architecture addresses is narrower:

  • You have artifacts authored outside a BI tool (in Claude, Cursor, a notebook, by hand) and you want to host them with governance and chat without rebuilding them in the BI tool.
  • You want the data path, RBAC, audit log, and semantic layer to live in your warehouse, not duplicated in a BI tool's modeling layer.
  • You don't want to pay per-seat for a BI tool just to host a few stakeholder dashboards.

This is complementary to a BI tool, not a replacement.


License

MIT. See LICENSE.

About

Governed reference architecture for hosting LLM-generated dashboards in Snowflake: RBAC, Cortex Analyst semantic layer, verified-query chat sidebar.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors