A thin Streamlit-in-Snowflake (SiS) shell that hosts pre-built HTML dashboards authored anywhere (Claude, v0, a notebook, by hand) and adds the four things those artifacts don't ship with on their own: role-based access, refresh on real data, audit logging, and a chat sidebar that only runs verified SQL.
The render core is ~260 lines of Python (the Cortex Analyst chat sidebar adds ~600 more). Each page is three files in a folder (four with an optional semantic view for the chat sidebar).
This repo is a personal POC, not a deployed product. It demonstrates a pattern; the demo data is synthetic; the brand ("Lumen AI") is fictional.
apps/
pipeline_health/ hero B2B SaaS pipeline & revenue board (with inline account drill-in)
rep_performance/ rep scorecard: attainment & bookings trends, YTD/monthly/quarterly (RBAC demo)
shared/ render core (manifest loader, page runner) + chat sidebar
scripts/setup_demo.sql one-paste-and-run synthetic data seed
fixtures/*.json pre-generated data so `make local` works offline
Each apps/<name>/ folder has the same four files:
| File | Role |
|---|---|
manifest.yml |
title, sidebar order, iframe height, ordered column list, allowed_roles, chip_count |
page.html |
self-contained HTML/CSS/JS with a literal __DATA__ placeholder |
query.sql |
Snowflake SQL whose result populates __DATA__ |
semantic_view.sql (optional) |
CREATE OR REPLACE SEMANTIC VIEW for the Cortex Analyst chat sidebar |
home.py discovers everything under apps/, registers each page via
st.navigation(), runs the SQL, injects the JSON result into the HTML's
__DATA__ token, and renders via streamlit.components.v1.html.
git clone <this repo>
cd analytics-hub
# any Python 3.11 environment with these will work; conda is below
conda env create -f environment.yml
conda activate analytics_hub
make localA browser opens. Two pages render against pre-generated synthetic fixtures
in fixtures/*.json. No Snowflake account required. The chat sidebar is
disabled in this mode (it needs Cortex Analyst).
To simulate a different role and see the nav list change:
LOCAL_DEV_ROLE=LUMEN_SALES_LEADER make local # won't see Rep Performance
LOCAL_DEV_ROLE=LUMEN_ANALYST make local # sees everythingThis path lights up Cortex Analyst (the chat sidebar) and proves out the end-to-end deploy. Free 30-day Snowflake Enterprise trial works fine.
- Snowflake Enterprise account. Sign up at https://signup.snowflake.com.
Pick AWS
us-west-2or AWSus-east-1; those regions have Cortex Analyst availability. - Snowflake CLI (≥ 3.4):
pip install snowflake-cli. - Python 3.11 with conda or
uv. Theenvironment.ymlis the SiS runtime contract, the same dep set the deployed app will resolve.
-
Seed the demo data. Open a Snowflake worksheet, paste
scripts/setup_demo.sql, run it. Takes about 30 seconds. It creates:LUMEN_DEMOdatabase +ANALYTICS_HUBschemaANALYTICS_HUB_WHwarehouse (XSMALL, auto-suspend 10 min)- 4 roles (
LUMEN_SALES_LEADER,LUMEN_ANALYST,LUMEN_ENGINEER,LUMEN_CHAT_RUNNER) - Seed dimension + fact tables (~4,300 deterministic rows)
- Two mart tables (one per dashboard)
-
Set up a Snow CLI connection.
# ~/.snowflake/connections.toml [connections.LUMEN] account = "<your account locator>" user = "<your user>" authenticator = "externalbrowser" role = "LUMEN_ENGINEER" warehouse = "ANALYTICS_HUB_WH" database = "LUMEN_DEMO" schema = "ANALYTICS_HUB"
-
Bootstrap the audit log + semantic views, then deploy.
snow sql -f scripts/setup_audit_log.sql -c LUMEN make deploy ENV=dev make url ENV=dev # prints the SiS app URL -
Open the URL. Switch the active role via the Snowsight dropdown to see the nav list change. Click a chip in the chat sidebar to confirm a verified Cortex Analyst query executes against the semantic view.
The artifact you'd add to a PR is just the three (or four) files. The simplest
path is to ask an LLM to produce them; see docs/AUTHORING_PROMPT.md for a
prompt that gets you a starting layout. An HTML page built in a chat tool rarely
arrives in this shape: expect to strip CDN <script> tags, point the JS at the
injected RAW array instead of baked-in data, expand any SELECT *, and add the
__DATA__ token. That reshaping is mechanical: AUTHORING_PROMPT.md's Prompt B
does most of it, and scripts/validate_apps.py catches anything you miss before
the PR. From there:
make scaffold name=my_page # creates an empty skeleton
# drop the LLM's HTML/SQL/manifest into apps/my_page/
make sample name=my_page # runs the SQL, writes fixtures/my_page.json
make preview name=my_page # opens the page in a browser; instant feedback
make semantic ENV=dev # deploys the CREATE SEMANTIC VIEW DDL
make local # full Streamlit shell w/ chat sidebarOpen a PR. CI lints, validates the file contract, dry-runs every query.sql
against Snowflake (SELECT * FROM (...) LIMIT 0), deploys to a per-PR
isolated Streamlit object, and comments the preview URL on the PR. The
isolated app is dropped automatically when the PR closes.
The column list in manifest.yml is the contract between query.sql and
page.html. The runner reads rows back from Snowflake and hands the page a
2D array whose column order matches the manifest. The HTML never sees a
column it didn't declare; the SQL never returns one without an owner.
scripts/validate_apps.py enforces this in CI: it rejects SELECT *,
rejects <alias>.*, and demands the literal __DATA__ token in the HTML.
manifest.allowed_roles is a UX gate: it controls which pages a user
sees in the nav. The actual security boundary is GRANT SELECT on the
underlying mart table. If you forget the GRANT, the page just renders empty
when a permitted role tries to view it; if you forget allowed_roles, the
page is visible but the query fails. Both are ugly; only one leaks data.
Each app has its own CREATE OR REPLACE SEMANTIC VIEW DDL. Per-app means
each dashboard owner can evolve their semantics without coordinating with
every other dashboard owner; the cost is duplicated dimensions across views.
That's the right trade-off for independent change velocity.
The ai_verified_queries block inside the DDL doubles as the suggestion
chips on the chat sidebar. Those queries are reviewed in PR, executed
exactly as written, and their result is the trusted answer. Free-form
chat questions still work, but the chips are guaranteed correct, and they
are the path of least resistance for stakeholders.
Three placeholders are bound at deploy time:
<% database %>:LUMEN_DEMO<% schema %>:ANALYTICS_HUB<% running_role %>:LUMEN_CHAT_RUNNER(the role Cortex Analyst runs queries as)
This lets one DDL file deploy to dev, prod, or any per-PR pipeline.
shared/page_runner.py JSON-encodes the result set with ensure_ascii=True
specifically so that any </script> substring inside a string column is
escaped as <\/script>, not embedded literally. Without it, a single
unlucky account name like "foo</script><script>alert(1)</script>" would
break out of the embedded HTML.
Every chat turn writes one row to CHAT_AUDIT_LOG: (ts, user, role, app_name, prompt, generated_sql, row_count, latency_ms, error). Bootstrap
with scripts/setup_audit_log.sql. Logging failures are caught and
swallowed, so the chat path never blocks on audit.
-- weekly chat usage
SELECT app_name, COUNT(*) AS turns, AVG(latency_ms) AS avg_ms,
COUNTIF(error IS NOT NULL) / COUNT(*) AS error_rate
FROM LUMEN_DEMO.ANALYTICS_HUB.CHAT_AUDIT_LOG
WHERE ts >= DATEADD(day, -7, CURRENT_DATE)
GROUP BY 1;.github/workflows/ci.yml:
| Job | When | What |
|---|---|---|
lint |
every push / PR | yamllint + sqlfluff |
validate |
every push / PR | manifest schema, __DATA__ token, no SELECT *, pytest |
deploy_dev |
PR open / sync | dry-runs every query.sql, deploys per-PR Streamlit object suffixed _PR_<number>, comments URL on PR |
smoke |
after deploy_dev |
confirms the per-PR object is reachable |
deploy_prod |
push to main |
deploys to canonical name; gated on environment: production reviewer approval |
.github/workflows/cleanup.yml drops the per-PR Streamlit object when the
PR closes.
CI authenticates as a Snowflake service user (key-pair, no password) that
holds LUMEN_ENGINEER. setup_demo.sql grants that role the CREATE STREAMLIT / SEMANTIC VIEW / STAGE it needs to deploy. One-time setup:
# generate a key pair (private key -> GitHub secret, public key -> Snowflake)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out ci_key.p8 -nocrypt
openssl rsa -in ci_key.p8 -pubout -out ci_key.pubUSE ROLE ACCOUNTADMIN;
CREATE USER IF NOT EXISTS SVC_ANALYTICS_HUB_CI
TYPE = SERVICE DEFAULT_ROLE = LUMEN_ENGINEER DEFAULT_WAREHOUSE = ANALYTICS_HUB_WH;
ALTER USER SVC_ANALYTICS_HUB_CI SET RSA_PUBLIC_KEY='<body of ci_key.pub, no header/footer>';
GRANT ROLE LUMEN_ENGINEER TO USER SVC_ANALYTICS_HUB_CI;Then add the GitHub secrets (Settings → Secrets and variables → Actions):
| Secret | Scope | Value |
|---|---|---|
SNOWFLAKE_ACCOUNT |
repository | your account locator |
SNOWFLAKE_USER |
repository | SVC_ANALYTICS_HUB_CI |
SNOWFLAKE_PRIVATE_KEY_DEV |
environment dev |
full ci_key.p8 contents |
SNOWFLAKE_PRIVATE_KEY_PROD |
environment production |
same key (or a separate prod key) |
Create the dev and production environments and add required reviewers
to production — that approval is the manual prod gate.
| Target | What it does |
|---|---|
make local |
LOCAL_DEV=1 streamlit run home.py: runs against fixtures, no Snowflake |
make scaffold name=foo |
Create apps/foo/{manifest.yml, page.html, query.sql} from templates |
make sample name=foo |
Run apps/foo/query.sql via Snow CLI; write fixtures/foo.json |
make preview name=foo |
Open apps/foo/page.html in a browser with fixtures/foo.json injected |
make semantic ENV=dev|prod |
Deploy each app's CREATE SEMANTIC VIEW |
make deploy ENV=dev|prod |
Refresh semantic views, then snow streamlit deploy … |
make url ENV=dev|prod |
Print the deployed SiS app URL |
make test |
pytest |
make clean |
Remove generated fixtures |
Modern BI suites (Looker with Gemini, Tableau Pulse, Power BI Copilot, ThoughtSpot Sage, Hex Magic) are converging on chat-with-your-data. If your team already lives inside one of those tools and authors there, you do not need this. The use case this reference architecture addresses is narrower:
- You have artifacts authored outside a BI tool (in Claude, Cursor, a notebook, by hand) and you want to host them with governance and chat without rebuilding them in the BI tool.
- You want the data path, RBAC, audit log, and semantic layer to live in your warehouse, not duplicated in a BI tool's modeling layer.
- You don't want to pay per-seat for a BI tool just to host a few stakeholder dashboards.
This is complementary to a BI tool, not a replacement.
MIT. See LICENSE.