Open
151 commits
5a3798f
init open router, create .env.example
bestian Aug 10, 2025
2747846
Update .env.example
bestian Aug 10, 2025
b871765
Update .gitignore
bestian Aug 11, 2025
8d90b2e
Draft the engineering design and branch todo list to set a direction; to be revised during implementation. work on #3
bestian Aug 11, 2025
1efcb4a
Set the basic engineering direction
bestian Aug 11, 2025
cabb3fb
Update 工程設計.md
bestian Aug 11, 2025
46db0b7
Create scaffold folder and hello_World, work on #2
bestian Aug 11, 2025
80aca62
use .ts
bestian Aug 11, 2025
8c38046
Update package-lock.json
bestian Aug 11, 2025
76a8cdf
Use a small test program to confirm that openai-sdk can be connected, close #2
bestian Aug 11, 2025
cf1c5bb
Make simple_ai_prompt.ts support the different response format of google/gemini-2.5-pro, work on #5
bestian Aug 11, 2025
7908fcc
Test the text output formats of several major models, work on #5
bestian Aug 11, 2025
ae7af86
Create text conversion function, work on #6
bestian Aug 11, 2025
e0a5894
Experiment with structured output, work on #7
bestian Aug 12, 2025
bd09fe4
Update branch_todo.md
bestian Aug 12, 2025
2272001
Copy core type definitions and implement the openrouter core model component, work on #8
bestian Aug 12, 2025
5d334f8
Build a scaffold version of the README example program that runs on openrouter, work on #9
bestian Aug 12, 2025
18a6c2c
Update test.md
bestian Aug 12, 2025
20b72a0
Use gpt-oss-120b by default, work on #10
bestian Aug 12, 2025
0b15d08
Create example/tutorial.ts and add related instructions to README.md
bestian Aug 12, 2025
ff35b51
Set structured output to strict mode
bestian Aug 12, 2025
75ad083
Move .env to the root directory
bestian Aug 13, 2025
4ac42bc
Replicate runner_utils and the first runner.ts, work on #1 work on #
bestian Aug 13, 2025
4f073fe
Find every file that reads .env and make it prefer system environment variables, falling back to the .env file only when they are unset, work on #12
bestian Aug 14, 2025
3586e15
Update .gitignore
bestian Aug 14, 2025
708bcf2
set "stream: false"
bestian Aug 15, 2025
efaa9b3
init multilang support, before test, work on #14
bestian Aug 15, 2025
3fcc94b
debug and work on #14
bestian Aug 15, 2025
e47b5cb
debug parameter propagation issue and add logs and close #15
bestian Aug 15, 2025
d9c4cb8
Initialize packaging
bestian Aug 17, 2025
6742297
debug language-setting parameter propagation, and restrict the model to non-streaming.
bestian Aug 18, 2025
d9d0793
remove dist-worker
bestian Aug 18, 2025
bc79b2a
Add safeguards
bestian Aug 18, 2025
61a5514
Default to stream and make the program able to handle it
bestian Aug 18, 2025
40889cd
Update openrouter_model.ts
bestian Aug 18, 2025
fe19733
Merge remote-tracking branch 'upstream/main' into new-feature-multilang
bestian Aug 19, 2025
102ca71
debug streamed response processing
bestian Aug 19, 2025
2cccae4
debug JSON fix logic
bestian Aug 19, 2025
d214a52
Make response handling more flexible to accommodate open router response formats
bestian Aug 19, 2025
de6b65e
Modify categorization and model handling logic to accommodate streaming responses
bestian Aug 19, 2025
5de4fd8
Create fix_csv_columns_simple.py
bestian Aug 19, 2025
c94824f
Create and revise the csv fixer/converter for polis.tw and pol.is files
bestian Aug 20, 2025
1acdb14
Prepare npm pack packaging so the backend project can test it
bestian Aug 21, 2025
64db791
Do not use the TypeBox compiler in the environment, so the package can be installed on Cloudflare Workers
bestian Aug 21, 2025
2dbc997
Adjust how environment variables are read to fit CF_worker
bestian Aug 21, 2025
d574378
Add flexibility when computing statistics
bestian Aug 21, 2025
f4c6c5e
Update openrouter_model.ts
bestian Aug 21, 2025
a98cb8b
Update env_loader.ts
bestian Aug 21, 2025
0c5f47f
add max_tokens to prevent truncation error
bestian Aug 21, 2025
46db2b5
increase maxRetries from 3 to 5
bestian Aug 21, 2025
177aee2
close #16
bestian Aug 22, 2025
e2273c9
close #17
bestian Aug 22, 2025
42d7292
reduce logs
bestian Aug 22, 2025
420b615
Fix overly strict validation logic in subtopic learning
bestian Aug 22, 2025
97d8ccd
Fix validation logic, before realdata test
bestian Aug 22, 2025
cba2b51
Add Spanish, Japanese, and Simplified Chinese support, and make the language instruction more emphatic.
bestian Aug 23, 2025
5968475
test use system prompt. work on #15
bestian Aug 23, 2025
3f82cca
Separate system instructions from user instructions, work on #15 #24
bestian Aug 24, 2025
407fac4
Make the error-prone block prompts themselves multilingual, work on #15
bestian Aug 24, 2025
d5eb91e
Fix executeConcurrently, work on #30
bestian Aug 24, 2025
1d0ce12
Merge branch 'new-feature-open-router' into new-feature-multilang
bestian Aug 24, 2025
57d39b6
Modify summarization and overview so they can generate multilingual content, work on #28
bestian Aug 24, 2025
999d92f
Make the LearnTopic stage prompt multilingual
bestian Aug 24, 2025
86e4846
Optimize JSON repair logic
bestian Aug 24, 2025
0812246
Remove the logic that triggers a retry when all topics are detected to have generic names ()
bestian Aug 24, 2025
76c1bd1
Update categorization.ts
bestian Aug 24, 2025
3a6b0e6
Revert "Update categorization.ts"
bestian Aug 24, 2025
4f7cb4b
Revert "Remove the logic that triggers a retry when all topics are detected to have generic names ()"
bestian Aug 24, 2025
4f9e85c
Revert "Optimize JSON repair logic"
bestian Aug 24, 2025
a5450a7
Revert "Make the LearnTopic stage prompt multilingual"
bestian Aug 24, 2025
6dff3ac
Make the Learn Topic stage and differences-of-opinion analysis prompts multilingual
bestian Aug 24, 2025
b0b0c29
Update README.md
bestian Aug 24, 2025
665b508
Handle the case where no new topics are learned
bestian Aug 24, 2025
bf4db41
Make the common-ground prompt multilingual as well, before realdata test
bestian Aug 24, 2025
0f95db4
Restore the markdown format hint missing from the prompt
bestian Aug 25, 2025
7eca4f0
Fix incorrect multilingual prompts
bestian Aug 25, 2025
f018228
Extract the getSubtopicSummary prompt into multilingual form
bestian Aug 25, 2025
efde9a0
Add experimental logic to extract JSON from markdown, to improve LLM call accuracy and reduce retries.
bestian Aug 25, 2025
7afff11
Use a looser check when validating whether a topic exists, to handle possible format variations
bestian Aug 25, 2025
b6d7b8c
Change "XX statements" to a multilingual template
bestian Aug 25, 2025
552cb52
Make the "moderately low alignment" part multilingual
bestian Aug 25, 2025
9b4dbe2
Align the Japanese prompts
bestian Aug 25, 2025
031fed5
Add missing functions and multilingual text
bestian Aug 25, 2025
75cc322
Make "Other" in the Summary results multilingual, before test
bestian Aug 25, 2025
6ae1481
Make the static "statements" text at the end of the report multilingual
bestian Aug 25, 2025
e214eba
Optimize the JSON repair logic of the open router model
bestian Aug 25, 2025
d6d00ce
debug the optimized JSON repair logic
bestian Aug 25, 2025
aadc45e
Modify the Learn Subtopic prompts to tell the LLM not to return an empty array or blank content
bestian Aug 25, 2025
05284f5
Revert "debug the optimized JSON repair logic"
bestian Aug 25, 2025
bed99b2
Revert "Optimize the JSON repair logic of the open router model"
bestian Aug 25, 2025
a19d2fa
modify test files and types.ts to make sure all tests in /library can…
bestian Aug 27, 2025
33fd7ce
rename csv_fixer_for_polis_tw
bestian Jan 2, 2026
ade2c0d
Merge README, work on #41
bestian Jan 2, 2026
c61a5e3
remove design and scaffold
bestian Jan 2, 2026
ad5abe1
remove design notes
bestian Jan 2, 2026
a80620e
Update model.ts
bestian Jan 2, 2026
8738a15
Update categorization.ts
bestian Jan 2, 2026
b565233
Update types.ts
bestian Jan 2, 2026
ce05680
Merge remote-tracking branch 'upstream/main' into new-feature-open-ro…
bestian Jan 2, 2026
d0f007f
Add support for Anthropic models in OpenRouter integration
audreyt Feb 9, 2026
6710f93
Merge pull request #46 from bestian/fix-anthropic-openrouter-support
bestian Feb 9, 2026
48b46ee
debug open_router_model
bestian Feb 9, 2026
c99a84b
Experimentally add MiniMax M2.5 compatibility; the return value configuration is currently wrong and needs fixing. work on #47
bestian Feb 20, 2026
64ba631
debug, MiniMax M2.5 compatibility now produces data, work on #47
bestian Feb 20, 2026
d04837a
Merge pull request #52 from bestian/new-feature-minimax-model
bestian Feb 20, 2026
589c11e
Add local GGUF/llama-server inference support via GgmlModel
audreyt Feb 21, 2026
aa5ef19
Add local LM Studio Bloom report pipeline
audreyt Mar 25, 2026
a2ff51d
Update README.md
bestian Apr 2, 2026
86c146a
Create README_ZH-TW.md
bestian Apr 2, 2026
9cdf799
Update .env.example
bestian Apr 7, 2026
7fadfec
Add Usage instructions and Example usage commands
bestian Apr 7, 2026
d0def39
Improve local report script reliability with venv-aware Python and sa…
bestian Apr 12, 2026
f28deff
Allow batch-size to be set manually, work on #57
bestian Apr 12, 2026
9364659
Fix the program's work-dir parameter handling
bestian Apr 13, 2026
8528832
Implement multilingual advanced report generation, starting with Traditional Chinese, partial
bestian Apr 19, 2026
35d5454
Ignore local report and generated files.
bestian Apr 19, 2026
e3a54dd
remove report kicker, work on #54
bestian Apr 20, 2026
a7e32af
Make web-ui and the advanced report support multiple languages.
bestian Apr 20, 2026
5d2db3d
Add German (de) language support, covering basic and advanced reports, prompts, report copy, and web-ui templates.
bestian Apr 20, 2026
7eb950c
Revert "Ignore local report and generated files."
bestian Apr 20, 2026
50323bd
Update .gitignore
bestian Apr 20, 2026
323d01f
Add an About This Fork section to the README, describing this fork's key features and the one-command report script in six languages.
bestian Apr 20, 2026
938265a
Polish the About This Fork wording and clarify that run_local_html_report.sh is dedicated to the local LM Studio workflow…
bestian Apr 20, 2026
91964d9
List all supported language codes in the --outputLang comment of run_local_html_report.sh.
bestian Apr 20, 2026
fd70020
Create report_zh-TW.html
bestian Apr 20, 2026
4b37196
Create helpers/i18n.js and convert more hard-coded English to multilingual support
bestian Apr 20, 2026
d4ca9f0
Convert more hard-coded English to multilingual support
bestian Apr 20, 2026
98e5bfb
Add a --skip-LLM parameter that skips data generation and builds the html directly to speed up testing
bestian Apr 20, 2026
ce1dd0c
Fix a typo in package.json, attempt to make visualization-library usable, partial
bestian Apr 20, 2026
5034545
Add a "post-render translation patch" mechanism, work on #55, before test
bestian Apr 20, 2026
b4fafde
Change Overview Summary generation from 3 retries to 6 retries to improve the multilingual report success rate
bestian Apr 20, 2026
3da600e
ignore /tmp/local-report files for local report
bestian Apr 20, 2026
b3df233
Update report_zh-TW.html
bestian Apr 20, 2026
851968c
Set the script's context-length to 16384 to avoid input context overflow errors
bestian Apr 20, 2026
bbb5105
At the system level, forbid the LLM from returning its reasoning and restrict it to returning only the result.
bestian Apr 20, 2026
1652d13
Parse and normalize the Overview return value to avoid errors
bestian Apr 20, 2026
a361890
Create report_de.html
bestian Apr 20, 2026
c12fa42
"Bloom Civic AI Report" => "Sensemaker Report"
bestian Apr 20, 2026
b7ff426
"Bloom Civic AI Report" => "Sensemaker Report"
bestian Apr 20, 2026
b7cd200
Make the number of concurrent requests adjustable (default is 1) so that LM Studio is not overloaded into timeouts
bestian Apr 21, 2026
4f3640d
Increase --context-length to 131072 to meet large-data needs and avoid overload
bestian Apr 21, 2026
f23d73a
Update .env.example
bestian Apr 21, 2026
0e7fa66
Refactor Overview: restrict the LLM response to JSON, then parse it
bestian Apr 21, 2026
a185aa8
Add a program for testing Overview on its own
bestian Apr 21, 2026
8b5e1a7
Update report.html
bestian Apr 21, 2026
c4cf0d7
Create report_ja.html
bestian Apr 21, 2026
96c8ff0
Create PR.md
bestian Apr 21, 2026
8a9aef9
Create report_fr.html
bestian Apr 22, 2026
4111f12
Create report_es.html
bestian Apr 22, 2026
9348b71
Create report_zh-CN.html
bestian Apr 22, 2026
b5fb897
Add a one-command OpenRouter HTML report generation flow
bestian Apr 25, 2026
13 changes: 13 additions & 0 deletions .env.example
@@ -0,0 +1,13 @@
# OpenRouter API Configuration
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=openai/gpt-oss-120b

# Optional: Custom headers for OpenRouter
OPENROUTER_X_TITLE=Sensemaking Tools

# LM Studio Configuration
LM_STUDIO_API_KEY=
LM_STUDIO_BASE_URL=http://127.0.0.1:1234/v1
LM_STUDIO_MODEL=nvidia/nemotron-3-nano-4b
LM_STUDIO_CONCURRENCY=4
7 changes: 7 additions & 0 deletions .gitignore
@@ -10,5 +10,12 @@ private_*
GEMINI.md
todo.md
*.env
/files
.vscode
*.pkl
*.tgz


# /tmp files for local report
/tmp/local-report/
/tmp/open-router-report/
98 changes: 98 additions & 0 deletions PR.md
@@ -0,0 +1,98 @@
# [Experimental / Work in Progress] Local LM Studio runner, one‑shot script, and multi‑language report output

Hi Jigsaw team! First of all, thank you for open‑sourcing Sensemaker — it has been an excellent foundation to build on.

This PR shares the **current state of an experimental fork** that is still actively evolving on the [`new-feature-open-router-ggml`](https://github.com/bestian/sensemaking-tools/tree/new-feature-open-router-ggml) branch. We are opening it now as a **work‑in‑progress PR** so that the design direction is visible upstream early, and so that interested maintainers and community members can review, comment, or cherry‑pick ideas while we continue iterating. We are **not** expecting this to be merged as‑is; the goal is to start a conversation and gather feedback.

The fork extends Sensemaker so that users can run the full pipeline against a **local LM Studio** instance (in addition to the default VertexAI / Gemini setup and our existing OpenRouter runners), and produce a complete interactive HTML report from a Polis export with a single command, in the language of their choice.

## Merge status — not mergeable as‑is

To set expectations up front: **this branch currently conflicts with upstream `main` in dozens of files** and therefore **cannot be merged directly**. Many of the changes here have co‑evolved over several iterations (local LM Studio runner, OpenRouter runners, multi‑language web UI, the Overview JSON refactor, etc.), and they overlap with files that upstream has also updated.

Resolving these conflicts cleanly will require careful, file‑by‑file review and integration. We are **not asking upstream to merge this PR as a single unit**, and we do not expect an immediate merge. The primary purpose is to make the direction visible and to gather feedback.

## Scope of this PR

This PR intentionally keeps changes additive, apart from one internal change to the *Overview* step:

- New CLI entry points and a shell orchestrator for the local LM Studio workflow.
- Multi‑language support across the pipeline and the web UI templates.
- An internal change to how the *Overview* section is generated (Markdown → JSON) to improve multilingual stability.

The existing VertexAI / Gemini CLI runners and public APIs are left untouched. The one exception is the internal *Overview* summarization step, whose prompt/parsing has been reworked (Markdown → JSON then rendered on our side); its visible output stays equivalent to the previous Markdown pipeline.

## What is included (current progress)

### 1. Local LM Studio model support
A new advanced runner, `library/runner-cli/advanced_runner_lmstudio.ts`, drives the full Sensemaker pipeline (topic identification, categorization, summarization, report data) against an OpenAI‑compatible LM Studio server. This enables fully local, cost‑free runs with user‑selectable open‑weight models (e.g. `nvidia/nemotron-3-nano-4b` and similar GGUF models hosted by LM Studio).
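
As a rough illustration of what "OpenAI‑compatible" means here, the sketch below builds a request for the server's `/chat/completions` endpoint. The function `buildChatRequest` and its types are hypothetical names for this example, not code from `advanced_runner_lmstudio.ts`; the base URL and model name are the defaults from `.env.example`.

```typescript
// Illustrative sketch only: buildChatRequest and its types are hypothetical,
// not code from advanced_runner_lmstudio.ts.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey = "",
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // .env.example leaves LM_STUDIO_API_KEY empty, so the header is optional here.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

const request = buildChatRequest(
  "http://127.0.0.1:1234/v1",
  "nvidia/nemotron-3-nano-4b",
  [{ role: "user", content: "Summarize the comments on topic X." }],
);
// With a server running: const response = await fetch(request.url, request.init);
```

Because only the base URL, key, and model name change, the same request shape should also work against other OpenAI‑compatible hosts.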

### 2. Single end‑to‑end script
`run_local_html_report.sh` at the repo root turns a Polis export URL into a finished, self‑contained HTML report in one command. It handles:

- Downloading the Polis export CSVs.
- Converting them to Sensemaker’s input format.
- (Optionally) reloading the local LM Studio model with safe context/parallel settings via the `lms` CLI.
- Running the advanced LM Studio runner to produce the three JSON artifacts (topics, summary, comments with scores).
- Building the web UI and bundling a single shareable HTML file.

It also supports a `--skip-LLM` flag to rebuild the HTML from previously generated JSON without re‑invoking the model — useful for iterating on the report template.

See the header of [`run_local_html_report.sh`](https://github.com/bestian/sensemaking-tools/blob/new-feature-open-router-ggml/run_local_html_report.sh) for the full list of options and a worked example.

### 3. Multi‑language report generation
The pipeline exposes an `--outputLang` option (propagated down to the LLM prompts) so reports can be generated directly in the target language instead of relying on post‑hoc translation. Currently supported:

- `en` (English)
- `zh-TW` (繁體中文)
- `zh-CN` (简体中文)
- `fr` (Français)
- `es` (Español)
- `ja` (日本語)
- `de` (Deutsch)

### 4. Multi‑language web UI templates
The report web UI (under `web-ui/`) has been updated so that hard‑coded English UI labels are replaced with a localization layer keyed off the same `--outputLang` value used by the runner. This keeps the generated data and the surrounding UI chrome consistent in one language.
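
A minimal sketch of such a localization layer follows, assuming a flat label table with English as the fallback. All names (`SUPPORTED_LANGS`, `resolveLang`, `t`) and the sample label strings are hypothetical and do not come from `web-ui/`.

```typescript
// Illustrative sketch: names and label strings are hypothetical, not taken from web-ui/.
const SUPPORTED_LANGS = ["en", "zh-TW", "zh-CN", "fr", "es", "ja", "de"] as const;
type OutputLang = (typeof SUPPORTED_LANGS)[number];

// Each label may be translated for only some languages.
const LABELS: Record<string, Partial<Record<OutputLang, string>>> = {
  topics: {
    en: "Topics",
    "zh-TW": "主題",
    "zh-CN": "主题",
    fr: "Sujets",
    es: "Temas",
    ja: "トピック",
    de: "Themen",
  },
  statements: { en: "statements", de: "Aussagen" },
};

// Map an arbitrary --outputLang value onto a supported code, defaulting to English.
function resolveLang(value: string | undefined): OutputLang {
  return (SUPPORTED_LANGS as readonly string[]).includes(value ?? "")
    ? (value as OutputLang)
    : "en";
}

// Look up a label: requested language first, then English, then the key itself.
function t(key: string, lang: OutputLang): string {
  const entry = LABELS[key];
  return entry?.[lang] ?? entry?.en ?? key;
}

const lang = resolveLang("de");
const heading = t("topics", lang);
```

Falling back to English, and then to the key, means a missing translation degrades to readable text rather than an empty label.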

### 5. More stable Overview Summary via JSON‑only LLM responses
In multilingual runs we observed that asking the model to emit Markdown for the Overview section was a frequent source of parsing/formatting regressions — especially for CJK languages and for smaller local models. This PR refactors the Overview generation so that:

- The LLM is instructed (including at the system‑prompt level) to return **structured JSON only**, not Markdown and not chain‑of‑thought.
- The runner parses and normalizes that JSON, then renders the final Markdown / HTML deterministically on our side.
- Retry count for the Overview step was increased (3 → 6) to further improve multilingual success rates.

This has noticeably reduced malformed‑output failures across the supported languages while keeping the visible output equivalent to the previous Markdown pipeline.
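
The parse, normalize, and render steps could look roughly like the sketch below. The JSON shape (an array of `{ topic, summary }` objects), the rendered Markdown format, and every function name are assumptions made for illustration; the fork's actual schema may differ.

```typescript
// Illustrative sketch: the JSON shape and function names are assumptions,
// not the fork's actual schema.
interface OverviewItem {
  topic: string;
  summary: string;
}

// Models sometimes wrap JSON in a Markdown code fence; strip it if present.
// \u0060 is the backtick character.
function extractJson(raw: string): string {
  const fenced = raw.match(/\u0060{3}(?:json)?\s*([\s\S]*?)\u0060{3}/);
  return (fenced?.[1] ?? raw).trim();
}

// Parse and validate the response, trimming stray whitespace from each field.
function parseOverview(raw: string): OverviewItem[] {
  const data: unknown = JSON.parse(extractJson(raw));
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("Overview response must be a non-empty JSON array");
  }
  return data.map((item: unknown) => {
    const { topic, summary } = (item ?? {}) as Record<string, unknown>;
    if (typeof topic !== "string" || typeof summary !== "string") {
      throw new Error("Overview item is missing topic or summary");
    }
    return { topic: topic.trim(), summary: summary.trim() };
  });
}

// Render deterministically on the caller's side, independent of the model.
function renderOverview(items: OverviewItem[]): string {
  return items.map((item) => `* **${item.topic}**: ${item.summary}`).join("\n");
}

// Ask the model again whenever the output fails to parse or validate.
async function overviewWithRetries(
  callModel: () => Promise<string>,
  maxRetries = 6,
): Promise<string> {
  let lastError: unknown = new Error("maxRetries must be at least 1");
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return renderOverview(parseOverview(await callModel()));
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Keeping the Markdown rendering out of the model's hands means a formatting slip in the response becomes a parse error that triggers a retry, rather than a malformed report.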

## Roadmap / next steps (not in this PR)

We plan to continue iterating on the branch. The next items we intend to work on:

1. **`library/runner-cli/advanced_runner_openrouter.ts`** — an advanced runner analogous to the LM Studio one, targeting OpenRouter so users can pick any hosted model with a unified CLI.
2. **Unified one‑shot script for OpenRouter** — either a sibling `run_openrouter_html_report.sh` or, preferably, extending `run_local_html_report.sh` with a `--provider {lmstudio,openrouter}` (or similar) switch so both backends share the same orchestration.

Additional follow‑ups we are considering: consolidating the existing OpenRouter runners under the same configuration surface, and expanding evaluations for non‑English outputs.

## Status and how to try it

- Branch: [`new-feature-open-router-ggml`](https://github.com/bestian/sensemaking-tools/tree/new-feature-open-router-ggml) on [`bestian/sensemaking-tools`](https://github.com/bestian/sensemaking-tools/).
- Upstream reference for the report webpage feature this builds on: [README § Generating a Report](https://github.com/Jigsaw-Code/sensemaking-tools/?tab=readme-ov-file#generating-a-report---get-a-webpage-presentation-of-the-report).
- Quick start (with LM Studio running locally):

```bash
bash ./run_local_html_report.sh \
--export-base-url "https://<your-polis-export-base-url>" \
--report-title "Sensemaker Report" \
--model "nvidia/nemotron-3-nano-4b" \
--lmstudio-base-url "http://127.0.0.1:1234/v1" \
--outputLang "zh-TW"
```

## What we are looking for

Because this is still experimental and the branch is moving, we are **not asking for a merge right now**. Instead, we would greatly appreciate:

- High‑level feedback on whether the LM Studio / OpenRouter direction is something upstream would like to eventually host (vs. living in a fork).
- Input on the preferred integration shape — e.g. separate runners vs. a pluggable `Model`‑level backend, and one script vs. per‑provider scripts.
- Thoughts on the Overview‑as‑JSON change, and whether a similar approach would be welcome for other summarization steps to improve multilingual robustness.

Thanks again for the project and for considering this contribution!