16 changes: 8 additions & 8 deletions README.md
@@ -47,7 +47,7 @@ Open-source AI assistant for ERPNext. Ask business questions in plain English an

8. **Built-In Support Tab** — A dedicated support interface is included within the changAI interface for raising support queries directly from your ERPNext desk, without needing to leave the app or contact support through an external channel.

9. **Module-Wise Training Data Automation** — changAI includes tools to auto-generate training data on a per-module basis across your ERPNext installation. You can select individual modules such as Accounts, Inventory, or HR and generate targeted training data for each, allowing the model's retrieval accuracy to improve incrementally without needing to retrain everything at once.
9. **Module-Wise Training Data Automation** — changAI includes tools to auto-generate training data on a per-module basis across your ERPNext setup. You can select individual modules such as Accounts, Inventory, or HR and generate targeted training data for each, allowing the model's retrieval accuracy to improve incrementally without needing to retrain everything at once.

10. **Fine-Tuned Embedding Model** — changAI uses a custom fine-tuned embedding model built on nomic-embed-text-v1.5, specifically trained on ERPNext schema and retrieval data for better semantic matching.

@@ -77,11 +77,11 @@ Open-source AI assistant for ERPNext. Ask business questions in plain English an
- Qwen3 via Replicate (Remote Mode) — Used for both schema retrieval and SQL generation in the fully hosted pipeline.
- Anthropic Claude — Used optionally for schema enrichment. Provide a Claude API key to let changAI analyse your ERPNext customisations and update its understanding of your specific environment.
- Amazon Polly — Optional voice output engine. Converts query results to speech when the voice assistant feature is enabled.
- RAG (Retrieval-Augmented Generation) — Core architecture for grounding SQL generation in relevant schema context before passing to the language model.
- RAG (Retrieval-Augmented Generation) — Core approach for grounding SQL generation in relevant schema context before passing to the language model.
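The retrieve-then-generate flow named in the RAG entry can be pictured with a minimal sketch: rank schema snippets by cosine similarity to the question's embedding, keep the top k, and place them in the prompt ahead of the question. The helper names, the toy 3-dimensional vectors, the snippet text, and the prompt layout are all assumptions for illustration, not changAI's implementation. In practice the vectors would come from the embedding model, and an index like the `index.faiss` stores touched by this PR would typically replace the linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    snippets: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the text of the k snippets most similar to the query vector."""
    ranked = sorted(snippets, key=lambda s: cosine(query_vec, s[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Hypothetical prompt layout: retrieved schema first, then the question."""
    schema_block = "\n".join(f"- {c}" for c in context)
    return f"Relevant schema:\n{schema_block}\n\nQuestion: {question}\nSQL:"


# Toy snippets and "embeddings" standing in for real model output.
snippets = [
    ("tabSales Invoice(customer, grand_total, posting_date)", [0.9, 0.1, 0.0]),
    ("tabPurchase Invoice(supplier, grand_total, posting_date)", [0.1, 0.9, 0.0]),
    ("tabEmployee(employee_name, department)", [0.0, 0.1, 0.9]),
]
context = retrieve_top_k([1.0, 0.2, 0.0], snippets, k=1)
print(build_prompt("Total sales this month?", context))
```

Only the retrieved snippet reaches the model, which is what keeps the generated SQL grounded in tables that actually exist.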

**Frontend**

- [Frappe Desk](https://frappeframework.com) — The ERPNext desk UI framework used to render the changAI interface. Provides the Chat, Debug, and Support tabs as native Frappe pages without requiring a separate frontend build or deployment.
- [Frappe Desk](https://frappeframework.com) — The ERPNext desk UI framework used to render the changAI interface. Provides the Chat, Debug, and Support tabs as native Frappe pages without requiring a separate frontend build or hosting setup.
- JavaScript — Used for client-side interactions within the Frappe Desk interface, including query submission, tab switching, and rendering pipeline debug output.

**Dataset**
@@ -107,9 +107,9 @@ The free tier is the fastest way to get started. Generate your API key at [aistu

**Enterprise Tier — Vertex AI (recommended for production)**

For high-volume or production deployments, Vertex AI provides a more scalable and reliable backend. Set up your Google Cloud environment following the [Vertex AI getting started guide](https://cloud.google.com/vertex-ai/docs/start/cloud-environment), then enter the corresponding credentials in changAI Settings.
For high-volume or production use, Vertex AI provides a more scalable and reliable backend. Set up your Google Cloud environment following the [Vertex AI getting started guide](https://cloud.google.com/vertex-ai/docs/start/cloud-environment), then enter the corresponding credentials in changAI Settings.

**Step 3 — Choose a Deployment Mode**
**Step 3 — Choose a Mode**

In addition to the Gemini configuration, changAI supports a Remote Mode that offloads the full pipeline to Replicate.

@@ -156,7 +156,7 @@ This step is mandatory. changAI needs to index your master tables before it can

**Step 7 — Sync Schema (Optional)**

changAI ships pre-configured with the standard ERPNext schema, so core modules work immediately after installation without any additional mapping. If your ERPNext instance has custom doctypes, custom fields, or significant workflow customisations, you can enrich the AI's understanding of your specific environment.
changAI ships pre-configured with the standard ERPNext schema, so core modules work immediately after setup without any additional mapping. If your ERPNext instance has custom doctypes, custom fields, or significant workflow customisations, you can enrich the AI's understanding of your specific environment.

To do this, enter an [Anthropic Claude API key](https://console.anthropic.com/) in the Remote tab of changAI Settings, then click **Update Schema** in the Training tab. changAI will analyse your customisations and incorporate them into its schema context.

@@ -212,10 +212,10 @@ changAI supports ERPNext v15, and v16 on Ubuntu with Python 3.14 or higher.
**Note** - Python 3.14 requires running `sudo apt-get install build-essential python3-dev` before `bench get-app`.

**Which modules does changAI cover out of the box?**
changAI ships pre-configured with the standard ERPNext schema, so modules like Accounts, Inventory, Purchasing, Sales, and HR work immediately after installation without any additional mapping. Custom doctypes and fields require a schema sync using an Anthropic Claude API key.
changAI ships pre-configured with the standard ERPNext schema, so modules like Accounts, Inventory, Purchasing, Sales, and HR work immediately after setup without any additional mapping. Custom doctypes and fields require a schema sync using an Anthropic Claude API key.

**Should I use the free Gemini tier or Vertex AI?**
The free tier available at Google AI Studio is well suited for testing and low-volume usage. For production deployments with higher query volumes or stricter reliability requirements, Vertex AI is recommended.
The free tier available at Google AI Studio is well suited for testing and low-volume usage. For production use with higher query volumes or stricter reliability requirements, Vertex AI is recommended.

**Should I use Local Mode or Remote Mode?**
Use Local Mode if you want schema retrieval to stay on your own server and use Gemini for SQL generation. Use Remote Mode if you prefer a fully hosted pipeline through Replicate using Qwen3 with no local model dependency.
53 changes: 4 additions & 49 deletions changai/changai/Datasets_2_v1/meta.json
@@ -162,7 +162,7 @@
"module": "Setup",
"description": "Legal Entity / Subsidiary with a separate Chart of Accounts belonging to the Organization.",
"fields": [
"details",
"details",
"company_name",
"abbr",
"default_currency",
@@ -227,7 +227,6 @@
"dashboard_tab"
]
},

"Serial and Batch Bundle": {
"module": "Stock",
"description": "Standard ERPNext doctype for Serial and Batch Bundle",
@@ -872,7 +871,7 @@
"connections_tab"
]
},
"Sales Invoice" : {
"Sales Invoice": {
"module": "Accounts",
"description": "Standard ERPNext doctype for Sales Invoice",
"fields": [
@@ -1879,7 +1878,7 @@
"is_standard"
]
},
"Purchase Invoice" : {
"Purchase Invoice": {
"module": "Accounts",
"description": "Standard ERPNext doctype for Purchase Invoice",
"fields": [
@@ -2099,7 +2098,6 @@
"payment_request_outstanding"
]
},

"Asset Capitalization": {
"module": "Assets",
"description": "Standard ERPNext doctype for Asset Capitalization",
@@ -3482,7 +3480,6 @@
"status",
"column_break_112",
"per_installed",
"installation_status",
"column_break_89",
"per_returned",
"transporter_info",
@@ -3529,7 +3526,7 @@
"connections_tab"
]
},
"Quotation" : {
"Quotation": {
"module": "Selling",
"description": "Standard ERPNext doctype for Quotation",
"fields": [
@@ -11528,34 +11525,6 @@
"append_emails_to_sent_folder"
]
},
"Installation Note": {
"module": "Selling",
"description": "Standard ERPNext doctype for Installation Note",
"fields": [
"installation_note",
"column_break0",
"naming_series",
"customer",
"customer_address",
"contact_person",
"customer_name",
"address_display",
"contact_display",
"contact_mobile",
"contact_email",
"territory",
"customer_group",
"column_break1",
"inst_date",
"inst_time",
"status",
"company",
"amended_from",
"remarks",
"item_details",
"items"
]
},
"Maintenance Visit": {
"module": "Maintenance",
"description": "Standard ERPNext doctype for Maintenance Visit",
@@ -12002,20 +11971,6 @@
"dropbox_access_token"
]
},
"Installation Note Item": {
"module": "Selling",
"description": "Standard ERPNext doctype for Installation Note Item",
"fields": [
"item_code",
"serial_and_batch_bundle",
"serial_no",
"qty",
"description",
"prevdoc_detail_docname",
"prevdoc_docname",
"prevdoc_doctype"
]
},
"Account Closing Balance": {
"module": "Accounts",
"description": "Standard ERPNext doctype for Account Closing Balance",
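Each `meta.json` entry in the hunks above maps a doctype name to a `module`, a `description`, and a list of `fields`. Assuming that shape, grouping doctypes by module is one straightforward way to pick the doctypes for module-wise training data generation. The function name and the trimmed sample below are illustrative only (field lists are left empty for brevity); in real use the dictionary would be loaded from `meta.json` with `json.load`.

```python
from collections import defaultdict
from typing import Dict, List


def doctypes_by_module(meta: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group doctype names by their "module" key; entries without one go under "Unknown"."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doctype, info in meta.items():
        groups[info.get("module", "Unknown")].append(doctype)
    # Sort modules and doctype names so the output is stable between runs.
    return {module: sorted(names) for module, names in sorted(groups.items())}


# Trimmed sample in the same shape as meta.json.
sample = {
    "Sales Invoice": {"module": "Accounts", "description": "Standard ERPNext doctype for Sales Invoice", "fields": []},
    "Purchase Invoice": {"module": "Accounts", "description": "Standard ERPNext doctype for Purchase Invoice", "fields": []},
    "Quotation": {"module": "Selling", "description": "Standard ERPNext doctype for Quotation", "fields": []},
    "Serial and Batch Bundle": {"module": "Stock", "description": "Standard ERPNext doctype for Serial and Batch Bundle", "fields": []},
}

grouped = doctypes_by_module(sample)
print(grouped["Accounts"])  # ['Purchase Invoice', 'Sales Invoice']
```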
22 changes: 11 additions & 11 deletions changai/changai/api/v2/build_cards_faiss_index_v2.py
@@ -11,6 +11,8 @@
from changai.changai.api.v2.retrieve import get_embedding_engine
import os
import pickle
from changai.changai.api.v2.non_erp_handler import _safe_open_path


def get_app_fvs_base():
return os.path.join(
@@ -226,11 +228,9 @@ def clean_schema(schema: Dict[str, Any], output_path: str):
field for field in fields
if field.get("name") not in GENERIC_FIELDS
]
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(output_path, "w") as f:
yaml.dump(schema, f, allow_unicode=True, sort_keys=False)

print(f"Cleaned schema written to {output_path}")
allowed_dir = str(Path(output_path).parent.resolve())
safe = _safe_open_path(output_path, allowed_dir)
safe.write_text(yaml.dump(schema, allow_unicode=True, sort_keys=False), encoding="utf-8")


def build_schema_docs(schema: Dict[str, Any]) -> List[Document]:
@@ -427,12 +427,12 @@ def save_field_matrix(schema_docs, base_dir):
safe_dir.mkdir(parents=True, exist_ok=True)

np.save(safe_dir / "field_embs.npy", embs)
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(safe_dir / "field_docs.pkl", "wb") as f:
pickle.dump(schema_docs, f)
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(safe_dir / "table_to_idx.pkl", "wb") as f:
pickle.dump(table_to_idx, f)
allowed_dir = str(safe_dir)
safe_docs = _safe_open_path(str(safe_dir / "field_docs.pkl"), allowed_dir)
safe_docs.write_bytes(pickle.dumps(schema_docs))

safe_idx = _safe_open_path(str(safe_dir / "table_to_idx.pkl"), allowed_dir)
safe_idx.write_bytes(pickle.dumps(table_to_idx))


def build_schema_fvs_job():
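`save_field_matrix` persists three artifacts: the field embedding matrix, the pickled field documents, and a `table_to_idx` lookup. The diff does not show how `table_to_idx` is built. Assuming it maps each table name to the row indices of that table's fields in the embedding matrix (row `i` embedding document `i`), a sketch could look like this; `FieldDoc` and `build_table_to_idx` are hypothetical stand-ins for the `Document` objects returned by `build_schema_docs` and for whatever helper the module really uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldDoc:
    """Hypothetical stand-in for one field document; row i of the matrix embeds docs[i]."""
    table: str
    field: str


def build_table_to_idx(docs: List[FieldDoc]) -> Dict[str, List[int]]:
    """Map each table name to the matrix row indices of its fields, in document order."""
    table_to_idx: Dict[str, List[int]] = {}
    for i, doc in enumerate(docs):
        table_to_idx.setdefault(doc.table, []).append(i)
    return table_to_idx


docs = [
    FieldDoc("Sales Invoice", "customer"),
    FieldDoc("Company", "company_name"),
    FieldDoc("Sales Invoice", "posting_date"),
    FieldDoc("Company", "abbr"),
]
print(build_table_to_idx(docs))
# {'Sales Invoice': [0, 2], 'Company': [1, 3]}
```

Storing the mapping next to the matrix lets retrieval score only one table's rows without rescanning the documents, which is why the three files must stay in sync and be rebuilt together.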
Binary file modified changai/changai/api/v2/fvs_stores/erpnext/report_fvs/index.faiss
Binary file not shown.
Binary file modified changai/changai/api/v2/fvs_stores/erpnext/table_fvs/index.faiss
Binary file not shown.
31 changes: 19 additions & 12 deletions changai/changai/api/v2/non_erp_handler.py
@@ -4,6 +4,7 @@
import json
import time
import threading
from pathlib import Path
import pickle
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Any
@@ -23,9 +24,19 @@ class ResponseEntry:
priority: int = 100
is_active: bool = True

def _safe_open_path(requested_path: str, allowed_dir: str) -> Path:
"""Resolve path and ensure it stays within allowed_dir."""
allowed = Path(allowed_dir).resolve()
resolved = Path(requested_path).resolve()
if not str(resolved).startswith(str(allowed)):
raise ValueError(f"Path traversal blocked: {requested_path}")
return resolved

class IntelligentStaticResponder:
def __init__(self, json_file: str, alias_path: str):
self._allowed_dir = os.path.join(
frappe.get_app_path("changai"), "changai", "api", "v2", "assets"
)
t0 = time.time()

self.json_file = json_file
@@ -39,9 +50,8 @@ def __init__(self, json_file: str, alias_path: str):
self._arabic_detect_re = re.compile(r"[\u0600-\u06FF]")

t1 = time.time()
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(alias_path, "r", encoding="utf-8") as f:
alias_map = json.load(f)
safe = _safe_open_path(alias_path, self._allowed_dir)
alias_map = json.loads(safe.read_text(encoding="utf-8"))
print(f"[non_erp] alias json load: {time.time() - t1:.4f}s")

t2 = time.time()
@@ -127,9 +137,8 @@ def _build_from_json(self) -> None:
self.entries.clear()
self.responses_by_key.clear()
self.keys.clear()
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(self.json_file, "r", encoding="utf-8") as f:
rows = json.load(f)
safe = _safe_open_path(self.json_file, self._allowed_dir)
rows = json.loads(safe.read_text(encoding="utf-8"))

processed_rows = []

@@ -178,17 +187,15 @@ def _write_pickle_cache(self, cache_path: str) -> None:
rows = getattr(self, "_processed_rows_for_pickle", None)
if rows is None:
return
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(cache_path, "wb") as f:
pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
safe = _safe_open_path(cache_path, self._allowed_dir)
safe.write_bytes(pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL))

def _load_from_pickle(self, cache_path: str) -> None:
self.entries.clear()
self.responses_by_key.clear()
self.keys.clear()
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(cache_path, "rb") as f: # nosemgrep: cache_path derived from self.json_file, validated in __init__
rows = pickle.load(f)
safe = _safe_open_path(cache_path, self._allowed_dir)
rows = pickle.loads(safe.read_bytes())

for row in rows:
entry = ResponseEntry(
17 changes: 9 additions & 8 deletions changai/changai/api/v2/retrieve.py
@@ -19,6 +19,8 @@
publish_pipeline_update,
_safe_join,
)
from changai.changai.api.v2.non_erp_handler import _safe_open_path


from changai.changai.api.v2.clients import (
_post_json,
@@ -113,8 +115,9 @@ def load_field_matrix():

app_root = Path(frappe.get_app_path("changai")).resolve()
schema_rel = "changai/api/v2/fvs_stores/erpnext/emb_dir"
# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
schema_path = _safe_join(app_root, schema_rel)
schema_path = _safe_join(app_root, schema_rel) # already validates traversal

allowed_dir = str(schema_path) # all files must live here

embs_path = schema_path / "field_embs.npy"
docs_path = schema_path / "field_docs.pkl"
@@ -123,13 +126,11 @@ def load_field_matrix():
if not embs_path.exists():
frappe.throw(f"Missing field_embs.npy. Rebuild schema FVS first: {embs_path}")

# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(docs_path, "rb") as f:
docs = pickle.load(f)
safe_docs = _safe_open_path(str(docs_path), allowed_dir)
docs = pickle.loads(safe_docs.read_bytes())

# nosemgrep: frappe-semgrep-rules.rules.security.frappe-security-file-traversal
with open(table_idx_path, "rb") as f:
table_to_idx = pickle.load(f)
safe_table_idx = _safe_open_path(str(table_idx_path), allowed_dir)
table_to_idx = pickle.loads(safe_table_idx.read_bytes())

embs = np.load(embs_path, mmap_mode="r")

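As far as the visible `load_field_matrix` hunk shows, only `field_embs.npy` gets an explicit existence check; a missing `field_docs.pkl` or `table_to_idx.pkl` would surface later as a `FileNotFoundError` from `read_bytes`. One option, sketched here with hypothetical helper names, is to check every artifact up front and report all missing files in one message. The file names are the ones used in the diff; inside Frappe, `frappe.throw` would take the place of the plain exception.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# File names taken from save_field_matrix / load_field_matrix in the diff.
REQUIRED_ARTIFACTS = ("field_embs.npy", "field_docs.pkl", "table_to_idx.pkl")


def missing_artifacts(schema_dir: Path, names: Iterable[str] = REQUIRED_ARTIFACTS) -> List[str]:
    """Return the required artifact names that are not regular files in schema_dir."""
    return [name for name in names if not (schema_dir / name).is_file()]


def require_artifacts(schema_dir: Path) -> None:
    """Raise one error listing every missing artifact before any loading starts."""
    missing = missing_artifacts(schema_dir)
    if missing:
        raise FileNotFoundError(
            f"Missing {', '.join(missing)} in {schema_dir}. Rebuild schema FVS first."
        )


with tempfile.TemporaryDirectory() as d:
    schema_dir = Path(d)
    (schema_dir / "field_embs.npy").write_bytes(b"")  # only the matrix exists
    print(missing_artifacts(schema_dir))  # ['field_docs.pkl', 'table_to_idx.pkl']
```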
2 changes: 1 addition & 1 deletion changai/changai/api/v2/store_chats.py
@@ -27,7 +27,7 @@ def to_json_if_needed(v: Any) -> Any:
MAX_LOG_LEN = 140
doc = frappe.new_doc("ChangAI Logs")
doc.user_question = user_question
safe_question=(formatted_q[:137] + "..." if len(formatted_q) > MAX_LOG_LEN else formatted_q)
safe_question=(formatted_q[:137] + "..." if formatted_q and len(formatted_q) > MAX_LOG_LEN else formatted_q or "")
doc.rewritten_question = safe_question
doc.schema_retrieved = to_json_if_needed(context)
doc.sql_generated = to_json_if_needed(sql)
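The updated `safe_question` expression guards against a `None` value before calling `len`, and falls back to an empty string. The same behaviour fits in a small helper, sketched below; `truncate_for_log` is a hypothetical name, and the 140-character limit with a three-character ellipsis mirrors `MAX_LOG_LEN` and the `[:137] + "..."` slice in the diff. Deriving the cut point from `max_len` avoids keeping the hard-coded `137` in sync by hand.

```python
from typing import Optional

MAX_LOG_LEN = 140
ELLIPSIS = "..."


def truncate_for_log(text: Optional[str], max_len: int = MAX_LOG_LEN) -> str:
    """Return text unchanged if it fits, else cut it so the result is exactly max_len chars.

    None and "" both become "", matching the `formatted_q or ""` fallback in the diff.
    """
    if not text:
        return ""
    if len(text) <= max_len:
        return text
    return text[: max_len - len(ELLIPSIS)] + ELLIPSIS


print(repr(truncate_for_log(None)))        # ''
print(len(truncate_for_log("q" * 200)))    # 140
print(truncate_for_log("Top 5 customers"))  # Top 5 customers
```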
5 changes: 3 additions & 2 deletions changai/changai/api/v2/text2sql_pipeline_v2.py
@@ -52,7 +52,6 @@
)
from changai.changai.api.v2.format_output import (
format_data

)
from changai.changai.api.v2.clients import call_model,gemini_client
from changai.changai.api.v2.non_erp_handler import non_erp_response
@@ -1024,11 +1023,13 @@ def run_text2sql_pipeline(user_question: str, chat_id: str, request_id: str, sen
"entity_raw": final.get("entity_raw"),
"question_rewritten": formatted_q
}
formatted_q = formatted_q or ""

if final.get("stop_followup"):
save_turn_2(session_id=chat_id, user_text=user_question, bot_text=final.get("message"),type_="non_erp")
save_logs(
user_question=user_question,
formatted_q=None,
formatted_q="",
context=None,
sql=None,
val=None,