Jwalin Shah jwalin-shah

AI Systems Engineer — evaluation · grounded reasoning · on-device inference. Currently a Research Contributor at Sentient Arena (Cohort 0).

↗ portfolio · ↗ email · ↗ linkedin

Selected work

tensor-logic — working through Domingos (2025). A 3-scalar tensor-logic recurrence vs. a 71M-parameter MLP, same task. mean F1 0.975 vs 0.331 · biggest graph 1,532 nodes (sympy) · zero-shot to real Python imports. Honestly documented limits — parity remains unlearnable.

officeqa-arena — grounded financial QA, Sentient Arena. 184.5/246 (75.0%) · $1.71 total · 9 architectural generations. Headline finding: shell grep on raw TXT beat an 11GB SQLite + 10-component consensus pipeline. 48% of failures = wrong table/row/column extraction.

jarvis-ai-assistant — privacy-first iMessage assistant on an 8GB M2 Air. mean draft 0.42s · p95 1.15s · retrieval Hit@5 0.88 · hallucination gate 96.2% pass. MLX-native, zero cloud dependencies. Evaluated 37 model configs.

openhuman — open-source agentic desktop assistant (contributor). GNU · macOS · Windows · Linux · 247★ · 36 forks. Local-first KB (Neocortex), background self-learning loops (Subconscious), screen intelligence, inline autocomplete + voice, all on device.

Background


Sentient Arena	Research Contributor (Cohort 0)	grounded financial reasoning · eval infra · failure-mode analysis
Skild AI	Data Operations Lead	robotics data systems · 5 platforms · 25+ operators · task success +40%, overhead −50%

Focus

grounded LLM reasoning · evaluation harnesses · deterministic computation · tool-augmented agents · hallucination measurement · on-device inference (MLX) · privacy-first architectures

Reach me

best for research collabs, eval & reliability work, on-device AI. ✉️ jwalinshah13@gmail.com · 💼 linkedin · 🌐 portfolio
