This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Monorepo with two independent subprojects:
- expo-app/ — React Native (Expo, TypeScript) app that lets users input Spanish words, groups them by semantic similarity, and finds related "secret code" words using FastText embeddings.
- asset-parser/ — Python scripts that download the Spanish word list from OpenSLR, extract the top 50K words, generate 300-dimensional FastText embeddings, and copy the result into expo-app's assets.
```bash
cd asset-parser
python -m venv venv && source venv/bin/activate # first time
pip install -r requirements.txt # first time
python main.py # download + generate top_spanish_words.json
python generate_embeddings.py # generate embeddings_N.json chunks + copy to expo-app/assets/
python generate_hypernyms.py # generate hypernyms.json + copy to expo-app/assets/
python generate_cultural_relations.py # download ConceptNet + generate cultural_relations.json + copy to expo-app/assets/
```

`get_words.py` produces `top_spanish_words.json`, which both `generate_embeddings.py` and `generate_hypernyms.py` consume.
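The sketch below shows the "keep the top 50K words" step. It assumes the downloaded OpenSLR list has one whitespace-separated `word count` pair per line. The real file layout and the internals of `get_words.py` may differ, and `top_words` is a hypothetical name.

```python
from collections import Counter


def top_words(lines, n=50_000):
    """Return the n most frequent words from 'word count' lines (assumed format)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        counts[parts[0]] += int(parts[1])  # merge repeated entries for the same word
    # most_common sorts by count, highest first; equal counts keep first-seen order
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = ["de 900", "la 800", "casa 80", "perro 60", "malformed-line"]
    print(top_words(sample, n=3))
```

The resulting list could then be written with `json.dump(..., ensure_ascii=False)` so that accented Spanish words stay readable in `top_spanish_words.json`.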
```bash
cd expo-app
npm install # install dependencies
npx expo start # start dev server
npx expo start --ios # run on iOS simulator
npx expo start --android # run on Android emulator
```

Requires `embeddings_*.json` files in `expo-app/assets/`, generated by the asset-parser pipeline above.
The data pipeline flows in one direction:

- `asset-parser/get_words.py` downloads the OpenSLR Spanish word list and keeps the top 50K words.
- `asset-parser/generate_embeddings.py` loads a FastText model (`cc.es.300.bin`), generates 300-dim embeddings (rounded to 4 decimals), splits them into chunks if needed (max 100 MB each), and copies them to `expo-app/assets/embeddings_N.bin`.
- `asset-parser/generate_hypernyms.py` uses NLTK WordNet (Open Multilingual Wordnet for Spanish) to find hypernyms for each word and copies the result to `expo-app/assets/hypernyms.json`.
- `asset-parser/generate_cultural_relations.py` downloads the ConceptNet 5.7 CSV, filters Spanish-only edges, and builds a bidirectional relation map, which it copies to `expo-app/assets/cultural_relations.json`.
- The Expo app loads `embeddings_0.bin` and `hypernyms.json` via `require()` at runtime (cached after first load). With 50K words the embeddings split into 2 chunks.
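The bidirectional relation map can be sketched as follows. The sketch makes three assumptions. First, the ConceptNet 5.7 assertions dump is tab-separated: edge URI, relation, start node, end node, JSON metadata. Second, node URIs look like `/c/es/<term>/...`. Third, every relation type is kept and bare terms are stored. Any of these may differ from what `generate_cultural_relations.py` actually emits. `spanish_term` and `build_relation_map` are hypothetical names.

```python
from collections import defaultdict


def spanish_term(uri):
    """'/c/es/perro/n' -> 'perro'; returns None for non-Spanish nodes."""
    parts = uri.split("/")  # e.g. ['', 'c', 'es', 'perro', 'n']
    if len(parts) < 4 or parts[1] != "c" or parts[2] != "es":
        return None
    return parts[3]


def build_relation_map(rows):
    """Map each Spanish term to a sorted list of terms linked to it in either direction.

    Each row is assumed to be [edge_uri, relation, start_uri, end_uri, metadata].
    """
    related = defaultdict(set)
    for row in rows:
        if len(row) < 4:
            continue  # malformed row
        start, end = spanish_term(row[2]), spanish_term(row[3])
        if start is None or end is None or start == end:
            continue  # keep only es<->es edges and drop self-loops
        related[start].add(end)
        related[end].add(start)  # also store the reverse direction
    return {term: sorted(others) for term, others in related.items()}


if __name__ == "__main__":
    # Toy lines in the assumed dump layout (edge URIs are placeholders)
    sample = [
        "/a/edge1\t/r/RelatedTo\t/c/es/perro/n\t/c/es/gato\t{}",
        "/a/edge2\t/r/Synonym\t/c/es/perro\t/c/en/dog\t{}",
    ]
    rows = (line.rstrip("\n").split("\t") for line in sample)
    print(build_relation_map(rows))
```

Because both directions are stored at build time, the app can find everything related to a word with one lookup, whichever side of the original edge the word appeared on.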
- `App.tsx`: main UI (word input, search trigger, results display with "Not convinced" pagination).
- `src/embeddings.ts`: loads the single embeddings file and caches it.
- `src/hypernyms.ts`: loads and caches the hypernyms JSON.
- `src/search.ts`: cosine similarity, single-linkage word grouping, and brute-force related-word search (top 200 per group, displayed 5 at a time).
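The grouping and search described for `src/search.ts` can be illustrated with a short Python sketch. The app itself is TypeScript, so this is not its code. Here, single linkage means two input words share a group whenever a chain of pairwise cosine similarities at or above a threshold connects them. The sketch implements this with union-find. The threshold, the choice to rank candidates by cosine to the group's mean vector, the toy `DEMO_EMB` vectors, and all function names are assumptions for illustration.

```python
import math

# Toy 3-dim vectors standing in for the real 300-dim FastText embeddings
DEMO_EMB = {
    "perro": [1.0, 0.1, 0.0],
    "gato": [0.9, 0.2, 0.0],
    "sol": [0.0, 0.1, 1.0],
    "luna": [0.1, 0.0, 0.9],
    "lobo": [0.95, 0.0, 0.05],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_words(words, emb, threshold=0.5):
    """Single-linkage grouping via union-find over pairs with cosine >= threshold."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(emb[a], emb[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def related_words(group, emb, top_k=200):
    """Brute-force scan of the whole vocabulary, ranked by cosine to the group's mean vector."""
    dim = len(emb[group[0]])
    centroid = [sum(emb[w][d] for w in group) / len(group) for d in range(dim)]
    scored = [(cosine(centroid, vec), w) for w, vec in emb.items() if w not in group]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored[:top_k]]


def page(results, page_index, page_size=5):
    """Return one 'Not convinced' page: the page_index-th slice of page_size results."""
    start = page_index * page_size
    return results[start:start + page_size]


if __name__ == "__main__":
    groups = group_words(["perro", "gato", "sol"], DEMO_EMB, threshold=0.8)
    print(groups)  # [['perro', 'gato'], ['sol']]
    print(page(related_words(groups[0], DEMO_EMB), 0))
```

With 50K words and 300 dimensions, a full scan costs on the order of 15 million multiply-adds per group. In this sketch, keeping the top 200 means each "Not convinced" tap only takes the next slice, with no rescoring.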
- With 50K words the embeddings split into 2 chunks (`embeddings_0.bin`). If the word count grows and the embeddings exceed 100 MB, the generator splits them into multiple chunks; in that case, update `src/embeddings.ts` and `app.json` to load all chunks.
- `embeddings_*.json` and `hypernyms.json` are gitignored in both `asset-parser/` and `expo-app/assets/`. The FastText model files (`cc.es.300.bin`, `cc.es.300.bin.gz`) are also gitignored.
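The sketch below shows 4-decimal rounding with size-capped chunking. It assumes the chunks are JSON objects mapping each word to its vector, as the `embeddings_*.json` pattern suggests. It also treats 100 MB as 100 × 1024 × 1024 bytes. `chunk_embeddings` is a hypothetical name, and the real generator may measure and split chunks differently.

```python
import json


def chunk_embeddings(embeddings, max_bytes=100 * 1024 * 1024):
    """Round vectors to 4 decimals and split them into JSON strings of at most max_bytes each.

    The 100 MB default (1024-based) is an assumption. A single entry larger than
    max_bytes still gets its own, oversized chunk.
    """
    chunks, current, size = [], {}, 2  # 2 bytes for the enclosing "{}"
    for word, vector in embeddings.items():
        rounded = [round(x, 4) for x in vector]
        # Bytes for '"word": [...]' plus a ', ' separator; this slightly overestimates,
        # so each serialized chunk stays within max_bytes
        entry = json.dumps({word: rounded}, ensure_ascii=False)[1:-1]
        entry_size = len(entry.encode("utf-8")) + 2
        if current and size + entry_size > max_bytes:
            chunks.append(current)  # close the current chunk and start a new one
            current, size = {}, 2
        current[word] = rounded
        size += entry_size
    if current:
        chunks.append(current)
    return [json.dumps(chunk, ensure_ascii=False) for chunk in chunks]


if __name__ == "__main__":
    demo = {"perro": [0.123456, -0.5], "gato": [0.5, 0.25], "sol": [1.0, 2.0]}
    for i, text in enumerate(chunk_embeddings(demo, max_bytes=40)):
        print(f"embeddings_{i}.json", len(text.encode("utf-8")), "bytes")
```

Under these assumptions, loading more chunks on the app side means parsing each chunk and merging the resulting objects into one map.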