Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent
Xiaoyu Zhan · Xinyu Fu · Hao Sun · Yuanqi Li · Jie Guo · Yanwen Guo*
Test-Time-Matching (TTM) is an automatic framework for constructing Role-Playing Language Agents (RPLAs) from textual inputs. TTM achieves high-fidelity role-playing performance through test-time scaling and context engineering. The decoupled role-related features (personality, memory, and linguistic style) are stored in plain text, which further supports personalized customization.
(Fingertips lightly brushing the corner of her handkerchief, brow faintly knit)
That sweet, metallic taste rises in my throat again, like honey melting through my lungs, and the coughing near shakes my soul half loose. In the dead of night like this, the coughs fall like beads from a snapped string, striking my heart one by one. Seeing me so, Zijuan set a fallen petal on the desk and murmured that "flowers fade and people part, as Heaven wills," words that drove the old needle into my heart once more.
(A light cough, fingertips brushing the edge of the poem slips on the desk)
Baoyu too has grown distant these days. I suppose the master has set him to his books, so he no longer cares even for jesting, leaving me to sit dazed alone in a room full of the scent of medicine. This frail body of mine drifts like a tender willow after all; one gust of wind and it scatters.
(Raising a hand to stroke his beard)
It is Year 43 of the Deterrence Era now. Time is like lake water: it spreads in ripple after ripple, then settles back to stillness, as if nothing had ever happened. The sun is still there, the signal lights still shine; only these legs of mine now need a cane to walk.
(Walking slowly to the window, gazing at the lake)
The neighborhood committee director came by the other day and said this house of mine is like a well-thumbed book no one reads anymore. I rather think it is fine this way: quiet, a little cold, like a gravestone. Perhaps she is right, perhaps not. These data, these records, at least they let me remember that I was never a deserter...
(A soft sigh, a hand steadying the cane)
How much longer can I hold on? No one knows; not even I.
(One hand brushing a lock of chestnut hair from his brow, the other tapping a restless rhythm on the hilt of his sword)
The day has been a whirl of blades and breathless moments—letters carried, a clash of tempers averted by the skin of my teeth, and a rancorous rascal bested under the flickering street lamps.
(A flash of a smile, quick as a blade)
I have not a moment to spare for idleness, Monsieur—every hour bears its trial, and I face it with blade in hand and heart ablaze.
(Pauses, then adds with a sly glance)
How fares your day, Lin? Do you find your days as full of intrigue as mine?
(Leaning forward slightly, eyes steady and kind.)
Of course, my dear boy — I shall fetch you something at once.
(A flick of the wand, and a cup of tea and a small vial appear.)
A cup of tea, perhaps with a touch of elderflower or chamomile, might be a most suitable choice at this hour. Should you still feel the need for something stronger, I would suggest a glass of mead — a drink of milder temperament, I should imagine, than firewhisky.
(A pause.)
I am quite certain I know those eyes — a most disquieting omen, I fear, of great strength and danger. But you are safe now, I assure you. Tell me, child, what else do you recall? Even the smallest memory may yet prove of the utmost importance. Hogwarts is ancient, and it remembers. If something has stirred within these walls, we must know why.
- 1 - Installation
- 2 - Demo
- 3 - Complete Process
- 4 - Customization
- 5 - Acknowledgments
- 6 - Citation
- 7 - Contact
Start by creating a conda environment.
git clone https://github.com/ZhanxyR/TTM.git
cd TTM
conda create -n ttm python=3.10
conda activate ttm
pip install -r requirements.txt
Download from Hugging Face.
List of Preprocessed Books and Characters
- Dumbledore
- Hermione
- Snape
- Malfoy
- Ron
- Harry
- Hagrid
- Elizabeth
- Jo
- d'Artagnan
- Becky
- 林黛玉, 黛玉
- 宝玉, 贾宝玉
- 贾母
- 王熙凤, 凤姐
- 贾政
- 贾琏
- 宝钗, 薛宝钗
- 孔明
- 八戒
- 慕容复
- 木婉清
- 段誉
- 王语嫣
- 萧峰
- 虚竹
- 杨过
- 罗辑
The completed structure should be like:
|-- TTM
|-- cache
|-- demo_Harry_Potter_1_4_Qwen25_32b_512_long
|-- rkg_graph
|-- relations_vdb
|-- entities_vdb
|-- Role Name
|-- background.json
|-- personality.json
|-- linguistic_style.json
|-- sentences.json
|-- roles.json
|-- roles_sentences.json
|-- chunks.json
|-- apps
|-- scripts
2. Start a Local vLLM Server (Optional)
# Modify first: supplement or adjust the necessary parameters.
sh scripts/vllm_server.sh
We call the LLM API by providing the url, model_name, and optionally an API key for authentication.
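Under the hood this is the standard OpenAI-compatible chat-completions protocol that vLLM serves. As a hedged sketch (the helper and all values below are illustrative, not TTM's actual code), the request assembled from the url, model name, and sampling defaults might look like:

```python
# Sketch of the request an OpenAI-compatible client sends to a vLLM
# server. The helper and all values below are illustrative; substitute
# your own --url, --model_name, and --api_key.
import json

def build_chat_request(url, model_name, user_message,
                       temperature=0.7, top_p=0.9, max_tokens=2048):
    """Assemble the endpoint and JSON body for one chat-completion call."""
    endpoint = url.rstrip("/") + "/chat/completions"
    body = {
        "model": model_name,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,  # matches the documented defaults
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    return endpoint, json.dumps(body)

endpoint, body = build_chat_request(
    "http://localhost:8000/v1", "Qwen/Qwen2.5-32B-Instruct", "Hello")
print(endpoint)  # http://localhost:8000/v1/chat/completions
```

Any OpenAI-compatible client library can then POST this body to the endpoint, passing the API key as a bearer token if the server requires one.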
The script uses --chat to skip preprocessing, so make sure you have prepared the cached files.
If you want to track changes during TTM's three-stage generation process, use the --track flag.
# Modify first: supplement or adjust the necessary parameters.
sh scripts/demo.sh
Important
If you encounter CUDA out of memory or ValueError: No implementation for rerank_model in 'get_scores', try reducing the number of retrieved sentences or using smaller models.
For example:
--retriever_k_l 20
# or
--embedding_model BAAI/bge-large-zh-v1.5
--rerank_model BAAI/bge-reranker-large
If you encounter RuntimeError: CUDA error: device-side assert triggered, try reducing the number of retrieved sentences:
--retriever_k_l 20
Note
For more details on our parameters, please expand the folding section below.
Specific args to be modified
The directory of input documents, used for document processing. If document processing is no longer required, there is no need to specify it.
The selected roles for role-playing in the roles list, separated by commas. The role name should be defined in the roles.json file.
Note: Role names act as keys for retrieving historical utterances from the roles_sentences.json file. If a role is divided into multiple entities within the JSON file, all relevant entities must be specified here to ensure retrieval of the complete utterance history.
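To illustrate why every alias matters, here is a hypothetical sketch; the file contents and helper below are invented for illustration, and the real schema of roles_sentences.json may differ:

```python
# Illustrative sketch (not TTM's actual code): if one role appears
# under several entity names in roles_sentences.json, every alias must
# be listed in --roles, or part of the history is silently missed.
roles_sentences = {           # hypothetical file contents
    "Daiyu":     ["Utterance A", "Utterance B"],
    "Lin Daiyu": ["Utterance C"],
    "Baoyu":     ["Utterance D"],
}

def collect_history(selected_roles, corpus):
    """Gather utterances for every entity name given in --roles."""
    history = []
    for role in selected_roles:
        history.extend(corpus.get(role, []))
    return history

# Passing only one alias loses utterances:
print(len(collect_history(["Daiyu"], roles_sentences)))               # 2
# Listing both aliases recovers the full history:
print(len(collect_history(["Daiyu", "Lin Daiyu"], roles_sentences)))  # 3
```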
The name for this experiment, used for saving and loading.
The cache directory to be used for saving and loading the intermediate results. (cache by default).
The IP address of the LLM server.
The model name of the LLM server.
The API key of the LLM server.
The language of both the input documents and the used prompts. (zh by default).
The number of workers to be used for multi-threading. (20 by default).
Whether to use RAG for detailed memory. (False by default).
Note: During the preprocessing stage, the database is created only when this parameter is used.
The path to save the logs. (logs by default).
The maximum number of tokens to be used. (2048 by default).
The top-p probability to be used. (0.9 by default).
The temperature to be used. (0.7 by default).
Mode args
Note: Multiple modes can be active at the same time, as they do not conflict with one another.
Run in serial mode, without multi-threading. (False by default).
Run in debug mode, with additional log information. (False by default).
Run in chatting mode; do not execute any document processing. (False by default).
Note: Chatting mode will skip all preprocessing functions and forcibly load the cached files. Ensure you have prepared the necessary files.
Run in test mode, with predefined user inputs rather than interaction. (False by default).
Run in short mode; the agent will generate shorter responses. (False by default).
Run in tracking mode to compare the performance of the three-stage generation.
Model args
The path to the Haruhi model. Won't be used if args.use_haruhi is False. (silk-road/Haruhi-Dialogue-Speaker-Extract_qwen18 by default).
The path to the embedding model. Used for utterance retrieval. (Qwen/Qwen3-Embedding-0.6B by default).
The path to the rerank model. Used for utterance retrieval. (Qwen/Qwen3-Reranker-0.6B by default).
The path to the graph embedding model. Used in RAG. (BAAI/bge-large-zh-v1.5 by default).
Preprocessing args (used in preprocessing)
The chunk size to be used for processing document. (512 by default).
Note: Increasing the chunk size can accelerate the processing time.
The overlap size to be used for processing document. (64 by default).
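These two parameters describe a standard sliding-window chunker. A minimal sketch, assuming character-based windows (the actual implementation may split on tokens or sentence boundaries instead):

```python
# Sliding-window chunking sketch matching the documented defaults:
# windows of 512 characters, each overlapping the previous by 64.
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # advance 448 characters per window
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end
    return chunks

chunks = chunk_text("a" * 1000)
print(len(chunks))  # 3 windows: [0:512], [448:960], [896:1000]
```

A larger chunk size means fewer chunks, hence fewer LLM calls during preprocessing, which is why increasing it accelerates processing.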
Do not split the utterances into sentences. (False by default).
Note: This setting controls whether to store individual sentences or complete conversation utterances. Setting it to True is recommended if the number of historical utterances is sufficient for retrieval.
Whether to use Haruhi for dialogues extraction. (False by default).
Skip the summarization step. (False by default).
The frequency to summarize the background. (10 by default).
Only process the documents and save the intermediate results. (False by default).
Note: It will exit immediately after preprocessing is complete.
Force rebuilding the vector database. (False by default).
Important: Use with caution, as it will overwrite the cached files.
Force recalculation: recalculate everything and rewrite cached data. (False by default).
Important: Use with caution, as it will overwrite the cached files.
TTM args (used in chatting)
The number of similar sentences to be retrieved for each linguistic style query, used for reranking. (40 by default).
The number of related chunks to be used for memory. (10 by default).
The matching type to be used for matching linguistic style query. (dynamic by default).
Note: Select from ['simple', 'parallel', 'serial', 'dynamic'].
The number of historical utterance examples for each linguistic style query. (15 by default).
Note: Hybrid retrieval will double the final numbers.
The maximum number of common words of each type to be used for matching the linguistic style query. (20 by default).
Remove the linguistic style of the utterance when matching. (False by default).
Only remove the linguistic style of the initial response (not the styleless response) during chatting. (False by default).
Split sentences into sub-sentences by commas for matching. (False by default).
Disable the action display during chatting. (False by default).
Disable the personality setting during chatting.
Disable the background setting during chatting.
Disable the linguistic preference setting during chatting.
Disable the common words setting during chatting.
Disable the linguistic style matching during chatting.
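Taken together, these parameters describe a retrieve-then-rerank pipeline: the embedding model first pulls a generous candidate pool (retriever_k_l), and the reranker then keeps only the best few examples per query. A toy sketch with stand-in scorers (the real system uses the embedding and rerank models listed above, not these toy functions):

```python
# Toy retrieve-then-rerank sketch showing how retriever_k_l (candidate
# pool size) and the per-query example count interact. The scorers are
# illustrative stand-ins, not real models.
def retrieve_then_rerank(query, sentences, coarse_score, fine_score,
                         retriever_k=40, final_k=15):
    """Coarse retrieval keeps retriever_k candidates; the reranker
    re-scores only that pool and returns the final_k best."""
    pool = sorted(sentences, key=lambda s: coarse_score(query, s),
                  reverse=True)[:retriever_k]
    return sorted(pool, key=lambda s: fine_score(query, s),
                  reverse=True)[:final_k]

def coarse(q, s):  # shared-character overlap (stand-in for embeddings)
    return len(set(q) & set(s))

def fine(q, s):    # exact word overlap (stand-in for the reranker)
    return len(set(q.split()) & set(s.split()))

corpus = [f"sentence number {i}" for i in range(100)] + ["the query words"]
top = retrieve_then_rerank("the query words", corpus, coarse, fine,
                           retriever_k=40, final_k=5)
print(top[0])  # "the query words"
```

This also explains the out-of-memory advice earlier: shrinking retriever_k_l shrinks the batch the reranker must score at once.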
Go through the complete TTM process, automatically constructing an RPLA from textual input.
Important
The entire process involves a significant number of API calls. Please confirm that you truly intend to proceed.
# Modify first: supplement or adjust the necessary parameters.
sh scripts/complete_en.sh
# Or
sh scripts/complete_zh.sh
By default, the processed results will be saved to the cache/name directory.
The log files consist of TTM's log (logs/name_time.log) and DIGIMON's log (logs/GraphRAG_Logs/time.log).
Just modify the --roles. Make sure the specific role name exists in the roles_sentences.json.
--roles role_name
You should first organize the text files as below, then modify the input and output paths in the script.
--input examples/yours
--name as_you_like
|-- TTM
|-- examples
|-- yours
|-- 001.txt
|-- 002.txt
Later.
This work was supported by the National Natural Science Foundation of China (62032011) and the Natural Science Foundation of Jiangsu Province (BK20211147).
There are also many powerful resources that greatly benefit our work:
If you find this work helpful, please consider citing our paper.
@misc{zhan2025ttm,
title={Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent},
author={Zhan, Xiaoyu and Fu, Xinyu and Sun, Hao and Li, Yuanqi and Guo, Jie and Guo, Yanwen},
year={2025},
eprint={2507.16799},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.16799},
}
Zhan, Xiaoyu (zhanxy@smail.nju.edu.cn) and Fu, Xinyu (xinyu.fu@smail.nju.edu.cn).